Every analysis starts from an uploaded video. Once the upload is complete, the transcript, vision, and retake-removal commands run asynchronously: a start request returns quickly, and the result endpoint returns the current status plus any completed outputs.

1. Upload source video

Create an upload project:
curl -X POST "https://api.nanoclip.ai/v1/projects/upload" \
  -H "Authorization: Bearer $NANOCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "filename": "demo.mp4",
    "content_type": "video/mp4"
  }'
The response includes:

| Field | Description |
| --- | --- |
| project_id | Project ID used for transcript and vision requests. |
| source | Customer-facing metadata for the uploaded video, such as filename, content type, and size when known. |
| upload_url | Signed URL that accepts the video bytes. |
| upload_headers | Headers to include when uploading. |
| upload_expires_in_seconds | Time until the signed URL expires. |

Upload the video bytes, then complete the upload:
curl -X PUT "$UPLOAD_URL" \
  -H "Content-Type: video/mp4" \
  --data-binary "@demo.mp4"

curl -X POST "https://api.nanoclip.ai/v1/projects/proj_abc123/upload/complete" \
  -H "Authorization: Bearer $NANOCLIP_API_KEY"
Analysis commands return 409 Conflict until upload completion succeeds. The completed upload response includes source metadata so you can confirm which video the project points to.
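
The three upload steps above can be chained in a small client. The following is a minimal Python sketch, not an official SDK: the helper names (create_upload_request, put_bytes_request, complete_upload_request, upload_video) are hypothetical, and it assumes upload_headers is a JSON object that maps header names to values. It reads the whole file into memory, so it only suits small files.

```python
import json
import os
import urllib.request

API_BASE = "https://api.nanoclip.ai/v1"


def create_upload_request(api_key, filename, content_type):
    """Build the POST that creates an upload project."""
    body = json.dumps({"filename": filename, "content_type": content_type})
    return urllib.request.Request(
        f"{API_BASE}/projects/upload",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def put_bytes_request(upload_url, upload_headers, video_bytes):
    """Build the PUT that sends the video bytes to the signed URL.

    Assumption: upload_headers is a dict of header name -> value.
    """
    return urllib.request.Request(
        upload_url, data=video_bytes, method="PUT", headers=dict(upload_headers)
    )


def complete_upload_request(api_key, project_id):
    """Build the POST that marks the upload as complete (no request body)."""
    return urllib.request.Request(
        f"{API_BASE}/projects/{project_id}/upload/complete",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def upload_video(api_key, path, content_type="video/mp4"):
    """Run all three steps. Performs network I/O; urlopen raises on HTTP errors."""
    with open(path, "rb") as f:
        video_bytes = f.read()

    create = create_upload_request(api_key, os.path.basename(path), content_type)
    with urllib.request.urlopen(create) as resp:
        project = json.load(resp)

    put = put_bytes_request(project["upload_url"], project["upload_headers"], video_bytes)
    with urllib.request.urlopen(put) as resp:
        resp.read()

    done = complete_upload_request(api_key, project["project_id"])
    with urllib.request.urlopen(done) as resp:
        # Includes source metadata for the uploaded video.
        return json.load(resp)
```

Sending the headers returned in upload_headers, rather than hardcoding Content-Type, keeps the PUT aligned with whatever the signed URL expects.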

2. Transcript

Start transcript analysis after the project has a completed source upload.
curl -X POST "https://api.nanoclip.ai/v1/projects/proj_abc123/transcript" \
  -H "Authorization: Bearer $NANOCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "language": "en",
    "requested_outputs": ["text", "speakers", "utterances", "words"]
  }'
A successful start request returns the queued analysis:
{
  "project_id": "proj_abc123",
  "analysis": "transcript",
  "status": "queued",
  "requested_outputs": ["text", "speakers", "utterances", "words"],
  "created_at": "2026-05-18T10:05:00Z"
}
Poll the transcript result endpoint:
curl "https://api.nanoclip.ai/v1/projects/proj_abc123/transcript" \
  -H "Authorization: Bearer $NANOCLIP_API_KEY"
Completed transcript responses include agent context for timestamp units, speaker IDs, and word index ranges.
{
  "project_id": "proj_abc123",
  "analysis": "transcript",
  "status": "completed",
  "language": "en",
  "agent_context": {
    "time_unit": "seconds",
    "time_origin": "start of source video",
    "speaker_ids": "stable within this transcript response",
    "word_indices": "word_start_idx is inclusive; word_end_idx is exclusive"
  },
  "text": "Welcome to the demo...",
  "words": [
    {
      "text": "Welcome",
      "start": 0.0,
      "end": 0.42,
      "speaker": 0,
      "confidence": 0.98
    }
  ],
  "utterances": [
    {
      "speaker": 0,
      "start": 0.0,
      "end": 2.1,
      "text": "Welcome to the demo...",
      "word_start_idx": 0,
      "word_end_idx": 5
    }
  ],
  "speakers": [
    {"speaker": 0, "total_duration": 12.4, "utterance_count": 3}
  ]
}
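
Because word_start_idx is inclusive and word_end_idx is exclusive, an utterance's range maps directly onto a Python slice of words. The sketch below assumes a completed transcript response that was requested with both words and utterances; the function names are illustrative only.

```python
def words_for_utterance(transcript, utterance):
    """Return the word entries covered by one utterance.

    word_start_idx is inclusive and word_end_idx is exclusive, which is
    exactly how Python slices behave.
    """
    start = utterance["word_start_idx"]
    end = utterance["word_end_idx"]
    return transcript["words"][start:end]


def utterance_lines(transcript):
    """Format each utterance as '[start-end] speaker N: words...'."""
    lines = []
    for utterance in transcript["utterances"]:
        words = words_for_utterance(transcript, utterance)
        text = " ".join(word["text"] for word in words)
        lines.append(
            f"[{utterance['start']:.2f}-{utterance['end']:.2f}] "
            f"speaker {utterance['speaker']}: {text}"
        )
    return lines
```

Rebuilding the text from words rather than from the utterance's own text field makes it easy to check that the index ranges line up with what you expect.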

Transcript language

language is required and must be one of the supported base tags:
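
| Value | Language |
| --- | --- |
| bg | Bulgarian |
| cs | Czech |
| da | Danish |
| de | German |
| el | Greek |
| en | English |
| es | Spanish |
| et | Estonian |
| fi | Finnish |
| fr | French |
| he | Hebrew |
| hr | Croatian |
| hu | Hungarian |
| it | Italian |
| lt | Lithuanian |
| lv | Latvian |
| mt | Maltese |
| nl | Dutch |
| pl | Polish |
| pt | Portuguese |
| ro | Romanian |
| ru | Russian |
| sk | Slovak |
| sl | Slovenian |
| sv | Swedish |
| uk | Ukrainian |
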
Unsupported language tags return 422 Unprocessable Entity.
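
A client can check the tag locally before it starts an analysis, which avoids a round trip that would end in 422. The sketch below copies the supported values from the table above. Reducing a regional tag such as pt-BR to its base tag is a client-side assumption; this page does not say whether the API accepts regional tags itself. The function names are hypothetical.

```python
SUPPORTED_LANGUAGES = frozenset({
    "bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "he", "hr", "hu",
    "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "ru", "sk", "sl", "sv", "uk",
})


def base_language_tag(tag):
    """Reduce a tag like 'pt-BR' or 'en_US' to its lowercase base tag."""
    return tag.strip().replace("_", "-").split("-", 1)[0].lower()


def require_supported_language(tag):
    """Return the base tag, or raise ValueError if it is not in the table."""
    base = base_language_tag(tag)
    if base not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported transcript language: {tag!r}")
    return base
```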

3. Vision

Start vision analysis with the outputs your app needs.
curl -X POST "https://api.nanoclip.ai/v1/projects/proj_abc123/vision" \
  -H "Authorization: Bearer $NANOCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "requested_outputs": ["faces", "face_tracks", "scenes"]
  }'
Poll the vision result endpoint:
curl "https://api.nanoclip.ai/v1/projects/proj_abc123/vision" \
  -H "Authorization: Bearer $NANOCLIP_API_KEY"
Completed vision responses include agent context, face observations over time, track summaries, and scene timestamps.
{
  "project_id": "proj_abc123",
  "analysis": "vision",
  "status": "completed",
  "error": null,
  "agent_context": {
    "coordinate_system": "normalized [x1, y1, x2, y2] in source video frame",
    "time_unit": "seconds",
    "scene_timestamp": "scene start time after t=0",
    "track_ids": "stable within this vision response"
  },
  "faces": [
    {
      "t": 0.0,
      "detections": [
        {
          "box": [0.2109, 0.2861, 0.2719, 0.4111],
          "score": 0.91,
          "track_id": 0,
          "cluster_id": 0
        }
      ]
    }
  ],
  "face_tracks": [
    {
      "track_id": 0,
      "cluster_id": 0,
      "frame_count": 13,
      "start_t": 0.0,
      "end_t": 2.4024,
      "median_box_area": 0.0067
    }
  ],
  "scenes": [
    {"timestamp": 2.6026}
  ]
}
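
Two conversions come up often when consuming a vision response: turning a normalized box into pixel coordinates, and turning scene timestamps into segments. The Python sketch below makes two assumptions that this page does not confirm. First, the frame width, frame height, and video duration are supplied by the caller, since the response above does not include them. Second, "scene start time after t=0" is read as meaning the opening scene at t=0 is implicit and not listed.

```python
def box_to_pixels(box, frame_width, frame_height):
    """Convert a normalized [x1, y1, x2, y2] box to integer pixel coordinates."""
    x1, y1, x2, y2 = box
    return (
        round(x1 * frame_width),
        round(y1 * frame_height),
        round(x2 * frame_width),
        round(y2 * frame_height),
    )


def scene_segments(scenes, duration):
    """Turn scene start timestamps into (start, end) pairs in seconds.

    Assumption: the first scene starts at t=0 and is not listed in `scenes`;
    `duration` is the caller-supplied length of the source video.
    """
    starts = [0.0] + sorted(scene["timestamp"] for scene in scenes)
    ends = starts[1:] + [duration]
    return list(zip(starts, ends))
```

For the sample response above and a hypothetical 10-second video, scene_segments would yield one segment from 0.0 to 2.6026 and a second from 2.6026 to 10.0.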

4. Retake removal

Retake removal currently supports Hebrew (he). It analyzes transcript-like word timing and returns the word spans to remove plus keep intervals for a clean read.
curl -X POST "https://api.nanoclip.ai/v1/projects/proj_abc123/retake-removal" \
  -H "Authorization: Bearer $NANOCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "language": "he"
  }'
A successful start request returns the queued analysis:
{
  "project_id": "proj_abc123",
  "analysis": "retake_removal",
  "status": "queued",
  "created_at": "2026-06-08T10:05:00Z"
}
Poll the retake-removal result endpoint:
curl "https://api.nanoclip.ai/v1/projects/proj_abc123/retake-removal" \
  -H "Authorization: Bearer $NANOCLIP_API_KEY"
Completed responses include the original words, removal spans, and the keep intervals.
{
  "project_id": "proj_abc123",
  "status": "completed",
  "error": null,
  "words": [
    {
      "i": 0,
      "start": 0.0,
      "end": 0.42,
      "text": "שלום",
      "confidence": 0.98,
      "source": "transcript"
    }
  ],
  "remove_spans": [
    {
      "remove_word_start": 12,
      "remove_word_end": 18
    }
  ],
  "keep_intervals": [
    {
      "start": 0.0,
      "end": 7.2
    }
  ]
}
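
The keep intervals are enough to plan an edit without interpreting the word indices. The Python sketch below computes the length of the cleaned video and maps a source timestamp onto the edited timeline. It assumes keep_intervals are non-overlapping and expressed in seconds on the source timeline, which this page implies but does not state; the function names are illustrative.

```python
def kept_duration(keep_intervals):
    """Total length in seconds of the video after retakes are removed."""
    return sum(interval["end"] - interval["start"] for interval in keep_intervals)


def source_to_output_time(t, keep_intervals):
    """Map a source timestamp to the edited timeline.

    Returns None when `t` falls inside a removed region.
    Assumption: intervals do not overlap.
    """
    elapsed = 0.0
    for interval in sorted(keep_intervals, key=lambda iv: iv["start"]):
        if interval["start"] <= t <= interval["end"]:
            return elapsed + (t - interval["start"])
        elapsed += interval["end"] - interval["start"]
    return None
```

The sketch deliberately avoids remove_spans, because this page does not say whether remove_word_end is inclusive or exclusive.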

Requested outputs

| Analysis | Output | Description |
| --- | --- | --- |
| Transcript | text | Combined transcript text. |
| Transcript | agent_context | Short field notes for timing units, speaker IDs, and word index ranges. |
| Transcript | words | Word-level transcript entries with text, start, end, speaker, and optional confidence. |
| Transcript | utterances | Speaker turns with speaker, start, end, text, word_start_idx, and word_end_idx. |
| Transcript | speakers | Speaker summaries with speaker, total_duration, and utterance_count. |
| Vision | faces | Individual detected face observations. |
| Vision | agent_context | Short field notes for coordinates, timing, scene timestamps, and track IDs. |
| Vision | face_tracks | Face observations grouped across time. |
| Vision | scenes | Scene-level visual segments. |
| Retake removal | words | Word-level inputs used by the retake-removal model. |
| Retake removal | remove_spans | Word index ranges that should be removed. |
| Retake removal | keep_intervals | Time ranges to keep after removing retakes. |

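
A start-request body can be assembled and checked before it is sent. The Python sketch below is hypothetical client code: the allowed values are taken only from the requested_outputs arrays in the transcript and vision request examples on this page. agent_context is left out because this page does not say whether it can be requested explicitly, and retake removal is left out because its example request carries no requested_outputs.

```python
# Values copied from the request examples on this page (an assumption about
# the full allowed set, not a statement of the API's validation rules).
REQUESTABLE_OUTPUTS = {
    "transcript": ("text", "speakers", "utterances", "words"),
    "vision": ("faces", "face_tracks", "scenes"),
}


def build_start_body(analysis, requested_outputs, language=None):
    """Return the JSON body for a transcript or vision start request."""
    allowed = REQUESTABLE_OUTPUTS.get(analysis)
    if allowed is None:
        raise ValueError(f"unknown analysis: {analysis!r}")

    unknown = [name for name in requested_outputs if name not in allowed]
    if unknown:
        raise ValueError(f"unknown {analysis} outputs: {unknown}")

    body = {"requested_outputs": list(requested_outputs)}
    if analysis == "transcript":
        # language is required for transcript analysis.
        if not language:
            raise ValueError("language is required for transcript analysis")
        body["language"] = language
    return body
```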

Polling pattern

Poll until status is completed or failed.
while true; do
  response=$(curl -s "https://api.nanoclip.ai/v1/projects/proj_abc123/transcript" \
    -H "Authorization: Bearer $NANOCLIP_API_KEY")

  echo "$response"
  status=$(echo "$response" | jq -r '.status')

  if [ "$status" = "completed" ] || [ "$status" = "failed" ]; then
    break
  fi

  sleep 5
done
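
The shell loop above runs until a terminal status arrives. In application code a deadline is usually worth adding. The following Python sketch is one possible shape, not an official client: poll_until_done and make_fetch are hypothetical names, and the 5-second interval and 10-minute timeout are arbitrary defaults rather than documented limits.

```python
import json
import time
import urllib.request

TERMINAL_STATUSES = {"completed", "failed"}


def make_fetch(api_key, project_id, analysis):
    """Return a function that GETs the result endpoint, e.g. analysis='transcript'."""
    url = f"https://api.nanoclip.ai/v1/projects/{project_id}/{analysis}"

    def fetch():
        request = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {api_key}"}
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return fetch


def poll_until_done(fetch, interval_seconds=5.0, timeout_seconds=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until status is completed or failed, or the deadline passes."""
    deadline = clock() + timeout_seconds
    while True:
        result = fetch()
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"analysis still {result.get('status')!r} at deadline")
        sleep(interval_seconds)
```

A call such as poll_until_done(make_fetch(api_key, "proj_abc123", "transcript")) would mirror the shell loop. Passing sleep and clock as parameters keeps the loop testable without real waiting.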
If an analysis is already queued, running, processing, or completed, starting the same analysis again returns the existing analysis response.