Analysis jobs - nano clip API

Every analysis starts from an uploaded video. Once the upload is complete, the transcript, vision, and retake-removal commands run asynchronously: a start request returns quickly, and the matching result endpoint returns the current status plus any completed outputs.

1. Upload source video

Create an upload project:

curl -X POST "https://api.nanoclip.ai/v1/projects/upload" \
  -H "Authorization: Bearer $NANOCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "filename": "demo.mp4",
    "content_type": "video/mp4"
  }'

The response includes:

Field	Description
`project_id`	Project ID used for upload completion and for transcript, vision, and retake-removal requests.
`source`	Customer-facing metadata for the uploaded video, such as filename, content type, and size when known.
`upload_url`	Signed URL that accepts the video bytes.
`upload_headers`	Headers to include when uploading.
`upload_expires_in_seconds`	Time until the signed URL expires.

Upload the video bytes to `upload_url`, sending any `upload_headers` from the response, then complete the upload:

curl -X PUT "$UPLOAD_URL" \
  -H "Content-Type: video/mp4" \
  --data-binary "@demo.mp4"

curl -X POST "https://api.nanoclip.ai/v1/projects/proj_abc123/upload/complete" \
  -H "Authorization: Bearer $NANOCLIP_API_KEY"

Analysis commands return 409 Conflict until upload completion succeeds. The completed upload response includes source metadata so you can confirm which video the project points to.
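The three upload calls above can be chained in a script. The following is a minimal Python sketch, assuming the create response carries `project_id`, `upload_url`, and `upload_headers` as listed in the table, and that `upload_headers` is a JSON object of header names to values. The names `upload_video`, `http_send`, and the injectable `send` parameter are hypothetical helpers, not part of the API.

```python
import json
import urllib.request

API_BASE = "https://api.nanoclip.ai/v1"


def http_send(method, url, headers, body=None):
    """Default transport: perform one HTTP request and return the response bytes."""
    request = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        return response.read()


def upload_video(api_key, filename, content_type, video_bytes, send=http_send):
    """Create an upload project, PUT the bytes to the signed URL, then complete it.

    `send` is a hypothetical injection point so the sequence can be exercised
    with a fake transport instead of real network calls.
    """
    auth = {"Authorization": f"Bearer {api_key}"}

    # Step 1: create the upload project.
    created = json.loads(send(
        "POST",
        f"{API_BASE}/projects/upload",
        {**auth, "Content-Type": "application/json"},
        json.dumps({"filename": filename, "content_type": content_type}).encode(),
    ))

    # Step 2: upload the bytes, adding the headers returned by the API.
    # Assumption: upload_headers is an object mapping header name to value.
    put_headers = {"Content-Type": content_type, **(created.get("upload_headers") or {})}
    send("PUT", created["upload_url"], put_headers, video_bytes)

    # Step 3: complete the upload; analysis commands return 409 before this succeeds.
    project_id = created["project_id"]
    completed = send("POST", f"{API_BASE}/projects/{project_id}/upload/complete", auth, b"")
    return project_id, json.loads(completed)
```

Calling `upload_video(api_key, "demo.mp4", "video/mp4", data)` returns the project ID together with the completed-upload response, whose source metadata you can check before starting an analysis.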

2. Transcript

Start transcript analysis after the project has a completed source upload.

curl -X POST "https://api.nanoclip.ai/v1/projects/proj_abc123/transcript" \
  -H "Authorization: Bearer $NANOCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "language": "en",
    "requested_outputs": ["text", "speakers", "utterances", "words"]
  }'

{
  "project_id": "proj_abc123",
  "analysis": "transcript",
  "status": "queued",
  "requested_outputs": ["text", "speakers", "utterances", "words"],
  "created_at": "2026-05-18T10:05:00Z"
}

Poll the transcript result endpoint:

curl "https://api.nanoclip.ai/v1/projects/proj_abc123/transcript" \
  -H "Authorization: Bearer $NANOCLIP_API_KEY"

Completed transcript responses include agent context for timestamp units, speaker IDs, and word index ranges.

{
  "project_id": "proj_abc123",
  "analysis": "transcript",
  "status": "completed",
  "language": "en",
  "agent_context": {
    "time_unit": "seconds",
    "time_origin": "start of source video",
    "speaker_ids": "stable within this transcript response",
    "word_indices": "word_start_idx is inclusive; word_end_idx is exclusive"
  },
  "text": "Welcome to the demo...",
  "words": [
    {
      "text": "Welcome",
      "start": 0.0,
      "end": 0.42,
      "speaker": 0,
      "confidence": 0.98
    }
  ],
  "utterances": [
    {
      "speaker": 0,
      "start": 0.0,
      "end": 2.1,
      "text": "Welcome to the demo...",
      "word_start_idx": 0,
      "word_end_idx": 5
    }
  ],
  "speakers": [
    {"speaker": 0, "total_duration": 12.4, "utterance_count": 3}
  ]
}
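Because `word_start_idx` is inclusive and `word_end_idx` is exclusive, the pair maps directly onto a Python slice of `words`. The sketch below shows one way to recover the words behind each utterance; it assumes both `words` and `utterances` were requested, and the helper names and sample data are illustrative only.

```python
def utterance_words(transcript, utterance):
    """Return the word entries covered by one utterance.

    word_start_idx is inclusive and word_end_idx is exclusive, which matches
    Python slice semantics.
    """
    return transcript["words"][utterance["word_start_idx"]:utterance["word_end_idx"]]


def words_by_speaker(transcript):
    """Group word texts by speaker ID, following utterance order."""
    grouped = {}
    for utterance in transcript["utterances"]:
        texts = [word["text"] for word in utterance_words(transcript, utterance)]
        grouped.setdefault(utterance["speaker"], []).extend(texts)
    return grouped


# Hypothetical sample shaped like a completed transcript response.
sample = {
    "words": [
        {"text": "Welcome", "start": 0.0, "end": 0.42, "speaker": 0},
        {"text": "to", "start": 0.45, "end": 0.55, "speaker": 0},
        {"text": "the", "start": 0.58, "end": 0.7, "speaker": 0},
        {"text": "demo", "start": 0.72, "end": 1.1, "speaker": 0},
        {"text": "Thanks", "start": 1.4, "end": 1.9, "speaker": 1},
    ],
    "utterances": [
        {"speaker": 0, "start": 0.0, "end": 1.1, "text": "Welcome to the demo",
         "word_start_idx": 0, "word_end_idx": 4},
        {"speaker": 1, "start": 1.4, "end": 1.9, "text": "Thanks",
         "word_start_idx": 4, "word_end_idx": 5},
    ],
}

print(words_by_speaker(sample))
# {0: ['Welcome', 'to', 'the', 'demo'], 1: ['Thanks']}
```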

Transcript language

`language` is required and must be one of the supported base tags:

Value	Language
`bg`	Bulgarian
`cs`	Czech
`da`	Danish
`de`	German
`el`	Greek
`en`	English
`es`	Spanish
`et`	Estonian
`fi`	Finnish
`fr`	French
`he`	Hebrew
`hr`	Croatian
`hu`	Hungarian
`it`	Italian
`lt`	Lithuanian
`lv`	Latvian
`mt`	Maltese
`nl`	Dutch
`pl`	Polish
`pt`	Portuguese
`ro`	Romanian
`ru`	Russian
`sk`	Slovak
`sl`	Slovenian
`sv`	Swedish
`uk`	Ukrainian

Unsupported language tags return 422 Unprocessable Entity.
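To avoid a round trip that ends in 422, a client can check the tag before sending the request. This is a small sketch built from the table above. The function name `check_language` is hypothetical, and the check is deliberately strict because this page does not say whether region subtags such as `en-US` or other casings are accepted.

```python
# Base tags copied from the supported-language table.
SUPPORTED_LANGUAGES = frozenset({
    "bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "he", "hr", "hu",
    "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "ru", "sk", "sl", "sv", "uk",
})


def check_language(tag):
    """Return the tag if it is a supported base tag, otherwise raise ValueError.

    Assumption: only exact lowercase base tags are valid; anything else is
    rejected locally rather than guessed at.
    """
    if tag not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported transcript language: {tag!r}")
    return tag
```

Keeping the list in one constant means a future change to the table only needs one local update.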

3. Vision

Start vision analysis with the outputs your app needs.

curl -X POST "https://api.nanoclip.ai/v1/projects/proj_abc123/vision" \
  -H "Authorization: Bearer $NANOCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "requested_outputs": ["faces", "face_tracks", "scenes"]
  }'

Poll the vision result endpoint:

curl "https://api.nanoclip.ai/v1/projects/proj_abc123/vision" \
  -H "Authorization: Bearer $NANOCLIP_API_KEY"

Completed vision responses include agent context, face observations over time, track summaries, and scene timestamps.

{
  "project_id": "proj_abc123",
  "analysis": "vision",
  "status": "completed",
  "error": null,
  "agent_context": {
    "coordinate_system": "normalized [x1, y1, x2, y2] in source video frame",
    "time_unit": "seconds",
    "scene_timestamp": "scene start time after t=0",
    "track_ids": "stable within this vision response"
  },
  "faces": [
    {
      "t": 0.0,
      "detections": [
        {
          "box": [0.2109, 0.2861, 0.2719, 0.4111],
          "score": 0.91,
          "track_id": 0,
          "cluster_id": 0
        }
      ]
    }
  ],
  "face_tracks": [
    {
      "track_id": 0,
      "cluster_id": 0,
      "frame_count": 13,
      "start_t": 0.0,
      "end_t": 2.4024,
      "median_box_area": 0.0067
    }
  ],
  "scenes": [
    {"timestamp": 2.6026}
  ]
}
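Face boxes are normalized to the source frame, so drawing or cropping needs a conversion to pixels. The sketch below shows that conversion and one way to turn scene timestamps into intervals. The frame size and video duration are not part of the response shown here, so the caller supplies them. Treating the first scene as starting at t=0 is an assumption based on the `scene_timestamp` note, and both helper names are hypothetical.

```python
def box_to_pixels(box, frame_width, frame_height):
    """Scale a normalized [x1, y1, x2, y2] box to integer pixel coordinates."""
    x1, y1, x2, y2 = box
    return (
        round(x1 * frame_width),
        round(y1 * frame_height),
        round(x2 * frame_width),
        round(y2 * frame_height),
    )


def scene_intervals(scenes, duration):
    """Turn scene start timestamps into (start, end) intervals in seconds.

    Assumption: the first scene starts at t=0 and is not listed, so every
    timestamp marks where the next scene begins.
    """
    starts = [0.0] + sorted(scene["timestamp"] for scene in scenes)
    ends = starts[1:] + [duration]
    return list(zip(starts, ends))


print(box_to_pixels([0.25, 0.5, 0.75, 1.0], 1920, 1080))
# (480, 540, 1440, 1080)
print(scene_intervals([{"timestamp": 2.5}], 10.0))
# [(0.0, 2.5), (2.5, 10.0)]
```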

4. Retake removal

Retake removal currently supports Hebrew (he). It analyzes transcript-like word timing and returns the word spans to remove plus keep intervals for a clean read.

curl -X POST "https://api.nanoclip.ai/v1/projects/proj_abc123/retake-removal" \
  -H "Authorization: Bearer $NANOCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "language": "he"
  }'

{
  "project_id": "proj_abc123",
  "analysis": "retake_removal",
  "status": "queued",
  "created_at": "2026-06-08T10:05:00Z"
}

Poll the retake-removal result endpoint:

curl "https://api.nanoclip.ai/v1/projects/proj_abc123/retake-removal" \
  -H "Authorization: Bearer $NANOCLIP_API_KEY"

Completed responses include the original words, removal spans, and the keep intervals.

{
  "project_id": "proj_abc123",
  "status": "completed",
  "error": null,
  "words": [
    {
      "i": 0,
      "start": 0.0,
      "end": 0.42,
      "text": "שלום",
      "confidence": 0.98,
      "source": "transcript"
    }
  ],
  "remove_spans": [
    {
      "remove_word_start": 12,
      "remove_word_end": 18
    }
  ],
  "keep_intervals": [
    {
      "start": 0.0,
      "end": 7.2
    }
  ]
}
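A common next step is to work from `keep_intervals`, for example to report how much of the video survives or to list the words that remain. The sketch below assumes the intervals are in seconds from the start of the source video, as in the transcript response. It avoids `remove_spans` because this page does not state whether `remove_word_end` is inclusive or exclusive. Both helper names are hypothetical.

```python
def kept_duration(result):
    """Total seconds retained after retake removal."""
    return sum(interval["end"] - interval["start"] for interval in result["keep_intervals"])


def kept_words(result):
    """Words whose timing falls entirely inside some keep interval."""
    intervals = [(interval["start"], interval["end"]) for interval in result["keep_intervals"]]
    return [
        word for word in result["words"]
        if any(start <= word["start"] and word["end"] <= end for start, end in intervals)
    ]


# Hypothetical sample shaped like a completed retake-removal response.
sample = {
    "words": [
        {"i": 0, "start": 0.0, "end": 0.4, "text": "a"},
        {"i": 1, "start": 1.2, "end": 1.5, "text": "b"},
        {"i": 2, "start": 2.1, "end": 2.5, "text": "c"},
    ],
    "keep_intervals": [{"start": 0.0, "end": 1.0}, {"start": 2.0, "end": 3.0}],
}

print([word["text"] for word in kept_words(sample)])
# ['a', 'c']
```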

Requested outputs

Analysis	Output	Description
Transcript	`text`	Combined transcript text.
Transcript	`agent_context`	Short field notes for timing units, speaker IDs, and word index ranges.
Transcript	`words`	Word-level transcript entries with `text`, `start`, `end`, `speaker`, and optional `confidence`.
Transcript	`utterances`	Speaker turns with `speaker`, `start`, `end`, `text`, `word_start_idx`, and `word_end_idx`.
Transcript	`speakers`	Speaker summaries with `speaker`, `total_duration`, and `utterance_count`.
Vision	`faces`	Individual detected face observations.
Vision	`agent_context`	Short field notes for coordinates, timing, scene timestamps, and track IDs.
Vision	`face_tracks`	Face observations grouped across time.
Vision	`scenes`	Scene-level visual segments.
Retake removal	`words`	Word-level inputs used by the retake-removal model.
Retake removal	`remove_spans`	Word index ranges that should be removed.
Retake removal	`keep_intervals`	Time ranges to keep after removing retakes.

Polling pattern

Poll until status is completed or failed.

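# Poll every 5 seconds until the transcript reaches a terminal status.
while true; do
  # -s hides the progress meter so only the JSON body is captured.
  response=$(curl -s "https://api.nanoclip.ai/v1/projects/proj_abc123/transcript" \
    -H "Authorization: Bearer $NANOCLIP_API_KEY")

  # printf prints the JSON as-is; some shells' echo would interpret backslashes.
  printf '%s\n' "$response"
  status=$(printf '%s\n' "$response" | jq -r '.status')

  # Stop once the analysis has completed or failed.
  if [ "$status" = "completed" ] || [ "$status" = "failed" ]; then
    break
  fi

  sleep 5
done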

If an analysis is already queued, running, processing, or completed, starting the same analysis again returns the existing analysis response.
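The shell loop above never gives up, which is inconvenient in application code. The following is a sketch of the same pattern in Python with a deadline added. The `fetch` callable (returning the parsed result JSON), the 600-second default timeout, and the injectable `sleep` and `clock` parameters are all assumptions made for illustration, not API features.

```python
import time


def poll_analysis(fetch, interval_seconds=5.0, timeout_seconds=600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until status is completed or failed, or the deadline passes.

    fetch is a caller-supplied function that GETs a result endpoint and
    returns the decoded JSON object. sleep and clock are injectable so the
    loop can be tested without waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        result = fetch()
        if result.get("status") in ("completed", "failed"):
            return result
        # Give up if the next wait would run past the deadline.
        if clock() + interval_seconds > deadline:
            raise TimeoutError(
                f"analysis still {result.get('status')!r} after {timeout_seconds}s"
            )
        sleep(interval_seconds)
```

A caller still needs to inspect the returned `status`, since a `failed` result ends the loop just as a `completed` one does.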
