Skip to main content
Put source.mp4 in the current folder, then paste this into Claude Code.
Turn source.mp4 in this folder into a 1080x1920, 30fps, about 15s vertical clip. Output: one MP4.

Style: one short line before each section describing what you are doing, and one short line after describing what you produced. No walls of text.

Preflight: stop if missing:
- hyperframes skill not available: ask the user to install it from https://github.com/heygen-com/hyperframes and stop.
- NANOCLIP_API_KEY not set: ask the user for it (rk_live_... or rk_test_...) and stop.

1. nano clip API

Before: "Uploading and queuing transcript + vision, usually 30-90s."

Run with base https://api.nanoclip.ai/v1 and Bearer $NANOCLIP_API_KEY:
  POST /projects/upload {"filename":"source.mp4","content_type":"video/mp4"}
  PUT upload_url with returned headers, --data-binary @source.mp4
  POST /projects/{id}/upload/complete
  POST /projects/{id}/transcript {"language":"en","requested_outputs":["text","speakers","utterances","words"]}
  POST /projects/{id}/vision {"requested_outputs":["faces","face_tracks","scenes"]}
  Poll GET endpoints until status=="completed". Save transcript.json and vision.json.

After: one line with duration, scene count, speaker count, word count, and 1-2 candidate windows (start-end, why).

2. HyperFrames render

Before: "Installing presets, authoring index.html, then rendering, usually 1-3 min."

Setup (Node >= 22, FFmpeg):
  npx skills add heygen-com/hyperframes
  npx hyperframes init my-video && cd my-video && npx hyperframes doctor

Install presets up front. Do not hand-build what a preset already does:
  npx hyperframes add caption-pill-karaoke
  npx hyperframes add flash-through-white
  npx hyperframes add instagram-follow
  npx hyperframes add grain-overlay
  npx hyperframes add vignette

Rules:
- Window starts exactly on a scene cut from scenes[] or t=0. Snap data-media-start accordingly.
- Captions: use the installed preset. Use 3-4 words per block, 8 max. Show from first-word-start to last-word-end or next cut. Never use single-word reveal.
- Emphasis, about 1 in 6: proper noun, interjection, profanity, or short standalone phrase. One effect only: color shift, scale bump, or physics drop.
- Hook: punchiest line in scene 1, top of frame. Only in scene 1. Exit at min(3s, scene_1_end).
- Auto-crop: per scene, fix 9:16 crop center at median face center of active speaker track. Within a scene, scale only, up to 8% punch-in on emphasis. At cuts, hard-cut. Never translate X/Y inside a scene.
- Scene cuts are hard resets. Clamp every tween to min(natural_end, next_cut). No crossfades. Use a transition preset for the 1-2 most editorially significant cuts.
- CTA outro: mount instagram-follow in the last about 3s. Customize avatar and handle.
- Finishing: grain-overlay and vignette over the whole clip.
- Face safe-zone: no overlay intersects face bbox plus 8% margin.
- Active-speaker face box: thin rounded rectangle in crop-space, 3px stroke, about 12px radius, soft shadow, follows face per-frame. Entry: scale 0.9 to 1.0 and opacity 0 to 1, back.out(2), 0.25s. One at a time.
- Speaker label is literally "name " plus (speaker_id + 1).

Per-frame seek model: HyperFrames renders frame-by-frame. Use timeline.set() or tl.fromTo at absolute positions. Implement physics as pure functions of (t - word.start).

Workflow:
1. Filter JSONs to target window. Print a 1-2 line summary.
2. Install presets up front.
3. Author index.html. Mount each sub-composition at its offset and drive captions from words[].
4. Run npx -y hyperframes lint.
5. Say "Lint passed, rendering, usually 1-3 min." Then run npx -y hyperframes render --output clip.mp4.

After: print the MP4 path on one line, plus one short line inviting iteration.