Problem
VHS demos require `Sleep` values that match TTS audio length. Today this is manual trial-and-error; `docgen compose` enforces a max freeze ratio when video is shorter than audio.
Proposal
Add a command (e.g. `docgen sync-vhs` or `docgen tape-sync`) that:
- Reads `animations/timing.json` produced by `docgen timestamps` (Whisper segments on each `audio/*.mp3`).
- Parses each `terminal/*.tape` for `Type` / `Enter` / `Sleep` blocks after the first `Show` (preamble unchanged).
- Partitions the narrated time span into equal wall-clock windows (one per block) and sets each `Sleep` to `window_duration - estimated_typing_time` (typing estimate from `Type` payload length × configurable ms/char, capped).
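The windowing arithmetic above can be sketched in a few lines. This is an illustrative draft, not the Course Builder implementation; the function name and the `MS_PER_CHAR` / `TYPING_CAP_MS` constants are invented stand-ins for the configurable values mentioned above.

```python
MS_PER_CHAR = 50       # assumed typing speed (the configurable ms/char)
TYPING_CAP_MS = 2000   # assumed cap on any single typing estimate

def block_sleeps(span_ms: float, type_payloads: list[str]) -> list[float]:
    """Split the narrated span into equal windows (one per block) and
    subtract each block's estimated typing time, clamping at zero."""
    window = span_ms / len(type_payloads)
    sleeps = []
    for payload in type_payloads:
        typing_ms = min(len(payload) * MS_PER_CHAR, TYPING_CAP_MS)
        sleeps.append(max(window - typing_ms, 0.0))
    return sleeps

# Example: 12 s of narration spread over three Type blocks
print(block_sleeps(12_000, ["ls -la", "cat notes.md", "exit"]))
# → [3700.0, 3400.0, 3800.0]
```

Clamping at zero matters: a long `Type` payload in a short window would otherwise produce a negative `Sleep`, which VHS rejects.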
Prior art
Course Builder implements this as a standalone script you can lift or vendor:
https://github.com/jmjava/course-builder/blob/main/docs/demos/scripts/sync_vhs_sleep_from_timing.py
Acceptance criteria
- Works with the existing `timing.json` + `.tape` layout
- Supports `--dry-run` and `--segment` (stem) filters
- Runs from `generate-all` / `rebuild-after-audio` after `timestamps` when a config flag is set
Notes
This does per-block alignment to the audio timeline, not word-level karaoke. Manim remains the path for frame-accurate visuals.
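For illustration, the command would rewrite only the `Sleep` lines after the first `Show`; everything else in the tape is untouched. All values below are invented (a 4 s window with a ~0.3 s typing estimate):

Before:

```
Show
Type "ls -la"
Enter
Sleep 2s
```

After:

```
Show
Type "ls -la"
Enter
Sleep 3.7s
```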