Context
Discovered during Course Builder demo video generation. The `docgen generate-all` pipeline fails at the compose stage on the first full run because the rendered Manim scenes are much shorter than the audio.
Problem
The pipeline stages execute in order: `tts → timestamps → manim → vhs → compose`. However, Manim scenes that use `wait_until()` for audio synchronization depend on `timing.json` data that is generated during the `timestamps` stage of the same run.
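The ordering hazard can be sketched as a first-run dependency check (a minimal illustration; the runner, the `PRODUCES`/`CONSUMES` tables, and the function name are hypothetical, not docgen's actual code — only the stage names come from this report):

```python
# Stage order from the report.
STAGES = ["tts", "timestamps", "manim", "vhs", "compose"]

# The hidden same-run dependency: manim consumes timing.json,
# which the timestamps stage of the *same* run produces.
PRODUCES = {"timestamps": "timing.json"}
CONSUMES = {"manim": "timing.json"}

def first_run_hazards(stages):
    """Return stages that consume an artifact produced earlier in the
    same run — on a first run that artifact did not exist before the
    run started, so cached or pre-existing consumer state is stale."""
    produced = set()
    hazards = []
    for stage in stages:
        artifact = CONSUMES.get(stage)
        if artifact and artifact in produced:
            hazards.append(stage)
        if stage in PRODUCES:
            produced.add(PRODUCES[stage])
    return hazards

print(first_run_hazards(STAGES))  # ['manim']
```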
On the first run:
- `timestamps` generates `timing.json` from the new TTS audio
- `manim` renders the scenes — but the scenes read timing data from `timing.json` for their `wait_until()` calls, and if the scene was cached or the timing data format doesn't match expectations, the scenes render at their natural animation pace (~20-30s)
- `compose` compares the audio duration (259s) to the video duration (22s) and raises `FREEZE GUARD: 92% frozen`
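The reported percentage is consistent with the two durations (a sketch assuming the guard simply compares video length against audio length; docgen's exact formula and threshold are not confirmed here):

```python
# Durations from this report.
AUDIO_S = 259.0  # TTS narration
VIDEO_S = 22.0   # rendered Manim scenes

# Fraction of the final video that compose would have to pad/freeze.
frozen_fraction = 1.0 - VIDEO_S / AUDIO_S

print(f"FREEZE GUARD: {frozen_fraction:.0%} frozen")  # FREEZE GUARD: 92% frozen
```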
This creates a chicken-and-egg problem: scenes need timing data to render at the right length, but timing data comes from audio that was just generated.
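The scene-side half of this cycle might look like the following (a hypothetical sketch — the helper, cue names, and file layout are assumptions based on the behavior described above, not docgen's actual code). The key point is the fallback: when `timing.json` is absent or lacks a cue, the scene silently waits only its natural length, which is why first-run renders come out short.

```python
import json
from pathlib import Path

def wait_time(cue: str, natural: float, timing_path: str = "timing.json") -> float:
    """Return how long a wait_until()-style pause should last for a cue.

    Falls back to the scene's natural pace when timing data is missing
    or doesn't contain the cue — the silent-failure mode behind the
    short first-run renders.
    """
    path = Path(timing_path)
    if not path.exists():
        return natural               # first run: timestamps not generated yet
    cues = json.loads(path.read_text())
    return cues.get(cue, natural)    # stale/mismatched data: fall back too

# With no timing.json on disk, every cue waits only its natural length:
print(wait_time("intro_done", natural=2.5, timing_path="missing.json"))  # 2.5
```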
What we had to do
- Run `generate-all` once — TTS + timestamps succeed, Manim scenes render short, compose fails
- Clear the Manim cache: `rm -rf animations/media/`
- Re-render Manim with `docgen manim` — now `timing.json` is populated, scenes render at the correct duration
- Run `docgen compose` — succeeds
Recommendations
- `generate-all` should detect this condition and automatically re-render Manim after timestamps are generated, if scene durations don't match audio durations for scenes with `wait_until()` pacing
- `docgen manim` should always clear the cache for scenes whose `timing.json` has changed since the last render
- A `--retry-manim` flag to `generate-all` that auto-retries Manim if compose fails with FREEZE GUARD
- Scenes that use `wait_until()` require `timing.json` — run `docgen timestamps` before `docgen manim`
Severity
High — every first-time `generate-all` run will fail at compose if scenes use timing-based pacing. The error message ("Re-render the visual source to be longer") doesn't explain the root cause.
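One way `generate-all` could detect this failure mode and retry automatically is sketched below. The threshold constant and helper names are assumptions; only the `docgen` subcommands and the cache path come from this report.

```python
import shutil
import subprocess

FREEZE_THRESHOLD = 0.5  # assumed: retry if over half the video would be padding

def should_retry(audio_s: float, video_s: float) -> bool:
    """True when the rendered video is so much shorter than the audio
    that compose's freeze guard would reject it."""
    return 1.0 - video_s / audio_s > FREEZE_THRESHOLD

def compose_with_retry(audio_s: float, video_s: float) -> None:
    if should_retry(audio_s, video_s):
        # timing.json exists by this point in the run, so a fresh
        # render picks up the real cue durations.
        shutil.rmtree("animations/media/", ignore_errors=True)
        subprocess.run(["docgen", "manim"], check=True)
    subprocess.run(["docgen", "compose"], check=True)

print(should_retry(259.0, 22.0))  # True for the durations in this report
```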