Orchestrator aws caching fix - gracefully keep cache valid if OOM crash occurs #823
frostebite wants to merge 1 commit into `main`
Conversation
The build workflow created empty tar files (via `tar --files-from /dev/null`) as placeholders before `remote-cli-post-build` runs. When the container is OOM-killed, only these ~10KB empty tars survive and get uploaded to S3. On the next build, the pull-cache hook downloads them and extracts an empty Library, providing zero caching benefit.

Changes:
- Remove empty placeholder tar creation in `build-automation-workflow.ts`
- Keep `mkdir -p` for cache directories (hooks need them)
- Add a size check in the `aws-s3-upload-cache` hook to delete tar files < 1KB before uploading, as a safety net against stale stubs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
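The failure mode is easy to reproduce locally. A minimal sketch (using a throwaway temp directory, not the orchestrator's real cache layout, and assuming GNU tar): an archive created with `--files-from /dev/null` has no members, is pure zero-padding, and extracts to nothing, so a cache restored from it is worthless.

```shell
#!/bin/sh
set -eu
WORK=$(mktemp -d)

# Create a placeholder the same way the old workflow did: an archive
# with no members at all.
tar -cf "$WORK/Library.tar" --files-from /dev/null

# GNU tar pads an empty archive out to its record size (~10KB by
# default), so the stub looks like a plausible file in S3 listings.
wc -c < "$WORK/Library.tar"

# Extracting it yields an empty Library directory: a full Unity
# reimport on every build despite the apparent "cache hit".
mkdir "$WORK/Library"
tar -xf "$WORK/Library.tar" -C "$WORK/Library"
ls -A "$WORK/Library"
```

This is why the placeholder removal alone helps: without the stub, a crashed build simply uploads nothing, and the previous good cache in S3 stays authoritative.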
📝 Walkthrough

The changes optimize cache handling by removing empty placeholder tar archive generation during build preparation and adding logic to delete small tar artifacts (< 1KB) before uploading to S3, preventing cache pollution and wasted storage.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
🧹 Nitpick comments (1)
src/model/orchestrator/services/hooks/container-hook-service.ts (1)
158-161: Safety-net for AWS S3 looks good, but consider adding the same protection to `rclone-upload-cache`.

The `find ... -size -1k -delete` approach is a solid defense-in-depth measure for the AWS S3 upload hook. However, the `rclone-upload-cache` hook (lines 261-276) lacks this same safety-net, which could lead to the same cache pollution issue for users who use rclone instead of AWS S3.

♻️ Suggested fix: Add a consistent safety-net to the `rclone-upload-cache` hook
Add the same find/delete commands to the `rclone-upload-cache` hook around line 265:

```diff
  - name: rclone-upload-cache
    image: rclone/rclone
    hook: after
    commands: |
      if command -v rclone > /dev/null 2>&1; then
+       # Skip uploading empty or near-empty tar files (< 1KB)
+       find /data/cache/$CACHE_KEY/lfs -name "*.tar*" -size -1k -delete 2>/dev/null || true
+       find /data/cache/$CACHE_KEY/Library -name "*.tar*" -size -1k -delete 2>/dev/null || true
        rclone copy /data/cache/$CACHE_KEY/lfs ${
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/model/orchestrator/services/hooks/container-hook-service.ts` around lines 158-161: the rclone-upload-cache hook in container-hook-service.ts is missing the defensive deletion of tiny/empty tar files that the AWS S3 upload hook uses; update the rclone-upload-cache hook to run the same `find ... -name "*.tar*" -size -1k -delete` (for both /data/cache/$CACHE_KEY/lfs and /data/cache/$CACHE_KEY/Library) before performing the rclone upload so near-empty stubs are removed, mirroring the logic already present in the AWS S3 upload hook.
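One subtlety worth flagging about the suggested guard. The sketch below is illustrative (throwaway temp-dir paths rather than the real `/data/cache/$CACHE_KEY` layout, GNU find assumed): with GNU find, `-size -1k` rounds file sizes up to whole 1KiB blocks, so it matches only completely empty files; a byte-granular test like `-size -1024c` catches every file under 1KB, including tiny gzipped stubs.

```shell
#!/bin/sh
set -eu
CACHE_DIR=$(mktemp -d)

# Simulate artifacts an OOM-killed build could leave behind:
: > "$CACHE_DIR/empty.tar"                                # 0 bytes
tar -czf "$CACHE_DIR/stub.tar.gz" --files-from /dev/null  # a few dozen bytes

# And a "real" cache tar, comfortably over 1KB:
head -c 4096 /dev/zero > "$CACHE_DIR/real.tar"

# Byte-granular size test: deletes both stubs, keeps the real artifact.
# (With `-size -1k`, stub.tar.gz would survive, since its few dozen
# bytes round up to one whole 1KiB block.)
find "$CACHE_DIR" -name "*.tar*" -size -1024c -delete

ls "$CACHE_DIR"
```

If byte granularity matters for the hooks, `-size -1024c` may be the safer spelling in both the AWS S3 and rclone variants; either form removes the 0-byte case.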
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 49e58d38-36dd-4f4b-b487-e5a950ee3418
⛔ Files ignored due to path filters (2)

- `dist/index.js` is excluded by `!**/dist/**`
- `dist/index.js.map` is excluded by `!**/dist/**`, `!**/*.map`
📒 Files selected for processing (2)

- `src/model/orchestrator/services/hooks/container-hook-service.ts`
- `src/model/orchestrator/workflows/build-automation-workflow.ts`
Codecov Report

✅ All modified and coverable lines are covered by tests. Additional details and impacted files:

```
@@ Coverage Diff @@
##             main     #823   +/-   ##
=======================================
  Coverage   31.23%   31.23%
=======================================
  Files          84       84
  Lines        4565     4565
  Branches     1103     1054    -49
=======================================
  Hits         1426     1426
  Misses       3139     3139
```
Summary
- Removed the empty placeholder tar creation (`tar --files-from /dev/null`) in `build-automation-workflow.ts` that was creating ~10KB stub files before `remote-cli-post-build` runs
- Added a size check in the `aws-s3-upload-cache` hook to delete tar files < 1KB before S3 upload
- Kept `mkdir -p` for cache directories so hooks don't error on missing dirs

Problem
When a build container is OOM-killed before `remote-cli-post-build` can create real Library/build cache tars, only the empty placeholder tars (~10KB) survive. These get uploaded to S3 by the upload hooks. On the next build, `aws-s3-pull-cache` downloads and extracts these empty tars, giving Unity an empty Library folder: zero caching benefit, full reimport every time.

This was reported by a user with a 20GB project running with `containerMemory: 4096`; every build took 5+ hours because the Library cache was never populated.

Test plan
- The existing end-to-end caching test (`orchestrator-end2end-caching.test.ts`) still passes: it expects real Library tars from `PushToCache()`, not the removed placeholders
- Run a build with `providerStrategy: aws` and confirm real cache tars are created by `remote-cli-post-build` and uploaded to S3

🤖 Generated with Claude Code