Skip to content

Orchestrator aws caching fix - gracefully keep cache valid if OOM crash occurs#823

Open
frostebite wants to merge 1 commit intomainfrom
fix/skip-empty-placeholder-cache-tars
Open

Orchestrator aws caching fix - gracefully keep cache valid if OOM crash occurs#823
frostebite wants to merge 1 commit intomainfrom
fix/skip-empty-placeholder-cache-tars

Conversation

@frostebite
Copy link
Copy Markdown
Member

@frostebite frostebite commented Mar 27, 2026

Summary

  • Remove empty placeholder tar creation (tar --files-from /dev/null) in build-automation-workflow.ts that was creating ~10KB stub files before remote-cli-post-build runs
  • Add safety-net size check in aws-s3-upload-cache hook to delete tar files < 1KB before S3 upload
  • Keep mkdir -p for cache directories so hooks don't error on missing dirs

Problem

When a build container is OOM-killed before remote-cli-post-build can create real Library/build cache tars, only the empty placeholder tars (~10KB) survive. These get uploaded to S3 by the upload hooks. On the next build, aws-s3-pull-cache downloads and extracts these empty tars, giving Unity an empty Library folder — zero caching benefit, full reimport every time.

This was reported by a user with a 20GB project running with containerMemory: 4096 — every build took 5+ hours because the Library cache was never populated.

Test plan

  • Verify existing e2e caching test (orchestrator-end2end-caching.test.ts) still passes — it expects real Library tars from PushToCache(), not the removed placeholders
  • Run a build with providerStrategy: aws and confirm real cache tars are created by remote-cli-post-build and uploaded to S3
  • Simulate OOM (low memory) and confirm no empty tars are uploaded to S3

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Cache upload process now automatically removes small temporary cache files under 1KB before storing to remote buckets, eliminating unnecessary clutter
    • Enhanced cache preparation by eliminating empty placeholder archive file generation, resulting in cleaner storage and more efficient cache management

The build workflow created empty tar files (via `tar --files-from /dev/null`)
as placeholders before remote-cli-post-build runs. When the container is
OOM-killed, only these ~10KB empty tars survive and get uploaded to S3.
On the next build, the pull-cache hook downloads them and extracts an empty
Library, providing zero caching benefit.

Changes:
- Remove empty placeholder tar creation in build-automation-workflow.ts
- Keep mkdir -p for cache directories (hooks need them)
- Add size check in aws-s3-upload-cache hook to delete tar files < 1KB
  before uploading, as a safety net against stale stubs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 27, 2026

📝 Walkthrough

Walkthrough

The changes optimize cache handling by removing empty placeholder tar archive generation during build preparation and adding logic to delete small tar artifacts (< 1KB) before uploading to S3, preventing cache pollution and wasted storage.

Changes

Cohort / File(s) Summary
Cache cleanup operations
src/model/orchestrator/services/hooks/container-hook-service.ts
Added deletion of small tar artifacts (< 1KB) in /data/cache/$CACHE_KEY/lfs and /data/cache/$CACHE_KEY/Library directories immediately before S3 upload to prevent polluting the cache with minimal-size files.
Cache preparation logic
src/model/orchestrator/workflows/build-automation-workflow.ts
Removed conditional tar archive creation for placeholder files during cache preparation. Now only ensures required directories exist, deferring real cache archive generation to post-build S3 upload hooks.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Tidying the cache, oh what a sight!
No empty tars cluttering the night,
Small scraps deleted before they're sent,
S3 storage wisely spent!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title mentions 'gracefully keep cache valid if OOM crash occurs,' but the actual changes do not implement graceful handling or crash recovery—they prevent empty tar creation and add cleanup of small artifacts. Update the title to better reflect the actual changes, such as 'Stop uploading empty placeholder cache tars and clean up small artifacts' or 'Remove empty placeholder tars and add safety-net size check for cache uploads.'
Description check ❓ Inconclusive The description covers the problem, solution, and test plan comprehensively. However, it is missing the 'Changes', 'Related Issues', 'Related PRs', and 'Successful Workflow Run Link' sections from the template, and the checklist is partially incomplete. Add the missing template sections: explicitly list changes with bullet points under 'Changes', reference related issues/PRs if applicable, provide a successful workflow run link, and complete the 'Docs' and 'Tests' checklist items.
✅ Passed checks (1 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/skip-empty-placeholder-cache-tars

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/model/orchestrator/services/hooks/container-hook-service.ts (1)

158-161: Safety-net for AWS S3 looks good, but consider adding the same protection to rclone-upload-cache.

The find ... -size -1k -delete approach is a solid defense-in-depth measure for the AWS S3 upload hook. However, the rclone-upload-cache hook (lines 261-276) lacks this same safety-net, which could lead to the same cache pollution issue for users who use rclone instead of AWS S3.

♻️ Suggested fix: Add consistent safety-net to rclone-upload-cache hook

Add the same find/delete commands to the rclone-upload-cache hook around line 265:

 - name: rclone-upload-cache
   image: rclone/rclone
   hook: after
   commands: |
     if command -v rclone > /dev/null 2>&1; then
+      # Skip uploading empty or near-empty tar files (< 1KB)
+      find /data/cache/$CACHE_KEY/lfs -name "*.tar*" -size -1k -delete 2>/dev/null || true
+      find /data/cache/$CACHE_KEY/Library -name "*.tar*" -size -1k -delete 2>/dev/null || true
       rclone copy /data/cache/$CACHE_KEY/lfs ${
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/model/orchestrator/services/hooks/container-hook-service.ts` around lines
158 - 161, The rclone-upload-cache hook in container-hook-service.ts is missing
the defensive deletion of tiny/empty tar files that the AWS S3 upload hook uses;
update the rclone-upload-cache hook to run the same find ... -name "*.tar*"
-size -1k -delete (for both /data/cache/$CACHE_KEY/lfs and
/data/cache/$CACHE_KEY/Library) before performing the rclone upload so
near-empty stubs are removed, mirroring the logic already present in the AWS S3
upload hook.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/model/orchestrator/services/hooks/container-hook-service.ts`:
- Around line 158-161: The rclone-upload-cache hook in container-hook-service.ts
is missing the defensive deletion of tiny/empty tar files that the AWS S3 upload
hook uses; update the rclone-upload-cache hook to run the same find ... -name
"*.tar*" -size -1k -delete (for both /data/cache/$CACHE_KEY/lfs and
/data/cache/$CACHE_KEY/Library) before performing the rclone upload so
near-empty stubs are removed, mirroring the logic already present in the AWS S3
upload hook.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 49e58d38-36dd-4f4b-b487-e5a950ee3418

📥 Commits

Reviewing files that changed from the base of the PR and between ce7ce7a and 71a0700.

⛔ Files ignored due to path filters (2)
  • dist/index.js is excluded by !**/dist/**
  • dist/index.js.map is excluded by !**/dist/**, !**/*.map
📒 Files selected for processing (2)
  • src/model/orchestrator/services/hooks/container-hook-service.ts
  • src/model/orchestrator/workflows/build-automation-workflow.ts

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 31.23%. Comparing base (ce7ce7a) to head (71a0700).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #823   +/-   ##
=======================================
  Coverage   31.23%   31.23%           
=======================================
  Files          84       84           
  Lines        4565     4565           
  Branches     1103     1054   -49     
=======================================
  Hits         1426     1426           
  Misses       3139     3139           
Files with missing lines Coverage Δ
...hestrator/services/hooks/container-hook-service.ts 76.23% <ø> (ø)
...rchestrator/workflows/build-automation-workflow.ts 10.44% <ø> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions
Copy link
Copy Markdown

Cat Gif

@frostebite frostebite requested a review from GabLeRoux March 29, 2026 00:30
@frostebite frostebite changed the title fix: stop uploading empty placeholder cache tars to S3 Orchestrator aws caching fix Mar 29, 2026
@frostebite frostebite changed the title Orchestrator aws caching fix Orchestrator aws caching fix - gracefully keep cache valid if OOM crash occurs Mar 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant