feat: stream snapshots directly to HTTP response on cache miss #169

Merged
worstell merged 1 commit into main from stream-snapshot-on-cache-miss on Mar 10, 2026

worstell commented Mar 9, 2026

Problem

When a client requests a snapshot (/git/{host}/{repo}/snapshot.tar.zst) and the cache is empty, cachew currently clones the repo, generates a full tar.zst, uploads it to the cache backend, re-opens it from cache, then streams it to the client. For large repos (e.g. ~13GB), the client waits the entire time with zero bytes received and often times out.

Solution

On cache miss, after the clone is ready, stream the tar+zstd output directly to the HTTP response writer. The client gets bytes immediately once tar+zstd starts producing output. The cache is populated asynchronously via a background goroutine.

Changes

  • internal/snapshot/snapshot.go — Add StreamTo() that pipes tar | zstd to an arbitrary io.Writer without uploading to any cache backend
  • internal/strategy/git/snapshot.go — Add streamSnapshotAndBackfillCache() for the cache-miss path: clone from mirror under read lock, stream directly to client, trigger background cache upload, schedule periodic refresh

What's unchanged

  • Cache-hit path (cache.Open succeeds) is completely unchanged
  • Periodic snapshot refresh scheduling still happens after first generation
  • snapshotMutexFor locking still prevents concurrent snapshot generation for the same repo
  • repo.WithReadLock still excludes concurrent fetches during the clone step

worstell requested a review from a team as a code owner March 9, 2026 23:33
worstell requested review from stuartwdouglas and removed request for a team March 9, 2026 23:33
worstell force-pushed the stream-snapshot-on-cache-miss branch 3 times, most recently from 39cf7f5 to 894aac1 March 10, 2026 23:33
On cache miss, instead of generating the full tar.zst, uploading it to
the cache backend, then re-opening it for the client, stream the tar+zstd
output directly to the HTTP response writer. This eliminates the long
wait with zero bytes that caused client timeouts on large repositories.

The cache is populated asynchronously via a background goroutine that
calls generateAndUploadSnapshot after the client stream completes. The
periodic snapshot refresh job is still scheduled as before.

The cache-hit path (cache.Open succeeds) is unchanged.

Changes:
- Add snapshot.StreamTo() that pipes tar|zstd to an arbitrary io.Writer
  without uploading to any cache backend
- Add streamSnapshotAndBackfillCache() to handle the cache-miss path:
  clone from mirror, stream to client, trigger background cache upload
- Defer snapshot dir cleanup to handle all exit paths including panics

Co-authored-by: amp-agent[bot] <amp-agent[bot]@users.noreply.github.com>
Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019cda1a-365b-770c-b8ed-5b5dd69594e9
worstell force-pushed the stream-snapshot-on-cache-miss branch from 894aac1 to 4c21f7c March 10, 2026 23:56
worstell merged commit ce6b182 into main Mar 10, 2026
5 checks passed
worstell deleted the stream-snapshot-on-cache-miss branch March 10, 2026 23:58