feat: stream snapshots directly to HTTP response on cache miss #169
Merged
stuartwdouglas approved these changes on Mar 10, 2026
On cache miss, instead of generating the full tar.zst, uploading it to the cache backend, and then re-opening it for the client, stream the tar+zstd output directly to the HTTP response writer. This eliminates the long wait with zero bytes received that caused client timeouts on large repositories.

The cache is populated asynchronously by a background goroutine that calls generateAndUploadSnapshot after the client stream completes. The periodic snapshot refresh job is still scheduled as before. The cache-hit path (cache.Open succeeds) is unchanged.

Changes:
- Add snapshot.StreamTo() that pipes tar|zstd to an arbitrary io.Writer without uploading to any cache backend
- Add streamSnapshotAndBackfillCache() to handle the cache-miss path: clone from mirror, stream to client, trigger background cache upload
- Defer snapshot dir cleanup to handle all exit paths including panics

Co-authored-by: amp-agent[bot] <amp-agent[bot]@users.noreply.github.com>
Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019cda1a-365b-770c-b8ed-5b5dd69594e9
Problem

When a client requests a snapshot (`/git/{host}/{repo}/snapshot.tar.zst`) and the cache is empty, cachew currently clones the repo, generates a full tar.zst, uploads it to the cache backend, re-opens it from cache, and then streams it to the client. For large repos (e.g. ~13GB), the client waits the entire time with zero bytes received and often times out.

Solution

On cache miss, after the clone is ready, stream the `tar`+`zstd` output directly to the HTTP response writer. The client starts receiving bytes as soon as tar+zstd begins producing output. The cache is populated asynchronously via a background goroutine.

Changes

- `internal/snapshot/snapshot.go`: Add `StreamTo()`, which pipes `tar | zstd` to an arbitrary `io.Writer` without uploading to any cache backend
- `internal/strategy/git/snapshot.go`: Add `streamSnapshotAndBackfillCache()` for the cache-miss path: clone from mirror under read lock, stream directly to client, trigger background cache upload, schedule periodic refresh

What's unchanged

- The cache-hit path (`cache.Open` succeeds) is completely unchanged
- `snapshotMutexFor` locking still prevents concurrent snapshot generation for the same repo
- `repo.WithReadLock` still excludes concurrent fetches during the clone step