Skip to content

fix(backend): add context cancellation to between-run listener goroutines#1065

Open
Gkrumbach07 wants to merge 1 commit intomainfrom
fix/between-run-listener-context
Open

fix(backend): add context cancellation to between-run listener goroutines#1065
Gkrumbach07 wants to merge 1 commit intomainfrom
fix/between-run-listener-context

Conversation

@Gkrumbach07
Copy link
Copy Markdown
Contributor

Summary

  • Prevents goroutine leaks when runner pods die without closing TCP connections
  • Adds context.Context to between-run listener goroutines for clean cancellation

Changes

  • Store context.CancelFunc in betweenRunListeners map (was struct{})
  • Use http.NewRequestWithContext so context cancellation closes the HTTP response body and unblocks ReadString
  • Replace time.Sleep with select{} on ctx.Done for interruptible backoff
  • cleanupStaleSessions calls the CancelFunc to terminate the goroutine
  • Check ctx.Err() at retry boundaries to exit promptly on cancellation

Why

Without context cancellation, if a runner pod dies (e.g., OOM kill) without sending EOF, the listener goroutine blocks forever on ReadString('\n'). Each leaked goroutine holds ~8KB stack + one open TCP connection. The sync.Map entry stays, preventing new listeners for that session.

Test plan

  • go build ./... passes
  • Deploy and verify goroutines exit when sessions are cleaned up

🤖 Generated with Claude Code

…ines

The between-run listener goroutine could leak if a runner pod died
without closing the TCP connection (e.g., OOM kill). The goroutine
would block forever on ReadString with no way to cancel it.

Fix:
- Store context.CancelFunc in betweenRunListeners map instead of struct{}
- Use http.NewRequestWithContext so context cancellation closes the
  HTTP response body and unblocks ReadString
- Replace time.Sleep with select{} on ctx.Done for interruptible backoff
- cleanupStaleSessions calls the CancelFunc to terminate the goroutine
- Check ctx.Err() at retry boundaries to exit promptly on cancellation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ab45a95f-5e5f-4d8d-826f-bb2bccf8d48b

📥 Commits

Reviewing files that changed from the base of the PR and between 03f9f1f and 2cc1f76.

📒 Files selected for processing (1)
  • components/backend/websocket/agui_proxy.go

Walkthrough

Refactors between-run listener lifecycle management in the websocket proxy by replacing sentinel values with context.CancelFunc storage. The listener now uses context-based cancellation for graceful shutdown and goroutine cleanup, replacing time-based backoffs with context-aware select blocks and applying context to SSE HTTP requests.

Changes

Cohort / File(s) Summary
Between-run Listener Lifecycle
components/backend/websocket/agui_proxy.go
Refactored listener lifecycle from sentinel-based to context-based cancellation. betweenRunListeners now stores context.CancelFunc instead of struct{}. ensureBetweenRunListener creates cancellable contexts and stores cancel functions; duplicate startups cancel the new context to prevent orphans. Stale session cleanup invokes LoadAndDelete on stored cancel functions. listenBetweenRunEvents accepts context.Context, creates SSE requests via http.NewRequestWithContext, replaces time.Sleep backoffs with context-aware select blocks, closes response bodies on context cancellation, and exits early when context is done.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding context cancellation to between-run listener goroutines to fix resource leaks.
Description check ✅ Passed The description is directly related to the changeset, explaining the problem, solution, specific changes, rationale, and test plan.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/between-run-listener-context

Comment @coderabbitai help to get the list of available commands and usage tips.

@ambient-code
Copy link
Copy Markdown
Contributor

ambient-code bot commented Mar 27, 2026

Review Queue Status

Check Status Detail
CI FAIL CodeQL check failed
Conflicts pass ---
Reviews pass ---

Action needed: Fix CodeQL failure

Auto-generated by Review Queue workflow. Updated when PR changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant