fix: add max-retry force-proceed to prevent infinite shutdown loop #3265

Merged
louisgv merged 1 commit into main from fix/issue-3261
Apr 11, 2026

la14-1 commented Apr 11, 2026

Why: When in-process teammates never respond to `shutdown_request`, the refactor team lead looped forever ("NEVER exit without shutting down all teammates first" + "send it again"), blocking `TeamDelete` and the non-interactive harness. This is the root cause of recurring stuck-agent incidents (#3244, #3249, #3260, #3261).

Changes

  • Lifecycle Management: Replaces the infinite retry with a 3-round max-retry policy. After 3 unanswered `shutdown_request`s (≈6 min), the team lead marks that teammate as non-responsive and proceeds to `TeamDelete` without waiting
  • Monitor Loop time budget: Fixes an inconsistency where the section said "10 min warn, 12 min shutdown, 15 min force" while the actual Time Budget is 25 min (warn at 20, shutdown at 23, force at 25)
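
For illustration only, the corrected thresholds can be written as a small lookup. This is a sketch, not part of the PR: the change itself edits prompt text, and the function name `monitor_phase` and the phase labels are hypothetical. The 20/23/25 minute values are the ones quoted in this PR, and mapping them to warn/shutdown/force is an assumption based on the order of the old "10 min warn, 12 min shutdown, 15 min force" wording.

```python
# Thresholds in minutes, taken from the "20/23/25 min" Time Budget quoted in this PR.
# The warn/shutdown/force mapping is assumed from the older "10/12/15" wording.
WARN_AT_MIN = 20
SHUTDOWN_AT_MIN = 23
FORCE_AT_MIN = 25


def monitor_phase(elapsed_min: float) -> str:
    """Return the (hypothetical) monitor-loop phase after elapsed_min minutes."""
    if elapsed_min >= FORCE_AT_MIN:
        return "force"
    if elapsed_min >= SHUTDOWN_AT_MIN:
        return "shutdown"
    if elapsed_min >= WARN_AT_MIN:
        return "warn"
    return "work"
```

Under the old wording, minute 12 would already have triggered shutdown; with these thresholds it still falls in the working phase.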

How it prevents issue #3261

Before: team lead sends shutdown_request → no response → sends again → ... (forever)

After: team lead sends shutdown_request → no response → sends again (×3 total) → proceeds to TeamDelete regardless
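
The before/after behavior can be sketched as a bounded retry loop. This is a minimal illustration under stated assumptions, not the harness or SDK implementation: the actual change is to prompt instructions, and `shut_down_team` and the `request_shutdown` callback are hypothetical names. Only the limit of 3 rounds comes from the PR.

```python
from typing import Callable, Iterable, List

MAX_SHUTDOWN_ROUNDS = 3  # from the PR: give up after 3 unanswered shutdown_requests


def shut_down_team(
    teammates: Iterable[str],
    request_shutdown: Callable[[str], bool],
    max_rounds: int = MAX_SHUTDOWN_ROUNDS,
) -> List[str]:
    """Ask each teammate to shut down, at most max_rounds times each.

    request_shutdown is a hypothetical callback that sends one
    shutdown_request, waits one round, and returns True if the
    teammate acknowledged. The return value lists the teammates
    marked non-responsive; the caller proceeds to TeamDelete either way.
    """
    pending = list(teammates)
    for _ in range(max_rounds):
        if not pending:
            break  # everyone acknowledged, nothing left to retry
        # Keep only the teammates that still have not acknowledged.
        pending = [name for name in pending if not request_shutdown(name)]
    return pending
```

The loop always terminates after `max_rounds` passes, which is the property the old "send it again" instruction lacked.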

Limitations

This is a prompt-level mitigation. The deeper harness issue (TeamDelete refusing to proceed when members are still "active" at the SDK level per #3154) still requires SDK-level investigation. This fix prevents the prompt from contributing to the loop.

Fixes #3261

-- refactor/issue-fixer

When in-process teammates get stuck and never respond to
shutdown_request, the team lead was previously instructed to
"NEVER exit without shutting down all teammates first" and to
"send it again" indefinitely. This creates an infinite loop that
blocks TeamDelete and the non-interactive harness.

This fix:
- Replaces "NEVER exit" with a 3-round max-retry policy
- After 3 unanswered shutdown_requests (≈6 min), mark teammate
  as non-responsive and proceed to TeamDelete without waiting
- Fixes time budget inconsistency in Monitor Loop section
  (was "10/12/15 min", now matches Time Budget "20/23/25 min")

Fixes #3261

Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

louisgv left a comment

Security Review

Verdict: APPROVED ✓

Commit: 5790bad

Summary

This PR fixes issue #3261 by adding max-retry logic to prevent infinite shutdown loops when teammates fail to respond. Changes are documentation-only (prompt file) with no security implications.

Findings

No security vulnerabilities detected.

Changes reviewed:

  • Time budget in the Monitor Loop section corrected: 10/12/15 min → 20/23/25 min (now matches the Time Budget section)
  • Max retry logic: Limits shutdown_request to 3 attempts per stuck teammate
  • Prevents infinite waiting loops that block TeamDelete

Tests

✓ All 2104 tests pass
✓ No code changes (documentation only)
✓ Addresses root cause from issue #3261

Risk Assessment

  • Command injection: N/A (no executable code)
  • Credential leaks: N/A (no secrets handling)
  • Path traversal: N/A (no file operations)
  • Logic bugs: Fix improves reliability by preventing infinite loops

-- security/pr-reviewer

louisgv added the security-approved (Security review approved) label Apr 11, 2026
louisgv merged commit 35c436b into main Apr 11, 2026
6 checks passed
louisgv deleted the fix/issue-3261 branch April 11, 2026 08:53
Labels

security-approved Security review approved

Development

Successfully merging this pull request may close these issues.

bug: refactor team agents stuck (in-process), blocking TeamDelete (cycle 2026-04-11)

2 participants