Skip to content

fix: use PID-specific socket path to avoid orphan process contention#51

Merged
beonde merged 2 commits intomainfrom
fix/pid-specific-socket
Apr 4, 2026
Merged

fix: use PID-specific socket path to avoid orphan process contention#51
beonde merged 2 commits intomainfrom
fix/pid-specific-socket

Conversation

@beonde
Copy link
Copy Markdown
Member

@beonde beonde commented Apr 4, 2026

Problem

When a Python process using the SDK crashes or is killed without cleanup, the capiscio-darwin-arm64 RPC sidecar process can become orphaned while still holding the shared ~/.capiscio/rpc.sock Unix socket. Subsequent SDK invocations then connect to the orphaned process's socket, causing hangs or timeouts.

Fix

Changed DEFAULT_SOCKET_PATH from rpc.sock to rpc-{pid}.sock so each Python process gets its own socket. Orphaned sidecar processes no longer block new SDK connections.

Changes

  • capiscio_sdk/_rpc/process.py — PID-specific socket path + cleanup of stale PID sockets on startup

Testing

Verified in a2a-demos (demo-one, demo-two) where multiple sequential runs previously hung due to orphaned processes.

When multiple SDK sessions run and crash without cleanup, orphaned
capiscio-core RPC processes hold references to ~/.capiscio/rpc.sock.
New processes can't bind the same path, causing hangs.

Use rpc-{pid}.sock instead, so each Python process gets its own
clean socket. Atexit cleanup removes it on normal shutdown.
Copilot AI review requested due to automatic review settings April 4, 2026 19:13
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

✅ Documentation validation passed!

Unified docs will be deployed from capiscio-docs repo.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the capiscio-core gRPC sidecar Unix socket naming strategy to reduce hangs/timeouts caused by orphaned sidecar processes retaining a shared socket path across runs.

Changes:

  • Switch default Unix socket from a shared rpc.sock to a PID-scoped rpc-{pid}.sock.
  • Add/update inline documentation describing the rationale for PID-specific sockets.

Comment on lines +23 to +24
DEFAULT_SOCKET_DIR = Path.home() / ".capiscio"
DEFAULT_SOCKET_PATH = DEFAULT_SOCKET_DIR / "rpc.sock"
DEFAULT_SOCKET_PATH = DEFAULT_SOCKET_DIR / f"rpc-{os.getpid()}.sock"
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEFAULT_SOCKET_PATH is computed at import time using os.getpid(). In forked/multiprocessing scenarios where the SDK is imported before fork, child processes will inherit the parent's precomputed path and still contend on the same socket file, undermining the PID-specific design. Compute the default socket path lazily at runtime (e.g., when starting/connecting) based on the current PID, and consider a regression test that covers the pre-fork import case.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a7607feDEFAULT_SOCKET_PATH replaced with a lazy _default_socket_path() function that computes the PID-specific path at call time. Forked processes will now correctly get their own socket path.

Comment on lines +21 to +24
# Default socket path — use PID-specific path to avoid contention
# with orphaned capiscio-core processes from previous runs.
DEFAULT_SOCKET_DIR = Path.home() / ".capiscio"
DEFAULT_SOCKET_PATH = DEFAULT_SOCKET_DIR / "rpc.sock"
DEFAULT_SOCKET_PATH = DEFAULT_SOCKET_DIR / f"rpc-{os.getpid()}.sock"
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description mentions "cleanup of stale PID sockets on startup", but the implementation here only unlinks the specific socket chosen for this process; it does not scan/remove old rpc-.sock files from previous runs. Either implement the advertised startup cleanup (e.g., glob rpc-.sock in DEFAULT_SOCKET_DIR and remove sockets for non-running PIDs) or adjust the PR description/comments to match actual behavior (and update remaining references to the old ~/.capiscio/rpc.sock default).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a7607fe — added _cleanup_stale_sockets() that globs rpc-*.sock in DEFAULT_SOCKET_DIR, checks each PID via os.kill(pid, 0), and removes sockets whose PIDs no longer exist. Called on startup before creating the new socket.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

✅ All checks passed! Ready for review.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

✅ SDK server contract tests passed (test_server_integration.py). Cross-product scenarios are validated in capiscio-e2e-tests.

- Replace module-level DEFAULT_SOCKET_PATH with _default_socket_path()
  function computed lazily at call time, so forked processes get their
  own PID in the socket name
- Add _cleanup_stale_sockets() that globs rpc-*.sock and removes
  sockets whose PID no longer exists
- Update test to use _default_socket_path() import
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

✅ Documentation validation passed!

Unified docs will be deployed from capiscio-docs repo.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

✅ All checks passed! Ready for review.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

✅ SDK server contract tests passed (test_server_integration.py). Cross-product scenarios are validated in capiscio-e2e-tests.

@beonde beonde merged commit da125a1 into main Apr 4, 2026
13 checks passed
@beonde beonde deleted the fix/pid-specific-socket branch April 4, 2026 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants