fix: use PID-specific socket path to avoid orphan process contention #51
When multiple SDK sessions run and crash without cleanup, orphaned
capiscio-core RPC processes hold references to ~/.capiscio/rpc.sock.
New processes can't bind the same path, causing hangs.
Use rpc-{pid}.sock instead, so each Python process gets its own
clean socket. Atexit cleanup removes it on normal shutdown.
✅ Documentation validation passed!
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Pull request overview
This PR updates the capiscio-core gRPC sidecar Unix socket naming strategy to reduce hangs/timeouts caused by orphaned sidecar processes retaining a shared socket path across runs.
Changes:
- Switch default Unix socket from a shared rpc.sock to a PID-scoped rpc-{pid}.sock.
- Add/update inline documentation describing the rationale for PID-specific sockets.
capiscio_sdk/_rpc/process.py
Outdated
  DEFAULT_SOCKET_DIR = Path.home() / ".capiscio"
- DEFAULT_SOCKET_PATH = DEFAULT_SOCKET_DIR / "rpc.sock"
+ DEFAULT_SOCKET_PATH = DEFAULT_SOCKET_DIR / f"rpc-{os.getpid()}.sock"
DEFAULT_SOCKET_PATH is computed at import time using os.getpid(). In forked/multiprocessing scenarios where the SDK is imported before fork, child processes will inherit the parent's precomputed path and still contend on the same socket file, undermining the PID-specific design. Compute the default socket path lazily at runtime (e.g., when starting/connecting) based on the current PID, and consider a regression test that covers the pre-fork import case.
Fixed in a7607fe — DEFAULT_SOCKET_PATH replaced with a lazy _default_socket_path() function that computes the PID-specific path at call time. Forked processes will now correctly get their own socket path.
capiscio_sdk/_rpc/process.py
Outdated
  # Default socket path — use PID-specific path to avoid contention
  # with orphaned capiscio-core processes from previous runs.
  DEFAULT_SOCKET_DIR = Path.home() / ".capiscio"
- DEFAULT_SOCKET_PATH = DEFAULT_SOCKET_DIR / "rpc.sock"
+ DEFAULT_SOCKET_PATH = DEFAULT_SOCKET_DIR / f"rpc-{os.getpid()}.sock"
The PR description mentions "cleanup of stale PID sockets on startup", but the implementation here only unlinks the specific socket chosen for this process; it does not scan for or remove old rpc-*.sock files from previous runs. Either implement the advertised startup cleanup (e.g., glob rpc-*.sock in DEFAULT_SOCKET_DIR and remove sockets for non-running PIDs) or adjust the PR description/comments to match actual behavior (and update remaining references to the old ~/.capiscio/rpc.sock default).
Fixed in a7607fe — added _cleanup_stale_sockets() that globs rpc-*.sock in DEFAULT_SOCKET_DIR, checks each PID via os.kill(pid, 0), and removes sockets whose PIDs no longer exist. Called on startup before creating the new socket.
✅ All checks passed! Ready for review.
✅ SDK server contract tests passed (test_server_integration.py). Cross-product scenarios are validated in capiscio-e2e-tests.
- Replace module-level DEFAULT_SOCKET_PATH with _default_socket_path() function computed lazily at call time, so forked processes get their own PID in the socket name
- Add _cleanup_stale_sockets() that globs rpc-*.sock and removes sockets whose PID no longer exists
- Update test to use _default_socket_path() import
Problem
When a Python process using the SDK crashes or is killed without cleanup, the capiscio-darwin-arm64 RPC sidecar process can become orphaned while still holding the shared ~/.capiscio/rpc.sock Unix socket. Subsequent SDK invocations then connect to the orphaned process's socket, causing hangs or timeouts.

Fix
Changed DEFAULT_SOCKET_PATH from rpc.sock to rpc-{pid}.sock so each Python process gets its own socket. Orphaned sidecar processes no longer block new SDK connections.

Changes
capiscio_sdk/_rpc/process.py: PID-specific socket path and cleanup of stale PID sockets on startup

Testing
Verified in a2a-demos (demo-one, demo-two) where multiple sequential runs previously hung due to orphaned processes.