fix: repair TCP port-remapping slow path#3
Conversation
congwang-mk
left a comment
There was a problem hiding this comment.
Thanks for your PR. Some comments below
src/sandlock/_port_remap.py
Outdated
| _NR_PIDFD_GETFD = 438 # x86_64 and aarch64 (asm-generic) | ||
|
|
||
|
|
||
| def _pidfd_open(pid: int) -> int: |
There was a problem hiding this comment.
duplicated with the one in _context.py ?
There was a problem hiding this comment.
Yes, I refrain from importing it from _context, should we separate all syscall wrappers into a different file, like _syscalls.py
tests/test_sandbox.py
Outdated
| t1 = threading.Thread(target=run, args=(0, code_hold)) | ||
| t1.start() | ||
| import time | ||
| time.sleep(1) |
There was a problem hiding this comment.
This could be flaky, could it be improved?
|
Re: the |
Signed-off-by: Cong Wang <cwang@multikernel.io>
Signed-off-by: Vahagn <wanghang25@mails.tsinghua.edu.cn>
Signed-off-by: Vahagn <wanghang25@mails.tsinghua.edu.cn>
fixed in 2201c42 |
Summary
Fixes #2. Four bugs in the TCP port-remapping slow path. The slow path (triggered when the virtual port is already taken) had not been tested (all existing tests ran sequentially, so the port was always free by the time the second sandbox started).
Bugs fixed
Bug 1 -
_allocate_real_portheld socket blocked the child's bindThe parent kept the allocated real port socket open to prevent stealing, then
rewrote the child's sockaddr to that port and sent CONTINUE. The child's
bind()hitEADDRINUSEbecause the parent was already holding the sameaddress:port (only released later in
PortMap.close()).Fixed by closing it immediately, accepting the same tiny race window already mentioned in
_try_reserve_port.Bug 2 -
os.pidfd_opennot available in Python buildfixup_getsocknameusedos.pidfd_openwhich raisesAttributeErroron some builds.Fixed by using
_libc.syscall(_NR_PIDFD_OPEN)directly, consistent with therest of the codebase. In the future, it makes sense to separate all syscalls into
a different file like
_syscall.py, since an identical_NR_PIDFD_OPENwasfound in
_context.py.Bug 3 -
fixup_getsocknameleaked the duplicated fds.detach()does not callos.close(), solocal_fdwas never closed on thesuccess path. The leaked fd kept the child's socket alive after the child
exited, holding the port bound in the parent. Fixed by moving\
os.close(local_fd)into thefinallyblock.Bug 4 -
NotifSupervisor.stop()never calledport_map.close()Proxy listener sockets were left running after sandbox teardown. Fixed by
calling
port_map.close()at the end ofstop().Tests
Two new tests added to
TestPortRemap:test_slow_path_host_holds_virtual_port: parent process holds the virtualport, forcing the slow path without threads or timing dependencies.
test_slow_path_two_concurrent_sandboxes: two sandboxes run concurrently; sandbox 1 holds the port open while sandbox 2 must remap.