fix(oom): prevent RSS growth during VCR recording + sanitizer dedup#6
Open
matyas-jirat-keboola wants to merge 3 commits intomainfrom
Open
fix(oom): prevent RSS growth during VCR recording + sanitizer dedup#6matyas-jirat-keboola wants to merge 3 commits intomainfrom
matyas-jirat-keboola wants to merge 3 commits intomainfrom
Conversation
…ation - Add merge() to DefaultSanitizer, TokenSanitizer, HeaderSanitizer, QueryParamSanitizer, UrlPatternSanitizer, ResponseUrlSanitizer - Add _dedup_sanitizers() which collapses same-class sanitizers into one, so the sanitizer chain makes a single pass over each request/response instead of N passes when multiple sanitizers of the same type are present - Add pre-scan in DefaultSanitizer._sanitize_body: skip the full JSON parse when no sensitive field name or value is present in the body (fast path) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause: every HTTPS interaction during recording caused urllib3 to call load_default_certs() on a brand-new ssl.SSLContext, allocating ~480 KB of CA cert data into the OpenSSL C-heap (via jemalloc). jemalloc's high-watermark behaviour means this memory is never returned to the OS, producing ~7 MB RSS growth per 10 interactions and OOM for jobs with 500–2000 interactions. Three root causes, three fixes applied inside _pool_reuse_patch(): 1. Pool proliferation: components that create a new requests.Session per API call (e.g. Facebook Ads batch requests) each get their own urllib3.PoolManager and a fresh HTTPSConnectionPool, so N interactions → N pools → N ssl_context allocations. Fix: intercept PoolManager.connection_from_host and return one shared pool per (scheme, host, port) for the lifetime of the recording. 2. ssl_context per connect(): even with a single pool, urllib3 calls _new_conn() → connect() → _ssl_wrap_socket_and_match_hostname( ssl_context=None) for each interaction because requests 2.32.x never puts ssl_context into pool.conn_kw. When ssl_context is None, urllib3 creates a new context and calls load_default_certs() every time. Fix: inject one shared ssl.SSLContext (with load_default_certs() pre-called) into pool.conn_kw["ssl_context"] so all _new_conn() connections share it. 3. Connection churn: VCRConnection.is_connected delegated to real_connection.sock, which is None after Connection: close responses. urllib3 saw is_connected=False and called _new_conn() for every request. Fix: override is_connected on VCRConnection to always return True when real_connection exists (recording) or a VCRFakeSocket is present (replay). Additionally, implement VCRHTTPResponse.release_conn() so urllib3's pool queue is refilled after each response and _new_conn() is not called unnecessarily. Also: write large response body strings to the cassette file in 64 KB chunks (_write_interaction_chunked) to prevent multi-MB transient allocations from fragmenting glibc's heap during recording of APIs with large JSON payloads. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_vcr_is_connected uses _VCRFakeSocket via a global lookup (module-level function, not a closure). Deleting it with `del` before the property is ever called would raise NameError at runtime. Remove it from the del list so it stays available as a module-level name. Also fix ruff I001: import block sort order in sanitizers.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two self-contained commits:
1.
feat(sanitizers): merge duplicate sanitizers + pre-scan optimisationmerge()method_dedup_sanitizers()collapses same-class sanitizers into one, so the chain makes a single pass over each request/response instead of N passesDefaultSanitizer._sanitize_bodygains a pre-scan: skips the full JSON parse when no sensitive field name or value is present in the body (fast path for large unrelated responses)2.
fix(oom): prevent RSS growth during VCR recordingRoot cause: every HTTPS interaction caused urllib3 to call
load_default_certs()on a brand-newssl.SSLContext, allocating ~480 KB of CA cert data into the OpenSSL C-heap via jemalloc. jemalloc never returns this to the OS, producing ~7 MB RSS growth per 10 interactions → OOM for jobs with 500–2000 interactions.Three root causes fixed:
requests.Sessionper call each get their ownPoolManager→ N pools → N ssl_context allocations. Fix:_pool_reuse_patch()returns one shared pool per(scheme, host, port).requests2.32.x never putsssl_contextintopool.conn_kw, so urllib3 callsload_default_certs()on everyconnect(). Fix: inject a sharedSSLContextintopool.conn_kwat pool creation time.VCRConnection.is_connecteddelegated toreal_connection.sock(None afterConnection: close), causing_new_conn()on every request. Fix: overrideis_connectedto returnTruewhen a VCR connection/socket is active; implementVCRHTTPResponse.release_conn()so the pool queue is refilled.Also: response bodies written in 64 KB chunks to avoid large transient heap allocations fragmenting glibc's heap.
Confirmed working on Keboola platform — RSS growth eliminated on a 100+ interaction Facebook Ads recording run.
Test plan
uv run pytest)🤖 Generated with Claude Code