fix(oom): prevent RSS growth during VCR recording + sanitizer dedup by matyas-jirat-keboola · Pull Request #6 · keboola/python-vcr-tests

matyas-jirat-keboola · 2026-03-05T18:54:21Z

Summary

Two self-contained commits:

1. `feat(sanitizers)`: merge duplicate sanitizers + pre-scan optimisation

Each sanitizer class gains a merge() method
_dedup_sanitizers() collapses same-class sanitizers into one, so the chain makes a single pass over each request/response instead of N passes
DefaultSanitizer._sanitize_body gains a pre-scan: skips the full JSON parse when no sensitive field name or value is present in the body (fast path for large unrelated responses)

2. `fix(oom)`: prevent RSS growth during VCR recording

Root cause: every HTTPS interaction caused urllib3 to call load_default_certs() on a brand-new ssl.SSLContext, allocating ~480 KB of CA cert data into the OpenSSL C-heap via jemalloc. jemalloc never returns this to the OS, producing ~7 MB RSS growth per 10 interactions → OOM for jobs with 500–2000 interactions.

Three root causes fixed:

Pool proliferation — components creating a new requests.Session per call each get their own PoolManager → N pools → N ssl_context allocations. Fix: _pool_reuse_patch() returns one shared pool per (scheme, host, port).
ssl_context per connect() — even with one pool, requests 2.32.x never puts ssl_context into pool.conn_kw, so urllib3 calls load_default_certs() on every connect(). Fix: inject a shared SSLContext into pool.conn_kw at pool creation time.
Connection churn — VCRConnection.is_connected delegated to real_connection.sock (None after Connection: close), causing _new_conn() on every request. Fix: override is_connected to return True when a VCR connection/socket is active; implement VCRHTTPResponse.release_conn() so the pool queue is refilled.

Also: response bodies written in 64 KB chunks to avoid large transient heap allocations fragmenting glibc's heap.

Confirmed working on Keboola platform — RSS growth eliminated on a 100+ interaction Facebook Ads recording run.

Test plan

119 unit tests pass (uv run pytest)
Confirmed working on Keboola platform

🤖 Generated with Claude Code

…ation - Add merge() to DefaultSanitizer, TokenSanitizer, HeaderSanitizer, QueryParamSanitizer, UrlPatternSanitizer, ResponseUrlSanitizer - Add _dedup_sanitizers() which collapses same-class sanitizers into one, so the sanitizer chain makes a single pass over each request/response instead of N passes when multiple sanitizers of the same type are present - Add pre-scan in DefaultSanitizer._sanitize_body: skip the full JSON parse when no sensitive field name or value is present in the body (fast path) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Root cause: every HTTPS interaction during recording caused urllib3 to call load_default_certs() on a brand-new ssl.SSLContext, allocating ~480 KB of CA cert data into the OpenSSL C-heap (via jemalloc). jemalloc's high-watermark behaviour means this memory is never returned to the OS, producing ~7 MB RSS growth per 10 interactions and OOM for jobs with 500–2000 interactions. Three root causes, three fixes applied inside _pool_reuse_patch(): 1. Pool proliferation: components that create a new requests.Session per API call (e.g. Facebook Ads batch requests) each get their own urllib3.PoolManager and a fresh HTTPSConnectionPool, so N interactions → N pools → N ssl_context allocations. Fix: intercept PoolManager.connection_from_host and return one shared pool per (scheme, host, port) for the lifetime of the recording. 2. ssl_context per connect(): even with a single pool, urllib3 calls _new_conn() → connect() → _ssl_wrap_socket_and_match_hostname( ssl_context=None) for each interaction because requests 2.32.x never puts ssl_context into pool.conn_kw. When ssl_context is None, urllib3 creates a new context and calls load_default_certs() every time. Fix: inject one shared ssl.SSLContext (with load_default_certs() pre-called) into pool.conn_kw["ssl_context"] so all _new_conn() connections share it. 3. Connection churn: VCRConnection.is_connected delegated to real_connection.sock, which is None after Connection: close responses. urllib3 saw is_connected=False and called _new_conn() for every request. Fix: override is_connected on VCRConnection to always return True when real_connection exists (recording) or a VCRFakeSocket is present (replay). Additionally, implement VCRHTTPResponse.release_conn() so urllib3's pool queue is refilled after each response and _new_conn() is not called unnecessarily. Also: write large response body strings to the cassette file in 64 KB chunks (_write_interaction_chunked) to prevent multi-MB transient allocations from fragmenting glibc's heap during recording of APIs with large JSON payloads. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

_vcr_is_connected uses _VCRFakeSocket via a global lookup (module-level function, not a closure). Deleting it with `del` before the property is ever called would raise NameError at runtime. Remove it from the del list so it stays available as a module-level name. Also fix ruff I001: import block sort order in sanitizers.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

matyas-jirat-keboola and others added 3 commits March 5, 2026 19:53

matyas-jirat-keboola mentioned this pull request Mar 5, 2026

fix(oom): prevent RSS growth during VCR recording by reusing SSLContext per pool #5

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(oom): prevent RSS growth during VCR recording + sanitizer dedup#6

fix(oom): prevent RSS growth during VCR recording + sanitizer dedup#6
matyas-jirat-keboola wants to merge 3 commits intomainfrom
fix/oom-clean

matyas-jirat-keboola commented Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

matyas-jirat-keboola commented Mar 5, 2026

Summary

1. feat(sanitizers): merge duplicate sanitizers + pre-scan optimisation

2. fix(oom): prevent RSS growth during VCR recording

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

1. `feat(sanitizers)`: merge duplicate sanitizers + pre-scan optimisation

2. `fix(oom)`: prevent RSS growth during VCR recording