Category takeup rerandomization by juaristi22 · Pull Request #540 · PolicyEngine/policyengine-us-data

juaristi22 · 2026-02-18T14:39:42Z

Summary

Builds on PR #538 (calibration-pipeline-improvements) to add category-dependent takeup re-randomization for takes_up_eitc and would_claim_wic during calibration cloning (issue #532, partial).

PR #531 introduced seeded takeup re-randomization for 8 "simple" variables whose rates are known from YAML alone. However, takes_up_eitc and would_claim_wic were deferred because their rates depend on entity-level categories (eitc_child_count, wic_category_str) that require simulation output to determine.
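The distinction between the two kinds of takeup variable can be sketched as follows. This is an illustrative sketch only: the rate values, the helper names `simple_rates` and `category_rates`, and the dictionary layout are assumptions, not the contents of the repository's YAML files or its code.

```python
import numpy as np

# Hypothetical rates for illustration; the real values live in YAML parameters.
SIMPLE_RATE = 0.8
EITC_RATES_BY_CHILD_COUNT = {0: 0.65, 1: 0.85, 2: 0.85, 3: 0.85}


def simple_rates(n_entities, rate):
    """A 'simple' variable: one scalar rate applies to every entity."""
    return np.full(n_entities, rate, dtype=float)


def category_rates(categories, rates_by_category):
    """A category-dependent variable: each entity's rate is looked up
    from a category that only exists after the simulation has run."""
    return np.array(
        [rates_by_category[int(c)] for c in categories], dtype=float
    )


# Stand-in for a simulated eitc_child_count array.
eitc_child_count = np.array([0, 2, 1, 0])
print(simple_rates(4, SIMPLE_RATE))
print(category_rates(eitc_child_count, EITC_RATES_BY_CHILD_COUNT))
```

The first function needs nothing but the parameter file, while the second needs a per-entity array produced by the simulation, which is why the two variables could not be handled by the earlier pre-simulation hook.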

Changes

Mapper functions: _eitc_category_mapper() (clamps child count to max key) and _wic_category_mapper() (unknown categories default to 0)
CATEGORY_TAKEUP_VARS config: Declarative config for the 2 category-dependent variables, paralleling SIMPLE_TAKEUP_VARS
rerandomize_category_takeup(): Post-simulation re-randomization function that calculates category variables from the sim, maps them to per-entity rates, and draws seeded per-block takeup booleans
Two-pass _simulate_clone() flow: Added a post_sim_modifier parameter to _simulate_clone() and build_matrix(). The modifier runs after the first cache clear (so category variables are computed fresh); caches are then cleared again before the final target calculation
post_sim_modifier closure wired in run_calibration() alongside the existing sim_modifier
18 new tests including 9 integration tests that exercise rerandomize_category_takeup() end-to-end with a mock sim (no dataset required)
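A minimal sketch of the mapping and drawing steps listed above, under stated assumptions. The function names (without the leading underscore), the seed derivation from `zlib.crc32`, the `blocks` argument, and the WIC category label are all hypothetical; the PR's actual signatures and seeding scheme may differ.

```python
import zlib

import numpy as np


def eitc_category_mapper(child_count, rates):
    """Clamp child counts to the largest key in `rates`, then look up rates."""
    max_key = max(rates)
    clamped = np.minimum(np.asarray(child_count, dtype=int), max_key)
    return np.array([rates[int(c)] for c in clamped], dtype=float)


def wic_category_mapper(categories, rates):
    """Look up a rate per category string; unknown categories get 0."""
    return np.array([rates.get(str(c), 0.0) for c in categories], dtype=float)


def draw_takeup(rates, blocks, base_seed, var_name):
    """Draw one boolean per entity, seeded separately for each block.

    Assumption: the seed is derived from (base_seed, var_name, block) so
    that a given block always receives the same stream of draws.
    """
    rates = np.asarray(rates, dtype=float)
    blocks = np.asarray(blocks)
    takeup = np.zeros(rates.shape, dtype=bool)
    for block in np.unique(blocks):
        mask = blocks == block
        key = zlib.crc32(f"{var_name}:{block}".encode())
        rng = np.random.default_rng(np.random.SeedSequence([base_seed, key]))
        takeup[mask] = rng.random(int(mask.sum())) < rates[mask]
    return takeup


# Hypothetical usage with made-up rates and block identifiers.
rates = eitc_category_mapper([0, 1, 5], {0: 0.6, 1: 0.8, 2: 0.9})
flags = draw_takeup(rates, ["a", "a", "b"], base_seed=0, var_name="takes_up_eitc")
```

Seeding per block rather than once globally means that adding or removing entities in one block leaves the draws in every other block unchanged.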

Deferred

is_wic_at_nutritional_risk re-randomization: depends on both wic_category_str and receives_wic, and requires additional design consideration (see issue #532, "Add category-dependent takeup re-randomization (EITC, WIC)")

Test plan

pytest policyengine_us_data/tests/test_calibration/test_unified_calibration.py — 33 tests pass (18 new)
pytest policyengine_us_data/tests/test_calibration/ — full suite 103 tests pass
black . -l 79 --check — formatting clean
Integration tests verify bit-exact match between rerandomize_category_takeup() output and hand-computed seeded draws + rate mappings
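The bit-exact check described in the last item can be sketched roughly as below. `MockSim`, `rerandomize_stub`, and `check_bit_exact` are hypothetical stand-ins written for this illustration; they are not the test suite's fixtures, and the stub omits per-block seeding for brevity.

```python
import numpy as np


class MockSim:
    """Hypothetical mock: returns canned arrays and records overrides."""

    def __init__(self, values):
        self._values = {k: np.asarray(v) for k, v in values.items()}
        self.inputs = {}

    def calculate(self, name):
        return self._values[name]

    def set_input(self, name, values):
        self.inputs[name] = np.asarray(values)


def rerandomize_stub(sim, rates, seed):
    """Stand-in for the function under test: map categories, then draw."""
    counts = np.minimum(sim.calculate("eitc_child_count"), max(rates))
    per_entity = np.array([rates[int(c)] for c in counts], dtype=float)
    draws = np.random.default_rng(seed).random(per_entity.size)
    sim.set_input("takes_up_eitc", draws < per_entity)


def check_bit_exact():
    rates = {0: 0.2, 1: 0.5, 2: 0.9}  # made-up rates
    sim = MockSim({"eitc_child_count": [0, 1, 2, 4]})
    rerandomize_stub(sim, rates, seed=42)
    # Expected result computed by hand: a child count of 4 clamps to key 2,
    # and the draws come from an identically seeded generator.
    expected_rates = np.array([0.2, 0.5, 0.9, 0.9])
    expected = np.random.default_rng(42).random(4) < expected_rates
    np.testing.assert_array_equal(sim.inputs["takes_up_eitc"], expected)
    return True
```

Because the mock returns fixed arrays, a test of this shape needs no dataset and fails on any change to either the rate mapping or the seeding.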

🤖 Generated with Claude Code

@MaxGhenis

…530)

Adds puf_impute.py and source_impute.py from PR #516 (by @MaxGhenis), refactors extended_cps.py to delegate to the new modules, and integrates both into the unified calibration pipeline. The core fix removes the subsample(10_000) call that dropped high-income PUF records before QRF training, which caused a hard AGI ceiling at ~$6.26M after uprating.

Co-Authored-By: Max Ghenis <mghenis@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ibration loader

Replace weight-proportional PUF subsample with stratified approach that force-includes top 0.5% by AGI and randomly samples rest to 20K, preserving the high-income tail the QRF needs. Remove random state assignment from SIPP and SCF in source_impute.py since these surveys lack state identifiers. Fix unified_calibration.py to handle TIME_PERIOD_ARRAYS dataset format. Add `make calibrate` target.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
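One way to read the stratified approach in this commit message is sketched below. The function name, its parameters, and the interpretation that 20K is the total sample size (forced top records plus random remainder) are assumptions; the commit's actual code may differ.

```python
import numpy as np


def stratified_subsample_indices(agi, target_n=20_000, top_share=0.005, seed=0):
    """Keep every record in the top `top_share` by AGI, then fill the
    remainder of `target_n` with a uniform random sample of the rest."""
    agi = np.asarray(agi, dtype=float)
    if agi.size <= target_n:
        return np.arange(agi.size)  # nothing to drop

    threshold = np.quantile(agi, 1 - top_share)
    top_idx = np.flatnonzero(agi >= threshold)  # always included
    rest_idx = np.flatnonzero(agi < threshold)

    n_rest = min(max(target_n - top_idx.size, 0), rest_idx.size)
    rng = np.random.default_rng(seed)
    sampled = rng.choice(rest_idx, size=n_rest, replace=False)
    return np.sort(np.concatenate([top_idx, sampled]))


# Hypothetical usage on synthetic AGI values.
synthetic_agi = np.random.default_rng(1).lognormal(11, 1.5, size=100_000)
keep = stratified_subsample_indices(synthetic_agi)
```

Forcing the top stratum in guarantees that the maximum AGI seen during training matches the maximum in the source data, which a purely random or weight-proportional sample does not.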

…ter CLI

- Add build-only mode to save calibration matrix as pickle package
- Add target config YAML for declarative target exclusion rules
- Add CLI flags for beta, lambda_l2, learning_rate hyperparameters
- Add streaming subprocess output in Modal runner
- Add calibration pipeline documentation
- Add tests for target config filtering and CLI arg parsing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

EITC takeup rates vary by child count and WIC rates vary by demographic category, both requiring sim-calculated values. Introduce a two-pass flow in _simulate_clone() with a post_sim_modifier that runs after the first cache clear, computes category variables, re-randomizes takeup, then clears caches again before final target calculation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
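The two-pass ordering described in this commit message can be illustrated with a toy simulation. `ToySim`, `simulate_clone_sketch`, the toy "eitc" formula, and the placement of `sim_modifier` before the first clear are all assumptions made for the sketch; only the order "first clear, post_sim_modifier, second clear, targets" comes from the description.

```python
class ToySim:
    """Hypothetical simulation with an input store and a result cache."""

    def __init__(self, child_count):
        self.inputs = {
            "eitc_child_count": list(child_count),
            "takes_up_eitc": [True] * len(child_count),
        }
        self.cache = {}
        self.events = []

    def set_input(self, name, values):
        self.inputs[name] = list(values)
        self.events.append(f"set:{name}")

    def clear_caches(self):
        self.cache.clear()
        self.events.append("clear")

    def calculate(self, name):
        if name not in self.cache:
            if name == "eitc":  # toy formula: 100 per child if taking up
                pairs = zip(
                    self.inputs["eitc_child_count"],
                    self.inputs["takes_up_eitc"],
                )
                self.cache[name] = [100 * c if t else 0 for c, t in pairs]
            else:
                self.cache[name] = list(self.inputs[name])
        return self.cache[name]


def simulate_clone_sketch(sim, target_vars, sim_modifier=None, post_sim_modifier=None):
    if sim_modifier is not None:
        sim_modifier(sim)  # pass 1: overrides that need no sim output
    sim.clear_caches()  # first clear: category variables computed fresh
    if post_sim_modifier is not None:
        post_sim_modifier(sim)  # pass 2: reads categories, overrides takeup
        sim.clear_caches()  # second clear: targets see the new takeup
    return {v: sim.calculate(v) for v in target_vars}


def toy_post_modifier(sim):
    counts = sim.calculate("eitc_child_count")
    # Deterministic stand-in for a seeded draw against per-category rates.
    sim.set_input("takes_up_eitc", [c >= 2 for c in counts])


sim = ToySim([0, 1, 2, 3])
result = simulate_clone_sketch(sim, ["eitc"], post_sim_modifier=toy_post_modifier)
```

Without the second clear, any target already cached during the post-simulation pass would still reflect the old takeup values.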

baogorek and others added 6 commits February 16, 2026 08:37

Ignore all calibration run outputs in storage/calibration/

61523d8

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add category-dependent takeup re-randomization for WIC nutritional risk

5d0ecf6

juaristi22 wants to merge 6 commits into main from category-takeup-rerandomization


2 participants
