Category takeup rerandomization #540

Open
juaristi22 wants to merge 6 commits into main from category-takeup-rerandomization

juaristi22 (Collaborator) commented Feb 18, 2026

Summary

Builds on PR #538 (calibration-pipeline-improvements) to add category-dependent takeup re-randomization for takes_up_eitc and would_claim_wic during calibration cloning (issue #532, partial).

PR #531 introduced seeded takeup re-randomization for 8 "simple" variables whose rates are known from YAML alone. However, takes_up_eitc and would_claim_wic were deferred because their rates depend on entity-level categories (eitc_child_count, wic_category_str) that require simulation output to determine.

Changes

  • Mapper functions: _eitc_category_mapper() (clamps child count to max key) and _wic_category_mapper() (unknown categories default to 0)
  • CATEGORY_TAKEUP_VARS config: Declarative config for the 2 category-dependent variables, paralleling SIMPLE_TAKEUP_VARS
  • rerandomize_category_takeup(): Post-simulation re-randomization function that calculates category variables from the sim, maps them to per-entity rates, and draws seeded per-block takeup booleans
  • Two-pass _simulate_clone() flow: Added post_sim_modifier parameter to _simulate_clone() and build_matrix() — runs after the first cache clear (so category variables are computed fresh), then clears caches again before final target calculation
  • post_sim_modifier closure wired in run_calibration() alongside the existing sim_modifier
  • 18 new tests including 9 integration tests that exercise rerandomize_category_takeup() end-to-end with a mock sim (no dataset required)
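
The mapper and draw steps listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the rate tables, function signatures, and the use of `np.random.default_rng` with a single seed are assumptions made for the example (the PR describes per-block seeding, which is not modeled here). Only the two behaviors "clamp child count to the max key" and "unknown categories default to 0" come from the description.

```python
import numpy as np

# Hypothetical rate tables; the real rates come from the project's parameters.
EITC_RATES = {0: 0.65, 1: 0.82, 2: 0.85, 3: 0.85}
WIC_RATES = {"PREGNANT": 0.5, "INFANT": 0.8, "CHILD": 0.4}


def eitc_category_mapper(child_count, rates=EITC_RATES):
    """Map child counts to rates, clamping counts to the table's key range."""
    clamped = np.clip(np.asarray(child_count, dtype=int), min(rates), max(rates))
    return np.array([rates[int(c)] for c in clamped])


def wic_category_mapper(categories, rates=WIC_RATES):
    """Map category strings to rates; unknown categories get a rate of 0."""
    return np.array([rates.get(str(c), 0.0) for c in categories])


def draw_takeup(per_entity_rates, seed):
    """Seeded takeup booleans: True where a uniform draw falls below the rate."""
    rng = np.random.default_rng(seed)
    draws = rng.random(len(per_entity_rates))
    return draws < np.asarray(per_entity_rates)
```

Because the generator is rebuilt from the seed on every call, repeating a call with the same rates and seed reproduces the same booleans, which is the property a bit-exact test would rely on.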

Deferred

Test plan

  • pytest policyengine_us_data/tests/test_calibration/test_unified_calibration.py — 33 tests pass (18 new)
  • pytest policyengine_us_data/tests/test_calibration/ — full suite 103 tests pass
  • black . -l 79 --check — formatting clean
  • Integration tests verify bit-exact match between rerandomize_category_takeup() output and hand-computed seeded draws + rate mappings

🤖 Generated with Claude Code

baogorek and others added 6 commits February 16, 2026 08:37
…530)

Adds puf_impute.py and source_impute.py from PR #516 (by @MaxGhenis),
refactors extended_cps.py to delegate to the new modules, and integrates
both into the unified calibration pipeline. The core fix removes the
subsample(10_000) call that dropped high-income PUF records before QRF
training, which caused a hard AGI ceiling at ~$6.26M after uprating.

Co-Authored-By: Max Ghenis <mghenis@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ibration loader

Replace weight-proportional PUF subsample with stratified approach that
force-includes top 0.5% by AGI and randomly samples rest to 20K, preserving
the high-income tail the QRF needs. Remove random state assignment from SIPP
and SCF in source_impute.py since these surveys lack state identifiers. Fix
unified_calibration.py to handle TIME_PERIOD_ARRAYS dataset format. Add
`make calibrate` target.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
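
A stratified subsample of the kind this commit message describes might look like the sketch below. The function name, its parameters, and the reading of "20K" as the total sample size are assumptions for illustration. The sketch also ignores survey weights, which the actual implementation may handle.

```python
import numpy as np


def stratified_subsample(agi, target_n=20_000, top_share=0.005, seed=0):
    """Return sorted row indices: every record in the top `top_share` by AGI,
    plus a uniform random sample of the rest, for `target_n` rows in total."""
    agi = np.asarray(agi, dtype=float)
    n = len(agi)
    if n <= target_n:
        return np.arange(n)  # small input: keep everything

    # Force-include the highest-AGI records so the tail is never dropped.
    n_top = max(1, int(np.ceil(n * top_share)))
    order = np.argsort(agi)  # ascending by AGI
    top_idx = order[-n_top:]
    rest_idx = order[:-n_top]

    # Fill the remaining slots at random, without replacement.
    n_rest = min(max(0, target_n - n_top), len(rest_idx))
    rng = np.random.default_rng(seed)
    sampled = rng.choice(rest_idx, size=n_rest, replace=False)
    return np.sort(np.concatenate([top_idx, sampled]))
```

Keeping the top slice deterministic and randomizing only the remainder means the highest-AGI records reach model training regardless of the seed.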
…ter CLI

- Add build-only mode to save calibration matrix as pickle package
- Add target config YAML for declarative target exclusion rules
- Add CLI flags for beta, lambda_l2, learning_rate hyperparameters
- Add streaming subprocess output in Modal runner
- Add calibration pipeline documentation
- Add tests for target config filtering and CLI arg parsing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EITC takeup rates vary by child count and WIC rates vary by demographic
category, both requiring sim-calculated values. Introduce a two-pass flow
in _simulate_clone() with a post_sim_modifier that runs after the first
cache clear, computes category variables, re-randomizes takeup, then
clears caches again before final target calculation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
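
The ordering of the two-pass flow can be illustrated with a small stand-in simulation that only records calls. `FakeSim`, its method names, and this simplified `simulate_clone` signature are hypothetical. Only the sequence (modifier, cache clear, post-sim modifier, second cache clear, target calculation) is taken from the description, and making the second clear conditional on a post-sim modifier being supplied is an assumption.

```python
class FakeSim:
    """Hypothetical stand-in for a simulation; records the order of operations."""

    def __init__(self):
        self.log = []

    def clear_caches(self):
        self.log.append("clear_caches")

    def calculate_targets(self):
        self.log.append("calculate_targets")
        return {}


def simulate_clone(sim, sim_modifier=None, post_sim_modifier=None):
    """Two-pass flow: modify, clear, post-modify, clear again, then calculate."""
    if sim_modifier is not None:
        sim_modifier(sim)  # pass 1: takeup draws that need no sim output
    sim.clear_caches()  # category variables are now computed fresh
    if post_sim_modifier is not None:
        post_sim_modifier(sim)  # pass 2: category-dependent takeup draws
        sim.clear_caches()  # drop values computed under the old takeup
    return sim.calculate_targets()


sim = FakeSim()
simulate_clone(
    sim,
    sim_modifier=lambda s: s.log.append("sim_modifier"),
    post_sim_modifier=lambda s: s.log.append("post_sim_modifier"),
)
print(sim.log)
```

The second clear matters because any variable computed while the post-sim modifier was reading categories would otherwise be reused with the stale takeup values.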