Open
…530) Adds puf_impute.py and source_impute.py from PR #516 (by @MaxGhenis), refactors extended_cps.py to delegate to the new modules, and integrates both into the unified calibration pipeline. The core fix removes the subsample(10_000) call that dropped high-income PUF records before QRF training, which caused a hard AGI ceiling at ~$6.26M after uprating.

Co-Authored-By: Max Ghenis <mghenis@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ibration loader

Replace weight-proportional PUF subsample with stratified approach that force-includes top 0.5% by AGI and randomly samples rest to 20K, preserving the high-income tail the QRF needs. Remove random state assignment from SIPP and SCF in source_impute.py since these surveys lack state identifiers. Fix unified_calibration.py to handle TIME_PERIOD_ARRAYS dataset format. Add `make calibrate` target.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
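The stratified subsample described in this commit can be sketched as follows. This is a minimal illustration, not the code in the PR: the function name `stratified_subsample`, its parameters, and the reading of "to 20K" as the total sample size (top records included) are all assumptions.

```python
import numpy as np


def stratified_subsample(agi, top_share=0.005, target_n=20_000, seed=0):
    """Return sorted row indices for a tail-preserving subsample.

    Hypothetical helper: keeps every record in the top `top_share` by AGI
    and fills the remainder of `target_n` with a uniform random draw
    (without replacement) from the other records.
    """
    agi = np.asarray(agi, dtype=float)
    n = agi.size
    if n <= target_n:
        # Nothing to drop, so keep every record.
        return np.arange(n)

    n_top = max(1, int(round(top_share * n)))
    order = np.argsort(agi, kind="stable")  # ascending by AGI
    top_idx = order[-n_top:]                # force-included high-income tail
    rest_idx = order[:-n_top]

    n_rest = min(max(target_n - n_top, 0), rest_idx.size)
    rng = np.random.default_rng(seed)
    sampled = rng.choice(rest_idx, size=n_rest, replace=False)
    return np.sort(np.concatenate([top_idx, sampled]))
```

Because the top slice is taken before any random draw, the maximum AGI in the subsample always equals the maximum in the full file, which is the property the training data needs to avoid an artificial ceiling.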
…ter CLI

- Add build-only mode to save calibration matrix as pickle package
- Add target config YAML for declarative target exclusion rules
- Add CLI flags for beta, lambda_l2, learning_rate hyperparameters
- Add streaming subprocess output in Modal runner
- Add calibration pipeline documentation
- Add tests for target config filtering and CLI arg parsing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
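For orientation, hyperparameter flags and a build-only switch like those listed above could be exposed with `argparse` roughly as below. The flag spellings (`--beta`, `--lambda-l2`, `--learning-rate`, `--build-only`) and the `None` defaults are assumptions made for this sketch; the commit message names the hyperparameters but not the exact CLI syntax.

```python
import argparse


def build_parser():
    """Hypothetical parser; flag names are illustrative, not the PR's CLI."""
    parser = argparse.ArgumentParser(description="Calibration runner (sketch)")
    # None means "fall back to whatever default the pipeline defines".
    parser.add_argument("--beta", type=float, default=None)
    parser.add_argument("--lambda-l2", type=float, default=None)
    parser.add_argument("--learning-rate", type=float, default=None)
    # Build the calibration matrix and stop before fitting weights.
    parser.add_argument("--build-only", action="store_true")
    return parser


args = build_parser().parse_args(["--beta", "0.5", "--build-only"])
```

Leaving unset hyperparameters as `None` keeps the real defaults in one place (the pipeline) instead of duplicating them in the CLI layer.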
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EITC takeup rates vary by child count and WIC rates vary by demographic category, both requiring sim-calculated values. Introduce a two-pass flow in _simulate_clone() with a post_sim_modifier that runs after the first cache clear, computes category variables, re-randomizes takeup, then clears caches again before final target calculation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
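The two-pass ordering in this commit can be illustrated with a toy simulation object. Everything here is a stand-in: `MockSim`, `simulate_clone`, the variable `eitc_claimed`, and the deterministic rule in `toy_post_modifier` are assumptions for the sketch, and the real `_simulate_clone()` may differ in signature and detail.

```python
class MockSim:
    """Toy simulation: inputs are stored values, formulas are cached."""

    def __init__(self, inputs, formulas):
        self.inputs = dict(inputs)
        self.formulas = formulas
        self.cache = {}

    def set_input(self, name, value):
        self.inputs[name] = value

    def calculate(self, name):
        if name in self.inputs:
            return self.inputs[name]
        if name not in self.cache:
            self.cache[name] = self.formulas[name](self)
        return self.cache[name]

    def clear_caches(self):
        self.cache.clear()


def simulate_clone(sim, target_names, sim_modifier=None, post_sim_modifier=None):
    """Sketch of the two-pass flow: modify, clear, post-modify, clear, compute."""
    if sim_modifier is not None:
        sim_modifier(sim)           # pass 1: inputs that need no sim output
    sim.clear_caches()              # category variables are now computed fresh
    if post_sim_modifier is not None:
        post_sim_modifier(sim)      # pass 2: may call sim.calculate(...)
        sim.clear_caches()          # drop anything cached during pass 2
    return {name: sim.calculate(name) for name in target_names}


FORMULAS = {
    "eitc_child_count": lambda s: min(s.calculate("n_children"), 3),
    "eitc_claimed": lambda s: (
        1000 * s.calculate("eitc_child_count")
        if s.calculate("takes_up_eitc")
        else 0
    ),
}


def toy_post_modifier(sim):
    # Toy deterministic rule standing in for a seeded random draw.
    count = sim.calculate("eitc_child_count")
    sim.set_input("takes_up_eitc", count < 2)
```

The second cache clear matters because the post-simulation step both reads calculated variables and changes an input; any value cached before that change would otherwise leak into the target calculation.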
Summary

Builds on PR #538 (`calibration-pipeline-improvements`) to add category-dependent takeup re-randomization for `takes_up_eitc` and `would_claim_wic` during calibration cloning (issue #532, partial).

PR #531 introduced seeded takeup re-randomization for 8 "simple" variables whose rates are known from YAML alone. However, `takes_up_eitc` and `would_claim_wic` were deferred because their rates depend on entity-level categories (`eitc_child_count`, `wic_category_str`) that require simulation output to determine.

Changes

- `_eitc_category_mapper()` (clamps child count to max key) and `_wic_category_mapper()` (unknown categories default to 0)
- `CATEGORY_TAKEUP_VARS` config: Declarative config for the 2 category-dependent variables, paralleling `SIMPLE_TAKEUP_VARS`
- `rerandomize_category_takeup()`: Post-simulation re-randomization function that calculates category variables from the sim, maps them to per-entity rates, and draws seeded per-block takeup booleans
- `_simulate_clone()` flow: Added `post_sim_modifier` parameter to `_simulate_clone()` and `build_matrix()`; it runs after the first cache clear (so category variables are computed fresh), then clears caches again before final target calculation
- `post_sim_modifier` closure wired in `run_calibration()` alongside the existing `sim_modifier`
- `rerandomize_category_takeup()` end-to-end with a mock sim (no dataset required)

Deferred

- `is_wic_at_nutritional_risk` re-randomization: depends on both `wic_category_str` and `receives_wic`, requires additional design consideration (see issue #532, "Add category-dependent takeup re-randomization (EITC, WIC)")

Test plan

- `pytest policyengine_us_data/tests/test_calibration/test_unified_calibration.py`: 33 tests pass (18 new)
- `pytest policyengine_us_data/tests/test_calibration/`: full suite 103 tests pass
- `black . -l 79 --check`: formatting clean
- `rerandomize_category_takeup()` output and hand-computed seeded draws + rate mappings

🤖 Generated with Claude Code
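The category mappers and seeded per-block draws described under Changes can be sketched as below. This is a hedged illustration, not the PR's code: the function names drop the leading underscore, the rate tables and the `"INFANT"` label are made-up placeholders, and seeding each block from a CRC32 of the variable name and block ID is an assumption about how "seeded per-block" draws might be made reproducible.

```python
import zlib

import numpy as np


def eitc_category_mapper(child_counts, rates_by_count):
    """Map child counts to rates, clamping counts above the largest key."""
    max_key = max(rates_by_count)
    clamped = np.minimum(np.asarray(child_counts), max_key)
    return np.array([rates_by_count[int(c)] for c in clamped])


def wic_category_mapper(categories, rates_by_category):
    """Map category labels to rates; unknown labels get a rate of 0."""
    return np.array([rates_by_category.get(c, 0.0) for c in categories])


def seeded_takeup(rates, block_ids, var_name, base_seed=0):
    """Draw takeup booleans with one reproducible RNG stream per block."""
    rates = np.asarray(rates, dtype=float)
    block_ids = np.asarray(block_ids)
    takeup = np.zeros(rates.shape, dtype=bool)
    for block in np.unique(block_ids):
        mask = block_ids == block
        # Hypothetical seeding scheme: stable hash of variable and block.
        key = zlib.crc32(f"{var_name}:{block}".encode())
        rng = np.random.default_rng([base_seed, key])
        takeup[mask] = rng.random(int(mask.sum())) < rates[mask]
    return takeup


# Placeholder rates for illustration only, not real takeup rates.
eitc_rates = eitc_category_mapper([0, 1, 5], {0: 0.5, 1: 0.6, 2: 0.7})
wic_rates = wic_category_mapper(["INFANT", "???"], {"INFANT": 0.9})
draws = seeded_takeup(eitc_rates, ["A", "A", "B"], "takes_up_eitc")
```

Seeding per block rather than once globally means a block's draws do not change when other blocks are added, removed, or reordered, which keeps clones comparable across runs.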