Clean up align_partial_results, eliminate double alignment, and remove Data round-trip #5110
Open
ltiao wants to merge 1 commit into facebook:main from
Conversation
Codecov Report ❌ Patch coverage is

Additional details and impacted files:

@@ Coverage Diff @@

|          | main   | #5110  | +/- |
|----------|--------|--------|-----|
| Coverage | 96.39% | 96.40% |     |
| Files    | 613    | 613    |     |
| Lines    | 68345  | 68322  | -23 |
| Hits     | 65880  | 65863  | -17 |
| Misses   | 2465   | 2459   | -6  |
Clean up align_partial_results, eliminate double alignment, and remove Data round-trip

Summary: Profiling-driven cleanup and performance improvements to the early stopping pipeline.

**1. Remove redundant validations from `align_partial_results` (utils.py)**

`align_partial_results` contained three validation/logging blocks that are redundant with upstream checks already performed by `_lookup_and_validate` in `base.py`:

- **Missing-metrics check + `raise ValueError`:** `_lookup_and_validate` already verifies that each metric signature exists in the data and returns `None` before `align_partial_results` is ever called.
- **Per-metric logging loop:** the "no data" branch is unreachable because the upstream checks guarantee data exists for each metric, and debug logging about MAP_KEY ranges is misplaced in a pure alignment function.
- **Trial-to-arm uniqueness check:** redundant with `is_eligible_any`, which already rejects `BatchTrial` -- the only way a trial gets multiple arms.

The arm-to-trial uniqueness check was moved (not removed) to `_lookup_and_validate`, where it properly guards all downstream consumers, not just alignment.

The `isin` filter (`df = df[df['metric_signature'].isin(metrics)]`) was kept: it is part of `align_partial_results`'s own contract, since the function accepts a `metrics` argument and callers depend on it filtering to those metrics.

After cleanup, `align_partial_results` is a focused alignment function: filter to requested metrics -> drop `arm_name` -> drop duplicates -> sort -> pivot -> interpolate.

**2. Eliminate double data alignment in `PercentileEarlyStoppingStrategy` (percentile.py)**

When `check_safe=True`, the base class `should_stop_trials_early` calls `_is_harmful()`, which calls `_prepare_aligned_frames()`; then `_should_stop_trials_early()` calls `_prepare_aligned_frames()` again -- the identical data lookup and alignment run twice.
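As an illustration of the cleaned-up alignment flow from (1), here is a minimal plain-pandas sketch, not the Ax implementation: the column names mirror the description above, and the MAP_KEY column is assumed to be called `step` for this example.

```python
import pandas as pd


def align_partial_results_sketch(
    df: pd.DataFrame, metrics: list[str]
) -> dict[str, pd.DataFrame]:
    """Illustrative sketch: filter -> drop arm_name -> dedupe -> sort -> pivot -> interpolate."""
    # Filter to the requested metrics (the retained `isin` contract).
    df = df[df["metric_signature"].isin(metrics)]
    # Drop arm_name: with one arm per trial, trial_index identifies the curve.
    df = df.drop(columns=["arm_name"])
    # Drop duplicate observations at the same progression step, then sort.
    df = df.drop_duplicates(subset=["trial_index", "metric_signature", "step"], keep="last")
    df = df.sort_values(["metric_signature", "trial_index", "step"])
    aligned = {}
    for metric, sub in df.groupby("metric_signature"):
        # Pivot to one column per trial, indexed by step; forward-fill aligns
        # trials that report on different step grids.
        wide = sub.pivot(index="step", columns="trial_index", values="mean")
        aligned[metric] = wide.ffill()
    return aligned
```

For example, a trial observed only at steps 1 and 3 gets its step-2 value forward-filled from step 1, so all trials can be compared at a common step.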
Fix: override `should_stop_trials_early` to call `_default_objective_and_direction()` and `_prepare_aligned_frames()` once, passing the results as required arguments (`metric_signature`, `minimize`, `aligned_frames`) to both `_is_harmful` and `_should_stop_trials_early`.

**3. Eliminate the `Data` round-trip in `_lookup_and_validate` (base.py, threshold.py)**

`_lookup_and_validate` (formerly `_lookup_and_validate_data`) used to wrap its filtered DataFrame back into `Data(df=filtered_df)` at the end, purely to satisfy its `Data | None` return type. The sole consumer (`_prepare_aligned_frames`) immediately called `.full_df` to get the DataFrame back. This triggered a pointless df-to-DataRow-to-df round-trip on every call:

```
DataFrame -> itertuples -> list[DataRow] -> from_records -> regex sort -> cast -> DataFrame
```

Profiling shows this round-trip is **100% overhead** at every scale:

| Scale            | Rows    | Round-trip cost | Replacement (df.copy) |
|------------------|---------|-----------------|-----------------------|
| tiny (5x10)      | 50      | 4ms             | <0.1ms                |
| typical (20x100) | 2,000   | 15ms            | <0.1ms                |
| large (50x200)   | 10,000  | 60ms            | <0.1ms                |
| xlarge (100x200) | 20,000  | 160ms           | <0.1ms                |
| huge (200x500)   | 100,000 | 733ms           | <0.1ms                |

The cost comes from `Data.__init__` iterating every row via `itertuples()` to build `list[DataRow]` (~35ms at 10k rows), then `Data.full_df` reconstructing the DataFrame via `from_records()` (~12ms) and running regex-based arm-name parsing in `sort_by_trial_index_and_arm_name()` (~19ms) -- none of which is needed, since we already had the DataFrame.

Fix: change `_lookup_and_validate` to return `pd.DataFrame | None` directly, and update `ModelBasedEarlyStoppingStrategy` and `ThresholdEarlyStoppingStrategy` accordingly.

End-to-end profiling of `should_stop_trials_early` at the 50x200 scale (10k rows) shows a ~19% speedup (323ms -> ~263ms), with the benefit growing at larger scales (up to ~41% at 100k rows).

**4. Naming cleanup**

Renamed methods and variables to disambiguate between `Data` objects and `pd.DataFrame`:

- `_lookup_and_validate_data` -> `_lookup_and_validate` (returns `pd.DataFrame | None`)
- `_prepare_aligned_data` -> `_prepare_aligned_frames` (returns a tuple of DataFrames)
- Variables holding `_lookup_and_validate` results: `data`/`map_data`/`data_lookup` -> `df`
- Parameter `aligned_data` -> `aligned_frames`

**5. Tests**

- Removed tests for the moved/removed `align_partial_results` validations.
- Added the arm-to-trial uniqueness check as a subtest in `test_is_eligible`.
- Added a profiling notebook at `ax/early_stopping/profiling.ipynb`.

Reviewed By: saitcakmak

Differential Revision: D98544835
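To make the overhead described in (3) concrete, here is a standalone micro-benchmark sketch. The `round_trip` function below is a simplified stand-in for the `Data.__init__` + `Data.full_df` path (per-row iteration into records, then frame reconstruction), not the actual Ax code; the point is that this work scales linearly with row count, while `df.copy()` is a single vectorized block copy.

```python
import time

import pandas as pd


def round_trip(df: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for the df -> row-objects -> df round-trip: iterate every row
    into a record, then rebuild the DataFrame from those records."""
    records = [row._asdict() for row in df.itertuples(index=False)]
    return pd.DataFrame.from_records(records)


df = pd.DataFrame({"trial_index": range(10_000), "mean": [0.5] * 10_000})

t0 = time.perf_counter()
rebuilt = round_trip(df)
t1 = time.perf_counter()
copied = df.copy()
t2 = time.perf_counter()

# Both paths produce an equal frame; only the copy avoids per-row work.
assert rebuilt.equals(df) and copied.equals(df)
print(f"round-trip: {(t1 - t0) * 1e3:.2f}ms, df.copy(): {(t2 - t1) * 1e3:.2f}ms")
```

Since both paths yield an identical DataFrame, returning `pd.DataFrame | None` directly (with a defensive `df.copy()` where needed) preserves behavior while skipping the reconstruction entirely.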