Clean up align_partial_results, eliminate double alignment, and remove Data round-trip #5110

Open
ltiao wants to merge 1 commit into facebook:main from ltiao:export-D98544835

Conversation


@ltiao ltiao commented Mar 30, 2026

Summary:
Profiling-driven cleanup and performance improvements to the early stopping pipeline.

**1. Remove redundant validations from `align_partial_results` (utils.py)**

`align_partial_results` contained three validation/logging blocks that are redundant with upstream checks already performed by `_lookup_and_validate` in `base.py`:

- **Missing metrics check + `raise ValueError`:** `_lookup_and_validate` already verifies each metric signature exists in the data and returns `None` before `align_partial_results` is ever called.
- **Per-metric logging loop:** The "no data" branch is unreachable because the upstream checks guarantee data exists for each metric. Debug logging about MAP_KEY ranges is misplaced for a pure alignment function.
- **Trial-to-arm uniqueness check:** Redundant with `is_eligible_any`, which already rejects `BatchTrial` -- the only way a trial gets multiple arms.

The arm-to-trial uniqueness check was moved (not removed) to `_lookup_and_validate`, where it properly guards all downstream consumers, not just alignment.

The `isin` filter (`df = df[df['metric_signature'].isin(metrics)]`) was kept -- it is part of `align_partial_results`'s own contract, since the function accepts a `metrics` argument and callers depend on it filtering to those metrics.

After cleanup, `align_partial_results` is a focused alignment function: filter to requested metrics -> drop `arm_name` -> drop duplicates -> sort -> pivot -> interpolate.
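A minimal pandas sketch of that pipeline (the column names `metric_signature`, `trial_index`, `step`, and `mean` are illustrative stand-ins for Ax's actual schema, and the real function handles more edge cases):

```python
import pandas as pd

def align_partial_results_sketch(
    df: pd.DataFrame, metrics: list[str]
) -> dict[str, pd.DataFrame]:
    """Sketch of the cleaned-up alignment pipeline: one frame per metric."""
    # 1. Filter to requested metrics (the retained `isin` contract).
    df = df[df["metric_signature"].isin(metrics)]
    # 2. Drop arm_name (each trial maps to a single arm after upstream checks).
    df = df.drop(columns=["arm_name"])
    # 3. Drop duplicates and sort by trial and map key (e.g. training step).
    df = df.drop_duplicates().sort_values(["trial_index", "step"])
    # 4. Pivot each metric to a (step x trial) frame, then interpolate gaps
    #    so trials reported at different steps share a common index.
    return {
        m: g.pivot(index="step", columns="trial_index", values="mean").interpolate()
        for m, g in df.groupby("metric_signature")
    }
```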

**2. Eliminate double data alignment in `PercentileEarlyStoppingStrategy` (percentile.py)**

When `check_safe=True`, the base-class `should_stop_trials_early` calls `_is_harmful()`, which calls `_prepare_aligned_frames()`; then `_should_stop_trials_early()` calls `_prepare_aligned_frames()` again -- the identical data lookup + alignment runs twice.

Fix: override `should_stop_trials_early` to call `_default_objective_and_direction()` and `_prepare_aligned_frames()` once, passing the results as required arguments (`metric_signature`, `minimize`, `aligned_frames`) to both `_is_harmful` and `_should_stop_trials_early`.
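The shape of the fix can be sketched with a toy class (the method names mirror the PR, but the bodies are stand-ins, not Ax's real logic):

```python
class CountingStrategy:
    """Toy model of the override: align once, thread the result through."""

    def __init__(self) -> None:
        self.prepare_calls = 0

    def _prepare_aligned_frames(self) -> tuple:
        self.prepare_calls += 1  # stands in for the expensive lookup + alignment
        return ("aligned_frames",)

    def _is_harmful(self, aligned_frames: tuple) -> bool:
        return False  # the safety check now reuses the caller's frames

    def _should_stop_trials_early(self, aligned_frames: tuple) -> dict:
        return {}  # the stopping decision reuses the same frames

    def should_stop_trials_early(self, check_safe: bool = True) -> dict:
        aligned_frames = self._prepare_aligned_frames()  # computed exactly once
        if check_safe and self._is_harmful(aligned_frames):
            return {}
        return self._should_stop_trials_early(aligned_frames)
```

Previously the equivalent of `_prepare_aligned_frames` ran inside both `_is_harmful` and `_should_stop_trials_early`; here the counter makes it easy to verify it runs once.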

**3. Eliminate the `Data` round-trip in `_lookup_and_validate` (base.py, threshold.py)**

`_lookup_and_validate` (formerly `_lookup_and_validate_data`) used to wrap its filtered DataFrame back into `Data(df=filtered_df)` at the end, purely to satisfy its `Data | None` return type. The sole consumer (`_prepare_aligned_frames`) immediately called `.full_df` to get the DataFrame back. This triggered a pointless df-to-DataRow-to-df round-trip on every call:

```
DataFrame -> itertuples -> list[DataRow] -> from_records -> regex sort -> cast -> DataFrame
```

Profiling shows this round-trip is **100% overhead** at every scale:

| Scale            | Rows    | Round-trip cost | Replacement (df.copy) |
|------------------|---------|-----------------|-----------------------|
| tiny (5x10)      | 50      | 4ms             | <0.1ms                |
| typical (20x100) | 2,000   | 15ms            | <0.1ms                |
| large (50x200)   | 10,000  | 60ms            | <0.1ms                |
| xlarge (100x200) | 20,000  | 160ms           | <0.1ms                |
| huge (200x500)   | 100,000 | 733ms           | <0.1ms                |

The cost comes from `Data.__init__` iterating every row via `itertuples()` to build `list[DataRow]` (~35ms at 10k rows), then `Data.full_df` reconstructing the DataFrame via `from_records()` (~12ms) and running regex-based arm-name parsing in `sort_by_trial_index_and_arm_name()` (~19ms) -- none of which is needed, since we already had the DataFrame.

Fix: change `_lookup_and_validate` to return `pd.DataFrame | None` directly. Update `ModelBasedEarlyStoppingStrategy` and `ThresholdEarlyStoppingStrategy` accordingly.

End-to-end profiling of `should_stop_trials_early` at the 50x200 scale (10k rows) shows a ~19% speedup (323ms -> ~263ms), with the benefit growing at larger scales (up to ~41% at 100k rows).

**4. Naming cleanup**

Renamed methods and variables to disambiguate between `Data` objects and `pd.DataFrame`:

- `_lookup_and_validate_data` -> `_lookup_and_validate` (returns `pd.DataFrame | None`)
- `_prepare_aligned_data` -> `_prepare_aligned_frames` (returns a tuple of DataFrames)
- Variables holding `_lookup_and_validate` results: `data`/`map_data`/`data_lookup` -> `df`
- Parameter `aligned_data` -> `aligned_frames`

**5. Tests**

- Removed tests for moved/removed `align_partial_results` validations.
- Added the arm-to-trial uniqueness check as a subtest in `test_is_eligible`.
- Added a profiling notebook at `ax/early_stopping/profiling.ipynb`.

Reviewed By: saitcakmak

Differential Revision: D98544835


meta-codesync bot commented Mar 30, 2026

@ltiao has exported this pull request. If you are a Meta employee, you can view the originating Diff in D98544835.

meta-cla bot added the "CLA Signed" label Mar 30, 2026

codecov-commenter commented Mar 30, 2026

Codecov Report

❌ Patch coverage is 90.76923% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.40%. Comparing base (c08c4a9) to head (c6a49ee).

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| ax/early_stopping/strategies/base.py | 73.33% | 4 Missing ⚠️ |
| ax/early_stopping/strategies/percentile.py | 80.00% | 2 Missing ⚠️ |
Additional details and impacted files
```
@@           Coverage Diff           @@
##             main    #5110   +/-   ##
=======================================
  Coverage   96.39%   96.40%
=======================================
  Files         613      613
  Lines       68345    68322   -23
=======================================
- Hits        65880    65863   -17
+ Misses       2465     2459    -6
```


ltiao force-pushed the export-D98544835 branch from a01af83 to fb5afd8 on March 31, 2026
ltiao force-pushed the export-D98544835 branch from fb5afd8 to c6a49ee on April 6, 2026