feat(ci): dynamic test duration tracking for pytest-split #1151
Record per-test durations from every CI run and cache them so subsequent runs use the freshest data for splitting. The committed `.test_durations` file remains as a fallback when no cache exists.

- Add pytest hooks in `test/conftest.py` to record durations per group
- Update `scripts/ci.sh` to prefer cached durations over the committed baseline
- Add cache restore, artifact upload, and a `consolidate-durations` job to the GitHub Actions workflow
Pull request overview
This PR adds CI-driven feedback loops for pytest-split by collecting per-test durations during CI runs, consolidating them across split groups, and caching the merged result so future CI runs can split tests using the most recent timings.
Changes:
- Add pytest session hooks to record durations into per-group `.test_durations_group_*` files (CI-only via `GROUP`).
- Update `scripts/ci.sh` to prefer a cached durations file (`.test_durations_cached`) over the committed baseline (`.test_durations`).
- Extend the GitHub Actions workflow to restore/save the cache and add a consolidation job that merges per-group artifacts into the cached durations file.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `test/conftest.py` | Adds pytest hooks to collect and write per-test duration JSON for the current CI split group. |
| `scripts/ci.sh` | Uses a cached durations file when available and passes it to pytest-split via `--durations-path`. |
| `.github/workflows/run-tests.yaml` | Restores durations cache for test jobs, uploads per-group duration artifacts, and consolidates/saves merged durations back to cache. |
```python
def pytest_runtest_logreport(report):
    if report.when == "call":
        _test_durations[report.nodeid] = round(report.duration, 2)
```
`pytest_runtest_logreport` records durations for every local pytest run, even though the PR description says this should be gated on `GROUP` (CI-only). Consider checking `os.environ.get("GROUP")` (or a config flag) before updating `_test_durations` so local runs don't incur the hook overhead or generate unexpected in-memory state.
```python
def pytest_runtest_logreport(report):
    if report.when == "call":
        _test_durations[report.nodeid] = round(report.duration, 2)
```
The durations you persist cover only the call phase (`report.when == "call"`), but the PR description calls these "per-test wall-clock durations". For tests with expensive fixture setup/teardown (or skips/errors during setup), this will significantly undercount and can lead to poorer splitting; consider accumulating setup + call + teardown durations per nodeid (or otherwise clarifying/aligning what "wall-clock" means here).
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##           master    #1151   +/-   ##
=======================================
  Coverage   56.75%   56.75%
=======================================
  Files          40       40
  Lines        3908     3908
=======================================
  Hits         2218     2218
  Misses       1690     1690
```

☔ View full report in Codecov by Sentry.
`upload-artifact@v4` defaults to `include-hidden-files: false`, which silently excludes the `.test_durations_group_*` files. Also removes debug logging from conftest.
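The fix is a one-line addition to the upload step. A sketch of the relevant workflow fragment (the step name, artifact name, and `matrix.group` variable are illustrative, not taken from the PR; `include-hidden-files` is a real `actions/upload-artifact@v4` input):

```yaml
# Hypothetical upload step; dotfiles are skipped unless
# include-hidden-files is explicitly enabled.
- name: Upload group durations
  uses: actions/upload-artifact@v4
  with:
    name: test-durations-group-${{ matrix.group }}
    path: .test_durations_group_*
    include-hidden-files: true
```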
Summary
- Add pytest hooks in `test/conftest.py` to record per-test wall-clock durations, gated on the `GROUP` env var (CI only)
- Update `scripts/ci.sh` to prefer cached durations over the committed `.test_durations` baseline
- Add a `consolidate-durations` job to `.github/workflows/run-tests.yaml` that merges per-group artifacts into a single cached file

How it works
Per-run data flow:
Fallback chain (priority order):
1. Cached durations file (`.test_durations_cached`) from the most recent CI run
2. Committed `.test_durations` file on `master` (unchanged, serves as baseline)

Test plan
- Ran `pytest test/test_dataclass_validations.py -v` with `GROUP=1` and verified `.test_durations_group_1` is created with node IDs matching the `.test_durations` format