Reduce microbenchmark runtime and fix pr-comment retry behavior#5481
- Reduce REPETITIONS from 6 to 4 in benchmarks/execution.yml
- Change microbenchmarks-pr-comment to `when: on_success` + `allow_failure: true`

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
✅ Tests: 🎉 All green! ❄️ No new flaky tests detected. 🎯 Code Coverage (details). 🔗 Commit SHA: 920366b
Benchmarks

Benchmark execution time: 2026-03-19 20:05:05. Comparing candidate commit 920366b in PR branch. Found 0 performance improvements and 0 performance regressions! Performance is the same for 45 metrics, 1 unstable metric.
What does this PR do?
Two targeted improvements to microbenchmark CI reliability. Full investigation at https://github.com/p-datadog/datadog-docs/blob/master/microbenchmarks-ci-investigation.md.
1. Reduce REPETITIONS from 6 to 4 (`benchmarks/execution.yml`)

The `[other]` benchmark group (error_tracking, tracing, DI, gem loading) currently takes ~34 min per run. The cost model is 1.5 + 5.33 × REPETITIONS, so reducing repetitions has a meaningful impact. At REPETITIONS=4 the `[other]` job runs in ~23 min: 32% faster and still 7 min under the old pre-parallelization single-job runtime of 30 min.

PR #5313 (which introduced parallelization) validated stability at 6 and 10 reps with CPU isolation, showing 0/46 unstable metrics. No data exists at 4 reps, but CPU isolation independently reduces variance and is unchanged here. This PR's own benchmark report serves as the stability validation; if unstable metrics appear, we can adjust before merging. Reducing further to 3 (~18 min) is worth considering if the report looks clean.

Shorter runs also reduce the window during which a job can be cancelled by a new push on active PR branches.
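As a sanity check on the numbers above, the cost model can be evaluated directly (the ~1.5 min fixed cost and ~5.33 min-per-repetition slope are the figures quoted above):

```ruby
# Estimated [other] job runtime in minutes, per the cost model above:
# fixed overhead (~1.5 min) plus ~5.33 min per repetition.
def estimated_runtime_minutes(repetitions)
  1.5 + 5.33 * repetitions
end

# REPETITIONS=6 comes out to ~33.5 min, 4 to ~22.8 min, 3 to ~17.5 min,
# matching the ~34 / ~23 / ~18 min figures cited in the description.
[6, 4, 3].each do |reps|
  printf("REPETITIONS=%d => ~%.1f min\n", reps, estimated_runtime_minutes(reps))
end
```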
2. Improve microbenchmarks-pr-comment retry behavior (`.gitlab/benchmarks.yml`)

Changed `when: always` → `when: on_success` and added `allow_failure: true`.

With `when: always`, if an upstream benchmark job is cancelled, `pr-comment` still attempts to run with partial or missing artifacts and fails, resulting in two jobs needing attention instead of one. It is not immediately obvious which job to retry first (the upstream benchmark, not the downstream comment job).

With `when: on_success`:

- `pr-comment` is skipped (not failed), making it clear which job needs a retry
- `pr-comment` auto-triggers via `needs:`, so no manual retry is needed (behavior confirmed in pipeline 102673493)
- `allow_failure: true` ensures a comment-posting issue (network blip, bp-runner bug) never blocks the pipeline

3. Per-benchmark completion timestamps (DataDog/benchmarking-platform#249)
Companion PR in benchmarking-platform adds a completion timestamp log line after each parallel benchmark finishes. This will help identify which of the 4 `[other]` benchmarks takes the longest, which is useful context for any future splitting or rebalancing of the group.

Motivation:
Microbenchmarks are currently the longest-running CI job (~34 min) and run on every PR regardless of what changed. On active branches, they can be cancelled mid-run due to `interruptible: true`, which sometimes leads to a multi-step retry process. These two changes reduce runtime and simplify recovery when cancellations do happen.

Change log entry
None.
How to test the change?
The benchmark report from this PR's CI run is the stability validation for REPETITIONS=4. Check the report for "unstable metrics": a no-change comparison should show 0 improvements, 0 regressions, 0 unstable metrics. The `pr-comment` behavior change can be verified by observing that a cancelled `[other]` job leaves `pr-comment` in the `skipped` state rather than `failed`/`cancelled`.
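For reference, the pr-comment change described in item 2 corresponds to a small fragment of `.gitlab/benchmarks.yml`. This is a sketch only: the stage, script, and `needs:` entries here are illustrative placeholders, not the actual job definition.

```yaml
# Sketch of the changed keys on the comment-posting job (illustrative names).
microbenchmarks-pr-comment:
  stage: benchmarks-report          # placeholder stage name
  needs:
    - job: microbenchmarks          # comment auto-triggers once benchmarks finish
      artifacts: true
  when: on_success                  # was `when: always`: now skipped (not failed) if an upstream job is cancelled
  allow_failure: true               # a comment-posting failure never blocks the pipeline
  script:
    - post-pr-comment               # placeholder for the actual bp-runner invocation
```

With this shape, a cancelled upstream job leaves the comment job in `skipped`, and it runs automatically once the upstream benchmark is retried and succeeds.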