
Fix flaky CpuAndWallTimeWorker sampling test on macOS #5482

Draft
p-datadog wants to merge 1 commit into master from investigate/flaky-cpu-wall-time-worker-spec

Conversation

@p-datadog
Member

What does this PR do?

Fixes a flaky profiling test by increasing the sampling window from 100ms to 200ms.

Motivation:

The test CpuAndWallTimeWorker#start when main thread is sleeping but a background thread is working failed on Test (macos-15, 3.0) in PR #5481 with sample_count: 4 (threshold: 5). The profiler managed only 4 trigger_sample_attempts in the 100ms window, due to startup overhead on the macOS ARM64 + Ruby 3.0 runners.

How I Reproduced the Issue

The CI failure on PR #5481 shows the test got exactly 4 samples in 100ms instead of the required 5. The stats (trigger_sample_attempts=>4) confirm the profiler did not even attempt enough samples: the failure is not a signal-delivery problem but an insufficient sampling window.

Root Cause

The test sleeps for only 100ms and expects ≥5 samples at a target rate of 100 samples/sec. On macOS-15 ARM64 CI runners with Ruby 3.0, profiler startup overhead and thread scheduling variability reduce the effective sampling window below what's needed for 5 samples. The margin between expected (10 samples) and threshold (5) is too thin for this environment.

Fix

Increase sleep from 0.1s to 0.2s, matching the duration used by similar tests in the same file (lines 474 and 605). At 100 samples/sec, 200ms gives ~20 expected samples — well above the threshold of 5.
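The arithmetic behind the fix can be sketched as follows (a minimal illustration; the constant and helper names are made up for this sketch and are not the actual spec code):

```ruby
# Illustrative sketch of the sampling-window math behind the fix.
# SAMPLES_PER_SECOND and expected_samples are hypothetical names.
SAMPLES_PER_SECOND = 100     # profiler's target sampling rate
MINIMUM_EXPECTED_SAMPLES = 5 # threshold the test asserts on

# Samples the profiler should collect during a sleep of the given length,
# assuming it runs at the target rate for the whole window
def expected_samples(sleep_duration_seconds)
  (sleep_duration_seconds * SAMPLES_PER_SECOND).round
end

expected_samples(0.1) # => 10, only 2x the threshold of 5
expected_samples(0.2) # => 20, 4x the threshold of 5
```

With the old 100ms window, losing a bit more than half the window to startup overhead is enough to dip below the threshold; the 200ms window tolerates losing three quarters of it.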

Change log entry

None.

How to test the change?

CI should pass on Test (macos-15, 3.0) which was previously failing.

Root cause: The test sleeps for only 100ms and expects ≥5 samples at
100 samples/sec. On macOS-15 ARM64 + Ruby 3.0 CI runners, profiler
startup overhead and thread scheduling variability reduce the effective
sampling window, resulting in only 4 samples (just below the threshold).

Increase sleep from 0.1s to 0.2s to provide more margin for sample
collection, matching the duration used by similar tests in the same file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@p-datadog added the AI Generated label (Largely based on code generated by an AI or LLM; this label is the same across all dd-trace-* repos) on Mar 19, 2026
@github-actions bot added the dev/testing label (Involves testing processes, e.g. RSpec) on Mar 19, 2026

datadog-official bot commented Mar 19, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 95.13% (+0.00%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: c01965c


pr-commenter bot commented Mar 19, 2026

Benchmarks

Benchmark execution time: 2026-03-19 20:46:29

Comparing candidate commit c01965c in PR branch investigate/flaky-cpu-wall-time-worker-spec with baseline commit e87e284 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 46 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'
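The relative-difference-of-means CI described above could be computed roughly as follows. This is a simplified normal-approximation sketch assuming independent samples; the function name is hypothetical and the benchmarking platform's actual statistics may differ:

```ruby
# Hypothetical sketch: confidence interval over the relative difference
# of means, with the baseline as the reference. Not the platform's code.
def relative_diff_ci(candidate, baseline, z: 1.96)
  mean = ->(xs) { xs.sum(0.0) / xs.size }
  var  = ->(xs) { m = mean.call(xs); xs.sum(0.0) { |x| (x - m)**2 } / (xs.size - 1) }

  baseline_mean = mean.call(baseline)
  # Standard error of the difference of means (independent samples)
  se = Math.sqrt(var.call(candidate) / candidate.size +
                 var.call(baseline) / baseline.size)
  center = (mean.call(candidate) - baseline_mean) / baseline_mean
  half   = z * se / baseline_mean
  [center - half, center + half] # [lower bound, upper bound]
end
```

With identical measurements in each group the interval collapses to a point at the relative difference of the means; real benchmark runs have variance, which widens the interval around that center.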

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
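The classification rule the two diagrams above illustrate might be encoded as follows (an illustrative sketch with made-up names, for a "lower is better" metric such as execution time; not the benchmarking platform's actual implementation):

```ruby
# Hypothetical encoding of the rule above: a change is significant only
# when the entire CI lies outside the [-threshold, +threshold] band.
def classify(ci_lower, ci_upper, threshold)
  if ci_lower > threshold
    :significantly_worse    # whole CI above +threshold (🟥)
  elsif ci_upper < -threshold
    :significantly_better   # whole CI below -threshold (🟩)
  else
    :no_significant_change  # CI overlaps the threshold band
  end
end

# With a 1% threshold, a CI of [1.3%, 3.1%] (as in the second diagram)
# is significantly worse, while [-0.6%, +1.2%] (as in the first) is not.
classify(0.013, 0.031, 0.01)  # => :significantly_worse
classify(-0.006, 0.012, 0.01) # => :no_significant_change
```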
