Skip to content

Implement stable session identifier headers for telemetry#295

Open
khanayan123 wants to merge 19 commits intomainfrom
ayan.khan/stable-session-id-headers
Open

Implement stable session identifier headers for telemetry#295
khanayan123 wants to merge 19 commits intomainfrom
ayan.khan/stable-session-id-headers

Conversation

@khanayan123
Copy link
Copy Markdown
Contributor

@khanayan123 khanayan123 commented Mar 20, 2026

Summary

Implements the Stable Service Instance Identifier RFC for the C++ SDK.

  • Adds DD-Session-ID (= runtime_id) and conditional DD-Root-Session-ID headers to all telemetry requests (app-started, heartbeats, app-closing, etc.)
  • DD-Root-Session-ID is only sent when it differs from DD-Session-ID (i.e., in child processes)
  • Root session ID is read from _DD_ROOT_CPP_SESSION_ID via the config registry (environment::lookup) or defaults to the current runtime_id
  • environment::set(_DD_ROOT_CPP_SESSION_ID, ...) is called at tracer init so exec'd children inherit it automatically (no-op if already set)
  • Fork propagation is automatic — memory survives fork, so the root session ID persists in child processes

Integration testing note

System-tests for cpp_nginx and cpp_httpd cannot validate session headers until the downstream modules (nginx-datadog, httpd-datadog) release a version that includes these dd-trace-cpp changes. The system-tests manifest entries remain missing_feature until then. Unit tests in this PR should validate the implementation directly.

Companion system-tests PR: DataDog/system-tests#6510

@pr-commenter
Copy link
Copy Markdown

pr-commenter bot commented Mar 20, 2026

Benchmarks

Benchmark execution time: 2026-03-26 19:14:04

Comparing candidate commit f124ca3 in PR branch ayan.khan/stable-session-id-headers with baseline commit 910e3d5 in branch main.

Found 0 performance improvements and 1 performance regressions! Performance is the same for 0 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

scenario:BM_TraceTinyCCSource

  • 🟥 execution_time [+3.745ms; +3.941ms] or [+4.997%; +5.259%]

@datadog-prod-us1-6
Copy link
Copy Markdown

datadog-prod-us1-6 bot commented Mar 23, 2026

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 90.93% (+0.04%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: b8a48ff | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback!

Adds DD-Session-ID and DD-Root-Session-ID HTTP headers to all telemetry
requests per the Stable Service Instance Identifier RFC.

- DD-Session-ID is always set to the tracer's runtime_id
- DD-Root-Session-ID is only set when it differs from the session ID
  (i.e. when running as a child process)
- Root session ID is propagated to exec'd children via
  _DD_ROOT_CPP_SESSION_ID env var, read in finalize_config()
- _DD_ROOT_CPP_SESSION_ID registered in the environment variable
  registry and supported-configurations.json

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@khanayan123 khanayan123 force-pushed the ayan.khan/stable-session-id-headers branch from 85d241a to e40dd5d Compare March 25, 2026 14:35
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@khanayan123 khanayan123 marked this pull request as ready for review March 26, 2026 19:12
@khanayan123 khanayan123 requested review from a team as code owners March 26, 2026 19:12
@khanayan123 khanayan123 requested review from dmehala and removed request for a team March 26, 2026 19:12
Co-authored-by: Zach Montoya <zach.montoya@datadoghq.com>
final_config.runtime_id = user_config.runtime_id;
}

if (auto val = lookup(environment::_DD_ROOT_CPP_SESSION_ID)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the lookup(environment::_DD_ROOT_CPP_SESSION_ID) call into the load_tracer_env_config function above, which will populate the env_config variable that is constantly referenced in this function

Copy link
Copy Markdown
Contributor Author

@khanayan123 khanayan123 Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done moved the lookup call into load_tracer_env_config so it populates env_cfg alongside the other env lookups.

khanayan123 and others added 2 commits March 31, 2026 14:51
- Change _DD_ROOT_CPP_SESSION_ID default from nullptr to "" in
  environment.h
- Move _DD_ROOT_CPP_SESSION_ID lookup into load_tracer_env_config()
  so it follows the same pattern as other env var loading
- Add root_session_id field to TracerConfig struct
- Update supported-configurations.json to implementation B with
  default ""
- No changes to config-inversion tool

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Since _DD_ROOT_CPP_SESSION_ID now defaults to "" (not nullptr), the
config-inversion tool doesn't need the nullptr specialization. Revert
to original to_string_any and fix implementation marker to "A".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
return StringView{value};
}

void set(Variable variable, StringView value) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xlamorlette-datadog this feature requires that we set an environment variable, so that child processes will receive it. However, this means now our tracer is both reading (std::getenv) and writing (setenv) environment variables. Will we need to add a lock to both operations to ensure only one thread is getting/setting an environment variable?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, in our NGINX module, there are multiple worker threads handling requests. Each thread instantiates its own thread-local tracer instance to trace requests concurrently

Copy link
Copy Markdown
Collaborator

@xlamorlette-datadog xlamorlette-datadog Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can indeed have a race condition, on Windows, between the getenv() and putenv() calls.

Actually, I'm a bit confused: What is the purpose of the 'root session id'? For example, for Nginx, is it to identify an Nginx instance?
I need time to study the RFC.

Also, I think we should check the return values of putenv() and setenv().

…ndows setenv

- Revert whitespace-only changes (const char * style) in environment.cpp
  and environment.h to keep the diff focused on the feature.
- Extract set_session_headers() helper in telemetry_impl.cpp to avoid
  duplicating session header logic in app_started() and send_payload().
- Fix Windows _putenv_s path to check getenv() first, matching the
  POSIX setenv(..., 0) "don't overwrite" semantics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
baggage_extraction_enabled_(false),
tracing_enabled_(config.tracing_enabled),
resource_renaming_mode_(config.resource_renaming_mode) {
environment::set(environment::_DD_ROOT_CPP_SESSION_ID,
Copy link
Copy Markdown
Contributor

@zacharycmontoya zacharycmontoya Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI there will be an edge case here. Since the C++ tracer doesn't have a singleton tracer instance, we may be creating one tracer instance per worker thread (e.g. web server) and if we do not assign a runtime ID at tracer construction time then each worker thread will get a unique runtime ID and (race condition) a unique root session ID. Other times we may get the same root session ID since the other thread could read the environment variable set by this line of code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, the race exists if multiple Tracers are constructed concurrently before any of them sets the env var.

Would you prefer we fix this with a std::call_once approach? Something like a get_or_init_root_session_id(fallback) function that atomically reads-or-sets the env var on first call, so all Tracers in the process get the same root session ID. That would also let us remove the config plumbing since the Tracer would resolve it directly.

Co-authored-by: Zach Montoya <zach.montoya@datadoghq.com>
Copy link
Copy Markdown
Contributor

@zacharycmontoya zacharycmontoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this change looks good. Before approving I'd like to get an opinion from another C++ dev on the thread-safety of the environment variable accesses. Once that's resolved, I think this PR is good to go

The suggestion to use client->clear() instead of
client->request_headers.items.clear() exposed that clear() only
reset request_body. Update it to also clear request_headers so the
heartbeat session-header test validates fresh headers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@khanayan123
Copy link
Copy Markdown
Contributor Author

Overall this change looks good. Before approving I'd like to get an opinion from another C++ dev on the thread-safety of the environment variable accesses. Once that's resolved, I think this PR is good to go

@dmehala I was wondering if you had any thoughts?

Copy link
Copy Markdown
Collaborator

@xlamorlette-datadog xlamorlette-datadog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review on-going…


// Set the specified environment `variable` to `value`. Does not overwrite if
// already set (equivalent to setenv(..., 0) on POSIX).
void set(Variable variable, StringView value);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could rename this function set_if_not_set().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Copy Markdown
Collaborator

@xlamorlette-datadog xlamorlette-datadog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review still on-going…

khanayan123 and others added 6 commits April 2, 2026 10:53
- Rename environment::set to set_if_not_set for clarity
- Add config registry link and comment for _DD_ROOT_CPP_SESSION_ID
- Add backward-compatible 3-param TracerSignature constructor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…structor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

void set_session_headers(DictWriter& headers,
const tracing::TracerSignature& sig) {
headers.set("DD-Session-ID", sig.runtime_id.string());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, opt: store sig.runtime_id.string() in a variable to make intent clearer:

const std::string& session_id = sig.runtime_id.string();
headers.set("DD-Session-ID", session_id);
if (sig.root_session_id != session_id) {
  headers.set("DD-Root-Session-ID", sig.root_session_id);
}

@@ -302,13 +310,15 @@ void Telemetry::app_started() {
auto payload = app_started_payload();

auto on_headers = [payload_size = payload.size(),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code lines 312-356 is very similar to the one lines 368-409.
The on_response handlers differ, and more counters are incremented in send_payload(), but I think that all what is in send_payload() is a more complete version of what is in app_started().
So I think that app_started() could be hugely simplified by simply calling send_payload() as follows:

void Telemetry::app_started() {
  send_payload("app-started", app_started_payload());
}

return StringView{value};
}

void set(Variable variable, StringView value) {
Copy link
Copy Markdown
Collaborator

@xlamorlette-datadog xlamorlette-datadog Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can indeed have a race condition, on Windows, between the getenv() and putenv() calls.

Actually, I'm a bit confused: What is the purpose of the 'root session id'? For example, for Nginx, is it to identify an Nginx instance?
I need time to study the RFC.

Also, I think we should check the return values of putenv() and setenv().

auto url = HTTPClient::URL::parse("http://localhost:8000");

SECTION("root process: DD-Session-ID present, DD-Root-Session-ID absent") {
auto rid = RuntimeID::generate();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I suggest renaming rid in session_rid (or even session_runtime_id), to distinguish it clearly from the root one (same below).


TracerSignature() = delete;
TracerSignature(RuntimeID id, std::string service, std::string environment)
TracerSignature(RuntimeID id, std::string root_session, std::string service,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use the same names for the parameters than for the members (idruntime_id, root_sessionroot_session_id, servicedefault_service, environmentdefault_environment).


const TracerSignature signature(RuntimeID::generate(), "testsvc", "test");
auto rid = RuntimeID::generate();
const TracerSignature signature(rid, rid.string(), "testsvc", "test");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: With the new three parameters constructor, you can go back to the previous version.


SECTION("root process: DD-Session-ID present, DD-Root-Session-ID absent") {
auto rid = RuntimeID::generate();
const TracerSignature sig{rid, rid.string(), "testsvc", "test"};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You can use the three parameters constructor. Same below.

auto runtime_id = RuntimeID::generate();
const TracerSignature tracer_signature{
/* runtime_id = */ RuntimeID::generate(),
/* runtime_id = */ runtime_id,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: With the new three parameters constructor, you can go back to the previous version. Same below.

- Rename constructor params to match member names
- Rename rid to session_rid in session header tests
- Use 3-param TracerSignature constructor where root_session == runtime_id

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants