Implement stable session identifier headers for telemetry#295
Implement stable session identifier headers for telemetry#295khanayan123 wants to merge 19 commits intomainfrom
Conversation
BenchmarksBenchmark execution time: 2026-03-26 19:14:04 Comparing candidate commit f124ca3 in PR branch Found 0 performance improvements and 1 performance regressions! Performance is the same for 0 metrics, 0 unstable metrics.
|
|
🎯 Code Coverage (details) 🔗 Commit SHA: b8a48ff | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback! |
a3b7684 to
0e65a27
Compare
Adds DD-Session-ID and DD-Root-Session-ID HTTP headers to all telemetry requests per the Stable Service Instance Identifier RFC. - DD-Session-ID is always set to the tracer's runtime_id - DD-Root-Session-ID is only set when it differs from the session ID (i.e. when running as a child process) - Root session ID is propagated to exec'd children via _DD_ROOT_CPP_SESSION_ID env var, read in finalize_config() - _DD_ROOT_CPP_SESSION_ID registered in the environment variable registry and supported-configurations.json Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
85d241a to
e40dd5d
Compare
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Zach Montoya <zach.montoya@datadoghq.com>
src/datadog/tracer_config.cpp
Outdated
| final_config.runtime_id = user_config.runtime_id; | ||
| } | ||
|
|
||
| if (auto val = lookup(environment::_DD_ROOT_CPP_SESSION_ID)) { |
There was a problem hiding this comment.
Please move the lookup(environment::_DD_ROOT_CPP_SESSION_ID) call into the load_tracer_env_config function above, which will populate the env_config variable that is constantly referenced in this function
There was a problem hiding this comment.
Done moved the lookup call into load_tracer_env_config so it populates env_cfg alongside the other env lookups.
- Change _DD_ROOT_CPP_SESSION_ID default from nullptr to "" in environment.h - Move _DD_ROOT_CPP_SESSION_ID lookup into load_tracer_env_config() so it follows the same pattern as other env var loading - Add root_session_id field to TracerConfig struct - Update supported-configurations.json to implementation B with default "" - No changes to config-inversion tool Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Since _DD_ROOT_CPP_SESSION_ID now defaults to "" (not nullptr), the config-inversion tool doesn't need the nullptr specialization. Revert to original to_string_any and fix implementation marker to "A". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
src/datadog/environment.cpp
Outdated
| return StringView{value}; | ||
| } | ||
|
|
||
| void set(Variable variable, StringView value) { |
There was a problem hiding this comment.
@xlamorlette-datadog this feature requires that we set an environment variable, so that child processes will receive it. However, this means now our tracer is both reading (std::getenv) and writing (setenv) environment variables. Will we need to add a lock to both operations to ensure only one thread is getting/setting an environment variable?
There was a problem hiding this comment.
For example, in our NGINX module, there are multiple worker threads handling requests. Each thread instantiates its own thread-local tracer instance to trace requests concurrently
There was a problem hiding this comment.
I think we can indeed have a race condition, on Windows, between the getenv() and putenv() calls.
Actually, I'm a bit confused: What is the purpose of the 'root session id'? For example, for Nginx, is it to identify an Nginx instance?
I need time to study the RFC.
Also, I think we should check the return values of putenv() and setenv().
…ndows setenv - Revert whitespace-only changes (const char * style) in environment.cpp and environment.h to keep the diff focused on the feature. - Extract set_session_headers() helper in telemetry_impl.cpp to avoid duplicating session header logic in app_started() and send_payload(). - Fix Windows _putenv_s path to check getenv() first, matching the POSIX setenv(..., 0) "don't overwrite" semantics. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
src/datadog/tracer.cpp
Outdated
| baggage_extraction_enabled_(false), | ||
| tracing_enabled_(config.tracing_enabled), | ||
| resource_renaming_mode_(config.resource_renaming_mode) { | ||
| environment::set(environment::_DD_ROOT_CPP_SESSION_ID, |
There was a problem hiding this comment.
FYI there will be an edge case here. Since the C++ tracer doesn't have a singleton tracer instance, we may be creating one tracer instance per worker thread (e.g. web server) and if we do not assign a runtime ID at tracer construction time then each worker thread will get a unique runtime ID and (race condition) a unique root session ID. Other times we may get the same root session ID since the other thread could read the environment variable set by this line of code
There was a problem hiding this comment.
Good call, the race exists if multiple Tracers are constructed concurrently before any of them sets the env var.
Would you prefer we fix this with a std::call_once approach? Something like a get_or_init_root_session_id(fallback) function that atomically reads-or-sets the env var on first call, so all Tracers in the process get the same root session ID. That would also let us remove the config plumbing since the Tracer would resolve it directly.
Co-authored-by: Zach Montoya <zach.montoya@datadoghq.com>
zacharycmontoya
left a comment
There was a problem hiding this comment.
Overall this change looks good. Before approving I'd like to get an opinion from another C++ dev on the thread-safety of the environment variable accesses. Once that's resolved, I think this PR is good to go
The suggestion to use client->clear() instead of client->request_headers.items.clear() exposed that clear() only reset request_body. Update it to also clear request_headers so the heartbeat session-header test validates fresh headers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dmehala I was wondering if you had any thoughts? |
xlamorlette-datadog
left a comment
There was a problem hiding this comment.
Review on-going…
include/datadog/environment.h
Outdated
|
|
||
| // Set the specified environment `variable` to `value`. Does not overwrite if | ||
| // already set (equivalent to setenv(..., 0) on POSIX). | ||
| void set(Variable variable, StringView value); |
There was a problem hiding this comment.
I think we could rename this function set_if_not_set().
xlamorlette-datadog
left a comment
There was a problem hiding this comment.
Review still on-going…
- Rename environment::set to set_if_not_set for clarity - Add config registry link and comment for _DD_ROOT_CPP_SESSION_ID - Add backward-compatible 3-param TracerSignature constructor Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…structor Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|
||
| void set_session_headers(DictWriter& headers, | ||
| const tracing::TracerSignature& sig) { | ||
| headers.set("DD-Session-ID", sig.runtime_id.string()); |
There was a problem hiding this comment.
nit, opt: store sig.runtime_id.string() in a variable to make intent clearer:
const std::string& session_id = sig.runtime_id.string();
headers.set("DD-Session-ID", session_id);
if (sig.root_session_id != session_id) {
headers.set("DD-Root-Session-ID", sig.root_session_id);
}
| @@ -302,13 +310,15 @@ void Telemetry::app_started() { | |||
| auto payload = app_started_payload(); | |||
|
|
|||
| auto on_headers = [payload_size = payload.size(), | |||
There was a problem hiding this comment.
The code lines 312-356 is very similar to the one lines 368-409.
The on_response handlers differ, and more counters are incremented in send_payload(), but I think that all what is in send_payload() is a more complete version of what is in app_started().
So I think that app_started() could be hugely simplified by simply calling send_payload() as follows:
void Telemetry::app_started() {
send_payload("app-started", app_started_payload());
}
src/datadog/environment.cpp
Outdated
| return StringView{value}; | ||
| } | ||
|
|
||
| void set(Variable variable, StringView value) { |
There was a problem hiding this comment.
I think we can indeed have a race condition, on Windows, between the getenv() and putenv() calls.
Actually, I'm a bit confused: What is the purpose of the 'root session id'? For example, for Nginx, is it to identify an Nginx instance?
I need time to study the RFC.
Also, I think we should check the return values of putenv() and setenv().
test/telemetry/test_telemetry.cpp
Outdated
| auto url = HTTPClient::URL::parse("http://localhost:8000"); | ||
|
|
||
| SECTION("root process: DD-Session-ID present, DD-Root-Session-ID absent") { | ||
| auto rid = RuntimeID::generate(); |
There was a problem hiding this comment.
nit: I suggest renaming rid in session_rid (or even session_runtime_id), to distinguish it clearly from the root one (same below).
include/datadog/tracer_signature.h
Outdated
|
|
||
| TracerSignature() = delete; | ||
| TracerSignature(RuntimeID id, std::string service, std::string environment) | ||
| TracerSignature(RuntimeID id, std::string root_session, std::string service, |
There was a problem hiding this comment.
nit: use the same names for the parameters than for the members (id → runtime_id, root_session → root_session_id, service → default_service, environment → default_environment).
test/test_datadog_agent.cpp
Outdated
|
|
||
| const TracerSignature signature(RuntimeID::generate(), "testsvc", "test"); | ||
| auto rid = RuntimeID::generate(); | ||
| const TracerSignature signature(rid, rid.string(), "testsvc", "test"); |
There was a problem hiding this comment.
nit: With the new three parameters constructor, you can go back to the previous version.
test/telemetry/test_telemetry.cpp
Outdated
|
|
||
| SECTION("root process: DD-Session-ID present, DD-Root-Session-ID absent") { | ||
| auto rid = RuntimeID::generate(); | ||
| const TracerSignature sig{rid, rid.string(), "testsvc", "test"}; |
There was a problem hiding this comment.
nit: You can use the three parameters constructor. Same below.
| auto runtime_id = RuntimeID::generate(); | ||
| const TracerSignature tracer_signature{ | ||
| /* runtime_id = */ RuntimeID::generate(), | ||
| /* runtime_id = */ runtime_id, |
There was a problem hiding this comment.
nit: With the new three parameters constructor, you can go back to the previous version. Same below.
- Rename constructor params to match member names - Rename rid to session_rid in session header tests - Use 3-param TracerSignature constructor where root_session == runtime_id Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Implements the Stable Service Instance Identifier RFC for the C++ SDK.
DD-Session-ID(=runtime_id) and conditionalDD-Root-Session-IDheaders to all telemetry requests (app-started, heartbeats, app-closing, etc.)DD-Root-Session-IDis only sent when it differs fromDD-Session-ID(i.e., in child processes)_DD_ROOT_CPP_SESSION_IDvia the config registry (environment::lookup) or defaults to the currentruntime_idenvironment::set(_DD_ROOT_CPP_SESSION_ID, ...)is called at tracer init so exec'd children inherit it automatically (no-op if already set)Integration testing note
System-tests for
cpp_nginxandcpp_httpdcannot validate session headers until the downstream modules (nginx-datadog, httpd-datadog) release a version that includes these dd-trace-cpp changes. The system-tests manifest entries remainmissing_featureuntil then. Unit tests in this PR should validate the implementation directly.Companion system-tests PR: DataDog/system-tests#6510