Skip to content

fix: improve Chrome trace event format correctness and metric naming#3688

Draft
andygrove wants to merge 1 commit intoapache:mainfrom
andygrove:improve-tracing
Draft

fix: improve Chrome trace event format correctness and metric naming#3688
andygrove wants to merge 1 commit intoapache:mainfrom
andygrove:improve-tracing

Conversation

@andygrove
Copy link
Member

Which issue does this PR close?

N/A

Rationale for this change

Improve tracing feature.

What changes are included in this PR?

  • Write memory counter values in bytes instead of MB, with _bytes suffix on the args key, so trace viewers display correct values
  • Include the process ID in the trace filename (comet-event-trace-{pid}.json) so each executor writes its own file instead of corrupting a shared one
  • Use the actual process ID in trace events instead of hardcoded 1
  • Replace .expect() panics with graceful error handling: the writer becomes None on failure so tracing silently disables itself
  • Add Drop impl to flush the BufWriter on shutdown so buffered events are not lost at process exit
  • Replace fragile ThreadId debug-string parsing with ThreadId::as_u64().get() (stable since Rust 1.74)
  • Rename CometUnsafeShuffleWriter trace event to snake_case comet_unsafe_shuffle_writer for consistency
  • Fix "comet_shuffle_" metric name (trailing underscore, no thread ID) in CometBypassMergeSortShuffleWriter to match the pattern used in CometUnsafeShuffleWriter
  • Rename jvm_heapUsed metric to jvm_heap_used (snake_case)
  • Emit shuffle_spilled_bytes counter event after each shuffle spill

How are these changes tested?

- Write memory counter values in bytes instead of MB, with `_bytes`
  suffix on the args key, so trace viewers display correct values
- Include the process ID in the trace filename
  (`comet-event-trace-{pid}.json`) so each executor writes its own
  file instead of corrupting a shared one
- Use the actual process ID in trace events instead of hardcoded `1`
- Replace `.expect()` panics with graceful error handling: the writer
  becomes `None` on failure so tracing silently disables itself
- Add `Drop` impl to flush the `BufWriter` on shutdown so buffered
  events are not lost at process exit
- Replace fragile `ThreadId` debug-string parsing with
  `ThreadId::as_u64().get()` (stable since Rust 1.74)
- Rename `CometUnsafeShuffleWriter` trace event to snake_case
  `comet_unsafe_shuffle_writer` for consistency
- Fix `"comet_shuffle_"` metric name (trailing underscore, no thread
  ID) in `CometBypassMergeSortShuffleWriter` to match the pattern
  used in `CometUnsafeShuffleWriter`
- Rename `jvm_heapUsed` metric to `jvm_heap_used` (snake_case)
- Emit `shuffle_spilled_bytes` counter event after each shuffle spill
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant