-
Notifications
You must be signed in to change notification settings - Fork 172
trace2: add macOS and Windows process ancestry tracing #2040
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
In 353d3d7 (trace2: collect Windows-specific process information) Windows-specific process ancestry information was added as a data_json event to TRACE2. Furthermore in 2f732bf (tr2: log parent process name) similar functionality was added for Linux-based systems, using procfs. Teach Git to also log process ancestry on macOS using the sysctl with KERN_PROC to get process information (PPID and process name). Like the Linux implementation, we use the cmd_ancestry TRACE2 event rather than using a data_json event and creating another custom data point. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Include an implementation of trace2_collect_process_info for macOS. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
In 353d3d7 (trace2: collect Windows-specific process information) we added process ancestry information for Windows to TRACE2 via a data_json event. It was only later in 2f732bf (tr2: log parent process name) that the specific cmd_ancestry event was added to TRACE2. In a future commit we will emit the ancestry information with the newer cmd_ancestry TRACE2 event. Right now, we rework this implementation of trace2_collect_process_info to separate the calculation of ancestors from building and emiting the JSON array via a data_json event. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Since 2f732bf (tr2: log parent process name) it is now possible to emit a specific process ancestry event in TRACE2. We should emit the Windows process ancestry data with the correct event type. To not break existing consumers of the data_json "windows/ancestry" event, we continue to emit the ancestry data as a JSON event. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
|
/preview |
|
Preview email sent as pull.2040.git.1770121400.gitgitgadget@gmail.com |
|
@mjcheetham don't forget to |
|
/preview |
|
Preview email sent as pull.2040.git.1770202075.gitgitgadget@gmail.com |
dscho
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! I cannot speak for the correctness of the macOS-specific code, of course, but then, I don't think that any reviewer on the Git mailing list will go into that much depth (it's not Linux, after all...).
|
/submit |
|
Submitted as pull.2040.git.1770307510.gitgitgadget@gmail.com To fetch this version into To fetch this version to local tag |
|
|
||
| switch (reason) { | ||
| case TRACE2_PROCESS_INFO_STARTUP: | ||
| get_is_being_debugged(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Kristoffer Haugsbakk" wrote on the Git mailing list (how to reply to this email):
On Thu, Feb 5, 2026, at 17:05, Matthew John Cheetham via GitGitGadget wrote:
> From: Matthew John Cheetham <mjcheetham@outlook.com>
>
> Since 2f732bf1 (tr2: log parent process name) it is now possible to emit
The usual way to refer to commits is to use `git show -s
--pretty=reference`. (Or maybe with `--abbrev=8` as well
which seems to be the case here.) That also adds the date.
See `SubmittingPatches`.
> a specific process ancestry event in TRACE2. We should emit the Windows
> process ancestry data with the correct event type.
>
> To not break existing consumers of the data_json "windows/ancestry"
> event, we continue to emit the ancestry data as a JSON event.
>
> Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
> ---
>[snip]|
User |
|
This patch series was integrated into seen via git@2e0d24d. |
|
This branch is now known as |
|
This patch series was integrated into seen via git@b5a6e6b. |
|
There was a status update in the "New Topics" section about the branch Add process ancestry data to trace2 on macOS to match what we already do on Linux and Windows. Also adjust the way Windows implementation reports this information to match the other two. Needs review. source: <pull.2040.git.1770307510.gitgitgadget@gmail.com> |
|
This patch series was integrated into seen via git@bbbc95e. |
|
This patch series was integrated into seen via git@51bc36f. |
|
There was a status update in the "Cooking" section about the branch Add process ancestry data to trace2 on macOS to match what we already do on Linux and Windows. Also adjust the way Windows implementation reports this information to match the other two. Needs review. source: <pull.2040.git.1770307510.gitgitgadget@gmail.com> |
|
This patch series was integrated into seen via git@f36c2d1. |
|
This patch series was integrated into seen via git@709e782. |
| @@ -0,0 +1,99 @@ | |||
| #define USE_THE_REPOSITORY_VARIABLE | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Derrick Stolee wrote on the Git mailing list (how to reply to this email):
On 2/5/2026 11:05 AM, Matthew John Cheetham via GitGitGadget wrote:
> Teach Git to also log process ancestry on macOS using the sysctl with
> KERN_PROC to get process information (PPID and process name).
> Like the Linux implementation, we use the cmd_ancestry TRACE2 event
> rather than using a data_json event and creating another custom data
> point.
> +#define USE_THE_REPOSITORY_VARIABLE
If we are creating a new file, then it would be best if we avoid this
macro, which is intended for older code to still work until it can be
fixed.
But also it seems that you don't use the_repository anywhere, so this
can be deleted without consequence!
> +/*
> + * Recursively push process names onto the ancestry array.
> + * We guard against cycles by limiting the depth to NR_PIDS_LIMIT.
> + */
> +static void push_ancestry_name(struct strvec *names, pid_t pid, int depth)
> +{
> + struct strbuf name = STRBUF_INIT;
> + pid_t ppid;
> +
> + if (depth >= NR_PIDS_LIMIT)
> + return;
Here is the recursion limit check.
> + if (pid <= 0)
> + return;
> +
> + if (get_proc_info(pid, &name, &ppid) < 0)
> + goto cleanup;
> +
> + strvec_push(names, name.buf);
This is copying the buffer, which is why you release it later.
Question: could we stop copying here and use strbuf_detach() at this
point? That would be a very minor improvement, so feel free to ignore!
I took a look and rediscovered that strvecs do not have an option to not
copy. I'm thinking about string_list. I'm not sure if there is any value
in converting your code just to avoid some string duplication at this
scale.
> + /*
> + * Recurse to the parent process. Stop if ppid is 0 or 1
> + * (init/launchd) or if we've reached ourselves (cycle).
> + */
> + if (ppid > 1 && ppid != pid)
> + push_ancestry_name(names, ppid, depth + 1);
This kind of tail recursion could be easily converted into a loop. I
usually prefer loops to recursion when possible, in case we want to allow
an unlimited number of parents in the future.
> +cleanup:
> + strbuf_release(&name);
> +}
I got a little confused by the lack of a .h file, but that's probably due
to the extra magic being done at compile time to pick this file on a per-
platform basis.
Indeed, trace2_collect_process_info() is defined in trace2.h.
Thanks,
-StoleeThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Matthew John Cheetham wrote on the Git mailing list (how to reply to this email):
On 09/02/2026 14:36, Derrick Stolee wrote:
> On 2/5/2026 11:05 AM, Matthew John Cheetham via GitGitGadget wrote:
>> Teach Git to also log process ancestry on macOS using the sysctl with
>> KERN_PROC to get process information (PPID and process name).
>> Like the Linux implementation, we use the cmd_ancestry TRACE2 event
>> rather than using a data_json event and creating another custom data
>> point.
> >> +#define USE_THE_REPOSITORY_VARIABLE
> > If we are creating a new file, then it would be best if we avoid this
> macro, which is intended for older code to still work until it can be
> fixed.
> > But also it seems that you don't use the_repository anywhere, so this
> can be deleted without consequence!
You are correct - I neglected to remove this macro when preparing to
submit the series. I can remove on the next iteration.
>> +/*
>> + * Recursively push process names onto the ancestry array.
>> + * We guard against cycles by limiting the depth to NR_PIDS_LIMIT.
>> + */
>> +static void push_ancestry_name(struct strvec *names, pid_t pid, int depth)
>> +{
>> + struct strbuf name = STRBUF_INIT;
>> + pid_t ppid;
>> +
>> + if (depth >= NR_PIDS_LIMIT)
>> + return;
> > Here is the recursion limit check.
> >> + if (pid <= 0)
>> + return;
>> +
>> + if (get_proc_info(pid, &name, &ppid) < 0)
>> + goto cleanup;
>> +
>> + strvec_push(names, name.buf);
> > This is copying the buffer, which is why you release it later.
> > Question: could we stop copying here and use strbuf_detach() at this
> point? That would be a very minor improvement, so feel free to ignore!
> > I took a look and rediscovered that strvecs do not have an option to not
> copy. I'm thinking about string_list. I'm not sure if there is any value
> in converting your code just to avoid some string duplication at this
> scale.
> >> + /*
>> + * Recurse to the parent process. Stop if ppid is 0 or 1
>> + * (init/launchd) or if we've reached ourselves (cycle).
>> + */
>> + if (ppid > 1 && ppid != pid)
>> + push_ancestry_name(names, ppid, depth + 1);
> > This kind of tail recursion could be easily converted into a loop. I
> usually prefer loops to recursion when possible, in case we want to allow
> an unlimited number of parents in the future.
I had based this on the compat/linux/procinfo.c implementation which
also uses recursion to walk the parent processes (and also defines an
upper limit to the number of processes to walk).
If I were to transform this to a loop, would we not also be wanting to
update linux/procinfo.c too?
>> +cleanup:
>> + strbuf_release(&name);
>> +}
> > I got a little confused by the lack of a .h file, but that's probably due
> to the extra magic being done at compile time to pick this file on a per-
> platform basis.
> > Indeed, trace2_collect_process_info() is defined in trace2.h.
> > Thanks,
> -Stolee
Thanks,
Matthew| @@ -148,6 +148,8 @@ ifeq ($(uname_S),Darwin) | |||
| HAVE_NS_GET_EXECUTABLE_PATH = YesPlease | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Derrick Stolee wrote on the Git mailing list (how to reply to this email):
On 2/5/2026 11:05 AM, Matthew John Cheetham via GitGitGadget wrote:
> config.mak.uname | 2 ++
> contrib/buildsystems/CMakeLists.txt | 2 ++
> meson.build | 2 ++
So many build systems!
Each logic you included seems to be correct and matches the patterns from
the Linux case.
Thanks,
-Stolee| @@ -3,6 +3,7 @@ | |||
| #include "../../git-compat-util.h" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Derrick Stolee wrote on the Git mailing list (how to reply to this email):
On 2/5/2026 11:05 AM, Matthew John Cheetham via GitGitGadget wrote:
> diff --git a/compat/win32/trace2_win32_process_info.c b/compat/win32/trace2_win32_process_info.c
...
> #include "../../json-writer.h"
Are we able to delete this after your change?
> pid = GetCurrentProcessId();
> while (find_pid(pid, hSnapshot, &pe32)) {
> - /* Only report parents. Omit self from the JSON output. */
> + /* Only report parents. Omit self from the output. */
> if (nr_pids)
> - jw_array_string(jw, pe32.szExeFile);
> + strvec_push(names, pe32.szExeFile);
>
> /* Check for cycle in snapshot. (Yes, it happened.) */
> for (k = 0; k < nr_pids; k++)
> if (pid == pid_list[k]) {
> - jw_array_string(jw, "(cycle)");
> + strvec_push(names, "(cycle)");
> return;
> }
>
> if (nr_pids == NR_PIDS_LIMIT) {
> - jw_array_string(jw, "(truncated)");
> + strvec_push(names, "(truncated)");
> return;
> }
Nice replacement of JSON with strvec logic.
> @@ -105,24 +101,14 @@ static void get_processes(struct json_writer *jw, HANDLE hSnapshot)
> }
>
> /*
> - * Emit JSON data for the current and parent processes. Individual
> - * trace2 targets can decide how to actually print it.
> + * Collect the list of parent process names.
> */
> -static void get_ancestry(void)
> +static void get_ancestry(struct strvec *names)
> {
> HANDLE hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
>
> if (hSnapshot != INVALID_HANDLE_VALUE) {
> - struct json_writer jw = JSON_WRITER_INIT;
> -
> - jw_array_begin(&jw, 0);
> - get_processes(&jw, hSnapshot);
> - jw_end(&jw);
> -
> - trace2_data_json("process", the_repository, "windows/ancestry",
> - &jw);
> -
> - jw_release(&jw);
> + get_processes(names, hSnapshot);
> CloseHandle(hSnapshot);
Nice simplification!
> void trace2_collect_process_info(enum trace2_process_info_reason reason)
> {
> + struct strvec names = STRVEC_INIT;
> +
> if (!trace2_is_enabled())
> return;
>
> switch (reason) {
> case TRACE2_PROCESS_INFO_STARTUP:
> get_is_being_debugged();
> - get_ancestry();
> + get_ancestry(&names);
> + if (names.nr) {
> + struct json_writer jw = JSON_WRITER_INIT;
> + jw_array_begin(&jw, 0);
> + for (size_t i = 0; i < names.nr; i++)
> + jw_array_string(&jw, names.v[i]);
> + jw_end(&jw);
> + trace2_data_json("process", the_repository,
> + "windows/ancestry", &jw);
> + jw_release(&jw);
Ah, you still have JSON logic at this point.
I see that in your next patch you export the names vector itself _and_
this older JSON version. We should consider a future where we drop this
JSON altogether, but it's nice to have both for a few versions so tool
makers have time to respond.
Thanks,
-Stolee|
|
||
| switch (reason) { | ||
| case TRACE2_PROCESS_INFO_STARTUP: | ||
| get_is_being_debugged(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Derrick Stolee wrote on the Git mailing list (how to reply to this email):
On 2/5/2026 11:05 AM, Matthew John Cheetham via GitGitGadget wrote:
> To not break existing consumers of the data_json "windows/ancestry"
> event, we continue to emit the ancestry data as a JSON event.
This is the important compatibility statement. When this merges, we should
make a claim that in two major versions we will drop this compatibility.
Thanks,
-Stolee|
Derrick Stolee wrote on the Git mailing list (how to reply to this email): On 2/5/2026 11:05 AM, Matthew John Cheetham via GitGitGadget wrote:
> In 353d3d77 (trace2: collect Windows-specific process information)
> Windows-specific process ancestry information was added as a data_json event
> to TRACE2. Furthermore in 2f732bf1 (tr2: log parent process name) similar
> functionality was added for Linux-based systems, using procfs.
>
> Let's teach Git on macOS to also gather process ancestry information, and
> emit it as a cmd_ancestry TRACE2 event.
This is done in patches 1-2. I notice that there are no tests validating
that this works.
I see that in t/t0210-trace2-normal.sh there is a scrub_normal() helper
that removes the cmd_ancestry event due to compatibility reasons.
I'd be interested to see if we could enable these tests to demonstrate
your changes here. Of course, Windows, Linux, and macOS are not the _only_
platforms we support. They are the only ones we check in CI, though.
> Furthermore, let's refactor the Windows implementation to align with the
> Linux and macOS versions - by emitting the ancestry information as a
> cmd_ancestry event. We keep the older, custom data_json event type on
> Windows for compatibility for consumers of the TRACE2 data that use the
> older event.
I appreciate this compatibility approach. I mention in my patch-by-patch
review that we should eventually drop the old mechanism, say in two major
versions.
The code looks good to me, just a question about the testing and some very
minor nitpicks around recursion and strvecs.
Thanks,
-Stolee |
|
User |
|
Junio C Hamano wrote on the Git mailing list (how to reply to this email): Derrick Stolee <stolee@gmail.com> writes:
>> Furthermore, let's refactor the Windows implementation to align with the
>> Linux and macOS versions - by emitting the ancestry information as a
>> cmd_ancestry event. We keep the older, custom data_json event type on
>> Windows for compatibility for consumers of the TRACE2 data that use the
>> older event.
>
> I appreciate this compatibility approach. I mention in my patch-by-patch
> review that we should eventually drop the old mechanism, say in two major
> versions.
>
> The code looks good to me, just a question about the testing and some very
> minor nitpicks around recursion and strvecs.
Thanks for a prompt review (and thanks for the series author for
writing the patches, of course).
Will mark as "expecting a hopefully minor and final reroll". |
|
This patch series was integrated into seen via git@00926fd. |
In 353d3d7 (trace2: collect Windows-specific process information)
Windows-specific process ancestry information was added as a data_json
event to TRACE2. Furthermore in 2f732bf (tr2: log parent process name)
similar functionality was added for Linux-based systems, using procfs.
Let's teach Git on macOS to also gather process ancestry information,
and emit it as a cmd_ancestry TRACE2 event.
Furthermore, let's refactor the Windows implementation to align with
the Linux and macOS versions - by emitting the ancestry information as
a cmd_ancestry event. We keep the older, custom data_json event type
on Windows for compatibility for consumers of the TRACE2 data that use
the older event.
Thanks,
Matthew
cc: gitster@pobox.com
cc: stolee@gmail.com
cc: johannes.schindelin@gmx.de
cc: "Kristoffer Haugsbakk" kristofferhaugsbakk@fastmail.com
cc: Matthew John Cheetham mjcheetham@outlook.com