From: Ingo Molnar <mingo@kernel.org>
To: Andrii Nakryiko <andrii@kernel.org>
Cc: linux-trace-kernel@vger.kernel.org, peterz@infradead.org,
bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
kernel-team@meta.com, mhocko@kernel.org, rostedt@goodmis.org,
oleg@redhat.com, brauner@kernel.org, glider@google.com,
mhiramat@kernel.org, mathieu.desnoyers@efficios.com,
akpm@linux-foundation.org
Subject: Re: [PATCH v2] exit: move and extend sched_process_exit() tracepoint
Date: Thu, 3 Apr 2025 15:54:22 +0200 [thread overview]
Message-ID: <Z-6TDh1MUT49lvjk@gmail.com> (raw)
In-Reply-To: <20250402180925.90914-1-andrii@kernel.org>
* Andrii Nakryiko <andrii@kernel.org> wrote:
> It is useful to be able to access current->mm at task exit to, say,
> record a bunch of VMA information right before the task exits (e.g., for
> stack symbolization reasons when dealing with short-lived processes that
> exit in the middle of profiling session). Currently,
> trace_sched_process_exit() is triggered after exit_mm() which resets
> current->mm to NULL making this tracepoint unsuitable for inspecting
> and recording task's mm_struct-related data when tracing process
> lifetimes.
>
> There is a particularly suitable place, though, right after
> taskstats_exit() is called, but before we do exit_mm() and other
> exit_*() resource teardowns. taskstats performs a similar kind of
> accounting that some applications do with BPF, and so co-locating them
> seems like a good fit. So that's where trace_sched_process_exit() is
> moved with this patch.
>
> Also, existing trace_sched_process_exit() tracepoint is notoriously
> missing `group_dead` flag that is certainly useful in practice and some
> of our production applications have to work around this. So plumb
> `group_dead` through while at it, to have a richer and more complete
> tracepoint.
>
> Note that we can't use sched_process_template anymore, and so we use
> TRACE_EVENT()-based tracepoint definition.
But all the field names and
> order, as well as assign and output logic remain intact. We just add one
> extra field at the end in backwards-compatible way.
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
> include/trace/events/sched.h | 28 +++++++++++++++++++++++++---
> kernel/exit.c | 2 +-
> 2 files changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index 8994e97d86c1..05a14f2b35c3 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -328,9 +328,31 @@ DEFINE_EVENT(sched_process_template, sched_process_free,
> /*
> * Tracepoint for a task exiting:
> */
> -DEFINE_EVENT(sched_process_template, sched_process_exit,
> - TP_PROTO(struct task_struct *p),
> - TP_ARGS(p));
> +TRACE_EVENT(sched_process_exit,
> +
> + TP_PROTO(struct task_struct *p, bool group_dead),
> +
> + TP_ARGS(p, group_dead),
> +
> + TP_STRUCT__entry(
> + __array( char, comm, TASK_COMM_LEN )
> + __field( pid_t, pid )
> + __field( int, prio )
> + __field( bool, group_dead )
> + ),
> +
> + TP_fast_assign(
> + memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
> + __entry->pid = p->pid;
> + __entry->prio = p->prio; /* XXX SCHED_DEADLINE */
> + __entry->group_dead = group_dead;
> + ),
> +
> + TP_printk("comm=%s pid=%d prio=%d group_dead=%s",
> + __entry->comm, __entry->pid, __entry->prio,
> + __entry->group_dead ? "true" : "false"
> + )
This feels really fragile, could you please at least add a comment that
points out that this is basically an extension of
sched_process_template, and that it should remain a subset of it, or
something to that end?
Thanks,
Ingo
next prev parent reply other threads:[~2025-04-03 13:54 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-02 18:09 [PATCH v2] exit: move and extend sched_process_exit() tracepoint Andrii Nakryiko
2025-04-02 18:36 ` Steven Rostedt
2025-04-02 19:29 ` Oleg Nesterov
2025-04-03 13:54 ` Ingo Molnar [this message]
2025-04-03 15:06 ` Steven Rostedt
2025-04-03 15:11 ` Ingo Molnar
2025-04-03 16:01 ` Andrii Nakryiko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z-6TDh1MUT49lvjk@gmail.com \
--to=mingo@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=andrii@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brauner@kernel.org \
--cc=glider@google.com \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=mhocko@kernel.org \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).