All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Andrii Nakryiko <andrii@kernel.org>
Cc: linux-trace-kernel@vger.kernel.org, peterz@infradead.org,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@meta.com, mhocko@kernel.org, rostedt@goodmis.org,
	oleg@redhat.com, brauner@kernel.org, glider@google.com,
	mhiramat@kernel.org, mathieu.desnoyers@efficios.com,
	akpm@linux-foundation.org
Subject: Re: [PATCH v2] exit: move and extend sched_process_exit() tracepoint
Date: Thu, 3 Apr 2025 15:54:22 +0200	[thread overview]
Message-ID: <Z-6TDh1MUT49lvjk@gmail.com> (raw)
In-Reply-To: <20250402180925.90914-1-andrii@kernel.org>


* Andrii Nakryiko <andrii@kernel.org> wrote:

> It is useful to be able to access current->mm at task exit to, say,
> record a bunch of VMA information right before the task exits (e.g., for
> stack symbolization reasons when dealing with short-lived processes that
> exit in the middle of profiling session). Currently,
> trace_sched_process_exit() is triggered after exit_mm() which resets
> current->mm to NULL making this tracepoint unsuitable for inspecting
> and recording task's mm_struct-related data when tracing process
> lifetimes.
> 
> There is a particularly suitable place, though, right after
> taskstats_exit() is called, but before we do exit_mm() and other
> exit_*() resource teardowns. taskstats performs a similar kind of
> accounting that some applications do with BPF, and so co-locating them
> seems like a good fit. So that's where trace_sched_process_exit() is
> moved with this patch.
> 
> Also, existing trace_sched_process_exit() tracepoint is notoriously
> missing `group_dead` flag that is certainly useful in practice and some
> of our production applications have to work around this. So plumb
> `group_dead` through while at it, to have a richer and more complete
> tracepoint.
> 
> Note that we can't use sched_process_template anymore, and so we use
> TRACE_EVENT()-based tracepoint definition.

 But all the field names and
> order, as well as assign and output logic remain intact. We just add one
> extra field at the end in backwards-compatible way.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  include/trace/events/sched.h | 28 +++++++++++++++++++++++++---
>  kernel/exit.c                |  2 +-
>  2 files changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index 8994e97d86c1..05a14f2b35c3 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -328,9 +328,31 @@ DEFINE_EVENT(sched_process_template, sched_process_free,
>  /*
>   * Tracepoint for a task exiting:
>   */
> -DEFINE_EVENT(sched_process_template, sched_process_exit,
> -	     TP_PROTO(struct task_struct *p),
> -	     TP_ARGS(p));
> +TRACE_EVENT(sched_process_exit,
> +
> +	TP_PROTO(struct task_struct *p, bool group_dead),
> +
> +	TP_ARGS(p, group_dead),
> +
> +	TP_STRUCT__entry(
> +		__array(	char,	comm,	TASK_COMM_LEN	)
> +		__field(	pid_t,	pid			)
> +		__field(	int,	prio			)
> +		__field(	bool,	group_dead		)
> +	),
> +
> +	TP_fast_assign(
> +		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
> +		__entry->pid		= p->pid;
> +		__entry->prio		= p->prio; /* XXX SCHED_DEADLINE */
> +		__entry->group_dead	= group_dead;
> +	),
> +
> +	TP_printk("comm=%s pid=%d prio=%d group_dead=%s",
> +		  __entry->comm, __entry->pid, __entry->prio,
> +		  __entry->group_dead ? "true" : "false"
> +	)

This feels really fragile, could you please at least add a comment that 
points out that this is basically an extension of 
sched_process_template, and that it should remain a subset of it, or 
something to that end?

Thanks,

	Ingo

  parent reply	other threads:[~2025-04-03 13:54 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-02 18:09 [PATCH v2] exit: move and extend sched_process_exit() tracepoint Andrii Nakryiko
2025-04-02 18:36 ` Steven Rostedt
2025-04-02 19:29 ` Oleg Nesterov
2025-04-03 13:54 ` Ingo Molnar [this message]
2025-04-03 15:06   ` Steven Rostedt
2025-04-03 15:11     ` Ingo Molnar
2025-04-03 16:01       ` Andrii Nakryiko
2025-04-04  8:37 ` [tip: sched/core] sched/tracepoints: Move and extend the " tip-bot2 for Andrii Nakryiko
2025-04-04 15:49   ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z-6TDh1MUT49lvjk@gmail.com \
    --to=mingo@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brauner@kernel.org \
    --cc=glider@google.com \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.