linux-trace-kernel.vger.kernel.org archive mirror
* [PATCH] exit: add trace_task_exit() tracepoint before current->mm is reset
@ 2025-04-01 18:40 Andrii Nakryiko
  2025-04-01 21:32 ` Steven Rostedt
  2025-04-02  8:27 ` Peter Zijlstra
  0 siblings, 2 replies; 12+ messages in thread
From: Andrii Nakryiko @ 2025-04-01 18:40 UTC
  To: linux-trace-kernel, peterz, mingo
  Cc: bpf, linux-kernel, kernel-team, mhocko, rostedt, oleg, brauner,
	glider, mhiramat, mathieu.desnoyers, akpm, Andrii Nakryiko

It is useful to be able to access current->mm to, say, record a bunch of
VMA information right before the task exits (e.g., for stack
symbolization when dealing with short-lived processes that exit
in the middle of a profiling session). We currently do have
trace_sched_process_exit() in the exit path, but it is called a bit too
late, after exit_mm() resets current->mm to NULL, which makes it
unsuitable for inspecting and recording a task's mm_struct-related data
when tracing process lifetimes.

There is a particularly suitable place, though, right after
taskstats_exit() is called, but before we do exit_mm(). taskstats
performs a similar kind of accounting that some applications do with
BPF, and so co-locating them seems like a good fit.

Moving trace_sched_process_exit() a bit earlier would solve this problem
as well, and I'm open to that. But this might change its semantics
a little, so rather than risk that, I went for adding a new
trace_task_exit() tracepoint. Tracepoints have close to zero overhead
at runtime unless actively traced, so this seems acceptable.

Also, the existing trace_sched_process_exit() tracepoint is notoriously
missing the `group_dead` flag, which is certainly useful in practice, and
some of our production applications have to work around this. So plumb
`group_dead` through while at it, to have a richer and more complete
tracepoint.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/trace/events/task.h | 24 ++++++++++++++++++++++++
 kernel/exit.c               |  2 ++
 2 files changed, 26 insertions(+)

diff --git a/include/trace/events/task.h b/include/trace/events/task.h
index af535b053033..98f4ec060073 100644
--- a/include/trace/events/task.h
+++ b/include/trace/events/task.h
@@ -53,6 +53,30 @@ TRACE_EVENT(task_rename,
 		  __entry->oldcomm, __entry->newcomm, __entry->oom_score_adj)
 );
 
+TRACE_EVENT(task_exit,
+
+	TP_PROTO(struct task_struct *task, bool group_dead),
+
+	TP_ARGS(task, group_dead),
+
+	TP_STRUCT__entry(
+		__field(	pid_t,	pid)
+		__array(	char,	comm, TASK_COMM_LEN)
+		__field(	bool,	group_dead)
+	),
+
+	TP_fast_assign(
+		__entry->pid = task->pid;
+		memcpy(__entry->comm, task->comm, TASK_COMM_LEN);
+		__entry->group_dead = group_dead;
+	),
+
+	TP_printk("pid=%d comm=%s group_dead=%s",
+		__entry->pid, __entry->comm,
+		__entry->group_dead ? "true" : "false"
+	)
+);
+
 /**
  * task_prctl_unknown - called on unknown prctl() option
  * @option:	option passed
diff --git a/kernel/exit.c b/kernel/exit.c
index c2e6c7b7779f..8496fc07f9c8 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -54,6 +54,7 @@
 #include <linux/init_task.h>
 #include <linux/perf_event.h>
 #include <trace/events/sched.h>
+#include <trace/events/task.h>
 #include <linux/hw_breakpoint.h>
 #include <linux/oom.h>
 #include <linux/writeback.h>
@@ -937,6 +938,7 @@ void __noreturn do_exit(long code)
 
 	tsk->exit_code = code;
 	taskstats_exit(tsk, group_dead);
+	trace_task_exit(tsk, group_dead);
 
 	exit_mm();
 
-- 
2.47.1
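
The ordering argument in the commit message can be illustrated with a small
userspace model. This is only a sketch, not kernel code: `struct mm_model`,
`struct task_model`, `struct probe_view`, `observe()` and `model_do_exit()`
are hypothetical stand-ins invented for illustration, and the only thing
taken from the patch is the relative order of the three steps in do_exit()
(new tracepoint, then exit_mm(), then the existing tracepoint).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; NOT the real definitions. */
struct mm_model {
	int nr_vmas;
};

struct task_model {
	int pid;
	struct mm_model *mm;
};

/* What a probe attached at a given point in the exit path could see. */
struct probe_view {
	bool saw_mm;      /* was task->mm still non-NULL? */
	int nr_vmas;      /* VMA count read through mm, 0 if mm was gone */
	bool group_dead;  /* flag passed to the probe, false if not passed */
};

struct exit_observations {
	struct probe_view early; /* models trace_task_exit() */
	struct probe_view late;  /* models trace_sched_process_exit() */
};

static struct probe_view observe(const struct task_model *t, bool group_dead)
{
	struct probe_view v = {
		.saw_mm = t->mm != NULL,
		.nr_vmas = t->mm ? t->mm->nr_vmas : 0,
		.group_dead = group_dead,
	};
	return v;
}

/* Models only the ordering of the relevant steps, nothing else. */
static struct exit_observations model_do_exit(struct task_model *t,
					       bool group_dead)
{
	struct exit_observations o;

	assert(t != NULL);

	/* New tracepoint: runs while mm is still attached. */
	o.early = observe(t, group_dead);

	/* Models exit_mm(): the task loses its mm. */
	t->mm = NULL;

	/* Existing tracepoint: runs too late and gets no group_dead. */
	o.late = observe(t, false);

	return o;
}
```

In this model, calling model_do_exit() on a task whose mm has three VMAs
gives an early view with saw_mm set and nr_vmas equal to 3, while the late
view has saw_mm cleared, which is the gap the new tracepoint is meant to
close.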



end of thread, other threads:[~2025-04-02 15:57 UTC | newest]

Thread overview: 12+ messages
2025-04-01 18:40 [PATCH] exit: add trace_task_exit() tracepoint before current->mm is reset Andrii Nakryiko
2025-04-01 21:32 ` Steven Rostedt
2025-04-01 21:34   ` Steven Rostedt
2025-04-02  7:18     ` Michal Hocko
2025-04-01 22:04   ` Andrii Nakryiko
2025-04-01 22:13     ` Steven Rostedt
2025-04-01 22:17       ` Andrii Nakryiko
2025-04-02 10:27         ` Jiri Olsa
2025-04-02  7:20     ` Michal Hocko
2025-04-02 13:58       ` Steven Rostedt
2025-04-02 15:56       ` Andrii Nakryiko
2025-04-02  8:27 ` Peter Zijlstra
