* [for-linus][PATCH 0/6] tracing: Fixes for v7.0
@ 2026-03-04 22:03 Steven Rostedt
From: Steven Rostedt @ 2026-03-04 22:03 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
tracing fixes for v7.0:
- Fix thresh_return of function graph tracer
The update to store data on the shadow stack removed the abuse of
using the task recursion word as a way to keep track of what functions
to ignore. The trace_graph_return() function was updated to handle this,
but when the function_graph tracer uses a threshold (to only trace
functions that took longer than a specified time), it uses
trace_graph_thresh_return() instead. That function was still incorrectly
using the task struct recursion word, causing the function graph tracer
to permanently set all functions to "notrace".
- Fix thresh_return nosleep accounting
When the calltime was moved to the shadow stack storage instead of being
on the fgraph descriptor, the calculation of the amount of sleep time
was updated. The calculation was done in the trace_graph_thresh_return()
function, which also called trace_graph_return(), which did the
calculation again, causing the time to be doubled.
Remove the call to trace_graph_return(), as the little work it still
performed can be done directly in trace_graph_thresh_return().
- Fix syscall trace event activation on boot up
The syscall trace events are pseudo events attached to the raw_syscall
tracepoints. When the first syscall event is enabled, it enables the
raw_syscall tracepoint and doesn't need to do anything when a second
syscall event is also enabled.
When events are enabled via the kernel command line, syscall events
are only partially enabled, as the enabling happens before rcu_init().
After rcu_init(), they are disabled and re-enabled so that they can
be fully enabled. The problem is that this "disable then enable" was
done one event at a time. If more than one syscall event is specified
on the command line, disabling them one at a time never brings the
reference counter to zero, so the raw_syscall tracepoint is never
disabled and re-enabled.
Instead, disable all the events and then re-enable them all, as that
ensures the raw_syscall tracepoint is also disabled and re-enabled.
- Disable preemption in ftrace pid filtering
The ftrace pid filtering attaches to the fork and exit tracepoints to
add or remove pids that should be traced. The callbacks access variables
protected by RCU (with preemption disabled). Now that tracepoint
callbacks are called with preemption enabled, this protection needs to
be added explicitly rather than depending on the callbacks being called
with preemption disabled.
- Disable preemption in event pid filtering
The event pid filtering needs the same preemption disabling guards as
ftrace pid filtering.
- Fix accounting of the memory mapped ring buffer on fork
Memory mapping the ftrace ring buffer sets VM_DONTCOPY in the vm_flags,
but this does not prevent the application from calling
madvise(MADV_DOFORK), which clears that flag and causes the mapping to
be copied on fork. After the first task exits, the mapping is considered
unmapped by everyone. When the second task exits, the counter goes below
zero and triggers a WARN_ON.
Since nothing prevents two separate tasks from mmapping the ftrace ring
buffer (although it is strange, as the two will interfere with each
other), there's no reason to stop the memory from being copied on fork.
Update the vm_operations to have an ".open" handler to update the
accounting and let the ring buffer know someone else has it mapped.
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
trace/fixes
Head SHA1: e39bb9e02b68942f8e9359d2a3efe7d37ae6be0e
Huiwen He (1):
tracing: Fix syscall events activation by ensuring refcount hits zero
Masami Hiramatsu (Google) (1):
tracing: Disable preemption in the tracepoint callbacks handling filtered pids
Qing Wang (1):
tracing: Fix WARN_ON in tracing_buffers_mmap_close
Shengming Hu (2):
fgraph: Fix thresh_return clear per-task notrace
fgraph: Fix thresh_return nosleeptime double-adjust
Steven Rostedt (1):
ftrace: Disable preemption in the tracepoint callbacks handling filtered pids
----
include/linux/ring_buffer.h | 1 +
kernel/trace/ftrace.c | 2 ++
kernel/trace/ring_buffer.c | 21 ++++++++++++++
kernel/trace/trace.c | 13 +++++++++
kernel/trace/trace_events.c | 54 ++++++++++++++++++++++++++----------
kernel/trace/trace_functions_graph.c | 19 +++++++++----
6 files changed, 89 insertions(+), 21 deletions(-)
* [for-linus][PATCH 1/6] fgraph: Fix thresh_return clear per-task notrace
From: Steven Rostedt @ 2026-03-04 22:03 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
stable, Shengming Hu
From: Shengming Hu <hu.shengming@zte.com.cn>
When tracing_thresh is enabled, function graph tracing uses
trace_graph_thresh_return() as the return handler. Unlike
trace_graph_return(), it did not clear the per-task TRACE_GRAPH_NOTRACE
flag set by the entry handler for set_graph_notrace addresses. This could
leave the task permanently in "notrace" state and effectively disable
function graph tracing for that task.
Mirror trace_graph_return()'s per-task notrace handling by clearing
TRACE_GRAPH_NOTRACE and returning early when set.
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260221113007819YgrZsMGABff4Rc-O_fZxL@zte.com.cn
Fixes: b84214890a9bc ("function_graph: Move graph notrace bit to shadow stack global var")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace_functions_graph.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 3d8239fee004..817d0f1696b6 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -400,14 +400,15 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
struct fgraph_ops *gops,
struct ftrace_regs *fregs)
{
+ unsigned long *task_var = fgraph_get_task_var(gops);
struct fgraph_times *ftimes;
struct trace_array *tr;
int size;
ftrace_graph_addr_finish(gops, trace);
- if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
- trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
+ if (*task_var & TRACE_GRAPH_NOTRACE) {
+ *task_var &= ~TRACE_GRAPH_NOTRACE;
return;
}
--
2.51.0
* [for-linus][PATCH 2/6] fgraph: Fix thresh_return nosleeptime double-adjust
From: Steven Rostedt @ 2026-03-04 22:03 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
stable, Shengming Hu
From: Shengming Hu <hu.shengming@zte.com.cn>
trace_graph_thresh_return() called handle_nosleeptime() and then delegated
to trace_graph_return(), which calls handle_nosleeptime() again. When
sleep-time accounting is disabled this double-adjusts calltime and can
produce bogus durations (including underflow).
Fix this by computing rettime once, applying handle_nosleeptime() only
once, using the adjusted calltime for threshold comparison, and writing
the return event directly via __trace_graph_return() when the threshold is
met.
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260221113314048jE4VRwIyZEALiYByGK0My@zte.com.cn
Fixes: 3c9880f3ab52b ("ftrace: Use a running sleeptime instead of saving on shadow stack")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace_functions_graph.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 817d0f1696b6..0d2d3a2ea7dd 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -403,8 +403,12 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
unsigned long *task_var = fgraph_get_task_var(gops);
struct fgraph_times *ftimes;
struct trace_array *tr;
+ unsigned int trace_ctx;
+ u64 calltime, rettime;
int size;
+ rettime = trace_clock_local();
+
ftrace_graph_addr_finish(gops, trace);
if (*task_var & TRACE_GRAPH_NOTRACE) {
@@ -419,11 +423,13 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
tr = gops->private;
handle_nosleeptime(tr, trace, ftimes, size);
- if (tracing_thresh &&
- (trace_clock_local() - ftimes->calltime < tracing_thresh))
+ calltime = ftimes->calltime;
+
+ if (tracing_thresh && (rettime - calltime < tracing_thresh))
return;
- else
- trace_graph_return(trace, gops, fregs);
+
+ trace_ctx = tracing_gen_ctx();
+ __trace_graph_return(tr, trace, trace_ctx, calltime, rettime);
}
static struct fgraph_ops funcgraph_ops = {
--
2.51.0
* [for-linus][PATCH 3/6] tracing: Fix syscall events activation by ensuring refcount hits zero
From: Steven Rostedt @ 2026-03-04 22:03 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
stable, Huiwen He
From: Huiwen He <hehuiwen@kylinos.cn>
When multiple syscall events are specified in the kernel command line
(e.g., trace_event=syscalls:sys_enter_openat,syscalls:sys_enter_close),
they are often not captured after boot, even though they appear enabled
in the tracing/set_event file.
The issue stems from how syscall events are initialized. Syscall
tracepoints require the global reference count (sys_tracepoint_refcount)
to transition from 0 to 1 to trigger the registration of the syscall
work (TIF_SYSCALL_TRACEPOINT) for tasks, including the init process (pid 1).
The current implementation of early_enable_events() with disable_first=true
used an interleaved sequence of "Disable A -> Enable A -> Disable B -> Enable B".
If multiple syscalls are enabled, the refcount never drops to zero,
preventing the 0->1 transition that triggers actual registration.
Fix this by splitting early_enable_events() into two distinct phases:
1. Disable all events specified in the buffer.
2. Enable all events specified in the buffer.
This ensures the refcount hits zero before re-enabling, allowing syscall
events to be properly activated during early boot.
The code is also refactored to use a helper function to avoid logic
duplication between the disable and enable phases.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260224023544.1250787-1-hehuiwen@kylinos.cn
Fixes: ce1039bd3a89 ("tracing: Fix enabling of syscall events on the command line")
Signed-off-by: Huiwen He <hehuiwen@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace_events.c | 52 ++++++++++++++++++++++++++-----------
1 file changed, 37 insertions(+), 15 deletions(-)
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 9928da636c9d..9c7f26cbe171 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -4668,26 +4668,22 @@ static __init int event_trace_memsetup(void)
return 0;
}
-__init void
-early_enable_events(struct trace_array *tr, char *buf, bool disable_first)
+/*
+ * Helper function to enable or disable a comma-separated list of events
+ * from the bootup buffer.
+ */
+static __init void __early_set_events(struct trace_array *tr, char *buf, bool enable)
{
char *token;
- int ret;
-
- while (true) {
- token = strsep(&buf, ",");
-
- if (!token)
- break;
+ while ((token = strsep(&buf, ","))) {
if (*token) {
- /* Restarting syscalls requires that we stop them first */
- if (disable_first)
+ if (enable) {
+ if (ftrace_set_clr_event(tr, token, 1))
+ pr_warn("Failed to enable trace event: %s\n", token);
+ } else {
ftrace_set_clr_event(tr, token, 0);
-
- ret = ftrace_set_clr_event(tr, token, 1);
- if (ret)
- pr_warn("Failed to enable trace event: %s\n", token);
+ }
}
/* Put back the comma to allow this to be called again */
@@ -4696,6 +4692,32 @@ early_enable_events(struct trace_array *tr, char *buf, bool disable_first)
}
}
+/**
+ * early_enable_events - enable events from the bootup buffer
+ * @tr: The trace array to enable the events in
+ * @buf: The buffer containing the comma separated list of events
+ * @disable_first: If true, disable all events in @buf before enabling them
+ *
+ * This function enables events from the bootup buffer. If @disable_first
+ * is true, it will first disable all events in the buffer before enabling
+ * them.
+ *
+ * For syscall events, which rely on a global refcount to register the
+ * SYSCALL_WORK_SYSCALL_TRACEPOINT flag (especially for pid 1), we must
+ * ensure the refcount hits zero before re-enabling them. A simple
+ * "disable then enable" per-event is not enough if multiple syscalls are
+ * used, as the refcount will stay above zero. Thus, we need a two-phase
+ * approach: disable all, then enable all.
+ */
+__init void
+early_enable_events(struct trace_array *tr, char *buf, bool disable_first)
+{
+ if (disable_first)
+ __early_set_events(tr, buf, false);
+
+ __early_set_events(tr, buf, true);
+}
+
static __init int event_trace_enable(void)
{
struct trace_array *tr = top_trace_array();
--
2.51.0
* [for-linus][PATCH 4/6] ftrace: Disable preemption in the tracepoint callbacks handling filtered pids
From: Steven Rostedt @ 2026-03-04 22:03 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
From: Steven Rostedt <rostedt@goodmis.org>
When function trace PID filtering is enabled, the function tracer will
attach a callback to the fork tracepoint as well as the exit tracepoint
that will add the forked child PID to the PID filtering list as well as
remove the PID that is exiting.
Commit a46023d5616e ("tracing: Guard __DECLARE_TRACE() use of
__DO_TRACE_CALL() with SRCU-fast") removed the disabling of preemption
when calling tracepoint callbacks.
The callbacks used for the PID filtering accounting depended on
preemption being disabled, and they now trigger a "suspicious RCU usage"
warning message.
Make them explicitly disable preemption.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260302213546.156e3e4f@gandalf.local.home
Fixes: a46023d5616e ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
kernel/trace/ftrace.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 827fb9a0bf0d..2f72af0357e5 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -8611,6 +8611,7 @@ ftrace_pid_follow_sched_process_fork(void *data,
struct trace_pid_list *pid_list;
struct trace_array *tr = data;
+ guard(preempt)();
pid_list = rcu_dereference_sched(tr->function_pids);
trace_filter_add_remove_task(pid_list, self, task);
@@ -8624,6 +8625,7 @@ ftrace_pid_follow_sched_process_exit(void *data, struct task_struct *task)
struct trace_pid_list *pid_list;
struct trace_array *tr = data;
+ guard(preempt)();
pid_list = rcu_dereference_sched(tr->function_pids);
trace_filter_add_remove_task(pid_list, NULL, task);
--
2.51.0
* [for-linus][PATCH 5/6] tracing: Disable preemption in the tracepoint callbacks handling filtered pids
From: Steven Rostedt @ 2026-03-04 22:03 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
stable
From: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Filtering PIDs for events triggered the following during selftests:
[37] event tracing - restricts events based on pid notrace filtering
[ 155.874095]
[ 155.874869] =============================
[ 155.876037] WARNING: suspicious RCU usage
[ 155.877287] 7.0.0-rc1-00004-g8cd473a19bc7 #7 Not tainted
[ 155.879263] -----------------------------
[ 155.882839] kernel/trace/trace_events.c:1057 suspicious rcu_dereference_check() usage!
[ 155.889281]
[ 155.889281] other info that might help us debug this:
[ 155.889281]
[ 155.894519]
[ 155.894519] rcu_scheduler_active = 2, debug_locks = 1
[ 155.898068] no locks held by ftracetest/4364.
[ 155.900524]
[ 155.900524] stack backtrace:
[ 155.902645] CPU: 1 UID: 0 PID: 4364 Comm: ftracetest Not tainted 7.0.0-rc1-00004-g8cd473a19bc7 #7 PREEMPT(lazy)
[ 155.902648] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
[ 155.902651] Call Trace:
[ 155.902655] <TASK>
[ 155.902659] dump_stack_lvl+0x67/0x90
[ 155.902665] lockdep_rcu_suspicious+0x154/0x1a0
[ 155.902672] event_filter_pid_sched_process_fork+0x9a/0xd0
[ 155.902678] kernel_clone+0x367/0x3a0
[ 155.902689] __x64_sys_clone+0x116/0x140
[ 155.902696] do_syscall_64+0x158/0x460
[ 155.902700] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 155.902702] ? trace_irq_disable+0x1d/0xc0
[ 155.902709] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 155.902711] RIP: 0033:0x4697c3
[ 155.902716] Code: 1f 84 00 00 00 00 00 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 89 c2 85 c0 75 2c 64 48 8b 04 25 10 00 00
[ 155.902718] RSP: 002b:00007ffc41150428 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[ 155.902721] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004697c3
[ 155.902722] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[ 155.902724] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000003fccf990
[ 155.902725] R10: 000000003fccd690 R11: 0000000000000246 R12: 0000000000000001
[ 155.902726] R13: 000000003fce8103 R14: 0000000000000001 R15: 0000000000000000
[ 155.902733] </TASK>
[ 155.902747]
The tracepoint callbacks recently were changed to allow preemption. The
event PID filtering callbacks that were attached to the fork and exit
tracepoints expected preemption disabled in order to access the RCU
protected PID lists.
Add a guard(preempt)() to protect the references to the PID list.
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260303215738.6ab275af@fedora
Fixes: a46023d5616e ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast")
Link: https://patch.msgid.link/20260303131706.96057f61a48a34c43ce1e396@kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace_events.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 9c7f26cbe171..b7343fdfd7b0 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1039,6 +1039,7 @@ event_filter_pid_sched_process_exit(void *data, struct task_struct *task)
struct trace_pid_list *pid_list;
struct trace_array *tr = data;
+ guard(preempt)();
pid_list = rcu_dereference_raw(tr->filtered_pids);
trace_filter_add_remove_task(pid_list, NULL, task);
@@ -1054,6 +1055,7 @@ event_filter_pid_sched_process_fork(void *data,
struct trace_pid_list *pid_list;
struct trace_array *tr = data;
+ guard(preempt)();
pid_list = rcu_dereference_sched(tr->filtered_pids);
trace_filter_add_remove_task(pid_list, self, task);
--
2.51.0
* [for-linus][PATCH 6/6] tracing: Fix WARN_ON in tracing_buffers_mmap_close
From: Steven Rostedt @ 2026-03-04 22:03 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
stable, Vincent Donnefort, Lorenzo Stoakes,
syzbot+3b5dd2030fe08afdf65d, Qing Wang
From: Qing Wang <wangqing7171@gmail.com>
When a process forks, the child process copies the parent's VMAs but the
user_mapped reference count is not incremented. As a result, when both the
parent and child processes exit, tracing_buffers_mmap_close() is called
twice. On the second call, user_mapped is already 0, causing the function to
return -ENODEV and triggering a WARN_ON.
Normally, this isn't an issue, as the memory is mapped with VM_DONTCOPY
set. But that is only a hint, and the application can call
madvise(MADV_DOFORK), which clears the VM_DONTCOPY flag. When the
application does that, a subsequent fork() can trigger this issue.
Fix it by incrementing the user_mapped reference count without re-mapping
the pages in the VMA's open callback.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://patch.msgid.link/20260227025842.1085206-1-wangqing7171@gmail.com
Fixes: cf9f0f7c4c5bb ("tracing: Allow user-space mapping of the ring-buffer")
Reported-by: syzbot+3b5dd2030fe08afdf65d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=3b5dd2030fe08afdf65d
Tested-by: syzbot+3b5dd2030fe08afdf65d@syzkaller.appspotmail.com
Signed-off-by: Qing Wang <wangqing7171@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
include/linux/ring_buffer.h | 1 +
kernel/trace/ring_buffer.c | 21 +++++++++++++++++++++
kernel/trace/trace.c | 13 +++++++++++++
3 files changed, 35 insertions(+)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 876358cfe1b1..d862fa610270 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -248,6 +248,7 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node);
int ring_buffer_map(struct trace_buffer *buffer, int cpu,
struct vm_area_struct *vma);
+void ring_buffer_map_dup(struct trace_buffer *buffer, int cpu);
int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
#endif /* _LINUX_RING_BUFFER_H */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index f16f053ef77d..17d0ea0cc3e6 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -7310,6 +7310,27 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu,
return err;
}
+/*
+ * This is called when a VMA is duplicated (e.g., on fork()) to increment
+ * the user_mapped counter without remapping pages.
+ */
+void ring_buffer_map_dup(struct trace_buffer *buffer, int cpu)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+
+ if (WARN_ON(!cpumask_test_cpu(cpu, buffer->cpumask)))
+ return;
+
+ cpu_buffer = buffer->buffers[cpu];
+
+ guard(mutex)(&cpu_buffer->mapping_lock);
+
+ if (cpu_buffer->user_mapped)
+ __rb_inc_dec_mapped(cpu_buffer, true);
+ else
+ WARN(1, "Unexpected buffer stat, it should be mapped");
+}
+
int ring_buffer_unmap(struct trace_buffer *buffer, int cpu)
{
struct ring_buffer_per_cpu *cpu_buffer;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 23de3719f495..1e7c032a72d2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8213,6 +8213,18 @@ static inline int get_snapshot_map(struct trace_array *tr) { return 0; }
static inline void put_snapshot_map(struct trace_array *tr) { }
#endif
+/*
+ * This is called when a VMA is duplicated (e.g., on fork()) to increment
+ * the user_mapped counter without remapping pages.
+ */
+static void tracing_buffers_mmap_open(struct vm_area_struct *vma)
+{
+ struct ftrace_buffer_info *info = vma->vm_file->private_data;
+ struct trace_iterator *iter = &info->iter;
+
+ ring_buffer_map_dup(iter->array_buffer->buffer, iter->cpu_file);
+}
+
static void tracing_buffers_mmap_close(struct vm_area_struct *vma)
{
struct ftrace_buffer_info *info = vma->vm_file->private_data;
@@ -8232,6 +8244,7 @@ static int tracing_buffers_may_split(struct vm_area_struct *vma, unsigned long a
}
static const struct vm_operations_struct tracing_buffers_vmops = {
+ .open = tracing_buffers_mmap_open,
.close = tracing_buffers_mmap_close,
.may_split = tracing_buffers_may_split,
};
--
2.51.0