[GIT PULL] tracing: Fixes for 7.0


* [GIT PULL] tracing: Fixes for 7.0
@ 2026-02-19 21:01 Steven Rostedt
From: Steven Rostedt @ 2026-02-19 21:01 UTC
  To: Linus Torvalds
  Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Mark Rutland,
	Daniil Dulov, Petr Pavlu


Linus,

Tracing fixes for 7.0:

- Fix possible dereference of uninitialized pointer

  When validating the persistent ring buffer on boot up, if the first
  validation fails, the error path dereferences "head_page", but the
  code jumps over the initialization of that variable. Move the
  initialization before the first validation check.
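
A minimal user-space sketch of the ordering the fix enforces. The structures, the validate() helper and the "reset" done in the error path are hypothetical stand-ins, not the ring buffer's real types. The point is only that the pointer the error path touches is assigned before the first check that can jump there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a buffer page and the per-CPU buffer. */
struct page_desc { long time_stamp; };
struct cpu_buf {
	struct page_desc *head_page;
	struct page_desc *reader_page;
};

/* Hypothetical validator: a negative timestamp means "corrupted". */
static int validate(const struct page_desc *p)
{
	return p->time_stamp < 0 ? -1 : 0;
}

static int validate_events(struct cpu_buf *cb)
{
	struct page_desc *head_page, *orig_head;

	/* The fix: assign before the first check that can fail. */
	orig_head = head_page = cb->head_page;

	if (validate(cb->reader_page) < 0)
		goto invalid;	/* would read garbage if not set above */

	if (validate(head_page) < 0)
		goto invalid;

	return 0;

invalid:
	/* The error path uses orig_head, e.g. to reset the buffer. */
	orig_head->time_stamp = 0;
	return -1;
}
```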

- Fix use of event length in validation of persistent ring buffer

  On boot up, the persistent ring buffer is checked for validity in
  several ways. One is to walk all the events in the memory location to
  make sure they are all valid. The length of each event is used to move
  to the next event, and that length is read from the data in the
  buffer. If the length is corrupted, the next event to check could be
  located at a bad memory location.

  Validate the length field of the event when doing the event walk.
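
A sketch of the same bounds check on a simplified record format. The format (first byte of each record holds its total length) and the function names are hypothetical; the real code decodes struct ring_buffer_event with rb_event_length(). The check mirrors the one in the patch: reject a length that is not positive or that runs past the valid data before using it to advance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical format: byte 0 of a record is its total length. */
static int rec_length(const unsigned char *rec)
{
	return rec[0];
}

/* Returns the number of records, or -1 if any length is invalid. */
static int walk_records(const unsigned char *data, int tail)
{
	int events = 0;
	int len;

	for (int e = 0; e < tail; e += len) {
		len = rec_length(data + e);
		/*
		 * Validate before advancing: a zero length would never
		 * advance, and an oversized one would step past the end
		 * of the valid data.
		 */
		if (len <= 0 || len > tail - e)
			return -1;
		events++;
	}
	return events;
}
```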

- Fix function graph on archs that do not support use of ftrace_ops

  When an architecture defines HAVE_DYNAMIC_FTRACE_WITH_ARGS, its
  function graph tracer uses the ftrace_ops of the function tracer to
  call its callbacks. This allows a single registered callback to be
  called directly, instead of checking the function being traced
  against the hash entries in the callback's meta data.

  Architectures that do not support this feature must always call the
  loop function that tests each registered callback (even if there's
  only one). The loop function tests the function being traced against
  each callback's hash of functions, and calls the callback only if
  that function is in its hash.

  The problem was that nothing checked for this, so the direct call was
  made even when the architecture didn't support it. As a result, if
  function tracing was enabled while a callback was registered with the
  function graph tracer, that callback would be called for every
  function the function tracer also traced, even if the callback's
  meta data only asked to be called back for a small subset of
  functions.

  Prevent the direct calling for those architectures that do not support
  it.
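
A simplified model of the two dispatch paths. NO_DIRECT plays the role of FGRAPH_NO_DIRECT, and struct gops, in_hash() and the call counter are hypothetical. With NO_DIRECT set to 1 the direct path is compiled out, so a callback whose hash holds one address is invoked only for that address, even though the function tracer hits three functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stands in for FGRAPH_NO_DIRECT: 1 when ftrace_graph_func is not
 * provided by the architecture. */
#ifndef NO_DIRECT
#define NO_DIRECT 1
#endif

struct gops {
	const unsigned long *hash;	/* functions this callback wants */
	size_t nr_hash;
	int calls;			/* times the callback ran */
};

static bool in_hash(const struct gops *g, unsigned long ip)
{
	for (size_t i = 0; i < g->nr_hash; i++)
		if (g->hash[i] == ip)
			return true;
	return false;
}

/* Loop path: test every registered callback's hash. */
static void graph_entry_loop(struct gops **ops, size_t n, unsigned long ip)
{
	for (size_t i = 0; i < n; i++)
		if (in_hash(ops[i], ip))
			ops[i]->calls++;
}

/* Direct path skips the hash test; only valid when ftrace_ops has
 * already filtered ip for this callback. */
static void graph_entry(struct gops **ops, size_t n, unsigned long ip)
{
	if (!NO_DIRECT && n == 1)
		ops[0]->calls++;
	else
		graph_entry_loop(ops, n, ip);
}
```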

- Fix references to trace_event_file for hist files

  The hist files used event_file_data() to get a reference to the
  associated trace_event_file the histogram was attached to. This
  would return a pointer even if the trace_event_file is about to
  be freed (via RCU). Instead it should use the event_file_file()
  helper that returns NULL if the trace_event_file is marked to be
  freed so that no new references are added to it.
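
The difference between the two helpers can be sketched as follows. The struct, the flag value and the function names are hypothetical models of event_file_data(), event_file_file() and EVENT_FILE_FL_FREED; the real helpers read the pointer from the inode's private data.

```c
#include <assert.h>
#include <stddef.h>

#define FL_FREED 0x1	/* models EVENT_FILE_FL_FREED */

struct event_file { unsigned int flags; };

/* Modeled on event_file_data(): returns the pointer unconditionally. */
static struct event_file *file_data(struct event_file *f)
{
	return f;
}

/* Modeled on event_file_file(): refuses a file marked to be freed,
 * so callers cannot take a new reference to it. */
static struct event_file *file_file(struct event_file *f)
{
	if (!f || (f->flags & FL_FREED))
		return NULL;
	return f;
}
```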

- Wake up hist poll readers when an event is being freed

  When polling on a hist file, the task is only awoken when a hist
  trigger is triggered. This means that if an event is being freed
  while there's a task waiting on its hist file, it will need to wait
  until the hist trigger occurs to wake it up and allow the freeing
  to happen. Note, the event will not be completely freed until all
  references are removed, and a hist poller keeps a reference. But
  it should still be woken when the event is being freed.
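
A user-space analogy of the added wakeup, using a pthread condition variable in place of hist_poll_wq. All names are hypothetical. The poller sleeps until either a trigger fires or the event is marked freed, and the removal path must wake it; otherwise it would keep sleeping until the next trigger. The -1 result stands in for the EPOLLERR that event_hist_poll() returns once the file is gone.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for hist_poll_wq and the event file state. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t hist_wq = PTHREAD_COND_INITIALIZER;
static bool triggered, freed;

/* Poller: sleeps until the trigger fires or the event is freed. */
static void *hist_poller(void *arg)
{
	int *result = arg;

	pthread_mutex_lock(&lock);
	while (!triggered && !freed)
		pthread_cond_wait(&hist_wq, &lock);
	*result = freed ? -1 : 1;	/* -1 models EPOLLERR */
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Removal path: mark the file freed, then wake any pollers. */
static void remove_event(void)
{
	pthread_mutex_lock(&lock);
	freed = true;
	pthread_cond_broadcast(&hist_wq);	/* the added wakeup */
	pthread_mutex_unlock(&lock);
}
```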


Please pull the latest trace-v7.0-2 tree, which can be found at:


  git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git trace-v7.0-2

Tag SHA1: d356215e804503d1ba58c3dfbeda19d549635c29
Head SHA1: 9678e53179aa7e907360f5b5b275769008a69b80


Daniil Dulov (1):
      ring-buffer: Fix possible dereference of uninitialized pointer

Masami Hiramatsu (Google) (1):
      tracing: ring-buffer: Fix to check event length before using

Petr Pavlu (2):
      tracing: Fix checking of freed trace_event_file for hist files
      tracing: Wake up poll waiters for hist files when removing an event

Steven Rostedt (1):
      fgraph: Do not call handlers direct when not using ftrace_ops

----
 include/linux/ftrace.h           | 13 ++++++++++---
 include/linux/trace_events.h     |  5 +++++
 kernel/trace/fgraph.c            | 12 +++++++++++-
 kernel/trace/ring_buffer.c       |  9 +++++++--
 kernel/trace/trace_events.c      |  3 +++
 kernel/trace/trace_events_hist.c |  4 ++--
 6 files changed, 38 insertions(+), 8 deletions(-)
---------------------------
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 1a4d36fc9085..c242fe49af4c 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1092,10 +1092,17 @@ static inline bool is_ftrace_trampoline(unsigned long addr)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 #ifndef ftrace_graph_func
-#define ftrace_graph_func ftrace_stub
-#define FTRACE_OPS_GRAPH_STUB FTRACE_OPS_FL_STUB
+# define ftrace_graph_func ftrace_stub
+# define FTRACE_OPS_GRAPH_STUB FTRACE_OPS_FL_STUB
+/*
+ * The function graph is called every time the function tracer is called.
+ * It must always test the ops hash and cannot just directly call
+ * the handler.
+ */
+# define FGRAPH_NO_DIRECT	1
 #else
-#define FTRACE_OPS_GRAPH_STUB 0
+# define FTRACE_OPS_GRAPH_STUB	0
+# define FGRAPH_NO_DIRECT	0
 #endif
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 0a2b8229b999..37eb2f0f3dd8 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -683,6 +683,11 @@ static inline void hist_poll_wakeup(void)
 
 #define hist_poll_wait(file, wait)	\
 	poll_wait(file, &hist_poll_wq, wait)
+
+#else
+static inline void hist_poll_wakeup(void)
+{
+}
 #endif
 
 #define __TRACE_EVENT_FLAGS(name, value)				\
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 4df766c690f9..40d373d65f9b 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -539,7 +539,11 @@ static struct fgraph_ops fgraph_stub = {
 static struct fgraph_ops *fgraph_direct_gops = &fgraph_stub;
 DEFINE_STATIC_CALL(fgraph_func, ftrace_graph_entry_stub);
 DEFINE_STATIC_CALL(fgraph_retfunc, ftrace_graph_ret_stub);
+#if FGRAPH_NO_DIRECT
+static DEFINE_STATIC_KEY_FALSE(fgraph_do_direct);
+#else
 static DEFINE_STATIC_KEY_TRUE(fgraph_do_direct);
+#endif
 
 /**
  * ftrace_graph_stop - set to permanently disable function graph tracing
@@ -843,7 +847,7 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
 	bitmap = get_bitmap_bits(current, offset);
 
 #ifdef CONFIG_HAVE_STATIC_CALL
-	if (static_branch_likely(&fgraph_do_direct)) {
+	if (!FGRAPH_NO_DIRECT && static_branch_likely(&fgraph_do_direct)) {
 		if (test_bit(fgraph_direct_gops->idx, &bitmap))
 			static_call(fgraph_retfunc)(&trace, fgraph_direct_gops, fregs);
 	} else
@@ -1285,6 +1289,9 @@ static void ftrace_graph_enable_direct(bool enable_branch, struct fgraph_ops *go
 	trace_func_graph_ret_t retfunc = NULL;
 	int i;
 
+	if (FGRAPH_NO_DIRECT)
+		return;
+
 	if (gops) {
 		func = gops->entryfunc;
 		retfunc = gops->retfunc;
@@ -1308,6 +1315,9 @@ static void ftrace_graph_enable_direct(bool enable_branch, struct fgraph_ops *go
 
 static void ftrace_graph_disable_direct(bool disable_branch)
 {
+	if (FGRAPH_NO_DIRECT)
+		return;
+
 	if (disable_branch)
 		static_branch_disable(&fgraph_do_direct);
 	static_call_update(fgraph_func, ftrace_graph_entry_stub);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index d33103408955..1e7a34a31851 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1849,6 +1849,7 @@ static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu
 	struct ring_buffer_event *event;
 	u64 ts, delta;
 	int events = 0;
+	int len;
 	int e;
 
 	*delta_ptr = 0;
@@ -1856,9 +1857,12 @@ static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu
 
 	ts = dpage->time_stamp;
 
-	for (e = 0; e < tail; e += rb_event_length(event)) {
+	for (e = 0; e < tail; e += len) {
 
 		event = (struct ring_buffer_event *)(dpage->data + e);
+		len = rb_event_length(event);
+		if (len <= 0 || len > tail - e)
+			return -1;
 
 		switch (event->type_len) {
 
@@ -1919,6 +1923,8 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 	if (!meta || !meta->head_buffer)
 		return;
 
+	orig_head = head_page = cpu_buffer->head_page;
+
 	/* Do the reader page first */
 	ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu);
 	if (ret < 0) {
@@ -1929,7 +1935,6 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 	entry_bytes += local_read(&cpu_buffer->reader_page->page->commit);
 	local_set(&cpu_buffer->reader_page->entries, ret);
 
-	orig_head = head_page = cpu_buffer->head_page;
 	ts = head_page->page->time_stamp;
 
 	/*
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 61fe01dce7a6..b659653dc03a 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1311,6 +1311,9 @@ static void remove_event_file_dir(struct trace_event_file *file)
 	free_event_filter(file->filter);
 	file->flags |= EVENT_FILE_FL_FREED;
 	event_file_put(file);
+
+	/* Wake up hist poll waiters to notice the EVENT_FILE_FL_FREED flag. */
+	hist_poll_wakeup();
 }
 
 /*
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index e6f449f53afc..768df987419e 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -5784,7 +5784,7 @@ static __poll_t event_hist_poll(struct file *file, struct poll_table_struct *wai
 
 	guard(mutex)(&event_mutex);
 
-	event_file = event_file_data(file);
+	event_file = event_file_file(file);
 	if (!event_file)
 		return EPOLLERR;
 
@@ -5822,7 +5822,7 @@ static int event_hist_open(struct inode *inode, struct file *file)
 
 	guard(mutex)(&event_mutex);
 
-	event_file = event_file_data(file);
+	event_file = event_file_file(file);
 	if (!event_file) {
 		ret = -ENODEV;
 		goto err;


* [GIT PULL] tracing: Fixes for 7.0
@ 2026-03-05 15:39 Steven Rostedt
  2026-03-05 16:44 ` Linus Torvalds
  2026-03-05 19:43 ` pr-tracker-bot
From: Steven Rostedt @ 2026-03-05 15:39 UTC
  To: Linus Torvalds
  Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Huiwen He,
	Jerome Marchand, Qing Wang, Shengming Hu


Linus,

tracing fixes for v7.0:

- Fix thresh_return of function graph tracer

  The update to store data on the shadow stack removed the abuse of
  using the task recursion word as a way to keep track of what functions
  to ignore. trace_graph_return() was updated to handle this, but when
  the function_graph tracer uses a threshold (only tracing functions
  that took longer than a specified time), it uses
  trace_graph_thresh_return() instead. That function was still
  incorrectly using the task struct recursion word, causing the function
  graph tracer to permanently set all functions to "notrace".
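
A sketch of the test-and-clear the fixed return handler performs on the per-task fgraph variable. The struct and flag value are hypothetical stand-ins for the word returned by fgraph_get_task_var() and TRACE_GRAPH_NOTRACE. Because the return side clears the same word the entry side set, the "notrace" mark applies to one function only.

```c
#include <assert.h>
#include <stdbool.h>

#define GRAPH_NOTRACE 0x1UL	/* models TRACE_GRAPH_NOTRACE */

/* Hypothetical task with its per-task fgraph word. */
struct task { unsigned long graph_var; };

/* Returns true if this function return should be traced. */
static bool thresh_return(struct task *t)
{
	if (t->graph_var & GRAPH_NOTRACE) {
		/* Clear the mark the entry handler set, then skip. */
		t->graph_var &= ~GRAPH_NOTRACE;
		return false;
	}
	return true;
}
```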

- Fix thresh_return nosleep accounting

  When the calltime was moved from the fgraph descriptor to the shadow
  stack storage, the sleep time calculation was updated. The
  calculation was done in trace_graph_thresh_return(), which then also
  called trace_graph_return(), which did the calculation again, causing
  the adjustment to be applied twice.

  Remove the call to trace_graph_return(), since the work it did is
  small, and do that work directly in trace_graph_thresh_return().
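
A simplified model of the double adjustment. It assumes, for illustration only, that the nosleeptime adjustment pushes calltime forward by the time the task slept; the struct and function names are hypothetical. Applying the adjustment in both the threshold handler and the generic handler subtracts the sleep time twice from the reported duration.

```c
#include <assert.h>
#include <stdint.h>

struct ftimes { uint64_t calltime; };

/* Hypothetical nosleeptime adjustment: exclude sleep from the
 * duration by moving calltime forward. */
static void adjust_nosleep(struct ftimes *ft, uint64_t slept)
{
	ft->calltime += slept;
}

/* Old shape: thresh handler adjusts, then the generic handler it
 * calls adjusts again. */
static uint64_t duration_buggy(struct ftimes ft, uint64_t rettime,
			       uint64_t slept)
{
	adjust_nosleep(&ft, slept);
	adjust_nosleep(&ft, slept);
	return rettime - ft.calltime;
}

/* Fixed shape: adjust once and compute the duration in place. */
static uint64_t duration_fixed(struct ftimes ft, uint64_t rettime,
			       uint64_t slept)
{
	adjust_nosleep(&ft, slept);
	return rettime - ft.calltime;
}
```

With a call at time 100, a return at 200 and 30 units of sleep, the fixed version reports 70 while the old shape reports 40.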

- Fix syscall trace event activation on boot up

  The syscall trace events are pseudo events attached to the raw_syscall
  tracepoints. When the first syscall event is enabled, it enables the
  raw_syscall tracepoint and doesn't need to do anything when a second
  syscall event is also enabled.

  When events are enabled via the kernel command line, syscall events
  are only partially enabled, because the enabling happens before
  rcu_init. This is done so that early events can be enabled
  immediately. Because kernel command line events do not distinguish
  between different types of events, the syscall events are enabled
  here but are not fully functional. After rcu_init, they are disabled
  and re-enabled so that they become fully enabled. The problem is that
  this "disable-enable" is done one event at a time. If more than one
  syscall event is specified on the command line, disabling them one at
  a time never brings the counter to zero, so the raw_syscall tracepoint
  is never disabled and re-enabled, and the syscall events stay in their
  not-fully-functional state.

  Instead, disable all the events and then re-enable them all, which
  ensures that the raw_syscall event is also disabled and re-enabled.
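
A refcount model of why the one-at-a-time restart fails. The counters and function names are hypothetical; "fully_functional" stands for whether the shared registration happened after rcu_init. With two syscall events enabled early, the per-event disable/enable never drops the count to zero, while the two-phase restart does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the shared raw_syscall registration. */
static int sys_refcount;
static bool fully_functional;	/* registered after rcu_init? */
static bool rcu_ready;

static void syscall_event_enable(void)
{
	if (sys_refcount++ == 0)	/* first user registers */
		fully_functional = rcu_ready;
}

static void syscall_event_disable(void)
{
	if (--sys_refcount == 0)	/* last user unregisters */
		fully_functional = false;
}

/* Old approach: disable then enable each event in turn. */
static void restart_one_at_a_time(int nr_events)
{
	for (int i = 0; i < nr_events; i++) {
		syscall_event_disable();
		syscall_event_enable();
	}
}

/* Fixed approach: disable all, then enable all. */
static void restart_two_phase(int nr_events)
{
	for (int i = 0; i < nr_events; i++)
		syscall_event_disable();
	for (int i = 0; i < nr_events; i++)
		syscall_event_enable();
}
```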

- Disable preemption in ftrace pid filtering

  The ftrace pid filtering attaches to the fork and exit tracepoints to
  add or remove pids that should be traced. The callbacks access
  variables protected by RCU-sched, which requires preemption to be
  disabled. Now that tracepoint callbacks are called with preemption
  enabled, this protection needs to be added explicitly instead of
  relying on the callbacks being called with preemption disabled.

- Disable preemption in event pid filtering

  The event pid filtering needs the same preemption disabling guards as
  ftrace pid filtering.
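
The patches use guard(preempt)(), a scope-based guard built on the compiler's cleanup attribute. The sketch below imitates that pattern in user space with a hypothetical preemption counter; it is not the kernel's macro. Preemption is "disabled" when the guard is declared and re-enabled automatically when the scope ends, so the RCU-sched dereference in between is covered on every return path.

```c
#include <assert.h>

/* Hypothetical stand-in for the preemption counter. */
static int preempt_count;

static void preempt_disable(void) { preempt_count++; }
static void preempt_enable(void)  { preempt_count--; }

/* Runs automatically when the guard variable goes out of scope. */
static void preempt_guard_exit(int *unused)
{
	(void)unused;
	preempt_enable();
}

/* Imitation of guard(preempt)(); needs GCC or Clang. */
#define guard_preempt()						\
	int __attribute__((cleanup(preempt_guard_exit), unused))	\
		preempt_guard_var = (preempt_disable(), 0)

static int seen_inside;

/* Hypothetical callback shaped like the pid filtering handlers. */
static void pid_follow_fork(void)
{
	guard_preempt();
	/* An rcu_dereference_sched() would be covered from here on. */
	seen_inside = preempt_count;
}
```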

- Fix accounting of the memory mapped ring buffer on fork

  Memory mapping the ftrace ring buffer sets VM_DONTCOPY in vm_flags. But
  this does not prevent the application from calling madvise(MADV_DOFORK),
  which causes the mapping to be copied on fork. After the first task
  exits, the mapping is considered unmapped by everyone. But when the
  second task exits, the counter goes below zero and triggers a WARN_ON.

  Since nothing prevents two separate tasks from mmapping the ftrace ring
  buffer (although two mappings may mess each other up), there's no reason
  to stop the memory from being copied on fork.

  Update the vm_operations to have an ".open" handler to update the
  accounting and let the ring buffer know someone else has it mapped.
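
A counter model of the accounting. The names are hypothetical; user_mapped stands in for cpu_buffer->user_mapped and warn_on() for WARN_ON(). Every VMA that will later be closed has to be counted: the initial mmap() counts one, and the new .open handler counts the copy created on fork, so the two closes balance.

```c
#include <assert.h>
#include <stdio.h>

static int user_mapped;	/* models cpu_buffer->user_mapped */
static int warnings;

static void warn_on(int cond, const char *msg)
{
	if (cond) {
		warnings++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
}

static void buf_mmap(void) { user_mapped++; }	/* initial mmap() */
static void vm_open(void)  { user_mapped++; }	/* VMA copied on fork */

static void vm_close(void)			/* each VMA torn down */
{
	warn_on(user_mapped <= 0, "unmapping a buffer that is not mapped");
	if (user_mapped > 0)
		user_mapped--;
}
```

Without vm_open() on fork, the second vm_close() finds the counter already at zero, which is the WARN_ON this fix addresses.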

- Add all ftrace headers in MAINTAINERS file

  The MAINTAINERS file only covers include/linux/ftrace.h, but misses
  ftrace_irq.h and ftrace_regs.h. Make the entry use wildcards to match
  all *ftrace* header files.
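
To see what the pattern change covers, the sketch below checks both F: patterns against the header names using POSIX fnmatch() with FNM_PATHNAME (so '*' does not cross '/'). This is an approximation: the MAINTAINERS patterns are actually interpreted by scripts/get_maintainer.pl, not by fnmatch().

```c
#include <assert.h>
#include <fnmatch.h>

/* Approximates F: pattern matching; '*' stays within one path part. */
static int matches(const char *pattern, const char *path)
{
	return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```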


Please pull the latest trace-v7.0-rc2 tree, which can be found at:


  git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git trace-v7.0-rc2

Tag SHA1: eb510ef50609f82ad56503c2fddb8e40b9b8ad3c
Head SHA1: f26b098d937488e8f5c617d465760a10bfcc7f13


Huiwen He (1):
      tracing: Fix syscall events activation by ensuring refcount hits zero

Jerome Marchand (1):
      ftrace: Add MAINTAINERS entries for all ftrace headers

Masami Hiramatsu (Google) (1):
      tracing: Disable preemption in the tracepoint callbacks handling filtered pids

Qing Wang (1):
      tracing: Fix WARN_ON in tracing_buffers_mmap_close

Shengming Hu (2):
      fgraph: Fix thresh_return clear per-task notrace
      fgraph: Fix thresh_return nosleeptime double-adjust

Steven Rostedt (1):
      ftrace: Disable preemption in the tracepoint callbacks handling filtered pids

----
 MAINTAINERS                          |  2 +-
 include/linux/ring_buffer.h          |  1 +
 kernel/trace/ftrace.c                |  2 ++
 kernel/trace/ring_buffer.c           | 21 ++++++++++++++
 kernel/trace/trace.c                 | 13 +++++++++
 kernel/trace/trace_events.c          | 54 ++++++++++++++++++++++++++----------
 kernel/trace/trace_functions_graph.c | 19 +++++++++----
 7 files changed, 90 insertions(+), 22 deletions(-)
---------------------------
diff --git a/MAINTAINERS b/MAINTAINERS
index 61bf550fd37c..b8d1ad952827 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10484,7 +10484,7 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
 F:	Documentation/trace/ftrace*
 F:	arch/*/*/*/*ftrace*
 F:	arch/*/*/*ftrace*
-F:	include/*/ftrace.h
+F:	include/*/*ftrace*
 F:	kernel/trace/fgraph.c
 F:	kernel/trace/ftrace*
 F:	samples/ftrace
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 876358cfe1b1..d862fa610270 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -248,6 +248,7 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node);
 
 int ring_buffer_map(struct trace_buffer *buffer, int cpu,
 		    struct vm_area_struct *vma);
+void ring_buffer_map_dup(struct trace_buffer *buffer, int cpu);
 int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
 int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
 #endif /* _LINUX_RING_BUFFER_H */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 827fb9a0bf0d..2f72af0357e5 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -8611,6 +8611,7 @@ ftrace_pid_follow_sched_process_fork(void *data,
 	struct trace_pid_list *pid_list;
 	struct trace_array *tr = data;
 
+	guard(preempt)();
 	pid_list = rcu_dereference_sched(tr->function_pids);
 	trace_filter_add_remove_task(pid_list, self, task);
 
@@ -8624,6 +8625,7 @@ ftrace_pid_follow_sched_process_exit(void *data, struct task_struct *task)
 	struct trace_pid_list *pid_list;
 	struct trace_array *tr = data;
 
+	guard(preempt)();
 	pid_list = rcu_dereference_sched(tr->function_pids);
 	trace_filter_add_remove_task(pid_list, NULL, task);
 
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index f16f053ef77d..17d0ea0cc3e6 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -7310,6 +7310,27 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu,
 	return err;
 }
 
+/*
+ * This is called when a VMA is duplicated (e.g., on fork()) to increment
+ * the user_mapped counter without remapping pages.
+ */
+void ring_buffer_map_dup(struct trace_buffer *buffer, int cpu)
+{
+	struct ring_buffer_per_cpu *cpu_buffer;
+
+	if (WARN_ON(!cpumask_test_cpu(cpu, buffer->cpumask)))
+		return;
+
+	cpu_buffer = buffer->buffers[cpu];
+
+	guard(mutex)(&cpu_buffer->mapping_lock);
+
+	if (cpu_buffer->user_mapped)
+		__rb_inc_dec_mapped(cpu_buffer, true);
+	else
+		WARN(1, "Unexpected buffer stat, it should be mapped");
+}
+
 int ring_buffer_unmap(struct trace_buffer *buffer, int cpu)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 23de3719f495..1e7c032a72d2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8213,6 +8213,18 @@ static inline int get_snapshot_map(struct trace_array *tr) { return 0; }
 static inline void put_snapshot_map(struct trace_array *tr) { }
 #endif
 
+/*
+ * This is called when a VMA is duplicated (e.g., on fork()) to increment
+ * the user_mapped counter without remapping pages.
+ */
+static void tracing_buffers_mmap_open(struct vm_area_struct *vma)
+{
+	struct ftrace_buffer_info *info = vma->vm_file->private_data;
+	struct trace_iterator *iter = &info->iter;
+
+	ring_buffer_map_dup(iter->array_buffer->buffer, iter->cpu_file);
+}
+
 static void tracing_buffers_mmap_close(struct vm_area_struct *vma)
 {
 	struct ftrace_buffer_info *info = vma->vm_file->private_data;
@@ -8232,6 +8244,7 @@ static int tracing_buffers_may_split(struct vm_area_struct *vma, unsigned long a
 }
 
 static const struct vm_operations_struct tracing_buffers_vmops = {
+	.open		= tracing_buffers_mmap_open,
 	.close		= tracing_buffers_mmap_close,
 	.may_split      = tracing_buffers_may_split,
 };
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 9928da636c9d..b7343fdfd7b0 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1039,6 +1039,7 @@ event_filter_pid_sched_process_exit(void *data, struct task_struct *task)
 	struct trace_pid_list *pid_list;
 	struct trace_array *tr = data;
 
+	guard(preempt)();
 	pid_list = rcu_dereference_raw(tr->filtered_pids);
 	trace_filter_add_remove_task(pid_list, NULL, task);
 
@@ -1054,6 +1055,7 @@ event_filter_pid_sched_process_fork(void *data,
 	struct trace_pid_list *pid_list;
 	struct trace_array *tr = data;
 
+	guard(preempt)();
 	pid_list = rcu_dereference_sched(tr->filtered_pids);
 	trace_filter_add_remove_task(pid_list, self, task);
 
@@ -4668,26 +4670,22 @@ static __init int event_trace_memsetup(void)
 	return 0;
 }
 
-__init void
-early_enable_events(struct trace_array *tr, char *buf, bool disable_first)
+/*
+ * Helper function to enable or disable a comma-separated list of events
+ * from the bootup buffer.
+ */
+static __init void __early_set_events(struct trace_array *tr, char *buf, bool enable)
 {
 	char *token;
-	int ret;
-
-	while (true) {
-		token = strsep(&buf, ",");
-
-		if (!token)
-			break;
 
+	while ((token = strsep(&buf, ","))) {
 		if (*token) {
-			/* Restarting syscalls requires that we stop them first */
-			if (disable_first)
+			if (enable) {
+				if (ftrace_set_clr_event(tr, token, 1))
+					pr_warn("Failed to enable trace event: %s\n", token);
+			} else {
 				ftrace_set_clr_event(tr, token, 0);
-
-			ret = ftrace_set_clr_event(tr, token, 1);
-			if (ret)
-				pr_warn("Failed to enable trace event: %s\n", token);
+			}
 		}
 
 		/* Put back the comma to allow this to be called again */
@@ -4696,6 +4694,32 @@ early_enable_events(struct trace_array *tr, char *buf, bool disable_first)
 	}
 }
 
+/**
+ * early_enable_events - enable events from the bootup buffer
+ * @tr: The trace array to enable the events in
+ * @buf: The buffer containing the comma separated list of events
+ * @disable_first: If true, disable all events in @buf before enabling them
+ *
+ * This function enables events from the bootup buffer. If @disable_first
+ * is true, it will first disable all events in the buffer before enabling
+ * them.
+ *
+ * For syscall events, which rely on a global refcount to register the
+ * SYSCALL_WORK_SYSCALL_TRACEPOINT flag (especially for pid 1), we must
+ * ensure the refcount hits zero before re-enabling them. A simple
+ * "disable then enable" per-event is not enough if multiple syscalls are
+ * used, as the refcount will stay above zero. Thus, we need a two-phase
+ * approach: disable all, then enable all.
+ */
+__init void
+early_enable_events(struct trace_array *tr, char *buf, bool disable_first)
+{
+	if (disable_first)
+		__early_set_events(tr, buf, false);
+
+	__early_set_events(tr, buf, true);
+}
+
 static __init int event_trace_enable(void)
 {
 	struct trace_array *tr = top_trace_array();
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 3d8239fee004..0d2d3a2ea7dd 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -400,14 +400,19 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
 				      struct fgraph_ops *gops,
 				      struct ftrace_regs *fregs)
 {
+	unsigned long *task_var = fgraph_get_task_var(gops);
 	struct fgraph_times *ftimes;
 	struct trace_array *tr;
+	unsigned int trace_ctx;
+	u64 calltime, rettime;
 	int size;
 
+	rettime = trace_clock_local();
+
 	ftrace_graph_addr_finish(gops, trace);
 
-	if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
-		trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
+	if (*task_var & TRACE_GRAPH_NOTRACE) {
+		*task_var &= ~TRACE_GRAPH_NOTRACE;
 		return;
 	}
 
@@ -418,11 +423,13 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
 	tr = gops->private;
 	handle_nosleeptime(tr, trace, ftimes, size);
 
-	if (tracing_thresh &&
-	    (trace_clock_local() - ftimes->calltime < tracing_thresh))
+	calltime = ftimes->calltime;
+
+	if (tracing_thresh && (rettime - calltime < tracing_thresh))
 		return;
-	else
-		trace_graph_return(trace, gops, fregs);
+
+	trace_ctx = tracing_gen_ctx();
+	__trace_graph_return(tr, trace, trace_ctx, calltime, rettime);
 }
 
 static struct fgraph_ops funcgraph_ops = {


* Re: [GIT PULL] tracing: Fixes for 7.0
  2026-03-05 15:39 Steven Rostedt
@ 2026-03-05 16:44 ` Linus Torvalds
  2026-03-05 16:52   ` Steven Rostedt
                     ` (2 more replies)
  2026-03-05 19:43 ` pr-tracker-bot
From: Linus Torvalds @ 2026-03-05 16:44 UTC
  To: Steven Rostedt, David Hildenbrand, Jason Gunthorpe,
	Leon Romanovsky
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Huiwen He, Jerome Marchand,
	Qing Wang, Shengming Hu, Linux-MM, linux-rdma

[ Adding linux-mm and the rdma people ]

On Thu, 5 Mar 2026 at 07:39, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> - Fix accounting of the memory mapped ring buffer on fork
>
>   Memory mapping the ftrace ring buffer sets the vm_flags to DONTCOPY. But
>   this does not prevent the application from calling madvise(MADVISE_DOFORK).

I wonder how many other things have this assumption.

Now, many (most?) of the VM_DONTCOPY users set VM_IO too, because the
most common reason they don't want to be copied is that it's a special
mapping.

And then madvise() does nothing.

But I also get the feeling that the whole *reason* for MADV_DOFORK
existing in the first place simply doesn't exist any more.

It was added two decades ago as a hack for the rdma people who
wanted to mix fork (with COW) and concurrent DMA, which just didn't
work reliably because the COW would break either way.

See commit f822566165dd ("[PATCH] madvise MADV_DONTFORK/MADV_DOFORK").

And that should just not be an issue any more thanks to how it's now
done with page pinning rather than with the old GUP interfaces.

So while I've pulled the tracing fix, I get the feeling that people
should at least think about just making MADV_{DO,DONT}FORK go away.

Now, Debian code search does show some users (libfabric, libibverbs),
and maybe they actually want the forking behavior for other reasons
too.

But I get the feeling that maybe we should at least limit MADV_DOFORK
only to the case where the *source* of the DONTFORK was the user, not
some kernel mapping.

                  Linus


* Re: [GIT PULL] tracing: Fixes for 7.0
  2026-03-05 16:44 ` Linus Torvalds
@ 2026-03-05 16:52   ` Steven Rostedt
  2026-03-05 17:00   ` David Hildenbrand (Arm)
  2026-03-05 19:07   ` Jason Gunthorpe
From: Steven Rostedt @ 2026-03-05 16:52 UTC
  To: Linus Torvalds
  Cc: David Hildenbrand, Jason Gunthorpe, Leon Romanovsky,
	Masami Hiramatsu, Mathieu Desnoyers, Huiwen He, Jerome Marchand,
	Qing Wang, Shengming Hu, Linux-MM, linux-rdma, Lorenzo Stoakes

On Thu, 5 Mar 2026 08:44:27 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> [ Adding linux-mm and the rdma people ]
> 
> On Thu, 5 Mar 2026 at 07:39, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > - Fix accounting of the memory mapped ring buffer on fork
> >
> >   Memory mapping the ftrace ring buffer sets the vm_flags to DONTCOPY. But
> >   this does not prevent the application from calling madvise(MADVISE_DOFORK).  
> 
> I wonder how many other things have this assumption.
> 
> Now, many (most?) of the VM_DONTCOPY users set VM_IO too, because the
> most common reason they don't want to be copied is that it's a special
> mapping.
> 
> And then madvise() does nothing.
> 
> But I also get the feeling that the whole *reason* for MADV_DOFORK
> existing in the first place simply doesn't exist any more.
> 
> It was added two decades ago when as a hack for the rdma people who
> wanted to mix fork (with COW) and concurrent DMA, which just didn't
> work reliably because the COW would break either way.
> 
> See commit f822566165dd ("[PATCH] madvise MADV_DONTFORK/MADV_DOFORK").
> 
> And that should just not be an issue any more thanks to how it's now
> done with page pinning rather than with the old GUP interfaces.
> 
> So while I've pulled the tracing fix, I get the feeling that people
> should at least think about just making MADV_{DO,DONT}FORK go away.
> 
> Now, Debian code search does show some users (libfabric, libibverbs),
> and maybe they actually want the forking behavior for other reasons
> too.
> 
> But I get the feeling that maybe we should at least limit MADV_DOFORK
> only to the case where the *source* of the DONTFORK was the user, not
> some kernel mapping.

Right, I was a bit confused when I saw this too. I asked the memory folks
about adding the VM_IO, and Lorenzo suggested against it. You can read the
discussion here:

   https://lore.kernel.org/all/e4deff21-2fb5-4f37-a7d3-ede5f69a4489@lucifer.local/

-- Steve



* Re: [GIT PULL] tracing: Fixes for 7.0
  2026-03-05 16:44 ` Linus Torvalds
  2026-03-05 16:52   ` Steven Rostedt
@ 2026-03-05 17:00   ` David Hildenbrand (Arm)
  2026-03-05 17:17     ` Linus Torvalds
  2026-03-05 19:07   ` Jason Gunthorpe
From: David Hildenbrand (Arm) @ 2026-03-05 17:00 UTC
  To: Linus Torvalds, Steven Rostedt, Jason Gunthorpe, Leon Romanovsky
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Huiwen He, Jerome Marchand,
	Qing Wang, Shengming Hu, Linux-MM, linux-rdma

On 3/5/26 17:44, Linus Torvalds wrote:
> [ Adding linux-mm and the rdma people ]
> 
> On Thu, 5 Mar 2026 at 07:39, Steven Rostedt <rostedt@goodmis.org> wrote:
>>
>> - Fix accounting of the memory mapped ring buffer on fork
>>
>>   Memory mapping the ftrace ring buffer sets the vm_flags to DONTCOPY. But
>>   this does not prevent the application from calling madvise(MADVISE_DOFORK).
> 
> I wonder how many other things have this assumption.
> 
> Now, many (most?) of the VM_DONTCOPY users set VM_IO too, because the
> most common reason they don't want to be copied is that it's a special
> mapping.
> 
> And then madvise() does nothing.
> 
> But I also get the feeling that the whole *reason* for MADV_DOFORK
> existing in the first place simply doesn't exist any more.
> 
> It was added two decades ago when as a hack for the rdma people who
> wanted to mix fork (with COW) and concurrent DMA, which just didn't
> work reliably because the COW would break either way.

Yes.

> 
> See commit f822566165dd ("[PATCH] madvise MADV_DONTFORK/MADV_DOFORK").
> 
> And that should just not be an issue any more thanks to how it's now
> done with page pinning rather than with the old GUP interfaces.
> 

There are still weird cases with O_DIRECT and concurrent fork, where we
don't use FOLL_PIN just yet (I know, I know, ... all shaky).

QEMU traditionally sets MADV_DONTFORK on guest RAM. One reason is to
speed up fork(), because it doesn't need all the guest RAM in fork'ed
child processes.

> So while I've pulled the tracing fix, I get the feeling that people
> should at least think about just making MADV_{DO,DONT}FORK go away.
> 
> Now, Debian code search does show some users (libfabric, libibverbs),
> and maybe they actually want the forking behavior for other reasons
> too.
> 

I suspect we cannot get rid of it that easily. But ...

> But I get the feeling that maybe we should at least limit MADV_DOFORK
> only to the case where the *source* of the DONTFORK was the user, not
> some kernel mapping.
... that makes sense. Forbid toggling it on something that has
VM_SPECIAL set, maybe.

Or at least forbid re-enabling it for such mappings.

-- 
Cheers,

David



* Re: [GIT PULL] tracing: Fixes for 7.0
  2026-03-05 17:00   ` David Hildenbrand (Arm)
@ 2026-03-05 17:17     ` Linus Torvalds
  2026-03-05 18:59       ` David Hildenbrand (Arm)
From: Linus Torvalds @ 2026-03-05 17:17 UTC
  To: David Hildenbrand (Arm)
  Cc: Steven Rostedt, Jason Gunthorpe, Leon Romanovsky,
	Masami Hiramatsu, Mathieu Desnoyers, Huiwen He, Jerome Marchand,
	Qing Wang, Shengming Hu, Linux-MM, linux-rdma

On Thu, 5 Mar 2026 at 09:00, David Hildenbrand (Arm) <david@kernel.org> wrote:
>
> QEMU traditionally sets MADV_DONTFORK on guest RAM. One reason is to
> speed up fork(), because it doesn't need all the guest RAM in fork'ed
> child processes.

Yes, I think the MADV_DONTFORK thing makes sense on its own - more so
than MADV_DOFORK does.

Because it's a very valid thing for user space to do exactly for that
"speed up fork()" case.

It's similar to how we also export a MADV_WIPEONFORK - for a different
use-case, where we don't want the copying behavior (typically because
we want the child to re-create its own set of data: I think the main
reason tends to be for things like reseeding random number generation
after fork etc).

So it's just MADV_DOFORK I don't particularly like, because it had
pre-existing kernel semantics (the VM_DONTCOPY bit predates the MADV_*
bits by many many years).

Not copying on fork is always safe. But copying something that the
kernel has said "don't copy" just sounds *wrong*.

> > But I get the feeling that maybe we should at least limit MADV_DOFORK
> > only to the case where the *source* of the DONTFORK was the user, not
> > some kernel mapping.
>
> ... that makes sense. Forbid toggling it on something that has
> VM_SPECIAL set, maybe.

Yeah, I think VM_SPECIAL would be a better match than just checking
VM_IO.  At least it would also catch things like that VM_DONTEXPAND,
and PFN mappings.

So just changing the existing VM_IO test to cover all the VM_SPECIAL
bits would be a simple improvement.

Maybe I should just do that and see if anybody even notices (and
revert and re-think if somebody does)

                 Linus


* Re: [GIT PULL] tracing: Fixes for 7.0
  2026-03-05 17:17     ` Linus Torvalds
@ 2026-03-05 18:59       ` David Hildenbrand (Arm)
  2026-03-06 10:33         ` Lorenzo Stoakes (Oracle)
  2026-03-06 16:50         ` Linus Torvalds
  0 siblings, 2 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-05 18:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Jason Gunthorpe, Leon Romanovsky,
	Masami Hiramatsu, Mathieu Desnoyers, Huiwen He, Jerome Marchand,
	Qing Wang, Shengming Hu, Linux-MM, linux-rdma, Lorenzo Stoakes

On 3/5/26 18:17, Linus Torvalds wrote:
> On Thu, 5 Mar 2026 at 09:00, David Hildenbrand (Arm) <david@kernel.org> wrote:
>>
>> QEMU traditionally sets MADV_DONTFORK on guest RAM. One reason is to
>> speed up fork(), because it doesn't need all the guest RAM in fork'ed
>> child processes.
> 
> Yes, I think the MADV_DONTFORK thing makes sense on its own - more so
> than MADV_DOFORK does.
> 
> Because it's a very valid thing for user space to do exactly for that
> "speed up fork()" case.
> 
> It's similar to how we also export a MADV_WIPEONFORK - for a different
> use-case, where we don't want the copying behavior (typically because
> we want the child to re-create its own set of data: I think the main
> reason tends to be for things like reseeding random number generation
> after fork etc).
> 
> So it's just MADV_DOFORK I don't particularly like, because it had
> pre-existing kernel semantics (the VM_DONTCOPY bit predates the MADV_*
> bits by many many years).
> 
> Not copying on fork is always safe. But copying something that the
> kernel has said "don't copy" just sounds *wrong*.
> 
>>> But I get the feeling that maybe we should at least limit MADV_DOFORK
>>> only to the case where the *source* of the DONTFORK was the user, not
>>> some kernel mapping.
>>
>> ... that makes sense. Forbid toggling it on something that has
>> VM_SPECIAL set, maybe.

CCing Lorenzo.

> 
> Yeah, I think VM_SPECIAL would be a better match than just checking
> VM_IO.  At least it would also catch things like that VM_DONTEXPAND,
> and PFN mappings.
> 
> So just changing the existing VM_IO test to cover all the VM_SPECIAL
> bits would be a simple improvement.

Ack.

> 
> Maybe I should just do that and see if anybody even notices (and
> revert and re-think if somebody does)

Agreed. We could think about letting it sit a bit in -next before moving
it to mainline.

-- 
Cheers,

David



* Re: [GIT PULL] tracing: Fixes for 7.0
  2026-03-05 16:44 ` Linus Torvalds
  2026-03-05 16:52   ` Steven Rostedt
  2026-03-05 17:00   ` David Hildenbrand (Arm)
@ 2026-03-05 19:07   ` Jason Gunthorpe
  2 siblings, 0 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2026-03-05 19:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, David Hildenbrand, Leon Romanovsky,
	Masami Hiramatsu, Mathieu Desnoyers, Huiwen He, Jerome Marchand,
	Qing Wang, Shengming Hu, Linux-MM, linux-rdma

On Thu, Mar 05, 2026 at 08:44:27AM -0800, Linus Torvalds wrote:
> Now, Debian code search does show some users (libfabric, libibverbs),
> and maybe they actually want the forking behavior for other reasons
> too.

DOFORK in libibverbs is a consequence of its automatic DONTFORK; it is
explicitly there only to undo a prior DONTFORK.

This is because the library wrappers would automatically DONTFORK
regions of memory based on library usage, and then DOFORK them back to
normal after the library is done with them.

This whole path is disabled on modern kernels in favour of the fixed
fork support.

I think your point that DOFORK should only take action on previous
DONTFORK is correct.

Jason


* Re: [GIT PULL] tracing: Fixes for 7.0
  2026-03-05 15:39 Steven Rostedt
  2026-03-05 16:44 ` Linus Torvalds
@ 2026-03-05 19:43 ` pr-tracker-bot
  1 sibling, 0 replies; 14+ messages in thread
From: pr-tracker-bot @ 2026-03-05 19:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, LKML, Masami Hiramatsu, Mathieu Desnoyers,
	Huiwen He, Jerome Marchand, Qing Wang, Shengming Hu

The pull request you sent on Thu, 5 Mar 2026 10:39:41 -0500:

> git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git trace-v7.0-rc2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/18ecff396c9edb9add34f612d9fb99bb34833cc0

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


* Re: [GIT PULL] tracing: Fixes for 7.0
  2026-03-05 18:59       ` David Hildenbrand (Arm)
@ 2026-03-06 10:33         ` Lorenzo Stoakes (Oracle)
  2026-03-06 16:50         ` Linus Torvalds
  1 sibling, 0 replies; 14+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-06 10:33 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Linus Torvalds, Steven Rostedt, Jason Gunthorpe, Leon Romanovsky,
	Masami Hiramatsu, Mathieu Desnoyers, Huiwen He, Jerome Marchand,
	Qing Wang, Shengming Hu, Linux-MM, linux-rdma, Lorenzo Stoakes

On Thu, Mar 05, 2026 at 07:59:14PM +0100, David Hildenbrand (Arm) wrote:
> On 3/5/26 18:17, Linus Torvalds wrote:
> > On Thu, 5 Mar 2026 at 09:00, David Hildenbrand (Arm) <david@kernel.org> wrote:
> >>
> >> QEMU traditionally sets MADV_DONTFORK on guest RAM. One reason is to
> >> speed up fork(), because it doesn't need all the guest RAM in fork'ed
> >> child processes.
> >
> > Yes, I think the MADV_DONTFORK thing makes sense on its own - more so
> > than MADV_DOFORK does.
> >
> > Because it's a very valid thing for user space to do exactly for that
> > "speed up fork()" case.
> >
> > It's similar to how we also export a MADV_WIPEONFORK - for a different
> > use-case, where we don't want the copying behavior (typically because
> > we want the child to re-create its own set of data: I think the main
> > reason tends to be for things like reseeding random number generation
> > after fork etc).
> >
> > So it's just MADV_DOFORK I don't particularly like, because it had
> > pre-existing kernel semantics (the VM_DONTCOPY bit predates the MADV_*
> > bits by many many years).
> >
> > Not copying on fork is always safe. But copying something that the
> > kernel has said "don't copy" just sounds *wrong*.
> >
> >>> But I get the feeling that maybe we should at least limit MADV_DOFORK
> >>> only to the case where the *source* of the DONTFORK was the user, not
> >>> some kernel mapping.
> >>
> >> ... that makes sense. Forbid toggling it on something that has
> >> VM_SPECIAL set, maybe.

Yes, I agree. It's odd that we explicitly gate on VM_IO there, it's unusual.

>
> CCing Lorenzo.
>
> >
> > Yeah, I think VM_SPECIAL would be a better match than just checking
> > VM_IO.  At least it would also catch things like that VM_DONTEXPAND,
> > and PFN mappings.

Some of the madvise() operations explicitly work with PFN mappings, but it
really makes no sense to fiddle with them in this case.

> >
> > So just changing the existing VM_IO test to cover all the VM_SPECIAL
> > bits would be a simple improvement.
>
> Ack.


Feel free to add:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

To a patch that simply does something like:

diff --git a/mm/madvise.c b/mm/madvise.c
index c0370d9b4e23..dbb69400786d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1389,7 +1389,7 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)
 		new_flags |= VM_DONTCOPY;
 		break;
 	case MADV_DOFORK:
-		if (new_flags & VM_IO)
+		if (new_flags & VM_SPECIAL)
 			return -EINVAL;
 		new_flags &= ~VM_DONTCOPY;
 		break;
--
2.53.0

That makes me wonder whether we want to permit VM_DONTFORK for
MADV_DONTFORK at all. It's kind of a weird use case, but this is the safer
change for now, as I think it's pretty obviously sane.

>
> >
> > Maybe I should just do that and see if anybody even notices (and
> > revert and re-think if somebody does)
>
> Agreed. We could think about letting it sit a bit in -next before moving
> it to mainline.

I would eat my hat, board a flying pig and note the sound of several trees
falling when there's nobody around if anybody complained :)

>
> --
> Cheers,
>
> David

Cheers, Lorenzo



* Re: [GIT PULL] tracing: Fixes for 7.0
  2026-03-05 18:59       ` David Hildenbrand (Arm)
  2026-03-06 10:33         ` Lorenzo Stoakes (Oracle)
@ 2026-03-06 16:50         ` Linus Torvalds
  2026-03-06 16:58           ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2026-03-06 16:50 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Steven Rostedt, Jason Gunthorpe, Leon Romanovsky,
	Masami Hiramatsu, Mathieu Desnoyers, Huiwen He, Jerome Marchand,
	Qing Wang, Shengming Hu, Linux-MM, linux-rdma, Lorenzo Stoakes

On Thu, 5 Mar 2026 at 10:59, David Hildenbrand (Arm) <david@kernel.org> wrote:
>
> Agreed. We could think about letting it sit a bit in -next before moving
> it to mainline.

Honestly, I doubt it would get any testing in -next. Yes, -next gets
some boot testing and maybe on a good day somebody runs LTP or some
other test suite on it, but almost nobody actually *uses* it.

I think I'll just apply this now while it's early, and see if anybody
notices. Lorenzo will apparently be shitting in the woods if they do.

             Linus



* Re: [GIT PULL] tracing: Fixes for 7.0
  2026-03-06 16:50         ` Linus Torvalds
@ 2026-03-06 16:58           ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-06 16:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Jason Gunthorpe, Leon Romanovsky,
	Masami Hiramatsu, Mathieu Desnoyers, Huiwen He, Jerome Marchand,
	Qing Wang, Shengming Hu, Linux-MM, linux-rdma, Lorenzo Stoakes

On 3/6/26 17:50, Linus Torvalds wrote:
> On Thu, 5 Mar 2026 at 10:59, David Hildenbrand (Arm) <david@kernel.org> wrote:
>>
>> Agreed. We could think about letting it sit a bit in -next before moving
>> it to mainline.
> 
> Honestly, I doubt it would get any testing in -next. Yes, -next gets
> some boot testing and maybe on a good day somebody runs LTP or some
> other test suite on it, but almost nobody actually *uses* it.

It does get some testing, but yes, it's not the go-to mechanism to get
some enterprise workload-level feedback :)

> 
> I think I'll just apply this now while it's early, and see if anybody
> notices. Lorenzo will apparently be shitting in the woods if they do.

:D :D :D

Works for me!

-- 
Cheers,

David



* [GIT PULL] tracing: Fixes for 7.0
@ 2026-03-22 15:47 Steven Rostedt
  2026-03-22 18:45 ` pr-tracker-bot
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2026-03-22 15:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Andrew Morton,
	Jiri Olsa, Xuewen Yan

Linus,

tracing fixes for 7.0:

- Revert "tracing: Remove pid in task_rename tracing output"

  A change was made to remove the pid field from the task_rename event on
  the assumption that the event is always emitted for the current task, so
  recording the pid would be redundant. This turned out to be incorrect:
  there are a few corner cases where the renamed task is not the current
  task, and removing the field caused some regressions in tooling.

- Fix the reading from user space for migration

  The reading of user space uses seqlock-like logic: it uses a per-cpu
  temporary buffer, disables migration, enables preemption, does the copy
  from user space, disables preemption, re-enables migration, and then
  checks whether there were any context switches while preemption was
  enabled. If there was a context switch, the per-cpu buffer is considered
  possibly corrupted and the copy is retried. As a protection against a
  live lock, if it takes a hundred tries, a warning is issued and the read
  bails out.

  This was triggered when the load balancer selected the task to be
  migrated to another CPU. Every time preemption was enabled, the migration
  thread would schedule in, try to migrate the task, fail because migration
  was disabled, and let the task run again. As a result the task was
  scheduled out every time it enabled preemption, and the loop never
  succeeded (until the 100-iteration check triggered).

  Fix this by briefly enabling and then disabling preemption, with
  migration left enabled, whenever the read from user space has to be
  retried. This lets the migration thread migrate the task, and the copy
  from user space will likely succeed on the next iteration.

- Fix trace_marker copy option freeing

  The "copy_trace_marker" option allows a tracing instance to get a copy of
  every write to the trace_marker file of the top-level instance. This is
  managed by a linked list protected by RCU. When an instance is removed, a
  check is made whether the option is set, and if so, synchronize_rcu() is
  called. The problem is that the loop that resets all the flags to what
  they were when the instance was created (to perform cleanups) ran before
  the copy_trace_marker check. That loop cleared the option, so
  synchronize_rcu() was never called.

  Move the clearing of all the flags after the copy_trace_marker check, so
  that the option is still set (if it was set before) when it is checked,
  and synchronize_rcu() is performed.

- Fix entries setting when validating the persistent ring buffer

  When validating the persistent ring buffer on boot up, the validator
  records the number of events found in each sub-buffer. It was updating
  cpu_buffer->head_page (the first sub-buffer of the per-cpu buffer) and
  not the "head_page" variable that was iterating over the sub-buffers. As
  a result, the first sub-buffer was assigned the entry count of every
  sub-buffer in turn, while the sub-buffers that were supposed to be
  updated were not.

- Use "hash" value to update the direct callers

  When updating the ftrace direct callers, the code assigned a temporary
  callback to all the functions of the ftrace_ops, not just the functions
  represented by the passed-in hash. This causes an unnecessary slowdown of
  the ftrace_ops functions that are not being modified. Only switch the
  functions that are going to be modified over to the ftrace loop function,
  so that the update can be made on just those functions.

Please pull the latest trace-v7.0-rc4 tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git trace-v7.0-rc4

Tag SHA1: c1d7d0804e221b6c0789184efcb354ccea104f2f
Head SHA1: 50b35c9e50a865600344ab1d8f9a8b3384d7e63d

Jiri Olsa (1):
      ftrace: Use hash argument for tmp_ops in update_ftrace_direct_mod

Masami Hiramatsu (Google) (1):
      ring-buffer: Fix to update per-subbuf entries of persistent ring buffer

Steven Rostedt (2):
      tracing: Fix failure to read user space from system call trace events
      tracing: Fix trace_marker copy link list updates

Xuewen Yan (1):
      tracing: Revert "tracing: Remove pid in task_rename tracing output"

----
 include/trace/events/task.h |  7 +++++--
 kernel/trace/ftrace.c       |  4 ++--
 kernel/trace/ring_buffer.c  |  2 +-
 kernel/trace/trace.c        | 36 +++++++++++++++++++++++++++---------
 4 files changed, 35 insertions(+), 14 deletions(-)
---------------------------
diff --git a/include/trace/events/task.h b/include/trace/events/task.h
index 4f0759634306..b9a129eb54d9 100644
--- a/include/trace/events/task.h
+++ b/include/trace/events/task.h
@@ -38,19 +38,22 @@ TRACE_EVENT(task_rename,
 	TP_ARGS(task, comm),

 	TP_STRUCT__entry(
+		__field(	pid_t,	pid)
 		__array(	char, oldcomm,  TASK_COMM_LEN)
 		__array(	char, newcomm,  TASK_COMM_LEN)
 		__field(	short,	oom_score_adj)
 	),

 	TP_fast_assign(
+		__entry->pid = task->pid;
 		memcpy(entry->oldcomm, task->comm, TASK_COMM_LEN);
 		strscpy(entry->newcomm, comm, TASK_COMM_LEN);
 		__entry->oom_score_adj = task->signal->oom_score_adj;
 	),

-	TP_printk("oldcomm=%s newcomm=%s oom_score_adj=%hd",
-		  __entry->oldcomm, __entry->newcomm, __entry->oom_score_adj)
+	TP_printk("pid=%d oldcomm=%s newcomm=%s oom_score_adj=%hd",
+		__entry->pid, __entry->oldcomm,
+		__entry->newcomm, __entry->oom_score_adj)
 );

 /**
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8df69e702706..413310912609 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6606,9 +6606,9 @@ int update_ftrace_direct_mod(struct ftrace_ops *ops, struct ftrace_hash *hash, b
 	if (!orig_hash)
 		goto unlock;

-	/* Enable the tmp_ops to have the same functions as the direct ops */
+	/* Enable the tmp_ops to have the same functions as the hash object. */
 	ftrace_ops_init(&tmp_ops);
-	tmp_ops.func_hash = ops->func_hash;
+	tmp_ops.func_hash->filter_hash = hash;

 	err = register_ftrace_function_nolock(&tmp_ops);
 	if (err)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 17d0ea0cc3e6..170170bd83bd 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2053,7 +2053,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)

 		entries += ret;
 		entry_bytes += local_read(&head_page->page->commit);
-		local_set(&cpu_buffer->head_page->entries, ret);
+		local_set(&head_page->entries, ret);

 		if (head_page == cpu_buffer->commit_page)
 			break;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ebd996f8710e..a626211ceb9a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -555,7 +555,7 @@ static bool update_marker_trace(struct trace_array *tr, int enabled)
 	lockdep_assert_held(&event_mutex);

 	if (enabled) {
-		if (!list_empty(&tr->marker_list))
+		if (tr->trace_flags & TRACE_ITER(COPY_MARKER))
 			return false;

 		list_add_rcu(&tr->marker_list, &marker_copies);
@@ -563,10 +563,10 @@ static bool update_marker_trace(struct trace_array *tr, int enabled)
 		return true;
 	}

-	if (list_empty(&tr->marker_list))
+	if (!(tr->trace_flags & TRACE_ITER(COPY_MARKER)))
 		return false;

-	list_del_init(&tr->marker_list);
+	list_del_rcu(&tr->marker_list);
 	tr->trace_flags &= ~TRACE_ITER(COPY_MARKER);
 	return true;
 }
@@ -6783,6 +6783,23 @@ char *trace_user_fault_read(struct trace_user_buf_info *tinfo,
 	 */

 	do {
+		/*
+		 * It is possible that something is trying to migrate this
+		 * task. What happens then, is when preemption is enabled,
+		 * the migration thread will preempt this task, try to
+		 * migrate it, fail, then let it run again. That will
+		 * cause this to loop again and never succeed.
+		 * On failures, enabled and disable preemption with
+		 * migration enabled, to allow the migration thread to
+		 * migrate this task.
+		 */
+		if (trys) {
+			preempt_enable_notrace();
+			preempt_disable_notrace();
+			cpu = smp_processor_id();
+			buffer = per_cpu_ptr(tinfo->tbuf, cpu)->buf;
+		}
+
 		/*
 		 * If for some reason, copy_from_user() always causes a context
 		 * switch, this would then cause an infinite loop.
@@ -9744,18 +9761,19 @@ static int __remove_instance(struct trace_array *tr)

 	list_del(&tr->list);

-	/* Disable all the flags that were enabled coming in */
-	for (i = 0; i < TRACE_FLAGS_MAX_SIZE; i++) {
-		if ((1ULL << i) & ZEROED_TRACE_FLAGS)
-			set_tracer_flag(tr, 1ULL << i, 0);
-	}
-
 	if (printk_trace == tr)
 		update_printk_trace(&global_trace);

+	/* Must be done before disabling all the flags */
 	if (update_marker_trace(tr, 0))
 		synchronize_rcu();

+	/* Disable all the flags that were enabled coming in */
+	for (i = 0; i < TRACE_FLAGS_MAX_SIZE; i++) {
+		if ((1ULL << i) & ZEROED_TRACE_FLAGS)
+			set_tracer_flag(tr, 1ULL << i, 0);
+	}
+
 	tracing_set_nop(tr);
 	clear_ftrace_function_probes(tr);
 	event_trace_del_tracer(tr);


* Re: [GIT PULL] tracing: Fixes for 7.0
  2026-03-22 15:47 [GIT PULL] tracing: Fixes for 7.0 Steven Rostedt
@ 2026-03-22 18:45 ` pr-tracker-bot
  0 siblings, 0 replies; 14+ messages in thread
From: pr-tracker-bot @ 2026-03-22 18:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, LKML, Masami Hiramatsu, Mathieu Desnoyers,
	Andrew Morton, Jiri Olsa, Xuewen Yan

The pull request you sent on Sun, 22 Mar 2026 11:47:05 -0400:

> git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git trace-v7.0-rc4

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/ac57fa9faf716c6a0e30128c2c313443cf633019

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


end of thread, other threads:[~2026-03-22 18:46 UTC | newest]

Thread overview: 14+ messages
2026-03-22 15:47 [GIT PULL] tracing: Fixes for 7.0 Steven Rostedt
2026-03-22 18:45 ` pr-tracker-bot
  -- strict thread matches above, loose matches on Subject: below --
2026-03-05 15:39 Steven Rostedt
2026-03-05 16:44 ` Linus Torvalds
2026-03-05 16:52   ` Steven Rostedt
2026-03-05 17:00   ` David Hildenbrand (Arm)
2026-03-05 17:17     ` Linus Torvalds
2026-03-05 18:59       ` David Hildenbrand (Arm)
2026-03-06 10:33         ` Lorenzo Stoakes (Oracle)
2026-03-06 16:50         ` Linus Torvalds
2026-03-06 16:58           ` David Hildenbrand (Arm)
2026-03-05 19:07   ` Jason Gunthorpe
2026-03-05 19:43 ` pr-tracker-bot
2026-02-19 21:01 Steven Rostedt
