public inbox for linux-kernel@vger.kernel.org
* [for-next][PATCH 00/11] tracing: Updates for v6.18
@ 2025-09-30 17:01 Steven Rostedt
  2025-09-30 17:01 ` [for-next][PATCH 01/11] tracing: Replace syscall RCU pointer assignment with READ/WRITE_ONCE() Steven Rostedt
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Steven Rostedt @ 2025-09-30 17:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton


Bah, I just noticed these were still sitting in my local repository
and I never pushed them up. Mostly clean ups anyway. No new features.

Just finished running them through my internal tests.

-- Steve


  git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
trace/for-next

Head SHA1: db04eea49423f28faf2e2fb6e799b78f2b38564f


Elijah Wright (1):
      tracing: Move buffer in trace_seq to end of struct

Fushuai Wang (1):
      tracing/osnoise: Use for_each_online_cpu() instead of for_each_cpu()

Liao Yuanhong (1):
      tracing: Remove redundant 0 value initialization

Marco Crivellari (1):
      tracing: replace use of system_wq with system_percpu_wq

Michal Koutný (1):
      tracing: Ensure optimized hashing works

Qianfeng Rong (1):
      tracing: Use vmalloc_array() to improve code

Sasha Levin (1):
      tracing: Fix lock imbalance in s_start() memory allocation failure path

Steven Rostedt (2):
      tracing: Replace syscall RCU pointer assignment with READ/WRITE_ONCE()
      tracing: Have syscall trace events show "0x" for values greater than 10

Thorsten Blum (1):
      tracing/osnoise: Replace kmalloc() + copy_from_user() with memdup_user()

Vladimir Riabchun (1):
      ftrace: Fix softlockup in ftrace_module_enable

----
 include/linux/trace_seq.h         |  2 +-
 kernel/trace/ftrace.c             |  2 ++
 kernel/trace/trace.h              |  4 ++--
 kernel/trace/trace_events.c       |  3 +--
 kernel/trace/trace_events_user.c  |  2 +-
 kernel/trace/trace_osnoise.c      | 13 +++++--------
 kernel/trace/trace_sched_switch.c |  3 ++-
 kernel/trace/trace_syscalls.c     | 26 +++++++++++++++-----------
 kernel/trace/tracing_map.c        |  2 +-
 9 files changed, 30 insertions(+), 27 deletions(-)


* [for-next][PATCH 01/11] tracing: Replace syscall RCU pointer assignment with READ/WRITE_ONCE()
  2025-09-30 17:01 [for-next][PATCH 00/11] tracing: Updates for v6.18 Steven Rostedt
@ 2025-09-30 17:01 ` Steven Rostedt
  2025-09-30 17:01 ` [for-next][PATCH 02/11] tracing: Have syscall trace events show "0x" for values greater than 10 Steven Rostedt
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2025-09-30 17:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Peter Zijlstra, Namhyung Kim, Takaya Saeki, Tom Zanussi,
	Thomas Gleixner, Ian Rogers, Douglas Raillard, Paul E. McKenney

From: Steven Rostedt <rostedt@goodmis.org>

The syscall events are pseudo events that hook into the raw syscalls. The
ftrace_syscall_enter/exit() callbacks are called by the raw_syscall
enter/exit tracepoints, respectively, whenever any of the syscall events
are enabled.

The trace_array has an array of syscall "files" that correspond to the
system calls based on their __NR_SYSCALL number. The array is read, and
if an entry holds a pointer to a trace_event_file then that syscall event
is considered enabled; if the entry is NULL, that syscall event is
considered disabled.

Currently it uses rcu_dereference_sched() to read this pointer and
rcu_assign_pointer() or RCU_INIT_POINTER() to write to it. This is
unnecessary, as the file pointer will not go away outside the
synchronization of the tracepoint logic itself, and this code adds no
RCU synchronization of its own.

Replace these functions with a simple READ_ONCE() and WRITE_ONCE() which
is all they need. This will also allow this code to not depend on
preemption being disabled as system call tracepoints are now allowed to
fault.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takaya Saeki <takayas@google.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Link: https://lore.kernel.org/20250923130713.594320290@kernel.org
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace.h          |  4 ++--
 kernel/trace/trace_syscalls.c | 14 ++++++--------
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 5f4bed5842f9..85eabb454bee 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -380,8 +380,8 @@ struct trace_array {
 #ifdef CONFIG_FTRACE_SYSCALLS
 	int			sys_refcount_enter;
 	int			sys_refcount_exit;
-	struct trace_event_file __rcu *enter_syscall_files[NR_syscalls];
-	struct trace_event_file __rcu *exit_syscall_files[NR_syscalls];
+	struct trace_event_file	*enter_syscall_files[NR_syscalls];
+	struct trace_event_file	*exit_syscall_files[NR_syscalls];
 #endif
 	int			stop_count;
 	int			clock_id;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 46aab0ab9350..3a0b65f89130 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -310,8 +310,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
 		return;
 
-	/* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE) */
-	trace_file = rcu_dereference_sched(tr->enter_syscall_files[syscall_nr]);
+	trace_file = READ_ONCE(tr->enter_syscall_files[syscall_nr]);
 	if (!trace_file)
 		return;
 
@@ -356,8 +355,7 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
 		return;
 
-	/* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE()) */
-	trace_file = rcu_dereference_sched(tr->exit_syscall_files[syscall_nr]);
+	trace_file = READ_ONCE(tr->exit_syscall_files[syscall_nr]);
 	if (!trace_file)
 		return;
 
@@ -393,7 +391,7 @@ static int reg_event_syscall_enter(struct trace_event_file *file,
 	if (!tr->sys_refcount_enter)
 		ret = register_trace_sys_enter(ftrace_syscall_enter, tr);
 	if (!ret) {
-		rcu_assign_pointer(tr->enter_syscall_files[num], file);
+		WRITE_ONCE(tr->enter_syscall_files[num], file);
 		tr->sys_refcount_enter++;
 	}
 	mutex_unlock(&syscall_trace_lock);
@@ -411,7 +409,7 @@ static void unreg_event_syscall_enter(struct trace_event_file *file,
 		return;
 	mutex_lock(&syscall_trace_lock);
 	tr->sys_refcount_enter--;
-	RCU_INIT_POINTER(tr->enter_syscall_files[num], NULL);
+	WRITE_ONCE(tr->enter_syscall_files[num], NULL);
 	if (!tr->sys_refcount_enter)
 		unregister_trace_sys_enter(ftrace_syscall_enter, tr);
 	mutex_unlock(&syscall_trace_lock);
@@ -431,7 +429,7 @@ static int reg_event_syscall_exit(struct trace_event_file *file,
 	if (!tr->sys_refcount_exit)
 		ret = register_trace_sys_exit(ftrace_syscall_exit, tr);
 	if (!ret) {
-		rcu_assign_pointer(tr->exit_syscall_files[num], file);
+		WRITE_ONCE(tr->exit_syscall_files[num], file);
 		tr->sys_refcount_exit++;
 	}
 	mutex_unlock(&syscall_trace_lock);
@@ -449,7 +447,7 @@ static void unreg_event_syscall_exit(struct trace_event_file *file,
 		return;
 	mutex_lock(&syscall_trace_lock);
 	tr->sys_refcount_exit--;
-	RCU_INIT_POINTER(tr->exit_syscall_files[num], NULL);
+	WRITE_ONCE(tr->exit_syscall_files[num], NULL);
 	if (!tr->sys_refcount_exit)
 		unregister_trace_sys_exit(ftrace_syscall_exit, tr);
 	mutex_unlock(&syscall_trace_lock);
-- 
2.50.1




* [for-next][PATCH 02/11] tracing: Have syscall trace events show "0x" for values greater than 10
  2025-09-30 17:01 [for-next][PATCH 00/11] tracing: Updates for v6.18 Steven Rostedt
  2025-09-30 17:01 ` [for-next][PATCH 01/11] tracing: Replace syscall RCU pointer assignment with READ/WRITE_ONCE() Steven Rostedt
@ 2025-09-30 17:01 ` Steven Rostedt
  2025-09-30 17:01 ` [for-next][PATCH 03/11] tracing: Use vmalloc_array() to improve code Steven Rostedt
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2025-09-30 17:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Peter Zijlstra, Namhyung Kim, Takaya Saeki, Tom Zanussi,
	Thomas Gleixner, Ian Rogers, Douglas Raillard

From: Steven Rostedt <rostedt@goodmis.org>

Currently the syscall trace events show each value as hexadecimal, but
without adding "0x" it can be confusing:

   sys_write(fd: 4, buf: 0x55c4a1fa9270, count: 44)

Looks like the above write wrote 44 bytes, when in reality it wrote 68
bytes.

Add a "0x" for all values greater than or equal to 10 to remove the ambiguity.
For values less than 10, leave off the "0x" as that just adds noise to the
output.

Also change the iterator to check if "i" is nonzero and print the ", "
delimiter at the start of each parameter, rather than appending it in the
trace_seq_printf() call at the end.
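
The resulting formatting rule can be sketched as a small userspace
helper (format_syscall_arg() is a hypothetical stand-in for the
trace_seq_printf() calls in the diff):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the new rule: values below 10 print as plain decimal
 * (identical in decimal and hex anyway), anything else gets an
 * unambiguous "0x" hex prefix. */
static int format_syscall_arg(char *buf, size_t len,
			      const char *name, unsigned long val)
{
	if (val < 10)
		return snprintf(buf, len, "%s: %lu", name, val);
	return snprintf(buf, len, "%s: 0x%lx", name, val);
}
```

With this, the sys_write example above would render its 44-byte-looking
count as "count: 0x2c", making the 68-byte write unambiguous.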

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takaya Saeki <takayas@google.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Link: https://lore.kernel.org/20250923130713.764558957@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace_syscalls.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 3a0b65f89130..0f932b22f9ec 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -153,14 +153,20 @@ print_syscall_enter(struct trace_iterator *iter, int flags,
 		if (trace_seq_has_overflowed(s))
 			goto end;
 
+		if (i)
+			trace_seq_puts(s, ", ");
+
 		/* parameter types */
 		if (tr && tr->trace_flags & TRACE_ITER_VERBOSE)
 			trace_seq_printf(s, "%s ", entry->types[i]);
 
 		/* parameter values */
-		trace_seq_printf(s, "%s: %lx%s", entry->args[i],
-				 trace->args[i],
-				 i == entry->nb_args - 1 ? "" : ", ");
+		if (trace->args[i] < 10)
+			trace_seq_printf(s, "%s: %lu", entry->args[i],
+					 trace->args[i]);
+		else
+			trace_seq_printf(s, "%s: 0x%lx", entry->args[i],
+					 trace->args[i]);
 	}
 
 	trace_seq_putc(s, ')');
-- 
2.50.1




* [for-next][PATCH 03/11] tracing: Use vmalloc_array() to improve code
  2025-09-30 17:01 [for-next][PATCH 00/11] tracing: Updates for v6.18 Steven Rostedt
  2025-09-30 17:01 ` [for-next][PATCH 01/11] tracing: Replace syscall RCU pointer assignment with READ/WRITE_ONCE() Steven Rostedt
  2025-09-30 17:01 ` [for-next][PATCH 02/11] tracing: Have syscall trace events show "0x" for values greater than 10 Steven Rostedt
@ 2025-09-30 17:01 ` Steven Rostedt
  2025-09-30 17:01 ` [for-next][PATCH 04/11] tracing/osnoise: Use for_each_online_cpu() instead of for_each_cpu() Steven Rostedt
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2025-09-30 17:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Qianfeng Rong

From: Qianfeng Rong <rongqianfeng@vivo.com>

Remove array_size() calls and replace vmalloc() with vmalloc_array() in
tracing_map_sort_entries().  vmalloc_array() is optimized better, uses
fewer instructions, and handles overflow more concisely[1].

[1]: https://lore.kernel.org/lkml/abc66ec5-85a4-47e1-9759-2f60ab111971@vivo.com/
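
The overflow safety that vmalloc_array() provides can be illustrated
with a userspace sketch (alloc_array_checked() is a hypothetical helper;
the kernel's actual implementation differs):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sketch of why an array allocator beats a bare n * size allocation:
 * the multiplication is checked explicitly instead of silently
 * wrapping, so a huge element count fails cleanly with NULL rather
 * than allocating a too-small buffer. */
static void *alloc_array_checked(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* multiplication would overflow */
	return malloc(n * size);
}
```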

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250817084725.59477-1-rongqianfeng@vivo.com
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/tracing_map.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
index 1921ade45be3..7f8da4dab69d 100644
--- a/kernel/trace/tracing_map.c
+++ b/kernel/trace/tracing_map.c
@@ -1076,7 +1076,7 @@ int tracing_map_sort_entries(struct tracing_map *map,
 	struct tracing_map_sort_entry *sort_entry, **entries;
 	int i, n_entries, ret;
 
-	entries = vmalloc(array_size(sizeof(sort_entry), map->max_elts));
+	entries = vmalloc_array(map->max_elts, sizeof(sort_entry));
 	if (!entries)
 		return -ENOMEM;
 
-- 
2.50.1




* [for-next][PATCH 04/11] tracing/osnoise: Use for_each_online_cpu() instead of for_each_cpu()
  2025-09-30 17:01 [for-next][PATCH 00/11] tracing: Updates for v6.18 Steven Rostedt
                   ` (2 preceding siblings ...)
  2025-09-30 17:01 ` [for-next][PATCH 03/11] tracing: Use vmalloc_array() to improve code Steven Rostedt
@ 2025-09-30 17:01 ` Steven Rostedt
  2025-09-30 17:01 ` [for-next][PATCH 05/11] tracing: Move buffer in trace_seq to end of struct Steven Rostedt
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2025-09-30 17:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Fushuai Wang

From: Fushuai Wang <wangfushuai@baidu.com>

Replace the open-coded for_each_cpu(cpu, cpu_online_mask) loop with the
more readable and equivalent for_each_online_cpu(cpu) macro.

Link: https://lore.kernel.org/20250811064158.2456-1-wangfushuai@baidu.com
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace_osnoise.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index fd259da0aa64..4cb464894faf 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -271,7 +271,7 @@ static inline void tlat_var_reset(void)
 	 * So far, all the values are initialized as 0, so
 	 * zeroing the structure is perfect.
 	 */
-	for_each_cpu(cpu, cpu_online_mask) {
+	for_each_online_cpu(cpu) {
 		tlat_var = per_cpu_ptr(&per_cpu_timerlat_var, cpu);
 		if (tlat_var->kthread)
 			hrtimer_cancel(&tlat_var->timer);
@@ -295,7 +295,7 @@ static inline void osn_var_reset(void)
 	 * So far, all the values are initialized as 0, so
 	 * zeroing the structure is perfect.
 	 */
-	for_each_cpu(cpu, cpu_online_mask) {
+	for_each_online_cpu(cpu) {
 		osn_var = per_cpu_ptr(&per_cpu_osnoise_var, cpu);
 		memset(osn_var, 0, sizeof(*osn_var));
 	}
-- 
2.50.1




* [for-next][PATCH 05/11] tracing: Move buffer in trace_seq to end of struct
  2025-09-30 17:01 [for-next][PATCH 00/11] tracing: Updates for v6.18 Steven Rostedt
                   ` (3 preceding siblings ...)
  2025-09-30 17:01 ` [for-next][PATCH 04/11] tracing/osnoise: Use for_each_online_cpu() instead of for_each_cpu() Steven Rostedt
@ 2025-09-30 17:01 ` Steven Rostedt
  2025-09-30 17:01 ` [for-next][PATCH 06/11] tracing: Remove redundant 0 value initialization Steven Rostedt
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2025-09-30 17:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Elijah Wright

From: Elijah Wright <git@elijahs.space>

TRACE_SEQ_BUFFER_SIZE depends on the architecture; on 64-bit systems it is
8148 bytes. The forced 8-byte alignment of the size_t and seq_buf members
means that struct trace_seq ends up being 8200 bytes on 64-bit systems.
Moving the buffer to the end of the struct removes that padding. There
should not be any side effects, as nothing relies on the buffer's position
via pointer arithmetic on trace_seq.
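
The padding effect can be illustrated with a minimal pair of structs
(the buffer size mirrors the 64-bit number above, but these are not the
kernel's definitions and omit the seq_buf member):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal illustration of the padding issue: an odd-sized char buffer
 * placed before 8-byte-aligned members forces alignment padding in the
 * middle of the struct; placed last, the char array needs no alignment
 * and packs tightly against the preceding members. */
#define BUF_SIZE 8148	/* mirrors the 64-bit TRACE_SEQ_BUFFER_SIZE */

struct seq_buffer_first {	/* old layout: buffer first */
	char   buffer[BUF_SIZE];
	size_t readpos;
	int    full;
};

struct seq_buffer_last {	/* new layout: buffer last */
	size_t readpos;
	int    full;
	char   buffer[BUF_SIZE];
};
```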

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250821053917.23301-1-git@elijahs.space
Signed-off-by: Elijah Wright <git@elijahs.space>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 include/linux/trace_seq.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/trace_seq.h b/include/linux/trace_seq.h
index a93ed5ac3226..557780fe1c77 100644
--- a/include/linux/trace_seq.h
+++ b/include/linux/trace_seq.h
@@ -21,10 +21,10 @@
 	(sizeof(struct seq_buf) + sizeof(size_t) + sizeof(int)))
 
 struct trace_seq {
-	char			buffer[TRACE_SEQ_BUFFER_SIZE];
 	struct seq_buf		seq;
 	size_t			readpos;
 	int			full;
+	char                    buffer[TRACE_SEQ_BUFFER_SIZE];
 };
 
 static inline void
-- 
2.50.1




* [for-next][PATCH 06/11] tracing: Remove redundant 0 value initialization
  2025-09-30 17:01 [for-next][PATCH 00/11] tracing: Updates for v6.18 Steven Rostedt
                   ` (4 preceding siblings ...)
  2025-09-30 17:01 ` [for-next][PATCH 05/11] tracing: Move buffer in trace_seq to end of struct Steven Rostedt
@ 2025-09-30 17:01 ` Steven Rostedt
  2025-09-30 17:01 ` [for-next][PATCH 07/11] tracing: replace use of system_wq with system_percpu_wq Steven Rostedt
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2025-09-30 17:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Liao Yuanhong

From: Liao Yuanhong <liaoyuanhong@vivo.com>

The saved_cmdlines_buffer struct is already zeroed by memset(). It's
redundant to initialize s->cmdline_idx to 0.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250825123200.306272-1-liaoyuanhong@vivo.com
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace_sched_switch.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
index cb49f7279dc8..518dfc74347a 100644
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -224,7 +224,6 @@ static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
 	/* Place map_cmdline_to_pid array right after saved_cmdlines */
 	s->map_cmdline_to_pid = (unsigned *)&s->saved_cmdlines[val * TASK_COMM_LEN];
 
-	s->cmdline_idx = 0;
 	memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP,
 	       sizeof(s->map_pid_to_cmdline));
 	memset(s->map_cmdline_to_pid, NO_CMDLINE_MAP,
-- 
2.50.1




* [for-next][PATCH 07/11] tracing: replace use of system_wq with system_percpu_wq
  2025-09-30 17:01 [for-next][PATCH 00/11] tracing: Updates for v6.18 Steven Rostedt
                   ` (5 preceding siblings ...)
  2025-09-30 17:01 ` [for-next][PATCH 06/11] tracing: Remove redundant 0 value initialization Steven Rostedt
@ 2025-09-30 17:01 ` Steven Rostedt
  2025-09-30 17:01 ` [for-next][PATCH 08/11] tracing/osnoise: Replace kmalloc() + copy_from_user() with memdup_user() Steven Rostedt
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2025-09-30 17:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior,
	Michal Hocko, Tejun Heo, Marco Crivellari

From: Marco Crivellari <marco.crivellari@suse.com>

Currently, if a user enqueues a work item using schedule_delayed_work(),
the workqueue used is "system_wq" (a per-CPU wq), while
queue_delayed_work() uses WORK_CPU_UNBOUND (used when a CPU is not
specified). The same applies to schedule_work(), which uses system_wq,
and queue_work(), which again makes use of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the API.

system_wq is a per-CPU workqueue, yet nothing in its name says so, and
that CPU affinity constraint is very often not required by users. Make
it clear by adding a system_percpu_wq.

queue_work() / queue_delayed_work() / mod_delayed_work() will now use the
new per-CPU wq: if a user still sticks to the old name, a warning will be
printed along with a redirect to the new wq.

This patch adds the new system_percpu_wq everywhere except for the mm, fs
and net subsystems, which are handled in separate patches.

The old wq will be kept for a few release cycles.

Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/20250905091040.109772-2-marco.crivellari@suse.com
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace_events_user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index af42aaa3d172..3169182229ad 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -835,7 +835,7 @@ void user_event_mm_remove(struct task_struct *t)
 	 * so we use a work queue after call_rcu() to run within.
 	 */
 	INIT_RCU_WORK(&mm->put_rwork, delayed_user_event_mm_put);
-	queue_rcu_work(system_wq, &mm->put_rwork);
+	queue_rcu_work(system_percpu_wq, &mm->put_rwork);
 }
 
 void user_event_mm_dup(struct task_struct *t, struct user_event_mm *old_mm)
-- 
2.50.1




* [for-next][PATCH 08/11] tracing/osnoise: Replace kmalloc() + copy_from_user() with memdup_user()
  2025-09-30 17:01 [for-next][PATCH 00/11] tracing: Updates for v6.18 Steven Rostedt
                   ` (6 preceding siblings ...)
  2025-09-30 17:01 ` [for-next][PATCH 07/11] tracing: replace use of system_wq with system_percpu_wq Steven Rostedt
@ 2025-09-30 17:01 ` Steven Rostedt
  2025-09-30 17:01 ` [for-next][PATCH 09/11] ftrace: Fix softlockup in ftrace_module_enable Steven Rostedt
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2025-09-30 17:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Thorsten Blum

From: Thorsten Blum <thorsten.blum@linux.dev>

Replace kmalloc() followed by copy_from_user() with memdup_user() to
improve and simplify osnoise_cpus_write().

No functional changes intended.
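
The pattern can be sketched in userspace (memdup_buf() is a simplified
stand-in; the real memdup_user() copies from user space and returns an
ERR_PTR() value such as -ENOMEM or -EFAULT on failure):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace sketch of the memdup pattern: one helper allocates and
 * copies, so the caller makes a single call with a single error check
 * instead of a separate allocation check plus a copy check.  Unlike
 * the kernel's memdup_user(), this returns NULL rather than an
 * ERR_PTR() value on failure. */
static void *memdup_buf(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}
```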

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250905192116.554018-2-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace_osnoise.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index 4cb464894faf..f8aa9556c695 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -2322,12 +2322,9 @@ osnoise_cpus_write(struct file *filp, const char __user *ubuf, size_t count,
 	int running, err;
 	char *buf __free(kfree) = NULL;
 
-	buf = kmalloc(count, GFP_KERNEL);
-	if (!buf)
-		return -ENOMEM;
-
-	if (copy_from_user(buf, ubuf, count))
-		return -EFAULT;
+	buf = memdup_user(ubuf, count);
+	if (IS_ERR(buf))
+		return PTR_ERR(buf);
 
 	if (!zalloc_cpumask_var(&osnoise_cpumask_new, GFP_KERNEL))
 		return -ENOMEM;
-- 
2.50.1




* [for-next][PATCH 09/11] ftrace: Fix softlockup in ftrace_module_enable
  2025-09-30 17:01 [for-next][PATCH 00/11] tracing: Updates for v6.18 Steven Rostedt
                   ` (7 preceding siblings ...)
  2025-09-30 17:01 ` [for-next][PATCH 08/11] tracing/osnoise: Replace kmalloc() + copy_from_user() with memdup_user() Steven Rostedt
@ 2025-09-30 17:01 ` Steven Rostedt
  2025-09-30 17:01 ` [for-next][PATCH 10/11] tracing: Ensure optimized hashing works Steven Rostedt
  2025-09-30 17:01 ` [for-next][PATCH 11/11] tracing: Fix lock imbalance in s_start() memory allocation failure path Steven Rostedt
  10 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2025-09-30 17:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vladimir Riabchun

From: Vladimir Riabchun <ferr.lambarginio@gmail.com>

A soft lockup was observed when loading the amdgpu module. If a module
has a lot of traceable functions, multiple calls to kallsyms_lookup()
can spend too much time in an RCU critical section and, with preemption
disabled, cause a kernel panic. This is the same issue that was fixed in
commit d0b24b4e91fc ("ftrace: Prevent RCU stall on PREEMPT_VOLUNTARY
kernels") and commit 42ea22e754ba ("ftrace: Add cond_resched() to
ftrace_graph_set_hash()").

Fix it the same way by adding cond_resched() in ftrace_module_enable().

Link: https://lore.kernel.org/aMQD9_lxYmphT-up@vova-pc
Signed-off-by: Vladimir Riabchun <ferr.lambarginio@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/ftrace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index a69067367c29..42bd2ba68a82 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -7535,6 +7535,8 @@ void ftrace_module_enable(struct module *mod)
 		if (!within_module(rec->ip, mod))
 			break;
 
+		cond_resched();
+
 		/* Weak functions should still be ignored */
 		if (!test_for_valid_rec(rec)) {
 			/* Clear all other flags. Should not be enabled anyway */
-- 
2.50.1




* [for-next][PATCH 10/11] tracing: Ensure optimized hashing works
  2025-09-30 17:01 [for-next][PATCH 00/11] tracing: Updates for v6.18 Steven Rostedt
                   ` (8 preceding siblings ...)
  2025-09-30 17:01 ` [for-next][PATCH 09/11] ftrace: Fix softlockup in ftrace_module_enable Steven Rostedt
@ 2025-09-30 17:01 ` Steven Rostedt
  2025-09-30 17:01 ` [for-next][PATCH 11/11] tracing: Fix lock imbalance in s_start() memory allocation failure path Steven Rostedt
  10 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2025-09-30 17:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Michal Koutný

From: =?UTF-8?q?Michal=20Koutn=C3=BD?= <mkoutny@suse.com>

If PID_MAX_DEFAULT ever changes, it must remain compatible with the
tracing code's hashing assumptions.
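
The assumption being guarded can be shown in userspace: for a
power-of-two N, masking with N - 1 is the same as taking a modulo
(names mirror the kernel's, but this is an illustrative sketch, not the
kernel code):

```c
#include <assert.h>
#include <stdbool.h>

#define PID_MAX_DEFAULT 0x8000	/* 32768, as in the kernel */

/* What the BUILD_BUG_ON() guarantees: for a power-of-two N,
 * x & (N - 1) == x % N, so the cheap mask below really does fold a
 * PID into the fixed-size cmdline map. */
static bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

static unsigned int fold_pid(unsigned int pid)
{
	return pid & (PID_MAX_DEFAULT - 1);
}
```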

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250924113810.2433478-1-mkoutny@suse.com
Link: https://lore.kernel.org/r/20240409110126.651e94cb@gandalf.local.home/
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace_sched_switch.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
index 518dfc74347a..c46d584ded3b 100644
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -247,6 +247,8 @@ int trace_save_cmdline(struct task_struct *tsk)
 	if (!tsk->pid)
 		return 1;
 
+	BUILD_BUG_ON(!is_power_of_2(PID_MAX_DEFAULT));
+
 	tpid = tsk->pid & (PID_MAX_DEFAULT - 1);
 
 	/*
-- 
2.50.1




* [for-next][PATCH 11/11] tracing: Fix lock imbalance in s_start() memory allocation failure path
  2025-09-30 17:01 [for-next][PATCH 00/11] tracing: Updates for v6.18 Steven Rostedt
                   ` (9 preceding siblings ...)
  2025-09-30 17:01 ` [for-next][PATCH 10/11] tracing: Ensure optimized hashing works Steven Rostedt
@ 2025-09-30 17:01 ` Steven Rostedt
  10 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2025-09-30 17:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Sasha Levin

From: Sasha Levin <sashal@kernel.org>

When s_start() fails to allocate memory for set_event_iter, it returns NULL
before acquiring event_mutex. However, the corresponding s_stop() function
always tries to unlock the mutex, causing a lock imbalance warning:

  WARNING: bad unlock balance detected!
  6.17.0-rc7-00175-g2b2e0c04f78c #7 Not tainted
  -------------------------------------
  syz.0.85611/376514 is trying to release lock (event_mutex) at:
  [<ffffffff8dafc7a4>] traverse.part.0.constprop.0+0x2c4/0x650 fs/seq_file.c:131
  but there are no more locks to release!

The issue was introduced by commit b355247df104 ("tracing: Cache ':mod:'
events for modules not loaded yet") which added the kzalloc() allocation before
the mutex lock, creating a path where s_start() could return without locking
the mutex while s_stop() would still try to unlock it.

Fix this by unconditionally acquiring the mutex immediately after allocation,
regardless of whether the allocation succeeded.
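
The lock/unlock pairing can be sketched in userspace (demo_start(),
demo_stop() and the depth counter are illustrative stand-ins for
s_start(), s_stop() and event_mutex):

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the fix: the start() side takes the lock unconditionally,
 * even when allocation fails, so the unconditional unlock in stop()
 * always balances.  A simple depth counter stands in for the mutex. */
static int lock_depth;

static void event_lock(void)   { lock_depth++; }
static void event_unlock(void) { assert(lock_depth > 0); lock_depth--; }

/* fail_alloc simulates the kzalloc() failure path from the report. */
static void *demo_start(int fail_alloc)
{
	void *iter = fail_alloc ? NULL : malloc(32);

	event_lock();		/* now taken on every path */
	if (!iter)
		return NULL;	/* stop() will still unlock */
	return iter;
}

static void demo_stop(void *iter)
{
	free(iter);		/* free(NULL) is a no-op */
	event_unlock();		/* always balanced */
}
```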

Link: https://lore.kernel.org/20250929113238.3722055-1-sashal@kernel.org
Fixes: b355247df104 ("tracing: Cache ":mod:" events for modules not loaded yet")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace_events.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 9f3e9537417d..e00da4182deb 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1629,11 +1629,10 @@ static void *s_start(struct seq_file *m, loff_t *pos)
 	loff_t l;
 
 	iter = kzalloc(sizeof(*iter), GFP_KERNEL);
+	mutex_lock(&event_mutex);
 	if (!iter)
 		return NULL;
 
-	mutex_lock(&event_mutex);
-
 	iter->type = SET_EVENT_FILE;
 	iter->file = list_entry(&tr->events, struct trace_event_file, list);
 
-- 
2.50.1



