* [for-linus][PATCH 0/8] tracing: Fixes for v6.12 and earlier
@ 2024-10-04 13:56 Steven Rostedt
2024-10-04 13:56 ` [for-linus][PATCH 1/8] tracing: Fix trace_check_vprintf() when tp_printk is used Steven Rostedt
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-10-04 13:56 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
Various fixes for tracing:
- Fix tp_printk crashing the kernel
With the code that can handle a buffer from a previous boot, the
trace_check_vprintf() needed access to the delta of the address
space used by the old buffer and the current buffer. To do so,
the trace_array (tr) parameter was used. But when tp_printk is
enabled on the kernel command line, no trace buffer is used and
the trace event is sent directly to printk(). That meant the tr
field of the iterator descriptor was NULL, and since tp_printk still
uses trace_check_vprintf() it caused a NULL dereference.
- Add ptrace.h include to x86 ftrace file for completeness
- Fix rtla installation when done with out-of-tree build
- Fix the help messages in rtla that were incorrect
- Several fixes to fix races with the timerlat and hwlat code
Several locking issues were discovered with the coordination
between timerlat kthread creation and hotplug. As timerlat has
callbacks from hotplug code to start kthreads when CPUs come online.
There are also locking issues with grabbing the cpu_read_lock()
and the locks within timerlat.
Ben Hutchings (1):
tools/rtla: Fix installation from out-of-tree build
Eder Zulian (1):
rtla: Fix the help text in osnoise and timerlat top tools
Sami Tolvanen (1):
x86/ftrace: Include <asm/ptrace.h>
Steven Rostedt (1):
tracing: Fix trace_check_vprintf() when tp_printk is used
Wei Li (4):
tracing/timerlat: Fix duplicated kthread creation due to CPU online/offline
tracing/timerlat: Drop interface_lock in stop_kthread()
tracing/timerlat: Fix a race during cpuhp processing
tracing/hwlat: Fix a race during cpuhp processing
----
arch/x86/include/asm/ftrace.h | 2 ++
kernel/trace/trace.c | 15 +++++++++++++--
kernel/trace/trace_hwlat.c | 2 ++
kernel/trace/trace_osnoise.c | 22 +++++++++++++---------
tools/tracing/rtla/Makefile.rtla | 2 +-
tools/tracing/rtla/src/osnoise_top.c | 2 +-
tools/tracing/rtla/src/timerlat_top.c | 4 ++--
7 files changed, 34 insertions(+), 15 deletions(-)
^ permalink raw reply [flat|nested] 9+ messages in thread
* [for-linus][PATCH 1/8] tracing: Fix trace_check_vprintf() when tp_printk is used
2024-10-04 13:56 [for-linus][PATCH 0/8] tracing: Fixes for v6.12 and earlier Steven Rostedt
@ 2024-10-04 13:56 ` Steven Rostedt
2024-10-04 13:56 ` [for-linus][PATCH 2/8] tools/rtla: Fix installation from out-of-tree build Steven Rostedt
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-10-04 13:56 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Alison Schofield
From: Steven Rostedt <rostedt@goodmis.org>
When the tp_printk kernel command line is used, the trace events go
directly to printk(). It is still checked via the trace_check_vprintf()
function to make sure the pointers of the trace event are legit.
The addition of reading buffers from previous boots required adding a
delta between the addresses of the previous boot and the current boot so
that the pointers in the old buffer can still be used. But this required
adding a trace_array pointer to acquire the delta offsets.
The tp_printk code does not provide a trace_array (tr) pointer, so when
the offsets were examined, a NULL pointer dereference happened and the
kernel crashed.
If the trace_array does not exist, just default the delta offsets to zero,
as that also means the trace event is not being read from a previous boot.
Link: https://lore.kernel.org/all/Zv3z5UsG_jsO9_Tb@aschofie-mobl2.lan/
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20241003104925.4e1b1fd9@gandalf.local.home
Fixes: 07714b4bb3f98 ("tracing: Handle old buffer mappings for event strings and functions")
Reported-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c01375adc471..1c69ca1f1088 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3697,8 +3697,8 @@ static void test_can_verify(void)
void trace_check_vprintf(struct trace_iterator *iter, const char *fmt,
va_list ap)
{
- long text_delta = iter->tr->text_delta;
- long data_delta = iter->tr->data_delta;
+ long text_delta = 0;
+ long data_delta = 0;
const char *p = fmt;
const char *str;
bool good;
@@ -3710,6 +3710,17 @@ void trace_check_vprintf(struct trace_iterator *iter, const char *fmt,
if (static_branch_unlikely(&trace_no_verify))
goto print;
+ /*
+ * When the kernel is booted with the tp_printk command line
+ * parameter, trace events go directly through to printk().
+ * It also is checked by this function, but it does not
+ * have an associated trace_array (tr) for it.
+ */
+ if (iter->tr) {
+ text_delta = iter->tr->text_delta;
+ data_delta = iter->tr->data_delta;
+ }
+
/* Don't bother checking when doing a ftrace_dump() */
if (iter->fmt == static_fmt_buf)
goto print;
--
2.45.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [for-linus][PATCH 2/8] tools/rtla: Fix installation from out-of-tree build
2024-10-04 13:56 [for-linus][PATCH 0/8] tracing: Fixes for v6.12 and earlier Steven Rostedt
2024-10-04 13:56 ` [for-linus][PATCH 1/8] tracing: Fix trace_check_vprintf() when tp_printk is used Steven Rostedt
@ 2024-10-04 13:56 ` Steven Rostedt
2024-10-04 13:56 ` [for-linus][PATCH 3/8] rtla: Fix the help text in osnoise and timerlat top tools Steven Rostedt
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-10-04 13:56 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Tomas Glozar, Ben Hutchings
From: Ben Hutchings <benh@debian.org>
rtla now supports out-of-tree builds, but installation fails as it
still tries to install the rtla binary from the source tree. Use the
existing macro $(RTLA) to refer to the binary.
Link: https://lore.kernel.org/ZudubuoU_JHjPZ7w@decadent.org.uk
Fixes: 01474dc706ca ("tools/rtla: Use tools/build makefiles to build rtla")
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Tested-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Ben Hutchings <benh@debian.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
tools/tracing/rtla/Makefile.rtla | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/tracing/rtla/Makefile.rtla b/tools/tracing/rtla/Makefile.rtla
index 3ff0b8970896..cc1d6b615475 100644
--- a/tools/tracing/rtla/Makefile.rtla
+++ b/tools/tracing/rtla/Makefile.rtla
@@ -38,7 +38,7 @@ BINDIR := /usr/bin
.PHONY: install
install: doc_install
@$(MKDIR) -p $(DESTDIR)$(BINDIR)
- $(call QUIET_INSTALL,rtla)$(INSTALL) rtla -m 755 $(DESTDIR)$(BINDIR)
+ $(call QUIET_INSTALL,rtla)$(INSTALL) $(RTLA) -m 755 $(DESTDIR)$(BINDIR)
@$(STRIP) $(DESTDIR)$(BINDIR)/rtla
@test ! -f $(DESTDIR)$(BINDIR)/osnoise || $(RM) $(DESTDIR)$(BINDIR)/osnoise
@$(LN) rtla $(DESTDIR)$(BINDIR)/osnoise
--
2.45.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [for-linus][PATCH 3/8] rtla: Fix the help text in osnoise and timerlat top tools
2024-10-04 13:56 [for-linus][PATCH 0/8] tracing: Fixes for v6.12 and earlier Steven Rostedt
2024-10-04 13:56 ` [for-linus][PATCH 1/8] tracing: Fix trace_check_vprintf() when tp_printk is used Steven Rostedt
2024-10-04 13:56 ` [for-linus][PATCH 2/8] tools/rtla: Fix installation from out-of-tree build Steven Rostedt
@ 2024-10-04 13:56 ` Steven Rostedt
2024-10-04 13:56 ` [for-linus][PATCH 4/8] x86/ftrace: Include <asm/ptrace.h> Steven Rostedt
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-10-04 13:56 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
stable, Tomas Glozar, Eder Zulian
From: Eder Zulian <ezulian@redhat.com>
The help text in osnoise top and timerlat top had some minor errors
and omissions. The -d option was missing the 's' (second) abbreviation and
the error message for '-d' used '-D'.
Cc: stable@vger.kernel.org
Fixes: 1eceb2fc2ca54 ("rtla/osnoise: Add osnoise top mode")
Fixes: a828cd18bc4ad ("rtla: Add timerlat tool and timelart top mode")
Link: https://lore.kernel.org/20240813155831.384446-1-ezulian@redhat.com
Suggested-by: Tomas Glozar <tglozar@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Eder Zulian <ezulian@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
tools/tracing/rtla/src/osnoise_top.c | 2 +-
tools/tracing/rtla/src/timerlat_top.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/tracing/rtla/src/osnoise_top.c b/tools/tracing/rtla/src/osnoise_top.c
index 2f756628613d..30e3853076a0 100644
--- a/tools/tracing/rtla/src/osnoise_top.c
+++ b/tools/tracing/rtla/src/osnoise_top.c
@@ -442,7 +442,7 @@ struct osnoise_top_params *osnoise_top_parse_args(int argc, char **argv)
case 'd':
params->duration = parse_seconds_duration(optarg);
if (!params->duration)
- osnoise_top_usage(params, "Invalid -D duration\n");
+ osnoise_top_usage(params, "Invalid -d duration\n");
break;
case 'e':
tevent = trace_event_alloc(optarg);
diff --git a/tools/tracing/rtla/src/timerlat_top.c b/tools/tracing/rtla/src/timerlat_top.c
index 8c16419fe22a..210b0f533534 100644
--- a/tools/tracing/rtla/src/timerlat_top.c
+++ b/tools/tracing/rtla/src/timerlat_top.c
@@ -459,7 +459,7 @@ static void timerlat_top_usage(char *usage)
" -c/--cpus cpus: run the tracer only on the given cpus",
" -H/--house-keeping cpus: run rtla control threads only on the given cpus",
" -C/--cgroup[=cgroup_name]: set cgroup, if no cgroup_name is passed, the rtla's cgroup will be inherited",
- " -d/--duration time[m|h|d]: duration of the session in seconds",
+ " -d/--duration time[s|m|h|d]: duration of the session",
" -D/--debug: print debug info",
" --dump-tasks: prints the task running on all CPUs if stop conditions are met (depends on !--no-aa)",
" -t/--trace[file]: save the stopped trace to [file|timerlat_trace.txt]",
@@ -613,7 +613,7 @@ static struct timerlat_top_params
case 'd':
params->duration = parse_seconds_duration(optarg);
if (!params->duration)
- timerlat_top_usage("Invalid -D duration\n");
+ timerlat_top_usage("Invalid -d duration\n");
break;
case 'e':
tevent = trace_event_alloc(optarg);
--
2.45.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [for-linus][PATCH 4/8] x86/ftrace: Include <asm/ptrace.h>
2024-10-04 13:56 [for-linus][PATCH 0/8] tracing: Fixes for v6.12 and earlier Steven Rostedt
` (2 preceding siblings ...)
2024-10-04 13:56 ` [for-linus][PATCH 3/8] rtla: Fix the help text in osnoise and timerlat top tools Steven Rostedt
@ 2024-10-04 13:56 ` Steven Rostedt
2024-10-04 13:57 ` [for-linus][PATCH 5/8] tracing/timerlat: Fix duplicated kthread creation due to CPU online/offline Steven Rostedt
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-10-04 13:56 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Masahiro Yamada, Sami Tolvanen
From: Sami Tolvanen <samitolvanen@google.com>
<asm/ftrace.h> uses struct pt_regs in several places. Include
<asm/ptrace.h> to ensure it's visible. This is needed to make sure
object files that only include <asm/asm-prototypes.h> compile.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/20240916221557.846853-2-samitolvanen@google.com
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
arch/x86/include/asm/ftrace.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 0152a81d9b4a..b4d719de2c84 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -2,6 +2,8 @@
#ifndef _ASM_X86_FTRACE_H
#define _ASM_X86_FTRACE_H
+#include <asm/ptrace.h>
+
#ifdef CONFIG_FUNCTION_TRACER
#ifndef CC_USING_FENTRY
# error Compiler does not support fentry?
--
2.45.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [for-linus][PATCH 5/8] tracing/timerlat: Fix duplicated kthread creation due to CPU online/offline
2024-10-04 13:56 [for-linus][PATCH 0/8] tracing: Fixes for v6.12 and earlier Steven Rostedt
` (3 preceding siblings ...)
2024-10-04 13:56 ` [for-linus][PATCH 4/8] x86/ftrace: Include <asm/ptrace.h> Steven Rostedt
@ 2024-10-04 13:57 ` Steven Rostedt
2024-10-04 13:57 ` [for-linus][PATCH 6/8] tracing/timerlat: Drop interface_lock in stop_kthread() Steven Rostedt
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-10-04 13:57 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
stable, Wei Li
From: Wei Li <liwei391@huawei.com>
osnoise_hotplug_workfn() is the asynchronous online callback for
"trace/osnoise:online". It may be congested when a CPU goes online and
offline repeatedly and is invoked for multiple times after a certain
online.
This will lead to kthread leak and timer corruption. Add a check
in start_kthread() to prevent this situation.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20240924094515.3561410-2-liwei391@huawei.com
Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations")
Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace_osnoise.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index 1439064f65d6..d1a539913a5f 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -2007,6 +2007,10 @@ static int start_kthread(unsigned int cpu)
void *main = osnoise_main;
char comm[24];
+ /* Do not start a new thread if it is already running */
+ if (per_cpu(per_cpu_osnoise_var, cpu).kthread)
+ return 0;
+
if (timerlat_enabled()) {
snprintf(comm, 24, "timerlat/%d", cpu);
main = timerlat_main;
@@ -2061,11 +2065,10 @@ static int start_per_cpu_kthreads(void)
if (cpumask_test_and_clear_cpu(cpu, &kthread_cpumask)) {
struct task_struct *kthread;
- kthread = per_cpu(per_cpu_osnoise_var, cpu).kthread;
+ kthread = xchg_relaxed(&(per_cpu(per_cpu_osnoise_var, cpu).kthread), NULL);
if (!WARN_ON(!kthread))
kthread_stop(kthread);
}
- per_cpu(per_cpu_osnoise_var, cpu).kthread = NULL;
}
for_each_cpu(cpu, current_mask) {
--
2.45.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [for-linus][PATCH 6/8] tracing/timerlat: Drop interface_lock in stop_kthread()
2024-10-04 13:56 [for-linus][PATCH 0/8] tracing: Fixes for v6.12 and earlier Steven Rostedt
` (4 preceding siblings ...)
2024-10-04 13:57 ` [for-linus][PATCH 5/8] tracing/timerlat: Fix duplicated kthread creation due to CPU online/offline Steven Rostedt
@ 2024-10-04 13:57 ` Steven Rostedt
2024-10-04 13:57 ` [for-linus][PATCH 7/8] tracing/timerlat: Fix a race during cpuhp processing Steven Rostedt
2024-10-04 13:57 ` [for-linus][PATCH 8/8] tracing/hwlat: " Steven Rostedt
7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-10-04 13:57 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
stable, Wei Li
From: Wei Li <liwei391@huawei.com>
stop_kthread() is the offline callback for "trace/osnoise:online", since
commit 5bfbcd1ee57b ("tracing/timerlat: Add interface_lock around clearing
of kthread in stop_kthread()"), the following ABBA deadlock scenario is
introduced:
T1 | T2 [BP] | T3 [AP]
osnoise_hotplug_workfn() | work_for_cpu_fn() | cpuhp_thread_fun()
| _cpu_down() | osnoise_cpu_die()
mutex_lock(&interface_lock) | | stop_kthread()
| cpus_write_lock() | mutex_lock(&interface_lock)
cpus_read_lock() | cpuhp_kick_ap() |
As the interface_lock here in just for protecting the "kthread" field of
the osn_var, use xchg() instead to fix this issue. Also use
for_each_online_cpu() back in stop_per_cpu_kthreads() as it can take
cpu_read_lock() again.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20240924094515.3561410-3-liwei391@huawei.com
Fixes: 5bfbcd1ee57b ("tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread()")
Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace_osnoise.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index d1a539913a5f..e22567174dd3 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -1953,12 +1953,8 @@ static void stop_kthread(unsigned int cpu)
{
struct task_struct *kthread;
- mutex_lock(&interface_lock);
- kthread = per_cpu(per_cpu_osnoise_var, cpu).kthread;
+ kthread = xchg_relaxed(&(per_cpu(per_cpu_osnoise_var, cpu).kthread), NULL);
if (kthread) {
- per_cpu(per_cpu_osnoise_var, cpu).kthread = NULL;
- mutex_unlock(&interface_lock);
-
if (cpumask_test_and_clear_cpu(cpu, &kthread_cpumask) &&
!WARN_ON(!test_bit(OSN_WORKLOAD, &osnoise_options))) {
kthread_stop(kthread);
@@ -1972,7 +1968,6 @@ static void stop_kthread(unsigned int cpu)
put_task_struct(kthread);
}
} else {
- mutex_unlock(&interface_lock);
/* if no workload, just return */
if (!test_bit(OSN_WORKLOAD, &osnoise_options)) {
/*
@@ -1994,8 +1989,12 @@ static void stop_per_cpu_kthreads(void)
{
int cpu;
- for_each_possible_cpu(cpu)
+ cpus_read_lock();
+
+ for_each_online_cpu(cpu)
stop_kthread(cpu);
+
+ cpus_read_unlock();
}
/*
--
2.45.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [for-linus][PATCH 7/8] tracing/timerlat: Fix a race during cpuhp processing
2024-10-04 13:56 [for-linus][PATCH 0/8] tracing: Fixes for v6.12 and earlier Steven Rostedt
` (5 preceding siblings ...)
2024-10-04 13:57 ` [for-linus][PATCH 6/8] tracing/timerlat: Drop interface_lock in stop_kthread() Steven Rostedt
@ 2024-10-04 13:57 ` Steven Rostedt
2024-10-04 13:57 ` [for-linus][PATCH 8/8] tracing/hwlat: " Steven Rostedt
7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-10-04 13:57 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
stable, Wei Li
From: Wei Li <liwei391@huawei.com>
There is another found exception that the "timerlat/1" thread was
scheduled on CPU0, and lead to timer corruption finally:
```
ODEBUG: init active (active state 0) object: ffff888237c2e108 object type: hrtimer hint: timerlat_irq+0x0/0x220
WARNING: CPU: 0 PID: 426 at lib/debugobjects.c:518 debug_print_object+0x7d/0xb0
Modules linked in:
CPU: 0 UID: 0 PID: 426 Comm: timerlat/1 Not tainted 6.11.0-rc7+ #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:debug_print_object+0x7d/0xb0
...
Call Trace:
<TASK>
? __warn+0x7c/0x110
? debug_print_object+0x7d/0xb0
? report_bug+0xf1/0x1d0
? prb_read_valid+0x17/0x20
? handle_bug+0x3f/0x70
? exc_invalid_op+0x13/0x60
? asm_exc_invalid_op+0x16/0x20
? debug_print_object+0x7d/0xb0
? debug_print_object+0x7d/0xb0
? __pfx_timerlat_irq+0x10/0x10
__debug_object_init+0x110/0x150
hrtimer_init+0x1d/0x60
timerlat_main+0xab/0x2d0
? __pfx_timerlat_main+0x10/0x10
kthread+0xb7/0xe0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2d/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
```
After tracing the scheduling event, it was discovered that the migration
of the "timerlat/1" thread was performed during thread creation. Further
analysis confirmed that it is because the CPU online processing for
osnoise is implemented through workers, which is asynchronous with the
offline processing. When the worker was scheduled to create a thread, the
CPU may has already been removed from the cpu_online_mask during the offline
process, resulting in the inability to select the right CPU:
T1 | T2
[CPUHP_ONLINE] | cpu_device_down()
osnoise_hotplug_workfn() |
| cpus_write_lock()
| takedown_cpu(1)
| cpus_write_unlock()
[CPUHP_OFFLINE] |
cpus_read_lock() |
start_kthread(1) |
cpus_read_unlock() |
To fix this, skip online processing if the CPU is already offline.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20240924094515.3561410-4-liwei391@huawei.com
Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations")
Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace_osnoise.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index e22567174dd3..a50ed23bee77 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -2097,6 +2097,8 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
mutex_lock(&interface_lock);
cpus_read_lock();
+ if (!cpu_online(cpu))
+ goto out_unlock;
if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
goto out_unlock;
--
2.45.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [for-linus][PATCH 8/8] tracing/hwlat: Fix a race during cpuhp processing
2024-10-04 13:56 [for-linus][PATCH 0/8] tracing: Fixes for v6.12 and earlier Steven Rostedt
` (6 preceding siblings ...)
2024-10-04 13:57 ` [for-linus][PATCH 7/8] tracing/timerlat: Fix a race during cpuhp processing Steven Rostedt
@ 2024-10-04 13:57 ` Steven Rostedt
7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-10-04 13:57 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
stable, Wei Li
From: Wei Li <liwei391@huawei.com>
The cpuhp online/offline processing race also exists in percpu-mode hwlat
tracer in theory, apply the fix too. That is:
T1 | T2
[CPUHP_ONLINE] | cpu_device_down()
hwlat_hotplug_workfn() |
| cpus_write_lock()
| takedown_cpu(1)
| cpus_write_unlock()
[CPUHP_OFFLINE] |
cpus_read_lock() |
start_kthread(1) |
cpus_read_unlock() |
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20240924094515.3561410-5-liwei391@huawei.com
Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations")
Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace_hwlat.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index b791524a6536..3bd6071441ad 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -520,6 +520,8 @@ static void hwlat_hotplug_workfn(struct work_struct *dummy)
if (!hwlat_busy || hwlat_data.thread_mode != MODE_PER_CPU)
goto out_unlock;
+ if (!cpu_online(cpu))
+ goto out_unlock;
if (!cpumask_test_cpu(cpu, tr->tracing_cpumask))
goto out_unlock;
--
2.45.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-10-04 13:56 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-04 13:56 [for-linus][PATCH 0/8] tracing: Fixes for v6.12 and earlier Steven Rostedt
2024-10-04 13:56 ` [for-linus][PATCH 1/8] tracing: Fix trace_check_vprintf() when tp_printk is used Steven Rostedt
2024-10-04 13:56 ` [for-linus][PATCH 2/8] tools/rtla: Fix installation from out-of-tree build Steven Rostedt
2024-10-04 13:56 ` [for-linus][PATCH 3/8] rtla: Fix the help text in osnoise and timerlat top tools Steven Rostedt
2024-10-04 13:56 ` [for-linus][PATCH 4/8] x86/ftrace: Include <asm/ptrace.h> Steven Rostedt
2024-10-04 13:57 ` [for-linus][PATCH 5/8] tracing/timerlat: Fix duplicated kthread creation due to CPU online/offline Steven Rostedt
2024-10-04 13:57 ` [for-linus][PATCH 6/8] tracing/timerlat: Drop interface_lock in stop_kthread() Steven Rostedt
2024-10-04 13:57 ` [for-linus][PATCH 7/8] tracing/timerlat: Fix a race during cpuhp processing Steven Rostedt
2024-10-04 13:57 ` [for-linus][PATCH 8/8] tracing/hwlat: " Steven Rostedt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox