* [for-next][PATCH 1/8] tracing: Support to dump instance traces by ftrace_dump_on_oops
2024-03-14 14:23 [for-next][PATCH 0/8] tracing: Final updates for 6.9 Steven Rostedt
@ 2024-03-14 14:23 ` Steven Rostedt
2024-03-14 14:23 ` [for-next][PATCH 2/8] tracefs: Remove SLAB_MEM_SPREAD flag usage Steven Rostedt
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-03-14 14:23 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Ross Zwisler, mcgrof, keescook, j.granados, corbet, Huang Yiwei
From: Huang Yiwei <quic_hyiwei@quicinc.com>
Currently ftrace only dumps the global trace buffer on an OOPs. For
debugging a production usecase, instance trace will be helpful to
check specific problems since global trace buffer may be used for
other purposes.
This patch extend the ftrace_dump_on_oops parameter to dump a specific
or multiple trace instances:
- ftrace_dump_on_oops=0: as before -- don't dump
- ftrace_dump_on_oops[=1]: as before -- dump the global trace buffer
on all CPUs
- ftrace_dump_on_oops=2 or =orig_cpu: as before -- dump the global
trace buffer on CPU that triggered the oops
- ftrace_dump_on_oops=<instance_name>: new behavior -- dump the
tracing instance matching <instance_name>
- ftrace_dump_on_oops[=2/orig_cpu],<instance1_name>[=2/orig_cpu],
<instrance2_name>[=2/orig_cpu]: new behavior -- dump the global trace
buffer and multiple instance buffer on all CPUs, or only dump on CPU
that triggered the oops if =2 or =orig_cpu is given
Also, the sysctl node can handle the input accordingly.
Link: https://lore.kernel.org/linux-trace-kernel/20240223083126.1817731-1-quic_hyiwei@quicinc.com
Cc: Ross Zwisler <zwisler@google.com>
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: <mcgrof@kernel.org>
Cc: <keescook@chromium.org>
Cc: <j.granados@samsung.com>
Cc: <mathieu.desnoyers@efficios.com>
Cc: <corbet@lwn.net>
Signed-off-by: Huang Yiwei <quic_hyiwei@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
.../admin-guide/kernel-parameters.txt | 26 ++-
Documentation/admin-guide/sysctl/kernel.rst | 30 +++-
include/linux/ftrace.h | 4 +-
include/linux/kernel.h | 1 +
kernel/sysctl.c | 4 +-
kernel/trace/trace.c | 156 +++++++++++++-----
kernel/trace/trace_selftest.c | 2 +-
7 files changed, 168 insertions(+), 55 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 31b3a25680d0..3d6ea8e80c2f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1561,12 +1561,28 @@
The above will cause the "foo" tracing instance to trigger
a snapshot at the end of boot up.
- ftrace_dump_on_oops[=orig_cpu]
+ ftrace_dump_on_oops[=2(orig_cpu) | =<instance>][,<instance> |
+ ,<instance>=2(orig_cpu)]
[FTRACE] will dump the trace buffers on oops.
- If no parameter is passed, ftrace will dump
- buffers of all CPUs, but if you pass orig_cpu, it will
- dump only the buffer of the CPU that triggered the
- oops.
+ If no parameter is passed, ftrace will dump global
+ buffers of all CPUs, if you pass 2 or orig_cpu, it
+ will dump only the buffer of the CPU that triggered
+ the oops, or the specific instance will be dumped if
+ its name is passed. Multiple instance dump is also
+ supported, and instances are separated by commas. Each
+ instance supports only dump on CPU that triggered the
+ oops by passing 2 or orig_cpu to it.
+
+ ftrace_dump_on_oops=foo=orig_cpu
+
+ The above will dump only the buffer of "foo" instance
+ on CPU that triggered the oops.
+
+ ftrace_dump_on_oops,foo,bar=orig_cpu
+
+ The above will dump global buffer on all CPUs, the
+ buffer of "foo" instance on all CPUs and the buffer
+ of "bar" instance on CPU that triggered the oops.
ftrace_filter=[function-list]
[FTRACE] Limit the functions traced by the function
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 6584a1f9bfe3..ea8e5f152edc 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -296,12 +296,30 @@ kernel panic). This will output the contents of the ftrace buffers to
the console. This is very useful for capturing traces that lead to
crashes and outputting them to a serial console.
-= ===================================================
-0 Disabled (default).
-1 Dump buffers of all CPUs.
-2 Dump the buffer of the CPU that triggered the oops.
-= ===================================================
-
+======================= ===========================================
+0 Disabled (default).
+1 Dump buffers of all CPUs.
+2(orig_cpu) Dump the buffer of the CPU that triggered the
+ oops.
+<instance> Dump the specific instance buffer on all CPUs.
+<instance>=2(orig_cpu) Dump the specific instance buffer on the CPU
+ that triggered the oops.
+======================= ===========================================
+
+Multiple instance dump is also supported, and instances are separated
+by commas. If global buffer also needs to be dumped, please specify
+the dump mode (1/2/orig_cpu) first for global buffer.
+
+So for example to dump "foo" and "bar" instance buffer on all CPUs,
+user can::
+
+ echo "foo,bar" > /proc/sys/kernel/ftrace_dump_on_oops
+
+To dump global buffer and "foo" instance buffer on all
+CPUs along with the "bar" instance buffer on CPU that triggered the
+oops, user can::
+
+ echo "1,foo,bar=2" > /proc/sys/kernel/ftrace_dump_on_oops
ftrace_enabled, stack_tracer_enabled
====================================
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index e8921871ef9a..54d53f345d14 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1151,7 +1151,9 @@ static inline void unpause_graph_tracing(void) { }
#ifdef CONFIG_TRACING
enum ftrace_dump_mode;
-extern enum ftrace_dump_mode ftrace_dump_on_oops;
+#define MAX_TRACER_SIZE 100
+extern char ftrace_dump_on_oops[];
+extern int ftrace_dump_on_oops_enabled(void);
extern int tracepoint_printk;
extern void disable_trace_on_warning(void);
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index d9ad21058eed..b142a4f41d34 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -255,6 +255,7 @@ enum ftrace_dump_mode {
DUMP_NONE,
DUMP_ALL,
DUMP_ORIG,
+ DUMP_PARAM,
};
#ifdef CONFIG_TRACING
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 157f7ce2942d..81cc974913bb 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1710,9 +1710,9 @@ static struct ctl_table kern_table[] = {
{
.procname = "ftrace_dump_on_oops",
.data = &ftrace_dump_on_oops,
- .maxlen = sizeof(int),
+ .maxlen = MAX_TRACER_SIZE,
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = proc_dostring,
},
{
.procname = "traceoff_on_warning",
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 1bcfbc21fb3e..ff0b0a999171 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -130,9 +130,12 @@ cpumask_var_t __read_mostly tracing_buffer_mask;
* /proc/sys/kernel/ftrace_dump_on_oops
* Set 1 if you want to dump buffers of all CPUs
* Set 2 if you want to dump the buffer of the CPU that triggered oops
+ * Set instance name if you want to dump the specific trace instance
+ * Multiple instance dump is also supported, and instances are seperated
+ * by commas.
*/
-
-enum ftrace_dump_mode ftrace_dump_on_oops;
+/* Set to string format zero to disable by default */
+char ftrace_dump_on_oops[MAX_TRACER_SIZE] = "0";
/* When set, tracing will stop when a WARN*() is hit */
int __disable_trace_on_warning;
@@ -178,7 +181,6 @@ static void ftrace_trace_userstack(struct trace_array *tr,
struct trace_buffer *buffer,
unsigned int trace_ctx);
-#define MAX_TRACER_SIZE 100
static char bootup_tracer_buf[MAX_TRACER_SIZE] __initdata;
static char *default_bootup_tracer;
@@ -201,19 +203,33 @@ static int __init set_cmdline_ftrace(char *str)
}
__setup("ftrace=", set_cmdline_ftrace);
+int ftrace_dump_on_oops_enabled(void)
+{
+ if (!strcmp("0", ftrace_dump_on_oops))
+ return 0;
+ else
+ return 1;
+}
+
static int __init set_ftrace_dump_on_oops(char *str)
{
- if (*str++ != '=' || !*str || !strcmp("1", str)) {
- ftrace_dump_on_oops = DUMP_ALL;
+ if (!*str) {
+ strscpy(ftrace_dump_on_oops, "1", MAX_TRACER_SIZE);
return 1;
}
- if (!strcmp("orig_cpu", str) || !strcmp("2", str)) {
- ftrace_dump_on_oops = DUMP_ORIG;
- return 1;
- }
+ if (*str == ',') {
+ strscpy(ftrace_dump_on_oops, "1", MAX_TRACER_SIZE);
+ strscpy(ftrace_dump_on_oops + 1, str, MAX_TRACER_SIZE - 1);
+ return 1;
+ }
+
+ if (*str++ == '=') {
+ strscpy(ftrace_dump_on_oops, str, MAX_TRACER_SIZE);
+ return 1;
+ }
- return 0;
+ return 0;
}
__setup("ftrace_dump_on_oops", set_ftrace_dump_on_oops);
@@ -9932,14 +9948,14 @@ static struct notifier_block trace_die_notifier = {
static int trace_die_panic_handler(struct notifier_block *self,
unsigned long ev, void *unused)
{
- if (!ftrace_dump_on_oops)
+ if (!ftrace_dump_on_oops_enabled())
return NOTIFY_DONE;
/* The die notifier requires DIE_OOPS to trigger */
if (self == &trace_die_notifier && ev != DIE_OOPS)
return NOTIFY_DONE;
- ftrace_dump(ftrace_dump_on_oops);
+ ftrace_dump(DUMP_PARAM);
return NOTIFY_DONE;
}
@@ -9980,12 +9996,12 @@ trace_printk_seq(struct trace_seq *s)
trace_seq_init(s);
}
-void trace_init_global_iter(struct trace_iterator *iter)
+static void trace_init_iter(struct trace_iterator *iter, struct trace_array *tr)
{
- iter->tr = &global_trace;
+ iter->tr = tr;
iter->trace = iter->tr->current_trace;
iter->cpu_file = RING_BUFFER_ALL_CPUS;
- iter->array_buffer = &global_trace.array_buffer;
+ iter->array_buffer = &tr->array_buffer;
if (iter->trace && iter->trace->open)
iter->trace->open(iter);
@@ -10005,22 +10021,19 @@ void trace_init_global_iter(struct trace_iterator *iter)
iter->fmt_size = STATIC_FMT_BUF_SIZE;
}
-void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
+void trace_init_global_iter(struct trace_iterator *iter)
+{
+ trace_init_iter(iter, &global_trace);
+}
+
+static void ftrace_dump_one(struct trace_array *tr, enum ftrace_dump_mode dump_mode)
{
/* use static because iter can be a bit big for the stack */
static struct trace_iterator iter;
- static atomic_t dump_running;
- struct trace_array *tr = &global_trace;
unsigned int old_userobj;
unsigned long flags;
int cnt = 0, cpu;
- /* Only allow one dump user at a time. */
- if (atomic_inc_return(&dump_running) != 1) {
- atomic_dec(&dump_running);
- return;
- }
-
/*
* Always turn off tracing when we dump.
* We don't need to show trace output of what happens
@@ -10029,12 +10042,12 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
* If the user does a sysrq-z, then they can re-enable
* tracing with echo 1 > tracing_on.
*/
- tracing_off();
+ tracer_tracing_off(tr);
local_irq_save(flags);
/* Simulate the iterator */
- trace_init_global_iter(&iter);
+ trace_init_iter(&iter, tr);
for_each_tracing_cpu(cpu) {
atomic_inc(&per_cpu_ptr(iter.array_buffer->data, cpu)->disabled);
@@ -10045,21 +10058,15 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
/* don't look at user memory in panic mode */
tr->trace_flags &= ~TRACE_ITER_SYM_USEROBJ;
- switch (oops_dump_mode) {
- case DUMP_ALL:
- iter.cpu_file = RING_BUFFER_ALL_CPUS;
- break;
- case DUMP_ORIG:
+ if (dump_mode == DUMP_ORIG)
iter.cpu_file = raw_smp_processor_id();
- break;
- case DUMP_NONE:
- goto out_enable;
- default:
- printk(KERN_TRACE "Bad dumping mode, switching to all CPUs dump\n");
+ else
iter.cpu_file = RING_BUFFER_ALL_CPUS;
- }
- printk(KERN_TRACE "Dumping ftrace buffer:\n");
+ if (tr == &global_trace)
+ printk(KERN_TRACE "Dumping ftrace buffer:\n");
+ else
+ printk(KERN_TRACE "Dumping ftrace instance %s buffer:\n", tr->name);
/* Did function tracer already get disabled? */
if (ftrace_is_dead()) {
@@ -10101,15 +10108,84 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
else
printk(KERN_TRACE "---------------------------------\n");
- out_enable:
tr->trace_flags |= old_userobj;
for_each_tracing_cpu(cpu) {
atomic_dec(&per_cpu_ptr(iter.array_buffer->data, cpu)->disabled);
}
- atomic_dec(&dump_running);
local_irq_restore(flags);
}
+
+static void ftrace_dump_by_param(void)
+{
+ bool first_param = true;
+ char dump_param[MAX_TRACER_SIZE];
+ char *buf, *token, *inst_name;
+ struct trace_array *tr;
+
+ strscpy(dump_param, ftrace_dump_on_oops, MAX_TRACER_SIZE);
+ buf = dump_param;
+
+ while ((token = strsep(&buf, ",")) != NULL) {
+ if (first_param) {
+ first_param = false;
+ if (!strcmp("0", token))
+ continue;
+ else if (!strcmp("1", token)) {
+ ftrace_dump_one(&global_trace, DUMP_ALL);
+ continue;
+ }
+ else if (!strcmp("2", token) ||
+ !strcmp("orig_cpu", token)) {
+ ftrace_dump_one(&global_trace, DUMP_ORIG);
+ continue;
+ }
+ }
+
+ inst_name = strsep(&token, "=");
+ tr = trace_array_find(inst_name);
+ if (!tr) {
+ printk(KERN_TRACE "Instance %s not found\n", inst_name);
+ continue;
+ }
+
+ if (token && (!strcmp("2", token) ||
+ !strcmp("orig_cpu", token)))
+ ftrace_dump_one(tr, DUMP_ORIG);
+ else
+ ftrace_dump_one(tr, DUMP_ALL);
+ }
+}
+
+void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
+{
+ static atomic_t dump_running;
+
+ /* Only allow one dump user at a time. */
+ if (atomic_inc_return(&dump_running) != 1) {
+ atomic_dec(&dump_running);
+ return;
+ }
+
+ switch (oops_dump_mode) {
+ case DUMP_ALL:
+ ftrace_dump_one(&global_trace, DUMP_ALL);
+ break;
+ case DUMP_ORIG:
+ ftrace_dump_one(&global_trace, DUMP_ORIG);
+ break;
+ case DUMP_PARAM:
+ ftrace_dump_by_param();
+ break;
+ case DUMP_NONE:
+ break;
+ default:
+ printk(KERN_TRACE "Bad dumping mode, switching to all CPUs dump\n");
+ ftrace_dump_one(&global_trace, DUMP_ALL);
+ }
+
+ atomic_dec(&dump_running);
+}
EXPORT_SYMBOL_GPL(ftrace_dump);
#define WRITE_BUFSIZE 4096
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 529590499b1f..e9c5058a8efd 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -768,7 +768,7 @@ static int trace_graph_entry_watchdog(struct ftrace_graph_ent *trace)
if (unlikely(++graph_hang_thresh > GRAPH_MAX_FUNC_TEST)) {
ftrace_graph_stop();
printk(KERN_WARNING "BUG: Function graph tracer hang!\n");
- if (ftrace_dump_on_oops) {
+ if (ftrace_dump_on_oops_enabled()) {
ftrace_dump(DUMP_ALL);
/* ftrace_dump() disables tracing */
tracing_on();
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* [for-next][PATCH 2/8] tracefs: Remove SLAB_MEM_SPREAD flag usage
2024-03-14 14:23 [for-next][PATCH 0/8] tracing: Final updates for 6.9 Steven Rostedt
2024-03-14 14:23 ` [for-next][PATCH 1/8] tracing: Support to dump instance traces by ftrace_dump_on_oops Steven Rostedt
@ 2024-03-14 14:23 ` Steven Rostedt
2024-03-14 14:23 ` [for-next][PATCH 3/8] tracing: Use div64_u64() instead of do_div() Steven Rostedt
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-03-14 14:23 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Chengming Zhou
From: Chengming Zhou <zhouchengming@bytedance.com>
The SLAB_MEM_SPREAD flag is already a no-op as of 6.8-rc1, remove
its usage so we can delete it from slab. No functional change.
Link: https://lore.kernel.org/linux-trace-kernel/20240224135206.830300-1-chengming.zhou@linux.dev
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
fs/tracefs/inode.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index d65ffad4c327..5545e6bf7d26 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -731,7 +731,6 @@ static int __init tracefs_init(void)
tracefs_inode_cachep = kmem_cache_create("tracefs_inode_cache",
sizeof(struct tracefs_inode),
0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|
SLAB_ACCOUNT),
init_once);
if (!tracefs_inode_cachep)
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* [for-next][PATCH 3/8] tracing: Use div64_u64() instead of do_div()
2024-03-14 14:23 [for-next][PATCH 0/8] tracing: Final updates for 6.9 Steven Rostedt
2024-03-14 14:23 ` [for-next][PATCH 1/8] tracing: Support to dump instance traces by ftrace_dump_on_oops Steven Rostedt
2024-03-14 14:23 ` [for-next][PATCH 2/8] tracefs: Remove SLAB_MEM_SPREAD flag usage Steven Rostedt
@ 2024-03-14 14:23 ` Steven Rostedt
2024-03-14 14:23 ` [for-next][PATCH 4/8] tracepoints: Use WARN() and not WARN_ON() for warnings Steven Rostedt
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-03-14 14:23 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Thorsten Blum
From: Thorsten Blum <thorsten.blum@toblux.com>
Fixes Coccinelle/coccicheck warnings reported by do_div.cocci.
Compared to do_div(), div64_u64() does not implicitly cast the divisor and
does not unnecessarily calculate the remainder.
Link: https://lore.kernel.org/linux-trace-kernel/20240225164507.232942-2-thorsten.blum@toblux.com
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace_benchmark.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/trace_benchmark.c b/kernel/trace/trace_benchmark.c
index 54d5fa35c90a..811b08439406 100644
--- a/kernel/trace/trace_benchmark.c
+++ b/kernel/trace/trace_benchmark.c
@@ -92,7 +92,6 @@ static void trace_do_benchmark(void)
bm_total += delta;
bm_totalsq += delta * delta;
-
if (bm_cnt > 1) {
/*
* Apply Welford's method to calculate standard deviation:
@@ -105,7 +104,7 @@ static void trace_do_benchmark(void)
stddev = 0;
delta = bm_total;
- do_div(delta, bm_cnt);
+ delta = div64_u64(delta, bm_cnt);
avg = delta;
if (stddev > 0) {
@@ -127,7 +126,7 @@ static void trace_do_benchmark(void)
seed = stddev;
if (!last_seed)
break;
- do_div(seed, last_seed);
+ seed = div64_u64(seed, last_seed);
seed += last_seed;
do_div(seed, 2);
} while (i++ < 10 && last_seed != seed);
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* [for-next][PATCH 4/8] tracepoints: Use WARN() and not WARN_ON() for warnings
2024-03-14 14:23 [for-next][PATCH 0/8] tracing: Final updates for 6.9 Steven Rostedt
` (2 preceding siblings ...)
2024-03-14 14:23 ` [for-next][PATCH 3/8] tracing: Use div64_u64() instead of do_div() Steven Rostedt
@ 2024-03-14 14:23 ` Steven Rostedt
2024-03-14 14:23 ` [for-next][PATCH 5/8] ring-buffer: use READ_ONCE() to read cpu_buffer->commit_page in concurrent environment Steven Rostedt
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-03-14 14:23 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Thomas Gleixner, Borislav Petkov, Paul E. McKenney
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
There are two WARN_ON*() warnings in tracepoint.h that deal with RCU
usage. But when they trigger, especially from using a TRACE_EVENT()
macro, the information is not very helpful and is confusing:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at include/trace/events/lock.h:24 lock_acquire+0x2b2/0x2d0
Where the above warning takes you to:
TRACE_EVENT(lock_acquire, <<<--- line 24 in lock.h
TP_PROTO(struct lockdep_map *lock, unsigned int subclass,
int trylock, int read, int check,
struct lockdep_map *next_lock, unsigned long ip),
[..]
Change the WARN_ON_ONCE() to WARN_ONCE() and add a string that allows
someone to search for exactly where the bug happened.
Link: https://lore.kernel.org/linux-trace-kernel/20240228133112.0d64fb1b@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Borislav Petkov <bp@alien8.de>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
include/linux/tracepoint.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 88c0ba623ee6..689b6d71590e 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -199,7 +199,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
if (!(cond)) \
return; \
\
- if (WARN_ON_ONCE(RCUIDLE_COND(rcuidle))) \
+ if (WARN_ONCE(RCUIDLE_COND(rcuidle), \
+ "Bad RCU usage for tracepoint")) \
return; \
\
/* keep srcu and sched-rcu usage consistent */ \
@@ -259,7 +260,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
TP_ARGS(args), \
TP_CONDITION(cond), 0); \
if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
- WARN_ON_ONCE(!rcu_is_watching()); \
+ WARN_ONCE(!rcu_is_watching(), \
+ "RCU not watching for tracepoint"); \
} \
} \
__DECLARE_TRACE_RCU(name, PARAMS(proto), PARAMS(args), \
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* [for-next][PATCH 5/8] ring-buffer: use READ_ONCE() to read cpu_buffer->commit_page in concurrent environment
2024-03-14 14:23 [for-next][PATCH 0/8] tracing: Final updates for 6.9 Steven Rostedt
` (3 preceding siblings ...)
2024-03-14 14:23 ` [for-next][PATCH 4/8] tracepoints: Use WARN() and not WARN_ON() for warnings Steven Rostedt
@ 2024-03-14 14:23 ` Steven Rostedt
2024-03-14 14:23 ` [for-next][PATCH 6/8] ring-buffer: Have mmapped ring buffer keep track of missed events Steven Rostedt
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-03-14 14:23 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
linke li
From: linke li <lilinke99@qq.com>
In function ring_buffer_iter_empty(), cpu_buffer->commit_page is read
while other threads may change it. It may cause the time_stamp that read
in the next line come from a different page. Use READ_ONCE() to avoid
having to reason about compiler optimizations now and in future.
Link: https://lore.kernel.org/linux-trace-kernel/tencent_DFF7D3561A0686B5E8FC079150A02505180A@qq.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: linke li <lilinke99@qq.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 1d7d7a701867..13250856a3a8 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4346,7 +4346,7 @@ int ring_buffer_iter_empty(struct ring_buffer_iter *iter)
cpu_buffer = iter->cpu_buffer;
reader = cpu_buffer->reader_page;
head_page = cpu_buffer->head_page;
- commit_page = cpu_buffer->commit_page;
+ commit_page = READ_ONCE(cpu_buffer->commit_page);
commit_ts = commit_page->page->time_stamp;
/*
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* [for-next][PATCH 6/8] ring-buffer: Have mmapped ring buffer keep track of missed events
2024-03-14 14:23 [for-next][PATCH 0/8] tracing: Final updates for 6.9 Steven Rostedt
` (4 preceding siblings ...)
2024-03-14 14:23 ` [for-next][PATCH 5/8] ring-buffer: use READ_ONCE() to read cpu_buffer->commit_page in concurrent environment Steven Rostedt
@ 2024-03-14 14:23 ` Steven Rostedt
2024-03-14 14:23 ` [for-next][PATCH 7/8] net: hns3: tracing: fix hclgevf trace event strings Steven Rostedt
2024-03-14 14:23 ` [for-next][PATCH 8/8] tracing: Use strcmp() in __assign_str() WARN_ON() check Steven Rostedt
7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-03-14 14:23 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Vincent Donnefort
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
While testing libtracefs on the mmapped ring buffer, the test that checks
if missed events are accounted for failed when using the mapped buffer.
This is because the mapped page does not update the missed events that
were dropped because the writer filled up the ring buffer before the
reader could catch it.
Add the missed events to the reader page/sub-buffer when the IOCTL is done
and a new reader page is acquired.
Note that all accesses to the reader_page via rb_page_commit() had to be
switched to rb_page_size(), and rb_page_size() which was just a copy of
rb_page_commit() but now it masks out the RB_MISSED bits. This is needed
as the mapped reader page is still active in the ring buffer code and
where it reads the commit field of the bpage for the size, it now must
mask it otherwise the missed bits that are now set will corrupt the size
returned.
Link: https://lore.kernel.org/linux-trace-kernel/20240312175405.12fb6726@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 53 +++++++++++++++++++++++++++++++++-----
1 file changed, 47 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 13250856a3a8..abe21c47fb49 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -313,6 +313,8 @@ static u64 rb_event_time_stamp(struct ring_buffer_event *event)
/* Missed count stored at end */
#define RB_MISSED_STORED (1 << 30)
+#define RB_MISSED_MASK (3 << 30)
+
struct buffer_data_page {
u64 time_stamp; /* page time stamp */
local_t commit; /* write committed index */
@@ -2274,7 +2276,7 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
/* Size is determined by what has been committed */
static __always_inline unsigned rb_page_size(struct buffer_page *bpage)
{
- return rb_page_commit(bpage);
+ return rb_page_commit(bpage) & ~RB_MISSED_MASK;
}
static __always_inline unsigned
@@ -3901,7 +3903,7 @@ static bool rb_per_cpu_empty(struct ring_buffer_per_cpu *cpu_buffer)
return true;
/* Reader should exhaust content in reader page */
- if (reader->read != rb_page_commit(reader))
+ if (reader->read != rb_page_size(reader))
return false;
/*
@@ -4372,7 +4374,7 @@ int ring_buffer_iter_empty(struct ring_buffer_iter *iter)
return ((iter->head_page == commit_page && iter->head >= commit) ||
(iter->head_page == reader && commit_page == head_page &&
head_page->read == commit &&
- iter->head == rb_page_commit(cpu_buffer->reader_page)));
+ iter->head == rb_page_size(cpu_buffer->reader_page)));
}
EXPORT_SYMBOL_GPL(ring_buffer_iter_empty);
@@ -5701,7 +5703,7 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
event = rb_reader_event(cpu_buffer);
read = reader->read;
- commit = rb_page_commit(reader);
+ commit = rb_page_size(reader);
/* Check if any events were dropped */
missed_events = cpu_buffer->lost_events;
@@ -5778,7 +5780,7 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
} else {
/* update the entry counter */
cpu_buffer->read += rb_page_entries(reader);
- cpu_buffer->read_bytes += rb_page_commit(reader);
+ cpu_buffer->read_bytes += rb_page_size(reader);
/* swap the pages */
rb_init_page(bpage);
@@ -6331,6 +6333,8 @@ struct page *ring_buffer_map_fault(struct trace_buffer *buffer, int cpu,
int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu)
{
struct ring_buffer_per_cpu *cpu_buffer;
+ struct buffer_page *reader;
+ unsigned long missed_events;
unsigned long reader_size;
unsigned long flags;
@@ -6356,9 +6360,46 @@ int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu)
goto out;
}
- if (WARN_ON(!rb_get_reader_page(cpu_buffer)))
+ reader = rb_get_reader_page(cpu_buffer);
+ if (WARN_ON(!reader))
goto out;
+ /* Check if any events were dropped */
+ missed_events = cpu_buffer->lost_events;
+
+ if (cpu_buffer->reader_page != cpu_buffer->commit_page) {
+ if (missed_events) {
+ struct buffer_data_page *bpage = reader->page;
+ unsigned int commit;
+ /*
+ * Use the real_end for the data size,
+ * This gives us a chance to store the lost events
+ * on the page.
+ */
+ if (reader->real_end)
+ local_set(&bpage->commit, reader->real_end);
+ /*
+ * If there is room at the end of the page to save the
+ * missed events, then record it there.
+ */
+ commit = rb_page_size(reader);
+ if (buffer->subbuf_size - commit >= sizeof(missed_events)) {
+ memcpy(&bpage->data[commit], &missed_events,
+ sizeof(missed_events));
+ local_add(RB_MISSED_STORED, &bpage->commit);
+ }
+ local_add(RB_MISSED_EVENTS, &bpage->commit);
+ }
+ } else {
+ /*
+ * There really shouldn't be any missed events if the commit
+ * is on the reader page.
+ */
+ WARN_ON_ONCE(missed_events);
+ }
+
+ cpu_buffer->lost_events = 0;
+
goto consume;
out:
/* Some archs do not have data cache coherency between kernel and user-space */
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* [for-next][PATCH 7/8] net: hns3: tracing: fix hclgevf trace event strings
2024-03-14 14:23 [for-next][PATCH 0/8] tracing: Final updates for 6.9 Steven Rostedt
` (5 preceding siblings ...)
2024-03-14 14:23 ` [for-next][PATCH 6/8] ring-buffer: Have mmapped ring buffer keep track of missed events Steven Rostedt
@ 2024-03-14 14:23 ` Steven Rostedt
2024-03-14 14:23 ` [for-next][PATCH 8/8] tracing: Use strcmp() in __assign_str() WARN_ON() check Steven Rostedt
7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-03-14 14:23 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
netdev, Yisen Zhuang, Salil Mehta, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Yufeng Mo, Huazhong Tan, Jijie Shao
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
The __string() and __assign_str() helper macros of the TRACE_EVENT() macro
are going through some optimizations where only the source string of
__string() will be used and the __assign_str() source will be ignored and
later removed.
To make sure that there's no issues, a new check is added between the
__string() src argument and the __assign_str() src argument that does a
strcmp() to make sure they are the same string.
The hclgevf trace events have:
__assign_str(devname, &hdev->nic.kinfo.netdev->name);
Which triggers the warning:
hclgevf_trace.h:34:39: error: passing argument 1 of ‘strcmp’ from incompatible pointer type [-Werror=incompatible-pointer-types]
34 | __assign_str(devname, &hdev->nic.kinfo.netdev->name);
[..]
arch/x86/include/asm/string_64.h:75:24: note: expected ‘const char *’ but argument is of type ‘char (*)[16]’
75 | int strcmp(const char *cs, const char *ct);
| ~~~~~~~~~~~~^~
Because __assign_str() now has:
WARN_ON_ONCE(__builtin_constant_p(src) ? \
strcmp((src), __data_offsets.dst##_ptr_) : \
(src) != __data_offsets.dst##_ptr_); \
The problem is the '&' on hdev->nic.kinfo.netdev->name. That's because
that name is:
char name[IFNAMSIZ]
Where passing an address '&' of a char array is not compatible with strcmp().
The '&' is not necessary, remove it.
Link: https://lore.kernel.org/linux-trace-kernel/20240313093454.3909afe7@gandalf.local.home
Cc: netdev <netdev@vger.kernel.org>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Yufeng Mo <moyufeng@huawei.com>
Cc: Huazhong Tan <tanhuazhong@huawei.com>
Reviewed-by: Jijie Shao <shaojijie@huawei.com>
Fixes: d8355240cf8fb ("net: hns3: add trace event support for PF/VF mailbox")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_trace.h | 8 ++++----
.../net/ethernet/hisilicon/hns3/hns3vf/hclgevf_trace.h | 8 ++++----
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_trace.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_trace.h
index 8510b88d4982..f3cd5a376eca 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_trace.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_trace.h
@@ -24,7 +24,7 @@ TRACE_EVENT(hclge_pf_mbx_get,
__field(u8, code)
__field(u8, subcode)
__string(pciname, pci_name(hdev->pdev))
- __string(devname, &hdev->vport[0].nic.kinfo.netdev->name)
+ __string(devname, hdev->vport[0].nic.kinfo.netdev->name)
__array(u32, mbx_data, PF_GET_MBX_LEN)
),
@@ -33,7 +33,7 @@ TRACE_EVENT(hclge_pf_mbx_get,
__entry->code = req->msg.code;
__entry->subcode = req->msg.subcode;
__assign_str(pciname, pci_name(hdev->pdev));
- __assign_str(devname, &hdev->vport[0].nic.kinfo.netdev->name);
+ __assign_str(devname, hdev->vport[0].nic.kinfo.netdev->name);
memcpy(__entry->mbx_data, req,
sizeof(struct hclge_mbx_vf_to_pf_cmd));
),
@@ -56,7 +56,7 @@ TRACE_EVENT(hclge_pf_mbx_send,
__field(u8, vfid)
__field(u16, code)
__string(pciname, pci_name(hdev->pdev))
- __string(devname, &hdev->vport[0].nic.kinfo.netdev->name)
+ __string(devname, hdev->vport[0].nic.kinfo.netdev->name)
__array(u32, mbx_data, PF_SEND_MBX_LEN)
),
@@ -64,7 +64,7 @@ TRACE_EVENT(hclge_pf_mbx_send,
__entry->vfid = req->dest_vfid;
__entry->code = le16_to_cpu(req->msg.code);
__assign_str(pciname, pci_name(hdev->pdev));
- __assign_str(devname, &hdev->vport[0].nic.kinfo.netdev->name);
+ __assign_str(devname, hdev->vport[0].nic.kinfo.netdev->name);
memcpy(__entry->mbx_data, req,
sizeof(struct hclge_mbx_pf_to_vf_cmd));
),
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_trace.h b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_trace.h
index 5d4895bb57a1..b259e95dd53c 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_trace.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_trace.h
@@ -23,7 +23,7 @@ TRACE_EVENT(hclge_vf_mbx_get,
__field(u8, vfid)
__field(u16, code)
__string(pciname, pci_name(hdev->pdev))
- __string(devname, &hdev->nic.kinfo.netdev->name)
+ __string(devname, hdev->nic.kinfo.netdev->name)
__array(u32, mbx_data, VF_GET_MBX_LEN)
),
@@ -31,7 +31,7 @@ TRACE_EVENT(hclge_vf_mbx_get,
__entry->vfid = req->dest_vfid;
__entry->code = le16_to_cpu(req->msg.code);
__assign_str(pciname, pci_name(hdev->pdev));
- __assign_str(devname, &hdev->nic.kinfo.netdev->name);
+ __assign_str(devname, hdev->nic.kinfo.netdev->name);
memcpy(__entry->mbx_data, req,
sizeof(struct hclge_mbx_pf_to_vf_cmd));
),
@@ -55,7 +55,7 @@ TRACE_EVENT(hclge_vf_mbx_send,
__field(u8, code)
__field(u8, subcode)
__string(pciname, pci_name(hdev->pdev))
- __string(devname, &hdev->nic.kinfo.netdev->name)
+ __string(devname, hdev->nic.kinfo.netdev->name)
__array(u32, mbx_data, VF_SEND_MBX_LEN)
),
@@ -64,7 +64,7 @@ TRACE_EVENT(hclge_vf_mbx_send,
__entry->code = req->msg.code;
__entry->subcode = req->msg.subcode;
__assign_str(pciname, pci_name(hdev->pdev));
- __assign_str(devname, &hdev->nic.kinfo.netdev->name);
+ __assign_str(devname, hdev->nic.kinfo.netdev->name);
memcpy(__entry->mbx_data, req,
sizeof(struct hclge_mbx_vf_to_pf_cmd));
),
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* [for-next][PATCH 8/8] tracing: Use strcmp() in __assign_str() WARN_ON() check
2024-03-14 14:23 [for-next][PATCH 0/8] tracing: Final updates for 6.9 Steven Rostedt
` (6 preceding siblings ...)
2024-03-14 14:23 ` [for-next][PATCH 7/8] net: hns3: tracing: fix hclgevf trace event strings Steven Rostedt
@ 2024-03-14 14:23 ` Steven Rostedt
7 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2024-03-14 14:23 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Nathan Chancellor, kernel test robot
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
The WARN_ON() check in __assign_str() to catch where the source variable
to the macro doesn't match the source variable to __string() gives an
error in clang:
>> include/trace/events/sunrpc.h:703:4: warning: result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Wstring-compare]
670 | __assign_str(progname, "unknown");
That's because the __assign_str() macro has:
WARN_ON_ONCE((src) != __data_offsets.dst##_ptr_);
Where "src" is a string literal. Clang warns when comparing a string
literal directly as it is undefined to what the value of the literal is.
Since this is still to make sure the same string that goes to __string()
is the same as __assign_str(), for string literals do a test for that and
then use strcmp() in those cases
Note that this depends on commit 51270d573a8d ("tracing/net_sched: Fix
tracepoints that save qdisc_dev() as a string") being applied, as this was
what found that bug.
Link: https://lore.kernel.org/linux-trace-kernel/20240312113002.00031668@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402292111.KIdExylU-lkp@intel.com/
Fixes: 433e1d88a3be ("tracing: Add warning if string in __assign_str() does not match __string()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
include/trace/stages/stage6_event_callback.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/trace/stages/stage6_event_callback.h b/include/trace/stages/stage6_event_callback.h
index a0c15f67eabe..83da83a0c14f 100644
--- a/include/trace/stages/stage6_event_callback.h
+++ b/include/trace/stages/stage6_event_callback.h
@@ -35,7 +35,9 @@
do { \
char *__str__ = __get_str(dst); \
int __len__ = __get_dynamic_array_len(dst) - 1; \
- WARN_ON_ONCE((src) != __data_offsets.dst##_ptr_); \
+ WARN_ON_ONCE(__builtin_constant_p(src) ? \
+ strcmp((src), __data_offsets.dst##_ptr_) : \
+ (src) != __data_offsets.dst##_ptr_); \
memcpy(__str__, __data_offsets.dst##_ptr_ ? : \
EVENT_NULL_STR, __len__); \
__str__[__len__] = '\0'; \
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread