* [for-linus][PATCH 1/8] tracing: Fix cpu buffers unavailable due to record_disabled missed
[not found] <20230822023803.605698724@goodmis.org>
@ 2023-08-22 2:38 ` Steven Rostedt
2023-08-22 2:38 ` [for-linus][PATCH 2/8] selftests/ftrace: Add a basic testcase for snapshot Steven Rostedt
` (6 subsequent siblings)
7 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2023-08-22 2:38 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Andrew Morton, vnagarnaik, shuah,
Zheng Yejian
From: Zheng Yejian <zhengyejian1@huawei.com>
Trace ring buffer can no longer record anything after executing
following commands at the shell prompt:
# cd /sys/kernel/tracing
# cat tracing_cpumask
fff
# echo 0 > tracing_cpumask
# echo 1 > snapshot
# echo fff > tracing_cpumask
# echo 1 > tracing_on
# echo "hello world" > trace_marker
-bash: echo: write error: Bad file descriptor
The root cause is that:
1. After `echo 0 > tracing_cpumask`, 'record_disabled' of cpu buffers
in 'tr->array_buffer.buffer' became 1 (see tracing_set_cpumask());
2. After `echo 1 > snapshot`, 'tr->array_buffer.buffer' is swapped
with 'tr->max_buffer.buffer', then the 'record_disabled' became 0
(see update_max_tr());
3. After `echo fff > tracing_cpumask`, the 'record_disabled' become -1;
Then array_buffer and max_buffer are both unavailable due to value of
'record_disabled' is not 0.
To fix it, enable or disable both array_buffer and max_buffer at the same
time in tracing_set_cpumask().
Link: https://lkml.kernel.org/r/20230805033816.3284594-2-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <vnagarnaik@google.com>
Cc: <shuah@kernel.org>
Fixes: 71babb2705e2 ("tracing: change CPU ring buffer state from tracing_cpumask")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b8870078ef58..b0e8eb6ea8ac 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5277,11 +5277,17 @@ int tracing_set_cpumask(struct trace_array *tr,
!cpumask_test_cpu(cpu, tracing_cpumask_new)) {
atomic_inc(&per_cpu_ptr(tr->array_buffer.data, cpu)->disabled);
ring_buffer_record_disable_cpu(tr->array_buffer.buffer, cpu);
+#ifdef CONFIG_TRACER_MAX_TRACE
+ ring_buffer_record_disable_cpu(tr->max_buffer.buffer, cpu);
+#endif
}
if (!cpumask_test_cpu(cpu, tr->tracing_cpumask) &&
cpumask_test_cpu(cpu, tracing_cpumask_new)) {
atomic_dec(&per_cpu_ptr(tr->array_buffer.data, cpu)->disabled);
ring_buffer_record_enable_cpu(tr->array_buffer.buffer, cpu);
+#ifdef CONFIG_TRACER_MAX_TRACE
+ ring_buffer_record_enable_cpu(tr->max_buffer.buffer, cpu);
+#endif
}
}
arch_spin_unlock(&tr->max_lock);
--
2.40.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [for-linus][PATCH 2/8] selftests/ftrace: Add a basic testcase for snapshot
[not found] <20230822023803.605698724@goodmis.org>
2023-08-22 2:38 ` [for-linus][PATCH 1/8] tracing: Fix cpu buffers unavailable due to record_disabled missed Steven Rostedt
@ 2023-08-22 2:38 ` Steven Rostedt
2023-08-22 2:38 ` [for-linus][PATCH 3/8] tracing/synthetic: Use union instead of casts Steven Rostedt
` (5 subsequent siblings)
7 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2023-08-22 2:38 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Andrew Morton, vnagarnaik, shuah,
Zheng Yejian
From: Zheng Yejian <zhengyejian1@huawei.com>
This testcase is constrived to reproduce a problem that the cpu buffers
become unavailable which is due to 'record_disabled' of array_buffer and
max_buffer being messed up.
Local test result after bugfix:
# ./ftracetest test.d/00basic/snapshot1.tc
=== Ftrace unit tests ===
[1] Snapshot and tracing_cpumask [PASS]
[2] (instance) Snapshot and tracing_cpumask [PASS]
# of passed: 2
# of failed: 0
# of unresolved: 0
# of untested: 0
# of unsupported: 0
# of xfailed: 0
# of undefined(test bug): 0
Link: https://lkml.kernel.org/r/20230805033816.3284594-3-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <vnagarnaik@google.com>
Cc: <shuah@kernel.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
.../ftrace/test.d/00basic/snapshot1.tc | 31 +++++++++++++++++++
1 file changed, 31 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/00basic/snapshot1.tc
diff --git a/tools/testing/selftests/ftrace/test.d/00basic/snapshot1.tc b/tools/testing/selftests/ftrace/test.d/00basic/snapshot1.tc
new file mode 100644
index 000000000000..63b76cf2a360
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/00basic/snapshot1.tc
@@ -0,0 +1,31 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Snapshot and tracing_cpumask
+# requires: trace_marker tracing_cpumask snapshot
+# flags: instance
+
+# This testcase is constrived to reproduce a problem that the cpu buffers
+# become unavailable which is due to 'record_disabled' of array_buffer and
+# max_buffer being messed up.
+
+# Store origin cpumask
+ORIG_CPUMASK=`cat tracing_cpumask`
+
+# Stop tracing all cpu
+echo 0 > tracing_cpumask
+
+# Take a snapshot of the main buffer
+echo 1 > snapshot
+
+# Restore origin cpumask, note that there should be some cpus being traced
+echo ${ORIG_CPUMASK} > tracing_cpumask
+
+# Set tracing on
+echo 1 > tracing_on
+
+# Write a log into buffer
+echo "test input 1" > trace_marker
+
+# Ensure the log writed so that cpu buffers are still available
+grep -q "test input 1" trace
+exit 0
--
2.40.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [for-linus][PATCH 3/8] tracing/synthetic: Use union instead of casts
[not found] <20230822023803.605698724@goodmis.org>
2023-08-22 2:38 ` [for-linus][PATCH 1/8] tracing: Fix cpu buffers unavailable due to record_disabled missed Steven Rostedt
2023-08-22 2:38 ` [for-linus][PATCH 2/8] selftests/ftrace: Add a basic testcase for snapshot Steven Rostedt
@ 2023-08-22 2:38 ` Steven Rostedt
2023-08-22 2:38 ` [for-linus][PATCH 4/8] tracing/synthetic: Skip first entry for stack traces Steven Rostedt
` (4 subsequent siblings)
7 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2023-08-22 2:38 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Andrew Morton, stable,
Sven Schnelle
From: Sven Schnelle <svens@linux.ibm.com>
The current code uses a lot of casts to access the fields member in struct
synth_trace_events with different sizes. This makes the code hard to
read, and had already introduced an endianness bug. Use a union and struct
instead.
Link: https://lkml.kernel.org/r/20230816154928.4171614-2-svens@linux.ibm.com
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 00cf3d672a9dd ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
include/linux/trace_events.h | 11 ++++
kernel/trace/trace.h | 8 +++
kernel/trace/trace_events_synth.c | 87 +++++++++++++------------------
3 files changed, 56 insertions(+), 50 deletions(-)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 3930e676436c..1e8bbdb8da90 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -59,6 +59,17 @@ int trace_raw_output_prep(struct trace_iterator *iter,
extern __printf(2, 3)
void trace_event_printf(struct trace_iterator *iter, const char *fmt, ...);
+/* Used to find the offset and length of dynamic fields in trace events */
+struct trace_dynamic_info {
+#ifdef CONFIG_CPU_BIG_ENDIAN
+ u16 offset;
+ u16 len;
+#else
+ u16 len;
+ u16 offset;
+#endif
+};
+
/*
* The trace entry - the most basic unit of tracing. This is what
* is printed in the end as a single line in the trace output, such as:
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index e1edc2197fc8..95956f75bea5 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1295,6 +1295,14 @@ static inline void trace_branch_disable(void)
/* set ring buffers to default size if not already done so */
int tracing_update_buffers(void);
+union trace_synth_field {
+ u8 as_u8;
+ u16 as_u16;
+ u32 as_u32;
+ u64 as_u64;
+ struct trace_dynamic_info as_dynamic;
+};
+
struct ftrace_event_field {
struct list_head link;
const char *name;
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index dd398afc8e25..7fff8235075f 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -127,7 +127,7 @@ static bool synth_event_match(const char *system, const char *event,
struct synth_trace_event {
struct trace_entry ent;
- u64 fields[];
+ union trace_synth_field fields[];
};
static int synth_event_define_fields(struct trace_event_call *call)
@@ -321,19 +321,19 @@ static const char *synth_field_fmt(char *type)
static void print_synth_event_num_val(struct trace_seq *s,
char *print_fmt, char *name,
- int size, u64 val, char *space)
+ int size, union trace_synth_field *val, char *space)
{
switch (size) {
case 1:
- trace_seq_printf(s, print_fmt, name, (u8)val, space);
+ trace_seq_printf(s, print_fmt, name, val->as_u8, space);
break;
case 2:
- trace_seq_printf(s, print_fmt, name, (u16)val, space);
+ trace_seq_printf(s, print_fmt, name, val->as_u16, space);
break;
case 4:
- trace_seq_printf(s, print_fmt, name, (u32)val, space);
+ trace_seq_printf(s, print_fmt, name, val->as_u32, space);
break;
default:
@@ -374,36 +374,26 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
/* parameter values */
if (se->fields[i]->is_string) {
if (se->fields[i]->is_dynamic) {
- u32 offset, data_offset;
- char *str_field;
-
- offset = (u32)entry->fields[n_u64];
- data_offset = offset & 0xffff;
-
- str_field = (char *)entry + data_offset;
+ union trace_synth_field *data = &entry->fields[n_u64];
trace_seq_printf(s, print_fmt, se->fields[i]->name,
STR_VAR_LEN_MAX,
- str_field,
+ (char *)entry + data->as_dynamic.offset,
i == se->n_fields - 1 ? "" : " ");
n_u64++;
} else {
trace_seq_printf(s, print_fmt, se->fields[i]->name,
STR_VAR_LEN_MAX,
- (char *)&entry->fields[n_u64],
+ (char *)&entry->fields[n_u64].as_u64,
i == se->n_fields - 1 ? "" : " ");
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
}
} else if (se->fields[i]->is_stack) {
- u32 offset, data_offset, len;
unsigned long *p, *end;
+ union trace_synth_field *data = &entry->fields[n_u64];
- offset = (u32)entry->fields[n_u64];
- data_offset = offset & 0xffff;
- len = offset >> 16;
-
- p = (void *)entry + data_offset;
- end = (void *)p + len - (sizeof(long) - 1);
+ p = (void *)entry + data->as_dynamic.offset;
+ end = (void *)p + data->as_dynamic.len - (sizeof(long) - 1);
trace_seq_printf(s, "%s=STACK:\n", se->fields[i]->name);
@@ -419,13 +409,13 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
print_synth_event_num_val(s, print_fmt,
se->fields[i]->name,
se->fields[i]->size,
- entry->fields[n_u64],
+ &entry->fields[n_u64],
space);
if (strcmp(se->fields[i]->type, "gfp_t") == 0) {
trace_seq_puts(s, " (");
trace_print_flags_seq(s, "|",
- entry->fields[n_u64],
+ entry->fields[n_u64].as_u64,
__flags);
trace_seq_putc(s, ')');
}
@@ -454,21 +444,16 @@ static unsigned int trace_string(struct synth_trace_event *entry,
int ret;
if (is_dynamic) {
- u32 data_offset;
+ union trace_synth_field *data = &entry->fields[*n_u64];
- data_offset = struct_size(entry, fields, event->n_u64);
- data_offset += data_size;
-
- len = fetch_store_strlen((unsigned long)str_val);
-
- data_offset |= len << 16;
- *(u32 *)&entry->fields[*n_u64] = data_offset;
+ data->as_dynamic.offset = struct_size(entry, fields, event->n_u64) + data_size;
+ data->as_dynamic.len = fetch_store_strlen((unsigned long)str_val);
ret = fetch_store_string((unsigned long)str_val, &entry->fields[*n_u64], entry);
(*n_u64)++;
} else {
- str_field = (char *)&entry->fields[*n_u64];
+ str_field = (char *)&entry->fields[*n_u64].as_u64;
#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
if ((unsigned long)str_val < TASK_SIZE)
@@ -492,6 +477,7 @@ static unsigned int trace_stack(struct synth_trace_event *entry,
unsigned int data_size,
unsigned int *n_u64)
{
+ union trace_synth_field *data = &entry->fields[*n_u64];
unsigned int len;
u32 data_offset;
void *data_loc;
@@ -515,8 +501,9 @@ static unsigned int trace_stack(struct synth_trace_event *entry,
memcpy(data_loc, stack, len);
/* Fill in the field that holds the offset/len combo */
- data_offset |= len << 16;
- *(u32 *)&entry->fields[*n_u64] = data_offset;
+
+ data->as_dynamic.offset = data_offset;
+ data->as_dynamic.len = len;
(*n_u64)++;
@@ -592,19 +579,19 @@ static notrace void trace_event_raw_event_synth(void *__data,
switch (field->size) {
case 1:
- *(u8 *)&entry->fields[n_u64] = (u8)val;
+ entry->fields[n_u64].as_u8 = (u8)val;
break;
case 2:
- *(u16 *)&entry->fields[n_u64] = (u16)val;
+ entry->fields[n_u64].as_u16 = (u16)val;
break;
case 4:
- *(u32 *)&entry->fields[n_u64] = (u32)val;
+ entry->fields[n_u64].as_u32 = (u32)val;
break;
default:
- entry->fields[n_u64] = val;
+ entry->fields[n_u64].as_u64 = val;
break;
}
n_u64++;
@@ -1791,19 +1778,19 @@ int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...)
switch (field->size) {
case 1:
- *(u8 *)&state.entry->fields[n_u64] = (u8)val;
+ state.entry->fields[n_u64].as_u8 = (u8)val;
break;
case 2:
- *(u16 *)&state.entry->fields[n_u64] = (u16)val;
+ state.entry->fields[n_u64].as_u16 = (u16)val;
break;
case 4:
- *(u32 *)&state.entry->fields[n_u64] = (u32)val;
+ state.entry->fields[n_u64].as_u32 = (u32)val;
break;
default:
- state.entry->fields[n_u64] = val;
+ state.entry->fields[n_u64].as_u64 = val;
break;
}
n_u64++;
@@ -1884,19 +1871,19 @@ int synth_event_trace_array(struct trace_event_file *file, u64 *vals,
switch (field->size) {
case 1:
- *(u8 *)&state.entry->fields[n_u64] = (u8)val;
+ state.entry->fields[n_u64].as_u8 = (u8)val;
break;
case 2:
- *(u16 *)&state.entry->fields[n_u64] = (u16)val;
+ state.entry->fields[n_u64].as_u16 = (u16)val;
break;
case 4:
- *(u32 *)&state.entry->fields[n_u64] = (u32)val;
+ state.entry->fields[n_u64].as_u32 = (u32)val;
break;
default:
- state.entry->fields[n_u64] = val;
+ state.entry->fields[n_u64].as_u64 = val;
break;
}
n_u64++;
@@ -2031,19 +2018,19 @@ static int __synth_event_add_val(const char *field_name, u64 val,
} else {
switch (field->size) {
case 1:
- *(u8 *)&trace_state->entry->fields[field->offset] = (u8)val;
+ trace_state->entry->fields[field->offset].as_u8 = (u8)val;
break;
case 2:
- *(u16 *)&trace_state->entry->fields[field->offset] = (u16)val;
+ trace_state->entry->fields[field->offset].as_u16 = (u16)val;
break;
case 4:
- *(u32 *)&trace_state->entry->fields[field->offset] = (u32)val;
+ trace_state->entry->fields[field->offset].as_u32 = (u32)val;
break;
default:
- trace_state->entry->fields[field->offset] = val;
+ trace_state->entry->fields[field->offset].as_u64 = val;
break;
}
}
--
2.40.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [for-linus][PATCH 4/8] tracing/synthetic: Skip first entry for stack traces
[not found] <20230822023803.605698724@goodmis.org>
` (2 preceding siblings ...)
2023-08-22 2:38 ` [for-linus][PATCH 3/8] tracing/synthetic: Use union instead of casts Steven Rostedt
@ 2023-08-22 2:38 ` Steven Rostedt
2023-08-22 2:38 ` [for-linus][PATCH 5/8] tracing/synthetic: Allocate one additional element for size Steven Rostedt
` (3 subsequent siblings)
7 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2023-08-22 2:38 UTC (permalink / raw)
To: linux-kernel; +Cc: Masami Hiramatsu, Mark Rutland, Andrew Morton, Sven Schnelle
From: Sven Schnelle <svens@linux.ibm.com>
While debugging another issue I noticed that the stack trace output
contains the number of entries on top:
<idle>-0 [000] d..4. 203.322502: wake_lat: pid=0 delta=2268270616 stack=STACK:
=> 0x10
=> __schedule+0xac6/0x1a98
=> schedule+0x126/0x2c0
=> schedule_timeout+0x242/0x2c0
=> __wait_for_common+0x434/0x680
=> __wait_rcu_gp+0x198/0x3e0
=> synchronize_rcu+0x112/0x138
=> ring_buffer_reset_online_cpus+0x140/0x2e0
=> tracing_reset_online_cpus+0x15c/0x1d0
=> tracing_set_clock+0x180/0x1d8
=> hist_register_trigger+0x486/0x670
=> event_hist_trigger_parse+0x494/0x1318
=> trigger_process_regex+0x1d4/0x258
=> event_trigger_write+0xb4/0x170
=> vfs_write+0x210/0xad0
=> ksys_write+0x122/0x208
Fix this by skipping the first element. Also replace the pointer
logic with an index variable which is easier to read.
Link: https://lkml.kernel.org/r/20230816154928.4171614-3-svens@linux.ibm.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace_events_synth.c | 17 ++++-------------
1 file changed, 4 insertions(+), 13 deletions(-)
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 7fff8235075f..80a2a832f857 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -350,7 +350,7 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
struct trace_seq *s = &iter->seq;
struct synth_trace_event *entry;
struct synth_event *se;
- unsigned int i, n_u64;
+ unsigned int i, j, n_u64;
char print_fmt[32];
const char *fmt;
@@ -389,18 +389,13 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
}
} else if (se->fields[i]->is_stack) {
- unsigned long *p, *end;
union trace_synth_field *data = &entry->fields[n_u64];
-
- p = (void *)entry + data->as_dynamic.offset;
- end = (void *)p + data->as_dynamic.len - (sizeof(long) - 1);
+ unsigned long *p = (void *)entry + data->as_dynamic.offset;
trace_seq_printf(s, "%s=STACK:\n", se->fields[i]->name);
-
- for (; *p && p < end; p++)
- trace_seq_printf(s, "=> %pS\n", (void *)*p);
+ for (j = 1; j < data->as_dynamic.len / sizeof(long); j++)
+ trace_seq_printf(s, "=> %pS\n", (void *)p[j]);
n_u64++;
-
} else {
struct trace_print_flags __flags[] = {
__def_gfpflag_names, {-1, NULL} };
@@ -490,10 +485,6 @@ static unsigned int trace_stack(struct synth_trace_event *entry,
break;
}
- /* Include the zero'd element if it fits */
- if (len < HIST_STACKTRACE_DEPTH)
- len++;
-
len *= sizeof(long);
/* Find the dynamic section to copy the stack into. */
--
2.40.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [for-linus][PATCH 5/8] tracing/synthetic: Allocate one additional element for size
[not found] <20230822023803.605698724@goodmis.org>
` (3 preceding siblings ...)
2023-08-22 2:38 ` [for-linus][PATCH 4/8] tracing/synthetic: Skip first entry for stack traces Steven Rostedt
@ 2023-08-22 2:38 ` Steven Rostedt
2023-08-22 2:38 ` [for-linus][PATCH 6/8] tracing: Fix memleak due to race between current_tracer and trace Steven Rostedt
` (2 subsequent siblings)
7 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2023-08-22 2:38 UTC (permalink / raw)
To: linux-kernel; +Cc: Masami Hiramatsu, Mark Rutland, Andrew Morton, Sven Schnelle
From: Sven Schnelle <svens@linux.ibm.com>
While debugging another issue I noticed that the stack trace contains one
invalid entry at the end:
<idle>-0 [008] d..4. 26.484201: wake_lat: pid=0 delta=2629976084 000000009cc24024 stack=STACK:
=> __schedule+0xac6/0x1a98
=> schedule+0x126/0x2c0
=> schedule_timeout+0x150/0x2c0
=> kcompactd+0x9ca/0xc20
=> kthread+0x2f6/0x3d8
=> __ret_from_fork+0x8a/0xe8
=> 0x6b6b6b6b6b6b6b6b
This is because the code failed to add the one element containing the
number of entries to field_size.
Link: https://lkml.kernel.org/r/20230816154928.4171614-4-svens@linux.ibm.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace_events_synth.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 80a2a832f857..9897d0bfcab7 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -528,7 +528,8 @@ static notrace void trace_event_raw_event_synth(void *__data,
str_val = (char *)(long)var_ref_vals[val_idx];
if (event->dynamic_fields[i]->is_stack) {
- len = *((unsigned long *)str_val);
+ /* reserve one extra element for size */
+ len = *((unsigned long *)str_val) + 1;
len *= sizeof(unsigned long);
} else {
len = fetch_store_strlen((unsigned long)str_val);
--
2.40.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [for-linus][PATCH 6/8] tracing: Fix memleak due to race between current_tracer and trace
[not found] <20230822023803.605698724@goodmis.org>
` (4 preceding siblings ...)
2023-08-22 2:38 ` [for-linus][PATCH 5/8] tracing/synthetic: Allocate one additional element for size Steven Rostedt
@ 2023-08-22 2:38 ` Steven Rostedt
2023-08-22 2:38 ` [for-linus][PATCH 7/8] tracing: Introduce pipe_cpumask to avoid race on trace_pipes Steven Rostedt
2023-08-22 2:38 ` [for-linus][PATCH 8/8] samples: ftrace: Replace bti assembly with hint for older compiler Steven Rostedt
7 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2023-08-22 2:38 UTC (permalink / raw)
To: linux-kernel; +Cc: Masami Hiramatsu, Mark Rutland, Andrew Morton, Zheng Yejian
From: Zheng Yejian <zhengyejian1@huawei.com>
Kmemleak report a leak in graph_trace_open():
unreferenced object 0xffff0040b95f4a00 (size 128):
comm "cat", pid 204981, jiffies 4301155872 (age 99771.964s)
hex dump (first 32 bytes):
e0 05 e7 b4 ab 7d 00 00 0b 00 01 00 00 00 00 00 .....}..........
f4 00 01 10 00 a0 ff ff 00 00 00 00 65 00 10 00 ............e...
backtrace:
[<000000005db27c8b>] kmem_cache_alloc_trace+0x348/0x5f0
[<000000007df90faa>] graph_trace_open+0xb0/0x344
[<00000000737524cd>] __tracing_open+0x450/0xb10
[<0000000098043327>] tracing_open+0x1a0/0x2a0
[<00000000291c3876>] do_dentry_open+0x3c0/0xdc0
[<000000004015bcd6>] vfs_open+0x98/0xd0
[<000000002b5f60c9>] do_open+0x520/0x8d0
[<00000000376c7820>] path_openat+0x1c0/0x3e0
[<00000000336a54b5>] do_filp_open+0x14c/0x324
[<000000002802df13>] do_sys_openat2+0x2c4/0x530
[<0000000094eea458>] __arm64_sys_openat+0x130/0x1c4
[<00000000a71d7881>] el0_svc_common.constprop.0+0xfc/0x394
[<00000000313647bf>] do_el0_svc+0xac/0xec
[<000000002ef1c651>] el0_svc+0x20/0x30
[<000000002fd4692a>] el0_sync_handler+0xb0/0xb4
[<000000000c309c35>] el0_sync+0x160/0x180
The root cause is descripted as follows:
__tracing_open() { // 1. File 'trace' is being opened;
...
*iter->trace = *tr->current_trace; // 2. Tracer 'function_graph' is
// currently set;
...
iter->trace->open(iter); // 3. Call graph_trace_open() here,
// and memory are allocated in it;
...
}
s_start() { // 4. The opened file is being read;
...
*iter->trace = *tr->current_trace; // 5. If tracer is switched to
// 'nop' or others, then memory
// in step 3 are leaked!!!
...
}
To fix it, in s_start(), close tracer before switching then reopen the
new tracer after switching. And some tracers like 'wakeup' may not update
'iter->private' in some cases when reopen, then it should be cleared
to avoid being mistakenly closed again.
Link: https://lore.kernel.org/linux-trace-kernel/20230817125539.1646321-1-zhengyejian1@huawei.com
Fixes: d7350c3f4569 ("tracing/core: make the read callbacks reentrants")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace.c | 9 ++++++++-
kernel/trace/trace_irqsoff.c | 3 ++-
kernel/trace/trace_sched_wakeup.c | 2 ++
3 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b0e8eb6ea8ac..29a2e4d7886d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4213,8 +4213,15 @@ static void *s_start(struct seq_file *m, loff_t *pos)
* will point to the same string as current_trace->name.
*/
mutex_lock(&trace_types_lock);
- if (unlikely(tr->current_trace && iter->trace->name != tr->current_trace->name))
+ if (unlikely(tr->current_trace && iter->trace->name != tr->current_trace->name)) {
+ /* Close iter->trace before switching to the new current tracer */
+ if (iter->trace->close)
+ iter->trace->close(iter);
*iter->trace = *tr->current_trace;
+ /* Reopen the new current tracer */
+ if (iter->trace->open)
+ iter->trace->open(iter);
+ }
mutex_unlock(&trace_types_lock);
#ifdef CONFIG_TRACER_MAX_TRACE
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 590b3d51afae..ba37f768e2f2 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -231,7 +231,8 @@ static void irqsoff_trace_open(struct trace_iterator *iter)
{
if (is_graph(iter->tr))
graph_trace_open(iter);
-
+ else
+ iter->private = NULL;
}
static void irqsoff_trace_close(struct trace_iterator *iter)
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index 330aee1c1a49..0469a04a355f 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -168,6 +168,8 @@ static void wakeup_trace_open(struct trace_iterator *iter)
{
if (is_graph(iter->tr))
graph_trace_open(iter);
+ else
+ iter->private = NULL;
}
static void wakeup_trace_close(struct trace_iterator *iter)
--
2.40.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [for-linus][PATCH 7/8] tracing: Introduce pipe_cpumask to avoid race on trace_pipes
[not found] <20230822023803.605698724@goodmis.org>
` (5 preceding siblings ...)
2023-08-22 2:38 ` [for-linus][PATCH 6/8] tracing: Fix memleak due to race between current_tracer and trace Steven Rostedt
@ 2023-08-22 2:38 ` Steven Rostedt
2023-08-22 2:38 ` [for-linus][PATCH 8/8] samples: ftrace: Replace bti assembly with hint for older compiler Steven Rostedt
7 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2023-08-22 2:38 UTC (permalink / raw)
To: linux-kernel; +Cc: Masami Hiramatsu, Mark Rutland, Andrew Morton, Zheng Yejian
From: Zheng Yejian <zhengyejian1@huawei.com>
There is race issue when concurrently splice_read main trace_pipe and
per_cpu trace_pipes which will result in data read out being different
from what actually writen.
As suggested by Steven:
> I believe we should add a ref count to trace_pipe and the per_cpu
> trace_pipes, where if they are opened, nothing else can read it.
>
> Opening trace_pipe locks all per_cpu ref counts, if any of them are
> open, then the trace_pipe open will fail (and releases any ref counts
> it had taken).
>
> Opening a per_cpu trace_pipe will up the ref count for just that
> CPU buffer. This will allow multiple tasks to read different per_cpu
> trace_pipe files, but will prevent the main trace_pipe file from
> being opened.
But because we only need to know whether per_cpu trace_pipe is open or
not, using a cpumask instead of using ref count may be easier.
After this patch, users will find that:
- Main trace_pipe can be opened by only one user, and if it is
opened, all per_cpu trace_pipes cannot be opened;
- Per_cpu trace_pipes can be opened by multiple users, but each per_cpu
trace_pipe can only be opened by one user. And if one of them is
opened, main trace_pipe cannot be opened.
Link: https://lore.kernel.org/linux-trace-kernel/20230818022645.1948314-1-zhengyejian1@huawei.com
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace.c | 55 ++++++++++++++++++++++++++++++++++++++------
kernel/trace/trace.h | 2 ++
2 files changed, 50 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 29a2e4d7886d..8e64aaad5361 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6718,10 +6718,36 @@ tracing_max_lat_write(struct file *filp, const char __user *ubuf,
#endif
+static int open_pipe_on_cpu(struct trace_array *tr, int cpu)
+{
+ if (cpu == RING_BUFFER_ALL_CPUS) {
+ if (cpumask_empty(tr->pipe_cpumask)) {
+ cpumask_setall(tr->pipe_cpumask);
+ return 0;
+ }
+ } else if (!cpumask_test_cpu(cpu, tr->pipe_cpumask)) {
+ cpumask_set_cpu(cpu, tr->pipe_cpumask);
+ return 0;
+ }
+ return -EBUSY;
+}
+
+static void close_pipe_on_cpu(struct trace_array *tr, int cpu)
+{
+ if (cpu == RING_BUFFER_ALL_CPUS) {
+ WARN_ON(!cpumask_full(tr->pipe_cpumask));
+ cpumask_clear(tr->pipe_cpumask);
+ } else {
+ WARN_ON(!cpumask_test_cpu(cpu, tr->pipe_cpumask));
+ cpumask_clear_cpu(cpu, tr->pipe_cpumask);
+ }
+}
+
static int tracing_open_pipe(struct inode *inode, struct file *filp)
{
struct trace_array *tr = inode->i_private;
struct trace_iterator *iter;
+ int cpu;
int ret;
ret = tracing_check_open_get_tr(tr);
@@ -6729,13 +6755,16 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
return ret;
mutex_lock(&trace_types_lock);
+ cpu = tracing_get_cpu(inode);
+ ret = open_pipe_on_cpu(tr, cpu);
+ if (ret)
+ goto fail_pipe_on_cpu;
/* create a buffer to store the information to pass to userspace */
iter = kzalloc(sizeof(*iter), GFP_KERNEL);
if (!iter) {
ret = -ENOMEM;
- __trace_array_put(tr);
- goto out;
+ goto fail_alloc_iter;
}
trace_seq_init(&iter->seq);
@@ -6758,7 +6787,7 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
iter->tr = tr;
iter->array_buffer = &tr->array_buffer;
- iter->cpu_file = tracing_get_cpu(inode);
+ iter->cpu_file = cpu;
mutex_init(&iter->mutex);
filp->private_data = iter;
@@ -6768,12 +6797,15 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
nonseekable_open(inode, filp);
tr->trace_ref++;
-out:
+
mutex_unlock(&trace_types_lock);
return ret;
fail:
kfree(iter);
+fail_alloc_iter:
+ close_pipe_on_cpu(tr, cpu);
+fail_pipe_on_cpu:
__trace_array_put(tr);
mutex_unlock(&trace_types_lock);
return ret;
@@ -6790,7 +6822,7 @@ static int tracing_release_pipe(struct inode *inode, struct file *file)
if (iter->trace->pipe_close)
iter->trace->pipe_close(iter);
-
+ close_pipe_on_cpu(tr, iter->cpu_file);
mutex_unlock(&trace_types_lock);
free_cpumask_var(iter->started);
@@ -9454,6 +9486,9 @@ static struct trace_array *trace_array_create(const char *name)
if (!alloc_cpumask_var(&tr->tracing_cpumask, GFP_KERNEL))
goto out_free_tr;
+ if (!alloc_cpumask_var(&tr->pipe_cpumask, GFP_KERNEL))
+ goto out_free_tr;
+
tr->trace_flags = global_trace.trace_flags & ~ZEROED_TRACE_FLAGS;
cpumask_copy(tr->tracing_cpumask, cpu_all_mask);
@@ -9495,6 +9530,7 @@ static struct trace_array *trace_array_create(const char *name)
out_free_tr:
ftrace_free_ftrace_ops(tr);
free_trace_buffers(tr);
+ free_cpumask_var(tr->pipe_cpumask);
free_cpumask_var(tr->tracing_cpumask);
kfree(tr->name);
kfree(tr);
@@ -9597,6 +9633,7 @@ static int __remove_instance(struct trace_array *tr)
}
kfree(tr->topts);
+ free_cpumask_var(tr->pipe_cpumask);
free_cpumask_var(tr->tracing_cpumask);
kfree(tr->name);
kfree(tr);
@@ -10394,12 +10431,14 @@ __init static int tracer_alloc_buffers(void)
if (trace_create_savedcmd() < 0)
goto out_free_temp_buffer;
+ if (!alloc_cpumask_var(&global_trace.pipe_cpumask, GFP_KERNEL))
+ goto out_free_savedcmd;
+
/* TODO: make the number of buffers hot pluggable with CPUS */
if (allocate_trace_buffers(&global_trace, ring_buf_size) < 0) {
MEM_FAIL(1, "tracer: failed to allocate ring buffer!\n");
- goto out_free_savedcmd;
+ goto out_free_pipe_cpumask;
}
-
if (global_trace.buffer_disabled)
tracing_off();
@@ -10452,6 +10491,8 @@ __init static int tracer_alloc_buffers(void)
return 0;
+out_free_pipe_cpumask:
+ free_cpumask_var(global_trace.pipe_cpumask);
out_free_savedcmd:
free_saved_cmdlines_buffer(savedcmd);
out_free_temp_buffer:
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 95956f75bea5..73eaec158473 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -377,6 +377,8 @@ struct trace_array {
struct list_head events;
struct trace_event_file *trace_marker_file;
cpumask_var_t tracing_cpumask; /* only trace on set CPUs */
+ /* one per_cpu trace_pipe can be opened by only one user */
+ cpumask_var_t pipe_cpumask;
int ref;
int trace_ref;
#ifdef CONFIG_FUNCTION_TRACER
--
2.40.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [for-linus][PATCH 8/8] samples: ftrace: Replace bti assembly with hint for older compiler
[not found] <20230822023803.605698724@goodmis.org>
` (6 preceding siblings ...)
2023-08-22 2:38 ` [for-linus][PATCH 7/8] tracing: Introduce pipe_cpumask to avoid race on trace_pipes Steven Rostedt
@ 2023-08-22 2:38 ` Steven Rostedt
7 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2023-08-22 2:38 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Andrew Morton, Florent Revest,
GONG, Ruiqi
From: "GONG, Ruiqi" <gongruiqi1@huawei.com>
When cross-building the arm64 kernel with allmodconfig using GCC 9.4,
the following error occurs on multiple files under samples/ftrace/:
/tmp/ccPC1ODs.s: Assembler messages:
/tmp/ccPC1ODs.s:8: Error: selected processor does not support `bti c'
Fix this issue by replacing `bti c` with `hint 34`, which is compatible
for the older compiler.
Link: https://lore.kernel.org/linux-trace-kernel/20230820111509.1470826-1-gongruiqi@huaweicloud.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Florent Revest <revest@chromium.org>
Fixes: 8c3526fb86060cb5 ("arm64: ftrace: Add direct call trampoline samples support")
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
samples/ftrace/ftrace-direct-modify.c | 4 ++--
samples/ftrace/ftrace-direct-multi-modify.c | 4 ++--
samples/ftrace/ftrace-direct-multi.c | 2 +-
samples/ftrace/ftrace-direct-too.c | 2 +-
samples/ftrace/ftrace-direct.c | 2 +-
5 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
index e5ed08098ff3..e2a6a69352df 100644
--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -105,7 +105,7 @@ asm (
" .type my_tramp1, @function\n"
" .globl my_tramp1\n"
" my_tramp1:"
-" bti c\n"
+" hint 34\n" // bti c
" sub sp, sp, #16\n"
" stp x9, x30, [sp]\n"
" bl my_direct_func1\n"
@@ -117,7 +117,7 @@ asm (
" .type my_tramp2, @function\n"
" .globl my_tramp2\n"
" my_tramp2:"
-" bti c\n"
+" hint 34\n" // bti c
" sub sp, sp, #16\n"
" stp x9, x30, [sp]\n"
" bl my_direct_func2\n"
diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
index 292cff2b3f5d..2e349834d63c 100644
--- a/samples/ftrace/ftrace-direct-multi-modify.c
+++ b/samples/ftrace/ftrace-direct-multi-modify.c
@@ -112,7 +112,7 @@ asm (
" .type my_tramp1, @function\n"
" .globl my_tramp1\n"
" my_tramp1:"
-" bti c\n"
+" hint 34\n" // bti c
" sub sp, sp, #32\n"
" stp x9, x30, [sp]\n"
" str x0, [sp, #16]\n"
@@ -127,7 +127,7 @@ asm (
" .type my_tramp2, @function\n"
" .globl my_tramp2\n"
" my_tramp2:"
-" bti c\n"
+" hint 34\n" // bti c
" sub sp, sp, #32\n"
" stp x9, x30, [sp]\n"
" str x0, [sp, #16]\n"
diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
index b4391e08c913..9243dbfe4d0c 100644
--- a/samples/ftrace/ftrace-direct-multi.c
+++ b/samples/ftrace/ftrace-direct-multi.c
@@ -75,7 +75,7 @@ asm (
" .type my_tramp, @function\n"
" .globl my_tramp\n"
" my_tramp:"
-" bti c\n"
+" hint 34\n" // bti c
" sub sp, sp, #32\n"
" stp x9, x30, [sp]\n"
" str x0, [sp, #16]\n"
diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
index e9804c5307c0..e39c3563ae4e 100644
--- a/samples/ftrace/ftrace-direct-too.c
+++ b/samples/ftrace/ftrace-direct-too.c
@@ -81,7 +81,7 @@ asm (
" .type my_tramp, @function\n"
" .globl my_tramp\n"
" my_tramp:"
-" bti c\n"
+" hint 34\n" // bti c
" sub sp, sp, #48\n"
" stp x9, x30, [sp]\n"
" stp x0, x1, [sp, #16]\n"
diff --git a/samples/ftrace/ftrace-direct.c b/samples/ftrace/ftrace-direct.c
index 20f4a7caa810..32c477da1e9a 100644
--- a/samples/ftrace/ftrace-direct.c
+++ b/samples/ftrace/ftrace-direct.c
@@ -72,7 +72,7 @@ asm (
" .type my_tramp, @function\n"
" .globl my_tramp\n"
" my_tramp:"
-" bti c\n"
+" hint 34\n" // bti c
" sub sp, sp, #32\n"
" stp x9, x30, [sp]\n"
" str x0, [sp, #16]\n"
--
2.40.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-08-22 2:39 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20230822023803.605698724@goodmis.org>
2023-08-22 2:38 ` [for-linus][PATCH 1/8] tracing: Fix cpu buffers unavailable due to record_disabled missed Steven Rostedt
2023-08-22 2:38 ` [for-linus][PATCH 2/8] selftests/ftrace: Add a basic testcase for snapshot Steven Rostedt
2023-08-22 2:38 ` [for-linus][PATCH 3/8] tracing/synthetic: Use union instead of casts Steven Rostedt
2023-08-22 2:38 ` [for-linus][PATCH 4/8] tracing/synthetic: Skip first entry for stack traces Steven Rostedt
2023-08-22 2:38 ` [for-linus][PATCH 5/8] tracing/synthetic: Allocate one additional element for size Steven Rostedt
2023-08-22 2:38 ` [for-linus][PATCH 6/8] tracing: Fix memleak due to race between current_tracer and trace Steven Rostedt
2023-08-22 2:38 ` [for-linus][PATCH 7/8] tracing: Introduce pipe_cpumask to avoid race on trace_pipes Steven Rostedt
2023-08-22 2:38 ` [for-linus][PATCH 8/8] samples: ftrace: Replace bti assembly with hint for older compiler Steven Rostedt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox