* [for-next][PATCH 0/6] ring-buffer: Updates for 7.1
@ 2026-04-04 2:26 Steven Rostedt
From: Steven Rostedt @ 2026-04-04 2:26 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
ring-buffer/for-next
Head SHA1: 3515572dd068895ffd241b8a69399a0ebfac7593
Masami Hiramatsu (Google) (3):
tracing: Make the backup instance non-reusable
tracing: Remove the backup instance automatically after read
tracing/Documentation: Add a section about backup instance
Steven Rostedt (2):
ring-buffer: Show what clock function is used on timestamp errors
tracing: Allow backup to save persistent ring buffer before it starts
Vincent Donnefort (1):
ring-buffer: Enforce read ordering of trace_buffer cpumask and buffers
----
Documentation/trace/debugging.rst | 19 ++++
kernel/trace/ring_buffer.c | 29 ++++--
kernel/trace/trace.c | 182 ++++++++++++++++++++++++++++++--------
kernel/trace/trace.h | 18 +++-
kernel/trace/trace_boot.c | 5 +-
kernel/trace/trace_events.c | 76 +++++++++-------
6 files changed, 252 insertions(+), 77 deletions(-)
* [for-next][PATCH 1/6] ring-buffer: Show what clock function is used on timestamp errors
From: Steven Rostedt @ 2026-04-04 2:26 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
From: Steven Rostedt <rostedt@goodmis.org>
Testing of the tracing code was triggering a timestamp count issue that was
always off by one. This had been happening for some time but was never
reported by anyone else. It was finally discovered to be an issue with the
"uptime" (jiffies) clock: that clock happened to be traced itself, and the
resulting internal recursion caused the discrepancy. This would have been
much easier to solve if the clock function being used had been displayed
when the error was detected.
Add the clock function to the error output.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260323202212.479bb288@gandalf.local.home
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index d6bebb782efc..8b6c39bba56d 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4506,18 +4506,20 @@ static void check_buffer(struct ring_buffer_per_cpu *cpu_buffer,
ret = rb_read_data_buffer(bpage, tail, cpu_buffer->cpu, &ts, &delta);
if (ret < 0) {
if (delta < ts) {
- buffer_warn_return("[CPU: %d]ABSOLUTE TIME WENT BACKWARDS: last ts: %lld absolute ts: %lld\n",
- cpu_buffer->cpu, ts, delta);
+ buffer_warn_return("[CPU: %d]ABSOLUTE TIME WENT BACKWARDS: last ts: %lld absolute ts: %lld clock:%pS\n",
+ cpu_buffer->cpu, ts, delta,
+ cpu_buffer->buffer->clock);
goto out;
}
}
if ((full && ts > info->ts) ||
(!full && ts + info->delta != info->ts)) {
- buffer_warn_return("[CPU: %d]TIME DOES NOT MATCH expected:%lld actual:%lld delta:%lld before:%lld after:%lld%s context:%s\n",
+ buffer_warn_return("[CPU: %d]TIME DOES NOT MATCH expected:%lld actual:%lld delta:%lld before:%lld after:%lld%s context:%s\ntrace clock:%pS",
cpu_buffer->cpu,
ts + info->delta, info->ts, info->delta,
info->before, info->after,
- full ? " (full)" : "", show_interrupt_level());
+ full ? " (full)" : "", show_interrupt_level(),
+ cpu_buffer->buffer->clock);
}
out:
atomic_dec(this_cpu_ptr(&checking));
--
2.51.0
* [for-next][PATCH 2/6] ring-buffer: Enforce read ordering of trace_buffer cpumask and buffers
From: Steven Rostedt @ 2026-04-04 2:26 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Vincent Donnefort
From: Vincent Donnefort <vdonnefort@google.com>
On CPU hotplug, if it is the first time a trace_buffer sees a CPU, a
ring_buffer_per_cpu is allocated and its corresponding bit set in the
cpumask. Many readers check this cpumask to know whether they can safely
read the ring_buffer_per_cpu, but they do so without memory ordering and
may observe the cpumask bit set while the buffer pointer is still NULL.
Enforce the required read ordering by sending an IPI to all online CPUs.
The hotplug path is a slow path anyway, and this saves us from adding read
barriers at numerous call sites.
Link: https://patch.msgid.link/20260401053659.3458961-1-vdonnefort@google.com
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 8b6c39bba56d..2caa5d3d0ae9 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -7722,6 +7722,12 @@ int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu)
return 0;
}
+static void rb_cpu_sync(void *data)
+{
+ /* Not really needed, but documents what is happening */
+ smp_rmb();
+}
+
/*
* We only allocate new buffers, never free them if the CPU goes down.
* If we were to free the buffer, then the user would lose any trace that was in
@@ -7760,7 +7766,18 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node)
cpu);
return -ENOMEM;
}
- smp_wmb();
+
+ /*
+ * Ensure trace_buffer readers observe the newly allocated
+ * ring_buffer_per_cpu before they check the cpumask. Instead of using a
+ * read barrier for all readers, send an IPI.
+ */
+ if (unlikely(system_state == SYSTEM_RUNNING)) {
+ on_each_cpu(rb_cpu_sync, NULL, 1);
+ /* Not really needed, but documents what is happening */
+ smp_wmb();
+ }
+
cpumask_set_cpu(cpu, buffer->cpumask);
return 0;
}
--
2.51.0
* [for-next][PATCH 3/6] tracing: Make the backup instance non-reusable
From: Steven Rostedt @ 2026-04-04 2:26 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
From: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Since there is no reason to reuse the backup instance, make it readonly
(but erasable). Note that only backup instances are readonly: other trace
instances would remain empty unless they were writable, while only backup
instances have entries copied from the original.
With this change, most of the trace control files are removed from the
backup instance, including the eventfs enable/filter files:
# find /sys/kernel/tracing/instances/backup/events/ | wc -l
4093
# find /sys/kernel/tracing/instances/boot_map/events/ | wc -l
9573
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/177502546939.1311542.1826814401724828930.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace.c | 81 +++++++++++++++++++++++--------------
kernel/trace/trace.h | 7 ++++
kernel/trace/trace_boot.c | 5 ++-
kernel/trace/trace_events.c | 76 +++++++++++++++++++---------------
4 files changed, 104 insertions(+), 65 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4189ec9df6a5..c8e43675ff76 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4022,6 +4022,11 @@ int tracing_open_generic_tr(struct inode *inode, struct file *filp)
if (ret)
return ret;
+ if ((filp->f_mode & FMODE_WRITE) && trace_array_is_readonly(tr)) {
+ trace_array_put(tr);
+ return -EACCES;
+ }
+
filp->private_data = inode->i_private;
return 0;
@@ -7097,6 +7102,11 @@ static int tracing_clock_open(struct inode *inode, struct file *file)
if (ret)
return ret;
+ if ((file->f_mode & FMODE_WRITE) && trace_array_is_readonly(tr)) {
+ trace_array_put(tr);
+ return -EACCES;
+ }
+
ret = single_open(file, tracing_clock_show, inode->i_private);
if (ret < 0)
trace_array_put(tr);
@@ -9866,17 +9876,22 @@ static __init void create_trace_instances(struct dentry *d_tracer)
static void
init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
{
+ umode_t writable_mode = TRACE_MODE_WRITE;
int cpu;
+ if (trace_array_is_readonly(tr))
+ writable_mode = TRACE_MODE_READ;
+
trace_create_file("available_tracers", TRACE_MODE_READ, d_tracer,
- tr, &show_traces_fops);
+ tr, &show_traces_fops);
- trace_create_file("current_tracer", TRACE_MODE_WRITE, d_tracer,
- tr, &set_tracer_fops);
+ trace_create_file("current_tracer", writable_mode, d_tracer,
+ tr, &set_tracer_fops);
- trace_create_file("tracing_cpumask", TRACE_MODE_WRITE, d_tracer,
+ trace_create_file("tracing_cpumask", writable_mode, d_tracer,
tr, &tracing_cpumask_fops);
+ /* Options are used for changing print-format even for readonly instance. */
trace_create_file("trace_options", TRACE_MODE_WRITE, d_tracer,
tr, &tracing_iter_fops);
@@ -9886,12 +9901,36 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
trace_create_file("trace_pipe", TRACE_MODE_READ, d_tracer,
tr, &tracing_pipe_fops);
- trace_create_file("buffer_size_kb", TRACE_MODE_WRITE, d_tracer,
+ trace_create_file("buffer_size_kb", writable_mode, d_tracer,
tr, &tracing_entries_fops);
trace_create_file("buffer_total_size_kb", TRACE_MODE_READ, d_tracer,
tr, &tracing_total_entries_fops);
+ trace_create_file("trace_clock", writable_mode, d_tracer, tr,
+ &trace_clock_fops);
+
+ trace_create_file("timestamp_mode", TRACE_MODE_READ, d_tracer, tr,
+ &trace_time_stamp_mode_fops);
+
+ tr->buffer_percent = 50;
+
+ trace_create_file("buffer_subbuf_size_kb", writable_mode, d_tracer,
+ tr, &buffer_subbuf_size_fops);
+
+ create_trace_options_dir(tr);
+
+ if (tr->range_addr_start)
+ trace_create_file("last_boot_info", TRACE_MODE_READ, d_tracer,
+ tr, &last_boot_fops);
+
+ for_each_tracing_cpu(cpu)
+ tracing_init_tracefs_percpu(tr, cpu);
+
+ /* Read-only instance has above files only. */
+ if (trace_array_is_readonly(tr))
+ return;
+
trace_create_file("free_buffer", 0200, d_tracer,
tr, &tracing_free_buffer_fops);
@@ -9903,49 +9942,29 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
trace_create_file("trace_marker_raw", 0220, d_tracer,
tr, &tracing_mark_raw_fops);
- trace_create_file("trace_clock", TRACE_MODE_WRITE, d_tracer, tr,
- &trace_clock_fops);
-
- trace_create_file("tracing_on", TRACE_MODE_WRITE, d_tracer,
- tr, &rb_simple_fops);
-
- trace_create_file("timestamp_mode", TRACE_MODE_READ, d_tracer, tr,
- &trace_time_stamp_mode_fops);
-
- tr->buffer_percent = 50;
-
trace_create_file("buffer_percent", TRACE_MODE_WRITE, d_tracer,
- tr, &buffer_percent_fops);
-
- trace_create_file("buffer_subbuf_size_kb", TRACE_MODE_WRITE, d_tracer,
- tr, &buffer_subbuf_size_fops);
+ tr, &buffer_percent_fops);
trace_create_file("syscall_user_buf_size", TRACE_MODE_WRITE, d_tracer,
- tr, &tracing_syscall_buf_fops);
+ tr, &tracing_syscall_buf_fops);
- create_trace_options_dir(tr);
+ trace_create_file("tracing_on", TRACE_MODE_WRITE, d_tracer,
+ tr, &rb_simple_fops);
trace_create_maxlat_file(tr, d_tracer);
if (ftrace_create_function_files(tr, d_tracer))
MEM_FAIL(1, "Could not allocate function filter files");
- if (tr->range_addr_start) {
- trace_create_file("last_boot_info", TRACE_MODE_READ, d_tracer,
- tr, &last_boot_fops);
#ifdef CONFIG_TRACER_SNAPSHOT
- } else {
+ if (!tr->range_addr_start)
trace_create_file("snapshot", TRACE_MODE_WRITE, d_tracer,
tr, &snapshot_fops);
#endif
- }
trace_create_file("error_log", TRACE_MODE_WRITE, d_tracer,
tr, &tracing_err_log_fops);
- for_each_tracing_cpu(cpu)
- tracing_init_tracefs_percpu(tr, cpu);
-
ftrace_init_tracefs(tr, d_tracer);
}
@@ -10772,7 +10791,7 @@ __init static void enable_instances(void)
* Backup buffers can be freed but need vfree().
*/
if (backup)
- tr->flags |= TRACE_ARRAY_FL_VMALLOC;
+ tr->flags |= TRACE_ARRAY_FL_VMALLOC | TRACE_ARRAY_FL_RDONLY;
if (start || backup) {
tr->flags |= TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index fbbd0b2ee76f..366baa43dfab 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -462,6 +462,7 @@ enum {
TRACE_ARRAY_FL_MOD_INIT = BIT(3),
TRACE_ARRAY_FL_MEMMAP = BIT(4),
TRACE_ARRAY_FL_VMALLOC = BIT(5),
+ TRACE_ARRAY_FL_RDONLY = BIT(6),
};
#ifdef CONFIG_MODULES
@@ -491,6 +492,12 @@ extern unsigned long trace_adjust_address(struct trace_array *tr, unsigned long
extern struct trace_array *printk_trace;
+static inline bool trace_array_is_readonly(struct trace_array *tr)
+{
+ /* backup instance is read only. */
+ return tr->flags & TRACE_ARRAY_FL_RDONLY;
+}
+
/*
* The global tracer (top) should be the first trace array added,
* but we check the flag anyway.
diff --git a/kernel/trace/trace_boot.c b/kernel/trace/trace_boot.c
index dbe29b4c6a7a..2ca2541c8a58 100644
--- a/kernel/trace/trace_boot.c
+++ b/kernel/trace/trace_boot.c
@@ -61,7 +61,8 @@ trace_boot_set_instance_options(struct trace_array *tr, struct xbc_node *node)
v = memparse(p, NULL);
if (v < PAGE_SIZE)
pr_err("Buffer size is too small: %s\n", p);
- if (tracing_resize_ring_buffer(tr, v, RING_BUFFER_ALL_CPUS) < 0)
+ if (trace_array_is_readonly(tr) ||
+ tracing_resize_ring_buffer(tr, v, RING_BUFFER_ALL_CPUS) < 0)
pr_err("Failed to resize trace buffer to %s\n", p);
}
@@ -597,7 +598,7 @@ trace_boot_enable_tracer(struct trace_array *tr, struct xbc_node *node)
p = xbc_node_find_value(node, "tracer", NULL);
if (p && *p != '\0') {
- if (tracing_set_tracer(tr, p) < 0)
+ if (trace_array_is_readonly(tr) || tracing_set_tracer(tr, p) < 0)
pr_err("Failed to set given tracer: %s\n", p);
}
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 249d1cba72c0..aa422dc80ae8 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1401,6 +1401,9 @@ static int __ftrace_set_clr_event(struct trace_array *tr, const char *match,
{
int ret;
+ if (trace_array_is_readonly(tr))
+ return -EACCES;
+
mutex_lock(&event_mutex);
ret = __ftrace_set_clr_event_nolock(tr, match, sub, event, set, mod);
mutex_unlock(&event_mutex);
@@ -2973,8 +2976,8 @@ event_subsystem_dir(struct trace_array *tr, const char *name,
} else
__get_system(system);
- /* ftrace only has directories no files */
- if (strcmp(name, "ftrace") == 0)
+ /* ftrace only has directories no files, readonly instance too. */
+ if (strcmp(name, "ftrace") == 0 || trace_array_is_readonly(tr))
nr_entries = 0;
else
nr_entries = ARRAY_SIZE(system_entries);
@@ -3139,28 +3142,30 @@ event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file)
int ret;
static struct eventfs_entry event_entries[] = {
{
- .name = "enable",
+ .name = "format",
.callback = event_callback,
- .release = event_release,
},
+#ifdef CONFIG_PERF_EVENTS
{
- .name = "filter",
+ .name = "id",
.callback = event_callback,
},
+#endif
+#define NR_RO_EVENT_ENTRIES (1 + IS_ENABLED(CONFIG_PERF_EVENTS))
+/* Readonly files must be above this line and counted by NR_RO_EVENT_ENTRIES. */
{
- .name = "trigger",
+ .name = "enable",
.callback = event_callback,
+ .release = event_release,
},
{
- .name = "format",
+ .name = "filter",
.callback = event_callback,
},
-#ifdef CONFIG_PERF_EVENTS
{
- .name = "id",
+ .name = "trigger",
.callback = event_callback,
},
-#endif
#ifdef CONFIG_HIST_TRIGGERS
{
.name = "hist",
@@ -3193,7 +3198,10 @@ event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file)
if (!e_events)
return -ENOMEM;
- nr_entries = ARRAY_SIZE(event_entries);
+ if (trace_array_is_readonly(tr))
+ nr_entries = NR_RO_EVENT_ENTRIES;
+ else
+ nr_entries = ARRAY_SIZE(event_entries);
name = trace_event_name(call);
ei = eventfs_create_dir(name, e_events, event_entries, nr_entries, file);
@@ -4536,31 +4544,44 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr)
int nr_entries;
static struct eventfs_entry events_entries[] = {
{
- .name = "enable",
+ .name = "header_page",
.callback = events_callback,
},
{
- .name = "header_page",
+ .name = "header_event",
.callback = events_callback,
},
+#define NR_RO_TOP_ENTRIES 2
+/* Readonly files must be above this line and counted by NR_RO_TOP_ENTRIES. */
{
- .name = "header_event",
+ .name = "enable",
.callback = events_callback,
},
};
- entry = trace_create_file("set_event", TRACE_MODE_WRITE, parent,
- tr, &ftrace_set_event_fops);
- if (!entry)
- return -ENOMEM;
+ if (!trace_array_is_readonly(tr)) {
+ entry = trace_create_file("set_event", TRACE_MODE_WRITE, parent,
+ tr, &ftrace_set_event_fops);
+ if (!entry)
+ return -ENOMEM;
- trace_create_file("show_event_filters", TRACE_MODE_READ, parent, tr,
- &ftrace_show_event_filters_fops);
+ /* There are not as crucial, just warn if they are not created */
+ trace_create_file("show_event_filters", TRACE_MODE_READ, parent, tr,
+ &ftrace_show_event_filters_fops);
- trace_create_file("show_event_triggers", TRACE_MODE_READ, parent, tr,
- &ftrace_show_event_triggers_fops);
+ trace_create_file("show_event_triggers", TRACE_MODE_READ, parent, tr,
+ &ftrace_show_event_triggers_fops);
- nr_entries = ARRAY_SIZE(events_entries);
+ trace_create_file("set_event_pid", TRACE_MODE_WRITE, parent,
+ tr, &ftrace_set_event_pid_fops);
+
+ trace_create_file("set_event_notrace_pid",
+ TRACE_MODE_WRITE, parent, tr,
+ &ftrace_set_event_notrace_pid_fops);
+ nr_entries = ARRAY_SIZE(events_entries);
+ } else {
+ nr_entries = NR_RO_TOP_ENTRIES;
+ }
e_events = eventfs_create_events_dir("events", parent, events_entries,
nr_entries, tr);
@@ -4569,15 +4590,6 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr)
return -ENOMEM;
}
- /* There are not as crucial, just warn if they are not created */
-
- trace_create_file("set_event_pid", TRACE_MODE_WRITE, parent,
- tr, &ftrace_set_event_pid_fops);
-
- trace_create_file("set_event_notrace_pid",
- TRACE_MODE_WRITE, parent, tr,
- &ftrace_set_event_notrace_pid_fops);
-
tr->event_dir = e_events;
return 0;
--
2.51.0
* [for-next][PATCH 4/6] tracing: Remove the backup instance automatically after read
From: Steven Rostedt @ 2026-04-04 2:26 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
From: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Since the backup instance is readonly, once all its data has been read via
the pipe, no data is left in the instance. Thus it can be removed safely
after all files are closed. The instance is also removed if the user resets
the ring buffer manually via the 'trace' file.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/177502547711.1311542.12572973358010839400.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace.c | 79 ++++++++++++++++++++++++++++++++++++++++----
kernel/trace/trace.h | 6 ++++
2 files changed, 79 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c8e43675ff76..267671b33dd3 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -578,8 +578,59 @@ void trace_set_ring_buffer_expanded(struct trace_array *tr)
tr->ring_buffer_expanded = true;
}
+static void trace_array_autoremove(struct work_struct *work)
+{
+ struct trace_array *tr = container_of(work, struct trace_array, autoremove_work);
+
+ trace_array_destroy(tr);
+}
+
+static struct workqueue_struct *autoremove_wq;
+
+static void trace_array_kick_autoremove(struct trace_array *tr)
+{
+ if (autoremove_wq)
+ queue_work(autoremove_wq, &tr->autoremove_work);
+}
+
+static void trace_array_cancel_autoremove(struct trace_array *tr)
+{
+ /*
+ * Since this can be called inside trace_array_autoremove(),
+ * it has to avoid deadlock of the workqueue.
+ */
+ if (work_pending(&tr->autoremove_work))
+ cancel_work_sync(&tr->autoremove_work);
+}
+
+static void trace_array_init_autoremove(struct trace_array *tr)
+{
+ INIT_WORK(&tr->autoremove_work, trace_array_autoremove);
+}
+
+static void trace_array_start_autoremove(void)
+{
+ if (autoremove_wq)
+ return;
+
+ autoremove_wq = alloc_workqueue("tr_autoremove_wq",
+ WQ_UNBOUND | WQ_HIGHPRI, 0);
+ if (!autoremove_wq)
+ pr_warn("Unable to allocate tr_autoremove_wq. autoremove disabled.\n");
+}
+
LIST_HEAD(ftrace_trace_arrays);
+static int __trace_array_get(struct trace_array *this_tr)
+{
+ /* When free_on_close is set, this is not available anymore. */
+ if (autoremove_wq && this_tr->free_on_close)
+ return -ENODEV;
+
+ this_tr->ref++;
+ return 0;
+}
+
int trace_array_get(struct trace_array *this_tr)
{
struct trace_array *tr;
@@ -587,8 +638,7 @@ int trace_array_get(struct trace_array *this_tr)
guard(mutex)(&trace_types_lock);
list_for_each_entry(tr, &ftrace_trace_arrays, list) {
if (tr == this_tr) {
- tr->ref++;
- return 0;
+ return __trace_array_get(tr);
}
}
@@ -599,6 +649,12 @@ static void __trace_array_put(struct trace_array *this_tr)
{
WARN_ON(!this_tr->ref);
this_tr->ref--;
+ /*
+ * When free_on_close is set, prepare removing the array
+ * when the last reference is released.
+ */
+ if (this_tr->ref == 1 && this_tr->free_on_close)
+ trace_array_kick_autoremove(this_tr);
}
/**
@@ -5467,6 +5523,10 @@ static void update_last_data(struct trace_array *tr)
/* Only if the buffer has previous boot data clear and update it. */
tr->flags &= ~TRACE_ARRAY_FL_LAST_BOOT;
+ /* If this is a backup instance, mark it for autoremove. */
+ if (tr->flags & TRACE_ARRAY_FL_VMALLOC)
+ tr->free_on_close = true;
+
/* Reset the module list and reload them */
if (tr->scratch) {
struct trace_scratch *tscratch = tr->scratch;
@@ -9537,8 +9597,8 @@ struct trace_array *trace_array_find_get(const char *instance)
guard(mutex)(&trace_types_lock);
tr = trace_array_find(instance);
- if (tr)
- tr->ref++;
+ if (tr && __trace_array_get(tr) < 0)
+ tr = NULL;
return tr;
}
@@ -9635,6 +9695,8 @@ trace_array_create_systems(const char *name, const char *systems,
if (ftrace_allocate_ftrace_ops(tr) < 0)
goto out_free_tr;
+ trace_array_init_autoremove(tr);
+
ftrace_init_trace_array(tr);
init_trace_flags_index(tr);
@@ -9745,7 +9807,9 @@ struct trace_array *trace_array_get_by_name(const char *name, const char *system
list_for_each_entry(tr, &ftrace_trace_arrays, list) {
if (tr->name && strcmp(tr->name, name) == 0) {
- tr->ref++;
+ /* if this fails, @tr is going to be removed. */
+ if (__trace_array_get(tr) < 0)
+ tr = NULL;
return tr;
}
}
@@ -9784,6 +9848,7 @@ static int __remove_instance(struct trace_array *tr)
set_tracer_flag(tr, 1ULL << i, 0);
}
+ trace_array_cancel_autoremove(tr);
tracing_set_nop(tr);
clear_ftrace_function_probes(tr);
event_trace_del_tracer(tr);
@@ -10790,8 +10855,10 @@ __init static void enable_instances(void)
/*
* Backup buffers can be freed but need vfree().
*/
- if (backup)
+ if (backup) {
tr->flags |= TRACE_ARRAY_FL_VMALLOC | TRACE_ARRAY_FL_RDONLY;
+ trace_array_start_autoremove();
+ }
if (start || backup) {
tr->flags |= TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 366baa43dfab..b001fbba0881 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -453,6 +453,12 @@ struct trace_array {
* we do not waste memory on systems that are not using tracing.
*/
bool ring_buffer_expanded;
+ /*
+ * If the ring buffer is a read only backup instance, it will be
+ * removed after dumping all data via pipe, because no readable data.
+ */
+ bool free_on_close;
+ struct work_struct autoremove_work;
};
enum {
--
2.51.0
* [for-next][PATCH 5/6] tracing/Documentation: Add a section about backup instance
From: Steven Rostedt @ 2026-04-04 2:26 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
From: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Add a section about backup instance to the debugging.rst.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/177502548479.1311542.7062269603547001007.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Documentation/trace/debugging.rst | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/Documentation/trace/debugging.rst b/Documentation/trace/debugging.rst
index 4d88c346fc38..bca1710d92bf 100644
--- a/Documentation/trace/debugging.rst
+++ b/Documentation/trace/debugging.rst
@@ -159,3 +159,22 @@ If setting it from the kernel command line, it is recommended to also
disable tracing with the "traceoff" flag, and enable tracing after boot up.
Otherwise the trace from the most recent boot will be mixed with the trace
from the previous boot, and may make it confusing to read.
+
+Using a backup instance for keeping previous boot data
+------------------------------------------------------
+
+It is also possible to record trace data at system boot time by specifying
+events with the persistent ring buffer, but in this case the data before the
+reboot will be lost before it can be read. This problem can be solved by a
+backup instance. From the kernel command line::
+
+ reserve_mem=12M:4096:trace trace_instance=boot_map@trace,sched,irq trace_instance=backup=boot_map
+
+On boot up, the previous data in the "boot_map" is copied to the "backup"
+instance, and the "sched:*" and "irq:*" events for the current boot are traced
+in the "boot_map". Thus the user can read the previous boot data from the "backup"
+instance without stopping the trace.
+
+Note that this "backup" instance is readonly, and will be removed automatically
+if you clear the trace data or read out all trace data from the "trace_pipe"
+or the "trace_pipe_raw" files.
--
2.51.0
* [for-next][PATCH 6/6] tracing: Allow backup to save persistent ring buffer before it starts
From: Steven Rostedt @ 2026-04-04 2:26 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
John Stultz
From: Steven Rostedt <rostedt@goodmis.org>
When the persistent ring buffer was first introduced, it did not make
sense to start tracing for it on the kernel command line. That's because
if there was a crash, the start of events would invalidate the events from
the previous boot that had the crash.
But now that there's a "backup" instance that can take a snapshot of the
persistent ring buffer when boot starts, it is possible to have the
persistent ring buffer start events at boot up and not lose the old events.
Update the code so that boot-time events are enabled only after all
boot-time instances have been created. This allows the backup instance to
copy the persistent ring buffer from the previous boot, and the persistent
ring buffer to start tracing new events for the current boot.
reserve_mem=100M:12M:trace trace_instance=boot_mapped@trace,sched trace_instance=backup=boot_mapped
The above will create a boot_mapped persistent ring buffer and enable the
scheduler events. If there's a crash, a "backup" instance will be created
holding the events of the persistent ring buffer from the previous boot,
while the persistent ring buffer will once again start tracing scheduler
events of the current boot.
Now the user doesn't have to remember to start the persistent ring buffer.
It will always have the events started at each boot.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/20260331163924.6ccb3896@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace.c | 22 ++++++++++++++++++++++
kernel/trace/trace.h | 5 ++++-
2 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 267671b33dd3..e9455d46ec16 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -10865,9 +10865,31 @@ __init static void enable_instances(void)
tr->range_name = no_free_ptr(rname);
}
+ /*
+ * Save the events to start and enabled them after all boot instances
+ * have been created.
+ */
+ tr->boot_events = curr_str;
+ }
+
+ /* Enable the events after all boot instances have been created */
+ list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+
+ if (!tr->boot_events || !(*tr->boot_events)) {
+ tr->boot_events = NULL;
+ continue;
+ }
+
+ curr_str = tr->boot_events;
+
+ /* Clear the instance if this is a persistent buffer */
+ if (tr->flags & TRACE_ARRAY_FL_LAST_BOOT)
+ update_last_data(tr);
+
while ((tok = strsep(&curr_str, ","))) {
early_enable_events(tr, tok, true);
}
+ tr->boot_events = NULL;
}
}
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index b001fbba0881..e68f9c2027eb 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -405,7 +405,10 @@ struct trace_array {
unsigned char trace_flags_index[TRACE_FLAGS_MAX_SIZE];
unsigned int flags;
raw_spinlock_t start_lock;
- const char *system_names;
+ union {
+ const char *system_names;
+ char *boot_events;
+ };
struct list_head err_log;
struct dentry *dir;
struct dentry *options;
--
2.51.0