* [PATCH v6 0/4] tracing: Remove backup instance after read all
@ 2026-02-01 3:28 Masami Hiramatsu (Google)
2026-02-01 3:29 ` [PATCH v6 1/4] tracing: Reset last_boot_info if ring buffer is reset Masami Hiramatsu (Google)
` (4 more replies)
0 siblings, 5 replies; 16+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-02-01 3:28 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel
Hi,
Here is the v6 of the series to improve backup instances of
the persistent ring buffer. The previous version is here:
https://lore.kernel.org/all/176955897718.2786091.11948759407196200082.stgit@mhiramat.tok.corp.google.com/
This version cleanup code and fix typo, according to Steve's
comments. Also fix to init autoremove_wq only when the readonly
instance has been made. A major UI change is that the tracing_on
file is also removed from readonly instance.
Since backup instances are a kind of snapshot of the persistent
ring buffer, it should be readonly. And if it is readonly
there is no reason to keep it after reading all data via trace_pipe
because the data has been consumed. But user should be able to remove
the readonly instance by rmdir or truncating `trace` file.
Thus, [2/4] makes backup instances readonly (not able to write any
events, cleanup trace, change buffer size). Also, [3/4] removes the
backup instance after consuming all data via trace_pipe.
With this improvements, even if we makes a backup instance (using
the same amount of memory of the persistent ring buffer), it will
be removed after reading the data automatically.
---
Masami Hiramatsu (Google) (4):
tracing: Reset last_boot_info if ring buffer is reset
tracing: Make the backup instance non-reusable
tracing: Remove the backup instance automatically after read
tracing/Documentation: Add a section about backup instance
Documentation/trace/debugging.rst | 19 ++++
kernel/trace/trace.c | 162 ++++++++++++++++++++++++++++++-------
kernel/trace/trace.h | 13 +++
kernel/trace/trace_boot.c | 5 +
kernel/trace/trace_events.c | 76 ++++++++++-------
5 files changed, 209 insertions(+), 66 deletions(-)
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 16+ messages in thread* [PATCH v6 1/4] tracing: Reset last_boot_info if ring buffer is reset 2026-02-01 3:28 [PATCH v6 0/4] tracing: Remove backup instance after read all Masami Hiramatsu (Google) @ 2026-02-01 3:29 ` Masami Hiramatsu (Google) 2026-02-05 1:40 ` Steven Rostedt 2026-02-01 3:29 ` [PATCH v6 2/4] tracing: Make the backup instance non-reusable Masami Hiramatsu (Google) ` (3 subsequent siblings) 4 siblings, 1 reply; 16+ messages in thread From: Masami Hiramatsu (Google) @ 2026-02-01 3:29 UTC (permalink / raw) To: Steven Rostedt Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel, linux-trace-kernel From: Masami Hiramatsu (Google) <mhiramat@kernel.org> Commit 32dc0042528d ("tracing: Reset last-boot buffers when reading out all cpu buffers") resets the last_boot_info when user read out all data via trace_pipe* files. But it is not reset when user resets the buffer from other files. (e.g. write `trace` file) Reset it when the corresponding ring buffer is reset too. Fixes: 32dc0042528d ("tracing: Reset last-boot buffers when reading out all cpu buffers") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> --- kernel/trace/trace.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 396d59202438..5c3e4a554143 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4891,6 +4891,8 @@ static int tracing_single_release_tr(struct inode *inode, struct file *file) return single_release(inode, file); } +static bool update_last_data_if_empty(struct trace_array *tr); + static int tracing_open(struct inode *inode, struct file *file) { struct trace_array *tr = inode->i_private; @@ -4915,6 +4917,8 @@ static int tracing_open(struct inode *inode, struct file *file) tracing_reset_online_cpus(trace_buf); else tracing_reset_cpu(trace_buf, cpu); + + update_last_data_if_empty(tr); } if (file->f_mode & FMODE_READ) { @@ -5981,6 +5985,7 @@ tracing_set_trace_read(struct file *filp, char __user *ubuf, int tracer_init(struct tracer *t, struct trace_array *tr) { tracing_reset_online_cpus(&tr->array_buffer); + update_last_data_if_empty(tr); return t->init(tr); } @@ -7799,6 +7804,7 @@ int tracing_set_clock(struct trace_array *tr, const char *clockstr) ring_buffer_set_clock(tr->max_buffer.buffer, trace_clocks[i].func); tracing_reset_online_cpus(&tr->max_buffer); #endif + update_last_data_if_empty(tr); if (tr->scratch && !(tr->flags & TRACE_ARRAY_FL_LAST_BOOT)) { struct trace_scratch *tscratch = tr->scratch; @@ -8036,6 +8042,7 @@ tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt, tracing_reset_online_cpus(&tr->max_buffer); else tracing_reset_cpu(&tr->max_buffer, iter->cpu_file); + update_last_data_if_empty(tr); } break; } ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v6 1/4] tracing: Reset last_boot_info if ring buffer is reset 2026-02-01 3:29 ` [PATCH v6 1/4] tracing: Reset last_boot_info if ring buffer is reset Masami Hiramatsu (Google) @ 2026-02-05 1:40 ` Steven Rostedt 2026-02-06 13:50 ` Masami Hiramatsu 0 siblings, 1 reply; 16+ messages in thread From: Steven Rostedt @ 2026-02-05 1:40 UTC (permalink / raw) To: Masami Hiramatsu (Google) Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel On Sun, 1 Feb 2026 12:29:07 +0900 "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote: > @@ -8036,6 +8042,7 @@ tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt, > tracing_reset_online_cpus(&tr->max_buffer); > else > tracing_reset_cpu(&tr->max_buffer, iter->cpu_file); > + update_last_data_if_empty(tr); Is this needed? Memory mapped buffers (which the persistent ring buffer is) do not have snapshot buffers. -- Steve > } > break; > } ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6 1/4] tracing: Reset last_boot_info if ring buffer is reset 2026-02-05 1:40 ` Steven Rostedt @ 2026-02-06 13:50 ` Masami Hiramatsu 2026-02-11 18:28 ` Steven Rostedt 0 siblings, 1 reply; 16+ messages in thread From: Masami Hiramatsu @ 2026-02-06 13:50 UTC (permalink / raw) To: Steven Rostedt; +Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel On Wed, 4 Feb 2026 20:40:49 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > On Sun, 1 Feb 2026 12:29:07 +0900 > "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote: > > > @@ -8036,6 +8042,7 @@ tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt, > > tracing_reset_online_cpus(&tr->max_buffer); > > else > > tracing_reset_cpu(&tr->max_buffer, iter->cpu_file); > > + update_last_data_if_empty(tr); > > Is this needed? Memory mapped buffers (which the persistent ring buffer > is) do not have snapshot buffers. Yeah, I did this just for consistency. But it is better to just leave the comment here. Thank you, > > -- Steve > > > > } > > break; > > } > -- Masami Hiramatsu (Google) <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6 1/4] tracing: Reset last_boot_info if ring buffer is reset 2026-02-06 13:50 ` Masami Hiramatsu @ 2026-02-11 18:28 ` Steven Rostedt 2026-02-11 18:33 ` Steven Rostedt 0 siblings, 1 reply; 16+ messages in thread From: Steven Rostedt @ 2026-02-11 18:28 UTC (permalink / raw) To: Masami Hiramatsu (Google) Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel On Fri, 6 Feb 2026 22:50:51 +0900 Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote: > On Wed, 4 Feb 2026 20:40:49 -0500 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > On Sun, 1 Feb 2026 12:29:07 +0900 > > "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote: > > > > > @@ -8036,6 +8042,7 @@ tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt, > > > tracing_reset_online_cpus(&tr->max_buffer); > > > else > > > tracing_reset_cpu(&tr->max_buffer, iter->cpu_file); > > > + update_last_data_if_empty(tr); > > > > Is this needed? Memory mapped buffers (which the persistent ring buffer > > is) do not have snapshot buffers. > > Yeah, I did this just for consistency. But it is better to just leave > the comment here. > I have code to move all the snapshot buffers into its own file. I'm going to drop this one as I think leaving it makes it more confusing. -- Steve ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6 1/4] tracing: Reset last_boot_info if ring buffer is reset 2026-02-11 18:28 ` Steven Rostedt @ 2026-02-11 18:33 ` Steven Rostedt 0 siblings, 0 replies; 16+ messages in thread From: Steven Rostedt @ 2026-02-11 18:33 UTC (permalink / raw) To: Masami Hiramatsu (Google) Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel On Wed, 11 Feb 2026 13:28:48 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > I have code to move all the snapshot buffers into its own file. I'm > going to drop this one as I think leaving it makes it more confusing. Never mind, I was looking at the wrong patch. -- Steve ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v6 2/4] tracing: Make the backup instance non-reusable 2026-02-01 3:28 [PATCH v6 0/4] tracing: Remove backup instance after read all Masami Hiramatsu (Google) 2026-02-01 3:29 ` [PATCH v6 1/4] tracing: Reset last_boot_info if ring buffer is reset Masami Hiramatsu (Google) @ 2026-02-01 3:29 ` Masami Hiramatsu (Google) 2026-02-05 2:17 ` Steven Rostedt 2026-02-01 3:29 ` [PATCH v6 3/4] tracing: Remove the backup instance automatically after read Masami Hiramatsu (Google) ` (2 subsequent siblings) 4 siblings, 1 reply; 16+ messages in thread From: Masami Hiramatsu (Google) @ 2026-02-01 3:29 UTC (permalink / raw) To: Steven Rostedt Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel, linux-trace-kernel From: Masami Hiramatsu (Google) <mhiramat@kernel.org> Since there is no reason to reuse the backup instance, make it readonly (but erasable). Note that only backup instances are readonly, because other trace instances will be empty unless it is writable. Only backup instances have copy entries from the original. With this change, most of the trace control files are removed from the backup instance, including eventfs enable/filter etc. # find /sys/kernel/tracing/instances/backup/events/ | wc -l 4093 # find /sys/kernel/tracing/instances/boot_map/events/ | wc -l 9573 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> --- Changes in v6: - Remove tracing_on file from readonly instances. - Remove unused writable_mode from tracing_init_tracefs_percpu(). - Cleanup init_tracer_tracefs() and create_event_toplevel_files(). - Remove TRACE_MODE_WRITE_MASK. - Add TRACE_ARRAY_FL_RDONLY. Changes in v5: - Rebased on the latest for-next (and hide show_event_filters/triggers if the instance is readonly. Changes in v4: - Make trace data erasable. (not reusable) Changes in v3: - Resuse the beginning part of event_entries for readonly files. - Remove readonly file_operations and checking readonly flag in each write operation. Changes in v2: - Use readonly file_operations to prohibit writing instead of checking flags in write() callbacks. - Remove writable files from eventfs. --- kernel/trace/trace.c | 94 +++++++++++++++++++++++++++++-------------- kernel/trace/trace.h | 7 +++ kernel/trace/trace_boot.c | 5 +- kernel/trace/trace_events.c | 76 ++++++++++++++++++++--------------- 4 files changed, 117 insertions(+), 65 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 5c3e4a554143..b0efcf1e0809 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -5052,6 +5052,11 @@ static ssize_t tracing_write_stub(struct file *filp, const char __user *ubuf, size_t count, loff_t *ppos) { + struct trace_array *tr = file_inode(filp)->i_private; + + if (trace_array_is_readonly(tr)) + return -EPERM; + return count; } @@ -5152,6 +5157,9 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf, cpumask_var_t tracing_cpumask_new; int err; + if (trace_array_is_readonly(tr)) + return -EPERM; + if (count == 0 || count > KMALLOC_MAX_SIZE) return -EINVAL; @@ -6436,6 +6444,9 @@ tracing_set_trace_write(struct file *filp, const char __user *ubuf, size_t ret; int err; + if (trace_array_is_readonly(tr)) + return -EPERM; + ret = cnt; if (cnt > MAX_TRACER_SIZE) @@ -7070,6 +7081,9 @@ tracing_entries_write(struct file *filp, const char __user *ubuf, unsigned long val; int ret; + if (trace_array_is_readonly(tr)) + return -EPERM; + ret = kstrtoul_from_user(ubuf, cnt, 10, &val); if (ret) return ret; @@ -7824,6 +7838,9 @@ static ssize_t tracing_clock_write(struct file *filp, const char __user *ubuf, const char *clockstr; int ret; + if (trace_array_is_readonly(tr)) + return -EPERM; + if (cnt >= sizeof(buf)) return -EINVAL; @@ -9846,6 +9863,9 @@ rb_simple_write(struct file *filp, const char __user *ubuf, unsigned long val; int ret; + if (trace_array_is_readonly(tr)) + return -EPERM; + ret = kstrtoul_from_user(ubuf, cnt, 10, &val); if (ret) return ret; @@ -9952,6 +9972,9 @@ buffer_subbuf_size_write(struct file *filp, const char __user *ubuf, int pages; int ret; + if (trace_array_is_readonly(tr)) + return -EPERM; + ret = kstrtoul_from_user(ubuf, cnt, 10, &val); if (ret) return ret; @@ -10632,17 +10655,22 @@ static __init void create_trace_instances(struct dentry *d_tracer) static void init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) { + umode_t writable_mode = TRACE_MODE_WRITE; int cpu; + if (trace_array_is_readonly(tr)) + writable_mode = TRACE_MODE_READ; + trace_create_file("available_tracers", TRACE_MODE_READ, d_tracer, - tr, &show_traces_fops); + tr, &show_traces_fops); - trace_create_file("current_tracer", TRACE_MODE_WRITE, d_tracer, - tr, &set_tracer_fops); + trace_create_file("current_tracer", writable_mode, d_tracer, + tr, &set_tracer_fops); - trace_create_file("tracing_cpumask", TRACE_MODE_WRITE, d_tracer, + trace_create_file("tracing_cpumask", writable_mode, d_tracer, tr, &tracing_cpumask_fops); + /* Options are used for changing print-format even for readonly instance. */ trace_create_file("trace_options", TRACE_MODE_WRITE, d_tracer, tr, &tracing_iter_fops); @@ -10652,12 +10680,36 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) trace_create_file("trace_pipe", TRACE_MODE_READ, d_tracer, tr, &tracing_pipe_fops); - trace_create_file("buffer_size_kb", TRACE_MODE_WRITE, d_tracer, + trace_create_file("buffer_size_kb", writable_mode, d_tracer, tr, &tracing_entries_fops); trace_create_file("buffer_total_size_kb", TRACE_MODE_READ, d_tracer, tr, &tracing_total_entries_fops); + trace_create_file("trace_clock", writable_mode, d_tracer, tr, + &trace_clock_fops); + + trace_create_file("timestamp_mode", TRACE_MODE_READ, d_tracer, tr, + &trace_time_stamp_mode_fops); + + tr->buffer_percent = 50; + + trace_create_file("buffer_subbuf_size_kb", writable_mode, d_tracer, + tr, &buffer_subbuf_size_fops); + + create_trace_options_dir(tr); + + if (tr->range_addr_start) + trace_create_file("last_boot_info", TRACE_MODE_READ, d_tracer, + tr, &last_boot_fops); + + for_each_tracing_cpu(cpu) + tracing_init_tracefs_percpu(tr, cpu); + + /* Read-only instance has above files only. */ + if (trace_array_is_readonly(tr)) + return; + trace_create_file("free_buffer", 0200, d_tracer, tr, &tracing_free_buffer_fops); @@ -10669,27 +10721,14 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) trace_create_file("trace_marker_raw", 0220, d_tracer, tr, &tracing_mark_raw_fops); - trace_create_file("trace_clock", TRACE_MODE_WRITE, d_tracer, tr, - &trace_clock_fops); - - trace_create_file("tracing_on", TRACE_MODE_WRITE, d_tracer, - tr, &rb_simple_fops); - - trace_create_file("timestamp_mode", TRACE_MODE_READ, d_tracer, tr, - &trace_time_stamp_mode_fops); - - tr->buffer_percent = 50; - trace_create_file("buffer_percent", TRACE_MODE_WRITE, d_tracer, - tr, &buffer_percent_fops); - - trace_create_file("buffer_subbuf_size_kb", TRACE_MODE_WRITE, d_tracer, - tr, &buffer_subbuf_size_fops); + tr, &buffer_percent_fops); trace_create_file("syscall_user_buf_size", TRACE_MODE_WRITE, d_tracer, - tr, &tracing_syscall_buf_fops); + tr, &tracing_syscall_buf_fops); - create_trace_options_dir(tr); + trace_create_file("tracing_on", TRACE_MODE_WRITE, d_tracer, + tr, &rb_simple_fops); #ifdef CONFIG_TRACER_MAX_TRACE trace_create_maxlat_file(tr, d_tracer); @@ -10698,22 +10737,15 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) if (ftrace_create_function_files(tr, d_tracer)) MEM_FAIL(1, "Could not allocate function filter files"); - if (tr->range_addr_start) { - trace_create_file("last_boot_info", TRACE_MODE_READ, d_tracer, - tr, &last_boot_fops); #ifdef CONFIG_TRACER_SNAPSHOT - } else { + if (!tr->range_addr_start) trace_create_file("snapshot", TRACE_MODE_WRITE, d_tracer, tr, &snapshot_fops); #endif - } trace_create_file("error_log", TRACE_MODE_WRITE, d_tracer, tr, &tracing_err_log_fops); - for_each_tracing_cpu(cpu) - tracing_init_tracefs_percpu(tr, cpu); - ftrace_init_tracefs(tr, d_tracer); } @@ -11540,7 +11572,7 @@ __init static void enable_instances(void) * Backup buffers can be freed but need vfree(). */ if (backup) - tr->flags |= TRACE_ARRAY_FL_VMALLOC; + tr->flags |= TRACE_ARRAY_FL_VMALLOC | TRACE_ARRAY_FL_RDONLY; if (start || backup) { tr->flags |= TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT; diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 69e7defba6c6..4ff7ebd8a32b 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -455,6 +455,7 @@ enum { TRACE_ARRAY_FL_MOD_INIT = BIT(3), TRACE_ARRAY_FL_MEMMAP = BIT(4), TRACE_ARRAY_FL_VMALLOC = BIT(5), + TRACE_ARRAY_FL_RDONLY = BIT(6), }; #ifdef CONFIG_MODULES @@ -483,6 +484,12 @@ extern bool trace_clock_in_ns(struct trace_array *tr); extern unsigned long trace_adjust_address(struct trace_array *tr, unsigned long addr); +static inline bool trace_array_is_readonly(struct trace_array *tr) +{ + /* backup instance is read only. */ + return tr->flags & TRACE_ARRAY_FL_RDONLY; +} + /* * The global tracer (top) should be the first trace array added, * but we check the flag anyway. diff --git a/kernel/trace/trace_boot.c b/kernel/trace/trace_boot.c index dbe29b4c6a7a..2ca2541c8a58 100644 --- a/kernel/trace/trace_boot.c +++ b/kernel/trace/trace_boot.c @@ -61,7 +61,8 @@ trace_boot_set_instance_options(struct trace_array *tr, struct xbc_node *node) v = memparse(p, NULL); if (v < PAGE_SIZE) pr_err("Buffer size is too small: %s\n", p); - if (tracing_resize_ring_buffer(tr, v, RING_BUFFER_ALL_CPUS) < 0) + if (trace_array_is_readonly(tr) || + tracing_resize_ring_buffer(tr, v, RING_BUFFER_ALL_CPUS) < 0) pr_err("Failed to resize trace buffer to %s\n", p); } @@ -597,7 +598,7 @@ trace_boot_enable_tracer(struct trace_array *tr, struct xbc_node *node) p = xbc_node_find_value(node, "tracer", NULL); if (p && *p != '\0') { - if (tracing_set_tracer(tr, p) < 0) + if (trace_array_is_readonly(tr) || tracing_set_tracer(tr, p) < 0) pr_err("Failed to set given tracer: %s\n", p); } diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 4972e1a2b5f3..16b9f222407c 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -1380,6 +1380,9 @@ static int __ftrace_set_clr_event(struct trace_array *tr, const char *match, { int ret; + if (trace_array_is_readonly(tr)) + return -EPERM; + mutex_lock(&event_mutex); ret = __ftrace_set_clr_event_nolock(tr, match, sub, event, set, mod); mutex_unlock(&event_mutex); @@ -2952,8 +2955,8 @@ event_subsystem_dir(struct trace_array *tr, const char *name, } else __get_system(system); - /* ftrace only has directories no files */ - if (strcmp(name, "ftrace") == 0) + /* ftrace only has directories no files, readonly instance too. */ + if (strcmp(name, "ftrace") == 0 || trace_array_is_readonly(tr)) nr_entries = 0; else nr_entries = ARRAY_SIZE(system_entries); @@ -3118,28 +3121,30 @@ event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file) int ret; static struct eventfs_entry event_entries[] = { { - .name = "enable", + .name = "format", .callback = event_callback, - .release = event_release, }, +#ifdef CONFIG_PERF_EVENTS { - .name = "filter", + .name = "id", .callback = event_callback, }, +#endif +#define NR_RO_EVENT_ENTRIES (1 + IS_ENABLED(CONFIG_PERF_EVENTS)) +/* Readonly files must be above this line and counted by NR_RO_EVENT_ENTRIES. */ { - .name = "trigger", + .name = "enable", .callback = event_callback, + .release = event_release, }, { - .name = "format", + .name = "filter", .callback = event_callback, }, -#ifdef CONFIG_PERF_EVENTS { - .name = "id", + .name = "trigger", .callback = event_callback, }, -#endif #ifdef CONFIG_HIST_TRIGGERS { .name = "hist", @@ -3172,7 +3177,10 @@ event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file) if (!e_events) return -ENOMEM; - nr_entries = ARRAY_SIZE(event_entries); + if (trace_array_is_readonly(tr)) + nr_entries = NR_RO_EVENT_ENTRIES; + else + nr_entries = ARRAY_SIZE(event_entries); name = trace_event_name(call); ei = eventfs_create_dir(name, e_events, event_entries, nr_entries, file); @@ -4511,31 +4519,44 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr) int nr_entries; static struct eventfs_entry events_entries[] = { { - .name = "enable", + .name = "header_page", .callback = events_callback, }, { - .name = "header_page", + .name = "header_event", .callback = events_callback, }, +#define NR_RO_TOP_ENTRIES 2 +/* Readonly files must be above this line and counted by NR_RO_TOP_ENTRIES. */ { - .name = "header_event", + .name = "enable", .callback = events_callback, }, }; - entry = trace_create_file("set_event", TRACE_MODE_WRITE, parent, - tr, &ftrace_set_event_fops); - if (!entry) - return -ENOMEM; + if (!trace_array_is_readonly(tr)) { + entry = trace_create_file("set_event", TRACE_MODE_WRITE, parent, + tr, &ftrace_set_event_fops); + if (!entry) + return -ENOMEM; - trace_create_file("show_event_filters", TRACE_MODE_READ, parent, tr, - &ftrace_show_event_filters_fops); + /* There are not as crucial, just warn if they are not created */ + trace_create_file("show_event_filters", TRACE_MODE_READ, parent, tr, + &ftrace_show_event_filters_fops); - trace_create_file("show_event_triggers", TRACE_MODE_READ, parent, tr, - &ftrace_show_event_triggers_fops); + trace_create_file("show_event_triggers", TRACE_MODE_READ, parent, tr, + &ftrace_show_event_triggers_fops); - nr_entries = ARRAY_SIZE(events_entries); + trace_create_file("set_event_pid", TRACE_MODE_WRITE, parent, + tr, &ftrace_set_event_pid_fops); + + trace_create_file("set_event_notrace_pid", + TRACE_MODE_WRITE, parent, tr, + &ftrace_set_event_notrace_pid_fops); + nr_entries = ARRAY_SIZE(events_entries); + } else { + nr_entries = NR_RO_TOP_ENTRIES; + } e_events = eventfs_create_events_dir("events", parent, events_entries, nr_entries, tr); @@ -4544,15 +4565,6 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr) return -ENOMEM; } - /* There are not as crucial, just warn if they are not created */ - - trace_create_file("set_event_pid", TRACE_MODE_WRITE, parent, - tr, &ftrace_set_event_pid_fops); - - trace_create_file("set_event_notrace_pid", - TRACE_MODE_WRITE, parent, tr, - &ftrace_set_event_notrace_pid_fops); - tr->event_dir = e_events; return 0; ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v6 2/4] tracing: Make the backup instance non-reusable 2026-02-01 3:29 ` [PATCH v6 2/4] tracing: Make the backup instance non-reusable Masami Hiramatsu (Google) @ 2026-02-05 2:17 ` Steven Rostedt 2026-02-09 9:08 ` Masami Hiramatsu 0 siblings, 1 reply; 16+ messages in thread From: Steven Rostedt @ 2026-02-05 2:17 UTC (permalink / raw) To: Masami Hiramatsu (Google) Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel On Sun, 1 Feb 2026 12:29:15 +0900 "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote: > From: Masami Hiramatsu (Google) <mhiramat@kernel.org> > > Since there is no reason to reuse the backup instance, make it > readonly (but erasable). > Note that only backup instances are readonly, because > other trace instances will be empty unless it is writable. > Only backup instances have copy entries from the original. > > With this change, most of the trace control files are removed > from the backup instance, including eventfs enable/filter etc. > > # find /sys/kernel/tracing/instances/backup/events/ | wc -l > 4093 > # find /sys/kernel/tracing/instances/boot_map/events/ | wc -l > 9573 > > Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> > --- > Changes in v6: > - Remove tracing_on file from readonly instances. > - Remove unused writable_mode from tracing_init_tracefs_percpu(). > - Cleanup init_tracer_tracefs() and create_event_toplevel_files(). > - Remove TRACE_MODE_WRITE_MASK. > - Add TRACE_ARRAY_FL_RDONLY. > Changes in v5: > - Rebased on the latest for-next (and hide show_event_filters/triggers > if the instance is readonly. > Changes in v4: > - Make trace data erasable. (not reusable) > Changes in v3: > - Resuse the beginning part of event_entries for readonly files. > - Remove readonly file_operations and checking readonly flag in > each write operation. > Changes in v2: > - Use readonly file_operations to prohibit writing instead of > checking flags in write() callbacks. > - Remove writable files from eventfs. > --- > kernel/trace/trace.c | 94 +++++++++++++++++++++++++++++-------------- > kernel/trace/trace.h | 7 +++ > kernel/trace/trace_boot.c | 5 +- > kernel/trace/trace_events.c | 76 ++++++++++++++++++++--------------- > 4 files changed, 117 insertions(+), 65 deletions(-) > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > index 5c3e4a554143..b0efcf1e0809 100644 > --- a/kernel/trace/trace.c > +++ b/kernel/trace/trace.c > @@ -5052,6 +5052,11 @@ static ssize_t > tracing_write_stub(struct file *filp, const char __user *ubuf, > size_t count, loff_t *ppos) > { > + struct trace_array *tr = file_inode(filp)->i_private; > + > + if (trace_array_is_readonly(tr)) > + return -EPERM; > + > return count; > } > > @@ -5152,6 +5157,9 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf, > cpumask_var_t tracing_cpumask_new; > int err; > > + if (trace_array_is_readonly(tr)) > + return -EPERM; > + Shouldn't these checks be done in the open function? Doing it now is too late, as -EPERM on a write is confusing when the open for write succeeds. -- Steve > if (count == 0 || count > KMALLOC_MAX_SIZE) > return -EINVAL; > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6 2/4] tracing: Make the backup instance non-reusable 2026-02-05 2:17 ` Steven Rostedt @ 2026-02-09 9:08 ` Masami Hiramatsu 2026-02-09 23:42 ` Steven Rostedt 0 siblings, 1 reply; 16+ messages in thread From: Masami Hiramatsu @ 2026-02-09 9:08 UTC (permalink / raw) To: Steven Rostedt; +Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel On Wed, 4 Feb 2026 21:17:21 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > On Sun, 1 Feb 2026 12:29:15 +0900 > "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote: > > > From: Masami Hiramatsu (Google) <mhiramat@kernel.org> > > > > Since there is no reason to reuse the backup instance, make it > > readonly (but erasable). > > Note that only backup instances are readonly, because > > other trace instances will be empty unless it is writable. > > Only backup instances have copy entries from the original. > > > > With this change, most of the trace control files are removed > > from the backup instance, including eventfs enable/filter etc. > > > > # find /sys/kernel/tracing/instances/backup/events/ | wc -l > > 4093 > > # find /sys/kernel/tracing/instances/boot_map/events/ | wc -l > > 9573 > > > > Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> > > --- > > Changes in v6: > > - Remove tracing_on file from readonly instances. > > - Remove unused writable_mode from tracing_init_tracefs_percpu(). > > - Cleanup init_tracer_tracefs() and create_event_toplevel_files(). > > - Remove TRACE_MODE_WRITE_MASK. > > - Add TRACE_ARRAY_FL_RDONLY. > > Changes in v5: > > - Rebased on the latest for-next (and hide show_event_filters/triggers > > if the instance is readonly. > > Changes in v4: > > - Make trace data erasable. (not reusable) > > Changes in v3: > > - Resuse the beginning part of event_entries for readonly files. > > - Remove readonly file_operations and checking readonly flag in > > each write operation. > > Changes in v2: > > - Use readonly file_operations to prohibit writing instead of > > checking flags in write() callbacks. > > - Remove writable files from eventfs. > > --- > > kernel/trace/trace.c | 94 +++++++++++++++++++++++++++++-------------- > > kernel/trace/trace.h | 7 +++ > > kernel/trace/trace_boot.c | 5 +- > > kernel/trace/trace_events.c | 76 ++++++++++++++++++++--------------- > > 4 files changed, 117 insertions(+), 65 deletions(-) > > > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > > index 5c3e4a554143..b0efcf1e0809 100644 > > --- a/kernel/trace/trace.c > > +++ b/kernel/trace/trace.c > > @@ -5052,6 +5052,11 @@ static ssize_t > > tracing_write_stub(struct file *filp, const char __user *ubuf, > > size_t count, loff_t *ppos) > > { > > + struct trace_array *tr = file_inode(filp)->i_private; > > + > > + if (trace_array_is_readonly(tr)) > > + return -EPERM; > > + > > return count; > > } > > > > @@ -5152,6 +5157,9 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf, > > cpumask_var_t tracing_cpumask_new; > > int err; > > > > + if (trace_array_is_readonly(tr)) > > + return -EPERM; > > + > > Shouldn't these checks be done in the open function? Doing it now is > too late, as -EPERM on a write is confusing when the open for write > succeeds. I've made a small program and straced. Surprisingly, for the super user, open(2) does not return error on opening a readonly file with O_RDWR. With normal user, it returns -EACCES ----- $ strace ./write-test hoge execve("./write-test", ["./write-test", "hoge"], 0x7ffc30b99928 /* 73 vars */) = 0 ... openat(AT_FDCWD, "hoge", O_RDWR) = -1 EACCES (Permission denied) exit_group(-1) = ? +++ exited with 255 +++ ----- But for the superuser case: ----- $ sudo strace ./write-test hoge execve("./write-test", ["./write-test", "hoge"], 0x7ffcffa80488 /* 32 vars */) = 0 ... openat(AT_FDCWD, "hoge", O_RDWR) = 3 write(3, "test\0", 5) = 5 fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0x2), ...}) = 0 write(1, "write return 5\n", 15write return 5 ) = 15 exit_group(0) = ? +++ exited with 0 +++ So I think we can postpone it until actual action for the superuser case. (Anyway, I will make it return -EACCES) Thank you, > > -- Steve > > > if (count == 0 || count > KMALLOC_MAX_SIZE) > > return -EINVAL; > > -- Masami Hiramatsu (Google) <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6 2/4] tracing: Make the backup instance non-reusable 2026-02-09 9:08 ` Masami Hiramatsu @ 2026-02-09 23:42 ` Steven Rostedt 2026-02-10 5:14 ` Masami Hiramatsu 0 siblings, 1 reply; 16+ messages in thread From: Steven Rostedt @ 2026-02-09 23:42 UTC (permalink / raw) To: Masami Hiramatsu (Google) Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel On Mon, 9 Feb 2026 18:08:44 +0900 Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote: > > > @@ -5152,6 +5157,9 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf, > > > cpumask_var_t tracing_cpumask_new; > > > int err; > > > > > > + if (trace_array_is_readonly(tr)) > > > + return -EPERM; > > > + > > > > Shouldn't these checks be done in the open function? Doing it now is > > too late, as -EPERM on a write is confusing when the open for write > > succeeds. > > I've made a small program and straced. Surprisingly, for the super user, > open(2) does not return error on opening a readonly file with O_RDWR. *blink* So if on open, the trace_array_is_read_only(tr) returns true and you return -EPREM, it still succeeds? That sounds like a bug! -- Steve ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6 2/4] tracing: Make the backup instance non-reusable 2026-02-09 23:42 ` Steven Rostedt @ 2026-02-10 5:14 ` Masami Hiramatsu 2026-02-11 15:42 ` Steven Rostedt 0 siblings, 1 reply; 16+ messages in thread From: Masami Hiramatsu @ 2026-02-10 5:14 UTC (permalink / raw) To: Steven Rostedt; +Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel On Mon, 9 Feb 2026 18:42:47 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > On Mon, 9 Feb 2026 18:08:44 +0900 > Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote: > > > > @@ -5152,6 +5157,9 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf, > > > > cpumask_var_t tracing_cpumask_new; > > > > int err; > > > > > > > > + if (trace_array_is_readonly(tr)) > > > > + return -EPERM; > > > > + > > > > > > Shouldn't these checks be done in the open function? Doing it now is > > > too late, as -EPERM on a write is confusing when the open for write > > > succeeds. > > > > I've made a small program and straced. Surprisingly, for the super user, > > open(2) does not return error on opening a readonly file with O_RDWR. > > *blink* > > So if on open, the trace_array_is_read_only(tr) returns true and you > return -EPREM, it still succeeds? That sounds like a bug! Hmm, OK. Now I found how sysfs handles it. /* * For regular files, if the opener has CAP_DAC_OVERRIDE, open(2) * succeeds regardless of the RW permissions. sysfs had an extra * layer of enforcement where open(2) fails with -EACCES regardless * of CAP_DAC_OVERRIDE if the permission doesn't have the * respective read or write access at all (none of S_IRUGO or * S_IWUGO) or the respective operation isn't implemented. The * following flag enables that behavior. */ KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK = 0x0002, So for the similar reason, I will make tracefs to check the permission even if CAP_DAC_OVERRIDE is set. (But this check should be done in general, instead of each open() operation) Thank you, > -- Steve -- Masami Hiramatsu (Google) <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6 2/4] tracing: Make the backup instance non-reusable 2026-02-10 5:14 ` Masami Hiramatsu @ 2026-02-11 15:42 ` Steven Rostedt 2026-02-12 4:24 ` Masami Hiramatsu 0 siblings, 1 reply; 16+ messages in thread From: Steven Rostedt @ 2026-02-11 15:42 UTC (permalink / raw) To: Masami Hiramatsu (Google) Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel On Tue, 10 Feb 2026 14:14:15 +0900 Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote: > Hmm, OK. Now I found how sysfs handles it. > > /* > * For regular files, if the opener has CAP_DAC_OVERRIDE, open(2) > * succeeds regardless of the RW permissions. sysfs had an extra > * layer of enforcement where open(2) fails with -EACCES regardless > * of CAP_DAC_OVERRIDE if the permission doesn't have the > * respective read or write access at all (none of S_IRUGO or > * S_IWUGO) or the respective operation isn't implemented. The > * following flag enables that behavior. > */ > KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK = 0x0002, > > So for the similar reason, I will make tracefs to check the permission > even if CAP_DAC_OVERRIDE is set. (But this check should be done in general, > instead of each open() operation) > I don't believe this is the same. This is about an instance being truly read only. The instance is special, not the files. Note, permissions can be changed by root too. After applying your patches, I did the following: ~# cd /sys/kernel/tracing/instances/backup/ ~# ls -l current_tracer -r--r----- 1 root root 0 Feb 11 10:29 current_tracer ~# cat current_tracer nop ~# cat trace # tracer: nop # # entries-in-buffer/entries-written: 0/0 #P:8 # # _-----=> irqs-off/BH-disabled # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> migrate-disable # |||| / delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | ~# chmod 664 current_tracer ~# ls -l current_tracer -rw-rw-r-- 1 root root 0 Feb 11 10:29 current_tracer ~# echo function > current_tracer ~# cat current_tracer function ~# cat trace # tracer: function # # entries-in-buffer/entries-written: 1750306/2076556 #P:8 # # _-----=> irqs-off/BH-disabled # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> migrate-disable # |||| / delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | bash-1056 [001] ..... 231.448852: mutex_unlock <-tracing_set_tracer <idle>-0 [002] ...1. 231.448853: arch_cpu_idle_exit <-do_idle ##### CPU 7 buffer started #### <idle>-0 [007] ...1. 231.448853: arch_cpu_idle_exit <-do_idle bash-1056 [001] ..... 231.448854: __mutex_unlock_slowpath <-tracing_set_tracer <idle>-0 [002] d..1. 231.448855: arch_cpu_idle_enter <-do_idle <idle>-0 [007] d..1. 231.448855: arch_cpu_idle_enter <-do_idle <idle>-0 [007] d..1. 231.448855: tsc_verify_tsc_adjust <-arch_cpu_idle_enter <idle>-0 [002] d..1. 231.448855: tsc_verify_tsc_adjust <-arch_cpu_idle_enter bash-1056 [001] d.... 231.448856: fpregs_assert_state_consistent <-arch_exit_to_user_mode_prepare <idle>-0 [007] d..1. 231.448856: local_touch_nmi <-do_idle <idle>-0 [002] d..1. 231.448856: local_touch_nmi <-do_idle bash-1056 [001] d.... 231.448856: switch_fpu_return <-arch_exit_to_user_mode_prepare <idle>-0 [007] d..1. 231.448856: rcu_nocb_flush_deferred_wakeup <-do_idle <idle>-0 [002] d..1. 231.448856: rcu_nocb_flush_deferred_wakeup <-do_idle <idle>-0 [007] d..1. 231.448857: cpuidle_get_cpu_driver <-do_idle <idle>-0 [002] d..1. 231.448857: cpuidle_get_cpu_driver <-do_idle <idle>-0 [007] d..1. 231.448857: cpuidle_not_available <-do_idle [..] Not too read only! I change permissions all the time for tracefs files, so I don't want that changed. This is not the same as sysfs. Let's keep it simple. Have all the open callers that can do writes return error -EACCES if a file allows writes and is open for write, but is part of the read only instance. -- Steve ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6 2/4] tracing: Make the backup instance non-reusable 2026-02-11 15:42 ` Steven Rostedt @ 2026-02-12 4:24 ` Masami Hiramatsu 0 siblings, 0 replies; 16+ messages in thread From: Masami Hiramatsu @ 2026-02-12 4:24 UTC (permalink / raw) To: Steven Rostedt; +Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel On Wed, 11 Feb 2026 10:42:44 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > On Tue, 10 Feb 2026 14:14:15 +0900 > Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote: > > > Hmm, OK. Now I found how sysfs handles it. > > > > /* > > * For regular files, if the opener has CAP_DAC_OVERRIDE, open(2) > > * succeeds regardless of the RW permissions. sysfs had an extra > > * layer of enforcement where open(2) fails with -EACCES regardless > > * of CAP_DAC_OVERRIDE if the permission doesn't have the > > * respective read or write access at all (none of S_IRUGO or > > * S_IWUGO) or the respective operation isn't implemented. The > > * following flag enables that behavior. > > */ > > KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK = 0x0002, > > > > So for the similar reason, I will make tracefs to check the permission > > even if CAP_DAC_OVERRIDE is set. (But this check should be done in general, > > instead of each open() operation) > > > > I don't believe this is the same. This is about an instance being truly > read only. The instance is special, not the files. Note, permissions can > be changed by root too. Ah, OK. Let me add read only checks in all related .open operations. > After applying your patches, I did the following: > > ~# cd /sys/kernel/tracing/instances/backup/ > ~# ls -l current_tracer > -r--r----- 1 root root 0 Feb 11 10:29 current_tracer > > ~# cat current_tracer > nop > > ~# cat trace > # tracer: nop > # > # entries-in-buffer/entries-written: 0/0 #P:8 > # > # _-----=> irqs-off/BH-disabled > # / _----=> need-resched > # | / _---=> hardirq/softirq > # || / _--=> preempt-depth > # ||| / _-=> migrate-disable > # |||| / delay > # TASK-PID CPU# ||||| TIMESTAMP FUNCTION > # | | | ||||| | | > > ~# chmod 664 current_tracer > ~# ls -l current_tracer > -rw-rw-r-- 1 root root 0 Feb 11 10:29 current_tracer Ah, OK... > > ~# echo function > current_tracer > ~# cat current_tracer > function > > ~# cat trace > # tracer: function > # > # entries-in-buffer/entries-written: 1750306/2076556 #P:8 > # > # _-----=> irqs-off/BH-disabled > # / _----=> need-resched > # | / _---=> hardirq/softirq > # || / _--=> preempt-depth > # ||| / _-=> migrate-disable > # |||| / delay > # TASK-PID CPU# ||||| TIMESTAMP FUNCTION > # | | | ||||| | | > bash-1056 [001] ..... 231.448852: mutex_unlock <-tracing_set_tracer > <idle>-0 [002] ...1. 231.448853: arch_cpu_idle_exit <-do_idle > ##### CPU 7 buffer started #### > <idle>-0 [007] ...1. 231.448853: arch_cpu_idle_exit <-do_idle > bash-1056 [001] ..... 231.448854: __mutex_unlock_slowpath <-tracing_set_tracer > <idle>-0 [002] d..1. 231.448855: arch_cpu_idle_enter <-do_idle > <idle>-0 [007] d..1. 231.448855: arch_cpu_idle_enter <-do_idle > <idle>-0 [007] d..1. 231.448855: tsc_verify_tsc_adjust <-arch_cpu_idle_enter > <idle>-0 [002] d..1. 231.448855: tsc_verify_tsc_adjust <-arch_cpu_idle_enter > bash-1056 [001] d.... 231.448856: fpregs_assert_state_consistent <-arch_exit_to_user_mode_prepare > <idle>-0 [007] d..1. 231.448856: local_touch_nmi <-do_idle > <idle>-0 [002] d..1. 231.448856: local_touch_nmi <-do_idle > bash-1056 [001] d.... 231.448856: switch_fpu_return <-arch_exit_to_user_mode_prepare > <idle>-0 [007] d..1. 231.448856: rcu_nocb_flush_deferred_wakeup <-do_idle > <idle>-0 [002] d..1. 231.448856: rcu_nocb_flush_deferred_wakeup <-do_idle > <idle>-0 [007] d..1. 231.448857: cpuidle_get_cpu_driver <-do_idle > <idle>-0 [002] d..1. 231.448857: cpuidle_get_cpu_driver <-do_idle > <idle>-0 [007] d..1. 231.448857: cpuidle_not_available <-do_idle > [..] > > Not too read only! > > I change permissions all the time for tracefs files, so I don't want > that changed. > > This is not the same as sysfs. Let's keep it simple. Have all the open > callers that can do writes return error -EACCES if a file allows writes > and is open for write, but is part of the read only instance. OK. Thank you, > > -- Steve > -- Masami Hiramatsu (Google) <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v6 3/4] tracing: Remove the backup instance automatically after read 2026-02-01 3:28 [PATCH v6 0/4] tracing: Remove backup instance after read all Masami Hiramatsu (Google) 2026-02-01 3:29 ` [PATCH v6 1/4] tracing: Reset last_boot_info if ring buffer is reset Masami Hiramatsu (Google) 2026-02-01 3:29 ` [PATCH v6 2/4] tracing: Make the backup instance non-reusable Masami Hiramatsu (Google) @ 2026-02-01 3:29 ` Masami Hiramatsu (Google) 2026-02-01 3:29 ` [PATCH v6 4/4] tracing/Documentation: Add a section about backup instance Masami Hiramatsu (Google) 2026-02-05 0:41 ` [PATCH v6 0/4] tracing: Remove backup instance after read all Masami Hiramatsu 4 siblings, 0 replies; 16+ messages in thread From: Masami Hiramatsu (Google) @ 2026-02-01 3:29 UTC (permalink / raw) To: Steven Rostedt Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel, linux-trace-kernel From: Masami Hiramatsu (Google) <mhiramat@kernel.org> Since the backup instance is readonly, after reading all data via pipe, no data is left on the instance. Thus it can be removed safely after closing all files. This also removes it if user resets the ring buffer manually via 'trace' file. Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> --- Changes in v6: - Fix typo in comment. - Only when there is a readonly trace array, initialize autoremove_wq. - Fix to exit loop in trace_array_get() if tr is found in the list. Changes in v4: - Update description. --- kernel/trace/trace.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++- kernel/trace/trace.h | 6 +++++ 2 files changed, 66 insertions(+), 1 deletion(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index b0efcf1e0809..9d09befd5da5 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -590,6 +590,51 @@ void trace_set_ring_buffer_expanded(struct trace_array *tr) tr->ring_buffer_expanded = true; } +static int __remove_instance(struct trace_array *tr); + +static void trace_array_autoremove(struct work_struct *work) +{ + struct trace_array *tr = container_of(work, struct trace_array, autoremove_work); + + guard(mutex)(&event_mutex); + guard(mutex)(&trace_types_lock); + + /* + * This can be fail if someone gets @tr before starting this + * function, but in that case, this will be kicked again when + * putting it. So we don't care about the result. + */ + __remove_instance(tr); +} + +static struct workqueue_struct *autoremove_wq; + +static void trace_array_kick_autoremove(struct trace_array *tr) +{ + if (autoremove_wq && !work_pending(&tr->autoremove_work)) + queue_work(autoremove_wq, &tr->autoremove_work); +} + +static void trace_array_cancel_autoremove(struct trace_array *tr) +{ + if (autoremove_wq && work_pending(&tr->autoremove_work)) + cancel_work(&tr->autoremove_work); +} + +static void trace_array_init_autoremove(struct trace_array *tr) +{ + INIT_WORK(&tr->autoremove_work, trace_array_autoremove); + + /* Only readonly trace_array can kick the autoremove. */ + if (!trace_array_is_readonly(tr) || autoremove_wq) + return; + + autoremove_wq = alloc_workqueue("tr_autoremove_wq", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!autoremove_wq) + pr_warn("Unable to allocate tr_autoremove_wq. autoremove disabled.\n"); +} + LIST_HEAD(ftrace_trace_arrays); int trace_array_get(struct trace_array *this_tr) @@ -599,7 +644,8 @@ int trace_array_get(struct trace_array *this_tr) guard(mutex)(&trace_types_lock); list_for_each_entry(tr, &ftrace_trace_arrays, list) { if (tr == this_tr) { - tr->ref++; + if (!tr->free_on_close) + tr->ref++; return 0; } } @@ -611,6 +657,12 @@ static void __trace_array_put(struct trace_array *this_tr) { WARN_ON(!this_tr->ref); this_tr->ref--; + /* + * When free_on_close is set, prepare removing the array + * when the last reference is released. + */ + if (this_tr->ref == 1 && this_tr->free_on_close) + trace_array_kick_autoremove(this_tr); } /** @@ -6237,6 +6289,10 @@ static void update_last_data(struct trace_array *tr) /* Only if the buffer has previous boot data clear and update it. */ tr->flags &= ~TRACE_ARRAY_FL_LAST_BOOT; + /* If this is a backup instance, mark it for autoremove. */ + if (tr->flags & TRACE_ARRAY_FL_VMALLOC) + tr->free_on_close = true; + /* Reset the module list and reload them */ if (tr->scratch) { struct trace_scratch *tscratch = tr->scratch; @@ -10415,6 +10471,8 @@ trace_array_create_systems(const char *name, const char *systems, if (ftrace_allocate_ftrace_ops(tr) < 0) goto out_free_tr; + trace_array_init_autoremove(tr); + ftrace_init_trace_array(tr); init_trace_flags_index(tr); @@ -10563,6 +10621,7 @@ static int __remove_instance(struct trace_array *tr) if (update_marker_trace(tr, 0)) synchronize_rcu(); + trace_array_cancel_autoremove(tr); tracing_set_nop(tr); clear_ftrace_function_probes(tr); event_trace_del_tracer(tr); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 4ff7ebd8a32b..4c5ef9d8f8ad 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -446,6 +446,12 @@ struct trace_array { * we do not waste memory on systems that are not using tracing. */ bool ring_buffer_expanded; + /* + * If the ring buffer is a read only backup instance, it will be + * removed after dumping all data via pipe, because no readable data. + */ + bool free_on_close; + struct work_struct autoremove_work; }; enum { ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v6 4/4] tracing/Documentation: Add a section about backup instance 2026-02-01 3:28 [PATCH v6 0/4] tracing: Remove backup instance after read all Masami Hiramatsu (Google) ` (2 preceding siblings ...) 2026-02-01 3:29 ` [PATCH v6 3/4] tracing: Remove the backup instance automatically after read Masami Hiramatsu (Google) @ 2026-02-01 3:29 ` Masami Hiramatsu (Google) 2026-02-05 0:41 ` [PATCH v6 0/4] tracing: Remove backup instance after read all Masami Hiramatsu 4 siblings, 0 replies; 16+ messages in thread From: Masami Hiramatsu (Google) @ 2026-02-01 3:29 UTC (permalink / raw) To: Steven Rostedt Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel, linux-trace-kernel From: Masami Hiramatsu (Google) <mhiramat@kernel.org> Add a section about backup instance to the debugging.rst. Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> --- Changes in v6: - Fix typos. --- Documentation/trace/debugging.rst | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/Documentation/trace/debugging.rst b/Documentation/trace/debugging.rst index 4d88c346fc38..15857951b506 100644 --- a/Documentation/trace/debugging.rst +++ b/Documentation/trace/debugging.rst @@ -159,3 +159,22 @@ If setting it from the kernel command line, it is recommended to also disable tracing with the "traceoff" flag, and enable tracing after boot up. Otherwise the trace from the most recent boot will be mixed with the trace from the previous boot, and may make it confusing to read. + +Using a backup instance for keeping previous boot data +------------------------------------------------------ + +It is also possible to record trace data at system boot time by specifying +events with the persistent ring buffer, but in this case the data before the +reboot will be lost before it can be read. This problem can be solved by a +backup instance. From the kernel command line:: + + reserve_mem=12M:4096:trace trace_instance=boot_map@trace,sched,irq trace_instance=backup=boot_map + +On boot up, the previous data in the "boot_map" is copied to the "backup" +instance, and the "sched:*" and "irq:*" events for the current boot are traced +in the "boot_map". Thus the user can read the previous boot data from the "backup" +instance without stopping the trace. + +Note that this "backup" instance is readonly, and will be removed automatically +if you clear the trace data or read out all trace data from the "trace_pipe" +or the "trace_pipe_raw" files. \ No newline at end of file ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v6 0/4] tracing: Remove backup instance after read all 2026-02-01 3:28 [PATCH v6 0/4] tracing: Remove backup instance after read all Masami Hiramatsu (Google) ` (3 preceding siblings ...) 2026-02-01 3:29 ` [PATCH v6 4/4] tracing/Documentation: Add a section about backup instance Masami Hiramatsu (Google) @ 2026-02-05 0:41 ` Masami Hiramatsu 4 siblings, 0 replies; 16+ messages in thread From: Masami Hiramatsu @ 2026-02-05 0:41 UTC (permalink / raw) To: Masami Hiramatsu (Google) Cc: Steven Rostedt, Mathieu Desnoyers, linux-kernel, linux-trace-kernel Hi Steve, Can you review this series? On Sun, 1 Feb 2026 12:28:55 +0900 "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote: > Hi, > > Here is the v6 of the series to improve backup instances of > the persistent ring buffer. The previous version is here: > > https://lore.kernel.org/all/176955897718.2786091.11948759407196200082.stgit@mhiramat.tok.corp.google.com/ > > This version cleanup code and fix typo, according to Steve's > comments. Also fix to init autoremove_wq only when the readonly > instance has been made. A major UI change is that the tracing_on > file is also removed from readonly instance. > > Since backup instances are a kind of snapshot of the persistent > ring buffer, it should be readonly. And if it is readonly > there is no reason to keep it after reading all data via trace_pipe > because the data has been consumed. But user should be able to remove > the readonly instance by rmdir or truncating `trace` file. > > Thus, [2/4] makes backup instances readonly (not able to write any > events, cleanup trace, change buffer size). Also, [3/4] removes the > backup instance after consuming all data via trace_pipe. > With this improvements, even if we makes a backup instance (using > the same amount of memory of the persistent ring buffer), it will > be removed after reading the data automatically. > > --- > > Masami Hiramatsu (Google) (4): > tracing: Reset last_boot_info if ring buffer is reset > tracing: Make the backup instance non-reusable > tracing: Remove the backup instance automatically after read > tracing/Documentation: Add a section about backup instance > > > Documentation/trace/debugging.rst | 19 ++++ > kernel/trace/trace.c | 162 ++++++++++++++++++++++++++++++------- > kernel/trace/trace.h | 13 +++ > kernel/trace/trace_boot.c | 5 + > kernel/trace/trace_events.c | 76 ++++++++++------- > 5 files changed, 209 insertions(+), 66 deletions(-) > > -- > Masami Hiramatsu (Google) <mhiramat@kernel.org> -- Masami Hiramatsu (Google) <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2026-02-12 4:24 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-02-01 3:28 [PATCH v6 0/4] tracing: Remove backup instance after read all Masami Hiramatsu (Google) 2026-02-01 3:29 ` [PATCH v6 1/4] tracing: Reset last_boot_info if ring buffer is reset Masami Hiramatsu (Google) 2026-02-05 1:40 ` Steven Rostedt 2026-02-06 13:50 ` Masami Hiramatsu 2026-02-11 18:28 ` Steven Rostedt 2026-02-11 18:33 ` Steven Rostedt 2026-02-01 3:29 ` [PATCH v6 2/4] tracing: Make the backup instance non-reusable Masami Hiramatsu (Google) 2026-02-05 2:17 ` Steven Rostedt 2026-02-09 9:08 ` Masami Hiramatsu 2026-02-09 23:42 ` Steven Rostedt 2026-02-10 5:14 ` Masami Hiramatsu 2026-02-11 15:42 ` Steven Rostedt 2026-02-12 4:24 ` Masami Hiramatsu 2026-02-01 3:29 ` [PATCH v6 3/4] tracing: Remove the backup instance automatically after read Masami Hiramatsu (Google) 2026-02-01 3:29 ` [PATCH v6 4/4] tracing/Documentation: Add a section about backup instance Masami Hiramatsu (Google) 2026-02-05 0:41 ` [PATCH v6 0/4] tracing: Remove backup instance after read all Masami Hiramatsu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox