* [PATCH v8 0/6] tracing: Remove backup instance after read all
@ 2026-02-10 8:43 Masami Hiramatsu (Google)
2026-02-10 8:43 ` [PATCH v8 1/6] tracing: Fix to set write permission to per-cpu buffer_size_kb Masami Hiramatsu (Google)
` (5 more replies)
0 siblings, 6 replies; 11+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-02-10 8:43 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel
Hi,
Here is v8 of the series to improve backup instances of
the persistent ring buffer. The previous version is here:
https://lore.kernel.org/all/177062912135.1230888.17419570791737357433.stgit@mhiramat.tok.corp.google.com/
In this version, I modified tracefs to check the file permission
even if the user has CAP_DAC_OVERRIDE, the same as sysfs [3/6], and removed
the read-only check from each read() operation [4/6]. I also added a bugfix
[1/6] for the per-cpu buffer_size_kb permission.
Series Description
------------------
Since backup instances are a kind of snapshot of the persistent
ring buffer, they should be readonly. And if an instance is readonly,
there is no reason to keep it after reading all data via trace_pipe,
because the data has been consumed. But the user should still be able to
remove the readonly instance by rmdir or by truncating the `trace` file.
Thus, [4/6] makes backup instances readonly (not able to write any
events, clean up the trace, or change the buffer size). Also, [5/6] removes
the backup instance after all data is consumed via trace_pipe.
With these improvements, even if we make a backup instance (using
the same amount of memory as the persistent ring buffer), it will
be removed automatically after the data is read.
---
Masami Hiramatsu (Google) (6):
tracing: Fix to set write permission to per-cpu buffer_size_kb
tracing: Reset last_boot_info if ring buffer is reset
tracefs: Check file permission even if user has CAP_DAC_OVERRIDE
tracing: Make the backup instance non-reusable
tracing: Remove the backup instance automatically after read
tracing/Documentation: Add a section about backup instance
Documentation/trace/debugging.rst | 19 +++++
fs/tracefs/event_inode.c | 2 +
fs/tracefs/inode.c | 36 +++++++++-
fs/tracefs/internal.h | 3 +
kernel/trace/trace.c | 140 ++++++++++++++++++++++++++++---------
kernel/trace/trace.h | 13 +++
kernel/trace/trace_boot.c | 5 +
kernel/trace/trace_events.c | 76 ++++++++++++--------
8 files changed, 224 insertions(+), 70 deletions(-)
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
* [PATCH v8 1/6] tracing: Fix to set write permission to per-cpu buffer_size_kb
2026-02-10 8:43 [PATCH v8 0/6] tracing: Remove backup instance after read all Masami Hiramatsu (Google)
@ 2026-02-10 8:43 ` Masami Hiramatsu (Google)
2026-02-10 8:43 ` [PATCH v8 2/6] tracing: Reset last_boot_info if ring buffer is reset Masami Hiramatsu (Google)
` (4 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-02-10 8:43 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Since the per-cpu buffer_size_kb file is writable for changing
per-cpu ring buffer size, the file should have the write access
permission.
Fixes: 21ccc9cd7211 ("tracing: Disable "other" permission bits in the tracefs files")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v8: Newly added.
---
kernel/trace/trace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 845b8a165daf..fd470675809b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8613,7 +8613,7 @@ tracing_init_tracefs_percpu(struct trace_array *tr, long cpu)
trace_create_cpu_file("stats", TRACE_MODE_READ, d_cpu,
tr, cpu, &tracing_stats_fops);
- trace_create_cpu_file("buffer_size_kb", TRACE_MODE_READ, d_cpu,
+ trace_create_cpu_file("buffer_size_kb", TRACE_MODE_WRITE, d_cpu,
tr, cpu, &tracing_entries_fops);
if (tr->range_addr_start)
* [PATCH v8 2/6] tracing: Reset last_boot_info if ring buffer is reset
2026-02-10 8:43 [PATCH v8 0/6] tracing: Remove backup instance after read all Masami Hiramatsu (Google)
2026-02-10 8:43 ` [PATCH v8 1/6] tracing: Fix to set write permission to per-cpu buffer_size_kb Masami Hiramatsu (Google)
@ 2026-02-10 8:43 ` Masami Hiramatsu (Google)
2026-02-10 8:43 ` [PATCH v8 3/6] tracefs: Check file permission even if user has CAP_DAC_OVERRIDE Masami Hiramatsu (Google)
` (3 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-02-10 8:43 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Commit 32dc0042528d ("tracing: Reset last-boot buffers when reading
out all cpu buffers") resets the last_boot_info when the user reads out
all data via the trace_pipe* files. But it is not reset when the user
resets the buffer via other files (e.g. by writing to the `trace` file).
Reset it when the corresponding ring buffer is reset too.
Fixes: 32dc0042528d ("tracing: Reset last-boot buffers when reading out all cpu buffers")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v7:
- Remove unneeded update_last_data_if_empty() call from
tracing_snapshot_write() because snapshot is disabled on
persistent instances.
---
kernel/trace/trace.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index fd470675809b..e884d32b7895 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4127,6 +4127,8 @@ static int tracing_single_release_tr(struct inode *inode, struct file *file)
return single_release(inode, file);
}
+static bool update_last_data_if_empty(struct trace_array *tr);
+
static int tracing_open(struct inode *inode, struct file *file)
{
struct trace_array *tr = inode->i_private;
@@ -4151,6 +4153,8 @@ static int tracing_open(struct inode *inode, struct file *file)
tracing_reset_online_cpus(trace_buf);
else
tracing_reset_cpu(trace_buf, cpu);
+
+ update_last_data_if_empty(tr);
}
if (file->f_mode & FMODE_READ) {
@@ -5215,6 +5219,7 @@ tracing_set_trace_read(struct file *filp, char __user *ubuf,
int tracer_init(struct tracer *t, struct trace_array *tr)
{
tracing_reset_online_cpus(&tr->array_buffer);
+ update_last_data_if_empty(tr);
return t->init(tr);
}
@@ -7028,6 +7033,7 @@ int tracing_set_clock(struct trace_array *tr, const char *clockstr)
ring_buffer_set_clock(tr->snapshot_buffer.buffer, trace_clocks[i].func);
tracing_reset_online_cpus(&tr->snapshot_buffer);
#endif
+ update_last_data_if_empty(tr);
if (tr->scratch && !(tr->flags & TRACE_ARRAY_FL_LAST_BOOT)) {
struct trace_scratch *tscratch = tr->scratch;
* [PATCH v8 3/6] tracefs: Check file permission even if user has CAP_DAC_OVERRIDE
2026-02-10 8:43 [PATCH v8 0/6] tracing: Remove backup instance after read all Masami Hiramatsu (Google)
2026-02-10 8:43 ` [PATCH v8 1/6] tracing: Fix to set write permission to per-cpu buffer_size_kb Masami Hiramatsu (Google)
2026-02-10 8:43 ` [PATCH v8 2/6] tracing: Reset last_boot_info if ring buffer is reset Masami Hiramatsu (Google)
@ 2026-02-10 8:43 ` Masami Hiramatsu (Google)
2026-02-11 15:46 ` Steven Rostedt
2026-03-24 16:43 ` Steven Rostedt
2026-02-10 8:43 ` [PATCH v8 4/6] tracing: Make the backup instance non-reusable Masami Hiramatsu (Google)
` (2 subsequent siblings)
5 siblings, 2 replies; 11+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-02-10 8:43 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Strictly check the file read/write permission on tracefs even if the
owner has CAP_DAC_OVERRIDE, the same as sysfs does.
Tracefs is a pseudo filesystem, just like sysfs, so any file that the
system defines as unwritable should actually be unwritable by anyone.
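For illustration only, the open-time check described above can be modelled in plain userspace C. This is a minimal sketch, not the kernel code: the `sk_` names, the simplified inode structure, and the two capability booleans (standing in for "the file_operations provide some read/write handler") are all hypothetical. Only the mode-bit tests (0222 for write, 0444 for read) and the -EACCES result are taken from the patch below.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's MAY_* mask bits. */
#define SK_MAY_WRITE 0x2
#define SK_MAY_READ  0x4
#define SK_MAY_OPEN  0x20

/* Hypothetical, simplified inode: mode bits plus what its fops can do. */
struct sk_inode {
	unsigned int mode; /* e.g. 0440 or 0640 */
	bool can_read;     /* some read-side handler exists */
	bool can_write;    /* some write-side handler exists */
};

/*
 * Return -EACCES if the open must be refused regardless of capabilities,
 * or 0 if the usual permission checks may continue.
 */
static int sk_open_permission(const struct sk_inode *inode, int mask)
{
	/* Only opens are checked strictly; other accesses fall through. */
	if (!(mask & SK_MAY_OPEN))
		return 0;

	if ((mask & SK_MAY_WRITE) &&
	    (!(inode->mode & 0222) || !inode->can_write))
		return -EACCES;

	if ((mask & SK_MAY_READ) &&
	    (!(inode->mode & 0444) || !inode->can_read))
		return -EACCES;

	return 0;
}
```

Under this model, a file created with mode 0440 is refused for a write open even though it has a write handler, which is why a writable file such as the per-cpu buffer_size_kb needs a write permission bit once this check is in place.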
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
fs/tracefs/event_inode.c | 2 ++
fs/tracefs/inode.c | 36 +++++++++++++++++++++++++++++++++---
fs/tracefs/internal.h | 3 +++
3 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 61cbdafa2411..65e8be761e79 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -233,10 +233,12 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry,
static const struct inode_operations eventfs_dir_inode_operations = {
.lookup = eventfs_root_lookup,
.setattr = eventfs_set_attr,
+ .permission = tracefs_permission,
};
static const struct inode_operations eventfs_file_inode_operations = {
.setattr = eventfs_set_attr,
+ .permission = tracefs_permission,
};
static const struct file_operations eventfs_file_operations = {
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index d9d8932a7b9c..eb1ddc0cc13a 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -212,10 +212,40 @@ static void set_tracefs_inode_owner(struct inode *inode)
inode->i_gid = gid;
}
-static int tracefs_permission(struct mnt_idmap *idmap,
- struct inode *inode, int mask)
+int tracefs_permission(struct mnt_idmap *idmap,
+ struct inode *inode, int mask)
{
- set_tracefs_inode_owner(inode);
+ struct tracefs_inode *ti = get_tracefs(inode);
+ const struct file_operations *fops;
+
+ if (!(ti->flags & TRACEFS_EVENT_INODE))
+ set_tracefs_inode_owner(inode);
+
+ /*
+ * Like sysfs, file permission checks are performed even for superuser
+ * with CAP_DAC_OVERRIDE. See the KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK
+ * definition in linux/kernfs.h.
+ */
+ if (mask & MAY_OPEN) {
+ fops = inode->i_fop;
+
+ if (mask & MAY_WRITE) {
+ if (!(inode->i_mode & 0222))
+ return -EACCES;
+ if (!fops || (!fops->write && !fops->write_iter &&
+ !fops->mmap))
+ return -EACCES;
+ }
+
+ if (mask & MAY_READ) {
+ if (!(inode->i_mode & 0444))
+ return -EACCES;
+ if (!fops || (!fops->read && !fops->read_iter &&
+ !fops->mmap && !fops->splice_read))
+ return -EACCES;
+ }
+ }
+
return generic_permission(idmap, inode, mask);
}
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index d83c2a25f288..1e49ba445ba3 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -76,4 +76,7 @@ struct inode *tracefs_get_inode(struct super_block *sb);
void eventfs_remount(struct tracefs_inode *ti, bool update_uid, bool update_gid);
void eventfs_d_release(struct dentry *dentry);
+int tracefs_permission(struct mnt_idmap *idmap,
+ struct inode *inode, int mask);
+
#endif /* _TRACEFS_INTERNAL_H */
* [PATCH v8 4/6] tracing: Make the backup instance non-reusable
2026-02-10 8:43 [PATCH v8 0/6] tracing: Remove backup instance after read all Masami Hiramatsu (Google)
` (2 preceding siblings ...)
2026-02-10 8:43 ` [PATCH v8 3/6] tracefs: Check file permission even if user has CAP_DAC_OVERRIDE Masami Hiramatsu (Google)
@ 2026-02-10 8:43 ` Masami Hiramatsu (Google)
2026-02-10 8:44 ` [PATCH v8 5/6] tracing: Remove the backup instance automatically after read Masami Hiramatsu (Google)
2026-02-10 8:44 ` [PATCH v8 6/6] tracing/Documentation: Add a section about backup instance Masami Hiramatsu (Google)
5 siblings, 0 replies; 11+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-02-10 8:43 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Since there is no reason to reuse the backup instance, make it
readonly (but erasable).
Note that only backup instances are readonly, because
other trace instances would be empty unless they are writable.
Only backup instances hold entries copied from the original.
With this change, most of the trace control files are removed
from the backup instance, including eventfs enable/filter etc.
# find /sys/kernel/tracing/instances/backup/events/ | wc -l
4093
# find /sys/kernel/tracing/instances/boot_map/events/ | wc -l
9573
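The reduced file count comes from a simple layout trick: the read-only entries are placed at the start of the entry table, and a readonly instance registers only that prefix. A minimal userspace sketch of the idea follows; the `sk_` names and the table contents are hypothetical simplifications, not the kernel's eventfs structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, simplified entry: the real table also carries callbacks. */
struct sk_entry {
	const char *name;
};

static const struct sk_entry sk_event_entries[] = {
	{ "format" },
	{ "id" },
#define SK_NR_RO_ENTRIES 2
	/* Read-only files must stay above this line and be counted above. */
	{ "enable" },
	{ "filter" },
	{ "trigger" },
};

/* A readonly instance exposes only the leading read-only entries. */
static size_t sk_nr_entries(bool readonly)
{
	return readonly ? SK_NR_RO_ENTRIES : SK_ARRAY_SIZE(sk_event_entries);
}
```

The cost of this design is an ordering constraint on the table: adding a read-only file means inserting it above the marker and bumping the count, which is what the comment next to the marker is there to remind the next editor.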
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v8:
- Remove read-only checks in read() operation.
Changes in v7:
- Return -EACCES instead of -EPERM.
Changes in v6:
- Remove tracing_on file from readonly instances.
- Remove unused writable_mode from tracing_init_tracefs_percpu().
- Cleanup init_tracer_tracefs() and create_event_toplevel_files().
- Remove TRACE_MODE_WRITE_MASK.
- Add TRACE_ARRAY_FL_RDONLY.
Changes in v5:
- Rebased on the latest for-next (and hide show_event_filters/triggers
if the instance is readonly).
Changes in v4:
- Make trace data erasable. (not reusable)
Changes in v3:
- Reuse the beginning part of event_entries for readonly files.
- Remove readonly file_operations and checking readonly flag in
each write operation.
Changes in v2:
- Use readonly file_operations to prohibit writing instead of
checking flags in write() callbacks.
- Remove writable files from eventfs.
---
kernel/trace/trace.c | 71 +++++++++++++++++++++++-----------------
kernel/trace/trace.h | 7 ++++
kernel/trace/trace_boot.c | 5 ++-
kernel/trace/trace_events.c | 76 +++++++++++++++++++++++++------------------
4 files changed, 94 insertions(+), 65 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index e884d32b7895..566d1e824360 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9836,17 +9836,22 @@ static __init void create_trace_instances(struct dentry *d_tracer)
static void
init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
{
+ umode_t writable_mode = TRACE_MODE_WRITE;
int cpu;
+ if (trace_array_is_readonly(tr))
+ writable_mode = TRACE_MODE_READ;
+
trace_create_file("available_tracers", TRACE_MODE_READ, d_tracer,
- tr, &show_traces_fops);
+ tr, &show_traces_fops);
- trace_create_file("current_tracer", TRACE_MODE_WRITE, d_tracer,
- tr, &set_tracer_fops);
+ trace_create_file("current_tracer", writable_mode, d_tracer,
+ tr, &set_tracer_fops);
- trace_create_file("tracing_cpumask", TRACE_MODE_WRITE, d_tracer,
+ trace_create_file("tracing_cpumask", writable_mode, d_tracer,
tr, &tracing_cpumask_fops);
+ /* Options are used for changing print-format even for readonly instance. */
trace_create_file("trace_options", TRACE_MODE_WRITE, d_tracer,
tr, &tracing_iter_fops);
@@ -9856,12 +9861,36 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
trace_create_file("trace_pipe", TRACE_MODE_READ, d_tracer,
tr, &tracing_pipe_fops);
- trace_create_file("buffer_size_kb", TRACE_MODE_WRITE, d_tracer,
+ trace_create_file("buffer_size_kb", writable_mode, d_tracer,
tr, &tracing_entries_fops);
trace_create_file("buffer_total_size_kb", TRACE_MODE_READ, d_tracer,
tr, &tracing_total_entries_fops);
+ trace_create_file("trace_clock", writable_mode, d_tracer, tr,
+ &trace_clock_fops);
+
+ trace_create_file("timestamp_mode", TRACE_MODE_READ, d_tracer, tr,
+ &trace_time_stamp_mode_fops);
+
+ tr->buffer_percent = 50;
+
+ trace_create_file("buffer_subbuf_size_kb", writable_mode, d_tracer,
+ tr, &buffer_subbuf_size_fops);
+
+ create_trace_options_dir(tr);
+
+ if (tr->range_addr_start)
+ trace_create_file("last_boot_info", TRACE_MODE_READ, d_tracer,
+ tr, &last_boot_fops);
+
+ for_each_tracing_cpu(cpu)
+ tracing_init_tracefs_percpu(tr, cpu);
+
+ /* Read-only instance has above files only. */
+ if (trace_array_is_readonly(tr))
+ return;
+
trace_create_file("free_buffer", 0200, d_tracer,
tr, &tracing_free_buffer_fops);
@@ -9873,49 +9902,29 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
trace_create_file("trace_marker_raw", 0220, d_tracer,
tr, &tracing_mark_raw_fops);
- trace_create_file("trace_clock", TRACE_MODE_WRITE, d_tracer, tr,
- &trace_clock_fops);
-
- trace_create_file("tracing_on", TRACE_MODE_WRITE, d_tracer,
- tr, &rb_simple_fops);
-
- trace_create_file("timestamp_mode", TRACE_MODE_READ, d_tracer, tr,
- &trace_time_stamp_mode_fops);
-
- tr->buffer_percent = 50;
-
trace_create_file("buffer_percent", TRACE_MODE_WRITE, d_tracer,
- tr, &buffer_percent_fops);
-
- trace_create_file("buffer_subbuf_size_kb", TRACE_MODE_WRITE, d_tracer,
- tr, &buffer_subbuf_size_fops);
+ tr, &buffer_percent_fops);
trace_create_file("syscall_user_buf_size", TRACE_MODE_WRITE, d_tracer,
- tr, &tracing_syscall_buf_fops);
+ tr, &tracing_syscall_buf_fops);
- create_trace_options_dir(tr);
+ trace_create_file("tracing_on", TRACE_MODE_WRITE, d_tracer,
+ tr, &rb_simple_fops);
trace_create_maxlat_file(tr, d_tracer);
if (ftrace_create_function_files(tr, d_tracer))
MEM_FAIL(1, "Could not allocate function filter files");
- if (tr->range_addr_start) {
- trace_create_file("last_boot_info", TRACE_MODE_READ, d_tracer,
- tr, &last_boot_fops);
#ifdef CONFIG_TRACER_SNAPSHOT
- } else {
+ if (!tr->range_addr_start)
trace_create_file("snapshot", TRACE_MODE_WRITE, d_tracer,
tr, &snapshot_fops);
#endif
- }
trace_create_file("error_log", TRACE_MODE_WRITE, d_tracer,
tr, &tracing_err_log_fops);
- for_each_tracing_cpu(cpu)
- tracing_init_tracefs_percpu(tr, cpu);
-
ftrace_init_tracefs(tr, d_tracer);
}
@@ -10742,7 +10751,7 @@ __init static void enable_instances(void)
* Backup buffers can be freed but need vfree().
*/
if (backup)
- tr->flags |= TRACE_ARRAY_FL_VMALLOC;
+ tr->flags |= TRACE_ARRAY_FL_VMALLOC | TRACE_ARRAY_FL_RDONLY;
if (start || backup) {
tr->flags |= TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 649fdd20fc91..393be92768f1 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -459,6 +459,7 @@ enum {
TRACE_ARRAY_FL_MOD_INIT = BIT(3),
TRACE_ARRAY_FL_MEMMAP = BIT(4),
TRACE_ARRAY_FL_VMALLOC = BIT(5),
+ TRACE_ARRAY_FL_RDONLY = BIT(6),
};
#ifdef CONFIG_MODULES
@@ -488,6 +489,12 @@ extern unsigned long trace_adjust_address(struct trace_array *tr, unsigned long
extern struct trace_array *printk_trace;
+static inline bool trace_array_is_readonly(struct trace_array *tr)
+{
+ /* backup instance is read only. */
+ return tr->flags & TRACE_ARRAY_FL_RDONLY;
+}
+
/*
* The global tracer (top) should be the first trace array added,
* but we check the flag anyway.
diff --git a/kernel/trace/trace_boot.c b/kernel/trace/trace_boot.c
index dbe29b4c6a7a..2ca2541c8a58 100644
--- a/kernel/trace/trace_boot.c
+++ b/kernel/trace/trace_boot.c
@@ -61,7 +61,8 @@ trace_boot_set_instance_options(struct trace_array *tr, struct xbc_node *node)
v = memparse(p, NULL);
if (v < PAGE_SIZE)
pr_err("Buffer size is too small: %s\n", p);
- if (tracing_resize_ring_buffer(tr, v, RING_BUFFER_ALL_CPUS) < 0)
+ if (trace_array_is_readonly(tr) ||
+ tracing_resize_ring_buffer(tr, v, RING_BUFFER_ALL_CPUS) < 0)
pr_err("Failed to resize trace buffer to %s\n", p);
}
@@ -597,7 +598,7 @@ trace_boot_enable_tracer(struct trace_array *tr, struct xbc_node *node)
p = xbc_node_find_value(node, "tracer", NULL);
if (p && *p != '\0') {
- if (tracing_set_tracer(tr, p) < 0)
+ if (trace_array_is_readonly(tr) || tracing_set_tracer(tr, p) < 0)
pr_err("Failed to set given tracer: %s\n", p);
}
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 61fe01dce7a6..b493cbdf0ea0 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1396,6 +1396,9 @@ static int __ftrace_set_clr_event(struct trace_array *tr, const char *match,
{
int ret;
+ if (trace_array_is_readonly(tr))
+ return -EACCES;
+
mutex_lock(&event_mutex);
ret = __ftrace_set_clr_event_nolock(tr, match, sub, event, set, mod);
mutex_unlock(&event_mutex);
@@ -2968,8 +2971,8 @@ event_subsystem_dir(struct trace_array *tr, const char *name,
} else
__get_system(system);
- /* ftrace only has directories no files */
- if (strcmp(name, "ftrace") == 0)
+ /* ftrace only has directories no files, readonly instance too. */
+ if (strcmp(name, "ftrace") == 0 || trace_array_is_readonly(tr))
nr_entries = 0;
else
nr_entries = ARRAY_SIZE(system_entries);
@@ -3134,28 +3137,30 @@ event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file)
int ret;
static struct eventfs_entry event_entries[] = {
{
- .name = "enable",
+ .name = "format",
.callback = event_callback,
- .release = event_release,
},
+#ifdef CONFIG_PERF_EVENTS
{
- .name = "filter",
+ .name = "id",
.callback = event_callback,
},
+#endif
+#define NR_RO_EVENT_ENTRIES (1 + IS_ENABLED(CONFIG_PERF_EVENTS))
+/* Readonly files must be above this line and counted by NR_RO_EVENT_ENTRIES. */
{
- .name = "trigger",
+ .name = "enable",
.callback = event_callback,
+ .release = event_release,
},
{
- .name = "format",
+ .name = "filter",
.callback = event_callback,
},
-#ifdef CONFIG_PERF_EVENTS
{
- .name = "id",
+ .name = "trigger",
.callback = event_callback,
},
-#endif
#ifdef CONFIG_HIST_TRIGGERS
{
.name = "hist",
@@ -3188,7 +3193,10 @@ event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file)
if (!e_events)
return -ENOMEM;
- nr_entries = ARRAY_SIZE(event_entries);
+ if (trace_array_is_readonly(tr))
+ nr_entries = NR_RO_EVENT_ENTRIES;
+ else
+ nr_entries = ARRAY_SIZE(event_entries);
name = trace_event_name(call);
ei = eventfs_create_dir(name, e_events, event_entries, nr_entries, file);
@@ -4527,31 +4535,44 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr)
int nr_entries;
static struct eventfs_entry events_entries[] = {
{
- .name = "enable",
+ .name = "header_page",
.callback = events_callback,
},
{
- .name = "header_page",
+ .name = "header_event",
.callback = events_callback,
},
+#define NR_RO_TOP_ENTRIES 2
+/* Readonly files must be above this line and counted by NR_RO_TOP_ENTRIES. */
{
- .name = "header_event",
+ .name = "enable",
.callback = events_callback,
},
};
- entry = trace_create_file("set_event", TRACE_MODE_WRITE, parent,
- tr, &ftrace_set_event_fops);
- if (!entry)
- return -ENOMEM;
+ if (!trace_array_is_readonly(tr)) {
+ entry = trace_create_file("set_event", TRACE_MODE_WRITE, parent,
+ tr, &ftrace_set_event_fops);
+ if (!entry)
+ return -ENOMEM;
- trace_create_file("show_event_filters", TRACE_MODE_READ, parent, tr,
- &ftrace_show_event_filters_fops);
+ /* There are not as crucial, just warn if they are not created */
+ trace_create_file("show_event_filters", TRACE_MODE_READ, parent, tr,
+ &ftrace_show_event_filters_fops);
- trace_create_file("show_event_triggers", TRACE_MODE_READ, parent, tr,
- &ftrace_show_event_triggers_fops);
+ trace_create_file("show_event_triggers", TRACE_MODE_READ, parent, tr,
+ &ftrace_show_event_triggers_fops);
- nr_entries = ARRAY_SIZE(events_entries);
+ trace_create_file("set_event_pid", TRACE_MODE_WRITE, parent,
+ tr, &ftrace_set_event_pid_fops);
+
+ trace_create_file("set_event_notrace_pid",
+ TRACE_MODE_WRITE, parent, tr,
+ &ftrace_set_event_notrace_pid_fops);
+ nr_entries = ARRAY_SIZE(events_entries);
+ } else {
+ nr_entries = NR_RO_TOP_ENTRIES;
+ }
e_events = eventfs_create_events_dir("events", parent, events_entries,
nr_entries, tr);
@@ -4560,15 +4581,6 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr)
return -ENOMEM;
}
- /* There are not as crucial, just warn if they are not created */
-
- trace_create_file("set_event_pid", TRACE_MODE_WRITE, parent,
- tr, &ftrace_set_event_pid_fops);
-
- trace_create_file("set_event_notrace_pid",
- TRACE_MODE_WRITE, parent, tr,
- &ftrace_set_event_notrace_pid_fops);
-
tr->event_dir = e_events;
return 0;
* [PATCH v8 5/6] tracing: Remove the backup instance automatically after read
2026-02-10 8:43 [PATCH v8 0/6] tracing: Remove backup instance after read all Masami Hiramatsu (Google)
` (3 preceding siblings ...)
2026-02-10 8:43 ` [PATCH v8 4/6] tracing: Make the backup instance non-reusable Masami Hiramatsu (Google)
@ 2026-02-10 8:44 ` Masami Hiramatsu (Google)
2026-02-10 8:44 ` [PATCH v8 6/6] tracing/Documentation: Add a section about backup instance Masami Hiramatsu (Google)
5 siblings, 0 replies; 11+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-02-10 8:44 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Since the backup instance is readonly, no data is left on the instance
after all data has been read via trace_pipe. Thus it can be
removed safely after closing all files.
This also removes it if the user resets the ring buffer manually
via the 'trace' file.
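The lifetime rule can be sketched in userspace C as follows. This is a simplified model under stated assumptions: the `sk_` names are hypothetical, a boolean stands in for queueing the work item, and the base reference count of 1 is inferred from the `ref == 1` test in the patch below rather than stated in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a backup instance's lifetime state. */
struct sk_instance {
	int ref;             /* 1 is assumed to mean "no users left" */
	bool free_on_close;  /* set once all data has been consumed */
	bool removal_queued; /* stands in for queueing the removal work */
};

static void sk_instance_init(struct sk_instance *in)
{
	in->ref = 1;
	in->free_on_close = false;
	in->removal_queued = false;
}

/* A user opens a file on the instance. */
static void sk_get(struct sk_instance *in)
{
	in->ref++;
}

/* All data was read out, or the buffer was reset. */
static void sk_mark_consumed(struct sk_instance *in)
{
	in->free_on_close = true;
}

/* A user closes a file; the last close of a consumed instance queues removal. */
static void sk_put(struct sk_instance *in)
{
	assert(in->ref > 0);
	in->ref--;
	if (in->ref == 1 && in->free_on_close)
		in->removal_queued = true;
}
```

Removal is deferred to the last close rather than done at the moment the data is consumed, so a reader that still holds a file open on the instance is never left with a dangling instance.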
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v6:
- Fix typo in comment.
- Only when there is a readonly trace array, initialize autoremove_wq.
- Fix to exit loop in trace_array_get() if tr is found in the list.
Changes in v4:
- Update description.
---
kernel/trace/trace.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++-
kernel/trace/trace.h | 6 +++++
2 files changed, 66 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 566d1e824360..c746cb4c6e38 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -578,6 +578,51 @@ void trace_set_ring_buffer_expanded(struct trace_array *tr)
tr->ring_buffer_expanded = true;
}
+static int __remove_instance(struct trace_array *tr);
+
+static void trace_array_autoremove(struct work_struct *work)
+{
+ struct trace_array *tr = container_of(work, struct trace_array, autoremove_work);
+
+ guard(mutex)(&event_mutex);
+ guard(mutex)(&trace_types_lock);
+
+ /*
+ * This can be fail if someone gets @tr before starting this
+ * function, but in that case, this will be kicked again when
+ * putting it. So we don't care about the result.
+ */
+ __remove_instance(tr);
+}
+
+static struct workqueue_struct *autoremove_wq;
+
+static void trace_array_kick_autoremove(struct trace_array *tr)
+{
+ if (autoremove_wq && !work_pending(&tr->autoremove_work))
+ queue_work(autoremove_wq, &tr->autoremove_work);
+}
+
+static void trace_array_cancel_autoremove(struct trace_array *tr)
+{
+ if (autoremove_wq && work_pending(&tr->autoremove_work))
+ cancel_work(&tr->autoremove_work);
+}
+
+static void trace_array_init_autoremove(struct trace_array *tr)
+{
+ INIT_WORK(&tr->autoremove_work, trace_array_autoremove);
+
+ /* Only readonly trace_array can kick the autoremove. */
+ if (!trace_array_is_readonly(tr) || autoremove_wq)
+ return;
+
+ autoremove_wq = alloc_workqueue("tr_autoremove_wq",
+ WQ_UNBOUND | WQ_HIGHPRI, 0);
+ if (!autoremove_wq)
+ pr_warn("Unable to allocate tr_autoremove_wq. autoremove disabled.\n");
+}
+
LIST_HEAD(ftrace_trace_arrays);
int trace_array_get(struct trace_array *this_tr)
@@ -587,7 +632,8 @@ int trace_array_get(struct trace_array *this_tr)
guard(mutex)(&trace_types_lock);
list_for_each_entry(tr, &ftrace_trace_arrays, list) {
if (tr == this_tr) {
- tr->ref++;
+ if (!tr->free_on_close)
+ tr->ref++;
return 0;
}
}
@@ -599,6 +645,12 @@ static void __trace_array_put(struct trace_array *this_tr)
{
WARN_ON(!this_tr->ref);
this_tr->ref--;
+ /*
+ * When free_on_close is set, prepare removing the array
+ * when the last reference is released.
+ */
+ if (this_tr->ref == 1 && this_tr->free_on_close)
+ trace_array_kick_autoremove(this_tr);
}
/**
@@ -5463,6 +5515,10 @@ static void update_last_data(struct trace_array *tr)
/* Only if the buffer has previous boot data clear and update it. */
tr->flags &= ~TRACE_ARRAY_FL_LAST_BOOT;
+ /* If this is a backup instance, mark it for autoremove. */
+ if (tr->flags & TRACE_ARRAY_FL_VMALLOC)
+ tr->free_on_close = true;
+
/* Reset the module list and reload them */
if (tr->scratch) {
struct trace_scratch *tscratch = tr->scratch;
@@ -9596,6 +9652,8 @@ trace_array_create_systems(const char *name, const char *systems,
if (ftrace_allocate_ftrace_ops(tr) < 0)
goto out_free_tr;
+ trace_array_init_autoremove(tr);
+
ftrace_init_trace_array(tr);
init_trace_flags_index(tr);
@@ -9744,6 +9802,7 @@ static int __remove_instance(struct trace_array *tr)
if (update_marker_trace(tr, 0))
synchronize_rcu();
+ trace_array_cancel_autoremove(tr);
tracing_set_nop(tr);
clear_ftrace_function_probes(tr);
event_trace_del_tracer(tr);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 393be92768f1..48b94759ba1c 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -450,6 +450,12 @@ struct trace_array {
* we do not waste memory on systems that are not using tracing.
*/
bool ring_buffer_expanded;
+ /*
+ * If the ring buffer is a read only backup instance, it will be
+ * removed after dumping all data via pipe, because no readable data.
+ */
+ bool free_on_close;
+ struct work_struct autoremove_work;
};
enum {
* [PATCH v8 6/6] tracing/Documentation: Add a section about backup instance
2026-02-10 8:43 [PATCH v8 0/6] tracing: Remove backup instance after read all Masami Hiramatsu (Google)
` (4 preceding siblings ...)
2026-02-10 8:44 ` [PATCH v8 5/6] tracing: Remove the backup instance automatically after read Masami Hiramatsu (Google)
@ 2026-02-10 8:44 ` Masami Hiramatsu (Google)
5 siblings, 0 replies; 11+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-02-10 8:44 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Add a section about the backup instance to debugging.rst.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v6:
- Fix typos.
---
Documentation/trace/debugging.rst | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/Documentation/trace/debugging.rst b/Documentation/trace/debugging.rst
index 4d88c346fc38..15857951b506 100644
--- a/Documentation/trace/debugging.rst
+++ b/Documentation/trace/debugging.rst
@@ -159,3 +159,22 @@ If setting it from the kernel command line, it is recommended to also
disable tracing with the "traceoff" flag, and enable tracing after boot up.
Otherwise the trace from the most recent boot will be mixed with the trace
from the previous boot, and may make it confusing to read.
+
+Using a backup instance for keeping previous boot data
+------------------------------------------------------
+
+It is also possible to record trace data at system boot time by specifying
+events with the persistent ring buffer, but in this case the data before the
+reboot will be lost before it can be read. This problem can be solved by a
+backup instance. From the kernel command line::
+
+ reserve_mem=12M:4096:trace trace_instance=boot_map@trace,sched,irq trace_instance=backup=boot_map
+
+On boot up, the previous data in the "boot_map" is copied to the "backup"
+instance, and the "sched:*" and "irq:*" events for the current boot are traced
+in the "boot_map". Thus the user can read the previous boot data from the "backup"
+instance without stopping the trace.
+
+Note that this "backup" instance is readonly, and will be removed automatically
+if you clear the trace data or read out all trace data from the "trace_pipe"
+or the "trace_pipe_raw" files.
\ No newline at end of file
* Re: [PATCH v8 3/6] tracefs: Check file permission even if user has CAP_DAC_OVERRIDE
2026-02-10 8:43 ` [PATCH v8 3/6] tracefs: Check file permission even if user has CAP_DAC_OVERRIDE Masami Hiramatsu (Google)
@ 2026-02-11 15:46 ` Steven Rostedt
2026-02-12 6:15 ` Masami Hiramatsu
2026-03-24 16:43 ` Steven Rostedt
1 sibling, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2026-02-11 15:46 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel
On Tue, 10 Feb 2026 17:43:51 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>
> Strictly checking the file read/write permission even if the owner has
> CAP_DAC_OVERRIDE on tracefs as same as sysfs.
> Tracefs is a pseudo filesystem, just like sysfs, so any file that the
> system defines as unwritable should actually be unwritable by anyone.
This is getting too complex and still doesn't work. As I said in my
other email, simply check for the trace_array being readonly on opens()
and return -EACCES if it is and was opened for write or read-write.
With this still not working this late in the game, it will need to wait
until the next merge window. I'll take the first two patches of this
series now though.
-- Steve
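
For readers following the thread, the open-time check suggested above can be modeled in userspace roughly as follows. This is only a sketch of the idea, not kernel code and not the eventual patch: struct trace_array_model and model_open_check are hypothetical names, and the access mode is taken from open(2) flags rather than from the kernel's struct file.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct trace_array. */
struct trace_array_model {
	bool readonly;	/* true for a backup instance */
};

/*
 * Return 0 if an open with the given open(2) flags is allowed, or
 * -EACCES if the instance is read-only and the file is being opened
 * for write or read-write.
 */
int model_open_check(const struct trace_array_model *tr, int open_flags)
{
	int acc = open_flags & O_ACCMODE;

	if (tr->readonly && (acc == O_WRONLY || acc == O_RDWR))
		return -EACCES;
	return 0;
}
```

The point of this shape is that the decision depends only on the instance state and the requested access mode, so it does not need to inspect inode mode bits or the file_operations table.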
* Re: [PATCH v8 3/6] tracefs: Check file permission even if user has CAP_DAC_OVERRIDE
2026-02-11 15:46 ` Steven Rostedt
@ 2026-02-12 6:15 ` Masami Hiramatsu
2026-03-05 17:33 ` Steven Rostedt
0 siblings, 1 reply; 11+ messages in thread
From: Masami Hiramatsu @ 2026-02-12 6:15 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel
On Wed, 11 Feb 2026 10:46:23 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 10 Feb 2026 17:43:51 +0900
> "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
>
> > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> >
> > Strictly checking the file read/write permission even if the owner has
> > CAP_DAC_OVERRIDE on tracefs as same as sysfs.
> > Tracefs is a pseudo filesystem, just like sysfs, so any file that the
> > system defines as unwritable should actually be unwritable by anyone.
>
> This is getting too complex and still doesn't work. As I said in my
> other email, simply check for the trace_array being readonly on opens()
> and return -EACCES if it is and was opened for write or read-write.
Yeah, I understand that I confused "permission" with "possibility".
>
> With this still not working this late in the game, it will need to wait
> until the next merge window. I'll take the first two patches of this
> series now though.
OK. I will send the next version without the first 2 patches.
Thank you,
>
> -- Steve
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
* Re: [PATCH v8 3/6] tracefs: Check file permission even if user has CAP_DAC_OVERRIDE
2026-02-12 6:15 ` Masami Hiramatsu
@ 2026-03-05 17:33 ` Steven Rostedt
0 siblings, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2026-03-05 17:33 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel
On Thu, 12 Feb 2026 15:15:15 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> > With this still not working this late in the game, it will need to wait
> > until the next merge window. I'll take the first two patches of this
> > series now though.
>
> OK. I will send the next version without the first 2 patches.
Hi Masami,
Did you send a new version of this yet?
-- Steve
* Re: [PATCH v8 3/6] tracefs: Check file permission even if user has CAP_DAC_OVERRIDE
2026-02-10 8:43 ` [PATCH v8 3/6] tracefs: Check file permission even if user has CAP_DAC_OVERRIDE Masami Hiramatsu (Google)
2026-02-11 15:46 ` Steven Rostedt
@ 2026-03-24 16:43 ` Steven Rostedt
1 sibling, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2026-03-24 16:43 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel
On Tue, 10 Feb 2026 17:43:51 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
Hi Masami,
Did you send a new version of this patch series yet? I don't see it.
> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
> index d9d8932a7b9c..eb1ddc0cc13a 100644
> --- a/fs/tracefs/inode.c
> +++ b/fs/tracefs/inode.c
> @@ -212,10 +212,40 @@ static void set_tracefs_inode_owner(struct inode *inode)
> inode->i_gid = gid;
> }
>
> -static int tracefs_permission(struct mnt_idmap *idmap,
> - struct inode *inode, int mask)
> +int tracefs_permission(struct mnt_idmap *idmap,
> + struct inode *inode, int mask)
> {
> - set_tracefs_inode_owner(inode);
> + struct tracefs_inode *ti = get_tracefs(inode);
> + const struct file_operations *fops;
> +
> + if (!(ti->flags & TRACEFS_EVENT_INODE))
> + set_tracefs_inode_owner(inode);
> +
> + /*
> + * Like sysfs, file permission checks are performed even for superuser
> + * with CAP_DAC_OVERRIDE. See the KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK
> + * definition in linux/kernfs.h.
> + */
> + if (mask & MAY_OPEN) {
> + fops = inode->i_fop;
> +
> + if (mask & MAY_WRITE) {
> + if (!(inode->i_mode & 0222))
> + return -EACCES;
> + if (!fops || (!fops->write && !fops->write_iter &&
> + !fops->mmap))
> + return -EACCES;
> + }
> +
> + if (mask & MAY_READ) {
> + if (!(inode->i_mode & 0444))
> + return -EACCES;
> + if (!fops || (!fops->read && !fops->read_iter &&
> + !fops->mmap && !fops->splice_read))
> + return -EACCES;
> + }
The above if block is way too coupled with the workings of fops and is very
fragile. Is it even needed? If there are no read or write functions,
wouldn't the vfs stop it anyway?
-- Steve
> + }
> +
> return generic_permission(idmap, inode, mask);
> }