* [PATCH v2 0/2] tracing: Remove backup instance after read all
@ 2026-01-08 14:22 Masami Hiramatsu (Google)
2026-01-08 14:23 ` [PATCH v2 1/2] tracing: Make the backup instance readonly Masami Hiramatsu (Google)
2026-01-08 14:23 ` [PATCH v2 2/2] tracing: Add autoremove feature to the backup instance Masami Hiramatsu (Google)
0 siblings, 2 replies; 7+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-01-08 14:22 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel
Hi,
Here is the 2nd version of the series to improve backup instances of
the persistent ring buffer. The previous version is here:
https://lore.kernel.org/all/176779714767.4193242.1978666866487010024.stgit@mhiramat.tok.corp.google.com/
In this version, I updated [1/2] to use dedicated file operations
for readonly instances instead of checking in each write function.
It also uses dedicated eventfs entries so that the writable control
files are removed. As a result, only the 'format' and 'id' files
remain for each event in a readonly backup instance.
Since backup instances are a kind of snapshot of the persistent
ring buffer, they should be readonly. And once an instance is
readonly, there is no reason to keep it after all data has been
read via trace_pipe, because the data has been consumed.
Thus, [1/2] makes backup instances readonly (not able to write any
events, clean up the trace, or change the buffer size). Also, [2/2]
removes the backup instance after all data is consumed via trace_pipe.
With these improvements, even if we make a backup instance (which
uses the same amount of memory as the persistent ring buffer), it
will be removed automatically after the data is read.
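For illustration only (this is not part of the series), the lifecycle
described above can be sketched in user-space C. The struct and the
function names below are hypothetical stand-ins for struct trace_array,
trace_array_get() and __trace_array_put(); the sketch also removes the
instance inline, whereas [2/2] queues a work item to do it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct trace_array; only the fields used here. */
struct backup_instance {
	int  ref;           /* 1 means only the base reference is held */
	bool free_on_close; /* set once the data has been consumed */
	bool removed;       /* stands in for __remove_instance() having run */
};

/* Like the trace_array_get() change: refuse new users once marked. */
static int instance_get(struct backup_instance *bi)
{
	if (bi->removed || bi->free_on_close)
		return -1;  /* the kernel returns an error code here */
	bi->ref++;
	return 0;
}

/* Like __trace_array_put(): dropping to the base reference triggers removal. */
static void instance_put(struct backup_instance *bi)
{
	assert(bi->ref > 0);
	bi->ref--;
	if (bi->ref == 1 && bi->free_on_close)
		bi->removed = true; /* the patch kicks a workqueue instead */
}

/* The patch sets free_on_close from update_last_data(). */
static void instance_mark_consumed(struct backup_instance *bi)
{
	bi->free_on_close = true;
}
```

In this model a reader that opened trace_pipe before the data ran out
keeps its reference, and the removal happens only when that last user
puts it back.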
---
Masami Hiramatsu (Google) (2):
tracing: Make the backup instance readonly
tracing: Add autoremove feature to the backup instance
kernel/trace/trace.c | 227 +++++++++++++++++++++++++++++++++++--------
kernel/trace/trace.h | 20 ++++
kernel/trace/trace_boot.c | 5 +
kernel/trace/trace_events.c | 75 ++++++++++----
4 files changed, 261 insertions(+), 66 deletions(-)
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
* [PATCH v2 1/2] tracing: Make the backup instance readonly
2026-01-08 14:22 [PATCH v2 0/2] tracing: Remove backup instance after read all Masami Hiramatsu (Google)
@ 2026-01-08 14:23 ` Masami Hiramatsu (Google)
2026-01-13 0:31 ` Masami Hiramatsu
2026-01-08 14:23 ` [PATCH v2 2/2] tracing: Add autoremove feature to the backup instance Masami Hiramatsu (Google)
1 sibling, 1 reply; 7+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-01-08 14:23 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Since there is no reason to reuse the backup instance, make it
readonly. Note that only backup instances are readonly, because
other trace instances would stay empty if they were not writable.
Only backup instances hold entries copied from the original.
With this change, most of the trace control files are removed
from the backup instance, including eventfs enable/filter etc.
# find /sys/kernel/tracing/instances/backup/events/ | wc -l
4093
# find /sys/kernel/tracing/instances/boot_map/events/ | wc -l
9573
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v2:
- Use readonly file_operations to prohibit writing instead of
checking flags in write() callbacks.
- Remove writable files from eventfs.
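As a rough user-space sketch of the first item (not the kernel code
itself), the intended selection is: when the creation mode carries no
write bit, hand out an operations table that has no write callback at
all, so no per-callback flag check is needed. `struct ops`,
`select_ops()` and `do_write()` are made-up names for this example;
only the mode values come from trace.h.

```c
#include <assert.h>
#include <stddef.h>

#define MODE_WRITE      0640
#define MODE_READ       0440
#define MODE_WRITE_MASK (MODE_WRITE & ~MODE_READ)

/* Only the owner-write bit differs between the two modes. */
static_assert(MODE_WRITE_MASK == 0200, "write mask is the owner-write bit");

/* Hypothetical, much reduced stand-in for struct file_operations. */
struct ops {
	int (*read)(void);
	int (*write)(int val);  /* NULL in a read-only table */
};

static int demo_read(void) { return 0; }
static int demo_write(int val) { (void)val; return 0; }

static const struct ops demo_ops    = { .read = demo_read, .write = demo_write };
static const struct ops demo_ro_ops = { .read = demo_read };

/* Pick the table from the mode: no write bit means the read-only table. */
static const struct ops *select_ops(unsigned int mode,
				    const struct ops *ops,
				    const struct ops *ro_ops)
{
	return (mode & MODE_WRITE_MASK) ? ops : ro_ops;
}

/* A missing callback is what rejects the write, not a flag check. */
static int do_write(const struct ops *ops, int val)
{
	if (!ops->write)
		return -1;
	return ops->write(val);
}
```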
---
kernel/trace/trace.c | 163 ++++++++++++++++++++++++++++++++-----------
kernel/trace/trace.h | 14 +++-
kernel/trace/trace_boot.c | 5 +
kernel/trace/trace_events.c | 75 ++++++++++++++------
4 files changed, 192 insertions(+), 65 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 38f7a7a55c23..1b87595413fe 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4888,6 +4888,9 @@ static int tracing_open(struct inode *inode, struct file *file)
int cpu = tracing_get_cpu(inode);
struct array_buffer *trace_buf = &tr->array_buffer;
+ if (trace_array_is_readonly(tr))
+ return -EPERM;
+
#ifdef CONFIG_TRACER_MAX_TRACE
if (tr->current_trace->print_max)
trace_buf = &tr->max_buffer;
@@ -5055,6 +5058,15 @@ static const struct file_operations tracing_fops = {
.release = tracing_release,
};
+static const struct file_operations tracing_ro_fops = {
+ .open = tracing_open,
+ .read = seq_read,
+ .read_iter = seq_read_iter,
+ .splice_read = copy_splice_read,
+ .llseek = tracing_lseek,
+ .release = tracing_release,
+};
+
static const struct file_operations show_traces_fops = {
.open = show_traces_open,
.read = seq_read,
@@ -5162,6 +5174,13 @@ static const struct file_operations tracing_cpumask_fops = {
.llseek = generic_file_llseek,
};
+static const struct file_operations tracing_cpumask_ro_fops = {
+ .open = tracing_open_generic_tr,
+ .read = tracing_cpumask_read,
+ .release = tracing_release_generic_tr,
+ .llseek = generic_file_llseek,
+};
+
static int tracing_trace_options_show(struct seq_file *m, void *v)
{
struct tracer_opt *trace_opts;
@@ -8106,6 +8125,13 @@ static const struct file_operations set_tracer_fops = {
.release = tracing_release_generic_tr,
};
+static const struct file_operations set_tracer_ro_fops = {
+ .open = tracing_open_generic_tr,
+ .read = tracing_set_trace_read,
+ .llseek = generic_file_llseek,
+ .release = tracing_release_generic_tr,
+};
+
static const struct file_operations tracing_pipe_fops = {
.open = tracing_open_pipe,
.poll = tracing_poll_pipe,
@@ -8122,6 +8148,13 @@ static const struct file_operations tracing_entries_fops = {
.release = tracing_release_generic_tr,
};
+static const struct file_operations tracing_entries_ro_fops = {
+ .open = tracing_open_generic_tr,
+ .read = tracing_entries_read,
+ .llseek = generic_file_llseek,
+ .release = tracing_release_generic_tr,
+};
+
static const struct file_operations tracing_syscall_buf_fops = {
.open = tracing_open_generic_tr,
.read = tracing_syscall_buf_read,
@@ -8170,6 +8203,13 @@ static const struct file_operations trace_clock_fops = {
.write = tracing_clock_write,
};
+static const struct file_operations trace_clock_ro_fops = {
+ .open = tracing_clock_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = tracing_single_release_tr,
+};
+
static const struct file_operations trace_time_stamp_mode_fops = {
.open = tracing_time_stamp_mode_open,
.read = seq_read,
@@ -9353,12 +9393,16 @@ static void
tracing_init_tracefs_percpu(struct trace_array *tr, long cpu)
{
struct dentry *d_percpu = tracing_dentry_percpu(tr, cpu);
+ umode_t writable_mode = TRACE_MODE_WRITE;
struct dentry *d_cpu;
char cpu_dir[30]; /* 30 characters should be more than enough */
if (!d_percpu)
return;
+ if (trace_array_is_readonly(tr))
+ writable_mode = TRACE_MODE_READ;
+
snprintf(cpu_dir, 30, "cpu%ld", cpu);
d_cpu = tracefs_create_dir(cpu_dir, d_percpu);
if (!d_cpu) {
@@ -9371,7 +9415,7 @@ tracing_init_tracefs_percpu(struct trace_array *tr, long cpu)
tr, cpu, &tracing_pipe_fops);
/* per cpu trace */
- trace_create_cpu_file("trace", TRACE_MODE_WRITE, d_cpu,
+ trace_create_cpu_file("trace", writable_mode, d_cpu,
tr, cpu, &tracing_fops);
trace_create_cpu_file("trace_pipe_raw", TRACE_MODE_READ, d_cpu,
@@ -9566,21 +9610,31 @@ static const struct file_operations trace_options_core_fops = {
.llseek = generic_file_llseek,
};
-struct dentry *trace_create_file(const char *name,
- umode_t mode,
- struct dentry *parent,
- void *data,
- const struct file_operations *fops)
+struct dentry *__trace_create_file(const char *name,
+ umode_t mode,
+ struct dentry *parent,
+ void *data,
+ const struct file_operations *fops,
+ const struct file_operations *ro_fops)
{
+ bool readonly = !(mode & TRACE_MODE_WRITE_MASK);
struct dentry *ret;
- ret = tracefs_create_file(name, mode, parent, data, fops);
+ ret = tracefs_create_file(name, mode, parent, data, readonly ? ro_fops : fops);
if (!ret)
pr_warn("Could not create tracefs '%s' entry\n", name);
return ret;
}
+struct dentry *trace_create_file(const char *name,
+ umode_t mode,
+ struct dentry *parent,
+ void *data,
+ const struct file_operations *fops)
+{
+ return __trace_create_file(name, mode, parent, data, fops, fops);
+}
static struct dentry *trace_options_init_dentry(struct trace_array *tr)
{
@@ -9811,6 +9865,9 @@ rb_simple_write(struct file *filp, const char __user *ubuf,
unsigned long val;
int ret;
+ if (trace_array_is_readonly(tr))
+ return -EPERM;
+
ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
if (ret)
return ret;
@@ -9845,6 +9902,13 @@ static const struct file_operations rb_simple_fops = {
.llseek = default_llseek,
};
+static const struct file_operations rb_simple_ro_fops = {
+ .open = tracing_open_generic_tr,
+ .read = rb_simple_read,
+ .release = tracing_release_generic_tr,
+ .llseek = default_llseek,
+};
+
static ssize_t
buffer_percent_read(struct file *filp, char __user *ubuf,
size_t cnt, loff_t *ppos)
@@ -9986,6 +10050,13 @@ static const struct file_operations buffer_subbuf_size_fops = {
.llseek = default_llseek,
};
+static const struct file_operations buffer_subbuf_size_ro_fops = {
+ .open = tracing_open_generic_tr,
+ .read = buffer_subbuf_size_read,
+ .release = tracing_release_generic_tr,
+ .llseek = default_llseek,
+};
+
static struct dentry *trace_instance_dir;
static void
@@ -10597,89 +10668,101 @@ static __init void create_trace_instances(struct dentry *d_tracer)
static void
init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
{
+ umode_t writable_mode = TRACE_MODE_WRITE;
+ bool readonly = trace_array_is_readonly(tr);
int cpu;
+ if (readonly)
+ writable_mode = TRACE_MODE_READ;
+
trace_create_file("available_tracers", TRACE_MODE_READ, d_tracer,
tr, &show_traces_fops);
- trace_create_file("current_tracer", TRACE_MODE_WRITE, d_tracer,
- tr, &set_tracer_fops);
+ __trace_create_file("current_tracer", writable_mode, d_tracer,
+ tr, &set_tracer_fops, &set_tracer_ro_fops);
- trace_create_file("tracing_cpumask", TRACE_MODE_WRITE, d_tracer,
- tr, &tracing_cpumask_fops);
+ __trace_create_file("tracing_cpumask", writable_mode, d_tracer,
+ tr, &tracing_cpumask_fops, &tracing_cpumask_ro_fops);
+ /* Options are used for changing print-format even for readonly instance. */
trace_create_file("trace_options", TRACE_MODE_WRITE, d_tracer,
tr, &tracing_iter_fops);
- trace_create_file("trace", TRACE_MODE_WRITE, d_tracer,
- tr, &tracing_fops);
+ __trace_create_file("trace", writable_mode, d_tracer,
+ tr, &tracing_fops, &tracing_ro_fops);
trace_create_file("trace_pipe", TRACE_MODE_READ, d_tracer,
tr, &tracing_pipe_fops);
- trace_create_file("buffer_size_kb", TRACE_MODE_WRITE, d_tracer,
- tr, &tracing_entries_fops);
+ __trace_create_file("buffer_size_kb", writable_mode, d_tracer,
+ tr, &tracing_entries_fops, &tracing_entries_ro_fops);
trace_create_file("buffer_total_size_kb", TRACE_MODE_READ, d_tracer,
tr, &tracing_total_entries_fops);
- trace_create_file("free_buffer", 0200, d_tracer,
- tr, &tracing_free_buffer_fops);
+ if (!readonly) {
+ trace_create_file("free_buffer", 0200, d_tracer,
+ tr, &tracing_free_buffer_fops);
+
+ trace_create_file("trace_marker", 0220, d_tracer,
+ tr, &tracing_mark_fops);
+
+ tr->trace_marker_file = __find_event_file(tr, "ftrace", "print");
- trace_create_file("trace_marker", 0220, d_tracer,
- tr, &tracing_mark_fops);
+ trace_create_file("trace_marker_raw", 0220, d_tracer,
+ tr, &tracing_mark_raw_fops);
- tr->trace_marker_file = __find_event_file(tr, "ftrace", "print");
+ trace_create_file("buffer_percent", TRACE_MODE_WRITE, d_tracer,
+ tr, &buffer_percent_fops);
- trace_create_file("trace_marker_raw", 0220, d_tracer,
- tr, &tracing_mark_raw_fops);
+ trace_create_file("syscall_user_buf_size", TRACE_MODE_WRITE, d_tracer,
+ tr, &tracing_syscall_buf_fops);
+ }
- trace_create_file("trace_clock", TRACE_MODE_WRITE, d_tracer, tr,
- &trace_clock_fops);
+ __trace_create_file("trace_clock", writable_mode, d_tracer, tr,
+ &trace_clock_fops, &trace_clock_ro_fops);
- trace_create_file("tracing_on", TRACE_MODE_WRITE, d_tracer,
- tr, &rb_simple_fops);
+ __trace_create_file("tracing_on", writable_mode, d_tracer,
+ tr, &rb_simple_fops, &rb_simple_ro_fops);
trace_create_file("timestamp_mode", TRACE_MODE_READ, d_tracer, tr,
&trace_time_stamp_mode_fops);
tr->buffer_percent = 50;
- trace_create_file("buffer_percent", TRACE_MODE_WRITE, d_tracer,
- tr, &buffer_percent_fops);
-
- trace_create_file("buffer_subbuf_size_kb", TRACE_MODE_WRITE, d_tracer,
- tr, &buffer_subbuf_size_fops);
-
- trace_create_file("syscall_user_buf_size", TRACE_MODE_WRITE, d_tracer,
- tr, &tracing_syscall_buf_fops);
+ __trace_create_file("buffer_subbuf_size_kb", writable_mode, d_tracer,
+ tr, &buffer_subbuf_size_fops,
+ &buffer_subbuf_size_ro_fops);
create_trace_options_dir(tr);
#ifdef CONFIG_TRACER_MAX_TRACE
- trace_create_maxlat_file(tr, d_tracer);
+ if (!readonly)
+ trace_create_maxlat_file(tr, d_tracer);
#endif
- if (ftrace_create_function_files(tr, d_tracer))
+ if (!readonly && ftrace_create_function_files(tr, d_tracer))
MEM_FAIL(1, "Could not allocate function filter files");
if (tr->range_addr_start) {
trace_create_file("last_boot_info", TRACE_MODE_READ, d_tracer,
tr, &last_boot_fops);
#ifdef CONFIG_TRACER_SNAPSHOT
- } else {
+ } else if (!readonly) {
trace_create_file("snapshot", TRACE_MODE_WRITE, d_tracer,
tr, &snapshot_fops);
#endif
}
- trace_create_file("error_log", TRACE_MODE_WRITE, d_tracer,
- tr, &tracing_err_log_fops);
+ if (!readonly)
+ trace_create_file("error_log", TRACE_MODE_WRITE, d_tracer,
+ tr, &tracing_err_log_fops);
for_each_tracing_cpu(cpu)
tracing_init_tracefs_percpu(tr, cpu);
- ftrace_init_tracefs(tr, d_tracer);
+ if (!readonly)
+ ftrace_init_tracefs(tr, d_tracer);
}
#ifdef CONFIG_TRACEFS_AUTOMOUNT_DEPRECATED
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index b6d42fe06115..4fae5cf1182c 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -33,6 +33,7 @@
#define TRACE_MODE_WRITE 0640
#define TRACE_MODE_READ 0440
+#define TRACE_MODE_WRITE_MASK (TRACE_MODE_WRITE & ~TRACE_MODE_READ)
enum trace_type {
__TRACE_FIRST_TYPE = 0,
@@ -483,6 +484,12 @@ extern bool trace_clock_in_ns(struct trace_array *tr);
extern unsigned long trace_adjust_address(struct trace_array *tr, unsigned long addr);
+static inline bool trace_array_is_readonly(struct trace_array *tr)
+{
+ /* backup instance is read only. */
+ return tr->flags & TRACE_ARRAY_FL_VMALLOC;
+}
+
/*
* The global tracer (top) should be the first trace array added,
* but we check the flag anyway.
@@ -680,7 +687,12 @@ struct dentry *trace_create_file(const char *name,
struct dentry *parent,
void *data,
const struct file_operations *fops);
-
+struct dentry *__trace_create_file(const char *name,
+ umode_t mode,
+ struct dentry *parent,
+ void *data,
+ const struct file_operations *fops,
+ const struct file_operations *ro_fops);
/**
* tracer_tracing_is_on_cpu - show real state of ring buffer enabled on for a cpu
diff --git a/kernel/trace/trace_boot.c b/kernel/trace/trace_boot.c
index dbe29b4c6a7a..2ca2541c8a58 100644
--- a/kernel/trace/trace_boot.c
+++ b/kernel/trace/trace_boot.c
@@ -61,7 +61,8 @@ trace_boot_set_instance_options(struct trace_array *tr, struct xbc_node *node)
v = memparse(p, NULL);
if (v < PAGE_SIZE)
pr_err("Buffer size is too small: %s\n", p);
- if (tracing_resize_ring_buffer(tr, v, RING_BUFFER_ALL_CPUS) < 0)
+ if (trace_array_is_readonly(tr) ||
+ tracing_resize_ring_buffer(tr, v, RING_BUFFER_ALL_CPUS) < 0)
pr_err("Failed to resize trace buffer to %s\n", p);
}
@@ -597,7 +598,7 @@ trace_boot_enable_tracer(struct trace_array *tr, struct xbc_node *node)
p = xbc_node_find_value(node, "tracer", NULL);
if (p && *p != '\0') {
- if (tracing_set_tracer(tr, p) < 0)
+ if (trace_array_is_readonly(tr) || tracing_set_tracer(tr, p) < 0)
pr_err("Failed to set given tracer: %s\n", p);
}
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 9b07ad9eb284..741b16b54d90 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1379,6 +1379,9 @@ static int __ftrace_set_clr_event(struct trace_array *tr, const char *match,
{
int ret;
+ if (trace_array_is_readonly(tr))
+ return -EPERM;
+
mutex_lock(&event_mutex);
ret = __ftrace_set_clr_event_nolock(tr, match, sub, event, set, mod);
mutex_unlock(&event_mutex);
@@ -2817,8 +2820,8 @@ event_subsystem_dir(struct trace_array *tr, const char *name,
} else
__get_system(system);
- /* ftrace only has directories no files */
- if (strcmp(name, "ftrace") == 0)
+ /* ftrace only has directories no files, readonly instance too. */
+ if (strcmp(name, "ftrace") == 0 || trace_array_is_readonly(tr))
nr_entries = 0;
else
nr_entries = ARRAY_SIZE(system_entries);
@@ -2979,7 +2982,6 @@ event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file)
struct eventfs_inode *e_events;
struct eventfs_inode *ei;
const char *name;
- int nr_entries;
int ret;
static struct eventfs_entry event_entries[] = {
{
@@ -3024,6 +3026,18 @@ event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file)
},
#endif
};
+ static struct eventfs_entry event_ro_entries[] = {
+ {
+ .name = "format",
+ .callback = event_callback,
+ },
+#ifdef CONFIG_PERF_EVENTS
+ {
+ .name = "id",
+ .callback = event_callback,
+ },
+#endif
+ };
/*
* If the trace point header did not define TRACE_SYSTEM
@@ -3037,10 +3051,14 @@ event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file)
if (!e_events)
return -ENOMEM;
- nr_entries = ARRAY_SIZE(event_entries);
-
name = trace_event_name(call);
- ei = eventfs_create_dir(name, e_events, event_entries, nr_entries, file);
+
+ if (trace_array_is_readonly(tr))
+ ei = eventfs_create_dir(name, e_events, event_ro_entries,
+ ARRAY_SIZE(event_ro_entries), file);
+ else
+ ei = eventfs_create_dir(name, e_events, event_entries,
+ ARRAY_SIZE(event_entries), file);
if (IS_ERR(ei)) {
pr_warn("Could not create tracefs '%s' directory\n", name);
return -1;
@@ -4378,7 +4396,6 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr)
{
struct eventfs_inode *e_events;
struct dentry *entry;
- int nr_entries;
static struct eventfs_entry events_entries[] = {
{
.name = "enable",
@@ -4393,30 +4410,44 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr)
.callback = events_callback,
},
};
+ static struct eventfs_entry events_ro_entries[] = {
+ {
+ .name = "header_page",
+ .callback = events_callback,
+ },
+ {
+ .name = "header_event",
+ .callback = events_callback,
+ },
+ };
- entry = trace_create_file("set_event", TRACE_MODE_WRITE, parent,
- tr, &ftrace_set_event_fops);
- if (!entry)
- return -ENOMEM;
-
- nr_entries = ARRAY_SIZE(events_entries);
-
- e_events = eventfs_create_events_dir("events", parent, events_entries,
- nr_entries, tr);
+ if (trace_array_is_readonly(tr))
+ e_events = eventfs_create_events_dir("events", parent, events_ro_entries,
+ ARRAY_SIZE(events_ro_entries), tr);
+ else
+ e_events = eventfs_create_events_dir("events", parent, events_entries,
+ ARRAY_SIZE(events_entries), tr);
if (IS_ERR(e_events)) {
pr_warn("Could not create tracefs 'events' directory\n");
return -ENOMEM;
}
- /* There are not as crucial, just warn if they are not created */
+ if (!trace_array_is_readonly(tr)) {
+
+ entry = trace_create_file("set_event", TRACE_MODE_WRITE, parent,
+ tr, &ftrace_set_event_fops);
+ if (!entry)
+ return -ENOMEM;
- trace_create_file("set_event_pid", TRACE_MODE_WRITE, parent,
- tr, &ftrace_set_event_pid_fops);
+ /* There are not as crucial, just warn if they are not created */
- trace_create_file("set_event_notrace_pid",
- TRACE_MODE_WRITE, parent, tr,
- &ftrace_set_event_notrace_pid_fops);
+ trace_create_file("set_event_pid", TRACE_MODE_WRITE, parent,
+ tr, &ftrace_set_event_pid_fops);
+ trace_create_file("set_event_notrace_pid",
+ TRACE_MODE_WRITE, parent, tr,
+ &ftrace_set_event_notrace_pid_fops);
+ }
tr->event_dir = e_events;
return 0;
* [PATCH v2 2/2] tracing: Add autoremove feature to the backup instance
2026-01-08 14:22 [PATCH v2 0/2] tracing: Remove backup instance after read all Masami Hiramatsu (Google)
2026-01-08 14:23 ` [PATCH v2 1/2] tracing: Make the backup instance readonly Masami Hiramatsu (Google)
@ 2026-01-08 14:23 ` Masami Hiramatsu (Google)
1 sibling, 0 replies; 7+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-01-08 14:23 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Since the backup instance is readonly, after reading all data
via pipe, no data is left on the instance. Thus it can be
removed safely after closing all files.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
kernel/trace/trace.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++-
kernel/trace/trace.h | 6 +++++
2 files changed, 69 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 1b87595413fe..c0b05a560203 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -590,6 +590,55 @@ void trace_set_ring_buffer_expanded(struct trace_array *tr)
tr->ring_buffer_expanded = true;
}
+static int __remove_instance(struct trace_array *tr);
+
+static void trace_array_autoremove(struct work_struct *work)
+{
+ struct trace_array *tr = container_of(work, struct trace_array, autoremove_work);
+
+ guard(mutex)(&event_mutex);
+ guard(mutex)(&trace_types_lock);
+
+ /*
+ * This can fail if someone gets @tr before this function
+ * starts, but in that case it will be kicked again when
+ * that reference is put. So we don't care about the result.
+ */
+ __remove_instance(tr);
+}
+
+static struct workqueue_struct *autoremove_wq;
+
+static void trace_array_init_autoremove(struct trace_array *tr)
+{
+ INIT_WORK(&tr->autoremove_work, trace_array_autoremove);
+}
+
+static void trace_array_kick_autoremove(struct trace_array *tr)
+{
+ if (!work_pending(&tr->autoremove_work) && autoremove_wq)
+ queue_work(autoremove_wq, &tr->autoremove_work);
+}
+
+static void trace_array_cancel_autoremove(struct trace_array *tr)
+{
+ if (work_pending(&tr->autoremove_work))
+ cancel_work(&tr->autoremove_work);
+}
+
+__init static int trace_array_init_autoremove_wq(void)
+{
+ autoremove_wq = alloc_workqueue("tr_autoremove_wq",
+ WQ_UNBOUND | WQ_HIGHPRI, 0);
+ if (!autoremove_wq) {
+ pr_err("Unable to allocate tr_autoremove_wq\n");
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+late_initcall_sync(trace_array_init_autoremove_wq);
+
LIST_HEAD(ftrace_trace_arrays);
int trace_array_get(struct trace_array *this_tr)
@@ -598,7 +647,7 @@ int trace_array_get(struct trace_array *this_tr)
guard(mutex)(&trace_types_lock);
list_for_each_entry(tr, &ftrace_trace_arrays, list) {
- if (tr == this_tr) {
+ if (tr == this_tr && !tr->free_on_close) {
tr->ref++;
return 0;
}
@@ -611,6 +660,12 @@ static void __trace_array_put(struct trace_array *this_tr)
{
WARN_ON(!this_tr->ref);
this_tr->ref--;
+ /*
+ * When free_on_close is set, prepare removing the array
+ * when the last reference is released.
+ */
+ if (this_tr->ref == 1 && this_tr->free_on_close)
+ trace_array_kick_autoremove(this_tr);
}
/**
@@ -6225,6 +6280,10 @@ static void update_last_data(struct trace_array *tr)
/* Only if the buffer has previous boot data clear and update it. */
tr->flags &= ~TRACE_ARRAY_FL_LAST_BOOT;
+ /* If this is a backup instance, mark it for autoremove. */
+ if (tr->flags & TRACE_ARRAY_FL_VMALLOC)
+ tr->free_on_close = true;
+
/* Reset the module list and reload them */
if (tr->scratch) {
struct trace_scratch *tscratch = tr->scratch;
@@ -10428,6 +10487,8 @@ trace_array_create_systems(const char *name, const char *systems,
if (ftrace_allocate_ftrace_ops(tr) < 0)
goto out_free_tr;
+ trace_array_init_autoremove(tr);
+
ftrace_init_trace_array(tr);
init_trace_flags_index(tr);
@@ -10576,6 +10637,7 @@ static int __remove_instance(struct trace_array *tr)
if (update_marker_trace(tr, 0))
synchronize_rcu();
+ trace_array_cancel_autoremove(tr);
tracing_set_nop(tr);
clear_ftrace_function_probes(tr);
event_trace_del_tracer(tr);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 4fae5cf1182c..b6b56e790b13 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -447,6 +447,12 @@ struct trace_array {
* we do not waste memory on systems that are not using tracing.
*/
bool ring_buffer_expanded;
+ /*
+ * If the ring buffer is a read only backup instance, it will be
+ * removed after dumping all data via pipe, because no data remains.
+ */
+ bool free_on_close;
+ struct work_struct autoremove_work;
};
enum {
* Re: [PATCH v2 1/2] tracing: Make the backup instance readonly
2026-01-08 14:23 ` [PATCH v2 1/2] tracing: Make the backup instance readonly Masami Hiramatsu (Google)
@ 2026-01-13 0:31 ` Masami Hiramatsu
2026-01-13 0:45 ` Steven Rostedt
0 siblings, 1 reply; 7+ messages in thread
From: Masami Hiramatsu @ 2026-01-13 0:31 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: Steven Rostedt, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel
On Thu, 8 Jan 2026 23:23:03 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 9b07ad9eb284..741b16b54d90 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -1379,6 +1379,9 @@ static int __ftrace_set_clr_event(struct trace_array *tr, const char *match,
> {
> int ret;
>
> + if (trace_array_is_readonly(tr))
> + return -EPERM;
> +
> mutex_lock(&event_mutex);
> ret = __ftrace_set_clr_event_nolock(tr, match, sub, event, set, mod);
> mutex_unlock(&event_mutex);
> @@ -2817,8 +2820,8 @@ event_subsystem_dir(struct trace_array *tr, const char *name,
> } else
> __get_system(system);
>
> - /* ftrace only has directories no files */
> - if (strcmp(name, "ftrace") == 0)
> + /* ftrace only has directories no files, readonly instance too. */
> + if (strcmp(name, "ftrace") == 0 || trace_array_is_readonly(tr))
> nr_entries = 0;
> else
> nr_entries = ARRAY_SIZE(system_entries);
> @@ -2979,7 +2982,6 @@ event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file)
> struct eventfs_inode *e_events;
> struct eventfs_inode *ei;
> const char *name;
> - int nr_entries;
> int ret;
> static struct eventfs_entry event_entries[] = {
> {
> @@ -3024,6 +3026,18 @@ event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file)
> },
> #endif
> };
> + static struct eventfs_entry event_ro_entries[] = {
> + {
> + .name = "format",
> + .callback = event_callback,
> + },
> +#ifdef CONFIG_PERF_EVENTS
> + {
> + .name = "id",
> + .callback = event_callback,
> + },
> +#endif
> + };
Thinking about this hack again, it would be easier to maintain if I add
a readonly flag to each eventfs_entry and specify readonly when creating
the eventfs top directory.
Let me update it.
Thank you,
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
* Re: [PATCH v2 1/2] tracing: Make the backup instance readonly
2026-01-13 0:31 ` Masami Hiramatsu
@ 2026-01-13 0:45 ` Steven Rostedt
2026-01-13 0:47 ` Steven Rostedt
0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2026-01-13 0:45 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel
On Tue, 13 Jan 2026 09:31:00 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> > + static struct eventfs_entry event_ro_entries[] = {
> > + {
> > + .name = "format",
> > + .callback = event_callback,
> > + },
> > +#ifdef CONFIG_PERF_EVENTS
> > + {
> > + .name = "id",
> > + .callback = event_callback,
> > + },
> > +#endif
> > + };
>
> Thinking about this hack again, it would be easier to maintain if I add
> a readonly flag to each eventfs_entry and specify readonly when creating
> the eventfs top directory.
>
> Let me update it.
Let's not add any flags to the eventfs_entry. That is created for every
event, and events are already too big in size. I don't want to increase the
size of each event for this feature.
-- Steve
* Re: [PATCH v2 1/2] tracing: Make the backup instance readonly
2026-01-13 0:45 ` Steven Rostedt
@ 2026-01-13 0:47 ` Steven Rostedt
2026-01-13 6:04 ` Masami Hiramatsu
0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2026-01-13 0:47 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel
On Mon, 12 Jan 2026 19:45:51 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> Let's not add any flags to the eventfs_entry. That is created for every
> event, and events are already too big in size. I don't want to increase the
> size of each event for this feature.
Actually, there's not many of these. But they are all static variables. How
are you going to differentiate them?
-- Steve
* Re: [PATCH v2 1/2] tracing: Make the backup instance readonly
2026-01-13 0:47 ` Steven Rostedt
@ 2026-01-13 6:04 ` Masami Hiramatsu
0 siblings, 0 replies; 7+ messages in thread
From: Masami Hiramatsu @ 2026-01-13 6:04 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel
On Mon, 12 Jan 2026 19:47:22 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Mon, 12 Jan 2026 19:45:51 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > Let's not add any flags to the eventfs_entry. That is created for every
> > event, and events are already too big in size. I don't want to increase the
> > size of each event for this feature.
>
> Actually, there's not many of these. But they are all static variables. How
> are you going to differentiate them?
Since the format/id files are always readonly, we can put a mode flag on
each entry.
Anyway, I will just share the readonly entries in the same array, but
put them at the beginning of it.
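Roughly, the shared-array idea could look like the sketch below. This
is only a user-space illustration under my own assumptions: the
`demo_entries` array, `NR_RO_ENTRIES` and both helpers are hypothetical
names, and the real eventfs_entry arrays carry callbacks as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical reduced entry: the real one also has a callback. */
struct entry {
	const char *name;
};

/* Read-only files come first so that a prefix of the array is enough. */
#define NR_RO_ENTRIES 2

static const struct entry demo_entries[] = {
	{ "format" },
	{ "id" },
	/* writable control files follow */
	{ "enable" },
	{ "filter" },
};

static_assert(NR_RO_ENTRIES <= ARRAY_SIZE(demo_entries),
	      "read-only prefix must fit in the array");

/* A readonly instance registers only the prefix, others the whole array. */
static size_t nr_entries_for(bool readonly)
{
	return readonly ? NR_RO_ENTRIES : ARRAY_SIZE(demo_entries);
}

/* True if @name would be created for an instance of the given kind. */
static bool entry_visible(const char *name, bool readonly)
{
	size_t n = nr_entries_for(readonly);

	for (size_t i = 0; i < n; i++)
		if (strcmp(demo_entries[i].name, name) == 0)
			return true;
	return false;
}
```

This keeps a single static array and needs no extra field per entry;
the cost is that the ordering of the array becomes meaningful.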
Let me update it in v3.
Thanks,
>
> -- Steve
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>