From: Steven Rostedt <rostedt@goodmis.org>
To: linux-kernel@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: [for-next][PATCH 31/31] tracing: Allow the top level trace_marker to write into another instances
Date: Fri, 09 May 2025 09:13:20 -0400
Message-ID: <20250509131318.712657970@goodmis.org>
In-Reply-To: 20250509131249.340302366@goodmis.org
Some applications are hard coded to write into the top level trace_marker
file (/sys/kernel/tracing/trace_marker). This is a problem if a profiler
is using the top level instance for other work, or if it needs all of
those writes to go into a new instance.

Add a new option called "copy_trace_marker". By default, only the top
level instance has it set, as the top level buffer is where writes into
the top level trace_marker file go by default. But now if an instance is
created and sets this option, all writes into the top level trace_marker
will also be written into that instance's buffer, just as if the
application had written into the instance's trace_marker file.
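
For illustration only (this helper is not part of the patch), a tool could
set the option on an already created instance roughly as follows. The
function name and the tracefs_root parameter are assumptions of this
sketch; the file name comes from the "copy_trace_marker" option string
added below, placed in the instance's usual options/ directory.

```c
/* Sketch: set or clear copy_trace_marker for one tracing instance. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * tracefs_root is normally "/sys/kernel/tracing"; it is a parameter here
 * (an assumption of this sketch) so the helper can be pointed elsewhere.
 * The instance directory must already exist (mkdir under instances/).
 * Returns 0 on success or a negative errno value.
 */
static int set_copy_trace_marker(const char *tracefs_root,
				 const char *instance, int enable)
{
	char path[4096];
	ssize_t ret;
	int fd, n;

	n = snprintf(path, sizeof(path),
		     "%s/instances/%s/options/copy_trace_marker",
		     tracefs_root, instance);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* Option files take "1" to set the flag and "0" to clear it. */
	ret = write(fd, enable ? "1" : "0", 1);
	if (ret < 0)
		ret = -errno;
	else if (ret != 1)
		ret = -EIO;
	else
		ret = 0;

	close(fd);
	return (int)ret;
}
```

A caller would typically pass "/sys/kernel/tracing" and the instance name,
for example set_copy_trace_marker("/sys/kernel/tracing", "foo", 1), and
needs sufficient privileges to write to tracefs.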

If the top level instance disables this option, then writes to its own
trace_marker and trace_marker_raw files will not go into its buffer.
If no instance has this option set, then the write fails with errno set
to ENODEV.
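
As a rough user-space sketch of what this means for a writer (not part of
the patch), the code below writes one message to a trace_marker file and
reports a negative errno on failure, so the caller can tell ENODEV apart
from other errors. Both function names are hypothetical, and treating
ENODEV as non-fatal is only one possible policy.

```c
/* Sketch: write to a trace_marker file and classify the result. */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write one message to the given trace_marker file.
 * Returns the number of bytes accepted, or a negative errno value.
 * With this patch, -ENODEV from the top level file means that no
 * instance (the top level included) has copy_trace_marker set.
 */
static ssize_t write_trace_marker(const char *marker_path, const char *msg)
{
	ssize_t ret;
	int fd;

	fd = open(marker_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	ret = write(fd, msg, strlen(msg));
	if (ret < 0)
		ret = -errno;

	close(fd);
	return ret;
}

/* Hypothetical policy helper: "nobody is listening" is not fatal. */
static int marker_error_is_fatal(ssize_t ret)
{
	return ret < 0 && ret != -ENODEV;
}
```

An application that only emits best-effort annotations could call
write_trace_marker("/sys/kernel/tracing/trace_marker", "hello\n") and
continue when marker_error_is_fatal() returns 0.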
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250508095639.39f84eda@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Documentation/trace/ftrace.rst | 13 +++
kernel/trace/trace.c | 144 ++++++++++++++++++++++++++-------
kernel/trace/trace.h | 2 +
3 files changed, 128 insertions(+), 31 deletions(-)
diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
index c9e88bf65709..af66a05e18cc 100644
--- a/Documentation/trace/ftrace.rst
+++ b/Documentation/trace/ftrace.rst
@@ -1205,6 +1205,19 @@ Here are the available options:
default instance. The only way the top level instance has this flag
cleared, is by it being set in another instance.
+ copy_trace_marker
+ If there are applications that hard code writing into the top level
+ trace_marker file (/sys/kernel/tracing/trace_marker or trace_marker_raw),
+ and the tooling would like it to go into an instance, this option can
+ be used. Create an instance and set this option, and then all writes
+ into the top level trace_marker file will also be copied into this
+ instance.
+
+ Note, by default this option is set for the top level instance. If it
+ is disabled, then writes to the trace_marker or trace_marker_raw files
+ will not be written into the top level buffer. If no instance has this
+ option set, then a write will fail with errno set to ENODEV.
+
annotate
It is sometimes confusing when the CPU buffers are full
and one CPU buffer had a lot of events recently, thus
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0cd681516438..cf51c30b137f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -493,7 +493,8 @@ EXPORT_SYMBOL_GPL(unregister_ftrace_export);
TRACE_ITER_ANNOTATE | TRACE_ITER_CONTEXT_INFO | \
TRACE_ITER_RECORD_CMD | TRACE_ITER_OVERWRITE | \
TRACE_ITER_IRQ_INFO | TRACE_ITER_MARKERS | \
- TRACE_ITER_HASH_PTR | TRACE_ITER_TRACE_PRINTK)
+ TRACE_ITER_HASH_PTR | TRACE_ITER_TRACE_PRINTK | \
+ TRACE_ITER_COPY_MARKER)
/* trace_options that are only supported by global_trace */
#define TOP_LEVEL_TRACE_FLAGS (TRACE_ITER_PRINTK | \
@@ -501,7 +502,8 @@ EXPORT_SYMBOL_GPL(unregister_ftrace_export);
/* trace_flags that are default zero for instances */
#define ZEROED_TRACE_FLAGS \
- (TRACE_ITER_EVENT_FORK | TRACE_ITER_FUNC_FORK | TRACE_ITER_TRACE_PRINTK)
+ (TRACE_ITER_EVENT_FORK | TRACE_ITER_FUNC_FORK | TRACE_ITER_TRACE_PRINTK | \
+ TRACE_ITER_COPY_MARKER)
/*
* The global_trace is the descriptor that holds the top-level tracing
@@ -513,6 +515,9 @@ static struct trace_array global_trace = {
static struct trace_array *printk_trace = &global_trace;
+/* List of trace_arrays interested in the top level trace_marker */
+static LIST_HEAD(marker_copies);
+
static __always_inline bool printk_binsafe(struct trace_array *tr)
{
/*
@@ -534,6 +539,28 @@ static void update_printk_trace(struct trace_array *tr)
tr->trace_flags |= TRACE_ITER_TRACE_PRINTK;
}
+/* Returns true if the status of tr changed */
+static bool update_marker_trace(struct trace_array *tr, int enabled)
+{
+ lockdep_assert_held(&event_mutex);
+
+ if (enabled) {
+ if (!list_empty(&tr->marker_list))
+ return false;
+
+ list_add_rcu(&tr->marker_list, &marker_copies);
+ tr->trace_flags |= TRACE_ITER_COPY_MARKER;
+ return true;
+ }
+
+ if (list_empty(&tr->marker_list))
+ return false;
+
+ list_del_init(&tr->marker_list);
+ tr->trace_flags &= ~TRACE_ITER_COPY_MARKER;
+ return true;
+}
+
void trace_set_ring_buffer_expanded(struct trace_array *tr)
{
if (!tr)
@@ -5220,7 +5247,8 @@ int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
{
if ((mask == TRACE_ITER_RECORD_TGID) ||
(mask == TRACE_ITER_RECORD_CMD) ||
- (mask == TRACE_ITER_TRACE_PRINTK))
+ (mask == TRACE_ITER_TRACE_PRINTK) ||
+ (mask == TRACE_ITER_COPY_MARKER))
lockdep_assert_held(&event_mutex);
/* do nothing if flag is already set */
@@ -5251,6 +5279,9 @@ int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
}
}
+ if (mask == TRACE_ITER_COPY_MARKER)
+ update_marker_trace(tr, enabled);
+
if (enabled)
tr->trace_flags |= mask;
else
@@ -7134,11 +7165,9 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
#define TRACE_MARKER_MAX_SIZE 4096
-static ssize_t
-tracing_mark_write(struct file *filp, const char __user *ubuf,
- size_t cnt, loff_t *fpos)
+static ssize_t write_marker_to_buffer(struct trace_array *tr, const char __user *ubuf,
+ size_t cnt, unsigned long ip)
{
- struct trace_array *tr = filp->private_data;
struct ring_buffer_event *event;
enum event_trigger_type tt = ETT_NONE;
struct trace_buffer *buffer;
@@ -7152,18 +7181,6 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
#define FAULTED_STR "<faulted>"
#define FAULTED_SIZE (sizeof(FAULTED_STR) - 1) /* '\0' is already accounted for */
- if (tracing_disabled)
- return -EINVAL;
-
- if (!(tr->trace_flags & TRACE_ITER_MARKERS))
- return -EINVAL;
-
- if ((ssize_t)cnt < 0)
- return -EINVAL;
-
- if (cnt > TRACE_MARKER_MAX_SIZE)
- cnt = TRACE_MARKER_MAX_SIZE;
-
meta_size = sizeof(*entry) + 2; /* add '\0' and possible '\n' */
again:
size = cnt + meta_size;
@@ -7196,7 +7213,7 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
}
entry = ring_buffer_event_data(event);
- entry->ip = _THIS_IP_;
+ entry->ip = ip;
len = __copy_from_user_inatomic(&entry->buf, ubuf, cnt);
if (len) {
@@ -7229,18 +7246,12 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
}
static ssize_t
-tracing_mark_raw_write(struct file *filp, const char __user *ubuf,
+tracing_mark_write(struct file *filp, const char __user *ubuf,
size_t cnt, loff_t *fpos)
{
struct trace_array *tr = filp->private_data;
- struct ring_buffer_event *event;
- struct trace_buffer *buffer;
- struct raw_data_entry *entry;
- ssize_t written;
- int size;
- int len;
-
-#define FAULT_SIZE_ID (FAULTED_SIZE + sizeof(int))
+ ssize_t written = -ENODEV;
+ unsigned long ip;
if (tracing_disabled)
return -EINVAL;
@@ -7248,10 +7259,42 @@ tracing_mark_raw_write(struct file *filp, const char __user *ubuf,
if (!(tr->trace_flags & TRACE_ITER_MARKERS))
return -EINVAL;
- /* The marker must at least have a tag id */
- if (cnt < sizeof(unsigned int))
+ if ((ssize_t)cnt < 0)
return -EINVAL;
+ if (cnt > TRACE_MARKER_MAX_SIZE)
+ cnt = TRACE_MARKER_MAX_SIZE;
+
+ /* The selftests expect this function to be the IP address */
+ ip = _THIS_IP_;
+
+ /* The global trace_marker can go to multiple instances */
+ if (tr == &global_trace) {
+ guard(rcu)();
+ list_for_each_entry_rcu(tr, &marker_copies, marker_list) {
+ written = write_marker_to_buffer(tr, ubuf, cnt, ip);
+ if (written < 0)
+ break;
+ }
+ } else {
+ written = write_marker_to_buffer(tr, ubuf, cnt, ip);
+ }
+
+ return written;
+}
+
+static ssize_t write_raw_marker_to_buffer(struct trace_array *tr,
+ const char __user *ubuf, size_t cnt)
+{
+ struct ring_buffer_event *event;
+ struct trace_buffer *buffer;
+ struct raw_data_entry *entry;
+ ssize_t written;
+ int size;
+ int len;
+
+#define FAULT_SIZE_ID (FAULTED_SIZE + sizeof(int))
+
size = sizeof(*entry) + cnt;
if (cnt < FAULT_SIZE_ID)
size += FAULT_SIZE_ID - cnt;
@@ -7282,6 +7325,40 @@ tracing_mark_raw_write(struct file *filp, const char __user *ubuf,
return written;
}
+static ssize_t
+tracing_mark_raw_write(struct file *filp, const char __user *ubuf,
+ size_t cnt, loff_t *fpos)
+{
+ struct trace_array *tr = filp->private_data;
+ ssize_t written = -ENODEV;
+
+#define FAULT_SIZE_ID (FAULTED_SIZE + sizeof(int))
+
+ if (tracing_disabled)
+ return -EINVAL;
+
+ if (!(tr->trace_flags & TRACE_ITER_MARKERS))
+ return -EINVAL;
+
+ /* The marker must at least have a tag id */
+ if (cnt < sizeof(unsigned int))
+ return -EINVAL;
+
+ /* The global trace_marker_raw can go to multiple instances */
+ if (tr == &global_trace) {
+ guard(rcu)();
+ list_for_each_entry_rcu(tr, &marker_copies, marker_list) {
+ written = write_raw_marker_to_buffer(tr, ubuf, cnt);
+ if (written < 0)
+ break;
+ }
+ } else {
+ written = write_raw_marker_to_buffer(tr, ubuf, cnt);
+ }
+
+ return written;
+}
+
static int tracing_clock_show(struct seq_file *m, void *v)
{
struct trace_array *tr = m->private;
@@ -9775,6 +9852,7 @@ trace_array_create_systems(const char *name, const char *systems,
INIT_LIST_HEAD(&tr->events);
INIT_LIST_HEAD(&tr->hist_vars);
INIT_LIST_HEAD(&tr->err_log);
+ INIT_LIST_HEAD(&tr->marker_list);
#ifdef CONFIG_MODULES
INIT_LIST_HEAD(&tr->mod_events);
@@ -9934,6 +10012,9 @@ static int __remove_instance(struct trace_array *tr)
if (printk_trace == tr)
update_printk_trace(&global_trace);
+ if (update_marker_trace(tr, 0))
+ synchronize_rcu();
+
tracing_set_nop(tr);
clear_ftrace_function_probes(tr);
event_trace_del_tracer(tr);
@@ -10999,6 +11080,7 @@ __init static int tracer_alloc_buffers(void)
INIT_LIST_HEAD(&global_trace.events);
INIT_LIST_HEAD(&global_trace.hist_vars);
INIT_LIST_HEAD(&global_trace.err_log);
+ list_add(&global_trace.marker_list, &marker_copies);
list_add(&global_trace.list, &ftrace_trace_arrays);
apply_trace_boot_options();
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 86e9d7dcddba..bd084953a98b 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -403,6 +403,7 @@ struct trace_array {
struct trace_options *topts;
struct list_head systems;
struct list_head events;
+ struct list_head marker_list;
struct trace_event_file *trace_marker_file;
cpumask_var_t tracing_cpumask; /* only trace on set CPUs */
/* one per_cpu trace_pipe can be opened by only one user */
@@ -1384,6 +1385,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf,
C(MARKERS, "markers"), \
C(EVENT_FORK, "event-fork"), \
C(TRACE_PRINTK, "trace_printk_dest"), \
+ C(COPY_MARKER, "copy_trace_marker"),\
C(PAUSE_ON_TRACE, "pause-on-trace"), \
C(HASH_PTR, "hash-ptr"), /* Print hashed pointer */ \
FUNCTION_FLAGS \
--
2.47.2