* [for-next][PATCH 00/11] tracing: Updates for v6.9
@ 2024-02-21 14:07 Steven Rostedt
From: Steven Rostedt @ 2024-02-21 14:07 UTC
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
Steven Rostedt (Google) (5):
eventfs: Add WARN_ON_ONCE() to checks in eventfs_root_lookup()
eventfs: Create eventfs_root_inode to store dentry
tracing: Have saved_cmdlines arrays all in one allocation
tracing: Move open coded processing of tgid_map into helper function
tracing: Move saved_cmdline code into trace_sched_switch.c
Vincent Donnefort (6):
ring-buffer: Zero ring-buffer sub-buffers
ring-buffer: Introducing ring-buffer mapping functions
tracing: Add snapshot refcount
tracing: Allow user-space mapping of the ring-buffer
Documentation: tracing: Add ring-buffer mapping
ring-buffer/selftest: Add ring-buffer mapping test
----
Documentation/trace/index.rst | 1 +
Documentation/trace/ring-buffer-map.rst | 106 ++++
fs/tracefs/event_inode.c | 70 ++-
fs/tracefs/internal.h | 2 -
include/linux/ring_buffer.h | 7 +
include/uapi/linux/trace_mmap.h | 48 ++
kernel/trace/ring_buffer.c | 385 ++++++++++++-
kernel/trace/trace.c | 743 +++++++------------------
kernel/trace/trace.h | 19 +-
kernel/trace/trace_events_trigger.c | 58 +-
kernel/trace/trace_sched_switch.c | 515 +++++++++++++++++
tools/testing/selftests/ring-buffer/Makefile | 8 +
tools/testing/selftests/ring-buffer/config | 2 +
tools/testing/selftests/ring-buffer/map_test.c | 273 +++++++++
14 files changed, 1671 insertions(+), 566 deletions(-)
create mode 100644 Documentation/trace/ring-buffer-map.rst
create mode 100644 include/uapi/linux/trace_mmap.h
create mode 100644 tools/testing/selftests/ring-buffer/Makefile
create mode 100644 tools/testing/selftests/ring-buffer/config
create mode 100644 tools/testing/selftests/ring-buffer/map_test.c
* [for-next][PATCH 01/11] eventfs: Add WARN_ON_ONCE() to checks in eventfs_root_lookup()
From: Steven Rostedt @ 2024-02-21 14:07 UTC
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Linus Torvalds, Al Viro, Christian Brauner, Ajay Kaher, Al Viro
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
There are a couple of if statements in eventfs_root_lookup() that should
never be true. Instead of removing them, wrap their conditions in
WARN_ON_ONCE().
One checks for a tracefs_inode that does not belong to eventfs.
The other checks for a child that has been freed but is still on the
parent's children list. That cannot happen: when a child is freed, it is
removed from the list under the same mutex (eventfs_mutex) that is held
during the iteration.
Link: https://lore.kernel.org/linux-trace-kernel/20240201002719.GS2087318@ZenIV/
Link: https://lore.kernel.org/linux-trace-kernel/20240201123346.724afa46@gandalf.local.home
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
fs/tracefs/event_inode.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
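For readers unfamiliar with the idiom: WARN_ON_ONCE() evaluates to the truth
value of its condition, so it can wrap an existing if test without changing
control flow, and it only reports the first time the condition fires at a
given call site. The user-space sketch below imitates that behavior. It is
not the kernel's macro; WARN_ON_ONCE_SKETCH(), lookup_sketch(),
EVENT_INODE_FLAG_SKETCH and warn_count are hypothetical names, and it relies
on a GNU C statement expression (accepted by GCC and Clang).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Counts warnings actually printed, so the "once" behavior can be checked. */
static int warn_count;

/*
 * Hypothetical user-space imitation of WARN_ON_ONCE(): evaluates to
 * !!(cond) and prints a warning only the first time cond is true at
 * this call site (each macro expansion gets its own static flag).
 */
#define WARN_ON_ONCE_SKETCH(cond) ({					\
	static int wo_warned;						\
	int wo_ret = !!(cond);						\
	if (wo_ret && !wo_warned) {					\
		wo_warned = 1;						\
		warn_count++;						\
		fprintf(stderr, "WARNING: %s at %s:%d\n",		\
			#cond, __FILE__, __LINE__);			\
	}								\
	wo_ret;								\
})

#define EVENT_INODE_FLAG_SKETCH 0x1u	/* hypothetical flag bit */

/* Same shape as the first check in eventfs_root_lookup(). */
static int lookup_sketch(unsigned int flags)
{
	/* "Cannot happen": complain once, but still fail safely. */
	if (WARN_ON_ONCE_SKETCH(!(flags & EVENT_INODE_FLAG_SKETCH)))
		return -EIO;
	return 0;
}
```

Repeated bad calls keep returning -EIO, but only the first one prints, which
is why the patch can keep the error paths while flagging them as bugs.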
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 110e8a272189..9d9c7dc3114b 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -483,7 +483,7 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
struct dentry *result = NULL;
ti = get_tracefs(dir);
- if (!(ti->flags & TRACEFS_EVENT_INODE))
+ if (WARN_ON_ONCE(!(ti->flags & TRACEFS_EVENT_INODE)))
return ERR_PTR(-EIO);
mutex_lock(&eventfs_mutex);
@@ -495,7 +495,8 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
list_for_each_entry(ei_child, &ei->children, list) {
if (strcmp(ei_child->name, name) != 0)
continue;
- if (ei_child->is_freed)
+ /* A child is freed and removed from the list at the same time */
+ if (WARN_ON_ONCE(ei_child->is_freed))
goto out;
result = lookup_dir_entry(dentry, ei, ei_child);
goto out;
--
2.43.0
* [for-next][PATCH 02/11] eventfs: Create eventfs_root_inode to store dentry
From: Steven Rostedt @ 2024-02-21 14:07 UTC
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Linus Torvalds, Christian Brauner, Al Viro, Ajay Kaher
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
Only the root "events" directory stores a dentry. There's no reason to
hold a dentry pointer in every eventfs_inode, as it is never set except
for the root "events" eventfs_inode.
Create an eventfs_root_inode structure that holds the events_dir dentry.
The "events" eventfs_inode *is* special, so let it have its own descriptor.
Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.658992558@goodmis.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
fs/tracefs/event_inode.c | 65 +++++++++++++++++++++++++++++++++-------
fs/tracefs/internal.h | 2 --
2 files changed, 55 insertions(+), 12 deletions(-)
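The new descriptor relies on the usual embed-and-container_of() pattern:
struct eventfs_root_inode embeds a struct eventfs_inode, the rest of the code
keeps passing the inner pointer around, and get_root_inode() recovers the
outer structure only when is_events is set. A minimal user-space sketch
follows. The struct names, alloc_root_sketch(), get_root_sketch() and
free_ei_sketch() are hypothetical, container_of() is a simplified
re-implementation without the kernel's type checking, and assert() stands in
for WARN_ON_ONCE().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified user-space container_of() (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct ei_sketch {
	unsigned int is_events:1;
	const char *name;
};

struct root_ei_sketch {
	struct ei_sketch ei;	/* embedded "base" object */
	void *events_dir;	/* only the root pays for this pointer */
};

static struct root_ei_sketch *get_root_sketch(struct ei_sketch *ei)
{
	assert(ei->is_events);	/* mirrors WARN_ON_ONCE(!ei->is_events) */
	return container_of(ei, struct root_ei_sketch, ei);
}

static struct ei_sketch *alloc_root_sketch(const char *name)
{
	struct root_ei_sketch *rei = calloc(1, sizeof(*rei));

	if (!rei)
		return NULL;
	rei->ei.is_events = 1;	/* set before anyone calls get_root_sketch() */
	rei->ei.name = name;
	return &rei->ei;	/* callers only ever see the base pointer */
}

static void free_ei_sketch(struct ei_sketch *ei)
{
	/* The root node must be freed through its outer allocation. */
	if (ei->is_events)
		free(get_root_sketch(ei));
	else
		free(ei);
}
```

This is also why release_ei() frees through rei for the root node: the outer
structure is what was allocated, and freeing through it does not depend on ei
happening to be the first member.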
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 9d9c7dc3114b..dc067eeb6387 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -35,6 +35,17 @@ static DEFINE_MUTEX(eventfs_mutex);
/* Choose something "unique" ;-) */
#define EVENTFS_FILE_INODE_INO 0x12c4e37
+struct eventfs_root_inode {
+ struct eventfs_inode ei;
+ struct dentry *events_dir;
+};
+
+static struct eventfs_root_inode *get_root_inode(struct eventfs_inode *ei)
+{
+ WARN_ON_ONCE(!ei->is_events);
+ return container_of(ei, struct eventfs_root_inode, ei);
+}
+
/* Just try to make something consistent and unique */
static int eventfs_dir_ino(struct eventfs_inode *ei)
{
@@ -73,12 +84,18 @@ enum {
static void release_ei(struct kref *ref)
{
struct eventfs_inode *ei = container_of(ref, struct eventfs_inode, kref);
+ struct eventfs_root_inode *rei;
WARN_ON_ONCE(!ei->is_freed);
kfree(ei->entry_attrs);
kfree_const(ei->name);
- kfree_rcu(ei, rcu);
+ if (ei->is_events) {
+ rei = get_root_inode(ei);
+ kfree_rcu(rei, ei.rcu);
+ } else {
+ kfree_rcu(ei, rcu);
+ }
}
static inline void put_ei(struct eventfs_inode *ei)
@@ -408,19 +425,43 @@ static struct dentry *lookup_dir_entry(struct dentry *dentry,
return NULL;
}
+static inline struct eventfs_inode *init_ei(struct eventfs_inode *ei, const char *name)
+{
+ ei->name = kstrdup_const(name, GFP_KERNEL);
+ if (!ei->name)
+ return NULL;
+ kref_init(&ei->kref);
+ return ei;
+}
+
static inline struct eventfs_inode *alloc_ei(const char *name)
{
struct eventfs_inode *ei = kzalloc(sizeof(*ei), GFP_KERNEL);
+ struct eventfs_inode *result;
if (!ei)
return NULL;
- ei->name = kstrdup_const(name, GFP_KERNEL);
- if (!ei->name) {
+ result = init_ei(ei, name);
+ if (!result)
kfree(ei);
+
+ return result;
+}
+
+static inline struct eventfs_inode *alloc_root_ei(const char *name)
+{
+ struct eventfs_root_inode *rei = kzalloc(sizeof(*rei), GFP_KERNEL);
+ struct eventfs_inode *ei;
+
+ if (!rei)
return NULL;
- }
- kref_init(&ei->kref);
+
+ rei->ei.is_events = 1;
+ ei = init_ei(&rei->ei, name);
+ if (!ei)
+ kfree(rei);
+
return ei;
}
@@ -710,6 +751,7 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
int size, void *data)
{
struct dentry *dentry = tracefs_start_creating(name, parent);
+ struct eventfs_root_inode *rei;
struct eventfs_inode *ei;
struct tracefs_inode *ti;
struct inode *inode;
@@ -722,7 +764,7 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
if (IS_ERR(dentry))
return ERR_CAST(dentry);
- ei = alloc_ei(name);
+ ei = alloc_root_ei(name);
if (!ei)
goto fail;
@@ -731,10 +773,11 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
goto fail;
// Note: we have a ref to the dentry from tracefs_start_creating()
- ei->events_dir = dentry;
+ rei = get_root_inode(ei);
+ rei->events_dir = dentry;
+
ei->entries = entries;
ei->nr_entries = size;
- ei->is_events = 1;
ei->data = data;
/* Save the ownership of this directory */
@@ -845,13 +888,15 @@ void eventfs_remove_dir(struct eventfs_inode *ei)
*/
void eventfs_remove_events_dir(struct eventfs_inode *ei)
{
+ struct eventfs_root_inode *rei;
struct dentry *dentry;
- dentry = ei->events_dir;
+ rei = get_root_inode(ei);
+ dentry = rei->events_dir;
if (!dentry)
return;
- ei->events_dir = NULL;
+ rei->events_dir = NULL;
eventfs_remove_dir(ei);
/*
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index beb3dcd0e434..15c26f9aaad4 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -36,7 +36,6 @@ struct eventfs_attr {
* @children: link list into the child eventfs_inode
* @entries: the array of entries representing the files in the directory
* @name: the name of the directory to create
- * @events_dir: the dentry of the events directory
* @entry_attrs: Saved mode and ownership of the @d_children
* @data: The private data to pass to the callbacks
* @attr: Saved mode and ownership of eventfs_inode itself
@@ -54,7 +53,6 @@ struct eventfs_inode {
struct list_head children;
const struct eventfs_entry *entries;
const char *name;
- struct dentry *events_dir;
struct eventfs_attr *entry_attrs;
void *data;
struct eventfs_attr attr;
--
2.43.0
* [for-next][PATCH 03/11] tracing: Have saved_cmdlines arrays all in one allocation
From: Steven Rostedt @ 2024-02-21 14:07 UTC
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Tim Chen, Vincent Donnefort, Sven Schnelle, Mete Durlu
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
The saved_cmdlines code uses three arrays for mapping PIDs to COMMs:
- map_pid_to_cmdline[]
- map_cmdline_to_pid[]
- saved_cmdlines[]
The map_pid_to_cmdline[] array has PID_MAX_DEFAULT + 1 entries and holds the
index into the other arrays. The map_cmdline_to_pid[] array maps each slot
back to the full pid, since a pid can be larger than PID_MAX_DEFAULT. And
saved_cmdlines[] just holds the COMMs associated with the pids.
Currently map_pid_to_cmdline[] and saved_cmdlines[] are allocated
together (in reality, saved_cmdlines[] just occupies the memory left over
when the structure's allocation is rounded up, as it is always allocated
in a power-of-two number of pages). The map_cmdline_to_pid[] array is
allocated separately.
Since the rounding up to a power of two is rather large (it leaves room for
about 8000 elements in saved_cmdlines[]), place the map_cmdline_to_pid[]
array in that space as well. (This drops the default to roughly 6000
entries, which is still plenty for most use cases.) This saves even more
memory, as the map_cmdline_to_pid[] array no longer needs a separate
allocation.
Link: https://lore.kernel.org/linux-trace-kernel/20240212174011.068211d9@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20240220140703.182330529@goodmis.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Mete Durlu <meted@linux.ibm.com>
Fixes: 44dc5c41b5b1 ("tracing: Fix wasted memory in saved_cmdlines logic")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
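To make the sizing concrete, here is a user-space sketch of the
single-allocation layout. It assumes 4 KiB pages, a TASK_COMM_LEN of 16, a
PID_MAX_DEFAULT of 0x8000 and a 4-byte unsigned; malloc() stands in for
alloc_pages(), and all *_sketch names are hypothetical. As in the patch, the
element size is one COMM string plus one map_cmdline_to_pid[] slot, and the
pid array is carved out right after the COMM strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed values, not read from a running kernel. */
#define PAGE_SIZE_SKETCH	4096UL
#define TASK_COMM_LEN_SKETCH	16
#define PID_MAX_DEFAULT_SKETCH	0x8000
#define NO_CMDLINE_MAP_SKETCH	0xffffffffu

/* Mirrors the layout of struct saved_cmdlines_buffer. */
struct savedcmd_sketch {
	unsigned map_pid_to_cmdline[PID_MAX_DEFAULT_SKETCH + 1];
	unsigned *map_cmdline_to_pid;
	unsigned cmdline_num;
	int cmdline_idx;
	char saved_cmdlines[];
};

/* One entry = one COMM string plus one map_cmdline_to_pid[] slot. */
#define ELEM_SIZE_SKETCH (TASK_COMM_LEN_SKETCH + sizeof(unsigned))

/* Smallest order with (PAGE_SIZE << order) >= size, like get_order(). */
static int order_for(size_t size)
{
	int order = 0;

	while ((PAGE_SIZE_SKETCH << order) < size)
		order++;
	return order;
}

/* Returns the buffer; *alloc_size receives the rounded-up byte count. */
static struct savedcmd_sketch *alloc_savedcmd_sketch(unsigned val,
						     size_t *alloc_size)
{
	struct savedcmd_sketch *s;
	size_t size = PAGE_SIZE_SKETCH <<
		order_for(sizeof(*s) + val * ELEM_SIZE_SKETCH);

	s = malloc(size);	/* stands in for alloc_pages() */
	if (!s)
		return NULL;
	memset(s, 0, sizeof(*s));

	/* Use all of the rounded-up space, not just what was requested. */
	val = (size - sizeof(*s)) / ELEM_SIZE_SKETCH;
	s->cmdline_num = val;

	/* The pid back-map lives right after the COMM strings. */
	s->map_cmdline_to_pid =
		(unsigned *)&s->saved_cmdlines[val * TASK_COMM_LEN_SKETCH];
	assert((char *)&s->map_cmdline_to_pid[val] <= (char *)s + size);

	memset(s->map_pid_to_cmdline, 0xff, sizeof(s->map_pid_to_cmdline));
	memset(s->map_cmdline_to_pid, 0xff, val * sizeof(unsigned));
	*alloc_size = size;
	return s;
}
```

Under these assumptions and a 64-bit layout, the header is 131096 bytes and
the default request of 128 entries rounds up to 64 pages (262144 bytes).
Dividing the remaining 131048 bytes by 16 (COMM only) gives 8190 entries,
while dividing by 20 (COMM plus pid) gives 6552, in line with the "8000" and
"6000" figures in the commit message.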
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 8198bfc54b58..52faa30e64ed 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2325,6 +2325,10 @@ struct saved_cmdlines_buffer {
};
static struct saved_cmdlines_buffer *savedcmd;
+/* Holds the size of a cmdline and pid element */
+#define SAVED_CMDLINE_MAP_ELEMENT_SIZE(s) \
+ (TASK_COMM_LEN + sizeof((s)->map_cmdline_to_pid[0]))
+
static inline char *get_saved_cmdlines(int idx)
{
return &savedcmd->saved_cmdlines[idx * TASK_COMM_LEN];
@@ -2339,7 +2343,6 @@ static void free_saved_cmdlines_buffer(struct saved_cmdlines_buffer *s)
{
int order = get_order(sizeof(*s) + s->cmdline_num * TASK_COMM_LEN);
- kfree(s->map_cmdline_to_pid);
kmemleak_free(s);
free_pages((unsigned long)s, order);
}
@@ -2352,7 +2355,7 @@ static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
int order;
/* Figure out how much is needed to hold the given number of cmdlines */
- orig_size = sizeof(*s) + val * TASK_COMM_LEN;
+ orig_size = sizeof(*s) + val * SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
order = get_order(orig_size);
size = 1 << (order + PAGE_SHIFT);
page = alloc_pages(GFP_KERNEL, order);
@@ -2364,16 +2367,11 @@ static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
memset(s, 0, sizeof(*s));
/* Round up to actual allocation */
- val = (size - sizeof(*s)) / TASK_COMM_LEN;
+ val = (size - sizeof(*s)) / SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
s->cmdline_num = val;
- s->map_cmdline_to_pid = kmalloc_array(val,
- sizeof(*s->map_cmdline_to_pid),
- GFP_KERNEL);
- if (!s->map_cmdline_to_pid) {
- free_saved_cmdlines_buffer(s);
- return NULL;
- }
+ /* Place map_cmdline_to_pid array right after saved_cmdlines */
+ s->map_cmdline_to_pid = (unsigned *)&s->saved_cmdlines[val * TASK_COMM_LEN];
s->cmdline_idx = 0;
memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP,
--
2.43.0
* [for-next][PATCH 04/11] tracing: Move open coded processing of tgid_map into helper function
From: Steven Rostedt @ 2024-02-21 14:08 UTC
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Tim Chen, Vincent Donnefort, Sven Schnelle, Mete Durlu
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
In preparation for moving the saved_cmdlines logic out of trace.c and into
trace_sched_switch.c, move the open coded manipulation of tgid_map in
set_tracer_flag() into a helper function, trace_alloc_tgid_map(), so that it
can later be moved into trace_sched_switch.c without touching the existing
functions in trace.c.
No functional changes.
Link: https://lore.kernel.org/linux-trace-kernel/20240220140703.338116216@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Mete Durlu <meted@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace.c | 38 +++++++++++++++++++++++---------------
1 file changed, 23 insertions(+), 15 deletions(-)
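trace_alloc_tgid_map() follows a publish-once pattern: tgid_map_max is
written first and the map pointer is then published with smp_store_release(),
so a reader that observes the pointer through smp_load_acquire() in
trace_find_tgid_ptr() also observes the matching bound. The sketch below
shows the same ordering with C11 release/acquire atomics standing in for the
kernel primitives. The names are hypothetical, and like the kernel code it
assumes callers of the allocator are serialized by a lock (event_mutex in
the kernel); only the readers run concurrently with it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for tgid_map and tgid_map_max. */
static int *_Atomic tgid_map_sketch;
static size_t tgid_map_max_sketch;

/* Allocate once; publish the pointer only after the bound is set. */
static int alloc_tgid_map_sketch(size_t pid_max)
{
	int *map;

	if (atomic_load_explicit(&tgid_map_sketch, memory_order_acquire))
		return 0;		/* already allocated */

	tgid_map_max_sketch = pid_max;
	map = calloc(pid_max + 1, sizeof(*map));
	if (!map)
		return -1;		/* caller clears its flag on failure */

	/* Release: a reader that sees map also sees tgid_map_max_sketch. */
	atomic_store_explicit(&tgid_map_sketch, map, memory_order_release);
	return 0;
}

static int *find_tgid_ptr_sketch(int pid)
{
	/* Acquire pairs with the release store above. */
	int *map = atomic_load_explicit(&tgid_map_sketch, memory_order_acquire);

	if (!map || pid < 0 || (size_t)pid > tgid_map_max_sketch)
		return NULL;
	return &map[pid];
}
```

A second call to the allocator is a no-op, so the bound never changes once
the map is visible, which is what lets readers trust it without a lock.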
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 52faa30e64ed..06c593fc93d0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5432,10 +5432,31 @@ int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set)
return 0;
}
-int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
+static int trace_alloc_tgid_map(void)
{
int *map;
+ if (tgid_map)
+ return 0;
+
+ tgid_map_max = pid_max;
+ map = kvcalloc(tgid_map_max + 1, sizeof(*tgid_map),
+ GFP_KERNEL);
+ if (!map)
+ return -ENOMEM;
+
+ /*
+ * Pairs with smp_load_acquire() in
+ * trace_find_tgid_ptr() to ensure that if it observes
+ * the tgid_map we just allocated then it also observes
+ * the corresponding tgid_map_max value.
+ */
+ smp_store_release(&tgid_map, map);
+ return 0;
+}
+
+int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
+{
if ((mask == TRACE_ITER_RECORD_TGID) ||
(mask == TRACE_ITER_RECORD_CMD))
lockdep_assert_held(&event_mutex);
@@ -5458,20 +5479,7 @@ int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
trace_event_enable_cmd_record(enabled);
if (mask == TRACE_ITER_RECORD_TGID) {
- if (!tgid_map) {
- tgid_map_max = pid_max;
- map = kvcalloc(tgid_map_max + 1, sizeof(*tgid_map),
- GFP_KERNEL);
-
- /*
- * Pairs with smp_load_acquire() in
- * trace_find_tgid_ptr() to ensure that if it observes
- * the tgid_map we just allocated then it also observes
- * the corresponding tgid_map_max value.
- */
- smp_store_release(&tgid_map, map);
- }
- if (!tgid_map) {
+ if (trace_alloc_tgid_map() < 0) {
tr->trace_flags &= ~TRACE_ITER_RECORD_TGID;
return -ENOMEM;
}
--
2.43.0
* [for-next][PATCH 05/11] tracing: Move saved_cmdline code into trace_sched_switch.c
From: Steven Rostedt @ 2024-02-21 14:08 UTC
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Tim Chen, Vincent Donnefort, Sven Schnelle, Mete Durlu
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
The code that handles saved_cmdlines is split between trace.c and
trace_sched_switch.c. There's some history to this. trace_sched_switch.c
was originally created for the sched_switch tracer, which was deprecated
once the sched_switch trace event made it obsolete. The file was not
deleted because it still had some code to help with saved_cmdlines. Since
then, trace.c has grown tremendously. Move all the saved_cmdlines code into
trace_sched_switch.c, as that code is the only reason the file still
exists, and trace.c has gotten too big.
No functional changes.
Link: https://lore.kernel.org/linux-trace-kernel/20240220140703.497966629@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Mete Durlu <meted@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace.c | 515 +-----------------------------
kernel/trace/trace.h | 10 +
kernel/trace/trace_sched_switch.c | 515 ++++++++++++++++++++++++++++++
3 files changed, 528 insertions(+), 512 deletions(-)
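The code being moved implements a small PID-to-COMM cache:
map_pid_to_cmdline[] is indexed by the low bits of the pid
(pid & (PID_MAX_DEFAULT - 1)), slots are handed out round-robin, and
map_cmdline_to_pid[] keeps the full pid so that a lookup can reject a slot
that now belongs to a different pid sharing the same low bits. The sketch
below reproduces that logic in user space with deliberately tiny,
hypothetical sizes and no locking (the kernel serializes this with
trace_cmdline_lock); all names are illustrative.

```c
#include <assert.h>
#include <string.h>

#define PID_BUCKETS_SKETCH	16	/* stands in for PID_MAX_DEFAULT (power of 2) */
#define CMDLINE_SLOTS_SKETCH	4	/* stands in for cmdline_num */
#define COMM_LEN_SKETCH		16	/* stands in for TASK_COMM_LEN */
#define NO_MAP_SKETCH		0xffffffffu

static unsigned map_pid_to_slot[PID_BUCKETS_SKETCH];
static unsigned map_slot_to_pid[CMDLINE_SLOTS_SKETCH];
static char comms[CMDLINE_SLOTS_SKETCH][COMM_LEN_SKETCH];
static unsigned slot_idx;

static void cmdline_cache_init(void)
{
	memset(map_pid_to_slot, 0xff, sizeof(map_pid_to_slot));
	memset(map_slot_to_pid, 0xff, sizeof(map_slot_to_pid));
	slot_idx = 0;
}

static void save_cmdline_sketch(unsigned pid, const char *comm)
{
	unsigned tpid = pid & (PID_BUCKETS_SKETCH - 1);
	unsigned idx = map_pid_to_slot[tpid];

	if (idx == NO_MAP_SKETCH) {
		/* Hand out slots round-robin; old entries get overwritten. */
		idx = (slot_idx + 1) % CMDLINE_SLOTS_SKETCH;
		map_pid_to_slot[tpid] = idx;
		slot_idx = idx;
	}
	map_slot_to_pid[idx] = pid;	/* full pid, used for the back-check */
	strncpy(comms[idx], comm, COMM_LEN_SKETCH - 1);
	comms[idx][COMM_LEN_SKETCH - 1] = '\0';
}

static void find_cmdline_sketch(unsigned pid, char out[COMM_LEN_SKETCH])
{
	unsigned tpid = pid & (PID_BUCKETS_SKETCH - 1);
	unsigned idx = map_pid_to_slot[tpid];

	if (!pid) {
		strcpy(out, "<idle>");
		return;
	}
	assert(idx == NO_MAP_SKETCH || idx < CMDLINE_SLOTS_SKETCH);
	/* The slot must still belong to exactly this pid. */
	if (idx != NO_MAP_SKETCH && map_slot_to_pid[idx] == pid) {
		strcpy(out, comms[idx]);
		return;
	}
	strcpy(out, "<...>");
}
```

With 16 buckets, pids 3 and 19 share bucket 3, so saving pid 19 reuses pid
3's slot and a later lookup of pid 3 reports "<...>" instead of a stale name.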
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 06c593fc93d0..50fab999e72e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -39,7 +39,6 @@
#include <linux/ctype.h>
#include <linux/init.h>
#include <linux/panic_notifier.h>
-#include <linux/kmemleak.h>
#include <linux/poll.h>
#include <linux/nmi.h>
#include <linux/fs.h>
@@ -105,7 +104,7 @@ dummy_set_flag(struct trace_array *tr, u32 old_flags, u32 bit, int set)
* tracing is active, only save the comm when a trace event
* occurred.
*/
-static DEFINE_PER_CPU(bool, trace_taskinfo_save);
+DEFINE_PER_CPU(bool, trace_taskinfo_save);
/*
* Kill all tracing for good (never come back).
@@ -2299,96 +2298,6 @@ void tracing_reset_all_online_cpus(void)
mutex_unlock(&trace_types_lock);
}
-/*
- * The tgid_map array maps from pid to tgid; i.e. the value stored at index i
- * is the tgid last observed corresponding to pid=i.
- */
-static int *tgid_map;
-
-/* The maximum valid index into tgid_map. */
-static size_t tgid_map_max;
-
-#define SAVED_CMDLINES_DEFAULT 128
-#define NO_CMDLINE_MAP UINT_MAX
-/*
- * Preemption must be disabled before acquiring trace_cmdline_lock.
- * The various trace_arrays' max_lock must be acquired in a context
- * where interrupt is disabled.
- */
-static arch_spinlock_t trace_cmdline_lock = __ARCH_SPIN_LOCK_UNLOCKED;
-struct saved_cmdlines_buffer {
- unsigned map_pid_to_cmdline[PID_MAX_DEFAULT+1];
- unsigned *map_cmdline_to_pid;
- unsigned cmdline_num;
- int cmdline_idx;
- char saved_cmdlines[];
-};
-static struct saved_cmdlines_buffer *savedcmd;
-
-/* Holds the size of a cmdline and pid element */
-#define SAVED_CMDLINE_MAP_ELEMENT_SIZE(s) \
- (TASK_COMM_LEN + sizeof((s)->map_cmdline_to_pid[0]))
-
-static inline char *get_saved_cmdlines(int idx)
-{
- return &savedcmd->saved_cmdlines[idx * TASK_COMM_LEN];
-}
-
-static inline void set_cmdline(int idx, const char *cmdline)
-{
- strncpy(get_saved_cmdlines(idx), cmdline, TASK_COMM_LEN);
-}
-
-static void free_saved_cmdlines_buffer(struct saved_cmdlines_buffer *s)
-{
- int order = get_order(sizeof(*s) + s->cmdline_num * TASK_COMM_LEN);
-
- kmemleak_free(s);
- free_pages((unsigned long)s, order);
-}
-
-static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
-{
- struct saved_cmdlines_buffer *s;
- struct page *page;
- int orig_size, size;
- int order;
-
- /* Figure out how much is needed to hold the given number of cmdlines */
- orig_size = sizeof(*s) + val * SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
- order = get_order(orig_size);
- size = 1 << (order + PAGE_SHIFT);
- page = alloc_pages(GFP_KERNEL, order);
- if (!page)
- return NULL;
-
- s = page_address(page);
- kmemleak_alloc(s, size, 1, GFP_KERNEL);
- memset(s, 0, sizeof(*s));
-
- /* Round up to actual allocation */
- val = (size - sizeof(*s)) / SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
- s->cmdline_num = val;
-
- /* Place map_cmdline_to_pid array right after saved_cmdlines */
- s->map_cmdline_to_pid = (unsigned *)&s->saved_cmdlines[val * TASK_COMM_LEN];
-
- s->cmdline_idx = 0;
- memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP,
- sizeof(s->map_pid_to_cmdline));
- memset(s->map_cmdline_to_pid, NO_CMDLINE_MAP,
- val * sizeof(*s->map_cmdline_to_pid));
-
- return s;
-}
-
-static int trace_create_savedcmd(void)
-{
- savedcmd = allocate_cmdlines_buffer(SAVED_CMDLINES_DEFAULT);
-
- return savedcmd ? 0 : -ENOMEM;
-}
-
int is_tracing_stopped(void)
{
return global_trace.stop_count;
@@ -2481,201 +2390,6 @@ void tracing_stop(void)
return tracing_stop_tr(&global_trace);
}
-static int trace_save_cmdline(struct task_struct *tsk)
-{
- unsigned tpid, idx;
-
- /* treat recording of idle task as a success */
- if (!tsk->pid)
- return 1;
-
- tpid = tsk->pid & (PID_MAX_DEFAULT - 1);
-
- /*
- * It's not the end of the world if we don't get
- * the lock, but we also don't want to spin
- * nor do we want to disable interrupts,
- * so if we miss here, then better luck next time.
- *
- * This is called within the scheduler and wake up, so interrupts
- * had better been disabled and run queue lock been held.
- */
- lockdep_assert_preemption_disabled();
- if (!arch_spin_trylock(&trace_cmdline_lock))
- return 0;
-
- idx = savedcmd->map_pid_to_cmdline[tpid];
- if (idx == NO_CMDLINE_MAP) {
- idx = (savedcmd->cmdline_idx + 1) % savedcmd->cmdline_num;
-
- savedcmd->map_pid_to_cmdline[tpid] = idx;
- savedcmd->cmdline_idx = idx;
- }
-
- savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
- set_cmdline(idx, tsk->comm);
-
- arch_spin_unlock(&trace_cmdline_lock);
-
- return 1;
-}
-
-static void __trace_find_cmdline(int pid, char comm[])
-{
- unsigned map;
- int tpid;
-
- if (!pid) {
- strcpy(comm, "<idle>");
- return;
- }
-
- if (WARN_ON_ONCE(pid < 0)) {
- strcpy(comm, "<XXX>");
- return;
- }
-
- tpid = pid & (PID_MAX_DEFAULT - 1);
- map = savedcmd->map_pid_to_cmdline[tpid];
- if (map != NO_CMDLINE_MAP) {
- tpid = savedcmd->map_cmdline_to_pid[map];
- if (tpid == pid) {
- strscpy(comm, get_saved_cmdlines(map), TASK_COMM_LEN);
- return;
- }
- }
- strcpy(comm, "<...>");
-}
-
-void trace_find_cmdline(int pid, char comm[])
-{
- preempt_disable();
- arch_spin_lock(&trace_cmdline_lock);
-
- __trace_find_cmdline(pid, comm);
-
- arch_spin_unlock(&trace_cmdline_lock);
- preempt_enable();
-}
-
-static int *trace_find_tgid_ptr(int pid)
-{
- /*
- * Pairs with the smp_store_release in set_tracer_flag() to ensure that
- * if we observe a non-NULL tgid_map then we also observe the correct
- * tgid_map_max.
- */
- int *map = smp_load_acquire(&tgid_map);
-
- if (unlikely(!map || pid > tgid_map_max))
- return NULL;
-
- return &map[pid];
-}
-
-int trace_find_tgid(int pid)
-{
- int *ptr = trace_find_tgid_ptr(pid);
-
- return ptr ? *ptr : 0;
-}
-
-static int trace_save_tgid(struct task_struct *tsk)
-{
- int *ptr;
-
- /* treat recording of idle task as a success */
- if (!tsk->pid)
- return 1;
-
- ptr = trace_find_tgid_ptr(tsk->pid);
- if (!ptr)
- return 0;
-
- *ptr = tsk->tgid;
- return 1;
-}
-
-static bool tracing_record_taskinfo_skip(int flags)
-{
- if (unlikely(!(flags & (TRACE_RECORD_CMDLINE | TRACE_RECORD_TGID))))
- return true;
- if (!__this_cpu_read(trace_taskinfo_save))
- return true;
- return false;
-}
-
-/**
- * tracing_record_taskinfo - record the task info of a task
- *
- * @task: task to record
- * @flags: TRACE_RECORD_CMDLINE for recording comm
- * TRACE_RECORD_TGID for recording tgid
- */
-void tracing_record_taskinfo(struct task_struct *task, int flags)
-{
- bool done;
-
- if (tracing_record_taskinfo_skip(flags))
- return;
-
- /*
- * Record as much task information as possible. If some fail, continue
- * to try to record the others.
- */
- done = !(flags & TRACE_RECORD_CMDLINE) || trace_save_cmdline(task);
- done &= !(flags & TRACE_RECORD_TGID) || trace_save_tgid(task);
-
- /* If recording any information failed, retry again soon. */
- if (!done)
- return;
-
- __this_cpu_write(trace_taskinfo_save, false);
-}
-
-/**
- * tracing_record_taskinfo_sched_switch - record task info for sched_switch
- *
- * @prev: previous task during sched_switch
- * @next: next task during sched_switch
- * @flags: TRACE_RECORD_CMDLINE for recording comm
- * TRACE_RECORD_TGID for recording tgid
- */
-void tracing_record_taskinfo_sched_switch(struct task_struct *prev,
- struct task_struct *next, int flags)
-{
- bool done;
-
- if (tracing_record_taskinfo_skip(flags))
- return;
-
- /*
- * Record as much task information as possible. If some fail, continue
- * to try to record the others.
- */
- done = !(flags & TRACE_RECORD_CMDLINE) || trace_save_cmdline(prev);
- done &= !(flags & TRACE_RECORD_CMDLINE) || trace_save_cmdline(next);
- done &= !(flags & TRACE_RECORD_TGID) || trace_save_tgid(prev);
- done &= !(flags & TRACE_RECORD_TGID) || trace_save_tgid(next);
-
- /* If recording any information failed, retry again soon. */
- if (!done)
- return;
-
- __this_cpu_write(trace_taskinfo_save, false);
-}
-
-/* Helpers to record a specific task information */
-void tracing_record_cmdline(struct task_struct *task)
-{
- tracing_record_taskinfo(task, TRACE_RECORD_CMDLINE);
-}
-
-void tracing_record_tgid(struct task_struct *task)
-{
- tracing_record_taskinfo(task, TRACE_RECORD_TGID);
-}
-
/*
* Several functions return TRACE_TYPE_PARTIAL_LINE if the trace_seq
* overflowed, and TRACE_TYPE_HANDLED otherwise. This helper function
@@ -5432,29 +5146,6 @@ int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set)
return 0;
}
-static int trace_alloc_tgid_map(void)
-{
- int *map;
-
- if (tgid_map)
- return 0;
-
- tgid_map_max = pid_max;
- map = kvcalloc(tgid_map_max + 1, sizeof(*tgid_map),
- GFP_KERNEL);
- if (!map)
- return -ENOMEM;
-
- /*
- * Pairs with smp_load_acquire() in
- * trace_find_tgid_ptr() to ensure that if it observes
- * the tgid_map we just allocated then it also observes
- * the corresponding tgid_map_max value.
- */
- smp_store_release(&tgid_map, map);
- return 0;
-}
-
int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
{
if ((mask == TRACE_ITER_RECORD_TGID) ||
@@ -5479,6 +5170,7 @@ int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
trace_event_enable_cmd_record(enabled);
if (mask == TRACE_ITER_RECORD_TGID) {
+
if (trace_alloc_tgid_map() < 0) {
tr->trace_flags &= ~TRACE_ITER_RECORD_TGID;
return -ENOMEM;
@@ -5924,207 +5616,6 @@ static const struct file_operations tracing_readme_fops = {
.llseek = generic_file_llseek,
};
-static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos)
-{
- int pid = ++(*pos);
-
- return trace_find_tgid_ptr(pid);
-}
-
-static void *saved_tgids_start(struct seq_file *m, loff_t *pos)
-{
- int pid = *pos;
-
- return trace_find_tgid_ptr(pid);
-}
-
-static void saved_tgids_stop(struct seq_file *m, void *v)
-{
-}
-
-static int saved_tgids_show(struct seq_file *m, void *v)
-{
- int *entry = (int *)v;
- int pid = entry - tgid_map;
- int tgid = *entry;
-
- if (tgid == 0)
- return SEQ_SKIP;
-
- seq_printf(m, "%d %d\n", pid, tgid);
- return 0;
-}
-
-static const struct seq_operations tracing_saved_tgids_seq_ops = {
- .start = saved_tgids_start,
- .stop = saved_tgids_stop,
- .next = saved_tgids_next,
- .show = saved_tgids_show,
-};
-
-static int tracing_saved_tgids_open(struct inode *inode, struct file *filp)
-{
- int ret;
-
- ret = tracing_check_open_get_tr(NULL);
- if (ret)
- return ret;
-
- return seq_open(filp, &tracing_saved_tgids_seq_ops);
-}
-
-
-static const struct file_operations tracing_saved_tgids_fops = {
- .open = tracing_saved_tgids_open,
- .read = seq_read,
- .llseek = seq_lseek,
- .release = seq_release,
-};
-
-static void *saved_cmdlines_next(struct seq_file *m, void *v, loff_t *pos)
-{
- unsigned int *ptr = v;
-
- if (*pos || m->count)
- ptr++;
-
- (*pos)++;
-
- for (; ptr < &savedcmd->map_cmdline_to_pid[savedcmd->cmdline_num];
- ptr++) {
- if (*ptr == -1 || *ptr == NO_CMDLINE_MAP)
- continue;
-
- return ptr;
- }
-
- return NULL;
-}
-
-static void *saved_cmdlines_start(struct seq_file *m, loff_t *pos)
-{
- void *v;
- loff_t l = 0;
-
- preempt_disable();
- arch_spin_lock(&trace_cmdline_lock);
-
- v = &savedcmd->map_cmdline_to_pid[0];
- while (l <= *pos) {
- v = saved_cmdlines_next(m, v, &l);
- if (!v)
- return NULL;
- }
-
- return v;
-}
-
-static void saved_cmdlines_stop(struct seq_file *m, void *v)
-{
- arch_spin_unlock(&trace_cmdline_lock);
- preempt_enable();
-}
-
-static int saved_cmdlines_show(struct seq_file *m, void *v)
-{
- char buf[TASK_COMM_LEN];
- unsigned int *pid = v;
-
- __trace_find_cmdline(*pid, buf);
- seq_printf(m, "%d %s\n", *pid, buf);
- return 0;
-}
-
-static const struct seq_operations tracing_saved_cmdlines_seq_ops = {
- .start = saved_cmdlines_start,
- .next = saved_cmdlines_next,
- .stop = saved_cmdlines_stop,
- .show = saved_cmdlines_show,
-};
-
-static int tracing_saved_cmdlines_open(struct inode *inode, struct file *filp)
-{
- int ret;
-
- ret = tracing_check_open_get_tr(NULL);
- if (ret)
- return ret;
-
- return seq_open(filp, &tracing_saved_cmdlines_seq_ops);
-}
-
-static const struct file_operations tracing_saved_cmdlines_fops = {
- .open = tracing_saved_cmdlines_open,
- .read = seq_read,
- .llseek = seq_lseek,
- .release = seq_release,
-};
-
-static ssize_t
-tracing_saved_cmdlines_size_read(struct file *filp, char __user *ubuf,
- size_t cnt, loff_t *ppos)
-{
- char buf[64];
- int r;
-
- preempt_disable();
- arch_spin_lock(&trace_cmdline_lock);
- r = scnprintf(buf, sizeof(buf), "%u\n", savedcmd->cmdline_num);
- arch_spin_unlock(&trace_cmdline_lock);
- preempt_enable();
-
- return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
-}
-
-static int tracing_resize_saved_cmdlines(unsigned int val)
-{
- struct saved_cmdlines_buffer *s, *savedcmd_temp;
-
- s = allocate_cmdlines_buffer(val);
- if (!s)
- return -ENOMEM;
-
- preempt_disable();
- arch_spin_lock(&trace_cmdline_lock);
- savedcmd_temp = savedcmd;
- savedcmd = s;
- arch_spin_unlock(&trace_cmdline_lock);
- preempt_enable();
- free_saved_cmdlines_buffer(savedcmd_temp);
-
- return 0;
-}
-
-static ssize_t
-tracing_saved_cmdlines_size_write(struct file *filp, const char __user *ubuf,
- size_t cnt, loff_t *ppos)
-{
- unsigned long val;
- int ret;
-
- ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
- if (ret)
- return ret;
-
- /* must have at least 1 entry or less than PID_MAX_DEFAULT */
- if (!val || val > PID_MAX_DEFAULT)
- return -EINVAL;
-
- ret = tracing_resize_saved_cmdlines((unsigned int)val);
- if (ret < 0)
- return ret;
-
- *ppos += cnt;
-
- return cnt;
-}
-
-static const struct file_operations tracing_saved_cmdlines_size_fops = {
- .open = tracing_open_generic,
- .read = tracing_saved_cmdlines_size_read,
- .write = tracing_saved_cmdlines_size_write,
-};
-
#ifdef CONFIG_TRACE_EVAL_MAP_FILE
static union trace_eval_map_item *
update_eval_map(union trace_eval_map_item *ptr)
@@ -10693,7 +10184,7 @@ __init static int tracer_alloc_buffers(void)
out_free_pipe_cpumask:
free_cpumask_var(global_trace.pipe_cpumask);
out_free_savedcmd:
- free_saved_cmdlines_buffer(savedcmd);
+ trace_free_saved_cmdlines_buffer();
out_free_temp_buffer:
ring_buffer_free(temp_buffer);
out_rm_hp_state:
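The tracing_resize_saved_cmdlines() removed from trace.c above (and re-added in trace_sched_switch.c) follows a common pattern. It allocates the replacement buffer outside the lock, swaps the global pointer while holding the lock, and frees the old buffer only after the lock is dropped. The following is a minimal user-space sketch of that pattern. It assumes a pthread mutex in place of the kernel's arch_spinlock_t plus preempt_disable(). `struct cmdline_buf`, `alloc_buf()`, `resize_buf()` and `buf_capacity()` are hypothetical names used only for illustration.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct saved_cmdlines_buffer. */
struct cmdline_buf {
	unsigned int cmdline_num;
	/* ... the cmdline slots would follow here ... */
};

/* Stands in for trace_cmdline_lock (an assumption, see lead-in). */
static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;
static struct cmdline_buf *cur_buf;

static struct cmdline_buf *alloc_buf(unsigned int n)
{
	struct cmdline_buf *b = calloc(1, sizeof(*b));

	if (b)
		b->cmdline_num = n;
	return b;
}

/* Swap in a new buffer; readers only touch cur_buf under buf_lock. */
static int resize_buf(unsigned int n)
{
	struct cmdline_buf *new_buf, *old_buf;

	/* Allocation may block, so do it before taking the lock. */
	new_buf = alloc_buf(n);
	if (!new_buf)
		return -1;

	pthread_mutex_lock(&buf_lock);
	old_buf = cur_buf;
	cur_buf = new_buf;
	pthread_mutex_unlock(&buf_lock);

	/* Once the lock is dropped no reader can still see old_buf. */
	free(old_buf);
	return 0;
}

/* Reader side, like tracing_saved_cmdlines_size_read(). */
static unsigned int buf_capacity(void)
{
	unsigned int n;

	pthread_mutex_lock(&buf_lock);
	n = cur_buf ? cur_buf->cmdline_num : 0;
	pthread_mutex_unlock(&buf_lock);
	return n;
}
```

With this design, the lock is held only for the two pointer stores. Both the allocation and the free happen outside the critical section.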
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 00f873910c5d..e4f0714d7a49 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1375,6 +1375,16 @@ static inline void trace_buffer_unlock_commit(struct trace_array *tr,
trace_buffer_unlock_commit_regs(tr, buffer, event, trace_ctx, NULL);
}
+DECLARE_PER_CPU(bool, trace_taskinfo_save);
+int trace_save_cmdline(struct task_struct *tsk);
+int trace_create_savedcmd(void);
+int trace_alloc_tgid_map(void);
+void trace_free_saved_cmdlines_buffer(void);
+
+extern const struct file_operations tracing_saved_cmdlines_fops;
+extern const struct file_operations tracing_saved_tgids_fops;
+extern const struct file_operations tracing_saved_cmdlines_size_fops;
+
DECLARE_PER_CPU(struct ring_buffer_event *, trace_buffered_event);
DECLARE_PER_CPU(int, trace_buffered_event_cnt);
void trace_buffered_event_disable(void);
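trace.h now exports trace_alloc_tgid_map(). In trace_sched_switch.c, that function sets tgid_map_max before publishing tgid_map with smp_store_release(). trace_find_tgid_ptr() reads the pointer with smp_load_acquire(), so any reader that observes the new map also observes its matching bound. The sketch below reproduces that pairing with C11 atomics, using memory_order_release/memory_order_acquire in place of the kernel primitives. It assumes a single allocating thread, which the kernel guarantees by other means. `alloc_tgid_map()` and `find_tgid_ptr()` are hypothetical user-space mirrors, not kernel functions.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

static _Atomic(int *) tgid_map;   /* published pointer, NULL until allocated */
static size_t tgid_map_max;       /* written before the pointer is published */

/* Mirrors trace_alloc_tgid_map(): set the bound, then publish the map. */
static int alloc_tgid_map(size_t pid_max)
{
	int *map;

	if (atomic_load_explicit(&tgid_map, memory_order_relaxed))
		return 0;

	tgid_map_max = pid_max;
	map = calloc(tgid_map_max + 1, sizeof(*map));
	if (!map)
		return -1;

	/* Release: the tgid_map_max store above becomes visible to any
	 * thread whose acquire load observes this pointer. */
	atomic_store_explicit(&tgid_map, map, memory_order_release);
	return 0;
}

/* Mirrors trace_find_tgid_ptr(): acquire the map, then trust the bound. */
static int *find_tgid_ptr(int pid)
{
	int *map = atomic_load_explicit(&tgid_map, memory_order_acquire);

	/* pid < 0 is explicit here; the kernel relies on the conversion to
	 * an unsigned comparison against tgid_map_max instead. */
	if (!map || pid < 0 || (size_t)pid > tgid_map_max)
		return NULL;
	return &map[pid];
}
```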
diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
index c9ffdcfe622e..8a407adb0e1c 100644
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -8,6 +8,7 @@
#include <linux/module.h>
#include <linux/kallsyms.h>
#include <linux/uaccess.h>
+#include <linux/kmemleak.h>
#include <linux/ftrace.h>
#include <trace/events/sched.h>
@@ -148,3 +149,517 @@ void tracing_stop_tgid_record(void)
{
tracing_stop_sched_switch(RECORD_TGID);
}
+
+/*
+ * The tgid_map array maps from pid to tgid; i.e. the value stored at index i
+ * is the tgid last observed corresponding to pid=i.
+ */
+static int *tgid_map;
+
+/* The maximum valid index into tgid_map. */
+static size_t tgid_map_max;
+
+#define SAVED_CMDLINES_DEFAULT 128
+#define NO_CMDLINE_MAP UINT_MAX
+/*
+ * Preemption must be disabled before acquiring trace_cmdline_lock.
+ * The various trace_arrays' max_lock must be acquired in a context
+ * where interrupts are disabled.
+ */
+static arch_spinlock_t trace_cmdline_lock = __ARCH_SPIN_LOCK_UNLOCKED;
+struct saved_cmdlines_buffer {
+ unsigned map_pid_to_cmdline[PID_MAX_DEFAULT+1];
+ unsigned *map_cmdline_to_pid;
+ unsigned cmdline_num;
+ int cmdline_idx;
+ char saved_cmdlines[];
+};
+static struct saved_cmdlines_buffer *savedcmd;
+
+/* Holds the size of a cmdline and pid element */
+#define SAVED_CMDLINE_MAP_ELEMENT_SIZE(s) \
+ (TASK_COMM_LEN + sizeof((s)->map_cmdline_to_pid[0]))
+
+static inline char *get_saved_cmdlines(int idx)
+{
+ return &savedcmd->saved_cmdlines[idx * TASK_COMM_LEN];
+}
+
+static inline void set_cmdline(int idx, const char *cmdline)
+{
+ strncpy(get_saved_cmdlines(idx), cmdline, TASK_COMM_LEN);
+}
+
+static void free_saved_cmdlines_buffer(struct saved_cmdlines_buffer *s)
+{
+ int order = get_order(sizeof(*s) + s->cmdline_num * TASK_COMM_LEN);
+
+ kmemleak_free(s);
+ free_pages((unsigned long)s, order);
+}
+
+static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
+{
+ struct saved_cmdlines_buffer *s;
+ struct page *page;
+ int orig_size, size;
+ int order;
+
+ /* Figure out how much is needed to hold the given number of cmdlines */
+ orig_size = sizeof(*s) + val * SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
+ order = get_order(orig_size);
+ size = 1 << (order + PAGE_SHIFT);
+ page = alloc_pages(GFP_KERNEL, order);
+ if (!page)
+ return NULL;
+
+ s = page_address(page);
+ kmemleak_alloc(s, size, 1, GFP_KERNEL);
+ memset(s, 0, sizeof(*s));
+
+ /* Round up to actual allocation */
+ val = (size - sizeof(*s)) / SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
+ s->cmdline_num = val;
+
+ /* Place map_cmdline_to_pid array right after saved_cmdlines */
+ s->map_cmdline_to_pid = (unsigned *)&s->saved_cmdlines[val * TASK_COMM_LEN];
+
+ s->cmdline_idx = 0;
+ memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP,
+ sizeof(s->map_pid_to_cmdline));
+ memset(s->map_cmdline_to_pid, NO_CMDLINE_MAP,
+ val * sizeof(*s->map_cmdline_to_pid));
+
+ return s;
+}
+
+int trace_create_savedcmd(void)
+{
+ savedcmd = allocate_cmdlines_buffer(SAVED_CMDLINES_DEFAULT);
+
+ return savedcmd ? 0 : -ENOMEM;
+}
+
+int trace_save_cmdline(struct task_struct *tsk)
+{
+ unsigned tpid, idx;
+
+ /* treat recording of idle task as a success */
+ if (!tsk->pid)
+ return 1;
+
+ tpid = tsk->pid & (PID_MAX_DEFAULT - 1);
+
+ /*
+ * It's not the end of the world if we don't get
+ * the lock, but we also don't want to spin
+ * nor do we want to disable interrupts,
+ * so if we miss here, then better luck next time.
+ *
+ * This is called within the scheduler and wakeup paths, so interrupts
+ * had better be disabled and the run queue lock held.
+ */
+ lockdep_assert_preemption_disabled();
+ if (!arch_spin_trylock(&trace_cmdline_lock))
+ return 0;
+
+ idx = savedcmd->map_pid_to_cmdline[tpid];
+ if (idx == NO_CMDLINE_MAP) {
+ idx = (savedcmd->cmdline_idx + 1) % savedcmd->cmdline_num;
+
+ savedcmd->map_pid_to_cmdline[tpid] = idx;
+ savedcmd->cmdline_idx = idx;
+ }
+
+ savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
+ set_cmdline(idx, tsk->comm);
+
+ arch_spin_unlock(&trace_cmdline_lock);
+
+ return 1;
+}
+
+static void __trace_find_cmdline(int pid, char comm[])
+{
+ unsigned map;
+ int tpid;
+
+ if (!pid) {
+ strcpy(comm, "<idle>");
+ return;
+ }
+
+ if (WARN_ON_ONCE(pid < 0)) {
+ strcpy(comm, "<XXX>");
+ return;
+ }
+
+ tpid = pid & (PID_MAX_DEFAULT - 1);
+ map = savedcmd->map_pid_to_cmdline[tpid];
+ if (map != NO_CMDLINE_MAP) {
+ tpid = savedcmd->map_cmdline_to_pid[map];
+ if (tpid == pid) {
+ strscpy(comm, get_saved_cmdlines(map), TASK_COMM_LEN);
+ return;
+ }
+ }
+ strcpy(comm, "<...>");
+}
+
+void trace_find_cmdline(int pid, char comm[])
+{
+ preempt_disable();
+ arch_spin_lock(&trace_cmdline_lock);
+
+ __trace_find_cmdline(pid, comm);
+
+ arch_spin_unlock(&trace_cmdline_lock);
+ preempt_enable();
+}
+
+static int *trace_find_tgid_ptr(int pid)
+{
+ /*
+ * Pairs with the smp_store_release() in trace_alloc_tgid_map() to ensure that
+ * if we observe a non-NULL tgid_map then we also observe the correct
+ * tgid_map_max.
+ */
+ int *map = smp_load_acquire(&tgid_map);
+
+ if (unlikely(!map || pid > tgid_map_max))
+ return NULL;
+
+ return &map[pid];
+}
+
+int trace_find_tgid(int pid)
+{
+ int *ptr = trace_find_tgid_ptr(pid);
+
+ return ptr ? *ptr : 0;
+}
+
+static int trace_save_tgid(struct task_struct *tsk)
+{
+ int *ptr;
+
+ /* treat recording of idle task as a success */
+ if (!tsk->pid)
+ return 1;
+
+ ptr = trace_find_tgid_ptr(tsk->pid);
+ if (!ptr)
+ return 0;
+
+ *ptr = tsk->tgid;
+ return 1;
+}
+
+static bool tracing_record_taskinfo_skip(int flags)
+{
+ if (unlikely(!(flags & (TRACE_RECORD_CMDLINE | TRACE_RECORD_TGID))))
+ return true;
+ if (!__this_cpu_read(trace_taskinfo_save))
+ return true;
+ return false;
+}
+
+/**
+ * tracing_record_taskinfo - record the task info of a task
+ *
+ * @task: task to record
+ * @flags: TRACE_RECORD_CMDLINE for recording comm
+ * TRACE_RECORD_TGID for recording tgid
+ */
+void tracing_record_taskinfo(struct task_struct *task, int flags)
+{
+ bool done;
+
+ if (tracing_record_taskinfo_skip(flags))
+ return;
+
+ /*
+ * Record as much task information as possible. If some fail, continue
+ * to try to record the others.
+ */
+ done = !(flags & TRACE_RECORD_CMDLINE) || trace_save_cmdline(task);
+ done &= !(flags & TRACE_RECORD_TGID) || trace_save_tgid(task);
+
+ /* If recording any information failed, retry again soon. */
+ if (!done)
+ return;
+
+ __this_cpu_write(trace_taskinfo_save, false);
+}
+
+/**
+ * tracing_record_taskinfo_sched_switch - record task info for sched_switch
+ *
+ * @prev: previous task during sched_switch
+ * @next: next task during sched_switch
+ * @flags: TRACE_RECORD_CMDLINE for recording comm
+ * TRACE_RECORD_TGID for recording tgid
+ */
+void tracing_record_taskinfo_sched_switch(struct task_struct *prev,
+ struct task_struct *next, int flags)
+{
+ bool done;
+
+ if (tracing_record_taskinfo_skip(flags))
+ return;
+
+ /*
+ * Record as much task information as possible. If some fail, continue
+ * to try to record the others.
+ */
+ done = !(flags & TRACE_RECORD_CMDLINE) || trace_save_cmdline(prev);
+ done &= !(flags & TRACE_RECORD_CMDLINE) || trace_save_cmdline(next);
+ done &= !(flags & TRACE_RECORD_TGID) || trace_save_tgid(prev);
+ done &= !(flags & TRACE_RECORD_TGID) || trace_save_tgid(next);
+
+ /* If recording any information failed, retry again soon. */
+ if (!done)
+ return;
+
+ __this_cpu_write(trace_taskinfo_save, false);
+}
+
+/* Helpers to record a specific task information */
+void tracing_record_cmdline(struct task_struct *task)
+{
+ tracing_record_taskinfo(task, TRACE_RECORD_CMDLINE);
+}
+
+void tracing_record_tgid(struct task_struct *task)
+{
+ tracing_record_taskinfo(task, TRACE_RECORD_TGID);
+}
+
+int trace_alloc_tgid_map(void)
+{
+ int *map;
+
+ if (tgid_map)
+ return 0;
+
+ tgid_map_max = pid_max;
+ map = kvcalloc(tgid_map_max + 1, sizeof(*tgid_map),
+ GFP_KERNEL);
+ if (!map)
+ return -ENOMEM;
+
+ /*
+ * Pairs with smp_load_acquire() in
+ * trace_find_tgid_ptr() to ensure that if it observes
+ * the tgid_map we just allocated then it also observes
+ * the corresponding tgid_map_max value.
+ */
+ smp_store_release(&tgid_map, map);
+ return 0;
+}
+
+static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ int pid = ++(*pos);
+
+ return trace_find_tgid_ptr(pid);
+}
+
+static void *saved_tgids_start(struct seq_file *m, loff_t *pos)
+{
+ int pid = *pos;
+
+ return trace_find_tgid_ptr(pid);
+}
+
+static void saved_tgids_stop(struct seq_file *m, void *v)
+{
+}
+
+static int saved_tgids_show(struct seq_file *m, void *v)
+{
+ int *entry = (int *)v;
+ int pid = entry - tgid_map;
+ int tgid = *entry;
+
+ if (tgid == 0)
+ return SEQ_SKIP;
+
+ seq_printf(m, "%d %d\n", pid, tgid);
+ return 0;
+}
+
+static const struct seq_operations tracing_saved_tgids_seq_ops = {
+ .start = saved_tgids_start,
+ .stop = saved_tgids_stop,
+ .next = saved_tgids_next,
+ .show = saved_tgids_show,
+};
+
+static int tracing_saved_tgids_open(struct inode *inode, struct file *filp)
+{
+ int ret;
+
+ ret = tracing_check_open_get_tr(NULL);
+ if (ret)
+ return ret;
+
+ return seq_open(filp, &tracing_saved_tgids_seq_ops);
+}
+
+
+const struct file_operations tracing_saved_tgids_fops = {
+ .open = tracing_saved_tgids_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+static void *saved_cmdlines_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ unsigned int *ptr = v;
+
+ if (*pos || m->count)
+ ptr++;
+
+ (*pos)++;
+
+ for (; ptr < &savedcmd->map_cmdline_to_pid[savedcmd->cmdline_num];
+ ptr++) {
+ if (*ptr == -1 || *ptr == NO_CMDLINE_MAP)
+ continue;
+
+ return ptr;
+ }
+
+ return NULL;
+}
+
+static void *saved_cmdlines_start(struct seq_file *m, loff_t *pos)
+{
+ void *v;
+ loff_t l = 0;
+
+ preempt_disable();
+ arch_spin_lock(&trace_cmdline_lock);
+
+ v = &savedcmd->map_cmdline_to_pid[0];
+ while (l <= *pos) {
+ v = saved_cmdlines_next(m, v, &l);
+ if (!v)
+ return NULL;
+ }
+
+ return v;
+}
+
+static void saved_cmdlines_stop(struct seq_file *m, void *v)
+{
+ arch_spin_unlock(&trace_cmdline_lock);
+ preempt_enable();
+}
+
+static int saved_cmdlines_show(struct seq_file *m, void *v)
+{
+ char buf[TASK_COMM_LEN];
+ unsigned int *pid = v;
+
+ __trace_find_cmdline(*pid, buf);
+ seq_printf(m, "%d %s\n", *pid, buf);
+ return 0;
+}
+
+static const struct seq_operations tracing_saved_cmdlines_seq_ops = {
+ .start = saved_cmdlines_start,
+ .next = saved_cmdlines_next,
+ .stop = saved_cmdlines_stop,
+ .show = saved_cmdlines_show,
+};
+
+static int tracing_saved_cmdlines_open(struct inode *inode, struct file *filp)
+{
+ int ret;
+
+ ret = tracing_check_open_get_tr(NULL);
+ if (ret)
+ return ret;
+
+ return seq_open(filp, &tracing_saved_cmdlines_seq_ops);
+}
+
+const struct file_operations tracing_saved_cmdlines_fops = {
+ .open = tracing_saved_cmdlines_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+static ssize_t
+tracing_saved_cmdlines_size_read(struct file *filp, char __user *ubuf,
+ size_t cnt, loff_t *ppos)
+{
+ char buf[64];
+ int r;
+
+ preempt_disable();
+ arch_spin_lock(&trace_cmdline_lock);
+ r = scnprintf(buf, sizeof(buf), "%u\n", savedcmd->cmdline_num);
+ arch_spin_unlock(&trace_cmdline_lock);
+ preempt_enable();
+
+ return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+}
+
+void trace_free_saved_cmdlines_buffer(void)
+{
+ free_saved_cmdlines_buffer(savedcmd);
+}
+
+static int tracing_resize_saved_cmdlines(unsigned int val)
+{
+ struct saved_cmdlines_buffer *s, *savedcmd_temp;
+
+ s = allocate_cmdlines_buffer(val);
+ if (!s)
+ return -ENOMEM;
+
+ preempt_disable();
+ arch_spin_lock(&trace_cmdline_lock);
+ savedcmd_temp = savedcmd;
+ savedcmd = s;
+ arch_spin_unlock(&trace_cmdline_lock);
+ preempt_enable();
+ free_saved_cmdlines_buffer(savedcmd_temp);
+
+ return 0;
+}
+
+static ssize_t
+tracing_saved_cmdlines_size_write(struct file *filp, const char __user *ubuf,
+ size_t cnt, loff_t *ppos)
+{
+ unsigned long val;
+ int ret;
+
+ ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
+ if (ret)
+ return ret;
+
+ /* must have at least 1 entry and at most PID_MAX_DEFAULT entries */
+ if (!val || val > PID_MAX_DEFAULT)
+ return -EINVAL;
+
+ ret = tracing_resize_saved_cmdlines((unsigned int)val);
+ if (ret < 0)
+ return ret;
+
+ *ppos += cnt;
+
+ return cnt;
+}
+
+const struct file_operations tracing_saved_cmdlines_size_fops = {
+ .open = tracing_open_generic,
+ .read = tracing_saved_cmdlines_size_read,
+ .write = tracing_saved_cmdlines_size_write,
+};
--
2.43.0
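The saved_cmdlines code moved by this patch works as follows. trace_save_cmdline() folds a pid into map_pid_to_cmdline by masking it with PID_MAX_DEFAULT - 1, hands out cmdline slots round-robin, and records the full pid in map_cmdline_to_pid. __trace_find_cmdline() trusts a slot only if the recorded pid still matches and prints "<...>" otherwise. This detects both pid collisions after masking and slots that were recycled for another task.

The sketch below mirrors that logic in user-space with deliberately tiny, hypothetical sizes (PID_SLOTS = 8, NR_CMDLINES = 2) so that collisions and eviction are easy to see. It omits locking and the idle-task special case. All names are illustrative only.

```c
#include <limits.h>
#include <string.h>

#define PID_SLOTS     8   /* hypothetical; the kernel uses PID_MAX_DEFAULT */
#define NR_CMDLINES   2   /* hypothetical; the kernel defaults to 128 */
#define COMM_LEN      16
#define NO_MAP        UINT_MAX

static unsigned int map_pid_to_cmdline[PID_SLOTS];
static unsigned int map_cmdline_to_pid[NR_CMDLINES];
static char saved_comms[NR_CMDLINES][COMM_LEN];
static int cmdline_idx;

static void init_maps(void)
{
	/* Filling with 0xff bytes yields UINT_MAX (NO_MAP) in every element */
	memset(map_pid_to_cmdline, 0xff, sizeof(map_pid_to_cmdline));
	memset(map_cmdline_to_pid, 0xff, sizeof(map_cmdline_to_pid));
	cmdline_idx = 0;
}

static void save_cmdline(int pid, const char *comm)
{
	unsigned int tpid = pid & (PID_SLOTS - 1);
	unsigned int idx = map_pid_to_cmdline[tpid];

	if (idx == NO_MAP) {
		/* Hand out slots round-robin, overwriting the oldest one. */
		idx = (cmdline_idx + 1) % NR_CMDLINES;
		map_pid_to_cmdline[tpid] = idx;
		cmdline_idx = idx;
	}
	map_cmdline_to_pid[idx] = pid;
	strncpy(saved_comms[idx], comm, COMM_LEN - 1);
	saved_comms[idx][COMM_LEN - 1] = '\0';
}

static const char *find_cmdline(int pid)
{
	unsigned int tpid = pid & (PID_SLOTS - 1);
	unsigned int idx = map_pid_to_cmdline[tpid];

	/* Only trust the slot if it still belongs to this exact pid. */
	if (idx != NO_MAP && map_cmdline_to_pid[idx] == (unsigned int)pid)
		return saved_comms[idx];
	return "<...>";
}
```

In this sketch, pid 108 masks to the same slot as pid 100 but is reported as "<...>". Saving a third pid recycles the slot that held pid 100, so pid 100 also becomes "<...>".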
* [for-next][PATCH 06/11] ring-buffer: Zero ring-buffer sub-buffers
2024-02-21 14:07 [for-next][PATCH 00/11] tracing: Updates for v6.9 Steven Rostedt
2024-02-21 14:08 ` [for-next][PATCH 05/11] tracing: Move saved_cmdline code into trace_sched_switch.c Steven Rostedt
@ 2024-02-21 14:08 ` Steven Rostedt
2024-02-21 14:08 ` [for-next][PATCH 07/11] ring-buffer: Introducing ring-buffer mapping functions Steven Rostedt
10 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2024-02-21 14:08 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Vincent Donnefort
From: Vincent Donnefort <vdonnefort@google.com>
In preparation for the ring-buffer memory mapping, where each subbuf will
be accessible to user-space, zero all the page allocations.
Link: https://lore.kernel.org/linux-trace-kernel/20240220202310.2489614-2-vdonnefort@google.com
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index fd4bfe3ecf01..ca796675c0a1 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1472,7 +1472,8 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
list_add(&bpage->list, pages);
- page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu), mflags,
+ page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
+ mflags | __GFP_ZERO,
cpu_buffer->buffer->subbuf_order);
if (!page)
goto free_pages;
@@ -1557,7 +1558,8 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
cpu_buffer->reader_page = bpage;
- page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL, cpu_buffer->buffer->subbuf_order);
+ page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL | __GFP_ZERO,
+ cpu_buffer->buffer->subbuf_order);
if (!page)
goto fail_free_reader;
bpage->page = page_address(page);
@@ -5525,7 +5527,8 @@ ring_buffer_alloc_read_page(struct trace_buffer *buffer, int cpu)
if (bpage->data)
goto out;
- page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL | __GFP_NORETRY,
+ page = alloc_pages_node(cpu_to_node(cpu),
+ GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO,
cpu_buffer->buffer->subbuf_order);
if (!page) {
kfree(bpage);
--
2.43.0
* [for-next][PATCH 07/11] ring-buffer: Introducing ring-buffer mapping functions
2024-02-21 14:07 [for-next][PATCH 00/11] tracing: Updates for v6.9 Steven Rostedt
2024-02-21 14:08 ` [for-next][PATCH 06/11] ring-buffer: Zero ring-buffer sub-buffers Steven Rostedt
@ 2024-02-21 14:08 ` Steven Rostedt
2024-02-21 14:08 ` [for-next][PATCH 08/11] tracing: Add snapshot refcount Steven Rostedt
10 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2024-02-21 14:08 UTC (permalink / raw)
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Vincent Donnefort
From: Vincent Donnefort <vdonnefort@google.com>
In preparation for allowing user-space to map a ring-buffer, add
a set of mapping functions:
ring_buffer_{map,unmap}()
ring_buffer_map_fault()
And controls on the ring-buffer:
ring_buffer_map_get_reader() /* swap reader and head */
Mapping the ring-buffer also involves:
A unique ID for each subbuf of the ring-buffer; currently subbufs are
only identified through their in-kernel VA.
A meta-page, which stores ring-buffer statistics and a
description of the current reader.
The linear mapping exposes the meta-page and each subbuf of the
ring-buffer, ordered by the unique ID assigned during the
first mapping.
Once mapped, no subbuf can get in or out of the ring-buffer: the buffer
size remains unchanged, and the splice-enabling functions simply
memcpy the data instead of swapping subbufs.
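The ID ordering described above follows rb_setup_ids_meta_page() and the layout comment above ring_buffer_map_fault() later in this patch. The reader page gets ID 0, the ring is then walked from the head page, and each subsequent subbuf gets the next ID. In the linear mapping, page offset 0 holds the meta-page and sub-buffer `id` starts at page offset 1 + id * (1 << subbuf_order).

The sketch below models the ring as a hypothetical circular list (`struct subbuf`). `assign_ids()` and `subbuf_pgoff()` are illustrative helpers, not kernel APIs.

```c
#include <stddef.h>

/* Hypothetical model of a buffer_page on the circular ring. */
struct subbuf {
	struct subbuf *next;
	unsigned int id;
};

/* Reader gets ID 0; ring pages get 1..n in order starting from the head. */
static unsigned int assign_ids(struct subbuf *reader, struct subbuf *head)
{
	unsigned int id = 0;
	struct subbuf *p = head;

	reader->id = id++;
	do {
		p->id = id++;
		p = p->next;
	} while (p != head);

	return id; /* number of sub-buffers, reader included (nr_subbufs) */
}

/* pgoff 0 is the meta-page; each sub-buffer spans (1 << order) pages. */
static unsigned long subbuf_pgoff(unsigned int id, unsigned int order)
{
	return 1 + ((unsigned long)id << order);
}
```

To get a byte offset within the mapping, multiply the page offset by the page size. For example, with order 2 (four pages per sub-buffer), sub-buffer 1 starts at page offset 5.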
Link: https://lore.kernel.org/linux-trace-kernel/20240220202310.2489614-3-vdonnefort@google.com
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
include/linux/ring_buffer.h | 7 +
include/uapi/linux/trace_mmap.h | 46 ++++
kernel/trace/ring_buffer.c | 376 +++++++++++++++++++++++++++++++-
3 files changed, 426 insertions(+), 3 deletions(-)
create mode 100644 include/uapi/linux/trace_mmap.h
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index fa802db216f9..0841ba8bab14 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -6,6 +6,8 @@
#include <linux/seq_file.h>
#include <linux/poll.h>
+#include <uapi/linux/trace_mmap.h>
+
struct trace_buffer;
struct ring_buffer_iter;
@@ -221,4 +223,9 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node);
#define trace_rb_cpu_prepare NULL
#endif
+int ring_buffer_map(struct trace_buffer *buffer, int cpu);
+int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
+struct page *ring_buffer_map_fault(struct trace_buffer *buffer, int cpu,
+ unsigned long pgoff);
+int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
#endif /* _LINUX_RING_BUFFER_H */
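Judging from their implementation later in this patch, the ring_buffer_map()/ring_buffer_unmap() pair declared above is reference counted per CPU buffer. The first map allocates the meta-page and the subbuf IDs, and later maps only increment `mapped`, failing with -EBUSY at UINT_MAX. An unmap of an unmapped buffer returns -ENODEV, and only the last unmap tears everything down.

The sketch below models just that counting. It deliberately ignores mapping_lock, buffer->mutex, reader_lock and resize_disabled. `struct map_state`, `map_get()` and `map_put()` are hypothetical names.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical per-CPU mapping state; the kernel keeps `mapped` in
 * struct ring_buffer_per_cpu and serializes it with mapping_lock. */
struct map_state {
	unsigned int mapped;
	bool meta_allocated;
};

static int map_get(struct map_state *s)
{
	if (s->mapped) {
		/* Fast path: already set up, just take another reference. */
		if (s->mapped == UINT_MAX)
			return -EBUSY;
		s->mapped++;
		return 0;
	}
	/* Slow path: the first mapping sets up the meta-page and IDs. */
	s->meta_allocated = true;
	s->mapped = 1;
	return 0;
}

static int map_put(struct map_state *s)
{
	if (!s->mapped)
		return -ENODEV;
	if (s->mapped > 1) {
		s->mapped--;
		return 0;
	}
	/* Last user: tear down the meta-page and IDs. */
	s->mapped = 0;
	s->meta_allocated = false;
	return 0;
}
```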
diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
new file mode 100644
index 000000000000..ffcd8dfcaa4f
--- /dev/null
+++ b/include/uapi/linux/trace_mmap.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _TRACE_MMAP_H_
+#define _TRACE_MMAP_H_
+
+#include <linux/types.h>
+
+/**
+ * struct trace_buffer_meta - Ring-buffer Meta-page description
+ * @meta_page_size: Size of this meta-page.
+ * @meta_struct_len: Size of this structure.
+ * @subbuf_size: Size of each sub-buffer.
+ * @nr_subbufs: Number of subbufs in the ring-buffer, including the reader.
+ * @reader.lost_events: Number of events lost at the time of the reader swap.
+ * @reader.id: subbuf ID of the current reader. ID range [0 : @nr_subbufs - 1]
+ * @reader.read: Number of bytes read on the reader subbuf.
+ * @flags: Placeholder for now, 0 until new features are supported.
+ * @entries: Number of entries in the ring-buffer.
+ * @overrun: Number of entries lost in the ring-buffer.
+ * @read: Number of entries that have been read.
+ * @Reserved1: Reserved for future use.
+ * @Reserved2: Reserved for future use.
+ */
+struct trace_buffer_meta {
+ __u32 meta_page_size;
+ __u32 meta_struct_len;
+
+ __u32 subbuf_size;
+ __u32 nr_subbufs;
+
+ struct {
+ __u64 lost_events;
+ __u32 id;
+ __u32 read;
+ } reader;
+
+ __u64 flags;
+
+ __u64 entries;
+ __u64 overrun;
+ __u64 read;
+
+ __u64 Reserved1;
+ __u64 Reserved2;
+};
+
+#endif /* _TRACE_MMAP_H_ */
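struct trace_buffer_meta is shared with user-space, so a consumer built with a different field layout would silently misread it. The sketch below is a local mirror that uses <stdint.h> types, assuming <linux/trace_mmap.h> is not installed on the build host. It pins the field offsets with _Static_assert and computes the byte offset of the current reader sub-buffer in the mapping.

The offset formula assumes that meta_page_size is the size of the leading meta-page and that subbuf_size spans a whole sub-buffer including its header. That reading is suggested by the `subbuf_size + BUF_PAGE_HDR_SIZE` assignment in rb_setup_ids_meta_page() later in this patch. `reader_offset()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the UAPI struct; field order and widths copied from
 * include/uapi/linux/trace_mmap.h above. */
struct trace_buffer_meta {
	uint32_t meta_page_size;
	uint32_t meta_struct_len;
	uint32_t subbuf_size;
	uint32_t nr_subbufs;
	struct {
		uint64_t lost_events;
		uint32_t id;
		uint32_t read;
	} reader;
	uint64_t flags;
	uint64_t entries;
	uint64_t overrun;
	uint64_t read;
	uint64_t Reserved1;
	uint64_t Reserved2;
};

/* Every field already sits at its natural alignment, so there is no
 * implicit padding and the layout is the same on 32- and 64-bit ABIs. */
_Static_assert(offsetof(struct trace_buffer_meta, reader.id) == 24, "reader.id");
_Static_assert(offsetof(struct trace_buffer_meta, flags) == 32, "flags");
_Static_assert(sizeof(struct trace_buffer_meta) == 80, "size");

/* Byte offset of the reader sub-buffer: the meta-page comes first, then
 * the sub-buffers ordered by ID. */
static uint64_t reader_offset(const struct trace_buffer_meta *meta)
{
	return meta->meta_page_size +
	       (uint64_t)meta->reader.id * meta->subbuf_size;
}
```

The kernel fills in meta_struct_len, which would let a consumer detect a kernel exposing a larger struct than the one it was built against. This sketch does not attempt that negotiation.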
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index ca796675c0a1..1d7d7a701867 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -9,6 +9,7 @@
#include <linux/ring_buffer.h>
#include <linux/trace_clock.h>
#include <linux/sched/clock.h>
+#include <linux/cacheflush.h>
#include <linux/trace_seq.h>
#include <linux/spinlock.h>
#include <linux/irq_work.h>
@@ -338,6 +339,7 @@ struct buffer_page {
local_t entries; /* entries on this page */
unsigned long real_end; /* real end of data */
unsigned order; /* order of the page */
+ u32 id; /* ID for external mapping */
struct buffer_data_page *page; /* Actual data page */
};
@@ -484,6 +486,12 @@ struct ring_buffer_per_cpu {
u64 read_stamp;
/* pages removed since last reset */
unsigned long pages_removed;
+
+ unsigned int mapped;
+ struct mutex mapping_lock;
+ unsigned long *subbuf_ids; /* ID to subbuf addr */
+ struct trace_buffer_meta *meta_page;
+
/* ring buffer pages to update, > 0 to add, < 0 to remove */
long nr_pages_to_update;
struct list_head new_pages; /* new pages to add */
@@ -1548,6 +1556,7 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
init_irq_work(&cpu_buffer->irq_work.work, rb_wake_up_waiters);
init_waitqueue_head(&cpu_buffer->irq_work.waiters);
init_waitqueue_head(&cpu_buffer->irq_work.full_waiters);
+ mutex_init(&cpu_buffer->mapping_lock);
bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
GFP_KERNEL, cpu_to_node(cpu));
@@ -1738,8 +1747,6 @@ bool ring_buffer_time_stamp_abs(struct trace_buffer *buffer)
return buffer->time_stamp_abs;
}
-static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer);
-
static inline unsigned long rb_page_entries(struct buffer_page *bpage)
{
return local_read(&bpage->entries) & RB_WRITE_MASK;
@@ -5160,6 +5167,22 @@ static void rb_clear_buffer_page(struct buffer_page *page)
page->read = 0;
}
+static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
+{
+ struct trace_buffer_meta *meta = cpu_buffer->meta_page;
+
+ meta->reader.read = cpu_buffer->reader_page->read;
+ meta->reader.id = cpu_buffer->reader_page->id;
+ meta->reader.lost_events = cpu_buffer->lost_events;
+
+ meta->entries = local_read(&cpu_buffer->entries);
+ meta->overrun = local_read(&cpu_buffer->overrun);
+ meta->read = cpu_buffer->read;
+
+ /* Some archs do not have data cache coherency between kernel and user-space */
+ flush_dcache_folio(virt_to_folio(cpu_buffer->meta_page));
+}
+
static void
rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
{
@@ -5204,6 +5227,9 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
cpu_buffer->lost_events = 0;
cpu_buffer->last_overrun = 0;
+ if (cpu_buffer->mapped)
+ rb_update_meta_page(cpu_buffer);
+
rb_head_page_activate(cpu_buffer);
cpu_buffer->pages_removed = 0;
}
@@ -5418,6 +5444,12 @@ int ring_buffer_swap_cpu(struct trace_buffer *buffer_a,
cpu_buffer_a = buffer_a->buffers[cpu];
cpu_buffer_b = buffer_b->buffers[cpu];
+ /* It's up to the callers to not try to swap mapped buffers */
+ if (WARN_ON_ONCE(cpu_buffer_a->mapped || cpu_buffer_b->mapped)) {
+ ret = -EBUSY;
+ goto out;
+ }
+
/* At least make sure the two buffers are somewhat the same */
if (cpu_buffer_a->nr_pages != cpu_buffer_b->nr_pages)
goto out;
@@ -5682,7 +5714,8 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
* Otherwise, we can simply swap the page with the one passed in.
*/
if (read || (len < (commit - read)) ||
- cpu_buffer->reader_page == cpu_buffer->commit_page) {
+ cpu_buffer->reader_page == cpu_buffer->commit_page ||
+ cpu_buffer->mapped) {
struct buffer_data_page *rpage = cpu_buffer->reader_page->page;
unsigned int rpos = read;
unsigned int pos = 0;
@@ -5901,6 +5934,11 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
cpu_buffer = buffer->buffers[cpu];
+ if (cpu_buffer->mapped) {
+ err = -EBUSY;
+ goto error;
+ }
+
/* Update the number of pages to match the new size */
nr_pages = old_size * buffer->buffers[cpu]->nr_pages;
nr_pages = DIV_ROUND_UP(nr_pages, buffer->subbuf_size);
@@ -6002,6 +6040,338 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
}
EXPORT_SYMBOL_GPL(ring_buffer_subbuf_order_set);
+#define subbuf_page(off, start) \
+ virt_to_page((void *)(start + (off << PAGE_SHIFT)))
+
+#define foreach_subbuf_page(sub_order, start, page) \
+ page = subbuf_page(0, (start)); \
+ for (int __off = 0; __off < (1 << (sub_order)); \
+ __off++, page = subbuf_page(__off, (start)))
+
+static inline void subbuf_map_prepare(unsigned long subbuf_start, int order)
+{
+ struct page *page;
+
+ /*
+ * When allocating order > 0 pages, only the first struct page has a
+ * non-zero refcount. Increasing the refcount here ensures none of the
+ * struct pages composing the sub-buffer are freed when the mapping is closed.
+ */
+ foreach_subbuf_page(order, subbuf_start, page)
+ page_ref_inc(page);
+}
+
+static inline void subbuf_unmap(unsigned long subbuf_start, int order)
+{
+ struct page *page;
+
+ foreach_subbuf_page(order, subbuf_start, page) {
+ page_ref_dec(page);
+ page->mapping = NULL;
+ }
+}
+
+static void rb_free_subbuf_ids(struct ring_buffer_per_cpu *cpu_buffer)
+{
+ int sub_id;
+
+ for (sub_id = 0; sub_id < cpu_buffer->nr_pages + 1; sub_id++)
+ subbuf_unmap(cpu_buffer->subbuf_ids[sub_id],
+ cpu_buffer->buffer->subbuf_order);
+
+ kfree(cpu_buffer->subbuf_ids);
+ cpu_buffer->subbuf_ids = NULL;
+}
+
+static int rb_alloc_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
+{
+ struct page *page;
+
+ if (cpu_buffer->meta_page)
+ return 0;
+
+ page = alloc_page(GFP_USER | __GFP_ZERO);
+ if (!page)
+ return -ENOMEM;
+
+ cpu_buffer->meta_page = page_to_virt(page);
+
+ return 0;
+}
+
+static void rb_free_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
+{
+ unsigned long addr = (unsigned long)cpu_buffer->meta_page;
+
+ if (!addr)
+ return;
+
+ virt_to_page((void *)addr)->mapping = NULL;
+ free_page(addr);
+ cpu_buffer->meta_page = NULL;
+}
+
+static void rb_setup_ids_meta_page(struct ring_buffer_per_cpu *cpu_buffer,
+ unsigned long *subbuf_ids)
+{
+ struct trace_buffer_meta *meta = cpu_buffer->meta_page;
+ unsigned int nr_subbufs = cpu_buffer->nr_pages + 1;
+ struct buffer_page *first_subbuf, *subbuf;
+ int id = 0;
+
+ subbuf_ids[id] = (unsigned long)cpu_buffer->reader_page->page;
+ subbuf_map_prepare(subbuf_ids[id], cpu_buffer->buffer->subbuf_order);
+ cpu_buffer->reader_page->id = id++;
+
+ first_subbuf = subbuf = rb_set_head_page(cpu_buffer);
+ do {
+ if (WARN_ON(id >= nr_subbufs))
+ break;
+
+ subbuf_ids[id] = (unsigned long)subbuf->page;
+ subbuf->id = id;
+ subbuf_map_prepare(subbuf_ids[id], cpu_buffer->buffer->subbuf_order);
+
+ rb_inc_page(&subbuf);
+ id++;
+ } while (subbuf != first_subbuf);
+
+ /* install subbuf ID to kern VA translation */
+ cpu_buffer->subbuf_ids = subbuf_ids;
+
+ meta->meta_page_size = PAGE_SIZE;
+ meta->meta_struct_len = sizeof(*meta);
+ meta->nr_subbufs = nr_subbufs;
+ meta->subbuf_size = cpu_buffer->buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
+
+ rb_update_meta_page(cpu_buffer);
+}
+
+static inline struct ring_buffer_per_cpu *
+rb_get_mapped_buffer(struct trace_buffer *buffer, int cpu)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+
+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return ERR_PTR(-EINVAL);
+
+ cpu_buffer = buffer->buffers[cpu];
+
+ mutex_lock(&cpu_buffer->mapping_lock);
+
+ if (!cpu_buffer->mapped) {
+ mutex_unlock(&cpu_buffer->mapping_lock);
+ return ERR_PTR(-ENODEV);
+ }
+
+ return cpu_buffer;
+}
+
+static inline void rb_put_mapped_buffer(struct ring_buffer_per_cpu *cpu_buffer)
+{
+ mutex_unlock(&cpu_buffer->mapping_lock);
+}
+
+/*
+ * Fast-path for ring_buffer_(un)map(). Called whenever the meta-page doesn't need
+ * to be set up or torn down.
+ */
+static int __rb_inc_dec_mapped(struct trace_buffer *buffer,
+ struct ring_buffer_per_cpu *cpu_buffer,
+ bool inc)
+{
+ unsigned long flags;
+
+ lockdep_assert_held(&cpu_buffer->mapping_lock);
+
+ if (inc && cpu_buffer->mapped == UINT_MAX)
+ return -EBUSY;
+
+ if (WARN_ON(!inc && cpu_buffer->mapped == 0))
+ return -EINVAL;
+
+ mutex_lock(&buffer->mutex);
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+
+ if (inc)
+ cpu_buffer->mapped++;
+ else
+ cpu_buffer->mapped--;
+
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+ mutex_unlock(&buffer->mutex);
+
+ return 0;
+}
+
+int ring_buffer_map(struct trace_buffer *buffer, int cpu)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ unsigned long flags, *subbuf_ids;
+ int err = 0;
+
+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return -EINVAL;
+
+ cpu_buffer = buffer->buffers[cpu];
+
+ mutex_lock(&cpu_buffer->mapping_lock);
+
+ if (cpu_buffer->mapped) {
+ err = __rb_inc_dec_mapped(buffer, cpu_buffer, true);
+ mutex_unlock(&cpu_buffer->mapping_lock);
+ return err;
+ }
+
+ /* prevent another thread from changing buffer/sub-buffer sizes */
+ mutex_lock(&buffer->mutex);
+
+ err = rb_alloc_meta_page(cpu_buffer);
+ if (err)
+ goto unlock;
+
+ /* subbuf_ids include the reader while nr_pages does not */
+ subbuf_ids = kcalloc(cpu_buffer->nr_pages + 1, sizeof(*subbuf_ids), GFP_KERNEL);
+ if (!subbuf_ids) {
+ rb_free_meta_page(cpu_buffer);
+ err = -ENOMEM;
+ goto unlock;
+ }
+
+ atomic_inc(&cpu_buffer->resize_disabled);
+
+ /*
+ * Lock all readers to block any subbuf swap until the subbuf IDs are
+ * assigned.
+ */
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+
+ rb_setup_ids_meta_page(cpu_buffer, subbuf_ids);
+ cpu_buffer->mapped = 1;
+
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+unlock:
+ mutex_unlock(&buffer->mutex);
+ mutex_unlock(&cpu_buffer->mapping_lock);
+
+ return err;
+}
+
+int ring_buffer_unmap(struct trace_buffer *buffer, int cpu)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ unsigned long flags;
+ int err = 0;
+
+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return -EINVAL;
+
+ cpu_buffer = buffer->buffers[cpu];
+
+ mutex_lock(&cpu_buffer->mapping_lock);
+
+ if (!cpu_buffer->mapped) {
+ err = -ENODEV;
+ goto out;
+ } else if (cpu_buffer->mapped > 1) {
+ __rb_inc_dec_mapped(buffer, cpu_buffer, false);
+ goto out;
+ }
+
+ mutex_lock(&buffer->mutex);
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+
+ cpu_buffer->mapped = 0;
+
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+
+ rb_free_subbuf_ids(cpu_buffer);
+ rb_free_meta_page(cpu_buffer);
+ atomic_dec(&cpu_buffer->resize_disabled);
+
+ mutex_unlock(&buffer->mutex);
+out:
+ mutex_unlock(&cpu_buffer->mapping_lock);
+
+ return err;
+}
+
+/*
+ * +--------------+ pgoff == 0
+ * | meta page |
+ * +--------------+ pgoff == 1
+ * | subbuffer 0 |
+ * +--------------+ pgoff == 1 + (1 << subbuf_order)
+ * | subbuffer 1 |
+ * ...
+ */
+struct page *ring_buffer_map_fault(struct trace_buffer *buffer, int cpu,
+ unsigned long pgoff)
+{
+ struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
+ unsigned long subbuf_id, subbuf_offset, addr;
+ struct page *page;
+
+ if (!pgoff)
+ return virt_to_page((void *)cpu_buffer->meta_page);
+
+ pgoff--;
+
+ subbuf_id = pgoff >> buffer->subbuf_order;
+ if (subbuf_id > cpu_buffer->nr_pages)
+ return NULL;
+
+ subbuf_offset = pgoff & ((1UL << buffer->subbuf_order) - 1);
+ addr = cpu_buffer->subbuf_ids[subbuf_id] + (subbuf_offset * PAGE_SIZE);
+ page = virt_to_page((void *)addr);
+
+ return page;
+}
+
+int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ unsigned long reader_size;
+ unsigned long flags;
+
+ cpu_buffer = rb_get_mapped_buffer(buffer, cpu);
+ if (IS_ERR(cpu_buffer))
+ return (int)PTR_ERR(cpu_buffer);
+
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+consume:
+ if (rb_per_cpu_empty(cpu_buffer))
+ goto out;
+
+ reader_size = rb_page_size(cpu_buffer->reader_page);
+
+ /*
+ * There are data to be read on the current reader page, we can
+ * return to the caller. But before that, we assume the latter will read
+ * everything. Let's update the kernel reader accordingly.
+ */
+ if (cpu_buffer->reader_page->read < reader_size) {
+ while (cpu_buffer->reader_page->read < reader_size)
+ rb_advance_reader(cpu_buffer);
+ goto out;
+ }
+
+ if (WARN_ON(!rb_get_reader_page(cpu_buffer)))
+ goto out;
+
+ goto consume;
+out:
+ /* Some archs do not have data cache coherency between kernel and user-space */
+ flush_dcache_folio(virt_to_folio(cpu_buffer->reader_page->page));
+
+ rb_update_meta_page(cpu_buffer);
+
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+ rb_put_mapped_buffer(cpu_buffer);
+
+ return 0;
+}
+
/*
* We only allocate new buffers, never free them if the CPU goes down.
* If we were to free the buffer, then the user would lose any trace that was in
--
2.43.0
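The page layout drawn in the comment above ring_buffer_map_fault() in this patch can be checked with a small user-space model. This is a minimal sketch that mirrors only the index arithmetic, not the locking or the virt_to_page() lookup. The helper names `pgoff_to_subbuf()` and `subbuf_to_pgoff()` and `struct subbuf_loc` are hypothetical. Because nr_pages excludes the reader page, valid sub-buffer IDs run from 0 to nr_pages inclusive, which is why subbuf_ids is allocated with nr_pages + 1 entries.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space mirror of the arithmetic in
 * ring_buffer_map_fault(): pgoff 0 is the meta-page, then each
 * sub-buffer spans (1 << subbuf_order) pages.
 */
struct subbuf_loc {
	bool is_meta;              /* true for pgoff 0 */
	unsigned long subbuf_id;   /* index into subbuf_ids[] */
	unsigned long page_in_sub; /* page index inside the sub-buffer */
};

/*
 * nr_pages excludes the reader page, so valid sub-buffer IDs are
 * 0..nr_pages inclusive. Returns false for an out-of-range pgoff.
 */
static bool pgoff_to_subbuf(unsigned long pgoff, unsigned int subbuf_order,
			    unsigned long nr_pages, struct subbuf_loc *loc)
{
	assert(subbuf_order < 8 * sizeof(unsigned long));

	loc->is_meta = (pgoff == 0);
	loc->subbuf_id = 0;
	loc->page_in_sub = 0;
	if (loc->is_meta)
		return true;

	pgoff--; /* skip the meta-page */
	loc->subbuf_id = pgoff >> subbuf_order;
	if (loc->subbuf_id > nr_pages)
		return false;

	loc->page_in_sub = pgoff & ((1UL << subbuf_order) - 1);
	return true;
}

/* Inverse: page offset of the first page of a given sub-buffer. */
static unsigned long subbuf_to_pgoff(unsigned long subbuf_id,
				     unsigned int subbuf_order)
{
	return 1 + (subbuf_id << subbuf_order);
}
```

For example, with order-1 sub-buffers (two pages each) and nr_pages = 3, page offsets 1 to 8 map to sub-buffers 0 to 3, while pgoff 9 is out of range. That case corresponds to the NULL return in ring_buffer_map_fault(), which tracing_buffers_mmap_fault() in patch 09 turns into VM_FAULT_SIGBUS.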
* [for-next][PATCH 08/11] tracing: Add snapshot refcount
2024-02-21 14:07 [for-next][PATCH 00/11] tracing: Updates for v6.9 Steven Rostedt
` (6 preceding siblings ...)
2024-02-21 14:08 ` [for-next][PATCH 07/11] ring-buffer: Introducing ring-buffer mapping functions Steven Rostedt
@ 2024-02-21 14:08 ` Steven Rostedt
2024-02-21 14:08 ` [for-next][PATCH 09/11] tracing: Allow user-space mapping of the ring-buffer Steven Rostedt
` (2 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2024-02-21 14:08 UTC
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Vincent Donnefort
From: Vincent Donnefort <vdonnefort@google.com>
When a ring-buffer is memory mapped by user-space, no trace or
ring-buffer swap is possible. This means the snapshot feature is
mutually exclusive with the memory mapping. Having a refcount on
snapshot users makes it possible to know whether a mapping is allowed.
Instead of relying on the global trace_types_lock, a new spinlock is
introduced to serialize accesses to trace_array->snapshot. This is
intended to allow access to that variable in a context where the mmap
lock is already held.
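The mutual exclusion described above can be modelled in user-space as two counters guarded by one lock. This is a minimal sketch, assuming a pthread mutex in place of the kernel's snapshot_trigger_lock spinlock (the kernel uses a dedicated spinlock because, per the comment in get_snapshot_map(), that path runs with mmap_lock held). `struct snap_state` and the `snap_*()` helpers are hypothetical. They loosely follow tracing_arm_snapshot_locked() and get_snapshot_map() as they stand after patch 09, which adds the `tr->mapped` check on the arming side.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <pthread.h>

/* Hypothetical model of the two counters kept in struct trace_array. */
struct snap_state {
	pthread_mutex_t lock;  /* stands in for snapshot_trigger_lock */
	unsigned int snapshot; /* armed snapshot users */
	unsigned int mapped;   /* live user-space mappings */
};

/* Arm a snapshot user: refused while mapped or if the count would overflow. */
static int snap_arm(struct snap_state *s)
{
	int ret = 0;

	pthread_mutex_lock(&s->lock);
	if (s->snapshot == UINT_MAX || s->mapped)
		ret = -EBUSY;
	else
		s->snapshot++;
	pthread_mutex_unlock(&s->lock);
	return ret;
}

static void snap_disarm(struct snap_state *s)
{
	pthread_mutex_lock(&s->lock);
	assert(s->snapshot > 0); /* the kernel uses WARN_ON() instead */
	if (s->snapshot)
		s->snapshot--;
	pthread_mutex_unlock(&s->lock);
}

/* Add a mapping: refused while any snapshot user is armed. */
static int snap_map(struct snap_state *s)
{
	int ret = 0;

	pthread_mutex_lock(&s->lock);
	if (s->snapshot || s->mapped == UINT_MAX)
		ret = -EBUSY;
	else
		s->mapped++;
	pthread_mutex_unlock(&s->lock);
	return ret;
}

static void snap_unmap(struct snap_state *s)
{
	pthread_mutex_lock(&s->lock);
	assert(s->mapped > 0); /* the kernel uses WARN_ON() instead */
	if (s->mapped)
		s->mapped--;
	pthread_mutex_unlock(&s->lock);
}
```

Checking both counters under the same lock is what makes the exclusion reliable: an arm and a map racing on different CPUs cannot both observe the other counter as zero.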
Link: https://lore.kernel.org/linux-trace-kernel/20240220202310.2489614-4-vdonnefort@google.com
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace.c | 99 ++++++++++++++++++++++++-----
kernel/trace/trace.h | 8 ++-
kernel/trace/trace_events_trigger.c | 58 ++++++++++++-----
3 files changed, 129 insertions(+), 36 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 50fab999e72e..2b922a79c553 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1300,6 +1300,50 @@ static void free_snapshot(struct trace_array *tr)
tr->allocated_snapshot = false;
}
+static int tracing_arm_snapshot_locked(struct trace_array *tr)
+{
+ int ret;
+
+ lockdep_assert_held(&trace_types_lock);
+
+ spin_lock(&tr->snapshot_trigger_lock);
+ if (tr->snapshot == UINT_MAX) {
+ spin_unlock(&tr->snapshot_trigger_lock);
+ return -EBUSY;
+ }
+
+ tr->snapshot++;
+ spin_unlock(&tr->snapshot_trigger_lock);
+
+ ret = tracing_alloc_snapshot_instance(tr);
+ if (ret) {
+ spin_lock(&tr->snapshot_trigger_lock);
+ tr->snapshot--;
+ spin_unlock(&tr->snapshot_trigger_lock);
+ }
+
+ return ret;
+}
+
+int tracing_arm_snapshot(struct trace_array *tr)
+{
+ int ret;
+
+ mutex_lock(&trace_types_lock);
+ ret = tracing_arm_snapshot_locked(tr);
+ mutex_unlock(&trace_types_lock);
+
+ return ret;
+}
+
+void tracing_disarm_snapshot(struct trace_array *tr)
+{
+ spin_lock(&tr->snapshot_trigger_lock);
+ if (!WARN_ON(!tr->snapshot))
+ tr->snapshot--;
+ spin_unlock(&tr->snapshot_trigger_lock);
+}
+
/**
* tracing_alloc_snapshot - allocate snapshot buffer.
*
@@ -1373,10 +1417,6 @@ int tracing_snapshot_cond_enable(struct trace_array *tr, void *cond_data,
mutex_lock(&trace_types_lock);
- ret = tracing_alloc_snapshot_instance(tr);
- if (ret)
- goto fail_unlock;
-
if (tr->current_trace->use_max_tr) {
ret = -EBUSY;
goto fail_unlock;
@@ -1395,6 +1435,10 @@ int tracing_snapshot_cond_enable(struct trace_array *tr, void *cond_data,
goto fail_unlock;
}
+ ret = tracing_arm_snapshot_locked(tr);
+ if (ret)
+ goto fail_unlock;
+
local_irq_disable();
arch_spin_lock(&tr->max_lock);
tr->cond_snapshot = cond_snapshot;
@@ -1439,6 +1483,8 @@ int tracing_snapshot_cond_disable(struct trace_array *tr)
arch_spin_unlock(&tr->max_lock);
local_irq_enable();
+ tracing_disarm_snapshot(tr);
+
return ret;
}
EXPORT_SYMBOL_GPL(tracing_snapshot_cond_disable);
@@ -1481,6 +1527,7 @@ int tracing_snapshot_cond_disable(struct trace_array *tr)
}
EXPORT_SYMBOL_GPL(tracing_snapshot_cond_disable);
#define free_snapshot(tr) do { } while (0)
+#define tracing_arm_snapshot_locked(tr) ({ -EBUSY; })
#endif /* CONFIG_TRACER_SNAPSHOT */
void tracer_tracing_off(struct trace_array *tr)
@@ -6092,11 +6139,12 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
*/
synchronize_rcu();
free_snapshot(tr);
+ tracing_disarm_snapshot(tr);
}
- if (t->use_max_tr && !tr->allocated_snapshot) {
- ret = tracing_alloc_snapshot_instance(tr);
- if (ret < 0)
+ if (t->use_max_tr) {
+ ret = tracing_arm_snapshot_locked(tr);
+ if (ret)
goto out;
}
#else
@@ -6105,8 +6153,13 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
if (t->init) {
ret = tracer_init(t, tr);
- if (ret)
+ if (ret) {
+#ifdef CONFIG_TRACER_MAX_TRACE
+ if (t->use_max_tr)
+ tracing_disarm_snapshot(tr);
+#endif
goto out;
+ }
}
tr->current_trace = t;
@@ -7208,10 +7261,11 @@ tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt,
if (tr->allocated_snapshot)
ret = resize_buffer_duplicate_size(&tr->max_buffer,
&tr->array_buffer, iter->cpu_file);
- else
- ret = tracing_alloc_snapshot_instance(tr);
- if (ret < 0)
+
+ ret = tracing_arm_snapshot_locked(tr);
+ if (ret)
break;
+
/* Now, we're going to swap */
if (iter->cpu_file == RING_BUFFER_ALL_CPUS) {
local_irq_disable();
@@ -7221,6 +7275,7 @@ tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt,
smp_call_function_single(iter->cpu_file, tracing_swap_cpu_buffer,
(void *)tr, 1);
}
+ tracing_disarm_snapshot(tr);
break;
default:
if (tr->allocated_snapshot) {
@@ -8345,8 +8400,13 @@ ftrace_trace_snapshot_callback(struct trace_array *tr, struct ftrace_hash *hash,
ops = param ? &snapshot_count_probe_ops : &snapshot_probe_ops;
- if (glob[0] == '!')
- return unregister_ftrace_function_probe_func(glob+1, tr, ops);
+ if (glob[0] == '!') {
+ ret = unregister_ftrace_function_probe_func(glob+1, tr, ops);
+ if (!ret)
+ tracing_disarm_snapshot(tr);
+
+ return ret;
+ }
if (!param)
goto out_reg;
@@ -8365,12 +8425,13 @@ ftrace_trace_snapshot_callback(struct trace_array *tr, struct ftrace_hash *hash,
return ret;
out_reg:
- ret = tracing_alloc_snapshot_instance(tr);
+ ret = tracing_arm_snapshot(tr);
if (ret < 0)
goto out;
ret = register_ftrace_function_probe(glob, tr, ops, count);
-
+ if (ret < 0)
+ tracing_disarm_snapshot(tr);
out:
return ret < 0 ? ret : 0;
}
@@ -9177,7 +9238,9 @@ trace_array_create_systems(const char *name, const char *systems)
raw_spin_lock_init(&tr->start_lock);
tr->max_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
-
+#ifdef CONFIG_TRACER_MAX_TRACE
+ spin_lock_init(&tr->snapshot_trigger_lock);
+#endif
tr->current_trace = &nop_trace;
INIT_LIST_HEAD(&tr->systems);
@@ -10147,7 +10210,9 @@ __init static int tracer_alloc_buffers(void)
global_trace.current_trace = &nop_trace;
global_trace.max_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
-
+#ifdef CONFIG_TRACER_MAX_TRACE
+ spin_lock_init(&global_trace.snapshot_trigger_lock);
+#endif
ftrace_init_global_array_ops(&global_trace);
init_trace_flags_index(&global_trace);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index e4f0714d7a49..64450615ca0c 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -334,8 +334,8 @@ struct trace_array {
*/
struct array_buffer max_buffer;
bool allocated_snapshot;
-#endif
-#ifdef CONFIG_TRACER_MAX_TRACE
+ spinlock_t snapshot_trigger_lock;
+ unsigned int snapshot;
unsigned long max_latency;
#ifdef CONFIG_FSNOTIFY
struct dentry *d_max_latency;
@@ -1983,12 +1983,16 @@ static inline void trace_event_eval_update(struct trace_eval_map **map, int len)
#ifdef CONFIG_TRACER_SNAPSHOT
void tracing_snapshot_instance(struct trace_array *tr);
int tracing_alloc_snapshot_instance(struct trace_array *tr);
+int tracing_arm_snapshot(struct trace_array *tr);
+void tracing_disarm_snapshot(struct trace_array *tr);
#else
static inline void tracing_snapshot_instance(struct trace_array *tr) { }
static inline int tracing_alloc_snapshot_instance(struct trace_array *tr)
{
return 0;
}
+static inline int tracing_arm_snapshot(struct trace_array *tr) { return 0; }
+static inline void tracing_disarm_snapshot(struct trace_array *tr) { }
#endif
#ifdef CONFIG_PREEMPT_TRACER
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index b33c3861fbbb..62e4f58b8671 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -597,20 +597,12 @@ static int register_trigger(char *glob,
return ret;
}
-/**
- * unregister_trigger - Generic event_command @unreg implementation
- * @glob: The raw string used to register the trigger
- * @test: Trigger-specific data used to find the trigger to remove
- * @file: The trace_event_file associated with the event
- *
- * Common implementation for event trigger unregistration.
- *
- * Usually used directly as the @unreg method in event command
- * implementations.
+/*
+ * True if the trigger was found and unregistered, else false.
*/
-static void unregister_trigger(char *glob,
- struct event_trigger_data *test,
- struct trace_event_file *file)
+static bool try_unregister_trigger(char *glob,
+ struct event_trigger_data *test,
+ struct trace_event_file *file)
{
struct event_trigger_data *data = NULL, *iter;
@@ -626,8 +618,32 @@ static void unregister_trigger(char *glob,
}
}
- if (data && data->ops->free)
- data->ops->free(data);
+ if (data) {
+ if (data->ops->free)
+ data->ops->free(data);
+
+ return true;
+ }
+
+ return false;
+}
+
+/**
+ * unregister_trigger - Generic event_command @unreg implementation
+ * @glob: The raw string used to register the trigger
+ * @test: Trigger-specific data used to find the trigger to remove
+ * @file: The trace_event_file associated with the event
+ *
+ * Common implementation for event trigger unregistration.
+ *
+ * Usually used directly as the @unreg method in event command
+ * implementations.
+ */
+static void unregister_trigger(char *glob,
+ struct event_trigger_data *test,
+ struct trace_event_file *file)
+{
+ try_unregister_trigger(glob, test, file);
}
/*
@@ -1470,7 +1486,7 @@ register_snapshot_trigger(char *glob,
struct event_trigger_data *data,
struct trace_event_file *file)
{
- int ret = tracing_alloc_snapshot_instance(file->tr);
+ int ret = tracing_arm_snapshot(file->tr);
if (ret < 0)
return ret;
@@ -1478,6 +1494,14 @@ register_snapshot_trigger(char *glob,
return register_trigger(glob, data, file);
}
+static void unregister_snapshot_trigger(char *glob,
+ struct event_trigger_data *data,
+ struct trace_event_file *file)
+{
+ if (try_unregister_trigger(glob, data, file))
+ tracing_disarm_snapshot(file->tr);
+}
+
static int
snapshot_trigger_print(struct seq_file *m, struct event_trigger_data *data)
{
@@ -1510,7 +1534,7 @@ static struct event_command trigger_snapshot_cmd = {
.trigger_type = ETT_SNAPSHOT,
.parse = event_trigger_parse,
.reg = register_snapshot_trigger,
- .unreg = unregister_trigger,
+ .unreg = unregister_snapshot_trigger,
.get_trigger_ops = snapshot_get_trigger_ops,
.set_filter = set_trigger_filter,
};
--
2.43.0
* [for-next][PATCH 09/11] tracing: Allow user-space mapping of the ring-buffer
2024-02-21 14:07 [for-next][PATCH 00/11] tracing: Updates for v6.9 Steven Rostedt
` (7 preceding siblings ...)
2024-02-21 14:08 ` [for-next][PATCH 08/11] tracing: Add snapshot refcount Steven Rostedt
@ 2024-02-21 14:08 ` Steven Rostedt
2024-02-21 14:08 ` [for-next][PATCH 10/11] Documentation: tracing: Add ring-buffer mapping Steven Rostedt
2024-02-21 14:08 ` [for-next][PATCH 11/11] ring-buffer/selftest: Add ring-buffer mapping test Steven Rostedt
10 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2024-02-21 14:08 UTC
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Vincent Donnefort
From: Vincent Donnefort <vdonnefort@google.com>
Currently, user-space extracts data from the ring-buffer via splice,
which is handy for storage or network sharing. However, due to splice
limitations, it is impossible to do real-time analysis without a copy.
A solution for that problem is to let the user-space map the ring-buffer
directly.
The mapping is exposed via the per-CPU file trace_pipe_raw. The first
element of the mapping is the meta-page. It is followed by each
subbuffer constituting the ring-buffer, ordered by their unique page ID:
* Meta-page -- include/uapi/linux/trace_mmap.h for a description
* Subbuf ID 0
* Subbuf ID 1
...
It is therefore easy to translate a subbuf ID into an offset in the
mapping:
reader_id = meta->reader.id;
reader_offset = meta->meta_page_size + reader_id * meta->subbuf_size;
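The translation above can be wrapped in a small helper. This is a minimal sketch: `struct meta_view` is a hypothetical stand-in holding only the fields the formula needs. The real layout is struct trace_buffer_meta in include/uapi/linux/trace_mmap.h, where `reader` is an embedded struct rather than a pointer. Per the `meta->subbuf_size` assignment in patch 07, subbuf_size already includes the sub-buffer header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of struct trace_buffer_meta. */
struct meta_view {
	uint32_t meta_page_size; /* bytes before the first sub-buffer */
	uint32_t subbuf_size;    /* bytes per sub-buffer, header included */
	uint32_t nr_subbufs;     /* sub-buffers covered by the data mapping */
	uint32_t reader_id;      /* meta->reader.id in the real struct */
};

/* Offset of the reader sub-buffer within one mapping starting at offset 0. */
static uint64_t reader_offset(const struct meta_view *m)
{
	assert(m->reader_id < m->nr_subbufs); /* must lie inside the data area */
	return (uint64_t)m->meta_page_size +
	       (uint64_t)m->reader_id * m->subbuf_size;
}

/* Offset within a second mapping that starts right after the meta-page. */
static uint64_t reader_offset_in_data(const struct meta_view *m)
{
	assert(m->reader_id < m->nr_subbufs);
	return (uint64_t)m->reader_id * m->subbuf_size;
}
```

The second helper matches the ring-buffer-map.rst example in patch 10, which maps the meta-page and the sub-buffers with two separate mmap() calls, so the meta_page_size term drops out. The arithmetic is done in 64 bits so that large buffers cannot overflow the 32-bit fields.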
When new data is available, the mapper must call a newly introduced ioctl:
TRACE_MMAP_IOCTL_GET_READER. This will update the Meta-page reader ID to
point to the next reader containing unread data.
Mapping will prevent snapshot and buffer size modifications.
Link: https://lore.kernel.org/linux-trace-kernel/20240220202310.2489614-5-vdonnefort@google.com
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
include/uapi/linux/trace_mmap.h | 2 +
kernel/trace/trace.c | 137 ++++++++++++++++++++++++++++++--
kernel/trace/trace.h | 1 +
3 files changed, 135 insertions(+), 5 deletions(-)
diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
index ffcd8dfcaa4f..d25b9d504a7c 100644
--- a/include/uapi/linux/trace_mmap.h
+++ b/include/uapi/linux/trace_mmap.h
@@ -43,4 +43,6 @@ struct trace_buffer_meta {
__u64 Reserved2;
};
+#define TRACE_MMAP_IOCTL_GET_READER _IO('T', 0x1)
+
#endif /* _TRACE_MMAP_H_ */
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2b922a79c553..8fdd68dbcf6d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1175,6 +1175,12 @@ static void tracing_snapshot_instance_cond(struct trace_array *tr,
return;
}
+ if (tr->mapped) {
+ trace_array_puts(tr, "*** BUFFER MEMORY MAPPED ***\n");
+ trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
+ return;
+ }
+
local_irq_save(flags);
update_max_tr(tr, current, smp_processor_id(), cond_data);
local_irq_restore(flags);
@@ -1307,7 +1313,7 @@ static int tracing_arm_snapshot_locked(struct trace_array *tr)
lockdep_assert_held(&trace_types_lock);
spin_lock(&tr->snapshot_trigger_lock);
- if (tr->snapshot == UINT_MAX) {
+ if (tr->snapshot == UINT_MAX || tr->mapped) {
spin_unlock(&tr->snapshot_trigger_lock);
return -EBUSY;
}
@@ -6032,7 +6038,7 @@ static void tracing_set_nop(struct trace_array *tr)
{
if (tr->current_trace == &nop_trace)
return;
-
+
tr->current_trace->enabled--;
if (tr->current_trace->reset)
@@ -8151,15 +8157,31 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
return ret;
}
-/* An ioctl call with cmd 0 to the ring buffer file will wake up all waiters */
static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
struct ftrace_buffer_info *info = file->private_data;
struct trace_iterator *iter = &info->iter;
+ int err;
+
+ if (cmd == TRACE_MMAP_IOCTL_GET_READER) {
+ if (!(file->f_flags & O_NONBLOCK)) {
+ err = ring_buffer_wait(iter->array_buffer->buffer,
+ iter->cpu_file,
+ iter->tr->buffer_percent);
+ if (err)
+ return err;
+ }
- if (cmd)
- return -ENOIOCTLCMD;
+ return ring_buffer_map_get_reader(iter->array_buffer->buffer,
+ iter->cpu_file);
+ } else if (cmd) {
+ return -ENOTTY;
+ }
+ /*
+ * An ioctl call with cmd 0 to the ring buffer file will wake up all
+ * waiters
+ */
mutex_lock(&trace_types_lock);
iter->wait_index++;
@@ -8172,6 +8194,110 @@ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned
return 0;
}
+static vm_fault_t tracing_buffers_mmap_fault(struct vm_fault *vmf)
+{
+ struct ftrace_buffer_info *info = vmf->vma->vm_file->private_data;
+ struct trace_iterator *iter = &info->iter;
+ vm_fault_t ret = VM_FAULT_SIGBUS;
+ struct page *page;
+
+ page = ring_buffer_map_fault(iter->array_buffer->buffer, iter->cpu_file,
+ vmf->pgoff);
+ if (!page)
+ return ret;
+
+ get_page(page);
+ vmf->page = page;
+ vmf->page->mapping = vmf->vma->vm_file->f_mapping;
+ vmf->page->index = vmf->pgoff;
+
+ return 0;
+}
+
+#ifdef CONFIG_TRACER_MAX_TRACE
+static int get_snapshot_map(struct trace_array *tr)
+{
+ int err = 0;
+
+ /*
+ * Called with mmap_lock held. lockdep would be unhappy if we would now
+ * take trace_types_lock. Instead use the specific
+ * snapshot_trigger_lock.
+ */
+ spin_lock(&tr->snapshot_trigger_lock);
+
+ if (tr->snapshot || tr->mapped == UINT_MAX)
+ err = -EBUSY;
+ else
+ tr->mapped++;
+
+ spin_unlock(&tr->snapshot_trigger_lock);
+
+ /* Wait for update_max_tr() to observe iter->tr->mapped */
+ if (tr->mapped == 1)
+ synchronize_rcu();
+
+ return err;
+
+}
+static void put_snapshot_map(struct trace_array *tr)
+{
+ spin_lock(&tr->snapshot_trigger_lock);
+ if (!WARN_ON(!tr->mapped))
+ tr->mapped--;
+ spin_unlock(&tr->snapshot_trigger_lock);
+}
+#else
+static inline int get_snapshot_map(struct trace_array *tr) { return 0; }
+static inline void put_snapshot_map(struct trace_array *tr) { }
+#endif
+
+static void tracing_buffers_mmap_close(struct vm_area_struct *vma)
+{
+ struct ftrace_buffer_info *info = vma->vm_file->private_data;
+ struct trace_iterator *iter = &info->iter;
+
+ ring_buffer_unmap(iter->array_buffer->buffer, iter->cpu_file);
+ put_snapshot_map(iter->tr);
+}
+
+static void tracing_buffers_mmap_open(struct vm_area_struct *vma)
+{
+ struct ftrace_buffer_info *info = vma->vm_file->private_data;
+ struct trace_iterator *iter = &info->iter;
+
+ WARN_ON(ring_buffer_map(iter->array_buffer->buffer, iter->cpu_file));
+}
+
+static const struct vm_operations_struct tracing_buffers_vmops = {
+ .open = tracing_buffers_mmap_open,
+ .close = tracing_buffers_mmap_close,
+ .fault = tracing_buffers_mmap_fault,
+};
+
+static int tracing_buffers_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+ struct ftrace_buffer_info *info = filp->private_data;
+ struct trace_iterator *iter = &info->iter;
+ int ret = 0;
+
+ if (vma->vm_flags & VM_WRITE || vma->vm_flags & VM_EXEC)
+ return -EPERM;
+
+ vm_flags_mod(vma, VM_DONTCOPY | VM_DONTDUMP, VM_MAYWRITE);
+ vma->vm_ops = &tracing_buffers_vmops;
+
+ ret = get_snapshot_map(iter->tr);
+ if (ret)
+ return ret;
+
+ ret = ring_buffer_map(iter->array_buffer->buffer, iter->cpu_file);
+ if (ret)
+ put_snapshot_map(iter->tr);
+
+ return ret;
+}
+
static const struct file_operations tracing_buffers_fops = {
.open = tracing_buffers_open,
.read = tracing_buffers_read,
@@ -8180,6 +8306,7 @@ static const struct file_operations tracing_buffers_fops = {
.splice_read = tracing_buffers_splice_read,
.unlocked_ioctl = tracing_buffers_ioctl,
.llseek = no_llseek,
+ .mmap = tracing_buffers_mmap,
};
static ssize_t
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 64450615ca0c..749a182dab48 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -336,6 +336,7 @@ struct trace_array {
bool allocated_snapshot;
spinlock_t snapshot_trigger_lock;
unsigned int snapshot;
+ unsigned int mapped;
unsigned long max_latency;
#ifdef CONFIG_FSNOTIFY
struct dentry *d_max_latency;
--
2.43.0
* [for-next][PATCH 10/11] Documentation: tracing: Add ring-buffer mapping
2024-02-21 14:07 [for-next][PATCH 00/11] tracing: Updates for v6.9 Steven Rostedt
` (8 preceding siblings ...)
2024-02-21 14:08 ` [for-next][PATCH 09/11] tracing: Allow user-space mapping of the ring-buffer Steven Rostedt
@ 2024-02-21 14:08 ` Steven Rostedt
2024-02-21 14:08 ` [for-next][PATCH 11/11] ring-buffer/selftest: Add ring-buffer mapping test Steven Rostedt
10 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2024-02-21 14:08 UTC
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Vincent Donnefort
From: Vincent Donnefort <vdonnefort@google.com>
It is now possible to mmap() a ring-buffer to stream its content. Add
some documentation and a code example.
Link: https://lore.kernel.org/linux-trace-kernel/20240220202310.2489614-6-vdonnefort@google.com
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Documentation/trace/index.rst | 1 +
Documentation/trace/ring-buffer-map.rst | 106 ++++++++++++++++++++++++
2 files changed, 107 insertions(+)
create mode 100644 Documentation/trace/ring-buffer-map.rst
diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index 5092d6c13af5..0b300901fd75 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -29,6 +29,7 @@ Linux Tracing Technologies
timerlat-tracer
intel_th
ring-buffer-design
+ ring-buffer-map
stm
sys-t
coresight/index
diff --git a/Documentation/trace/ring-buffer-map.rst b/Documentation/trace/ring-buffer-map.rst
new file mode 100644
index 000000000000..0426ab4bcf3d
--- /dev/null
+++ b/Documentation/trace/ring-buffer-map.rst
@@ -0,0 +1,106 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+Tracefs ring-buffer memory mapping
+==================================
+
+:Author: Vincent Donnefort <vdonnefort@google.com>
+
+Overview
+========
+Tracefs ring-buffer memory map provides an efficient method to stream data
+as no memory copy is necessary. The application mapping the ring-buffer then
+becomes a consumer of that ring-buffer, in a similar fashion to trace_pipe.
+
+Memory mapping setup
+====================
+The mapping works with a mmap() of the trace_pipe_raw interface.
+
+The first system page of the mapping contains ring-buffer statistics and
+description. It is referred to as the meta-page. One of the most important fields
+of the meta-page is the reader. It contains the sub-buffer ID which can be safely
+read by the mapper (see ring-buffer-design.rst).
+
+The meta-page is followed by all the sub-buffers, ordered by ascending ID. It is
+therefore effortless to know where the reader starts in the mapping:
+
+.. code-block:: c
+
+ reader_id = meta->reader.id;
+ reader_offset = meta->meta_page_size + reader_id * meta->subbuf_size;
+
+When the application is done with the current reader, it can get a new one using
+the trace_pipe_raw ioctl() TRACE_MMAP_IOCTL_GET_READER. This ioctl also updates
+the meta-page fields.
+
+Limitations
+===========
+When a mapping is in place on a Tracefs ring-buffer, it is not possible to
+resize it (neither the total size of the ring-buffer nor the size of each
+sub-buffer). Snapshot cannot be used either, and splice copies the ring
+buffer data instead of using the copyless swap from the ring buffer.
+
+Concurrent readers (either another application mapping that ring-buffer or the
+kernel with trace_pipe) are allowed but not recommended. They will compete for
+the ring-buffer and the output is unpredictable, just like concurrent readers on
+trace_pipe would be.
+
+Example
+=======
+
+.. code-block:: c
+
+ #include <fcntl.h>
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <unistd.h>
+
+ #include <linux/trace_mmap.h>
+
+ #include <sys/mman.h>
+ #include <sys/ioctl.h>
+
+ #define TRACE_PIPE_RAW "/sys/kernel/tracing/per_cpu/cpu0/trace_pipe_raw"
+
+ int main(void)
+ {
+ int page_size = getpagesize(), fd, reader_id;
+ unsigned long meta_len, data_len;
+ struct trace_buffer_meta *meta;
+ void *map, *reader, *data;
+
+ fd = open(TRACE_PIPE_RAW, O_RDONLY | O_NONBLOCK);
+ if (fd < 0)
+ exit(EXIT_FAILURE);
+
+ map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
+ if (map == MAP_FAILED)
+ exit(EXIT_FAILURE);
+
+ meta = (struct trace_buffer_meta *)map;
+ meta_len = meta->meta_page_size;
+
+ printf("entries: %llu\n", meta->entries);
+ printf("overrun: %llu\n", meta->overrun);
+ printf("read: %llu\n", meta->read);
+ printf("nr_subbufs: %u\n", meta->nr_subbufs);
+
+ data_len = meta->subbuf_size * meta->nr_subbufs;
+ data = mmap(NULL, data_len, PROT_READ, MAP_SHARED, fd, meta_len);
+ if (data == MAP_FAILED)
+ exit(EXIT_FAILURE);
+
+ if (ioctl(fd, TRACE_MMAP_IOCTL_GET_READER) < 0)
+ exit(EXIT_FAILURE);
+
+ reader_id = meta->reader.id;
+ reader = data + meta->subbuf_size * reader_id;
+
+ printf("Current reader address: %p\n", reader);
+
+ munmap(data, data_len);
+ munmap(meta, meta_len);
+ close(fd);
+
+ return 0;
+ }
--
2.43.0
* [for-next][PATCH 11/11] ring-buffer/selftest: Add ring-buffer mapping test
2024-02-21 14:07 [for-next][PATCH 00/11] tracing: Updates for v6.9 Steven Rostedt
` (9 preceding siblings ...)
2024-02-21 14:08 ` [for-next][PATCH 10/11] Documentation: tracing: Add ring-buffer mapping Steven Rostedt
@ 2024-02-21 14:08 ` Steven Rostedt
10 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2024-02-21 14:08 UTC
To: linux-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Shuah Khan, Shuah Khan, linux-kselftest, Vincent Donnefort
From: Vincent Donnefort <vdonnefort@google.com>
This test maps a ring-buffer and validates the meta-page after a reset and
after emitting a few events.
Link: https://lore.kernel.org/linux-trace-kernel/20240220202310.2489614-7-vdonnefort@google.com
Cc: Shuah Khan <shuah@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
tools/testing/selftests/ring-buffer/Makefile | 8 +
tools/testing/selftests/ring-buffer/config | 2 +
.../testing/selftests/ring-buffer/map_test.c | 273 ++++++++++++++++++
3 files changed, 283 insertions(+)
create mode 100644 tools/testing/selftests/ring-buffer/Makefile
create mode 100644 tools/testing/selftests/ring-buffer/config
create mode 100644 tools/testing/selftests/ring-buffer/map_test.c
diff --git a/tools/testing/selftests/ring-buffer/Makefile b/tools/testing/selftests/ring-buffer/Makefile
new file mode 100644
index 000000000000..627c5fa6d1ab
--- /dev/null
+++ b/tools/testing/selftests/ring-buffer/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+CFLAGS += -Wl,-no-as-needed -Wall
+CFLAGS += $(KHDR_INCLUDES)
+CFLAGS += -D_GNU_SOURCE
+
+TEST_GEN_PROGS = map_test
+
+include ../lib.mk
diff --git a/tools/testing/selftests/ring-buffer/config b/tools/testing/selftests/ring-buffer/config
new file mode 100644
index 000000000000..d936f8f00e78
--- /dev/null
+++ b/tools/testing/selftests/ring-buffer/config
@@ -0,0 +1,2 @@
+CONFIG_FTRACE=y
+CONFIG_TRACER_SNAPSHOT=y
diff --git a/tools/testing/selftests/ring-buffer/map_test.c b/tools/testing/selftests/ring-buffer/map_test.c
new file mode 100644
index 000000000000..56c44b29d998
--- /dev/null
+++ b/tools/testing/selftests/ring-buffer/map_test.c
@@ -0,0 +1,273 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Ring-buffer memory mapping tests
+ *
+ * Copyright (c) 2024 Vincent Donnefort <vdonnefort@google.com>
+ */
+#include <fcntl.h>
+#include <sched.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <linux/trace_mmap.h>
+
+#include <sys/mman.h>
+#include <sys/ioctl.h>
+
+#include "../user_events/user_events_selftests.h" /* share tracefs setup */
+#include "../kselftest_harness.h"
+
+#define TRACEFS_ROOT "/sys/kernel/tracing"
+
+static int __tracefs_write(const char *path, const char *value)
+{
+ int fd, ret;
+
+ fd = open(path, O_WRONLY | O_TRUNC);
+ if (fd < 0)
+ return fd;
+
+ ret = write(fd, value, strlen(value));
+
+ close(fd);
+
+ return ret == -1 ? -errno : 0;
+}
+
+static int __tracefs_write_int(const char *path, int value)
+{
+ char *str;
+ int ret;
+
+ if (asprintf(&str, "%d", value) < 0)
+ return -1;
+
+ ret = __tracefs_write(path, str);
+
+ free(str);
+
+ return ret;
+}
+
+#define tracefs_write_int(path, value) \
+ ASSERT_EQ(__tracefs_write_int((path), (value)), 0)
+
+#define tracefs_write(path, value) \
+ ASSERT_EQ(__tracefs_write((path), (value)), 0)
+
+static int tracefs_reset(void)
+{
+ if (__tracefs_write_int(TRACEFS_ROOT"/tracing_on", 0))
+ return -1;
+ if (__tracefs_write(TRACEFS_ROOT"/trace", ""))
+ return -1;
+ if (__tracefs_write(TRACEFS_ROOT"/set_event", ""))
+ return -1;
+ if (__tracefs_write(TRACEFS_ROOT"/current_tracer", "nop"))
+ return -1;
+
+ return 0;
+}
+
+struct tracefs_cpu_map_desc {
+ struct trace_buffer_meta *meta;
+ void *data;
+ int cpu_fd;
+};
+
+int tracefs_cpu_map(struct tracefs_cpu_map_desc *desc, int cpu)
+{
+	unsigned long meta_len, data_len;
+	int page_size = getpagesize();
+	char *cpu_path;
+	void *map;
+
+	if (asprintf(&cpu_path,
+		     TRACEFS_ROOT"/per_cpu/cpu%d/trace_pipe_raw",
+		     cpu) < 0)
+		return -ENOMEM;
+
+	desc->cpu_fd = open(cpu_path, O_RDONLY | O_NONBLOCK);
+	free(cpu_path);
+	if (desc->cpu_fd < 0)
+		return -ENODEV;
+
+	map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, desc->cpu_fd, 0);
+	if (map == MAP_FAILED)
+		return -errno;
+
+	desc->meta = (struct trace_buffer_meta *)map;
+
+	meta_len = desc->meta->meta_page_size;
+	data_len = desc->meta->subbuf_size * desc->meta->nr_subbufs;
+
+	map = mmap(NULL, data_len, PROT_READ, MAP_SHARED, desc->cpu_fd, meta_len);
+	if (map == MAP_FAILED) {
+		munmap(desc->meta, desc->meta->meta_page_size);
+		return -EINVAL;
+	}
+
+	desc->data = map;
+
+	return 0;
+}
+
+void tracefs_cpu_unmap(struct tracefs_cpu_map_desc *desc)
+{
+	munmap(desc->data, desc->meta->subbuf_size * desc->meta->nr_subbufs);
+	munmap(desc->meta, desc->meta->meta_page_size);
+	close(desc->cpu_fd);
+}
+
+FIXTURE(map) {
+	struct tracefs_cpu_map_desc map_desc;
+	bool umount;
+};
+
+FIXTURE_VARIANT(map) {
+	int subbuf_size;
+};
+
+FIXTURE_VARIANT_ADD(map, subbuf_size_4k) {
+	.subbuf_size = 4,
+};
+
+FIXTURE_VARIANT_ADD(map, subbuf_size_8k) {
+	.subbuf_size = 8,
+};
+
+FIXTURE_SETUP(map)
+{
+	int cpu = sched_getcpu();
+	cpu_set_t cpu_mask;
+	bool fail, umount;
+	char *message;
+
+	if (!tracefs_enabled(&message, &fail, &umount)) {
+		if (fail) {
+			TH_LOG("Tracefs setup failed: %s", message);
+			ASSERT_FALSE(fail);
+		}
+		SKIP(return, "Skipping: %s", message);
+	}
+
+	self->umount = umount;
+
+	ASSERT_GE(cpu, 0);
+
+	ASSERT_EQ(tracefs_reset(), 0);
+
+	tracefs_write_int(TRACEFS_ROOT"/buffer_subbuf_size_kb", variant->subbuf_size);
+
+	ASSERT_EQ(tracefs_cpu_map(&self->map_desc, cpu), 0);
+
+	/*
+	 * Ensure generated events will be found on this very same ring-buffer.
+	 */
+	CPU_ZERO(&cpu_mask);
+	CPU_SET(cpu, &cpu_mask);
+	ASSERT_EQ(sched_setaffinity(0, sizeof(cpu_mask), &cpu_mask), 0);
+}
+
+FIXTURE_TEARDOWN(map)
+{
+	tracefs_reset();
+
+	tracefs_cpu_unmap(&self->map_desc);
+
+	if (self->umount)
+		tracefs_unmount();
+}
+
+TEST_F(map, meta_page_check)
+{
+	struct tracefs_cpu_map_desc *desc = &self->map_desc;
+	int cnt = 0;
+
+	ASSERT_EQ(desc->meta->entries, 0);
+	ASSERT_EQ(desc->meta->overrun, 0);
+	ASSERT_EQ(desc->meta->read, 0);
+
+	ASSERT_EQ(desc->meta->reader.id, 0);
+	ASSERT_EQ(desc->meta->reader.read, 0);
+
+	ASSERT_EQ(ioctl(desc->cpu_fd, TRACE_MMAP_IOCTL_GET_READER), 0);
+	ASSERT_EQ(desc->meta->reader.id, 0);
+
+	tracefs_write_int(TRACEFS_ROOT"/tracing_on", 1);
+	for (int i = 0; i < 16; i++)
+		tracefs_write_int(TRACEFS_ROOT"/trace_marker", i);
+again:
+	ASSERT_EQ(ioctl(desc->cpu_fd, TRACE_MMAP_IOCTL_GET_READER), 0);
+
+	ASSERT_EQ(desc->meta->entries, 16);
+	ASSERT_EQ(desc->meta->overrun, 0);
+	ASSERT_EQ(desc->meta->read, 16);
+
+	ASSERT_EQ(desc->meta->reader.id, 1);
+
+	if (!(cnt++))
+		goto again;
+}
+
+FIXTURE(snapshot) {
+	bool umount;
+};
+
+FIXTURE_SETUP(snapshot)
+{
+	bool fail, umount;
+	struct stat sb;
+	char *message;
+
+	if (stat(TRACEFS_ROOT"/snapshot", &sb))
+		SKIP(return, "Skipping: %s", "snapshot not available");
+
+	if (!tracefs_enabled(&message, &fail, &umount)) {
+		if (fail) {
+			TH_LOG("Tracefs setup failed: %s", message);
+			ASSERT_FALSE(fail);
+		}
+		SKIP(return, "Skipping: %s", message);
+	}
+
+	self->umount = umount;
+}
+
+FIXTURE_TEARDOWN(snapshot)
+{
+	__tracefs_write(TRACEFS_ROOT"/events/sched/sched_switch/trigger",
+			"!snapshot");
+	tracefs_reset();
+
+	if (self->umount)
+		tracefs_unmount();
+}
+
+TEST_F(snapshot, excludes_map)
+{
+	struct tracefs_cpu_map_desc map_desc;
+	int cpu = sched_getcpu();
+
+	ASSERT_GE(cpu, 0);
+	tracefs_write(TRACEFS_ROOT"/events/sched/sched_switch/trigger",
+		      "snapshot");
+	ASSERT_EQ(tracefs_cpu_map(&map_desc, cpu), -EBUSY);
+}
+
+TEST_F(snapshot, excluded_by_map)
+{
+	struct tracefs_cpu_map_desc map_desc;
+	int cpu = sched_getcpu();
+
+	ASSERT_EQ(tracefs_cpu_map(&map_desc, cpu), 0);
+
+	ASSERT_EQ(__tracefs_write(TRACEFS_ROOT"/events/sched/sched_switch/trigger",
+				  "snapshot"), -EBUSY);
+	ASSERT_EQ(__tracefs_write(TRACEFS_ROOT"/snapshot",
+				  "1"), -EBUSY);
+}
+
+TEST_HARNESS_MAIN
--
2.43.0
Thread overview: 12 messages
2024-02-21 14:07 [for-next][PATCH 00/11] tracing: Updates for v6.9 Steven Rostedt
2024-02-21 14:07 ` [for-next][PATCH 01/11] eventfs: Add WARN_ON_ONCE() to checks in eventfs_root_lookup() Steven Rostedt
2024-02-21 14:07 ` [for-next][PATCH 02/11] eventfs: Create eventfs_root_inode to store dentry Steven Rostedt
2024-02-21 14:07 ` [for-next][PATCH 03/11] tracing: Have saved_cmdlines arrays all in one allocation Steven Rostedt
2024-02-21 14:08 ` [for-next][PATCH 04/11] tracing: Move open coded processing of tgid_map into helper function Steven Rostedt
2024-02-21 14:08 ` [for-next][PATCH 05/11] tracing: Move saved_cmdline code into trace_sched_switch.c Steven Rostedt
2024-02-21 14:08 ` [for-next][PATCH 06/11] ring-buffer: Zero ring-buffer sub-buffers Steven Rostedt
2024-02-21 14:08 ` [for-next][PATCH 07/11] ring-buffer: Introducing ring-buffer mapping functions Steven Rostedt
2024-02-21 14:08 ` [for-next][PATCH 08/11] tracing: Add snapshot refcount Steven Rostedt
2024-02-21 14:08 ` [for-next][PATCH 09/11] tracing: Allow user-space mapping of the ring-buffer Steven Rostedt
2024-02-21 14:08 ` [for-next][PATCH 10/11] Documentation: tracing: Add ring-buffer mapping Steven Rostedt
2024-02-21 14:08 ` [for-next][PATCH 11/11] ring-buffer/selftest: Add ring-buffer mapping test Steven Rostedt