* [PATCH 0/8] ring-buffer/tracing: Save module information in persistent memory
@ 2025-02-05 22:50 Steven Rostedt
2025-02-05 22:50 ` [PATCH 1/8] ring-buffer: Use kaslr address instead of text delta Steven Rostedt
` (7 more replies)
0 siblings, 8 replies; 45+ messages in thread
From: Steven Rostedt @ 2025-02-05 22:50 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
This updates the persistent instance to record what modules were
loaded and what addresses they were loaded at.
First, the KASLR offset is recorded in the persistent ring buffer instead of
a text address. This can be used to calculate the address offset.
Next, the persistent memory is divided up differently so that there's
a single global meta data page for the entire buffer to hold the
buffer-wide data, and each per-CPU meta data holds only what it needs.
As the single global meta data is a full page, anything left over in
that page that the ring buffer doesn't need can be used by the tracing
system to store its own data.
As the KASLR offset is only needed by the tracer, that data is moved
from the ring buffer meta data into this new storage.
Next, the modules that are loaded and where they are loaded are stored in
this new persistent storage.
The module list along with the KASLR offset is now exposed in the
last_boot_info if the buffer is from a previous boot. If it is from the
current boot, the file will only contain:
Offset: current
in order to not leak the KASLR offset.
Finally, when new modules are loaded while the trace is active, they too
will be added to this persistent memory. Note that if tracing is stopped and
then restarted, the module list is cleared and repopulated from all currently
loaded modules, so there is no need to keep track of what is loaded or
unloaded while no tracing is going on.
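For reference, below is a condensed sketch of the tracer-owned data that
ends up in the persistent scratch area. It is assembled from patches 4 and
6 of this series (the field names come from those patches):

struct trace_mod_entry {
	unsigned long		mod_addr;	/* module text load address */
	char			mod_name[MODULE_NAME_LEN];
};

struct trace_scratch {
	unsigned long		kaslr_addr;	/* KASLR offset of the recording boot */
	unsigned long		nr_entries;	/* number of saved module entries */
	struct trace_mod_entry	entries[];
};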
Steven Rostedt (8):
ring-buffer: Use kaslr address instead of text delta
ring-buffer: Add buffer meta data for persistent ring buffer
ring-buffer: Add ring_buffer_meta_scratch()
tracing: Have persistent trace instances save KASLR offset
module: Add module_for_each_mod() function
tracing: Have persistent trace instances save module addresses
tracing: Show module names and addresses of last boot
tracing: Update modules to persistent instances when loaded
----
include/linux/module.h | 6 +
include/linux/ring_buffer.h | 3 +-
kernel/module/main.c | 14 +++
kernel/trace/ring_buffer.c | 216 ++++++++++++++++++++----------------
kernel/trace/trace.c | 259 ++++++++++++++++++++++++++++++++++++++++----
kernel/trace/trace.h | 15 ++-
kernel/trace/trace_events.c | 40 +++++--
7 files changed, 418 insertions(+), 135 deletions(-)
* [PATCH 1/8] ring-buffer: Use kaslr address instead of text delta
2025-02-05 22:50 [PATCH 0/8] ring-buffer/tracing: Save module information in persistent memory Steven Rostedt
@ 2025-02-05 22:50 ` Steven Rostedt
2025-02-06 0:32 ` Masami Hiramatsu
2025-02-05 22:50 ` [PATCH 2/8] ring-buffer: Add buffer meta data for persistent ring buffer Steven Rostedt
` (6 subsequent siblings)
7 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-02-05 22:50 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
From: Steven Rostedt <rostedt@goodmis.org>
Instead of saving off the text and data pointers and using them to compare
with the current boot's text and data pointers, just save off the KASLR
offset. Then that can be used to figure out how to read the previous boot's
buffer.
The last_boot_info will now show this offset, but only if it is for a
previous boot:
# cat instances/boot_mapped/last_boot_info
Offset: 39000000
# echo function > instances/boot_mapped/current_tracer
# cat instances/boot_mapped/last_boot_info
Offset: current
If the KASLR offset saved is for the current boot, the last_boot_info will
show the value of "current".
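As a minimal sketch (for illustration, not part of this patch) of why a
single saved offset is enough: an address recorded by the previous boot can
be rebased for symbol lookup in the current boot with one delta, which is
how tr->text_delta is computed below.

/* hypothetical helper, for illustration only */
static unsigned long rebase_last_boot_addr(unsigned long old_addr,
					   unsigned long old_kaslr_offset)
{
	long delta = kaslr_offset() - old_kaslr_offset;

	/* the address as it appears in the current boot's layout */
	return old_addr + delta;
}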
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
include/linux/ring_buffer.h | 3 +--
kernel/trace/ring_buffer.c | 31 ++++++++++++-------------------
kernel/trace/trace.c | 30 +++++++++++++++++++++---------
kernel/trace/trace.h | 9 +++++----
4 files changed, 39 insertions(+), 34 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 17fbb7855295..8de035f4f0d9 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -94,8 +94,7 @@ struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flag
unsigned long range_size,
struct lock_class_key *key);
-bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, long *text,
- long *data);
+bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, unsigned long *kaslr_addr);
/*
* Because the ring buffer is generic, if other users of the ring buffer get
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index b8e0ae15ca5b..7146b780176f 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -31,6 +31,7 @@
#include <asm/local64.h>
#include <asm/local.h>
+#include <asm/setup.h>
#include "trace.h"
@@ -49,8 +50,7 @@ static void update_pages_handler(struct work_struct *work);
struct ring_buffer_meta {
int magic;
int struct_size;
- unsigned long text_addr;
- unsigned long data_addr;
+ unsigned long kaslr_addr;
unsigned long first_buffer;
unsigned long head_buffer;
unsigned long commit_buffer;
@@ -550,8 +550,7 @@ struct trace_buffer {
unsigned long range_addr_start;
unsigned long range_addr_end;
- long last_text_delta;
- long last_data_delta;
+ unsigned long kaslr_addr;
unsigned int subbuf_size;
unsigned int subbuf_order;
@@ -1874,16 +1873,13 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
}
}
-/* Used to calculate data delta */
-static char rb_data_ptr[] = "";
-
-#define THIS_TEXT_PTR ((unsigned long)rb_meta_init_text_addr)
-#define THIS_DATA_PTR ((unsigned long)rb_data_ptr)
-
static void rb_meta_init_text_addr(struct ring_buffer_meta *meta)
{
- meta->text_addr = THIS_TEXT_PTR;
- meta->data_addr = THIS_DATA_PTR;
+#ifdef CONFIG_RANDOMIZE_BASE
+ meta->kaslr_addr = kaslr_offset();
+#else
+ meta->kaslr_addr = 0;
+#endif
}
static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
@@ -1906,8 +1902,7 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
meta->first_buffer += delta;
meta->head_buffer += delta;
meta->commit_buffer += delta;
- buffer->last_text_delta = THIS_TEXT_PTR - meta->text_addr;
- buffer->last_data_delta = THIS_DATA_PTR - meta->data_addr;
+ buffer->kaslr_addr = meta->kaslr_addr;
continue;
}
@@ -2459,17 +2454,15 @@ struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flag
*
* Returns: The true if the delta is non zero
*/
-bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, long *text,
- long *data)
+bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, unsigned long *kaslr_addr)
{
if (!buffer)
return false;
- if (!buffer->last_text_delta)
+ if (!buffer->kaslr_addr)
return false;
- *text = buffer->last_text_delta;
- *data = buffer->last_data_delta;
+ *kaslr_addr = buffer->kaslr_addr;
return true;
}
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 1496a5ac33ae..a9e8eaf1d47e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -50,7 +50,7 @@
#include <linux/irq_work.h>
#include <linux/workqueue.h>
-#include <asm/setup.h> /* COMMAND_LINE_SIZE */
+#include <asm/setup.h> /* COMMAND_LINE_SIZE and kaslr_offset() */
#include "trace.h"
#include "trace_output.h"
@@ -4193,7 +4193,7 @@ static enum print_line_t print_trace_fmt(struct trace_iterator *iter)
* safe to use if the array has delta offsets
* Force printing via the fields.
*/
- if ((tr->text_delta || tr->data_delta) &&
+ if ((tr->text_delta) &&
event->type > __TRACE_LAST_TYPE)
return print_event_fields(iter, event);
@@ -5996,7 +5996,7 @@ ssize_t tracing_resize_ring_buffer(struct trace_array *tr,
static void update_last_data(struct trace_array *tr)
{
- if (!tr->text_delta && !tr->data_delta)
+ if (!(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
return;
/*
@@ -6009,7 +6009,8 @@ static void update_last_data(struct trace_array *tr)
/* Using current data now */
tr->text_delta = 0;
- tr->data_delta = 0;
+
+ tr->flags &= ~TRACE_ARRAY_FL_LAST_BOOT;
}
/**
@@ -6827,8 +6828,17 @@ tracing_last_boot_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t
seq_buf_init(&seq, buf, 64);
- seq_buf_printf(&seq, "text delta:\t%ld\n", tr->text_delta);
- seq_buf_printf(&seq, "data delta:\t%ld\n", tr->data_delta);
+ /*
+ * Do not leak KASLR address. This only shows the KASLR address of
+ * the last boot. When the ring buffer is started, the LAST_BOOT
+ * flag gets cleared, and this should only report "current".
+ * Otherwise it shows the KASLR address from the previous boot which
+ * should not be the same as the current boot.
+ */
+ if (!(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
+ seq_buf_puts(&seq, "Offset: current\n");
+ else
+ seq_buf_printf(&seq, "Offset: %lx\n", tr->kaslr_addr);
return simple_read_from_buffer(ubuf, cnt, ppos, buf, seq_buf_used(&seq));
}
@@ -9212,8 +9222,10 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size
tr->range_addr_start,
tr->range_addr_size);
- ring_buffer_last_boot_delta(buf->buffer,
- &tr->text_delta, &tr->data_delta);
+#ifdef CONFIG_RANDOMIZE_BASE
+ if (ring_buffer_last_boot_delta(buf->buffer, &tr->kaslr_addr))
+ tr->text_delta = kaslr_offset() - tr->kaslr_addr;
+#endif
/*
* This is basically the same as a mapped buffer,
* with the same restrictions.
@@ -10461,7 +10473,7 @@ __init static void enable_instances(void)
* to it.
*/
if (start) {
- tr->flags |= TRACE_ARRAY_FL_BOOT;
+ tr->flags |= TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT;
tr->ref++;
}
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 9c21ba45b7af..abe8169c3e87 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -348,8 +348,8 @@ struct trace_array {
unsigned int mapped;
unsigned long range_addr_start;
unsigned long range_addr_size;
+ unsigned long kaslr_addr;
long text_delta;
- long data_delta;
struct trace_pid_list __rcu *filtered_pids;
struct trace_pid_list __rcu *filtered_no_pids;
@@ -433,9 +433,10 @@ struct trace_array {
};
enum {
- TRACE_ARRAY_FL_GLOBAL = BIT(0),
- TRACE_ARRAY_FL_BOOT = BIT(1),
- TRACE_ARRAY_FL_MOD_INIT = BIT(2),
+ TRACE_ARRAY_FL_GLOBAL = BIT(0),
+ TRACE_ARRAY_FL_BOOT = BIT(1),
+ TRACE_ARRAY_FL_LAST_BOOT = BIT(2),
+ TRACE_ARRAY_FL_MOD_INIT = BIT(3),
};
#ifdef CONFIG_MODULES
--
2.45.2
* [PATCH 2/8] ring-buffer: Add buffer meta data for persistent ring buffer
2025-02-05 22:50 [PATCH 0/8] ring-buffer/tracing: Save module information in persistent memory Steven Rostedt
2025-02-05 22:50 ` [PATCH 1/8] ring-buffer: Use kaslr address instead of text delta Steven Rostedt
@ 2025-02-05 22:50 ` Steven Rostedt
2025-02-06 5:10 ` Masami Hiramatsu
2025-02-05 22:50 ` [PATCH 3/8] ring-buffer: Add ring_buffer_meta_scratch() Steven Rostedt
` (5 subsequent siblings)
7 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-02-05 22:50 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
From: Steven Rostedt <rostedt@goodmis.org>
Instead of having meta data at the first page of each per-CPU buffer block,
where it duplicates data across CPUs, add a new meta page for the entire
block of memory to hold the duplicated data and remove it from the per-CPU
meta data.
This will open up the extra memory in this first page to be used by the
tracer for its own persistent data.
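A rough sketch of the resulting layout, inferred from the code below (not
an exact byte map):

/*
 * range_addr_start:
 *  +--------------------------------------------------------+
 *  | struct ring_buffer_meta (magic, sizes, buffers_offset)  | one page; the
 *  | unused remainder of this page becomes tracer scratch    | (see patch 3)
 *  +--------------------------------------------------------+
 *  | CPU 0: struct ring_buffer_cpu_meta + subbuf id array    |
 *  |        sub-buffers, aligned to subbuf_size              |
 *  +--------------------------------------------------------+
 *  | CPU 1: ...                                              |
 *  +--------------------------------------------------------+
 */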
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 165 ++++++++++++++++++++++++++-----------
1 file changed, 115 insertions(+), 50 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 7146b780176f..0446d053dbd6 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -49,7 +49,12 @@ static void update_pages_handler(struct work_struct *work);
struct ring_buffer_meta {
int magic;
- int struct_size;
+ int struct_sizes;
+ unsigned long total_size;
+ unsigned long buffers_offset;
+};
+
+struct ring_buffer_cpu_meta {
unsigned long kaslr_addr;
unsigned long first_buffer;
unsigned long head_buffer;
@@ -517,7 +522,7 @@ struct ring_buffer_per_cpu {
struct mutex mapping_lock;
unsigned long *subbuf_ids; /* ID to subbuf VA */
struct trace_buffer_meta *meta_page;
- struct ring_buffer_meta *ring_meta;
+ struct ring_buffer_cpu_meta *ring_meta;
/* ring buffer pages to update, > 0 to add, < 0 to remove */
long nr_pages_to_update;
@@ -550,6 +555,8 @@ struct trace_buffer {
unsigned long range_addr_start;
unsigned long range_addr_end;
+ struct ring_buffer_meta *meta;
+
unsigned long kaslr_addr;
unsigned int subbuf_size;
@@ -1270,7 +1277,7 @@ static void rb_head_page_activate(struct ring_buffer_per_cpu *cpu_buffer)
rb_set_list_to_head(head->list.prev);
if (cpu_buffer->ring_meta) {
- struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+ struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
meta->head_buffer = (unsigned long)head->page;
}
}
@@ -1568,7 +1575,7 @@ static void rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
static unsigned long
rb_range_align_subbuf(unsigned long addr, int subbuf_size, int nr_subbufs)
{
- addr += sizeof(struct ring_buffer_meta) +
+ addr += sizeof(struct ring_buffer_cpu_meta) +
sizeof(int) * nr_subbufs;
return ALIGN(addr, subbuf_size);
}
@@ -1579,19 +1586,22 @@ rb_range_align_subbuf(unsigned long addr, int subbuf_size, int nr_subbufs)
static void *rb_range_meta(struct trace_buffer *buffer, int nr_pages, int cpu)
{
int subbuf_size = buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
- unsigned long ptr = buffer->range_addr_start;
- struct ring_buffer_meta *meta;
+ struct ring_buffer_cpu_meta *meta;
+ struct ring_buffer_meta *bmeta;
+ unsigned long ptr;
int nr_subbufs;
- if (!ptr)
+ bmeta = buffer->meta;
+ if (!bmeta)
return NULL;
+ ptr = (unsigned long)bmeta + bmeta->buffers_offset;
+ meta = (struct ring_buffer_cpu_meta *)ptr;
+
/* When nr_pages passed in is zero, the first meta has already been initialized */
if (!nr_pages) {
- meta = (struct ring_buffer_meta *)ptr;
nr_subbufs = meta->nr_subbufs;
} else {
- meta = NULL;
/* Include the reader page */
nr_subbufs = nr_pages + 1;
}
@@ -1623,7 +1633,7 @@ static void *rb_range_meta(struct trace_buffer *buffer, int nr_pages, int cpu)
}
/* Return the start of subbufs given the meta pointer */
-static void *rb_subbufs_from_meta(struct ring_buffer_meta *meta)
+static void *rb_subbufs_from_meta(struct ring_buffer_cpu_meta *meta)
{
int subbuf_size = meta->subbuf_size;
unsigned long ptr;
@@ -1639,7 +1649,7 @@ static void *rb_subbufs_from_meta(struct ring_buffer_meta *meta)
*/
static void *rb_range_buffer(struct ring_buffer_per_cpu *cpu_buffer, int idx)
{
- struct ring_buffer_meta *meta;
+ struct ring_buffer_cpu_meta *meta;
unsigned long ptr;
int subbuf_size;
@@ -1664,14 +1674,73 @@ static void *rb_range_buffer(struct ring_buffer_per_cpu *cpu_buffer, int idx)
return (void *)ptr;
}
+/*
+ * See if the existing memory contains a valid meta section.
+ * if so, use that, otherwise initialize it.
+ */
+static bool rb_meta_init(struct trace_buffer *buffer)
+{
+ unsigned long ptr = buffer->range_addr_start;
+ struct ring_buffer_meta *bmeta;
+ unsigned long total_size;
+ int struct_sizes;
+
+ bmeta = (struct ring_buffer_meta *)ptr;
+ buffer->meta = bmeta;
+
+ total_size = buffer->range_addr_end - buffer->range_addr_start;
+
+ struct_sizes = sizeof(struct ring_buffer_cpu_meta);
+ struct_sizes |= sizeof(*bmeta) << 16;
+
+ /* The first buffer will start one page after the meta page */
+ ptr += sizeof(*bmeta);
+ ptr = ALIGN(ptr, PAGE_SIZE);
+
+ if (bmeta->magic != RING_BUFFER_META_MAGIC) {
+ pr_info("Ring buffer boot meta mismatch of magic\n");
+ goto init;
+ }
+
+ if (bmeta->struct_sizes != struct_sizes) {
+ pr_info("Ring buffer boot meta mismatch of struct size\n");
+ goto init;
+ }
+
+ if (bmeta->total_size != total_size) {
+ pr_info("Ring buffer boot meta mismatch of total size\n");
+ goto init;
+ }
+
+ if (bmeta->buffers_offset > bmeta->total_size) {
+ pr_info("Ring buffer boot meta mismatch of offset outside of total size\n");
+ goto init;
+ }
+
+ if (bmeta->buffers_offset != (void *)ptr - (void *)bmeta) {
+ pr_info("Ring buffer boot meta mismatch of first buffer offset\n");
+ goto init;
+ }
+
+ return true;
+
+ init:
+ bmeta->magic = RING_BUFFER_META_MAGIC;
+ bmeta->struct_sizes = struct_sizes;
+ bmeta->total_size = total_size;
+ bmeta->buffers_offset = (void *)ptr - (void *)bmeta;
+
+ return false;
+}
+
/*
* See if the existing memory contains valid ring buffer data.
* As the previous kernel must be the same as this kernel, all
* the calculations (size of buffers and number of buffers)
* must be the same.
*/
-static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu,
- struct trace_buffer *buffer, int nr_pages)
+static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
+ struct trace_buffer *buffer, int nr_pages)
{
int subbuf_size = PAGE_SIZE;
struct buffer_data_page *subbuf;
@@ -1679,20 +1748,6 @@ static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu,
unsigned long buffers_end;
int i;
- /* Check the meta magic and meta struct size */
- if (meta->magic != RING_BUFFER_META_MAGIC ||
- meta->struct_size != sizeof(*meta)) {
- pr_info("Ring buffer boot meta[%d] mismatch of magic or struct size\n", cpu);
- return false;
- }
-
- /* The subbuffer's size and number of subbuffers must match */
- if (meta->subbuf_size != subbuf_size ||
- meta->nr_subbufs != nr_pages + 1) {
- pr_info("Ring buffer boot meta [%d] mismatch of subbuf_size/nr_pages\n", cpu);
- return false;
- }
-
buffers_start = meta->first_buffer;
buffers_end = meta->first_buffer + (subbuf_size * meta->nr_subbufs);
@@ -1730,7 +1785,7 @@ static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu,
return true;
}
-static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf);
+static int rb_meta_subbuf_idx(struct ring_buffer_cpu_meta *meta, void *subbuf);
static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu,
unsigned long long *timestamp, u64 *delta_ptr)
@@ -1797,7 +1852,7 @@ static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu)
/* If the meta data has been validated, now validate the events */
static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
{
- struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+ struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
struct buffer_page *head_page;
unsigned long entry_bytes = 0;
unsigned long entries = 0;
@@ -1873,7 +1928,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
}
}
-static void rb_meta_init_text_addr(struct ring_buffer_meta *meta)
+static void rb_meta_init_text_addr(struct ring_buffer_cpu_meta *meta)
{
#ifdef CONFIG_RANDOMIZE_BASE
meta->kaslr_addr = kaslr_offset();
@@ -1884,18 +1939,25 @@ static void rb_meta_init_text_addr(struct ring_buffer_meta *meta)
static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
{
- struct ring_buffer_meta *meta;
+ struct ring_buffer_cpu_meta *meta;
+ struct ring_buffer_meta *bmeta;
unsigned long delta;
void *subbuf;
+ bool valid = false;
int cpu;
int i;
+ if (rb_meta_init(buffer))
+ valid = true;
+
+ bmeta = buffer->meta;
+
for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
void *next_meta;
meta = rb_range_meta(buffer, nr_pages, cpu);
- if (rb_meta_valid(meta, cpu, buffer, nr_pages)) {
+ if (valid && rb_cpu_meta_valid(meta, cpu, buffer, nr_pages)) {
/* Make the mappings match the current address */
subbuf = rb_subbufs_from_meta(meta);
delta = (unsigned long)subbuf - meta->first_buffer;
@@ -1913,9 +1975,6 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
memset(meta, 0, next_meta - (void *)meta);
- meta->magic = RING_BUFFER_META_MAGIC;
- meta->struct_size = sizeof(*meta);
-
meta->nr_subbufs = nr_pages + 1;
meta->subbuf_size = PAGE_SIZE;
@@ -1943,7 +2002,7 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
static void *rbm_start(struct seq_file *m, loff_t *pos)
{
struct ring_buffer_per_cpu *cpu_buffer = m->private;
- struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+ struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
unsigned long val;
if (!meta)
@@ -1968,7 +2027,7 @@ static void *rbm_next(struct seq_file *m, void *v, loff_t *pos)
static int rbm_show(struct seq_file *m, void *v)
{
struct ring_buffer_per_cpu *cpu_buffer = m->private;
- struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+ struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
unsigned long val = (unsigned long)v;
if (val == 1) {
@@ -2017,7 +2076,7 @@ int ring_buffer_meta_seq_init(struct file *file, struct trace_buffer *buffer, in
static void rb_meta_buffer_update(struct ring_buffer_per_cpu *cpu_buffer,
struct buffer_page *bpage)
{
- struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+ struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
if (meta->head_buffer == (unsigned long)bpage->page)
cpu_buffer->head_page = bpage;
@@ -2032,7 +2091,7 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
long nr_pages, struct list_head *pages)
{
struct trace_buffer *buffer = cpu_buffer->buffer;
- struct ring_buffer_meta *meta = NULL;
+ struct ring_buffer_cpu_meta *meta = NULL;
struct buffer_page *bpage, *tmp;
bool user_thread = current->mm != NULL;
gfp_t mflags;
@@ -2156,7 +2215,7 @@ static struct ring_buffer_per_cpu *
rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
{
struct ring_buffer_per_cpu *cpu_buffer;
- struct ring_buffer_meta *meta;
+ struct ring_buffer_cpu_meta *meta;
struct buffer_page *bpage;
struct page *page;
int ret;
@@ -2327,10 +2386,16 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
/* If start/end are specified, then that overrides size */
if (start && end) {
+ unsigned long buffers_start;
unsigned long ptr;
int n;
size = end - start;
+
+ /* Subtract the buffer meta data */
+ size -= PAGE_SIZE;
+ buffers_start = start + PAGE_SIZE;
+
size = size / nr_cpu_ids;
/*
@@ -2340,7 +2405,7 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
* needed, plus account for the integer array index that
* will be appended to the meta data.
*/
- nr_pages = (size - sizeof(struct ring_buffer_meta)) /
+ nr_pages = (size - sizeof(struct ring_buffer_cpu_meta)) /
(subbuf_size + sizeof(int));
/* Need at least two pages plus the reader page */
if (nr_pages < 3)
@@ -2348,8 +2413,8 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
again:
/* Make sure that the size fits aligned */
- for (n = 0, ptr = start; n < nr_cpu_ids; n++) {
- ptr += sizeof(struct ring_buffer_meta) +
+ for (n = 0, ptr = buffers_start; n < nr_cpu_ids; n++) {
+ ptr += sizeof(struct ring_buffer_cpu_meta) +
sizeof(int) * nr_pages;
ptr = ALIGN(ptr, subbuf_size);
ptr += subbuf_size * nr_pages;
@@ -3075,7 +3140,7 @@ static void rb_inc_iter(struct ring_buffer_iter *iter)
}
/* Return the index into the sub-buffers for a given sub-buffer */
-static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf)
+static int rb_meta_subbuf_idx(struct ring_buffer_cpu_meta *meta, void *subbuf)
{
void *subbuf_array;
@@ -3087,7 +3152,7 @@ static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf)
static void rb_update_meta_head(struct ring_buffer_per_cpu *cpu_buffer,
struct buffer_page *next_page)
{
- struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+ struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
unsigned long old_head = (unsigned long)next_page->page;
unsigned long new_head;
@@ -3104,7 +3169,7 @@ static void rb_update_meta_head(struct ring_buffer_per_cpu *cpu_buffer,
static void rb_update_meta_reader(struct ring_buffer_per_cpu *cpu_buffer,
struct buffer_page *reader)
{
- struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+ struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
void *old_reader = cpu_buffer->reader_page->page;
void *new_reader = reader->page;
int id;
@@ -3733,7 +3798,7 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
rb_page_write(cpu_buffer->commit_page));
rb_inc_page(&cpu_buffer->commit_page);
if (cpu_buffer->ring_meta) {
- struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+ struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
meta->commit_buffer = (unsigned long)cpu_buffer->commit_page->page;
}
/* add barrier to keep gcc from optimizing too much */
@@ -5986,7 +6051,7 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
if (cpu_buffer->mapped) {
rb_update_meta_page(cpu_buffer);
if (cpu_buffer->ring_meta) {
- struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+ struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
meta->commit_buffer = meta->head_buffer;
}
}
@@ -6020,7 +6085,7 @@ static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu)
{
struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
- struct ring_buffer_meta *meta;
+ struct ring_buffer_cpu_meta *meta;
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return;
@@ -6058,7 +6123,7 @@ EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
{
struct ring_buffer_per_cpu *cpu_buffer;
- struct ring_buffer_meta *meta;
+ struct ring_buffer_cpu_meta *meta;
int cpu;
/* prevent another thread from changing buffer sizes */
--
2.45.2
* [PATCH 3/8] ring-buffer: Add ring_buffer_meta_scratch()
2025-02-05 22:50 [PATCH 0/8] ring-buffer/tracing: Save module information in persistent memory Steven Rostedt
2025-02-05 22:50 ` [PATCH 1/8] ring-buffer: Use kaslr address instead of text delta Steven Rostedt
2025-02-05 22:50 ` [PATCH 2/8] ring-buffer: Add buffer meta data for persistent ring buffer Steven Rostedt
@ 2025-02-05 22:50 ` Steven Rostedt
2025-02-06 5:13 ` Masami Hiramatsu
2025-02-05 22:50 ` [PATCH 4/8] tracing: Have persistent trace instances save KASLR offset Steven Rostedt
` (4 subsequent siblings)
7 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-02-05 22:50 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
From: Steven Rostedt <rostedt@goodmis.org>
Now that there's one page at the start of the persistent memory used by
the ring buffer, make the part of that page that is not used by the ring
buffer available for the tracer to use. This will allow the tracer to
store its own persistent data.
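A minimal sketch of how a user of the new interface is expected to call it
(the real caller is added in patch 4; the structure here is hypothetical):

struct my_scratch {			/* hypothetical tracer-owned data */
	unsigned long		saved_value;
};

static struct my_scratch *get_my_scratch(struct trace_buffer *buffer)
{
	unsigned int size;
	struct my_scratch *s;

	s = ring_buffer_meta_scratch(buffer, &size);
	if (!s || size < sizeof(*s))
		return NULL;		/* not a persistent (mapped) buffer */

	return s;
}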
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
include/linux/ring_buffer.h | 1 +
kernel/trace/ring_buffer.c | 13 +++++++++++++
2 files changed, 14 insertions(+)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 8de035f4f0d9..b95f940fd07a 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -95,6 +95,7 @@ struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flag
struct lock_class_key *key);
bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, unsigned long *kaslr_addr);
+void *ring_buffer_meta_scratch(struct trace_buffer *buffer, unsigned int *size);
/*
* Because the ring buffer is generic, if other users of the ring buffer get
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0446d053dbd6..5a81ff785665 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1730,6 +1730,9 @@ static bool rb_meta_init(struct trace_buffer *buffer)
bmeta->total_size = total_size;
bmeta->buffers_offset = (void *)ptr - (void *)bmeta;
+ /* Zero out the scratch pad */
+ memset((void *)bmeta + sizeof(*bmeta), 0, PAGE_SIZE - sizeof(*bmeta));
+
return false;
}
@@ -2532,6 +2535,16 @@ bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, unsigned long *kas
return true;
}
+void *ring_buffer_meta_scratch(struct trace_buffer *buffer, unsigned int *size)
+{
+ if (!buffer || !buffer->meta)
+ return NULL;
+
+ *size = PAGE_SIZE - sizeof(*buffer->meta);
+
+ return (void *)buffer->meta + sizeof(*buffer->meta);
+}
+
/**
* ring_buffer_free - free a ring buffer.
* @buffer: the buffer to free.
--
2.45.2
* [PATCH 4/8] tracing: Have persistent trace instances save KASLR offset
2025-02-05 22:50 [PATCH 0/8] ring-buffer/tracing: Save module information in persistent memory Steven Rostedt
` (2 preceding siblings ...)
2025-02-05 22:50 ` [PATCH 3/8] ring-buffer: Add ring_buffer_meta_scratch() Steven Rostedt
@ 2025-02-05 22:50 ` Steven Rostedt
2025-02-06 5:22 ` Masami Hiramatsu
2025-02-05 22:50 ` [PATCH 5/8] module: Add module_for_each_mod() function Steven Rostedt
` (3 subsequent siblings)
7 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-02-05 22:50 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
From: Steven Rostedt <rostedt@goodmis.org>
There's no reason for the ring buffer itself to save the KASLR offset, as it
is only used by the tracer. Now that the tracer has a way to save data in
the persistent memory of the ring buffer, have the tracing infrastructure
take care of saving the KASLR offset.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
include/linux/ring_buffer.h | 1 -
kernel/trace/ring_buffer.c | 47 -------------------------------------
kernel/trace/trace.c | 38 ++++++++++++++++++++++++++----
kernel/trace/trace.h | 6 +++--
4 files changed, 38 insertions(+), 54 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index b95f940fd07a..d6d9c94e8d8a 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -94,7 +94,6 @@ struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flag
unsigned long range_size,
struct lock_class_key *key);
-bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, unsigned long *kaslr_addr);
void *ring_buffer_meta_scratch(struct trace_buffer *buffer, unsigned int *size);
/*
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 5a81ff785665..a42406287281 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -55,7 +55,6 @@ struct ring_buffer_meta {
};
struct ring_buffer_cpu_meta {
- unsigned long kaslr_addr;
unsigned long first_buffer;
unsigned long head_buffer;
unsigned long commit_buffer;
@@ -557,8 +556,6 @@ struct trace_buffer {
struct ring_buffer_meta *meta;
- unsigned long kaslr_addr;
-
unsigned int subbuf_size;
unsigned int subbuf_order;
unsigned int max_data_size;
@@ -1931,15 +1928,6 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
}
}
-static void rb_meta_init_text_addr(struct ring_buffer_cpu_meta *meta)
-{
-#ifdef CONFIG_RANDOMIZE_BASE
- meta->kaslr_addr = kaslr_offset();
-#else
- meta->kaslr_addr = 0;
-#endif
-}
-
static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
{
struct ring_buffer_cpu_meta *meta;
@@ -1967,7 +1955,6 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
meta->first_buffer += delta;
meta->head_buffer += delta;
meta->commit_buffer += delta;
- buffer->kaslr_addr = meta->kaslr_addr;
continue;
}
@@ -1984,7 +1971,6 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
subbuf = rb_subbufs_from_meta(meta);
meta->first_buffer = (unsigned long)subbuf;
- rb_meta_init_text_addr(meta);
/*
* The buffers[] array holds the order of the sub-buffers
@@ -2514,27 +2500,6 @@ struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flag
return alloc_buffer(size, flags, order, start, start + range_size, key);
}
-/**
- * ring_buffer_last_boot_delta - return the delta offset from last boot
- * @buffer: The buffer to return the delta from
- * @text: Return text delta
- * @data: Return data delta
- *
- * Returns: The true if the delta is non zero
- */
-bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, unsigned long *kaslr_addr)
-{
- if (!buffer)
- return false;
-
- if (!buffer->kaslr_addr)
- return false;
-
- *kaslr_addr = buffer->kaslr_addr;
-
- return true;
-}
-
void *ring_buffer_meta_scratch(struct trace_buffer *buffer, unsigned int *size)
{
if (!buffer || !buffer->meta)
@@ -6098,7 +6063,6 @@ static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu)
{
struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
- struct ring_buffer_cpu_meta *meta;
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return;
@@ -6117,11 +6081,6 @@ void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu)
atomic_dec(&cpu_buffer->record_disabled);
atomic_dec(&cpu_buffer->resize_disabled);
- /* Make sure persistent meta now uses this buffer's addresses */
- meta = rb_range_meta(buffer, 0, cpu_buffer->cpu);
- if (meta)
- rb_meta_init_text_addr(meta);
-
mutex_unlock(&buffer->mutex);
}
EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
@@ -6136,7 +6095,6 @@ EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
{
struct ring_buffer_per_cpu *cpu_buffer;
- struct ring_buffer_cpu_meta *meta;
int cpu;
/* prevent another thread from changing buffer sizes */
@@ -6164,11 +6122,6 @@ void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
reset_disabled_cpu_buffer(cpu_buffer);
- /* Make sure persistent meta now uses this buffer's addresses */
- meta = rb_range_meta(buffer, 0, cpu_buffer->cpu);
- if (meta)
- rb_meta_init_text_addr(meta);
-
atomic_dec(&cpu_buffer->record_disabled);
atomic_sub(RESET_BIT, &cpu_buffer->resize_disabled);
}
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a9e8eaf1d47e..cb9f8e6878a0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5994,8 +5994,14 @@ ssize_t tracing_resize_ring_buffer(struct trace_array *tr,
return ret;
}
+struct trace_scratch {
+ unsigned long kaslr_addr;
+};
+
static void update_last_data(struct trace_array *tr)
{
+ struct trace_scratch *tscratch;
+
if (!(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
return;
@@ -6010,6 +6016,17 @@ static void update_last_data(struct trace_array *tr)
/* Using current data now */
tr->text_delta = 0;
+ if (!tr->scratch)
+ return;
+
+ tscratch = tr->scratch;
+
+ /* Set the persistent ring buffer meta data to this address */
+#ifdef CONFIG_RANDOMIZE_BASE
+ tscratch->kaslr_addr = kaslr_offset();
+#else
+ tscratch->kaslr_addr = 0;
+#endif
tr->flags &= ~TRACE_ARRAY_FL_LAST_BOOT;
}
@@ -6823,6 +6840,7 @@ static ssize_t
tracing_last_boot_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
{
struct trace_array *tr = filp->private_data;
+ struct trace_scratch *tscratch = tr->scratch;
struct seq_buf seq;
char buf[64];
@@ -6835,10 +6853,10 @@ tracing_last_boot_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t
* Otherwise it shows the KASLR address from the previous boot which
* should not be the same as the current boot.
*/
- if (!(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
+ if (!tscratch || !(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
seq_buf_puts(&seq, "Offset: current\n");
else
- seq_buf_printf(&seq, "Offset: %lx\n", tr->kaslr_addr);
+ seq_buf_printf(&seq, "Offset: %lx\n", tscratch->kaslr_addr);
return simple_read_from_buffer(ubuf, cnt, ppos, buf, seq_buf_used(&seq));
}
@@ -9212,6 +9230,8 @@ static int
allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size)
{
enum ring_buffer_flags rb_flags;
+ unsigned int scratch_size;
+ void *scratch;
rb_flags = tr->trace_flags & TRACE_ITER_OVERWRITE ? RB_FL_OVERWRITE : 0;
@@ -9222,10 +9242,20 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size
tr->range_addr_start,
tr->range_addr_size);
+ scratch = ring_buffer_meta_scratch(buf->buffer, &scratch_size);
+ if (scratch) {
+ tr->scratch = scratch;
+ tr->scratch_size = scratch_size;
+
#ifdef CONFIG_RANDOMIZE_BASE
- if (ring_buffer_last_boot_delta(buf->buffer, &tr->kaslr_addr))
- tr->text_delta = kaslr_offset() - tr->kaslr_addr;
+ {
+ struct trace_scratch *tscratch = tr->scratch;
+
+ if (tscratch->kaslr_addr)
+ tr->text_delta = kaslr_offset() - tscratch->kaslr_addr;
+ }
#endif
+ }
/*
* This is basically the same as a mapped buffer,
* with the same restrictions.
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index abe8169c3e87..3a020fb82a34 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -348,8 +348,11 @@ struct trace_array {
unsigned int mapped;
unsigned long range_addr_start;
unsigned long range_addr_size;
- unsigned long kaslr_addr;
long text_delta;
+ void *scratch; /* pointer in persistent memory */
+ int scratch_size;
+
+ int buffer_disabled;
struct trace_pid_list __rcu *filtered_pids;
struct trace_pid_list __rcu *filtered_no_pids;
@@ -367,7 +370,6 @@ struct trace_array {
* CONFIG_TRACER_MAX_TRACE.
*/
arch_spinlock_t max_lock;
- int buffer_disabled;
#ifdef CONFIG_FTRACE_SYSCALLS
int sys_refcount_enter;
int sys_refcount_exit;
--
2.45.2
* [PATCH 5/8] module: Add module_for_each_mod() function
2025-02-05 22:50 [PATCH 0/8] ring-buffer/tracing: Save module information in persistent memory Steven Rostedt
` (3 preceding siblings ...)
2025-02-05 22:50 ` [PATCH 4/8] tracing: Have persistent trace instances save KASLR offset Steven Rostedt
@ 2025-02-05 22:50 ` Steven Rostedt
2025-02-06 5:28 ` Masami Hiramatsu
2025-02-05 22:50 ` [PATCH 6/8] tracing: Have persistent trace instances save module addresses Steven Rostedt
` (2 subsequent siblings)
7 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-02-05 22:50 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Luis Chamberlain, Petr Pavlu, Sami Tolvanen, Daniel Gomez,
linux-modules
From: Steven Rostedt <rostedt@goodmis.org>
The tracing system needs a way to save all the currently loaded modules
and their addresses into persistent memory so that it can evaluate the
addresses on a reboot from a crash. When the persistent memory trace
starts, it will save the module addresses and names into the persistent
memory. To do so, it will call the module_for_each_mod() function and pass
it a callback function and a data pointer to be called on each loaded
module. Then it can record the module information.
This patch only implements that function.
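A minimal sketch of a caller, just to show the contract (the real user is
the tracing code later in this series): the callback runs for each formed
module, and returning non-zero stops the iteration.

/* hypothetical example, not part of this patch */
static int count_mod(struct module *mod, void *data)
{
	int *count = data;

	(*count)++;
	return 0;			/* continue iterating */
}

static int count_loaded_modules(void)
{
	int count = 0;

	module_for_each_mod(count_mod, &count);
	return count;
}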
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: linux-modules@vger.kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
include/linux/module.h | 6 ++++++
kernel/module/main.c | 14 ++++++++++++++
2 files changed, 20 insertions(+)
diff --git a/include/linux/module.h b/include/linux/module.h
index 23792d5d7b74..9e1f42f03511 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -779,6 +779,8 @@ static inline void *module_writable_address(struct module *mod, void *loc)
return __module_writable_address(mod, loc);
}
+void module_for_each_mod(int(*func)(struct module *mod, void *data), void *data);
+
#else /* !CONFIG_MODULES... */
static inline struct module *__module_address(unsigned long addr)
@@ -891,6 +893,10 @@ static inline void *module_writable_address(struct module *mod, void *loc)
{
return loc;
}
+
+static inline void module_for_each_mod(int(*func)(struct module *mod, void *data), void *data)
+{
+}
#endif /* CONFIG_MODULES */
#ifdef CONFIG_SYSFS
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 1fb9ad289a6f..ea1fe313fb89 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3809,6 +3809,20 @@ bool is_module_text_address(unsigned long addr)
return ret;
}
+void module_for_each_mod(int(*func)(struct module *mod, void *data), void *data)
+{
+ struct module *mod;
+
+ preempt_disable();
+ list_for_each_entry_rcu(mod, &modules, list) {
+ if (mod->state == MODULE_STATE_UNFORMED)
+ continue;
+ if (func(mod, data))
+ break;
+ }
+ preempt_enable();
+}
+
/**
* __module_text_address() - get the module whose code contains an address.
* @addr: the address.
--
2.45.2
* [PATCH 6/8] tracing: Have persistent trace instances save module addresses
2025-02-05 22:50 [PATCH 0/8] ring-buffer/tracing: Save module information in persistent memory Steven Rostedt
` (4 preceding siblings ...)
2025-02-05 22:50 ` [PATCH 5/8] module: Add module_for_each_mod() function Steven Rostedt
@ 2025-02-05 22:50 ` Steven Rostedt
2025-02-06 8:26 ` Masami Hiramatsu
2025-02-05 22:50 ` [PATCH 7/8] tracing: Show module names and addresses of last boot Steven Rostedt
2025-02-05 22:50 ` [PATCH 8/8] tracing: Update modules to persistent instances when loaded Steven Rostedt
7 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-02-05 22:50 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
From: Steven Rostedt <rostedt@goodmis.org>
For trace instances that are mapped to persistent memory, have them use
the scratch area to save the currently loaded modules. This lets the next
boot see where the modules had been loaded, so that their addresses in the
old trace can be deciphered by using where they were loaded previously.
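As a sketch of the idea (not something this patch implements), a saved
entry allows a per-module delta to be computed on the next boot, assuming
the same module is loaded again:

/* hypothetical helper, for illustration only */
static long module_reload_delta(struct trace_mod_entry *entry,
				struct module *mod)
{
	/*
	 * Difference between where the module is loaded now and where
	 * the saved entry says it was when the buffer was written.
	 */
	return (unsigned long)mod->mem[MOD_TEXT].base - entry->mod_addr;
}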
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace.c | 99 ++++++++++++++++++++++++++++++++++++++------
1 file changed, 87 insertions(+), 12 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index cb9f8e6878a0..a8e5f7ac2193 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5994,14 +5994,60 @@ ssize_t tracing_resize_ring_buffer(struct trace_array *tr,
return ret;
}
+struct trace_mod_entry {
+ unsigned long mod_addr;
+ char mod_name[MODULE_NAME_LEN];
+};
+
struct trace_scratch {
unsigned long kaslr_addr;
+ unsigned long nr_entries;
+ struct trace_mod_entry entries[];
};
+static int save_mod(struct module *mod, void *data)
+{
+ struct trace_array *tr = data;
+ struct trace_scratch *tscratch;
+ struct trace_mod_entry *entry;
+ unsigned int size;
+
+ tscratch = tr->scratch;
+ if (!tscratch)
+ return -1;
+ size = tr->scratch_size;
+
+ if (struct_size(tscratch, entries, tscratch->nr_entries + 1) > size)
+ return -1;
+
+ entry = &tscratch->entries[tscratch->nr_entries];
+
+ tscratch->nr_entries++;
+
+ entry->mod_addr = (unsigned long)mod->mem[MOD_TEXT].base;
+ strscpy(entry->mod_name, mod->name);
+
+ return 0;
+}
+
static void update_last_data(struct trace_array *tr)
{
struct trace_scratch *tscratch;
+ if (!(tr->flags & TRACE_ARRAY_FL_BOOT))
+ return;
+
+ /* Reset the module list and reload them */
+ if (tr->scratch) {
+ struct trace_scratch *tscratch = tr->scratch;
+
+ memset(tscratch->entries, 0,
+ flex_array_size(tscratch, entries, tscratch->nr_entries));
+ tscratch->nr_entries = 0;
+
+ module_for_each_mod(save_mod, tr);
+ }
+
if (!(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
return;
@@ -9226,6 +9272,46 @@ static struct dentry *trace_instance_dir;
static void
init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer);
+static void setup_trace_scratch(struct trace_array *tr, void *scratch, unsigned int size)
+{
+ struct trace_scratch *tscratch = scratch;
+ struct trace_mod_entry *entry;
+
+ if (!scratch)
+ return;
+
+ tr->scratch = scratch;
+ tr->scratch_size = size;
+
+#ifdef CONFIG_RANDOMIZE_BASE
+ if (tscratch->kaslr_addr)
+ tr->text_delta = kaslr_offset() - tscratch->kaslr_addr;
+#endif
+
+ if (struct_size(tscratch, entries, tscratch->nr_entries) > size)
+ goto reset;
+
+ /* Check if each module name is a valid string */
+ for (int i = 0; i < tscratch->nr_entries; i++) {
+ int n;
+
+ entry = &tscratch->entries[i];
+
+ for (n = 0; n < MODULE_NAME_LEN; n++) {
+ if (entry->mod_name[n] == '\0')
+ break;
+ if (!isprint(entry->mod_name[n]))
+ goto reset;
+ }
+ if (n == MODULE_NAME_LEN)
+ goto reset;
+ }
+ return;
+ reset:
+ /* Invalid trace modules */
+ memset(scratch, 0, size);
+}
+
static int
allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size)
{
@@ -9243,19 +9329,8 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size
tr->range_addr_size);
scratch = ring_buffer_meta_scratch(buf->buffer, &scratch_size);
- if (scratch) {
- tr->scratch = scratch;
- tr->scratch_size = scratch_size;
+ setup_trace_scratch(tr, scratch, scratch_size);
-#ifdef CONFIG_RANDOMIZE_BASE
- {
- struct trace_scratch *tscratch = tr->scratch;
-
- if (tscratch->kaslr_addr)
- tr->text_delta = kaslr_offset() - tscratch->kaslr_addr;
- }
-#endif
- }
/*
* This is basically the same as a mapped buffer,
* with the same restrictions.
--
2.45.2
* [PATCH 7/8] tracing: Show module names and addresses of last boot
2025-02-05 22:50 [PATCH 0/8] ring-buffer/tracing: Save module information in persistent memory Steven Rostedt
` (5 preceding siblings ...)
2025-02-05 22:50 ` [PATCH 6/8] tracing: Have persistent trace instances save module addresses Steven Rostedt
@ 2025-02-05 22:50 ` Steven Rostedt
2025-02-07 1:51 ` Masami Hiramatsu
2025-02-05 22:50 ` [PATCH 8/8] tracing: Update modules to persistent instances when loaded Steven Rostedt
7 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-02-05 22:50 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
From: Steven Rostedt <rostedt@goodmis.org>
Add the last boot's module names and addresses to the last_boot_info file.
This only shows the module information from a previous boot. If the buffer
is started and is recording the current boot, this file will still only
show "current".
# cat instances/boot_mapped/last_boot_info
Offset: 10c00000
ffffffffc00ca000 usb_serial_simple
ffffffffc00ae000 usbserial
ffffffffc008b000 bfq
# echo function > instances/boot_mapped/current_tracer
# cat instances/boot_mapped/last_boot_info
Offset: current
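A minimal user-space sketch (an assumption, not part of this series) of
how a post-processing tool might consume the format shown above: the first
line carries the KASLR offset (or "current"), and each following line is a
module load address and name.

#include <stdio.h>

int main(void)
{
	char line[256], name[64];
	unsigned long offset, addr;
	FILE *fp = fopen("instances/boot_mapped/last_boot_info", "r");

	if (!fp)
		return 1;

	while (fgets(line, sizeof(line), fp)) {
		if (sscanf(line, "Offset: %lx", &offset) == 1)
			printf("kernel KASLR offset: %lx\n", offset);
		else if (sscanf(line, "%lx %63s", &addr, name) == 2)
			printf("module %s was loaded at %lx\n", name, addr);
	}
	fclose(fp);
	return 0;
}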
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace.c | 103 +++++++++++++++++++++++++++++++++++++------
1 file changed, 90 insertions(+), 13 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a8e5f7ac2193..7b4027804cd4 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6005,6 +6005,8 @@ struct trace_scratch {
struct trace_mod_entry entries[];
};
+static DEFINE_MUTEX(scratch_mutex);
+
static int save_mod(struct module *mod, void *data)
{
struct trace_array *tr = data;
@@ -6012,6 +6014,8 @@ static int save_mod(struct module *mod, void *data)
struct trace_mod_entry *entry;
unsigned int size;
+ guard(mutex)(&scratch_mutex);
+
tscratch = tr->scratch;
if (!tscratch)
return -1;
@@ -6882,15 +6886,47 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf,
return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
}
-static ssize_t
-tracing_last_boot_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
+#define LAST_BOOT_HEADER ((void *)1)
+
+static void *l_next(struct seq_file *m, void *v, loff_t *pos)
{
- struct trace_array *tr = filp->private_data;
+ struct trace_array *tr = m->private;
struct trace_scratch *tscratch = tr->scratch;
- struct seq_buf seq;
- char buf[64];
+ unsigned int index = *pos;
+
+ (*pos)++;
+
+ if (*pos == 1)
+ return LAST_BOOT_HEADER;
+
+ /* Only show offsets of the last boot data */
+ if (!tscratch || !(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
+ return NULL;
+
+ /* *pos 0 is for the header, 1 is for the first module */
+ index--;
+
+ if (index >= tscratch->nr_entries)
+ return NULL;
- seq_buf_init(&seq, buf, 64);
+ return &tscratch->entries[index];
+}
+
+static void *l_start(struct seq_file *m, loff_t *pos)
+{
+ mutex_lock(&scratch_mutex);
+
+ return l_next(m, NULL, pos);
+}
+
+static void l_stop(struct seq_file *m, void *p)
+{
+ mutex_unlock(&scratch_mutex);
+}
+
+static void show_last_boot_header(struct seq_file *m, struct trace_array *tr)
+{
+ struct trace_scratch *tscratch = tr->scratch;
/*
* Do not leak KASLR address. This only shows the KASLR address of
@@ -6900,11 +6936,52 @@ tracing_last_boot_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t
* should not be the same as the current boot.
*/
if (!tscratch || !(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
- seq_buf_puts(&seq, "Offset: current\n");
+ seq_puts(m, "Offset: current\n");
else
- seq_buf_printf(&seq, "Offset: %lx\n", tscratch->kaslr_addr);
+ seq_printf(m, "Offset: %lx\n", tscratch->kaslr_addr);
+}
- return simple_read_from_buffer(ubuf, cnt, ppos, buf, seq_buf_used(&seq));
+static int l_show(struct seq_file *m, void *v)
+{
+ struct trace_array *tr = m->private;
+ struct trace_mod_entry *entry = v;
+
+ if (v == LAST_BOOT_HEADER) {
+ show_last_boot_header(m, tr);
+ return 0;
+ }
+
+ seq_printf(m, "%lx %s\n", entry->mod_addr, entry->mod_name);
+ return 0;
+}
+
+static const struct seq_operations last_boot_seq_ops = {
+ .start = l_start,
+ .next = l_next,
+ .stop = l_stop,
+ .show = l_show,
+};
+
+static int tracing_last_boot_open(struct inode *inode, struct file *file)
+{
+ struct trace_array *tr = inode->i_private;
+ struct seq_file *m;
+ int ret;
+
+ ret = tracing_check_open_get_tr(tr);
+ if (ret)
+ return ret;
+
+ ret = seq_open(file, &last_boot_seq_ops);
+ if (ret) {
+ trace_array_put(tr);
+ return ret;
+ }
+
+ m = file->private_data;
+ m->private = tr;
+
+ return 0;
}
static int tracing_buffer_meta_open(struct inode *inode, struct file *filp)
@@ -7533,10 +7610,10 @@ static const struct file_operations trace_time_stamp_mode_fops = {
};
static const struct file_operations last_boot_fops = {
- .open = tracing_open_generic_tr,
- .read = tracing_last_boot_read,
- .llseek = generic_file_llseek,
- .release = tracing_release_generic_tr,
+ .open = tracing_last_boot_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = tracing_seq_release,
};
#ifdef CONFIG_TRACER_SNAPSHOT
--
2.45.2
* [PATCH 8/8] tracing: Update modules to persistent instances when loaded
2025-02-05 22:50 [PATCH 0/8] ring-buffer/tracing: Save module information in persistent memory Steven Rostedt
` (6 preceding siblings ...)
2025-02-05 22:50 ` [PATCH 7/8] tracing: Show module names and addresses of last boot Steven Rostedt
@ 2025-02-05 22:50 ` Steven Rostedt
2025-02-06 10:01 ` Masami Hiramatsu
7 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-02-05 22:50 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
From: Steven Rostedt <rostedt@goodmis.org>
When a module is loaded and a persistent buffer is actively tracing, add
it to the list of modules in the persistent memory.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace.c | 25 +++++++++++++++++++++++
kernel/trace/trace.h | 2 ++
kernel/trace/trace_events.c | 40 ++++++++++++++++++++++++++-----------
3 files changed, 55 insertions(+), 12 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7b4027804cd4..443f2bc5b856 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -10088,6 +10088,30 @@ static void trace_module_remove_evals(struct module *mod)
static inline void trace_module_remove_evals(struct module *mod) { }
#endif /* CONFIG_TRACE_EVAL_MAP_FILE */
+static bool trace_array_active(struct trace_array *tr)
+{
+ if (tr->current_trace != &nop_trace)
+ return true;
+
+ /* 0 is no events, 1 is all disabled */
+ return trace_events_enabled(tr, NULL) > 1;
+}
+
+static void trace_module_record(struct module *mod)
+{
+ struct trace_array *tr;
+
+ list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+ /* Update any persistent trace array that has already been started */
+ if ((tr->flags & (TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT)) ==
+ TRACE_ARRAY_FL_BOOT) {
+ /* Only update if the trace array is active */
+ if (trace_array_active(tr))
+ save_mod(mod, tr);
+ }
+ }
+}
+
static int trace_module_notify(struct notifier_block *self,
unsigned long val, void *data)
{
@@ -10096,6 +10120,7 @@ static int trace_module_notify(struct notifier_block *self,
switch (val) {
case MODULE_STATE_COMING:
trace_module_add_evals(mod);
+ trace_module_record(mod);
break;
case MODULE_STATE_GOING:
trace_module_remove_evals(mod);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 3a020fb82a34..90493220c362 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -786,6 +786,8 @@ extern void trace_find_cmdline(int pid, char comm[]);
extern int trace_find_tgid(int pid);
extern void trace_event_follow_fork(struct trace_array *tr, bool enable);
+extern int trace_events_enabled(struct trace_array *tr, const char *system);
+
#ifdef CONFIG_DYNAMIC_FTRACE
extern unsigned long ftrace_update_tot_cnt;
extern unsigned long ftrace_number_of_pages;
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 4cb275316e51..107767afe0ab 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1811,28 +1811,28 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
return cnt;
}
-static ssize_t
-system_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
- loff_t *ppos)
+/*
+ * Returns:
+ * 0 : no events exist?
+ * 1 : all events are disabled
+ * 2 : all events are enabled
+ * 3 : some events are enabled and some are disabled
+ */
+int trace_events_enabled(struct trace_array *tr, const char *system)
{
- const char set_to_char[4] = { '?', '0', '1', 'X' };
- struct trace_subsystem_dir *dir = filp->private_data;
- struct event_subsystem *system = dir->subsystem;
struct trace_event_call *call;
struct trace_event_file *file;
- struct trace_array *tr = dir->tr;
- char buf[2];
int set = 0;
- int ret;
- mutex_lock(&event_mutex);
+ guard(mutex)(&event_mutex);
+
list_for_each_entry(file, &tr->events, list) {
call = file->event_call;
if ((call->flags & TRACE_EVENT_FL_IGNORE_ENABLE) ||
!trace_event_name(call) || !call->class || !call->class->reg)
continue;
- if (system && strcmp(call->class->system, system->name) != 0)
+ if (system && strcmp(call->class->system, system) != 0)
continue;
/*
@@ -1848,7 +1848,23 @@ system_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
if (set == 3)
break;
}
- mutex_unlock(&event_mutex);
+
+ return set;
+}
+
+static ssize_t
+system_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
+ loff_t *ppos)
+{
+ const char set_to_char[4] = { '?', '0', '1', 'X' };
+ struct trace_subsystem_dir *dir = filp->private_data;
+ struct event_subsystem *system = dir->subsystem;
+ struct trace_array *tr = dir->tr;
+ char buf[2];
+ int set;
+ int ret;
+
+ set = trace_events_enabled(tr, system ? system->name : NULL);
buf[0] = set_to_char[set];
buf[1] = '\n';
--
2.45.2
* Re: [PATCH 1/8] ring-buffer: Use kaslr address instead of text delta
2025-02-05 22:50 ` [PATCH 1/8] ring-buffer: Use kaslr address instead of text delta Steven Rostedt
@ 2025-02-06 0:32 ` Masami Hiramatsu
0 siblings, 0 replies; 45+ messages in thread
From: Masami Hiramatsu @ 2025-02-06 0:32 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton
On Wed, 05 Feb 2025 17:50:32 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> Instead of saving off the text and data pointers and using them to compare
> with the current boot's text and data pointers, just save off the KASLR
> offset. Then that can be used to figure out how to read the previous boots
> buffer.
>
> The last_boot_info will now show this offset, but only if it is for a
> previous boot:
>
> # cat instances/boot_mapped/last_boot_info
> Offset: 39000000
>
> # echo function > instances/boot_mapped/current_tracer
> # cat instances/boot_mapped/last_boot_info
> Offset: current
>
> If the KASLR offset saved is for the current boot, the last_boot_info will
> show the value of "current".
Looks good to me.
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Thank you!
>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> include/linux/ring_buffer.h | 3 +--
> kernel/trace/ring_buffer.c | 31 ++++++++++++-------------------
> kernel/trace/trace.c | 30 +++++++++++++++++++++---------
> kernel/trace/trace.h | 9 +++++----
> 4 files changed, 39 insertions(+), 34 deletions(-)
>
> diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> index 17fbb7855295..8de035f4f0d9 100644
> --- a/include/linux/ring_buffer.h
> +++ b/include/linux/ring_buffer.h
> @@ -94,8 +94,7 @@ struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flag
> unsigned long range_size,
> struct lock_class_key *key);
>
> -bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, long *text,
> - long *data);
> +bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, unsigned long *kaslr_addr);
>
> /*
> * Because the ring buffer is generic, if other users of the ring buffer get
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index b8e0ae15ca5b..7146b780176f 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -31,6 +31,7 @@
>
> #include <asm/local64.h>
> #include <asm/local.h>
> +#include <asm/setup.h>
>
> #include "trace.h"
>
> @@ -49,8 +50,7 @@ static void update_pages_handler(struct work_struct *work);
> struct ring_buffer_meta {
> int magic;
> int struct_size;
> - unsigned long text_addr;
> - unsigned long data_addr;
> + unsigned long kaslr_addr;
> unsigned long first_buffer;
> unsigned long head_buffer;
> unsigned long commit_buffer;
> @@ -550,8 +550,7 @@ struct trace_buffer {
> unsigned long range_addr_start;
> unsigned long range_addr_end;
>
> - long last_text_delta;
> - long last_data_delta;
> + unsigned long kaslr_addr;
>
> unsigned int subbuf_size;
> unsigned int subbuf_order;
> @@ -1874,16 +1873,13 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
> }
> }
>
> -/* Used to calculate data delta */
> -static char rb_data_ptr[] = "";
> -
> -#define THIS_TEXT_PTR ((unsigned long)rb_meta_init_text_addr)
> -#define THIS_DATA_PTR ((unsigned long)rb_data_ptr)
> -
> static void rb_meta_init_text_addr(struct ring_buffer_meta *meta)
> {
> - meta->text_addr = THIS_TEXT_PTR;
> - meta->data_addr = THIS_DATA_PTR;
> +#ifdef CONFIG_RANDOMIZE_BASE
> + meta->kaslr_addr = kaslr_offset();
> +#else
> + meta->kaslr_addr = 0;
> +#endif
> }
>
> static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
> @@ -1906,8 +1902,7 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
> meta->first_buffer += delta;
> meta->head_buffer += delta;
> meta->commit_buffer += delta;
> - buffer->last_text_delta = THIS_TEXT_PTR - meta->text_addr;
> - buffer->last_data_delta = THIS_DATA_PTR - meta->data_addr;
> + buffer->kaslr_addr = meta->kaslr_addr;
> continue;
> }
>
> @@ -2459,17 +2454,15 @@ struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flag
> *
> * Returns: The true if the delta is non zero
> */
> -bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, long *text,
> - long *data)
> +bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, unsigned long *kaslr_addr)
> {
> if (!buffer)
> return false;
>
> - if (!buffer->last_text_delta)
> + if (!buffer->kaslr_addr)
> return false;
>
> - *text = buffer->last_text_delta;
> - *data = buffer->last_data_delta;
> + *kaslr_addr = buffer->kaslr_addr;
>
> return true;
> }
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 1496a5ac33ae..a9e8eaf1d47e 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -50,7 +50,7 @@
> #include <linux/irq_work.h>
> #include <linux/workqueue.h>
>
> -#include <asm/setup.h> /* COMMAND_LINE_SIZE */
> +#include <asm/setup.h> /* COMMAND_LINE_SIZE and kaslr_offset() */
>
> #include "trace.h"
> #include "trace_output.h"
> @@ -4193,7 +4193,7 @@ static enum print_line_t print_trace_fmt(struct trace_iterator *iter)
> * safe to use if the array has delta offsets
> * Force printing via the fields.
> */
> - if ((tr->text_delta || tr->data_delta) &&
> + if ((tr->text_delta) &&
> event->type > __TRACE_LAST_TYPE)
> return print_event_fields(iter, event);
>
> @@ -5996,7 +5996,7 @@ ssize_t tracing_resize_ring_buffer(struct trace_array *tr,
>
> static void update_last_data(struct trace_array *tr)
> {
> - if (!tr->text_delta && !tr->data_delta)
> + if (!(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
> return;
>
> /*
> @@ -6009,7 +6009,8 @@ static void update_last_data(struct trace_array *tr)
>
> /* Using current data now */
> tr->text_delta = 0;
> - tr->data_delta = 0;
> +
> + tr->flags &= ~TRACE_ARRAY_FL_LAST_BOOT;
> }
>
> /**
> @@ -6827,8 +6828,17 @@ tracing_last_boot_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t
>
> seq_buf_init(&seq, buf, 64);
>
> - seq_buf_printf(&seq, "text delta:\t%ld\n", tr->text_delta);
> - seq_buf_printf(&seq, "data delta:\t%ld\n", tr->data_delta);
> + /*
> + * Do not leak KASLR address. This only shows the KASLR address of
> + * the last boot. When the ring buffer is started, the LAST_BOOT
> + * flag gets cleared, and this should only report "current".
> + * Otherwise it shows the KASLR address from the previous boot which
> + * should not be the same as the current boot.
> + */
> + if (!(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
> + seq_buf_puts(&seq, "Offset: current\n");
> + else
> + seq_buf_printf(&seq, "Offset: %lx\n", tr->kaslr_addr);
>
> return simple_read_from_buffer(ubuf, cnt, ppos, buf, seq_buf_used(&seq));
> }
> @@ -9212,8 +9222,10 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size
> tr->range_addr_start,
> tr->range_addr_size);
>
> - ring_buffer_last_boot_delta(buf->buffer,
> - &tr->text_delta, &tr->data_delta);
> +#ifdef CONFIG_RANDOMIZE_BASE
> + if (ring_buffer_last_boot_delta(buf->buffer, &tr->kaslr_addr))
> + tr->text_delta = kaslr_offset() - tr->kaslr_addr;
> +#endif
> /*
> * This is basically the same as a mapped buffer,
> * with the same restrictions.
> @@ -10461,7 +10473,7 @@ __init static void enable_instances(void)
> * to it.
> */
> if (start) {
> - tr->flags |= TRACE_ARRAY_FL_BOOT;
> + tr->flags |= TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT;
> tr->ref++;
> }
>
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 9c21ba45b7af..abe8169c3e87 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -348,8 +348,8 @@ struct trace_array {
> unsigned int mapped;
> unsigned long range_addr_start;
> unsigned long range_addr_size;
> + unsigned long kaslr_addr;
> long text_delta;
> - long data_delta;
>
> struct trace_pid_list __rcu *filtered_pids;
> struct trace_pid_list __rcu *filtered_no_pids;
> @@ -433,9 +433,10 @@ struct trace_array {
> };
>
> enum {
> - TRACE_ARRAY_FL_GLOBAL = BIT(0),
> - TRACE_ARRAY_FL_BOOT = BIT(1),
> - TRACE_ARRAY_FL_MOD_INIT = BIT(2),
> + TRACE_ARRAY_FL_GLOBAL = BIT(0),
> + TRACE_ARRAY_FL_BOOT = BIT(1),
> + TRACE_ARRAY_FL_LAST_BOOT = BIT(2),
> + TRACE_ARRAY_FL_MOD_INIT = BIT(3),
> };
>
> #ifdef CONFIG_MODULES
> --
> 2.45.2
>
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 2/8] ring-buffer: Add buffer meta data for persistent ring buffer
2025-02-05 22:50 ` [PATCH 2/8] ring-buffer: Add buffer meta data for persistent ring buffer Steven Rostedt
@ 2025-02-06 5:10 ` Masami Hiramatsu
2025-02-06 15:19 ` Steven Rostedt
0 siblings, 1 reply; 45+ messages in thread
From: Masami Hiramatsu @ 2025-02-06 5:10 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton
On Wed, 05 Feb 2025 17:50:33 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> Instead of just having a meta data at the first page of each sub buffer
> that has duplicate data, add a new meta page to the entire block of memory
> that holds the duplicate data and remove it from the sub buffer meta data.
>
> This will open up the extra memory in this first page to be used by the
> tracer for its own persistent data.
>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> kernel/trace/ring_buffer.c | 165 ++++++++++++++++++++++++++-----------
> 1 file changed, 115 insertions(+), 50 deletions(-)
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 7146b780176f..0446d053dbd6 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -49,7 +49,12 @@ static void update_pages_handler(struct work_struct *work);
>
Can you add a comment to explain the layout of these meta-data and subbufs?
> struct ring_buffer_meta {
> int magic;
> - int struct_size;
> + int struct_sizes;
> + unsigned long total_size;
> + unsigned long buffers_offset;
> +};
> +
> +struct ring_buffer_cpu_meta {
> unsigned long kaslr_addr;
> unsigned long first_buffer;
> unsigned long head_buffer;
> @@ -517,7 +522,7 @@ struct ring_buffer_per_cpu {
> struct mutex mapping_lock;
> unsigned long *subbuf_ids; /* ID to subbuf VA */
> struct trace_buffer_meta *meta_page;
> - struct ring_buffer_meta *ring_meta;
> + struct ring_buffer_cpu_meta *ring_meta;
>
> /* ring buffer pages to update, > 0 to add, < 0 to remove */
> long nr_pages_to_update;
> @@ -550,6 +555,8 @@ struct trace_buffer {
> unsigned long range_addr_start;
> unsigned long range_addr_end;
>
> + struct ring_buffer_meta *meta;
> +
> unsigned long kaslr_addr;
>
> unsigned int subbuf_size;
> @@ -1270,7 +1277,7 @@ static void rb_head_page_activate(struct ring_buffer_per_cpu *cpu_buffer)
> rb_set_list_to_head(head->list.prev);
>
> if (cpu_buffer->ring_meta) {
> - struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
> + struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
> meta->head_buffer = (unsigned long)head->page;
> }
> }
> @@ -1568,7 +1575,7 @@ static void rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
> static unsigned long
> rb_range_align_subbuf(unsigned long addr, int subbuf_size, int nr_subbufs)
> {
> - addr += sizeof(struct ring_buffer_meta) +
> + addr += sizeof(struct ring_buffer_cpu_meta) +
> sizeof(int) * nr_subbufs;
> return ALIGN(addr, subbuf_size);
> }
> @@ -1579,19 +1586,22 @@ rb_range_align_subbuf(unsigned long addr, int subbuf_size, int nr_subbufs)
> static void *rb_range_meta(struct trace_buffer *buffer, int nr_pages, int cpu)
> {
> int subbuf_size = buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
> - unsigned long ptr = buffer->range_addr_start;
> - struct ring_buffer_meta *meta;
> + struct ring_buffer_cpu_meta *meta;
> + struct ring_buffer_meta *bmeta;
> + unsigned long ptr;
> int nr_subbufs;
>
> - if (!ptr)
> + bmeta = buffer->meta;
> + if (!bmeta)
> return NULL;
>
> + ptr = (unsigned long)bmeta + bmeta->buffers_offset;
> + meta = (struct ring_buffer_cpu_meta *)ptr;
> +
> /* When nr_pages passed in is zero, the first meta has already been initialized */
> if (!nr_pages) {
> - meta = (struct ring_buffer_meta *)ptr;
> nr_subbufs = meta->nr_subbufs;
> } else {
> - meta = NULL;
> /* Include the reader page */
> nr_subbufs = nr_pages + 1;
> }
> @@ -1623,7 +1633,7 @@ static void *rb_range_meta(struct trace_buffer *buffer, int nr_pages, int cpu)
> }
>
> /* Return the start of subbufs given the meta pointer */
> -static void *rb_subbufs_from_meta(struct ring_buffer_meta *meta)
> +static void *rb_subbufs_from_meta(struct ring_buffer_cpu_meta *meta)
> {
> int subbuf_size = meta->subbuf_size;
> unsigned long ptr;
> @@ -1639,7 +1649,7 @@ static void *rb_subbufs_from_meta(struct ring_buffer_meta *meta)
> */
> static void *rb_range_buffer(struct ring_buffer_per_cpu *cpu_buffer, int idx)
> {
> - struct ring_buffer_meta *meta;
> + struct ring_buffer_cpu_meta *meta;
> unsigned long ptr;
> int subbuf_size;
>
> @@ -1664,14 +1674,73 @@ static void *rb_range_buffer(struct ring_buffer_per_cpu *cpu_buffer, int idx)
> return (void *)ptr;
> }
>
> +/*
> + * See if the existing memory contains a valid meta section.
> + * if so, use that, otherwise initialize it.
Note that this must be called only for the persistent ring buffer.
> + */
> +static bool rb_meta_init(struct trace_buffer *buffer)
> +{
> + unsigned long ptr = buffer->range_addr_start;
> + struct ring_buffer_meta *bmeta;
> + unsigned long total_size;
> + int struct_sizes;
> +
> + bmeta = (struct ring_buffer_meta *)ptr;
> + buffer->meta = bmeta;
> +
> + total_size = buffer->range_addr_end - buffer->range_addr_start;
> +
> + struct_sizes = sizeof(struct ring_buffer_cpu_meta);
> + struct_sizes |= sizeof(*bmeta) << 16;
> +
> + /* The first buffer will start one page after the meta page */
> + ptr += sizeof(*bmeta);
> + ptr = ALIGN(ptr, PAGE_SIZE);
> +
> + if (bmeta->magic != RING_BUFFER_META_MAGIC) {
> + pr_info("Ring buffer boot meta mismatch of magic\n");
> + goto init;
> + }
> +
> + if (bmeta->struct_sizes != struct_sizes) {
> + pr_info("Ring buffer boot meta mismatch of struct size\n");
> + goto init;
> + }
> +
> + if (bmeta->total_size != total_size) {
> + pr_info("Ring buffer boot meta mismatch of total size\n");
> + goto init;
> + }
> +
> + if (bmeta->buffers_offset > bmeta->total_size) {
> + pr_info("Ring buffer boot meta mismatch of offset outside of total size\n");
> + goto init;
> + }
> +
> + if (bmeta->buffers_offset != (void *)ptr - (void *)bmeta) {
> + pr_info("Ring buffer boot meta mismatch of first buffer offset\n");
> + goto init;
> + }
> +
> + return true;
> +
> + init:
> + bmeta->magic = RING_BUFFER_META_MAGIC;
> + bmeta->struct_sizes = struct_sizes;
> + bmeta->total_size = total_size;
> + bmeta->buffers_offset = (void *)ptr - (void *)bmeta;
> +
> + return false;
> +}
> +
> /*
> * See if the existing memory contains valid ring buffer data.
> * As the previous kernel must be the same as this kernel, all
> * the calculations (size of buffers and number of buffers)
> * must be the same.
> */
> -static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu,
> - struct trace_buffer *buffer, int nr_pages)
> +static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
> + struct trace_buffer *buffer, int nr_pages)
> {
> int subbuf_size = PAGE_SIZE;
> struct buffer_data_page *subbuf;
> @@ -1679,20 +1748,6 @@ static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu,
> unsigned long buffers_end;
> int i;
>
> - /* Check the meta magic and meta struct size */
> - if (meta->magic != RING_BUFFER_META_MAGIC ||
> - meta->struct_size != sizeof(*meta)) {
> - pr_info("Ring buffer boot meta[%d] mismatch of magic or struct size\n", cpu);
> - return false;
> - }
> -
> - /* The subbuffer's size and number of subbuffers must match */
> - if (meta->subbuf_size != subbuf_size ||
> - meta->nr_subbufs != nr_pages + 1) {
> - pr_info("Ring buffer boot meta [%d] mismatch of subbuf_size/nr_pages\n", cpu);
> - return false;
> - }
> -
> buffers_start = meta->first_buffer;
> buffers_end = meta->first_buffer + (subbuf_size * meta->nr_subbufs);
>
> @@ -1730,7 +1785,7 @@ static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu,
> return true;
> }
>
> -static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf);
> +static int rb_meta_subbuf_idx(struct ring_buffer_cpu_meta *meta, void *subbuf);
>
> static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu,
> unsigned long long *timestamp, u64 *delta_ptr)
> @@ -1797,7 +1852,7 @@ static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu)
> /* If the meta data has been validated, now validate the events */
> static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
> {
> - struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
> + struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
> struct buffer_page *head_page;
> unsigned long entry_bytes = 0;
> unsigned long entries = 0;
> @@ -1873,7 +1928,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
> }
> }
>
> -static void rb_meta_init_text_addr(struct ring_buffer_meta *meta)
> +static void rb_meta_init_text_addr(struct ring_buffer_cpu_meta *meta)
> {
> #ifdef CONFIG_RANDOMIZE_BASE
> meta->kaslr_addr = kaslr_offset();
> @@ -1884,18 +1939,25 @@ static void rb_meta_init_text_addr(struct ring_buffer_meta *meta)
>
> static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
> {
> - struct ring_buffer_meta *meta;
> + struct ring_buffer_cpu_meta *meta;
> + struct ring_buffer_meta *bmeta;
> unsigned long delta;
> void *subbuf;
> + bool valid = false;
> int cpu;
> int i;
>
> + if (rb_meta_init(buffer))
> + valid = true;
> +
> + bmeta = buffer->meta;
> +
> for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
> void *next_meta;
>
> meta = rb_range_meta(buffer, nr_pages, cpu);
>
> - if (rb_meta_valid(meta, cpu, buffer, nr_pages)) {
> + if (valid && rb_cpu_meta_valid(meta, cpu, buffer, nr_pages)) {
> /* Make the mappings match the current address */
> subbuf = rb_subbufs_from_meta(meta);
> delta = (unsigned long)subbuf - meta->first_buffer;
> @@ -1913,9 +1975,6 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
>
> memset(meta, 0, next_meta - (void *)meta);
>
> - meta->magic = RING_BUFFER_META_MAGIC;
> - meta->struct_size = sizeof(*meta);
> -
> meta->nr_subbufs = nr_pages + 1;
> meta->subbuf_size = PAGE_SIZE;
>
> @@ -1943,7 +2002,7 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
> static void *rbm_start(struct seq_file *m, loff_t *pos)
> {
> struct ring_buffer_per_cpu *cpu_buffer = m->private;
> - struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
> + struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
> unsigned long val;
>
> if (!meta)
> @@ -1968,7 +2027,7 @@ static void *rbm_next(struct seq_file *m, void *v, loff_t *pos)
> static int rbm_show(struct seq_file *m, void *v)
> {
> struct ring_buffer_per_cpu *cpu_buffer = m->private;
> - struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
> + struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
> unsigned long val = (unsigned long)v;
>
> if (val == 1) {
> @@ -2017,7 +2076,7 @@ int ring_buffer_meta_seq_init(struct file *file, struct trace_buffer *buffer, in
> static void rb_meta_buffer_update(struct ring_buffer_per_cpu *cpu_buffer,
> struct buffer_page *bpage)
> {
> - struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
> + struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
>
> if (meta->head_buffer == (unsigned long)bpage->page)
> cpu_buffer->head_page = bpage;
> @@ -2032,7 +2091,7 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
> long nr_pages, struct list_head *pages)
> {
> struct trace_buffer *buffer = cpu_buffer->buffer;
> - struct ring_buffer_meta *meta = NULL;
> + struct ring_buffer_cpu_meta *meta = NULL;
> struct buffer_page *bpage, *tmp;
> bool user_thread = current->mm != NULL;
> gfp_t mflags;
> @@ -2156,7 +2215,7 @@ static struct ring_buffer_per_cpu *
> rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
> {
> struct ring_buffer_per_cpu *cpu_buffer;
> - struct ring_buffer_meta *meta;
> + struct ring_buffer_cpu_meta *meta;
> struct buffer_page *bpage;
> struct page *page;
> int ret;
> @@ -2327,10 +2386,16 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
>
> /* If start/end are specified, then that overrides size */
> if (start && end) {
> + unsigned long buffers_start;
> unsigned long ptr;
> int n;
>
> size = end - start;
> +
> + /* Subtract the buffer meta data */
> + size -= PAGE_SIZE;
> + buffers_start = start + PAGE_SIZE;
> +
So the buffer meta data is PAGE_SIZE aligned?
> size = size / nr_cpu_ids;
>
> /*
> @@ -2340,7 +2405,7 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
> * needed, plus account for the integer array index that
> * will be appended to the meta data.
> */
> - nr_pages = (size - sizeof(struct ring_buffer_meta)) /
> + nr_pages = (size - sizeof(struct ring_buffer_cpu_meta)) /
> (subbuf_size + sizeof(int));
> /* Need at least two pages plus the reader page */
> if (nr_pages < 3)
> @@ -2348,8 +2413,8 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
>
> again:
> /* Make sure that the size fits aligned */
> - for (n = 0, ptr = start; n < nr_cpu_ids; n++) {
> - ptr += sizeof(struct ring_buffer_meta) +
> + for (n = 0, ptr = buffers_start; n < nr_cpu_ids; n++) {
> + ptr += sizeof(struct ring_buffer_cpu_meta) +
> sizeof(int) * nr_pages;
> ptr = ALIGN(ptr, subbuf_size);
> ptr += subbuf_size * nr_pages;
> @@ -3075,7 +3140,7 @@ static void rb_inc_iter(struct ring_buffer_iter *iter)
> }
>
> /* Return the index into the sub-buffers for a given sub-buffer */
> -static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf)
> +static int rb_meta_subbuf_idx(struct ring_buffer_cpu_meta *meta, void *subbuf)
> {
> void *subbuf_array;
>
> @@ -3087,7 +3152,7 @@ static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf)
> static void rb_update_meta_head(struct ring_buffer_per_cpu *cpu_buffer,
> struct buffer_page *next_page)
> {
> - struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
> + struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
> unsigned long old_head = (unsigned long)next_page->page;
> unsigned long new_head;
>
> @@ -3104,7 +3169,7 @@ static void rb_update_meta_head(struct ring_buffer_per_cpu *cpu_buffer,
> static void rb_update_meta_reader(struct ring_buffer_per_cpu *cpu_buffer,
> struct buffer_page *reader)
> {
> - struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
> + struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
> void *old_reader = cpu_buffer->reader_page->page;
> void *new_reader = reader->page;
> int id;
> @@ -3733,7 +3798,7 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
> rb_page_write(cpu_buffer->commit_page));
> rb_inc_page(&cpu_buffer->commit_page);
> if (cpu_buffer->ring_meta) {
> - struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
> + struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
> meta->commit_buffer = (unsigned long)cpu_buffer->commit_page->page;
> }
> /* add barrier to keep gcc from optimizing too much */
> @@ -5986,7 +6051,7 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
> if (cpu_buffer->mapped) {
> rb_update_meta_page(cpu_buffer);
> if (cpu_buffer->ring_meta) {
> - struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
> + struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
> meta->commit_buffer = meta->head_buffer;
> }
> }
> @@ -6020,7 +6085,7 @@ static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
> void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu)
> {
> struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
> - struct ring_buffer_meta *meta;
> + struct ring_buffer_cpu_meta *meta;
>
> if (!cpumask_test_cpu(cpu, buffer->cpumask))
> return;
> @@ -6058,7 +6123,7 @@ EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
> void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
> {
> struct ring_buffer_per_cpu *cpu_buffer;
> - struct ring_buffer_meta *meta;
> + struct ring_buffer_cpu_meta *meta;
> int cpu;
>
> /* prevent another thread from changing buffer sizes */
Thank you,
> --
> 2.45.2
>
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 3/8] ring-buffer: Add ring_buffer_meta_scratch()
2025-02-05 22:50 ` [PATCH 3/8] ring-buffer: Add ring_buffer_meta_scratch() Steven Rostedt
@ 2025-02-06 5:13 ` Masami Hiramatsu
0 siblings, 0 replies; 45+ messages in thread
From: Masami Hiramatsu @ 2025-02-06 5:13 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton
On Wed, 05 Feb 2025 17:50:34 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> Now that there's one page at the start of the persistent memory used by
> the ring buffer, make the part of that page that is not used by the ring
> buffer available for the tracer to use. This will allow the tracer to
> store its own persistent data.
Looks good to me.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Thanks,
>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> include/linux/ring_buffer.h | 1 +
> kernel/trace/ring_buffer.c | 13 +++++++++++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> index 8de035f4f0d9..b95f940fd07a 100644
> --- a/include/linux/ring_buffer.h
> +++ b/include/linux/ring_buffer.h
> @@ -95,6 +95,7 @@ struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flag
> struct lock_class_key *key);
>
> bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, unsigned long *kaslr_addr);
> +void *ring_buffer_meta_scratch(struct trace_buffer *buffer, unsigned int *size);
>
> /*
> * Because the ring buffer is generic, if other users of the ring buffer get
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 0446d053dbd6..5a81ff785665 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -1730,6 +1730,9 @@ static bool rb_meta_init(struct trace_buffer *buffer)
> bmeta->total_size = total_size;
> bmeta->buffers_offset = (void *)ptr - (void *)bmeta;
>
> + /* Zero out the scatch pad */
> + memset((void *)bmeta + sizeof(*bmeta), 0, PAGE_SIZE - sizeof(*bmeta));
> +
> return false;
> }
>
> @@ -2532,6 +2535,16 @@ bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, unsigned long *kas
> return true;
> }
>
> +void *ring_buffer_meta_scratch(struct trace_buffer *buffer, unsigned int *size)
> +{
> + if (!buffer || !buffer->meta)
> + return NULL;
> +
> + *size = PAGE_SIZE - sizeof(*buffer->meta);
> +
> + return (void *)buffer->meta + sizeof(*buffer->meta);
> +}
> +
> /**
> * ring_buffer_free - free a ring buffer.
> * @buffer: the buffer to free.
> --
> 2.45.2
>
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 4/8] tracing: Have persistent trace instances save KASLR offset
2025-02-05 22:50 ` [PATCH 4/8] tracing: Have persistent trace instances save KASLR offset Steven Rostedt
@ 2025-02-06 5:22 ` Masami Hiramatsu
2025-02-06 15:24 ` Steven Rostedt
0 siblings, 1 reply; 45+ messages in thread
From: Masami Hiramatsu @ 2025-02-06 5:22 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton
On Wed, 05 Feb 2025 17:50:35 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> There's no reason to save the KASLR offset for the ring buffer itself.
> That is used by the tracer. Now that the tracer has a way to save data in
> the persistent memory of the ring buffer, have the tracing infrastructure
> take care of the saving of the KASLR offset.
>
Looks good to me. But note that the scratchpad size may not be enough for
the module table later, because 1 module requires at least the name[]
(64 bytes - sizeof(ulong)) and the base address (ulong). This means
1 entry consumes 64 bytes. Thus there can be only 63 entries + meta
data in a 4K page. My Ubuntu loads 189(!) modules;
$ lsmod | wc -l
190
so we want 255 entries, which requires 16KB.
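Roughly, following the trace_mod_entry layout from patch 6 (a sketch,
assuming a 64-bit kernel where MODULE_NAME_LEN is 64 - sizeof(unsigned long)):

	struct trace_mod_entry {
		unsigned long mod_addr;			/*  8 bytes */
		char mod_name[MODULE_NAME_LEN];		/* 56 bytes */
	};						/* 64 bytes per entry */

	/* Entries that fit in the 4K scratch page, minus the header:    */
	/*	(4096 - sizeof(struct trace_scratch)) / 64  ->  63        */
	/* 255 entries need 255 * 64 bytes + header, i.e. roughly 16K.   */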
Thank you,
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> include/linux/ring_buffer.h | 1 -
> kernel/trace/ring_buffer.c | 47 -------------------------------------
> kernel/trace/trace.c | 38 ++++++++++++++++++++++++++----
> kernel/trace/trace.h | 6 +++--
> 4 files changed, 38 insertions(+), 54 deletions(-)
>
> diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> index b95f940fd07a..d6d9c94e8d8a 100644
> --- a/include/linux/ring_buffer.h
> +++ b/include/linux/ring_buffer.h
> @@ -94,7 +94,6 @@ struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flag
> unsigned long range_size,
> struct lock_class_key *key);
>
> -bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, unsigned long *kaslr_addr);
> void *ring_buffer_meta_scratch(struct trace_buffer *buffer, unsigned int *size);
>
> /*
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 5a81ff785665..a42406287281 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -55,7 +55,6 @@ struct ring_buffer_meta {
> };
>
> struct ring_buffer_cpu_meta {
> - unsigned long kaslr_addr;
> unsigned long first_buffer;
> unsigned long head_buffer;
> unsigned long commit_buffer;
> @@ -557,8 +556,6 @@ struct trace_buffer {
>
> struct ring_buffer_meta *meta;
>
> - unsigned long kaslr_addr;
> -
> unsigned int subbuf_size;
> unsigned int subbuf_order;
> unsigned int max_data_size;
> @@ -1931,15 +1928,6 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
> }
> }
>
> -static void rb_meta_init_text_addr(struct ring_buffer_cpu_meta *meta)
> -{
> -#ifdef CONFIG_RANDOMIZE_BASE
> - meta->kaslr_addr = kaslr_offset();
> -#else
> - meta->kaslr_addr = 0;
> -#endif
> -}
> -
> static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
> {
> struct ring_buffer_cpu_meta *meta;
> @@ -1967,7 +1955,6 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
> meta->first_buffer += delta;
> meta->head_buffer += delta;
> meta->commit_buffer += delta;
> - buffer->kaslr_addr = meta->kaslr_addr;
> continue;
> }
>
> @@ -1984,7 +1971,6 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
> subbuf = rb_subbufs_from_meta(meta);
>
> meta->first_buffer = (unsigned long)subbuf;
> - rb_meta_init_text_addr(meta);
>
> /*
> * The buffers[] array holds the order of the sub-buffers
> @@ -2514,27 +2500,6 @@ struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flag
> return alloc_buffer(size, flags, order, start, start + range_size, key);
> }
>
> -/**
> - * ring_buffer_last_boot_delta - return the delta offset from last boot
> - * @buffer: The buffer to return the delta from
> - * @text: Return text delta
> - * @data: Return data delta
> - *
> - * Returns: The true if the delta is non zero
> - */
> -bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, unsigned long *kaslr_addr)
> -{
> - if (!buffer)
> - return false;
> -
> - if (!buffer->kaslr_addr)
> - return false;
> -
> - *kaslr_addr = buffer->kaslr_addr;
> -
> - return true;
> -}
> -
> void *ring_buffer_meta_scratch(struct trace_buffer *buffer, unsigned int *size)
> {
> if (!buffer || !buffer->meta)
> @@ -6098,7 +6063,6 @@ static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
> void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu)
> {
> struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
> - struct ring_buffer_cpu_meta *meta;
>
> if (!cpumask_test_cpu(cpu, buffer->cpumask))
> return;
> @@ -6117,11 +6081,6 @@ void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu)
> atomic_dec(&cpu_buffer->record_disabled);
> atomic_dec(&cpu_buffer->resize_disabled);
>
> - /* Make sure persistent meta now uses this buffer's addresses */
> - meta = rb_range_meta(buffer, 0, cpu_buffer->cpu);
> - if (meta)
> - rb_meta_init_text_addr(meta);
> -
> mutex_unlock(&buffer->mutex);
> }
> EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
> @@ -6136,7 +6095,6 @@ EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
> void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
> {
> struct ring_buffer_per_cpu *cpu_buffer;
> - struct ring_buffer_cpu_meta *meta;
> int cpu;
>
> /* prevent another thread from changing buffer sizes */
> @@ -6164,11 +6122,6 @@ void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
>
> reset_disabled_cpu_buffer(cpu_buffer);
>
> - /* Make sure persistent meta now uses this buffer's addresses */
> - meta = rb_range_meta(buffer, 0, cpu_buffer->cpu);
> - if (meta)
> - rb_meta_init_text_addr(meta);
> -
> atomic_dec(&cpu_buffer->record_disabled);
> atomic_sub(RESET_BIT, &cpu_buffer->resize_disabled);
> }
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index a9e8eaf1d47e..cb9f8e6878a0 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -5994,8 +5994,14 @@ ssize_t tracing_resize_ring_buffer(struct trace_array *tr,
> return ret;
> }
>
> +struct trace_scratch {
> + unsigned long kaslr_addr;
> +};
> +
> static void update_last_data(struct trace_array *tr)
> {
> + struct trace_scratch *tscratch;
> +
> if (!(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
> return;
>
> @@ -6010,6 +6016,17 @@ static void update_last_data(struct trace_array *tr)
> /* Using current data now */
> tr->text_delta = 0;
>
> + if (!tr->scratch)
> + return;
> +
> + tscratch = tr->scratch;
> +
> + /* Set the persistent ring buffer meta data to this address */
> +#ifdef CONFIG_RANDOMIZE_BASE
> + tscratch->kaslr_addr = kaslr_offset();
> +#else
> + tscratch->kaslr_addr = 0;
> +#endif
> tr->flags &= ~TRACE_ARRAY_FL_LAST_BOOT;
> }
>
> @@ -6823,6 +6840,7 @@ static ssize_t
> tracing_last_boot_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
> {
> struct trace_array *tr = filp->private_data;
> + struct trace_scratch *tscratch = tr->scratch;
> struct seq_buf seq;
> char buf[64];
>
> @@ -6835,10 +6853,10 @@ tracing_last_boot_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t
> * Otherwise it shows the KASLR address from the previous boot which
> * should not be the same as the current boot.
> */
> - if (!(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
> + if (!tscratch || !(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
> seq_buf_puts(&seq, "Offset: current\n");
> else
> - seq_buf_printf(&seq, "Offset: %lx\n", tr->kaslr_addr);
> + seq_buf_printf(&seq, "Offset: %lx\n", tscratch->kaslr_addr);
>
> return simple_read_from_buffer(ubuf, cnt, ppos, buf, seq_buf_used(&seq));
> }
> @@ -9212,6 +9230,8 @@ static int
> allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size)
> {
> enum ring_buffer_flags rb_flags;
> + unsigned int scratch_size;
> + void *scratch;
>
> rb_flags = tr->trace_flags & TRACE_ITER_OVERWRITE ? RB_FL_OVERWRITE : 0;
>
> @@ -9222,10 +9242,20 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size
> tr->range_addr_start,
> tr->range_addr_size);
>
> + scratch = ring_buffer_meta_scratch(buf->buffer, &scratch_size);
> + if (scratch) {
> + tr->scratch = scratch;
> + tr->scratch_size = scratch_size;
> +
> #ifdef CONFIG_RANDOMIZE_BASE
> - if (ring_buffer_last_boot_delta(buf->buffer, &tr->kaslr_addr))
> - tr->text_delta = kaslr_offset() - tr->kaslr_addr;
> + {
> + struct trace_scratch *tscratch = tr->scratch;
> +
> + if (tscratch->kaslr_addr)
> + tr->text_delta = kaslr_offset() - tscratch->kaslr_addr;
> + }
> #endif
> + }
> /*
> * This is basically the same as a mapped buffer,
> * with the same restrictions.
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index abe8169c3e87..3a020fb82a34 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -348,8 +348,11 @@ struct trace_array {
> unsigned int mapped;
> unsigned long range_addr_start;
> unsigned long range_addr_size;
> - unsigned long kaslr_addr;
> long text_delta;
> + void *scratch; /* pointer in persistent memory */
> + int scratch_size;
> +
> + int buffer_disabled;
>
> struct trace_pid_list __rcu *filtered_pids;
> struct trace_pid_list __rcu *filtered_no_pids;
> @@ -367,7 +370,6 @@ struct trace_array {
> * CONFIG_TRACER_MAX_TRACE.
> */
> arch_spinlock_t max_lock;
> - int buffer_disabled;
> #ifdef CONFIG_FTRACE_SYSCALLS
> int sys_refcount_enter;
> int sys_refcount_exit;
> --
> 2.45.2
>
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 5/8] module: Add module_for_each_mod() function
2025-02-05 22:50 ` [PATCH 5/8] module: Add module_for_each_mod() function Steven Rostedt
@ 2025-02-06 5:28 ` Masami Hiramatsu
2025-02-06 15:27 ` Steven Rostedt
0 siblings, 1 reply; 45+ messages in thread
From: Masami Hiramatsu @ 2025-02-06 5:28 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, Luis Chamberlain, Petr Pavlu,
Sami Tolvanen, Daniel Gomez, linux-modules
On Wed, 05 Feb 2025 17:50:36 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> The tracing system needs a way to save all the currently loaded modules
> and their addresses into persistent memory so that it can evaluate the
> addresses on a reboot from a crash. When the persistent memory trace
> starts, it will load the module addresses and names into the persistent
> memory. To do so, it will call the module_for_each_mod() function and pass
> it a function and data structure to get called on each loaded module. Then
> it can record the memory.
>
> This only implements that function.
>
> Cc: Luis Chamberlain <mcgrof@kernel.org>
> Cc: Petr Pavlu <petr.pavlu@suse.com>
> Cc: Sami Tolvanen <samitolvanen@google.com>
> Cc: Daniel Gomez <da.gomez@samsung.com>
> Cc: linux-modules@vger.kernel.org
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> include/linux/module.h | 6 ++++++
> kernel/module/main.c | 14 ++++++++++++++
> 2 files changed, 20 insertions(+)
>
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 23792d5d7b74..9e1f42f03511 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -779,6 +779,8 @@ static inline void *module_writable_address(struct module *mod, void *loc)
> return __module_writable_address(mod, loc);
> }
>
> +void module_for_each_mod(int(*func)(struct module *mod, void *data), void *data);
> +
> #else /* !CONFIG_MODULES... */
>
> static inline struct module *__module_address(unsigned long addr)
> @@ -891,6 +893,10 @@ static inline void *module_writable_address(struct module *mod, void *loc)
> {
> return loc;
> }
> +
> +static inline void module_for_each_mod(int(*func)(struct module *mod, void *data), void *data)
> +{
> +}
> #endif /* CONFIG_MODULES */
>
> #ifdef CONFIG_SYSFS
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 1fb9ad289a6f..ea1fe313fb89 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -3809,6 +3809,20 @@ bool is_module_text_address(unsigned long addr)
> return ret;
> }
>
It is better to add a kerneldoc for this API.
/**
* module_for_each_mod() - iterate all modules
* @func: Callback function
* @data: User data
*
* Call the @func with each module in the system. If @func returns !0, this
 * stops iterating. Note that @func must not sleep since it is called with
 * preemption disabled.
*/
BTW, do we really need to disable preempt or is it enough to call
rcu_read_lock()?
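For illustration, the rcu_read_lock() variant would only change the guard
(a sketch against this patch, not a tested change):

	void module_for_each_mod(int (*func)(struct module *mod, void *data), void *data)
	{
		struct module *mod;

		rcu_read_lock();
		list_for_each_entry_rcu(mod, &modules, list) {
			if (mod->state == MODULE_STATE_UNFORMED)
				continue;
			if (func(mod, data))
				break;
		}
		rcu_read_unlock();
	}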
Thank you,
> +void module_for_each_mod(int(*func)(struct module *mod, void *data), void *data)
> +{
> + struct module *mod;
> +
> + preempt_disable();
> + list_for_each_entry_rcu(mod, &modules, list) {
> + if (mod->state == MODULE_STATE_UNFORMED)
> + continue;
> + if (func(mod, data))
> + break;
> + }
> + preempt_enable();
> +}
> +
> /**
> * __module_text_address() - get the module whose code contains an address.
> * @addr: the address.
> --
> 2.45.2
>
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 6/8] tracing: Have persistent trace instances save module addresses
2025-02-05 22:50 ` [PATCH 6/8] tracing: Have persistent trace instances save module addresses Steven Rostedt
@ 2025-02-06 8:26 ` Masami Hiramatsu
2025-02-06 15:29 ` Steven Rostedt
0 siblings, 1 reply; 45+ messages in thread
From: Masami Hiramatsu @ 2025-02-06 8:26 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton
On Wed, 05 Feb 2025 17:50:37 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> For trace instances that are mapped to persistent memory, have them use
> the scratch area to save the currently loaded modules. This will let the
> next boot know where the modules had been loaded, so that their
> addresses can be deciphered by using where they were loaded previously.
>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> kernel/trace/trace.c | 99 ++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 87 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index cb9f8e6878a0..a8e5f7ac2193 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -5994,14 +5994,60 @@ ssize_t tracing_resize_ring_buffer(struct trace_array *tr,
> return ret;
> }
>
> +struct trace_mod_entry {
> + unsigned long mod_addr;
> + char mod_name[MODULE_NAME_LEN];
> +};
> +
> struct trace_scratch {
> unsigned long kaslr_addr;
> + unsigned long nr_entries;
> + struct trace_mod_entry entries[];
> };
>
> +static int save_mod(struct module *mod, void *data)
> +{
> + struct trace_array *tr = data;
> + struct trace_scratch *tscratch;
> + struct trace_mod_entry *entry;
> + unsigned int size;
> +
> + tscratch = tr->scratch;
> + if (!tscratch)
> + return -1;
> + size = tr->scratch_size;
> +
> + if (struct_size(tscratch, entries, tscratch->nr_entries + 1) > size)
> + return -1;
> +
> + entry = &tscratch->entries[tscratch->nr_entries];
> +
> + tscratch->nr_entries++;
> +
> + entry->mod_addr = (unsigned long)mod->mem[MOD_TEXT].base;
> + strscpy(entry->mod_name, mod->name);
> +
> + return 0;
> +}
> +
> static void update_last_data(struct trace_array *tr)
> {
> struct trace_scratch *tscratch;
>
> + if (!(tr->flags & TRACE_ARRAY_FL_BOOT))
> + return;
> +
> + /* Reset the module list and reload them */
> + if (tr->scratch) {
> + struct trace_scratch *tscratch = tr->scratch;
> +
> + memset(tscratch->entries, 0,
> + flex_array_size(tscratch, entries, tscratch->nr_entries));
> + tscratch->nr_entries = 0;
> +
> + module_for_each_mod(save_mod, tr);
> + }
This may scan the modules too frequently, because update_last_data() is
called when we:
- change the tracer (this may be OK)
- update "set_event"
- write 1 to "enable" under events
- change the pid filter
- etc.
Once the list is scanned, it should not be scanned again, but only updated
by the module callback, because events are usually enabled individually
(thus writing 1 to "enable" can happen multiple times online).
I think we can move this after the TRACE_ARRAY_FL_LAST_BOOT check:
if the TRACE_ARRAY_FL_LAST_BOOT flag is set, the flag should be cleared
while updating the tscratch, and if the flag is not set, we can skip
updating the scratch.
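Something along these lines (illustrative only, with the rest of
update_last_data() unchanged):

	static void update_last_data(struct trace_array *tr)
	{
		struct trace_scratch *tscratch;

		if (!(tr->flags & TRACE_ARRAY_FL_BOOT))
			return;

		/* Nothing to do unless the last-boot data is being replaced */
		if (!(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
			return;

		/* First activation this boot: scan the modules once */
		if (tr->scratch) {
			tscratch = tr->scratch;
			memset(tscratch->entries, 0,
			       flex_array_size(tscratch, entries, tscratch->nr_entries));
			tscratch->nr_entries = 0;
			module_for_each_mod(save_mod, tr);
		}

		/* ... existing KASLR/text_delta update ... */

		tr->flags &= ~TRACE_ARRAY_FL_LAST_BOOT;
	}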
Thank you,
> +
> if (!(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
> return;
>
> @@ -9226,6 +9272,46 @@ static struct dentry *trace_instance_dir;
> static void
> init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer);
>
> +static void setup_trace_scratch(struct trace_array *tr, void *scratch, unsigned int size)
> +{
> + struct trace_scratch *tscratch = scratch;
> + struct trace_mod_entry *entry;
> +
> + if (!scratch)
> + return;
> +
> + tr->scratch = scratch;
> + tr->scratch_size = size;
> +
> +#ifdef CONFIG_RANDOMIZE_BASE
> + if (tscratch->kaslr_addr)
> + tr->text_delta = kaslr_offset() - tscratch->kaslr_addr;
> +#endif
> +
> + if (struct_size(tscratch, entries, tscratch->nr_entries) > size)
> + goto reset;
> +
> + /* Check if each module name is a valid string */
> + for (int i = 0; i < tscratch->nr_entries; i++) {
> + int n;
> +
> + entry = &tscratch->entries[i];
> +
> + for (n = 0; n < MODULE_NAME_LEN; n++) {
> + if (entry->mod_name[n] == '\0')
> + break;
> + if (!isprint(entry->mod_name[n]))
> + goto reset;
> + }
> + if (n == MODULE_NAME_LEN)
> + goto reset;
> + }
> + return;
> + reset:
> + /* Invalid trace modules */
> + memset(scratch, 0, size);
> +}
> +
> static int
> allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size)
> {
> @@ -9243,19 +9329,8 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size
> tr->range_addr_size);
>
> scratch = ring_buffer_meta_scratch(buf->buffer, &scratch_size);
> - if (scratch) {
> - tr->scratch = scratch;
> - tr->scratch_size = scratch_size;
> + setup_trace_scratch(tr, scratch, scratch_size);
>
> -#ifdef CONFIG_RANDOMIZE_BASE
> - {
> - struct trace_scratch *tscratch = tr->scratch;
> -
> - if (tscratch->kaslr_addr)
> - tr->text_delta = kaslr_offset() - tscratch->kaslr_addr;
> - }
> -#endif
> - }
> /*
> * This is basically the same as a mapped buffer,
> * with the same restrictions.
> --
> 2.45.2
>
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 8/8] tracing: Update modules to persistent instances when loaded
2025-02-05 22:50 ` [PATCH 8/8] tracing: Update modules to persistent instances when loaded Steven Rostedt
@ 2025-02-06 10:01 ` Masami Hiramatsu
2025-02-06 15:36 ` Steven Rostedt
0 siblings, 1 reply; 45+ messages in thread
From: Masami Hiramatsu @ 2025-02-06 10:01 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton
On Wed, 05 Feb 2025 17:50:39 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> When a module is loaded and a persistent buffer is actively tracing, add
> it to the list of modules in the persistent memory.
>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> kernel/trace/trace.c | 25 +++++++++++++++++++++++
> kernel/trace/trace.h | 2 ++
> kernel/trace/trace_events.c | 40 ++++++++++++++++++++++++++-----------
> 3 files changed, 55 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 7b4027804cd4..443f2bc5b856 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -10088,6 +10088,30 @@ static void trace_module_remove_evals(struct module *mod)
> static inline void trace_module_remove_evals(struct module *mod) { }
> #endif /* CONFIG_TRACE_EVAL_MAP_FILE */
>
> +static bool trace_array_active(struct trace_array *tr)
> +{
> + if (tr->current_trace != &nop_trace)
> + return true;
> +
> + /* 0 is no events, 1 is all disabled */
> + return trace_events_enabled(tr, NULL) > 1;
> +}
> +
> +static void trace_module_record(struct module *mod)
> +{
> + struct trace_array *tr;
> +
> + list_for_each_entry(tr, &ftrace_trace_arrays, list) {
> + /* Update any persistent trace array that has already been started */
> + if ((tr->flags & (TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT)) ==
> + TRACE_ARRAY_FL_BOOT) {
> + /* Only update if the trace array is active */
> + if (trace_array_active(tr))
Do we really need this check? It seems that we can just save_mod() if the
above condition is true.
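A sketch of that simplification (illustrative only, not a tested change):

	static void trace_module_record(struct module *mod)
	{
		struct trace_array *tr;

		list_for_each_entry(tr, &ftrace_trace_arrays, list) {
			/* Persistent buffers started this boot, not last-boot data */
			if ((tr->flags & (TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT)) ==
			    TRACE_ARRAY_FL_BOOT)
				save_mod(mod, tr);
		}
	}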
> + save_mod(mod, tr);
> + }
> + }
> +}
> +
> static int trace_module_notify(struct notifier_block *self,
> unsigned long val, void *data)
> {
> @@ -10096,6 +10120,7 @@ static int trace_module_notify(struct notifier_block *self,
> switch (val) {
> case MODULE_STATE_COMING:
> trace_module_add_evals(mod);
> + trace_module_record(mod);
> break;
> case MODULE_STATE_GOING:
> trace_module_remove_evals(mod);
Don't we need to remove the module entry when a module is removed?
(Or is the trace data cleared every time we remove a module?)
Thank you,
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 3a020fb82a34..90493220c362 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -786,6 +786,8 @@ extern void trace_find_cmdline(int pid, char comm[]);
> extern int trace_find_tgid(int pid);
> extern void trace_event_follow_fork(struct trace_array *tr, bool enable);
>
> +extern int trace_events_enabled(struct trace_array *tr, const char *system);
> +
> #ifdef CONFIG_DYNAMIC_FTRACE
> extern unsigned long ftrace_update_tot_cnt;
> extern unsigned long ftrace_number_of_pages;
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 4cb275316e51..107767afe0ab 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -1811,28 +1811,28 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
> return cnt;
> }
>
> -static ssize_t
> -system_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
> - loff_t *ppos)
> +/*
> + * Returns:
> + * 0 : no events exist?
> + * 1 : all events are disabled
> + * 2 : all events are enabled
> + *   3 : some events are enabled and some are disabled
> + */
> +int trace_events_enabled(struct trace_array *tr, const char *system)
> {
> - const char set_to_char[4] = { '?', '0', '1', 'X' };
> - struct trace_subsystem_dir *dir = filp->private_data;
> - struct event_subsystem *system = dir->subsystem;
> struct trace_event_call *call;
> struct trace_event_file *file;
> - struct trace_array *tr = dir->tr;
> - char buf[2];
> int set = 0;
> - int ret;
>
> - mutex_lock(&event_mutex);
> + guard(mutex)(&event_mutex);
> +
> list_for_each_entry(file, &tr->events, list) {
> call = file->event_call;
> if ((call->flags & TRACE_EVENT_FL_IGNORE_ENABLE) ||
> !trace_event_name(call) || !call->class || !call->class->reg)
> continue;
>
> - if (system && strcmp(call->class->system, system->name) != 0)
> + if (system && strcmp(call->class->system, system) != 0)
> continue;
>
> /*
> @@ -1848,7 +1848,23 @@ system_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
> if (set == 3)
> break;
> }
> - mutex_unlock(&event_mutex);
> +
> + return set;
> +}
> +
> +static ssize_t
> +system_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
> + loff_t *ppos)
> +{
> + const char set_to_char[4] = { '?', '0', '1', 'X' };
> + struct trace_subsystem_dir *dir = filp->private_data;
> + struct event_subsystem *system = dir->subsystem;
> + struct trace_array *tr = dir->tr;
> + char buf[2];
> + int set;
> + int ret;
> +
> + set = trace_events_enabled(tr, system ? system->name : NULL);
>
> buf[0] = set_to_char[set];
> buf[1] = '\n';
> --
> 2.45.2
>
>
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 2/8] ring-buffer: Add buffer meta data for persistent ring buffer
2025-02-06 5:10 ` Masami Hiramatsu
@ 2025-02-06 15:19 ` Steven Rostedt
0 siblings, 0 replies; 45+ messages in thread
From: Steven Rostedt @ 2025-02-06 15:19 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton
On Thu, 6 Feb 2025 14:10:35 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index 7146b780176f..0446d053dbd6 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -49,7 +49,12 @@ static void update_pages_handler(struct work_struct *work);
> >
>
> Can you add a comment to explain the layout of these meta-data and subbufs?
Sure.
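For reference, the layout this series sets up looks roughly like this
(illustrative; the exact alignment comes from rb_range_align_subbuf()):

	/*
	 * range_addr_start
	 * +---------------------------------+
	 * | struct ring_buffer_meta         |  first page: global meta data,
	 * | (rest of page = tracer scratch) |  remainder handed out as scratch
	 * +---------------------------------+ <- bmeta->buffers_offset
	 * | CPU 0: ring_buffer_cpu_meta     |
	 * |        buffers[] index array    |
	 * |        sub-buffers ...          |
	 * +---------------------------------+
	 * | CPU 1: same layout ...          |
	 * +---------------------------------+
	 * range_addr_end
	 */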
>
>
> > struct ring_buffer_meta {
> > int magic;
> > - int struct_size;
> > + int struct_sizes;
> > + unsigned long total_size;
> > + unsigned long buffers_offset;
> > +};
> > +
> > @@ -1664,14 +1674,73 @@ static void *rb_range_buffer(struct ring_buffer_per_cpu *cpu_buffer, int idx)
> > return (void *)ptr;
> > }
> >
> > +/*
> > + * See if the existing memory contains a valid meta section.
> > + * if so, use that, otherwise initialize it.
>
> Note that this must be called only for the persistent ring buffer.
Or just make it return false if bmeta is NULL.
The range_addr_start is only set for persistent ring buffers.
Also, this is to initialize the persistent ring buffer's meta data, which makes
calling it for anything else rather strange.
>
> > + */
> > +static bool rb_meta_init(struct trace_buffer *buffer)
> > +{
> > + unsigned long ptr = buffer->range_addr_start;
> > + struct ring_buffer_meta *bmeta;
> > + unsigned long total_size;
> > + int struct_sizes;
> > +
> > + bmeta = (struct ring_buffer_meta *)ptr;
> > + buffer->meta = bmeta;
> > +
> > + total_size = buffer->range_addr_end - buffer->range_addr_start;
> > +
> > + struct_sizes = sizeof(struct ring_buffer_cpu_meta);
> > + struct_sizes |= sizeof(*bmeta) << 16;
> > +
> > + /* The first buffer will start one page after the meta page */
> > + ptr += sizeof(*bmeta);
> > + ptr = ALIGN(ptr, PAGE_SIZE);
> > +
> > + if (bmeta->magic != RING_BUFFER_META_MAGIC) {
> > + pr_info("Ring buffer boot meta mismatch of magic\n");
> > + goto init;
> > + }
> > +
> > + if (bmeta->struct_sizes != struct_sizes) {
> > + pr_info("Ring buffer boot meta mismatch of struct size\n");
> > + goto init;
> > + }
> > +
> > + if (bmeta->total_size != total_size) {
> > + pr_info("Ring buffer boot meta mismatch of total size\n");
> > + goto init;
> > + }
> > +
> > + if (bmeta->buffers_offset > bmeta->total_size) {
> > + pr_info("Ring buffer boot meta mismatch of offset outside of total size\n");
> > + goto init;
> > + }
> > +
> > + if (bmeta->buffers_offset != (void *)ptr - (void *)bmeta) {
> > + pr_info("Ring buffer boot meta mismatch of first buffer offset\n");
> > + goto init;
> > + }
> > +
> > + return true;
> > +
> > @@ -2327,10 +2386,16 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
> >
> > /* If start/end are specified, then that overrides size */
> > if (start && end) {
> > + unsigned long buffers_start;
> > unsigned long ptr;
> > int n;
> >
> > size = end - start;
> > +
> > + /* Subtract the buffer meta data */
> > + size -= PAGE_SIZE;
> > + buffers_start = start + PAGE_SIZE;
> > +
>
> So the buffer meta data is PAGE_SIZE aligned?
The reserve_mem documentation says it is, but it's not a requirement.
I should modify this to not depend on that.
Thanks,
-- Steve
>
> > size = size / nr_cpu_ids;
> >
> > /*
> > @@ -2340,7 +2405,7 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
> > * needed, plus account for the integer array index that
> > * will be appended to the meta data.
> > */
> > - nr_pages = (size - sizeof(struct ring_buffer_meta)) /
> > + nr_pages = (size - sizeof(struct ring_buffer_cpu_meta)) /
> > (subbuf_size + sizeof(int));
> > /* Need at least two pages plus the reader page */
> > if (nr_pages < 3)
> > @@ -2348,8 +2413,8 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
> >
> > again:
> > /* Make sure that the size fits aligned */
> > - for (n = 0, ptr = start; n < nr_cpu_ids; n++) {
> > - ptr += sizeof(struct ring_buffer_meta) +
> > + for (n = 0, ptr = buffers_start; n < nr_cpu_ids; n++) {
> > + ptr += sizeof(struct ring_buffer_cpu_meta) +
> > sizeof(int) * nr_pages;
> > ptr = ALIGN(ptr, subbuf_size);
> > ptr += subbuf_size * nr_pages;
> > @@ -3075,7 +3140,7 @@ static void rb_inc_iter(struct ring_buffer_iter *iter)
> > }
> >
> > /* Return the index into the sub-buffers for a given sub-buffer */
> > -static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf)
> > +static int rb_meta_subbuf_idx(struct ring_buffer_cpu_meta *meta, void *subbuf)
> > {
> > void *subbuf_array;
> >
> > @@ -3087,7 +3152,7 @@ static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf)
> > static void rb_update_meta_head(struct ring_buffer_per_cpu *cpu_buffer,
> > struct buffer_page *next_page)
> > {
> > - struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
> > + struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
> > unsigned long old_head = (unsigned long)next_page->page;
> > unsigned long new_head;
> >
> > @@ -3104,7 +3169,7 @@ static void rb_update_meta_head(struct ring_buffer_per_cpu *cpu_buffer,
> > static void rb_update_meta_reader(struct ring_buffer_per_cpu *cpu_buffer,
> > struct buffer_page *reader)
> > {
> > - struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
> > + struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
> > void *old_reader = cpu_buffer->reader_page->page;
> > void *new_reader = reader->page;
> > int id;
> > @@ -3733,7 +3798,7 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
> > rb_page_write(cpu_buffer->commit_page));
> > rb_inc_page(&cpu_buffer->commit_page);
> > if (cpu_buffer->ring_meta) {
> > - struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
> > + struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
> > meta->commit_buffer = (unsigned long)cpu_buffer->commit_page->page;
> > }
> > /* add barrier to keep gcc from optimizing too much */
> > @@ -5986,7 +6051,7 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
> > if (cpu_buffer->mapped) {
> > rb_update_meta_page(cpu_buffer);
> > if (cpu_buffer->ring_meta) {
> > - struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
> > + struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
> > meta->commit_buffer = meta->head_buffer;
> > }
> > }
> > @@ -6020,7 +6085,7 @@ static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
> > void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu)
> > {
> > struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
> > - struct ring_buffer_meta *meta;
> > + struct ring_buffer_cpu_meta *meta;
> >
> > if (!cpumask_test_cpu(cpu, buffer->cpumask))
> > return;
> > @@ -6058,7 +6123,7 @@ EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
> > void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
> > {
> > struct ring_buffer_per_cpu *cpu_buffer;
> > - struct ring_buffer_meta *meta;
> > + struct ring_buffer_cpu_meta *meta;
> > int cpu;
> >
> > /* prevent another thread from changing buffer sizes */
>
> Thank you,
>
> > --
> > 2.45.2
> >
> >
>
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 4/8] tracing: Have persistent trace instances save KASLR offset
2025-02-06 5:22 ` Masami Hiramatsu
@ 2025-02-06 15:24 ` Steven Rostedt
2025-02-07 0:58 ` Masami Hiramatsu
0 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-02-06 15:24 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton
On Thu, 6 Feb 2025 14:22:32 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> On Wed, 05 Feb 2025 17:50:35 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > From: Steven Rostedt <rostedt@goodmis.org>
> >
> > There's no reason to save the KASLR offset for the ring buffer itself.
> > That is used by the tracer. Now that the tracer has a way to save data in
> > the persistent memory of the ring buffer, have the tracing infrastructure
> > take care of the saving of the KASLR offset.
> >
>
> Looks good to me. But note that the scratchpad size may not be enough for
> the module table later, because 1 module requires at least the name[]
> (64 bytes - sizeof(ulong)) and the base address (ulong). This means
> 1 entry consumes 64 bytes. Thus there can be only 63 entries + meta
> data in a 4K page. My Ubuntu loads 189(!) modules;
>
> $ lsmod | wc -l
> 190
>
> so we want 255 entries, which requires 16KB.
So, I was thinking of modifying the allocation of the persistent ring
buffer, which currently is
#define ring_buffer_alloc_range(size, flags, order, start, range_size)
[ it's a macro to add lockdep key information in it ]
But I should change it to include a scratch size, and allow the tracing
system to define how much of the range it should allocate for scratch.
Then we could do:
buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0,
tr->range_addr_start,
tr->range_addr_size,
struct_size(tscratch, entries, 128));
Which would make sure that the scratch size contains enough memory to hold
128 modules.
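For reference, a rough (untested) sketch of what the macro could become once it
grows a scratch_size argument. The parameter name and the extra argument passed
down to the internal __ring_buffer_alloc_range() are just illustrative here:

#define ring_buffer_alloc_range(size, flags, order, start, range_size, scratch_size) \
({									\
	static struct lock_class_key __key;				\
	__ring_buffer_alloc_range((size), (flags), (order), (start),	\
				  (range_size), (scratch_size), &__key);\
})

The caller above would then just pass struct_size(tscratch, entries, 128) as the
last argument.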
-- Steve
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 5/8] module: Add module_for_each_mod() function
2025-02-06 5:28 ` Masami Hiramatsu
@ 2025-02-06 15:27 ` Steven Rostedt
2025-02-10 13:04 ` Petr Pavlu
2025-02-14 22:30 ` Steven Rostedt
0 siblings, 2 replies; 45+ messages in thread
From: Steven Rostedt @ 2025-02-06 15:27 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton, Luis Chamberlain, Petr Pavlu, Sami Tolvanen,
Daniel Gomez, linux-modules
On Thu, 6 Feb 2025 14:28:17 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> > --- a/kernel/module/main.c
> > +++ b/kernel/module/main.c
> > @@ -3809,6 +3809,20 @@ bool is_module_text_address(unsigned long addr)
> > return ret;
> > }
> >
>
> It is better to add a kerneldoc for this API.
Agreed, but I was planning on this function changing. I'm waiting to hear from the
module maintainers.
>
> /**
> * module_for_each_mod() - iterate all modules
> * @func: Callback function
> * @data: User data
> *
> * Call the @func with each module in the system. If @func returns !0, this
> * stops iterating. Note that @func must not sleep since it is called with
> * preemption disabled.
> */
>
> BTW, do we really need to disable preempt or is it enough to call
> rcu_read_lock()?
Bah, as I expected this function to be changed, I didn't spend too much
time looking at its implementation. I just cut and pasted how the other
loops worked. But yes, it should not be disabling preemption. In fact, I
think the module code itself should not be disabling preemption!
I'll have to go and look into that.
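For reference, an untested sketch of the rcu_read_lock() variant Masami is
suggesting (same modules list walk and MODULE_STATE_UNFORMED check as the
patch quoted below):

void module_for_each_mod(int (*func)(struct module *mod, void *data), void *data)
{
	struct module *mod;

	/* RCU read lock keeps the list walk safe without disabling preemption */
	rcu_read_lock();
	list_for_each_entry_rcu(mod, &modules, list) {
		if (mod->state == MODULE_STATE_UNFORMED)
			continue;
		if (func(mod, data))
			break;
	}
	rcu_read_unlock();
}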
Thanks!
-- Steve
>
> Thank you,
>
> > +void module_for_each_mod(int(*func)(struct module *mod, void *data), void *data)
> > +{
> > + struct module *mod;
> > +
> > + preempt_disable();
> > + list_for_each_entry_rcu(mod, &modules, list) {
> > + if (mod->state == MODULE_STATE_UNFORMED)
> > + continue;
> > + if (func(mod, data))
> > + break;
> > + }
> > + preempt_enable();
> > +}
> > +
> > /**
> > * __module_text_address() - get the module whose code contains an address.
> > * @addr: the address.
> > --
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 6/8] tracing: Have persistent trace instances save module addresses
2025-02-06 8:26 ` Masami Hiramatsu
@ 2025-02-06 15:29 ` Steven Rostedt
2025-02-06 16:53 ` Masami Hiramatsu
0 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-02-06 15:29 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton
On Thu, 6 Feb 2025 17:26:42 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> This may scan the modules too frequently, because update_last_data() is
> called when:
> - change tracer (this may be OK)
> - update "set_event"
> - write 1 to "enable" under events
> - change pid filter
> - etc.
>
> Once it is scanned, it should not be scanned again, but updated by the module
> callback, because events are usually enabled individually (thus
> writing 1 to "event" can happen multiple times online).
>
> I think we can move this after the TRACE_ARRAY_FL_LAST_BOOT check.
> If the TRACE_ARRAY_FL_LAST_BOOT flag is set, that flag should be cleared
> while updating the tscratch; if the flag is not set, we can skip updating
> the scratch.
No, because we could enable it, disable it, and then the user could be
adding and removing modules all day long, and we only add and do not remove
(I do have a patch that does remove them).
But we can do the same check as patch 8 does, and only do this if the
tracer wasn't already active.
-- Steve
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 8/8] tracing: Update modules to persistent instances when loaded
2025-02-06 10:01 ` Masami Hiramatsu
@ 2025-02-06 15:36 ` Steven Rostedt
2025-02-06 16:53 ` Masami Hiramatsu
0 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-02-06 15:36 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton
On Thu, 6 Feb 2025 19:01:24 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> > +static void trace_module_record(struct module *mod)
> > +{
> > + struct trace_array *tr;
> > +
> > + list_for_each_entry(tr, &ftrace_trace_arrays, list) {
> > + /* Update any persistent trace array that has already been started */
> > + if ((tr->flags & (TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT)) ==
> > + TRACE_ARRAY_FL_BOOT) {
> > + /* Only update if the trace array is active */
> > + if (trace_array_active(tr))
>
> Do we really need this check? It seems that we can just save_mod() if the
> above condition is true.
It gets a little more complicated if we need to add and remove modules.
>
> > + save_mod(mod, tr);
> > + }
> > + }
> > +}
> > +
> > static int trace_module_notify(struct notifier_block *self,
> > unsigned long val, void *data)
> > {
> > @@ -10096,6 +10120,7 @@ static int trace_module_notify(struct notifier_block *self,
> > switch (val) {
> > case MODULE_STATE_COMING:
> > trace_module_add_evals(mod);
> > + trace_module_record(mod);
> > break;
> > case MODULE_STATE_GOING:
> > trace_module_remove_evals(mod);
>
> Don't we need to remove the module entry when a module is removed?
> (every time we remove a module, the trace data is cleared?)
I do have a patch that removes entries, but I decided we don't really
want to do that.
If we want to have events for modules that were removed. Note, the ring
buffer is cleared if any module event was ever enabled and then the module
is removed, as how to print it is removed too. But we could disable that
for the persistent ring buffers as they should not be using the default
trace event print format anyway.
As for stack traces, we still want the module it was for when the stack
trace happens. A common bug we see is when a module is removed, it can
cause other bugs. We want to know about modules that were removed. Keeping
that information about removed modules will allow us to see what functions
were called by a stack trace for a module that was removed.
-- Steve
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 6/8] tracing: Have persistent trace instances save module addresses
2025-02-06 15:29 ` Steven Rostedt
@ 2025-02-06 16:53 ` Masami Hiramatsu
0 siblings, 0 replies; 45+ messages in thread
From: Masami Hiramatsu @ 2025-02-06 16:53 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton
On Thu, 6 Feb 2025 10:29:49 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Thu, 6 Feb 2025 17:26:42 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
>
> > This may scan the modules too frequently, because update_last_data() is
> > called when:
> > - change tracer (this may be OK)
> > - update "set_event"
> > - write 1 to "enable" under events
> > - change pid filter
> > - etc.
> >
> > Once it is scanned, it should not be scanned again, but updated by the module
> > callback, because events are usually enabled individually (thus
> > writing 1 to "event" can happen multiple times online).
> >
> > I think we can move this after the TRACE_ARRAY_FL_LAST_BOOT check.
> > If the TRACE_ARRAY_FL_LAST_BOOT flag is set, that flag should be cleared
> > while updating the tscratch; if the flag is not set, we can skip updating
> > the scratch.
>
> No, because we could enable it, disable it, and then the user could be
> adding and removing modules all day long, and we only add and do not remove
> (I do have a patch that does remove them).
I think TRACE_ARRAY_FL_LAST_BOOT is something like a one-way flag: it is set
if the buffer is persistent, but cleared when it is reused (when writing starts)
and never set again. Isn't it?
>
> But we can do the same check as patch 8 does, and only do this if the
> tracer wasn't already active.
Yeah, so I think this change needs the change from patch 8.
Thank you,
>
> -- Steve
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 8/8] tracing: Update modules to persistent instances when loaded
2025-02-06 15:36 ` Steven Rostedt
@ 2025-02-06 16:53 ` Masami Hiramatsu
2025-02-06 16:58 ` [PATCH 1/3] tracing: Skip update_last_data() if it is already updated Masami Hiramatsu (Google)
` (3 more replies)
0 siblings, 4 replies; 45+ messages in thread
From: Masami Hiramatsu @ 2025-02-06 16:53 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton
On Thu, 6 Feb 2025 10:36:12 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Thu, 6 Feb 2025 19:01:24 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
>
> > > +static void trace_module_record(struct module *mod)
> > > +{
> > > + struct trace_array *tr;
> > > +
> > > + list_for_each_entry(tr, &ftrace_trace_arrays, list) {
> > > + /* Update any persistent trace array that has already been started */
> > > + if ((tr->flags & (TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT)) ==
> > > + TRACE_ARRAY_FL_BOOT) {
> > > + /* Only update if the trace array is active */
> > > + if (trace_array_active(tr))
> >
> > Do we really need this check? It seems that we can just save_mod() if the
> > above condition is true.
>
> It gets a little more complicated if we need to add and remove modules.
Yeah, but for converting the module address, we don't want to see other
module information.
>
> >
> > > + save_mod(mod, tr);
> > > + }
> > > + }
> > > +}
> > > +
> > > static int trace_module_notify(struct notifier_block *self,
> > > unsigned long val, void *data)
> > > {
> > > @@ -10096,6 +10120,7 @@ static int trace_module_notify(struct notifier_block *self,
> > > switch (val) {
> > > case MODULE_STATE_COMING:
> > > trace_module_add_evals(mod);
> > > + trace_module_record(mod);
> > > break;
> > > case MODULE_STATE_GOING:
> > > trace_module_remove_evals(mod);
> >
> > Don't we need to remove the module entry when a module is removed?
> > (every time we remove a module, the trace data is cleared?)
>
> I do have a patch that removes entries, but I decided we don't really
> want to do that.
>
> If we want to have events for modules that were removed. Note, the ring
> buffer is cleared if any module event was ever enabled and then the module
> is removed, as how to print it is removed too. But we could disable that
> for the persistent ring buffers as they should not be using the default
> trace event print format anyway.
Yeah, if the event is on the module the buffer is cleared.
But the module address can be in the stacktrace. In that case, the event
in the module is not enabled, but other events like sched_switch can
take stacktrace which can include the module address. In that case, the
buffer is also cleared when the module is removed?
> As for stack traces, we still want the module it was for when the stack
> trace happens. A common bug we see is when a module is removed, it can
> cause other bugs. We want to know about modules that were removed. Keeping
> that information about removed modules will allow us to see what functions
> were called by a stack trace for a module that was removed.
Hmm, but that should be covered by module load/unload events?
Anyway, this series does not cover the module text address in the stacktrace.
I just made a series of patches (which also does not cover module removal yet),
but it can show the basic idea.
My idea is to sort the previous module entries in the persistent buffer
when it is set up. We also make a "module_delta" array in the trace_array.
Then the print function can search for the appropriate module info in
the sorted table and find the corresponding delta in "module_delta".
For example,
/sys/kernel/tracing/instances/boot_mapped # cat trace
<...>-1629 [006] ..... 202.860051: foo_bar_with_fn: foo Look at me 4
<...>-1629 [006] ..... 202.860059: <stack trace>
=> trace_event_raw_event_foo_bar_with_fn
=> simple_thread_fn
=> kthread
=> ret_from_fork
=> ret_from_fork_asm
/sys/kernel/tracing/instances/boot_mapped # cat last_boot_info
Offset: 0
ffffffffa0016000 trace_events_sample
ffffffffa0025000 trace_printk
/sys/kernel/tracing/instances/boot_mapped # lsmod
trace_events_sample 45056 0 - Live 0xffffffffa001c000
trace_printk 12288 0 - Live 0xffffffffa0016000
Let me share it.
Thank you,
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH 1/3] tracing: Skip update_last_data() if it is already updated
2025-02-06 16:53 ` Masami Hiramatsu
@ 2025-02-06 16:58 ` Masami Hiramatsu (Google)
2025-02-06 16:58 ` [PATCH 2/3] tracing: Remove checking the activity when module map is updating Masami Hiramatsu (Google)
` (2 subsequent siblings)
3 siblings, 0 replies; 45+ messages in thread
From: Masami Hiramatsu (Google) @ 2025-02-06 16:58 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, linux-kernel, linux-trace-kernel, Mark Rutland,
Mathieu Desnoyers, Andrew Morton
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
If the last boot data is already cleared, there is no reason to update it
again. Skip if the TRACE_ARRAY_FL_LAST_BOOT is cleared.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
kernel/trace/trace.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 443f2bc5b856..0f010a34de84 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6041,6 +6041,12 @@ static void update_last_data(struct trace_array *tr)
if (!(tr->flags & TRACE_ARRAY_FL_BOOT))
return;
+ if (!(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
+ return;
+
+ /* Only if the buffer has previous boot data clear and update it. */
+ tr->flags &= ~TRACE_ARRAY_FL_LAST_BOOT;
+
/* Reset the module list and reload them */
if (tr->scratch) {
struct trace_scratch *tscratch = tr->scratch;
@@ -6052,9 +6058,6 @@ static void update_last_data(struct trace_array *tr)
module_for_each_mod(save_mod, tr);
}
- if (!(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
- return;
-
/*
* Need to clear all CPU buffers as there cannot be events
* from the previous boot mixed with events with this boot
@@ -6077,7 +6080,6 @@ static void update_last_data(struct trace_array *tr)
#else
tscratch->kaslr_addr = 0;
#endif
- tr->flags &= ~TRACE_ARRAY_FL_LAST_BOOT;
}
/**
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH 2/3] tracing: Remove checking the activity when module map is updating
2025-02-06 16:53 ` Masami Hiramatsu
2025-02-06 16:58 ` [PATCH 1/3] tracing: Skip update_last_data() if it is already updated Masami Hiramatsu (Google)
@ 2025-02-06 16:58 ` Masami Hiramatsu (Google)
2025-03-07 15:21 ` Steven Rostedt
2025-02-06 16:59 ` [PATCH 3/3] tracing: Show last module text symbols in the stacktrace Masami Hiramatsu (Google)
2025-02-06 17:18 ` [PATCH 8/8] tracing: Update modules to persistent instances when loaded Steven Rostedt
3 siblings, 1 reply; 45+ messages in thread
From: Masami Hiramatsu (Google) @ 2025-02-06 16:58 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, linux-kernel, linux-trace-kernel, Mark Rutland,
Mathieu Desnoyers, Andrew Morton
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Remove unnecessary active check because tr->flags already checks it.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
kernel/trace/trace.c | 13 +------------
1 file changed, 1 insertion(+), 12 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0f010a34de84..5a064e712fd7 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -10090,15 +10090,6 @@ static void trace_module_remove_evals(struct module *mod)
static inline void trace_module_remove_evals(struct module *mod) { }
#endif /* CONFIG_TRACE_EVAL_MAP_FILE */
-static bool trace_array_active(struct trace_array *tr)
-{
- if (tr->current_trace != &nop_trace)
- return true;
-
- /* 0 is no events, 1 is all disabled */
- return trace_events_enabled(tr, NULL) > 1;
-}
-
static void trace_module_record(struct module *mod)
{
struct trace_array *tr;
@@ -10107,9 +10098,7 @@ static void trace_module_record(struct module *mod)
/* Update any persistent trace array that has already been started */
if ((tr->flags & (TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT)) ==
TRACE_ARRAY_FL_BOOT) {
- /* Only update if the trace array is active */
- if (trace_array_active(tr))
- save_mod(mod, tr);
+ save_mod(mod, tr);
}
}
}
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH 3/3] tracing: Show last module text symbols in the stacktrace
2025-02-06 16:53 ` Masami Hiramatsu
2025-02-06 16:58 ` [PATCH 1/3] tracing: Skip update_last_data() if it is already updated Masami Hiramatsu (Google)
2025-02-06 16:58 ` [PATCH 2/3] tracing: Remove checking the activity when module map is updating Masami Hiramatsu (Google)
@ 2025-02-06 16:59 ` Masami Hiramatsu (Google)
2025-02-06 17:46 ` Steven Rostedt
2025-02-06 17:18 ` [PATCH 8/8] tracing: Update modules to persistent instances when loaded Steven Rostedt
3 siblings, 1 reply; 45+ messages in thread
From: Masami Hiramatsu (Google) @ 2025-02-06 16:59 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, linux-kernel, linux-trace-kernel, Mark Rutland,
Mathieu Desnoyers, Andrew Morton
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
The previous boot trace buffer can include module text addresses in
the stacktrace. As with the kernel text addresses, convert the module
text addresses using the module address information.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
kernel/trace/trace.c | 76 +++++++++++++++++++++++++++++++++++++++++--
kernel/trace/trace.h | 4 ++
kernel/trace/trace_output.c | 3 +-
3 files changed, 78 insertions(+), 5 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5a064e712fd7..9d942b7c2651 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -49,6 +49,7 @@
#include <linux/fsnotify.h>
#include <linux/irq_work.h>
#include <linux/workqueue.h>
+#include <linux/sort.h>
#include <asm/setup.h> /* COMMAND_LINE_SIZE and kaslr_offset() */
@@ -6007,6 +6008,27 @@ struct trace_scratch {
static DEFINE_MUTEX(scratch_mutex);
+unsigned long trace_adjust_address(struct trace_array *tr, unsigned long addr)
+{
+ struct trace_scratch *tscratch;
+ int i;
+
+ /* If we don't have last boot delta, return the address */
+ if (!(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
+ return addr;
+
+ tscratch = tr->scratch;
+ if (tscratch && tr->module_delta && tscratch->entries[0].mod_addr < addr) {
+ /* Note that entries are sorted */
+ for (i = 0; i < tr->nr_modules; i++)
+ if (addr < tscratch->entries[i].mod_addr)
+ break;
+ return addr + tr->module_delta[i - 1];
+ }
+
+ return addr + tr->text_delta;
+}
+
static int save_mod(struct module *mod, void *data)
{
struct trace_array *tr = data;
@@ -6068,6 +6090,9 @@ static void update_last_data(struct trace_array *tr)
/* Using current data now */
tr->text_delta = 0;
+ kfree(tr->module_delta);
+ tr->module_delta = NULL;
+ tr->nr_modules = 0;
if (!tr->scratch)
return;
@@ -9351,10 +9376,37 @@ static struct dentry *trace_instance_dir;
static void
init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer);
+static int make_mod_delta(struct module *mod, void *data)
+{
+ struct trace_scratch *tscratch;
+ struct trace_mod_entry *entry;
+ struct trace_array *tr = data;
+ int i;
+
+ tscratch = tr->scratch;
+ for (i = 0; i < tr->nr_modules; i++) {
+ entry = &tscratch->entries[i];
+ if (!strcmp(mod->name, entry->mod_name)) {
+ tr->module_delta[i] = (unsigned long)mod->mem[MOD_TEXT].base - entry->mod_addr;
+ return 1;
+ }
+ }
+ return 0;
+}
+
+static int mod_addr_comp(const void *a, const void *b, const void *data)
+{
+ const struct trace_mod_entry *e1 = a;
+ const struct trace_mod_entry *e2 = b;
+
+ return e1->mod_addr - e2->mod_addr;
+}
+
static void setup_trace_scratch(struct trace_array *tr, void *scratch, unsigned int size)
{
struct trace_scratch *tscratch = scratch;
struct trace_mod_entry *entry;
+ int i, nr_entries;
if (!scratch)
return;
@@ -9371,7 +9423,7 @@ static void setup_trace_scratch(struct trace_array *tr, void *scratch, unsigned
goto reset;
/* Check if each module name is a valid string */
- for (int i = 0; i < tscratch->nr_entries; i++) {
+ for (i = 0; i < tscratch->nr_entries; i++) {
int n;
entry = &tscratch->entries[i];
@@ -9385,6 +9437,20 @@ static void setup_trace_scratch(struct trace_array *tr, void *scratch, unsigned
if (n == MODULE_NAME_LEN)
goto reset;
}
+ nr_entries = i;
+ /* Allocate module delta array */
+ tr->module_delta = kcalloc(nr_entries, sizeof(long), GFP_KERNEL);
+ if (!tr->module_delta)
+ goto reset;
+ tr->nr_modules = nr_entries;
+
+ /* Sort module table by base address. */
+ sort_r(tscratch->entries, nr_entries, sizeof(struct trace_mod_entry),
+ mod_addr_comp, NULL, NULL);
+
+ /* Scan modules */
+ module_for_each_mod(make_mod_delta, tr);
+
return;
reset:
/* Invalid trace modules */
@@ -10093,12 +10159,16 @@ static inline void trace_module_remove_evals(struct module *mod) { }
static void trace_module_record(struct module *mod)
{
struct trace_array *tr;
+ unsigned long flags;
list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+ flags = tr->flags & (TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT);
/* Update any persistent trace array that has already been started */
- if ((tr->flags & (TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT)) ==
- TRACE_ARRAY_FL_BOOT) {
+ if (flags == TRACE_ARRAY_FL_BOOT) {
save_mod(mod, tr);
+ } else if (flags & (TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT)) {
+ /* Update delta if the module loaded in previous boot */
+ make_mod_delta(mod, tr);
}
}
}
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 90493220c362..47c0742fe9ec 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -349,6 +349,8 @@ struct trace_array {
unsigned long range_addr_start;
unsigned long range_addr_size;
long text_delta;
+ int nr_modules;
+ long *module_delta;
void *scratch; /* pointer in persistent memory */
int scratch_size;
@@ -465,6 +467,8 @@ extern int tracing_set_clock(struct trace_array *tr, const char *clockstr);
extern bool trace_clock_in_ns(struct trace_array *tr);
+extern unsigned long trace_adjust_address(struct trace_array *tr, unsigned long addr);
+
/*
* The global tracer (top) should be the first trace array added,
* but we check the flag anyway.
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 03d56f711ad1..a5336c4ece8e 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -1248,7 +1248,6 @@ static enum print_line_t trace_stack_print(struct trace_iterator *iter,
struct trace_seq *s = &iter->seq;
unsigned long *p;
unsigned long *end;
- long delta = iter->tr->text_delta;
trace_assign_type(field, iter->ent);
end = (unsigned long *)((long)iter->ent + iter->ent_size);
@@ -1265,7 +1264,7 @@ static enum print_line_t trace_stack_print(struct trace_iterator *iter,
trace_seq_puts(s, "[FTRACE TRAMPOLINE]\n");
continue;
}
- seq_print_ip_sym(s, (*p) + delta, flags);
+ seq_print_ip_sym(s, trace_adjust_address(iter->tr, *p), flags);
trace_seq_putc(s, '\n');
}
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH 8/8] tracing: Update modules to persistent instances when loaded
2025-02-06 16:53 ` Masami Hiramatsu
` (2 preceding siblings ...)
2025-02-06 16:59 ` [PATCH 3/3] tracing: Show last module text symbols in the stacktrace Masami Hiramatsu (Google)
@ 2025-02-06 17:18 ` Steven Rostedt
2025-02-07 0:47 ` Masami Hiramatsu
3 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-02-06 17:18 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton
On Fri, 7 Feb 2025 01:53:30 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> On Thu, 6 Feb 2025 10:36:12 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > On Thu, 6 Feb 2025 19:01:24 +0900
> > Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> >
> > > > +static void trace_module_record(struct module *mod)
> > > > +{
> > > > + struct trace_array *tr;
> > > > +
> > > > + list_for_each_entry(tr, &ftrace_trace_arrays, list) {
> > > > + /* Update any persistent trace array that has already been started */
> > > > + if ((tr->flags & (TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT)) ==
> > > > + TRACE_ARRAY_FL_BOOT) {
> > > > + /* Only update if the trace array is active */
> > > > + if (trace_array_active(tr))
> > >
> > > Do we really need this check? It seems that we can just save_mod() if the
> > > above condition is true.
> >
> > It gets a little more complicated if we need to add and remove modules.
>
> Yeah, but for converting the module address, we don't want to see other
> module information.
But we want to see the module information for modules that were loaded
during the trace. If a module is removed and suddenly the system crashed,
we just lost that information. Hence why I reset the module information
when the tracing starts.
> > If we want to have events for modules that were removed. Note, the ring
> > buffer is cleared if any module event was ever enabled and then the module
> > is removed, as how to print it is removed too. But we could disable that
> > for the persistent ring buffers as they should not be using the default
> > trace event print format anyway.
>
> Yeah, if the event is on the module the buffer is cleared.
> But the module address can be in the stacktrace. In that case, the event
> in the module is not enabled, but other events like sched_switch can
> take stacktrace which can include the module address. In that case, the
> buffer is also cleared when the module is removed?
No. The buffer is only cleared if one of its events were ever enabled. If
no event within the module was enabled, then the buffer is not cleared.
>
> > As for stack traces, we still want the module it was for when the stack
> > trace happens. A common bug we see is when a module is removed, it can
> > cause other bugs. We want to know about modules that were removed. Keeping
> > that information about removed modules will allow us to see what functions
> > were called by a stack trace for a module that was removed.
>
> Hmm, but that should be covered by module load/unload events?
If we have them enabled.
-- Steve
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 3/3] tracing: Show last module text symbols in the stacktrace
2025-02-06 16:59 ` [PATCH 3/3] tracing: Show last module text symbols in the stacktrace Masami Hiramatsu (Google)
@ 2025-02-06 17:46 ` Steven Rostedt
2025-02-07 1:50 ` Masami Hiramatsu
0 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-02-06 17:46 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton
On Fri, 7 Feb 2025 01:59:05 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>
> > The previous boot trace buffer can include module text addresses in
> > the stacktrace. As with the kernel text addresses, convert the module
> > text addresses using the module address information.
Note, this doesn't look like it depends on the first two patches, which I
don't think we want yet. As that assumes we never disable the buffer once
we start it, and may start it again. But that's a debate we can have later.
>
> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> ---
> kernel/trace/trace.c | 76 +++++++++++++++++++++++++++++++++++++++++--
> kernel/trace/trace.h | 4 ++
> kernel/trace/trace_output.c | 3 +-
> 3 files changed, 78 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 5a064e712fd7..9d942b7c2651 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -49,6 +49,7 @@
> #include <linux/fsnotify.h>
> #include <linux/irq_work.h>
> #include <linux/workqueue.h>
> +#include <linux/sort.h>
>
> #include <asm/setup.h> /* COMMAND_LINE_SIZE and kaslr_offset() */
>
> @@ -6007,6 +6008,27 @@ struct trace_scratch {
>
> static DEFINE_MUTEX(scratch_mutex);
>
> +unsigned long trace_adjust_address(struct trace_array *tr, unsigned long addr)
> +{
> + struct trace_scratch *tscratch;
> + int i;
> +
> + /* If we don't have last boot delta, return the address */
> + if (!(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
> + return addr;
> +
> + tscratch = tr->scratch;
> + if (tscratch && tr->module_delta && tscratch->entries[0].mod_addr < addr) {
> + /* Note that entries are sorted */
> + for (i = 0; i < tr->nr_modules; i++)
> + if (addr < tscratch->entries[i].mod_addr)
> + break;
> + return addr + tr->module_delta[i - 1];
> + }
> +
> + return addr + tr->text_delta;
> +}
> +
> static int save_mod(struct module *mod, void *data)
> {
> struct trace_array *tr = data;
> @@ -6068,6 +6090,9 @@ static void update_last_data(struct trace_array *tr)
>
> /* Using current data now */
> tr->text_delta = 0;
> + kfree(tr->module_delta);
> + tr->module_delta = NULL;
> + tr->nr_modules = 0;
>
> if (!tr->scratch)
> return;
The above could probably go after the check for !tr->scratch.
> @@ -9351,10 +9376,37 @@ static struct dentry *trace_instance_dir;
> static void
> init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer);
>
> +static int make_mod_delta(struct module *mod, void *data)
> +{
> + struct trace_scratch *tscratch;
> + struct trace_mod_entry *entry;
> + struct trace_array *tr = data;
> + int i;
> +
> + tscratch = tr->scratch;
> + for (i = 0; i < tr->nr_modules; i++) {
> + entry = &tscratch->entries[i];
> + if (!strcmp(mod->name, entry->mod_name)) {
> + tr->module_delta[i] = (unsigned long)mod->mem[MOD_TEXT].base - entry->mod_addr;
> + return 1;
Returning 1 causes the module_loop to stop. That is, this will only process
the first module it finds and then no more after that.
> + }
> + }
Doesn't this assume that we have the same modules loaded?
Note, I do plan on adding another field in the trace_scratch structure that
has the uname, then only allow the ascii trace to use the current kallsyms
if the uname matches.
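A purely hypothetical sketch of that idea; the field name, size and placement
are not from the series, only the existing kaslr_addr/nr_entries/entries layout
is:

struct trace_scratch {
	unsigned long		kaslr_addr;
	char			uname[64];	/* e.g. utsname()->release, to gate kallsyms reuse */
	unsigned long		nr_entries;
	struct trace_mod_entry	entries[];
};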
> + return 0;
> +}
> +
> +static int mod_addr_comp(const void *a, const void *b, const void *data)
> +{
> + const struct trace_mod_entry *e1 = a;
> + const struct trace_mod_entry *e2 = b;
> +
> + return e1->mod_addr - e2->mod_addr;
Hmm, could this possibly cause an overflow issue? If the two addresses are
more than 32 bits apart?
> +}
> +
> static void setup_trace_scratch(struct trace_array *tr, void *scratch, unsigned int size)
> {
> struct trace_scratch *tscratch = scratch;
> struct trace_mod_entry *entry;
> + int i, nr_entries;
>
> if (!scratch)
> return;
> @@ -9371,7 +9423,7 @@ static void setup_trace_scratch(struct trace_array *tr, void *scratch, unsigned
> goto reset;
>
> /* Check if each module name is a valid string */
> - for (int i = 0; i < tscratch->nr_entries; i++) {
> + for (i = 0; i < tscratch->nr_entries; i++) {
> int n;
>
> entry = &tscratch->entries[i];
> @@ -9385,6 +9437,20 @@ static void setup_trace_scratch(struct trace_array *tr, void *scratch, unsigned
> if (n == MODULE_NAME_LEN)
> goto reset;
> }
> + nr_entries = i;
> + /* Allocate module delta array */
> + tr->module_delta = kcalloc(nr_entries, sizeof(long), GFP_KERNEL);
> + if (!tr->module_delta)
I wonder if this should trigger a WARN_ON()?
> + goto reset;
> + tr->nr_modules = nr_entries;
> +
> + /* Sort module table by base address. */
> + sort_r(tscratch->entries, nr_entries, sizeof(struct trace_mod_entry),
> + mod_addr_comp, NULL, NULL);
> +
> + /* Scan modules */
> + module_for_each_mod(make_mod_delta, tr);
> +
> return;
> reset:
> /* Invalid trace modules */
BTW, here's my removal patch:
-- Steve
From 521bc0e5286c93eb4be981d22a6c05043a78b555 Mon Sep 17 00:00:00 2001
From: Steven Rostedt <rostedt@goodmis.org>
Date: Wed, 5 Feb 2025 11:55:51 -0500
Subject: [PATCH] tracing: Update modules to persistent instances when removed
When a module is removed, remove it from the list of saved modules in the
persistent memory to any persistent instance that is currently active.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace.c | 69 +++++++++++++++++++++++++++++++++++++++++---
1 file changed, 65 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 443f2bc5b856..13eeb8df8b8c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6001,6 +6001,7 @@ struct trace_mod_entry {
struct trace_scratch {
unsigned long kaslr_addr;
+ unsigned long first_free_slot;
unsigned long nr_entries;
struct trace_mod_entry entries[];
};
@@ -6012,6 +6013,7 @@ static int save_mod(struct module *mod, void *data)
struct trace_array *tr = data;
struct trace_scratch *tscratch;
struct trace_mod_entry *entry;
+ unsigned int idx;
unsigned int size;
guard(mutex)(&scratch_mutex);
@@ -6021,16 +6023,60 @@ static int save_mod(struct module *mod, void *data)
return -1;
size = tr->scratch_size;
- if (struct_size(tscratch, entries, tscratch->nr_entries + 1) > size)
- return -1;
+ idx = tscratch->first_free_slot < tscratch->nr_entries ?
+ tscratch->first_free_slot : tscratch->nr_entries;
- entry = &tscratch->entries[tscratch->nr_entries];
+ if (struct_size(tscratch, entries, idx + 1) > size)
+ return -1;
- tscratch->nr_entries++;
+ entry = &tscratch->entries[idx];
entry->mod_addr = (unsigned long)mod->mem[MOD_TEXT].base;
strscpy(entry->mod_name, mod->name);
+ if (idx == tscratch->nr_entries)
+ tscratch->nr_entries++;
+
+ for (idx++; idx < tscratch->nr_entries; idx++) {
+ entry = &tscratch->entries[idx];
+ if (!entry->mod_addr)
+ break;
+ }
+
+ tscratch->first_free_slot = idx;
+
+ return 0;
+}
+
+static int remove_mod(struct module *mod, void *data)
+{
+ struct trace_array *tr = data;
+ struct trace_scratch *tscratch;
+ struct trace_mod_entry *entry;
+ unsigned int idx;
+ unsigned int size;
+
+ guard(mutex)(&scratch_mutex);
+
+ tscratch = tr->scratch;
+ if (!tscratch)
+ return -1;
+ size = tr->scratch_size;
+
+ for (idx = 0; idx < tscratch->nr_entries; idx++) {
+ entry = &tscratch->entries[idx];
+ if (entry->mod_addr == (unsigned long)mod->mem[MOD_TEXT].base)
+ break;
+ }
+ if (idx == tscratch->nr_entries)
+ return -1;
+
+ if (idx < tscratch->first_free_slot)
+ tscratch->first_free_slot = idx;
+
+ entry->mod_addr = 0;
+ entry->mod_name[0] = '\0';
+
return 0;
}
@@ -10112,6 +10158,20 @@ static void trace_module_record(struct module *mod)
}
}
+static void trace_module_delete(struct module *mod)
+{
+ struct trace_array *tr;
+
+ list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+ /* Update any persistent trace array that has already been started */
+ if ((tr->flags & (TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT)) ==
+ TRACE_ARRAY_FL_BOOT) {
+ if (trace_array_active(tr))
+ remove_mod(mod, tr);
+ }
+ }
+}
+
static int trace_module_notify(struct notifier_block *self,
unsigned long val, void *data)
{
@@ -10124,6 +10184,7 @@ static int trace_module_notify(struct notifier_block *self,
break;
case MODULE_STATE_GOING:
trace_module_remove_evals(mod);
+ trace_module_delete(mod);
break;
}
--
2.45.2
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH 8/8] tracing: Update modules to persistent instances when loaded
2025-02-06 17:18 ` [PATCH 8/8] tracing: Update modules to persistent instances when loaded Steven Rostedt
@ 2025-02-07 0:47 ` Masami Hiramatsu
0 siblings, 0 replies; 45+ messages in thread
From: Masami Hiramatsu @ 2025-02-07 0:47 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton
On Thu, 6 Feb 2025 12:18:20 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 7 Feb 2025 01:53:30 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
>
> > On Thu, 6 Feb 2025 10:36:12 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > > On Thu, 6 Feb 2025 19:01:24 +0900
> > > Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> > >
> > > > > +static void trace_module_record(struct module *mod)
> > > > > +{
> > > > > + struct trace_array *tr;
> > > > > +
> > > > > + list_for_each_entry(tr, &ftrace_trace_arrays, list) {
> > > > > + /* Update any persistent trace array that has already been started */
> > > > > + if ((tr->flags & (TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT)) ==
> > > > > + TRACE_ARRAY_FL_BOOT) {
> > > > > + /* Only update if the trace array is active */
> > > > > + if (trace_array_active(tr))
> > > >
> > > > Do we really need this check? It seems that we can just save_mod() if the
> > > > above condition is true.
> > >
> > > It gets a little more complicated if we need to add and remove modules.
> >
> > Yeah, but for converting the module address, we don't want to see other
> > module information.
>
> But we want to see the module information for modules that were loaded
> during the trace. If a module is removed and suddenly the system crashed,
> we just lost that information. Hence why I reset the module information
> when the tracing starts.
Then, what about removing the module info only when the address is recycled for
a new module? We can keep it until the same address range is used by other
module(s).
>
>
> > > If we want to have events for modules that were removed. Note, the ring
> > > buffer is cleared if any module event was ever enabled and then the module
> > > is removed, as how to print it is removed too. But we could disable that
> > > for the persistent ring buffers as they should not be using the default
> > > trace event print format anyway.
> >
> > Yeah, if the event is on the module the buffer is cleared.
> > But the module address can be in the stacktrace. In that case, the event
> > in the module is not enabled, but other events like sched_switch can
> > take stacktrace which can include the module address. In that case, the
> > buffer is also cleared when the module is removed?
>
> No. The buffer is only cleared if one of its events were ever enabled. If
> no event within the module was enabled, then the buffer is not cleared.
>
> >
> > > As for stack traces, we still want the module it was for when the stack
> > > trace happens. A common bug we see is when a module is removed, it can
> > > cause other bugs. We want to know about modules that were removed. Keeping
> > > that information about removed modules will allow us to see what functions
> > > were called by a stack trace for a module that was removed.
> >
> > Hmm, but that should be covered by module load/unload events?
>
> If we have them enabled.
Yes, and current module load event does not cover the address. I think
we can add a new event to log it.
Thank you,
>
> -- Steve
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 4/8] tracing: Have persistent trace instances save KASLR offset
2025-02-06 15:24 ` Steven Rostedt
@ 2025-02-07 0:58 ` Masami Hiramatsu
2025-02-07 1:03 ` Steven Rostedt
0 siblings, 1 reply; 45+ messages in thread
From: Masami Hiramatsu @ 2025-02-07 0:58 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton
On Thu, 6 Feb 2025 10:24:25 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Thu, 6 Feb 2025 14:22:32 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
>
> > On Wed, 05 Feb 2025 17:50:35 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > > From: Steven Rostedt <rostedt@goodmis.org>
> > >
> > > There's no reason to save the KASLR offset for the ring buffer itself.
> > > That is used by the tracer. Now that the tracer has a way to save data in
> > > the persistent memory of the ring buffer, have the tracing infrastructure
> > > take care of the saving of the KASLR offset.
> > >
> >
> > Looks good to me. But note that the scratchpad size may not be enough for
> > the module table later, because 1 module requires at least the name[]
> > (64 bytes - sizeof(ulong)) and the base address (ulong). This means
> > 1 entry consumes 64 bytes. Thus there can be only 63 entries + meta
> > data in a 4K page. My Ubuntu loads 189(!) modules;
> >
> > $ lsmod | wc -l
> > 190
> >
> > so we want 255 entries, which requires 16KB.
>
> So, I was thinking of modifying the allocation of the persistent ring
> buffer, which currently is
>
> #define ring_buffer_alloc_range(size, flags, order, start, range_size)
>
> [ it's a macro to add lockdep key information in it ]
>
> But I should change it to include a scratch size, and allow the tracing
> system to define how much of the range it should allocate for scratch.
>
> Then we could do:
>
> buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0,
> tr->range_addr_start,
> tr->range_addr_size,
> struct_size(tscratch, entries, 128));
>
> Which would make sure that the scratch size contains enough memory to hold
> 128 modules.
Yeah, this idea looks good to me. BTW, will the scratch size be aligned to
the subbuffer size (or the page size)?
Thanks,
>
> -- Steve
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 4/8] tracing: Have persistent trace instances save KASLR offset
2025-02-07 0:58 ` Masami Hiramatsu
@ 2025-02-07 1:03 ` Steven Rostedt
2025-02-07 2:15 ` Masami Hiramatsu
0 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-02-07 1:03 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton
On Fri, 7 Feb 2025 09:58:44 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> > Which would make sure that the scratch size contains enough memory to hold
> > 128 modules.
>
> Yeah, this idea looks good to me. BTW, will the scratch size be aligned to
> the subbuffer size (or the page size)?
I don't think it needs to be. I think it just needs to be aligned to word
size.
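E.g. something like this (untested) when carving the scratch area off the start
of the mapped range (buffers_start as in patch 2; the variable names are only
illustrative):

	scratch_size = ALIGN(scratch_size, sizeof(long));
	buffers_start = start + scratch_size;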
-- Steve
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 3/3] tracing: Show last module text symbols in the stacktrace
2025-02-06 17:46 ` Steven Rostedt
@ 2025-02-07 1:50 ` Masami Hiramatsu
0 siblings, 0 replies; 45+ messages in thread
From: Masami Hiramatsu @ 2025-02-07 1:50 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton
On Thu, 6 Feb 2025 12:46:31 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 7 Feb 2025 01:59:05 +0900
> "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
>
> > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> >
> > The previous boot trace buffer can include module text addresses in
> > the stacktrace. As with the kernel text addresses, convert the module
> > text addresses using the module address information.
>
> Note, this doesn't look like it depends on the first two patches, which I
> don't think we want yet. As that assumes we never disable the buffer once
> we start it, and may start it again. But that's a debate we can have later.
Yes, the first 2 patches are a kind of optimization.
>
> >
> > Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > ---
> > kernel/trace/trace.c | 76 +++++++++++++++++++++++++++++++++++++++++--
> > kernel/trace/trace.h | 4 ++
> > kernel/trace/trace_output.c | 3 +-
> > 3 files changed, 78 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 5a064e712fd7..9d942b7c2651 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -49,6 +49,7 @@
> > #include <linux/fsnotify.h>
> > #include <linux/irq_work.h>
> > #include <linux/workqueue.h>
> > +#include <linux/sort.h>
> >
> > #include <asm/setup.h> /* COMMAND_LINE_SIZE and kaslr_offset() */
> >
> > @@ -6007,6 +6008,27 @@ struct trace_scratch {
> >
> > static DEFINE_MUTEX(scratch_mutex);
> >
> > +unsigned long trace_adjust_address(struct trace_array *tr, unsigned long addr)
> > +{
> > + struct trace_scratch *tscratch;
> > + int i;
> > +
> > + /* If we don't have last boot delta, return the address */
> > + if (!(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
> > + return addr;
> > +
> > + tscratch = tr->scratch;
> > + if (tscratch && tr->module_delta && tscratch->entries[0].mod_addr < addr) {
> > + /* Note that entries are sorted */
> > + for (i = 0; i < tr->nr_modules; i++)
> > + if (addr < tscratch->entries[i].mod_addr)
> > + break;
> > + return addr + tr->module_delta[i - 1];
> > + }
> > +
> > + return addr + tr->text_delta;
> > +}
> > +
> > static int save_mod(struct module *mod, void *data)
> > {
> > struct trace_array *tr = data;
> > @@ -6068,6 +6090,9 @@ static void update_last_data(struct trace_array *tr)
> >
> > /* Using current data now */
> > tr->text_delta = 0;
> > + kfree(tr->module_delta);
> > + tr->module_delta = NULL;
> > + tr->nr_modules = 0;
> >
> > if (!tr->scratch)
> > return;
>
> The above could probably go after the check for !tr->scratch.
Indeed.
So TRACE_ARRAY_FL_LAST_BOOT means the buffer is on reserved memory and it
still has the previous boot data. OTOH, `tr->scratch` means the buffer has a
scratch pad area. And currently the last_boot_info related fields are used
only if there is a scratch pad area. Currently the reserved-memory based
buffer must have the scratch pad, but do you expect there could be
previous boot data without a scratch pad area in the future?
>
> > @@ -9351,10 +9376,37 @@ static struct dentry *trace_instance_dir;
> > static void
> > init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer);
> >
> > +static int make_mod_delta(struct module *mod, void *data)
> > +{
> > + struct trace_scratch *tscratch;
> > + struct trace_mod_entry *entry;
> > + struct trace_array *tr = data;
> > + int i;
> > +
> > + tscratch = tr->scratch;
> > + for (i = 0; i < tr->nr_modules; i++) {
> > + entry = &tscratch->entries[i];
> > + if (!strcmp(mod->name, entry->mod_name)) {
> > + tr->module_delta[i] = (unsigned long)mod->mem[MOD_TEXT].base - entry->mod_addr;
> > + return 1;
>
> Returning 1 causes the module_loop to stop. That is, this will only process
> the first module it finds and then no more after that.
Ah, good catch! I must have been half asleep. :(
>
> > + }
> > + }
>
> Doesn't this assume that we have the same modules loaded?
Yes, I thought that was your idea to show the symbols if the same module
is loaded. We can use
>
> Note, I do plan on adding another field in the trace_scratch structure that
> has the uname, then only allow the ascii trace to use the current kallsyms
> if the uname matches.
Yeah, it is OK. I thought the build-id is useful, but uname is also good.
>
> > + return 0;
> > +}
> > +
> > +static int mod_addr_comp(const void *a, const void *b, const void *data)
> > +{
> > + const struct trace_mod_entry *e1 = a;
> > + const struct trace_mod_entry *e2 = b;
> > +
> > + return e1->mod_addr - e2->mod_addr;
>
> Hmm, could this possibly cause an overflow issue? If the two addresses are
> more than 32 bits apart?
I guess the modules are loaded in the same area (mostly in vmalloc) and
the area size is probably under 2GB. Anyway, we can shift it down by the page size.
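Or, an untested alternative that sidesteps the overflow question entirely by
comparing instead of subtracting:

static int mod_addr_comp(const void *a, const void *b, const void *data)
{
	const struct trace_mod_entry *e1 = a;
	const struct trace_mod_entry *e2 = b;

	/* Never overflows: return -1, 0 or 1 instead of the raw difference */
	if (e1->mod_addr != e2->mod_addr)
		return e1->mod_addr < e2->mod_addr ? -1 : 1;
	return 0;
}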
>
> > +}
> > +
> > static void setup_trace_scratch(struct trace_array *tr, void *scratch, unsigned int size)
> > {
> > struct trace_scratch *tscratch = scratch;
> > struct trace_mod_entry *entry;
> > + int i, nr_entries;
> >
> > if (!scratch)
> > return;
> > @@ -9371,7 +9423,7 @@ static void setup_trace_scratch(struct trace_array *tr, void *scratch, unsigned
> > goto reset;
> >
> > /* Check if each module name is a valid string */
> > - for (int i = 0; i < tscratch->nr_entries; i++) {
> > + for (i = 0; i < tscratch->nr_entries; i++) {
> > int n;
> >
> > entry = &tscratch->entries[i];
> > @@ -9385,6 +9437,20 @@ static void setup_trace_scratch(struct trace_array *tr, void *scratch, unsigned
> > if (n == MODULE_NAME_LEN)
> > goto reset;
> > }
> > + nr_entries = i;
> > + /* Allocate module delta array */
> > + tr->module_delta = kcalloc(nr_entries, sizeof(long), GFP_KERNEL);
> > + if (!tr->module_delta)
>
> I wonder if this should trigger a WARN_ON()?
I think this module_delta is only optionally used to decode the previous boot data.
So maybe pr_info() is enough.
>
> > + goto reset;
> > + tr->nr_modules = nr_entries;
> > +
> > + /* Sort module table by base address. */
> > + sort_r(tscratch->entries, nr_entries, sizeof(struct trace_mod_entry),
> > + mod_addr_comp, NULL, NULL);
> > +
> > + /* Scan modules */
> > + module_for_each_mod(make_mod_delta, tr);
> > +
> > return;
> > reset:
> > /* Invalid trace modules */
>
>
> BTW, here's my removal patch:
Thank you! And as I discussed above, I think we can postpone the module
removal until the next module is loaded at the same address.
(Or, we can use the scratch pad as a log-structured table; in that case
we can set a removal flag and replace the oldest removed entry. Something
like LRU.)
Thanks,
>
> -- Steve
>
>
> From 521bc0e5286c93eb4be981d22a6c05043a78b555 Mon Sep 17 00:00:00 2001
> From: Steven Rostedt <rostedt@goodmis.org>
> Date: Wed, 5 Feb 2025 11:55:51 -0500
> Subject: [PATCH] tracing: Update modules to persistent instances when removed
>
> When a module is removed, remove it from the list of saved modules in the
> persistent memory to any persistent instance that is currently active.
>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> kernel/trace/trace.c | 69 +++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 65 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 443f2bc5b856..13eeb8df8b8c 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -6001,6 +6001,7 @@ struct trace_mod_entry {
>
> struct trace_scratch {
> unsigned long kaslr_addr;
> + unsigned long first_free_slot;
> unsigned long nr_entries;
> struct trace_mod_entry entries[];
> };
> @@ -6012,6 +6013,7 @@ static int save_mod(struct module *mod, void *data)
> struct trace_array *tr = data;
> struct trace_scratch *tscratch;
> struct trace_mod_entry *entry;
> + unsigned int idx;
> unsigned int size;
>
> guard(mutex)(&scratch_mutex);
> @@ -6021,16 +6023,60 @@ static int save_mod(struct module *mod, void *data)
> return -1;
> size = tr->scratch_size;
>
> - if (struct_size(tscratch, entries, tscratch->nr_entries + 1) > size)
> - return -1;
> + idx = tscratch->first_free_slot < tscratch->nr_entries ?
> + tscratch->first_free_slot : tscratch->nr_entries;
>
> - entry = &tscratch->entries[tscratch->nr_entries];
> + if (struct_size(tscratch, entries, idx + 1) > size)
> + return -1;
>
> - tscratch->nr_entries++;
> + entry = &tscratch->entries[idx];
>
> entry->mod_addr = (unsigned long)mod->mem[MOD_TEXT].base;
> strscpy(entry->mod_name, mod->name);
>
> + if (idx == tscratch->nr_entries)
> + tscratch->nr_entries++;
> +
> + for (idx++; idx < tscratch->nr_entries; idx++) {
> + entry = &tscratch->entries[idx];
> + if (!entry->mod_addr)
> + break;
> + }
> +
> + tscratch->first_free_slot = idx;
> +
> + return 0;
> +}
> +
> +static int remove_mod(struct module *mod, void *data)
> +{
> + struct trace_array *tr = data;
> + struct trace_scratch *tscratch;
> + struct trace_mod_entry *entry;
> + unsigned int idx;
> + unsigned int size;
> +
> + guard(mutex)(&scratch_mutex);
> +
> + tscratch = tr->scratch;
> + if (!tscratch)
> + return -1;
> + size = tr->scratch_size;
> +
> + for (idx = 0; idx < tscratch->nr_entries; idx++) {
> + entry = &tscratch->entries[idx];
> + if (entry->mod_addr == (unsigned long)mod->mem[MOD_TEXT].base)
> + break;
> + }
> + if (idx == tscratch->nr_entries)
> + return -1;
> +
> + if (idx < tscratch->first_free_slot)
> + tscratch->first_free_slot = idx;
> +
> + entry->mod_addr = 0;
> + entry->mod_name[0] = '\0';
> +
> return 0;
> }
>
> @@ -10112,6 +10158,20 @@ static void trace_module_record(struct module *mod)
> }
> }
>
> +static void trace_module_delete(struct module *mod)
> +{
> + struct trace_array *tr;
> +
> + list_for_each_entry(tr, &ftrace_trace_arrays, list) {
> + /* Update any persistent trace array that has already been started */
> + if ((tr->flags & (TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT)) ==
> + TRACE_ARRAY_FL_BOOT) {
> + if (trace_array_active(tr))
> + remove_mod(mod, tr);
> + }
> + }
> +}
> +
> static int trace_module_notify(struct notifier_block *self,
> unsigned long val, void *data)
> {
> @@ -10124,6 +10184,7 @@ static int trace_module_notify(struct notifier_block *self,
> break;
> case MODULE_STATE_GOING:
> trace_module_remove_evals(mod);
> + trace_module_delete(mod);
> break;
> }
>
> --
> 2.45.2
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 7/8] tracing: Show module names and addresses of last boot
2025-02-05 22:50 ` [PATCH 7/8] tracing: Show module names and addresses of last boot Steven Rostedt
@ 2025-02-07 1:51 ` Masami Hiramatsu
2025-02-07 2:02 ` Steven Rostedt
0 siblings, 1 reply; 45+ messages in thread
From: Masami Hiramatsu @ 2025-02-07 1:51 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton
On Wed, 05 Feb 2025 17:50:38 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> Add the last boot module's names and addresses to the last_boot_info file.
> This only shows the module information from a previous boot. If the buffer
> is started and is recording the current boot, this file still will only
> show "current".
>
> # cat instances/boot_mapped/last_boot_info
> Offset: 10c00000
> ffffffffc00ca000 usb_serial_simple
> ffffffffc00ae000 usbserial
> ffffffffc008b000 bfq
This is just a suggestion, what about using the same format for all
lines? For example,
# cat instances/boot_mapped/last_boot_info
10c00000 [kernel]
ffffffffc00ca000 usb_serial_simple
ffffffffc00ae000 usbserial
ffffffffc008b000 bfq
>
> # echo function > instances/boot_mapped/current_tracer
> # cat instances/boot_mapped/last_boot_info
> Offset: current
And this one is empty if it is already for the current boot
(because the last boot information is cleared).
Thank you,
>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> kernel/trace/trace.c | 103 +++++++++++++++++++++++++++++++++++++------
> 1 file changed, 90 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index a8e5f7ac2193..7b4027804cd4 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -6005,6 +6005,8 @@ struct trace_scratch {
> struct trace_mod_entry entries[];
> };
>
> +static DEFINE_MUTEX(scratch_mutex);
> +
> static int save_mod(struct module *mod, void *data)
> {
> struct trace_array *tr = data;
> @@ -6012,6 +6014,8 @@ static int save_mod(struct module *mod, void *data)
> struct trace_mod_entry *entry;
> unsigned int size;
>
> + guard(mutex)(&scratch_mutex);
> +
> tscratch = tr->scratch;
> if (!tscratch)
> return -1;
> @@ -6882,15 +6886,47 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf,
> return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
> }
>
> -static ssize_t
> -tracing_last_boot_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
> +#define LAST_BOOT_HEADER ((void *)1)
> +
> +static void *l_next(struct seq_file *m, void *v, loff_t *pos)
> {
> - struct trace_array *tr = filp->private_data;
> + struct trace_array *tr = m->private;
> struct trace_scratch *tscratch = tr->scratch;
> - struct seq_buf seq;
> - char buf[64];
> + unsigned int index = *pos;
> +
> + (*pos)++;
> +
> + if (*pos == 1)
> + return LAST_BOOT_HEADER;
> +
> + /* Only show offsets of the last boot data */
> + if (!tscratch || !(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
> + return NULL;
> +
> + /* *pos 0 is for the header, 1 is for the first module */
> + index--;
> +
> + if (index >= tscratch->nr_entries)
> + return NULL;
>
> - seq_buf_init(&seq, buf, 64);
> + return &tscratch->entries[index];
> +}
> +
> +static void *l_start(struct seq_file *m, loff_t *pos)
> +{
> + mutex_lock(&scratch_mutex);
> +
> + return l_next(m, NULL, pos);
> +}
> +
> +static void l_stop(struct seq_file *m, void *p)
> +{
> + mutex_unlock(&scratch_mutex);
> +}
> +
> +static void show_last_boot_header(struct seq_file *m, struct trace_array *tr)
> +{
> + struct trace_scratch *tscratch = tr->scratch;
>
> /*
> * Do not leak KASLR address. This only shows the KASLR address of
> @@ -6900,11 +6936,52 @@ tracing_last_boot_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t
> * should not be the same as the current boot.
> */
> if (!tscratch || !(tr->flags & TRACE_ARRAY_FL_LAST_BOOT))
> - seq_buf_puts(&seq, "Offset: current\n");
> + seq_puts(m, "Offset: current\n");
> else
> - seq_buf_printf(&seq, "Offset: %lx\n", tscratch->kaslr_addr);
> + seq_printf(m, "Offset: %lx\n", tscratch->kaslr_addr);
> +}
>
> - return simple_read_from_buffer(ubuf, cnt, ppos, buf, seq_buf_used(&seq));
> +static int l_show(struct seq_file *m, void *v)
> +{
> + struct trace_array *tr = m->private;
> + struct trace_mod_entry *entry = v;
> +
> + if (v == LAST_BOOT_HEADER) {
> + show_last_boot_header(m, tr);
> + return 0;
> + }
> +
> + seq_printf(m, "%lx %s\n", entry->mod_addr, entry->mod_name);
> + return 0;
> +}
> +
> +static const struct seq_operations last_boot_seq_ops = {
> + .start = l_start,
> + .next = l_next,
> + .stop = l_stop,
> + .show = l_show,
> +};
> +
> +static int tracing_last_boot_open(struct inode *inode, struct file *file)
> +{
> + struct trace_array *tr = inode->i_private;
> + struct seq_file *m;
> + int ret;
> +
> + ret = tracing_check_open_get_tr(tr);
> + if (ret)
> + return ret;
> +
> + ret = seq_open(file, &last_boot_seq_ops);
> + if (ret) {
> + trace_array_put(tr);
> + return ret;
> + }
> +
> + m = file->private_data;
> + m->private = tr;
> +
> + return 0;
> }
>
> static int tracing_buffer_meta_open(struct inode *inode, struct file *filp)
> @@ -7533,10 +7610,10 @@ static const struct file_operations trace_time_stamp_mode_fops = {
> };
>
> static const struct file_operations last_boot_fops = {
> - .open = tracing_open_generic_tr,
> - .read = tracing_last_boot_read,
> - .llseek = generic_file_llseek,
> - .release = tracing_release_generic_tr,
> + .open = tracing_last_boot_open,
> + .read = seq_read,
> + .llseek = seq_lseek,
> + .release = tracing_seq_release,
> };
>
> #ifdef CONFIG_TRACER_SNAPSHOT
> --
> 2.45.2
>
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
* Re: [PATCH 7/8] tracing: Show module names and addresses of last boot
2025-02-07 1:51 ` Masami Hiramatsu
@ 2025-02-07 2:02 ` Steven Rostedt
2025-02-07 2:25 ` Masami Hiramatsu
0 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-02-07 2:02 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton
On Fri, 7 Feb 2025 10:51:11 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> > # cat instances/boot_mapped/last_boot_info
> > Offset: 10c00000
> > ffffffffc00ca000 usb_serial_simple
> > ffffffffc00ae000 usbserial
> > ffffffffc008b000 bfq
>
> This is just a suggestion, what about using the same format for all
> lines? For example,
>
> # cat instances/boot_mapped/last_boot_info
> 10c00000 [kernel]
> ffffffffc00ca000 usb_serial_simple
> ffffffffc00ae000 usbserial
> ffffffffc008b000 bfq
This can be confusing because the first is an offset, and the others are
addresses.
>
> >
> > # echo function > instances/boot_mapped/current_tracer
> > # cat instances/boot_mapped/last_boot_info
> > Offset: current
>
> And this one could simply be empty if it is already for the current boot
> (because the last boot information is cleared).
Hmm, I have to think about that.
Thanks,
-- Steve
* Re: [PATCH 4/8] tracing: Have persistent trace instances save KASLR offset
2025-02-07 1:03 ` Steven Rostedt
@ 2025-02-07 2:15 ` Masami Hiramatsu
0 siblings, 0 replies; 45+ messages in thread
From: Masami Hiramatsu @ 2025-02-07 2:15 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton
On Thu, 6 Feb 2025 20:03:16 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 7 Feb 2025 09:58:44 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
>
> > > Which would make sure that the scratch size contains enough memory to hold
> > > 128 modules.
> >
> > Yeah, this idea looks good to me. BTW, the scratch size will be aligned to
> > the subbuffer size (or page size?)
>
> I don't think it needs to be. I think it just needs to be aligned to word
> size.
Ah, OK. The first cpu_meta data does not need to be aligned.
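(For concreteness, here is a rough userspace sizing sketch. The struct layout
is abbreviated and partly guessed from the quoted patch, MOD_NAME_MAX is made
up for illustration, and 128 is just the example module count from above.)

#include <stddef.h>
#include <stdio.h>

#define MOD_NAME_MAX 64			/* illustrative, not the kernel's define */

struct trace_mod_entry {
	unsigned long		mod_addr;
	char			mod_name[MOD_NAME_MAX];
};

struct trace_scratch {			/* other fields of the real struct omitted */
	unsigned long		kaslr_addr;
	unsigned long		nr_entries;
	struct trace_mod_entry	entries[];
};

int main(void)
{
	/* Room for 128 module entries after the fixed header... */
	size_t size = offsetof(struct trace_scratch, entries) +
		      128 * sizeof(struct trace_mod_entry);

	/* ...rounded up to word size, not to the sub-buffer/page size. */
	size = (size + sizeof(long) - 1) & ~(sizeof(long) - 1);

	printf("scratch size for 128 modules: %zu bytes\n", size);
	return 0;
}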
Thank you,
>
> -- Steve
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
* Re: [PATCH 7/8] tracing: Show module names and addresses of last boot
2025-02-07 2:02 ` Steven Rostedt
@ 2025-02-07 2:25 ` Masami Hiramatsu
0 siblings, 0 replies; 45+ messages in thread
From: Masami Hiramatsu @ 2025-02-07 2:25 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton
On Thu, 6 Feb 2025 21:02:47 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 7 Feb 2025 10:51:11 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
>
> > > # cat instances/boot_mapped/last_boot_info
> > > Offset: 10c00000
> > > ffffffffc00ca000 usb_serial_simple
> > > ffffffffc00ae000 usbserial
> > > ffffffffc008b000 bfq
> >
> > This is just a suggestion, what about using the same format for all
> > lines? For example,
> >
> > # cat instances/boot_mapped/last_boot_info
> > 10c00000 [kernel]
> > ffffffffc00ca000 usb_serial_simple
> > ffffffffc00ae000 usbserial
> > ffffffffc008b000 bfq
>
> This can be confusing because the first is an offset, and the others are
> addresses.
I think both are offsets from the viewpoint of `nm` results. :)
$ nm vmlinux | grep " t " | sort | head -n 5
ffffffff81000820 t __pfx_devkmsg_emit
ffffffff81000830 t devkmsg_emit
ffffffff81000a80 t __pfx_io_queue_deferred
ffffffff81000a90 t io_queue_deferred
ffffffff81000ca0 t __pfx_io_uring_drop_tctx_refs
$ nm samples/trace_events/trace-events-sample.ko | grep " t " | sort | head -n 5
0000000000000440 t __pfx_trace_event_raw_event_foo_bar
0000000000000450 t trace_event_raw_event_foo_bar
0000000000000780 t __pfx_perf_trace_foo_bar
0000000000000790 t perf_trace_foo_bar
0000000000000af0 t __pfx_trace_event_raw_event_foo_bar_with_cond
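(A hedged worked example, not taken from the thread: assuming the previous
boot's last_boot_info reported

  ffffffffc00ca000 usb_serial_simple

and `nm` on that boot's usb_serial_simple.ko showed a text symbol at, say,
offset 0x450, the address recorded in the persistent trace for that symbol
would be roughly

  0xffffffffc00ca000 + 0x450 = 0xffffffffc00ca450

assuming the symbol sits in the module's first text section. In other words,
the per-module lines are the bases such nm offsets get added to, just as the
"Offset:" line is what gets added to the previous boot's vmlinux addresses.)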
Thank you,
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
* Re: [PATCH 5/8] module: Add module_for_each_mod() function
2025-02-06 15:27 ` Steven Rostedt
@ 2025-02-10 13:04 ` Petr Pavlu
2025-02-10 14:08 ` Sebastian Andrzej Siewior
2025-02-14 22:30 ` Steven Rostedt
1 sibling, 1 reply; 45+ messages in thread
From: Petr Pavlu @ 2025-02-10 13:04 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu (Google)
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton, Luis Chamberlain, Sami Tolvanen, Daniel Gomez,
linux-modules, Sebastian Andrzej Siewior
On 2/6/25 16:27, Steven Rostedt wrote:
> On Thu, 6 Feb 2025 14:28:17 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
>
>>> --- a/kernel/module/main.c
>>> +++ b/kernel/module/main.c
>>> @@ -3809,6 +3809,20 @@ bool is_module_text_address(unsigned long addr)
>>> return ret;
>>> }
>>>
>>
>> It is better to add a kerneldoc for this API.
>
> Agreed, but I was planning on this changing. Waiting to hear from the
> module maintainers.
>
>>
>> /**
>> * module_for_each_mod() - iterate all modules
>> * @func: Callback function
>> * @data: User data
>> *
>> * Call the @func with each module in the system. If @func returns !0, this
>> * stops iterating. Note that @func must not sleep since it is called with
>> * preemption disabled.
>> */
>>
>> BTW, do we really need to disable preempt or is it enough to call
>> rcu_read_lock()?
>
> Bah, as I expected this function to be changed, I didn't spend too much
> time on looking at its implementation. I just cut and pasted how the other
> loops worked. But yes, it should not be disabling preemption. In fact, I
> think the module code itself should not be disabling preemption!
>
> I'll have to go and look into that.
The series "module: Use RCU instead of RCU-sched" from Sebastian Andrzej
Siewior cleans this up [1]. It is currently queued on modules-next (for
6.15-rc1).
The new function module_for_each_mod() should then use "guard(rcu)();".
[1] https://lore.kernel.org/linux-modules/20250108090457.512198-1-bigeasy@linutronix.de/
--
Thanks,
Petr
>>> +void module_for_each_mod(int(*func)(struct module *mod, void *data), void *data)
>>> +{
>>> + struct module *mod;
>>> +
>>> + preempt_disable();
>>> + list_for_each_entry_rcu(mod, &modules, list) {
>>> + if (mod->state == MODULE_STATE_UNFORMED)
>>> + continue;
>>> + if (func(mod, data))
>>> + break;
>>> + }
>>> + preempt_enable();
>>> +}
>>> +
>>> /**
>>> * __module_text_address() - get the module whose code contains an address.
>>> * @addr: the address.
>>> --
>
* Re: [PATCH 5/8] module: Add module_for_each_mod() function
2025-02-10 13:04 ` Petr Pavlu
@ 2025-02-10 14:08 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 45+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-02-10 14:08 UTC (permalink / raw)
To: Petr Pavlu
Cc: Steven Rostedt, Masami Hiramatsu (Google), linux-kernel,
linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton, Luis Chamberlain, Sami Tolvanen, Daniel Gomez,
linux-modules
On 2025-02-10 14:04:35 [+0100], Petr Pavlu wrote:
> >> BTW, do we really need to disable preempt or is it enough to call
> >> rcu_read_lock()?
> >
> > Bah, as I expected this function to be changed, I didn't spend too much
> > time on looking at its implementation. I just cut and pasted how the other
> > loops worked. But yes, it should not be disabling preemption. In fact, I
> > think the module code itself should not be disabling preemption!
> >
> > I'll have to go and look into that.
>
> The series "module: Use RCU instead of RCU-sched" from Sebastian Andrzej
> Siewior cleans this up [1]. It is currently queued on modules-next (for
> 6.15-rc1).
>
> The new function module_for_each_mod() should then use "guard(rcu)();".
So the removal of the preempt-disable statements here already pays off.
Nice.
Sebastian
* Re: [PATCH 5/8] module: Add module_for_each_mod() function
2025-02-06 15:27 ` Steven Rostedt
2025-02-10 13:04 ` Petr Pavlu
@ 2025-02-14 22:30 ` Steven Rostedt
2025-02-18 21:21 ` Luis Chamberlain
1 sibling, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-02-14 22:30 UTC (permalink / raw)
To: Masami Hiramatsu (Google), Luis Chamberlain
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton, Petr Pavlu, Sami Tolvanen, Daniel Gomez,
linux-modules
On Thu, 6 Feb 2025 10:27:20 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> > BTW, do we really need to disable preempt or is it enough to call
> > rcu_read_lock()?
>
> Bah, as I expected this function to be changed, I didn't spend too much
> time on looking at its implementation. I just cut and pasted how the other
> loops worked. But yes, it should not be disabling preemption. In fact, I
> think the module code itself should not be disabling preemption!
>
> I'll have to go and look into that.
It really looks like it requires preempt_disable(), as the code in
kernel/module/main.c has in several places:
preempt_disable();
list_for_each_entry_rcu(mod, &modules, list) {
[..]
}
preempt_enable();
Or
module_assert_mutex_or_preempt();
[..]
list_for_each_entry_rcu(mod, &modules, list,
lockdep_is_held(&module_mutex)) {
So it looks like it either requires preempt_disable or holding the
module_mutex.
As I need to call this with trace_types_lock held, and there's a place
where trace_types_lock is taken within a module callback, I don't think it's safe
to take module_mutex in that loop; otherwise we have an ABBA deadlock.
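(To make the ordering concrete, here is a minimal userspace illustration of
that ABBA scenario. It is not the actual kernel code paths; the lock names
are reused only for readability, and the program is expected to hang when
run, which is the point. Compile with "gcc -pthread".)

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t trace_types_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t module_mutex     = PTHREAD_MUTEX_INITIALIZER;

/* Tracing side: holds trace_types_lock, then wants module_mutex. */
static void *tracer_side(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&trace_types_lock);	/* A */
	sleep(1);
	pthread_mutex_lock(&module_mutex);	/* then B */
	pthread_mutex_unlock(&module_mutex);
	pthread_mutex_unlock(&trace_types_lock);
	return NULL;
}

/* Module side: holds module_mutex, then a callback wants trace_types_lock. */
static void *module_side(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&module_mutex);	/* B */
	sleep(1);
	pthread_mutex_lock(&trace_types_lock);	/* then A -> ABBA */
	pthread_mutex_unlock(&trace_types_lock);
	pthread_mutex_unlock(&module_mutex);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, tracer_side, NULL);
	pthread_create(&t2, NULL, module_side, NULL);
	pthread_join(t1, NULL);	/* both threads end up waiting forever */
	pthread_join(t2, NULL);
	return 0;
}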
Luis,
Is this patch OK, and also is there any plan to move the module code to
using just rcu_read_lock instead of preempt_disable?
-- Steve
* Re: [PATCH 5/8] module: Add module_for_each_mod() function
2025-02-14 22:30 ` Steven Rostedt
@ 2025-02-18 21:21 ` Luis Chamberlain
2025-02-18 21:29 ` Steven Rostedt
2025-02-19 0:24 ` Steven Rostedt
0 siblings, 2 replies; 45+ messages in thread
From: Luis Chamberlain @ 2025-02-18 21:21 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu (Google), linux-kernel, linux-trace-kernel,
Mark Rutland, Mathieu Desnoyers, Andrew Morton, Petr Pavlu,
Sami Tolvanen, Daniel Gomez, linux-modules
On Fri, Feb 14, 2025 at 05:30:17PM -0500, Steven Rostedt wrote:
> On Thu, 6 Feb 2025 10:27:20 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > > BTW, do we really need to disable preempt or is it enough to call
> > > rcu_read_lock()?
> >
> > Bah, as I expected this function to be changed, I didn't spend too much
> > time on looking at its implementation. I just cut and pasted how the other
> > loops worked. But yes, it should not be disabling preemption. In fact, I
> > think the module code itself should not be disabling preemption!
> >
> > I'll have to go and look into that.
>
> It really looks like it requires preempt_disable(), as the code in
> kernel/module/main.c has in several places:
>
> preempt_disable();
>
> list_for_each_entry_rcu(mod, &modules, list) {
> [..]
> }
>
> preempt_enable();
>
> Or
>
> module_assert_mutex_or_preempt();
>
> [..]
>
> list_for_each_entry_rcu(mod, &modules, list,
> lockdep_is_held(&module_mutex)) {
>
>
> So it looks like it either requires preempt_disable or holding the
> module_mutex.
>
> As I need to call this with trace_types_lock held, and there's a place
> where trace_types_lock is taken within a module callback, I don't think it's safe
> to take module_mutex in that loop; otherwise we have an ABBA deadlock.
>
> Luis,
>
> Is this patch OK, and also is there any plan to move the module code to
> using just rcu_read_lock instead of preempt_disable?
The patch is not OK, you're looking at old code, look at
modules-next and as Petr suggested look at Sebastian's recently
merged work.
git remote add korg-modules git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux.git
Luis
* Re: [PATCH 5/8] module: Add module_for_each_mod() function
2025-02-18 21:21 ` Luis Chamberlain
@ 2025-02-18 21:29 ` Steven Rostedt
2025-02-19 0:24 ` Steven Rostedt
1 sibling, 0 replies; 45+ messages in thread
From: Steven Rostedt @ 2025-02-18 21:29 UTC (permalink / raw)
To: Luis Chamberlain
Cc: Masami Hiramatsu (Google), linux-kernel, linux-trace-kernel,
Mark Rutland, Mathieu Desnoyers, Andrew Morton, Petr Pavlu,
Sami Tolvanen, Daniel Gomez, linux-modules
On Tue, 18 Feb 2025 13:21:20 -0800
Luis Chamberlain <mcgrof@kernel.org> wrote:
> > Luis,
> >
> > Is this patch OK, and also is there any plan to move the module code to
> > using just rcu_read_lock instead of preempt_disable?
>
> The patch is not OK, you're looking at old code, look at
> modules-next and as Petr suggested look at Sebastian's recently
> merged work.
>
> git remote add korg-modules git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux.git
Ha! Somehow I missed seeing the two replies from Petr and Sebastian!
Not sure how that happened. Something may be going wacky with claws-mail. :-/
Thanks for pointing them out. I'll definitely take a look.
-- Steve
* Re: [PATCH 5/8] module: Add module_for_each_mod() function
2025-02-18 21:21 ` Luis Chamberlain
2025-02-18 21:29 ` Steven Rostedt
@ 2025-02-19 0:24 ` Steven Rostedt
2025-02-19 16:02 ` Luis Chamberlain
1 sibling, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-02-19 0:24 UTC (permalink / raw)
To: Luis Chamberlain
Cc: Masami Hiramatsu (Google), linux-kernel, linux-trace-kernel,
Mark Rutland, Mathieu Desnoyers, Andrew Morton, Petr Pavlu,
Sami Tolvanen, Daniel Gomez, linux-modules,
Sebastian Andrzej Siewior
On Tue, 18 Feb 2025 13:21:20 -0800
Luis Chamberlain <mcgrof@kernel.org> wrote:
> The patch is not OK, you're looking at old code, look at
> modules-next and as Petr suggested look at Sebastian's recently
> merged work.
>
> git remote add korg-modules git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux.git
Would this be OK?
I have this on v6.14-rc3, and it doesn't cause any conflicts when I merge
it with modules-next, and it builds fine.
-- Steve
diff --git a/include/linux/module.h b/include/linux/module.h
index 30e5b19bafa9..9a71dd2cb11f 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -782,6 +782,8 @@ static inline void *module_writable_address(struct module *mod, void *loc)
return __module_writable_address(mod, loc);
}
+void module_for_each_mod(int(*func)(struct module *mod, void *data), void *data);
+
#else /* !CONFIG_MODULES... */
static inline struct module *__module_address(unsigned long addr)
@@ -894,6 +896,10 @@ static inline void *module_writable_address(struct module *mod, void *loc)
{
return loc;
}
+
+static inline void module_for_each_mod(int(*func)(struct module *mod, void *data), void *data)
+{
+}
#endif /* CONFIG_MODULES */
#ifdef CONFIG_SYSFS
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 1fb9ad289a6f..1bd4e3b7098a 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3809,6 +3809,19 @@ bool is_module_text_address(unsigned long addr)
return ret;
}
+void module_for_each_mod(int(*func)(struct module *mod, void *data), void *data)
+{
+ struct module *mod;
+
+ guard(rcu)();
+ list_for_each_entry_rcu(mod, &modules, list) {
+ if (mod->state == MODULE_STATE_UNFORMED)
+ continue;
+ if (func(mod, data))
+ break;
+ }
+}
+
/**
* __module_text_address() - get the module whose code contains an address.
* @addr: the address.
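(A hedged usage sketch, not part of the patch above: the tracing side would
presumably call this iterator as something like

	module_for_each_mod(save_mod, tr);

where save_mod() is the callback with the matching
int (*)(struct module *mod, void *data) signature shown earlier in the
thread. Whether patch 6/8 wires it up exactly this way is not shown here.)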
* Re: [PATCH 5/8] module: Add module_for_each_mod() function
2025-02-19 0:24 ` Steven Rostedt
@ 2025-02-19 16:02 ` Luis Chamberlain
0 siblings, 0 replies; 45+ messages in thread
From: Luis Chamberlain @ 2025-02-19 16:02 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu (Google), linux-kernel, linux-trace-kernel,
Mark Rutland, Mathieu Desnoyers, Andrew Morton, Petr Pavlu,
Sami Tolvanen, Daniel Gomez, linux-modules,
Sebastian Andrzej Siewior
On Tue, Feb 18, 2025 at 07:24:46PM -0500, Steven Rostedt wrote:
> On Tue, 18 Feb 2025 13:21:20 -0800
> Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> > The patch is not OK, you're looking at old code, look at
> > modules-next and as Petr suggested look at Sebastian's recently
> > merged work.
> >
> > git remote add korg-modules git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux.git
>
> Would this be OK?
>
> I have this on v6.14-rc3, and it doesn't cause any conflicts when I merge
> it with modules-next, and it builds fine.
Seems more in line with what is used now.
Luis
* Re: [PATCH 2/3] tracing: Remove checking the activity when module map is updating
2025-02-06 16:58 ` [PATCH 2/3] tracing: Remove checking the activity when module map is updating Masami Hiramatsu (Google)
@ 2025-03-07 15:21 ` Steven Rostedt
2025-03-11 0:40 ` Masami Hiramatsu
0 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2025-03-07 15:21 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton
On Fri, 7 Feb 2025 01:58:56 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>
> Remove unnecessary active check because tr->flags already checks it.
I've thought this over, and sure, if we start tracing on a persistent ring
buffer, then it can add all modules loaded from then on even if it's not
tracing.
Can you merge patches 1 and 2, rebase it on ring-buffer/for-next and
resubmit this as one patch?
Thanks,
-- Steve
* Re: [PATCH 2/3] tracing: Remove checking the activity when module map is updating
2025-03-07 15:21 ` Steven Rostedt
@ 2025-03-11 0:40 ` Masami Hiramatsu
0 siblings, 0 replies; 45+ messages in thread
From: Masami Hiramatsu @ 2025-03-11 0:40 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
Andrew Morton
On Fri, 7 Mar 2025 10:21:37 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 7 Feb 2025 01:58:56 +0900
> "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
>
> > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> >
> > Remove unnecessary active check because tr->flags already checks it.
>
> I've thought this over, and sure, if we start tracing on a persistent ring
> buffer, then it can add all modules loaded from then on even if it's not
> tracing.
>
> Can you merge patches 1 and 2, rebase it on ring-buffer/for-next and
> resubmit this as one patch?
Oops, I've missed this message. OK, let me rebase it.
Thanks,
>
> Thanks,
>
> -- Steve
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>