* [PATCH 0/8] tracing: Persistent traces across a reboot or crash
@ 2024-03-06  1:59 Steven Rostedt
  2024-03-06  1:59 ` [PATCH 1/8] ring-buffer: Allow mapped field to be set without mapping Steven Rostedt
                   ` (9 more replies)
  0 siblings, 10 replies; 18+ messages in thread
From: Steven Rostedt @ 2024-03-06  1:59 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf,
	Baoquan He, Borislav Petkov, Paul E. McKenney, David Howells


This is a way to map a ring buffer instance across reboots.
The requirement is that you have a memory region that is not erased
across a reboot. I tested this on a Debian VM running on QEMU on a
Debian server, and even tested it on a bare-metal box running Fedora.
I was surprised that it worked on the bare-metal box, but it does so
quite consistently.

The idea is that you can reserve a memory region and save it in two
special variables:

  trace_buffer_start and trace_buffer_size

If these are set by the time the tracing code's fs_initcall() runs,
then a "boot_mapped" instance is created. The memory that was reserved
is used by the ring buffer of this instance. It acts like a
memory-mapped instance, so it has some limitations: it does not allow
snapshots, nor does it allow tracers that use a snapshot buffer (like
the irqsoff and wakeup tracers).
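
For example, a minimal sketch of how these could be set from early arch
setup code (this mirrors the hack in patch 4; the physical range here is
made up, and how the region actually gets reserved is still an open
question):

  phys_addr_t ftrace_addr;

  /* Reserve 12 MB of physical memory and save its virtual address */
  trace_buffer_size = 12 << 20;
  ftrace_addr = memblock_phys_alloc_range(trace_buffer_size, 4096,
                                          0x285400000, 0x286100000);
  if (ftrace_addr)
          trace_buffer_start = (unsigned long)__va(ftrace_addr);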

On boot up, when setting up the ring buffer, it looks at the current
content and does a rigorous test to see if the content is valid.
It even walks the events in all the sub-buffers to make sure the
ring buffer meta data is correct. If it determines that the content
is valid, it will reconstruct the ring buffer to use the content
it has found.
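
The core of that test is plain bounds checking. Condensed from the
rb_meta_valid() added in patch 7, the per-sub-buffer part looks roughly
like this:

  /* Sub-buffer sanity checks against the saved meta data */
  for (int i = 0; i < meta->nr_subbufs; i++) {
          /* Each entry in the order array must be a valid index */
          if (meta->buffers[i] >= meta->nr_subbufs)
                  return false;
          /* The committed data must not overrun the sub-buffer */
          if ((unsigned)local_read(&subbuf->commit) > subbuf_size)
                  return false;
          subbuf = (void *)subbuf + subbuf_size;
  }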

If the buffer is valid, on the next boot the boot_mapped instance
will contain the data from the previous boot. You can cat the
trace or trace_pipe file, or even run trace-cmd extract on it to
make a trace.dat file that holds the data. This is much better than
dealing with ftrace_dump_on_oops (I wish I had this a decade ago!).

There are still some limitations of this buffer. One is that it assumes
that the kernel you are booting back into is the same one that crashed.
At the very least, the trace events (like sched_switch and friends) must
all have the same ids. This holds when it is the same kernel, as the ids
are determined at link time.

Module events could possibly be a problem, as their ids may not match.

One idea is to just print the raw fields and not process the print formats
for this instance, as the print formats may do some crazy things with
data that does not match.

Another limitation is that any print format that has "%pS" will likely
not work. That's because a pointer in the old ring buffer may hold an
address that is different from where the function is located now (due
to KASLR, for instance). I was thinking of adding a file in the
boot_mapped instance that holds the delta between the old mapping and
the new one, so that trace-cmd and perf could calculate the current
kallsyms from the old pointers.
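
With such a delta, fixing up an old pointer would be trivial
(hypothetical sketch; neither the file nor its format exists yet):

  long text_delta;        /* read from the proposed delta file */
  unsigned long old_ptr;  /* function address saved in the old buffer */

  /* The symbol now lives at the shifted address; resolve via kallsyms */
  unsigned long cur_ptr = old_ptr + text_delta;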

Finally, this is still a proof of concept. How to create this memory
mapping isn't decided yet. In this patch set I simply hacked into the
kexec crash code and hard coded an address that worked on one of my
machines (for the other machine I had to play around to find another
address). Perhaps we could add a kernel command line parameter that
lets people decide, or an option where it could possibly look at the
ACPI tables (on Intel) to come up with an address on its own.

Anyway, I plan on using this for debugging, as it is already pretty
featureful, but there's much more that can be done.

Basically, all you need to do is:

  echo 1 > /sys/kernel/tracing/instances/boot_mapped/events/enable

Do whatever you want until the system crashes (and boots back into the
same kernel). Then:

  cat /sys/kernel/tracing/instances/boot_mapped/trace

and it will have the trace.

I'm sure there are still some gotchas here, which is why this is
currently still just a POC.

Enjoy...

Steven Rostedt (Google) (8):
      ring-buffer: Allow mapped field to be set without mapping
      ring-buffer: Add ring_buffer_alloc_range()
      tracing: Create "boot_mapped" instance for memory mapped buffer
      HACK: Hard code in mapped tracing buffer address
      ring-buffer: Add ring_buffer_meta data
      ring-buffer: Add output of ring buffer meta page
      ring-buffer: Add test if range of boot buffer is valid
      ring-buffer: Validate boot range memory events

----
 arch/x86/kernel/setup.c     |  20 ++
 include/linux/ring_buffer.h |  17 +
 include/linux/trace.h       |   7 +
 kernel/trace/ring_buffer.c  | 826 ++++++++++++++++++++++++++++++++++++++------
 kernel/trace/trace.c        |  95 ++++-
 kernel/trace/trace.h        |   5 +
 6 files changed, 856 insertions(+), 114 deletions(-)


* [PATCH 1/8] ring-buffer: Allow mapped field to be set without mapping
  2024-03-06  1:59 [PATCH 0/8] tracing: Persistent traces across a reboot or crash Steven Rostedt
@ 2024-03-06  1:59 ` Steven Rostedt
  2024-03-06  1:59 ` [PATCH 2/8] ring-buffer: Add ring_buffer_alloc_range() Steven Rostedt
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Steven Rostedt @ 2024-03-06  1:59 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf,
	Baoquan He, Borislav Petkov, Paul E. McKenney, David Howells

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

In preparation for having the ring buffer mapped to a dedicated location,
which will have the same restrictions as user space memory mapped buffers,
allow it to use the "mapped" field of the ring_buffer_per_cpu structure
without having the user space meta page mapping.

When this starts using the mapped field, it will need to handle adding a
user space mapping (and removing it) from a ring buffer that is using a
dedicated memory range.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 1d7d7a701867..524b2c185c88 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -5171,6 +5171,9 @@ static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	struct trace_buffer_meta *meta = cpu_buffer->meta_page;
 
+	if (!meta)
+		return;
+
 	meta->reader.read = cpu_buffer->reader_page->read;
 	meta->reader.id = cpu_buffer->reader_page->id;
 	meta->reader.lost_events = cpu_buffer->lost_events;
@@ -6159,7 +6162,7 @@ rb_get_mapped_buffer(struct trace_buffer *buffer, int cpu)
 
 	mutex_lock(&cpu_buffer->mapping_lock);
 
-	if (!cpu_buffer->mapped) {
+	if (!cpu_buffer->mapped || !cpu_buffer->meta_page) {
 		mutex_unlock(&cpu_buffer->mapping_lock);
 		return ERR_PTR(-ENODEV);
 	}
@@ -6217,7 +6220,7 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu)
 
 	mutex_lock(&cpu_buffer->mapping_lock);
 
-	if (cpu_buffer->mapped) {
+	if (cpu_buffer->meta_page) {
 		err = __rb_inc_dec_mapped(buffer, cpu_buffer, true);
 		mutex_unlock(&cpu_buffer->mapping_lock);
 		return err;
@@ -6247,7 +6250,7 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu)
 	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
 
 	rb_setup_ids_meta_page(cpu_buffer, subbuf_ids);
-	cpu_buffer->mapped = 1;
+	cpu_buffer->mapped++;
 
 	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 unlock:
-- 
2.43.0




* [PATCH 2/8] ring-buffer: Add ring_buffer_alloc_range()
  2024-03-06  1:59 [PATCH 0/8] tracing: Persistent traces across a reboot or crash Steven Rostedt
  2024-03-06  1:59 ` [PATCH 1/8] ring-buffer: Allow mapped field to be set without mapping Steven Rostedt
@ 2024-03-06  1:59 ` Steven Rostedt
  2024-03-06  1:59 ` [PATCH 3/8] tracing: Create "boot_mapped" instance for memory mapped buffer Steven Rostedt
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Steven Rostedt @ 2024-03-06  1:59 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf,
	Baoquan He, Borislav Petkov, Paul E. McKenney, David Howells

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

In preparation for allowing the trace ring buffer to be allocated in a
range of memory that is persistent across reboots, add
ring_buffer_alloc_range(). It takes a contiguous range of memory and will
split it up evenly among the per-CPU ring buffers.

If there's not enough memory to handle all CPUs with the minimum size, it
will fail to allocate the ring buffer.
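
A caller would use it roughly like this (a sketch; it assumes that
trace_buffer_start is the kernel virtual address of an already reserved
range of trace_buffer_size bytes):

  struct trace_buffer *buffer;

  /* size is ignored when a range is given; it is derived from the range */
  buffer = ring_buffer_alloc_range(0, RB_FL_OVERWRITE, 0,
                                   trace_buffer_start, trace_buffer_size);
  if (!buffer)
          return -ENOMEM;  /* the range was too small for all CPUs */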

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 include/linux/ring_buffer.h |  17 +++
 kernel/trace/ring_buffer.c  | 221 ++++++++++++++++++++++++++++++------
 2 files changed, 203 insertions(+), 35 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 0841ba8bab14..17b5508f042c 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -89,6 +89,11 @@ void ring_buffer_discard_commit(struct trace_buffer *buffer,
 struct trace_buffer *
 __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *key);
 
+struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flags,
+					       int order, unsigned long start,
+					       unsigned long range_size,
+					       struct lock_class_key *key);
+
 /*
  * Because the ring buffer is generic, if other users of the ring buffer get
  * traced by ftrace, it can produce lockdep warnings. We need to keep each
@@ -100,6 +105,18 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
 	__ring_buffer_alloc((size), (flags), &__key);	\
 })
 
+/*
+ * Because the ring buffer is generic, if other users of the ring buffer get
+ * traced by ftrace, it can produce lockdep warnings. We need to keep each
+ * ring buffer's lock class separate.
+ */
+#define ring_buffer_alloc_range(size, flags, order, start, range_size)	\
+({									\
+	static struct lock_class_key __key;				\
+	__ring_buffer_alloc_range((size), (flags), (order), (start),	\
+				  (range_size), &__key);		\
+})
+
 int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
 __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
 			  struct file *filp, poll_table *poll_table, int full);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 524b2c185c88..367597dc766b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -41,6 +41,9 @@
 
 static void update_pages_handler(struct work_struct *work);
 
+struct ring_buffer_meta {
+};
+
 /*
  * The ring buffer header is special. We must manually up keep it.
  */
@@ -339,7 +342,8 @@ struct buffer_page {
 	local_t		 entries;	/* entries on this page */
 	unsigned long	 real_end;	/* real end of data */
 	unsigned	 order;		/* order of the page */
-	u32		 id;		/* ID for external mapping */
+	u32		 id:30;		/* ID for external mapping */
+	u32		 range:1;	/* Mapped via a range */
 	struct buffer_data_page *page;	/* Actual data page */
 };
 
@@ -370,7 +374,9 @@ static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage)
 
 static void free_buffer_page(struct buffer_page *bpage)
 {
-	free_pages((unsigned long)bpage->page, bpage->order);
+	/* Range pages are not to be freed */
+	if (!bpage->range)
+		free_pages((unsigned long)bpage->page, bpage->order);
 	kfree(bpage);
 }
 
@@ -520,6 +526,9 @@ struct trace_buffer {
 	struct rb_irq_work		irq_work;
 	bool				time_stamp_abs;
 
+	unsigned long			range_addr_start;
+	unsigned long			range_addr_end;
+
 	unsigned int			subbuf_size;
 	unsigned int			subbuf_order;
 	unsigned int			max_data_size;
@@ -1431,9 +1440,67 @@ static void rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
 	}
 }
 
+/*
+ * Take an address, add the meta data size as well as the array of
+ * array subbuffer indexes, then align it to a subbuffer size.
+ */
+static unsigned long
+rb_range_align_subbuf(unsigned long addr, int subbuf_size, int nr_subbufs)
+{
+	addr += sizeof(struct ring_buffer_meta) +
+		sizeof(int) * nr_subbufs;
+	return ALIGN(addr, subbuf_size);
+}
+
+/*
+ * Return a specific sub-buffer for a given @cpu defined by @idx.
+ */
+static void *rb_range_buffer(struct trace_buffer *buffer, int cpu, int nr_pages, int idx)
+{
+	unsigned long ptr;
+	int subbuf_size = buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
+	int nr_subbufs;
+
+	/* Include the reader page */
+	nr_subbufs = nr_pages + 1;
+
+	/*
+	 * The first chunk may not be subbuffer aligned, where as
+	 * the rest of the chunks are.
+	 */
+	ptr = buffer->range_addr_start;
+	ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs);
+	if (cpu) {
+		unsigned long p;
+
+		ptr += subbuf_size * nr_subbufs;
+
+		/* Save the beginning of this CPU chunk */
+		p = ptr;
+
+		ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs);
+
+		if (cpu > 1) {
+			unsigned long size;
+
+			ptr += subbuf_size * nr_subbufs;
+
+			/* Now all chunks after this are the same size */
+			size = ptr - p;
+			ptr += size * (cpu - 2);
+
+			ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs);
+		}
+	}
+	if (ptr + subbuf_size * nr_subbufs > buffer->range_addr_end)
+		return NULL;
+	return (void *)ptr;
+}
+
 static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		long nr_pages, struct list_head *pages)
 {
+	struct trace_buffer *buffer = cpu_buffer->buffer;
 	struct buffer_page *bpage, *tmp;
 	bool user_thread = current->mm != NULL;
 	gfp_t mflags;
@@ -1470,6 +1537,7 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		set_current_oom_origin();
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page;
+		int cpu = cpu_buffer->cpu;
 
 		bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 				    mflags, cpu_to_node(cpu_buffer->cpu));
@@ -1478,14 +1546,22 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 
 		rb_check_bpage(cpu_buffer, bpage);
 
-		list_add(&bpage->list, pages);
+		list_add_tail(&bpage->list, pages);
 
-		page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
-					mflags | __GFP_ZERO,
-					cpu_buffer->buffer->subbuf_order);
-		if (!page)
-			goto free_pages;
-		bpage->page = page_address(page);
+		if (buffer->range_addr_start) {
+			/* A range was given. Use that for the buffer page */
+			bpage->page = rb_range_buffer(buffer, cpu, nr_pages, i + 1);
+			if (!bpage->page)
+				goto free_pages;
+			bpage->range = 1;
+		} else {
+			page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
+						mflags | __GFP_ZERO,
+						cpu_buffer->buffer->subbuf_order);
+			if (!page)
+				goto free_pages;
+			bpage->page = page_address(page);
+		}
 		bpage->order = cpu_buffer->buffer->subbuf_order;
 		rb_init_page(bpage->page);
 
@@ -1567,11 +1643,18 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
 
 	cpu_buffer->reader_page = bpage;
 
-	page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL | __GFP_ZERO,
-				cpu_buffer->buffer->subbuf_order);
-	if (!page)
-		goto fail_free_reader;
-	bpage->page = page_address(page);
+	if (buffer->range_addr_start) {
+		bpage->page = rb_range_buffer(buffer, cpu, nr_pages, 0);
+		if (!bpage->page)
+			goto fail_free_reader;
+		bpage->range = 1;
+	} else {
+		page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL | __GFP_ZERO,
+					cpu_buffer->buffer->subbuf_order);
+		if (!page)
+			goto fail_free_reader;
+		bpage->page = page_address(page);
+	}
 	rb_init_page(bpage->page);
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
@@ -1622,22 +1705,14 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
 	kfree(cpu_buffer);
 }
 
-/**
- * __ring_buffer_alloc - allocate a new ring_buffer
- * @size: the size in bytes per cpu that is needed.
- * @flags: attributes to set for the ring buffer.
- * @key: ring buffer reader_lock_key.
- *
- * Currently the only flag that is available is the RB_FL_OVERWRITE
- * flag. This flag means that the buffer will overwrite old data
- * when the buffer wraps. If this flag is not set, the buffer will
- * drop data when the tail hits the head.
- */
-struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
-					struct lock_class_key *key)
+static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
+					 int order, unsigned long start,
+					 unsigned long end,
+					 struct lock_class_key *key)
 {
 	struct trace_buffer *buffer;
 	long nr_pages;
+	int subbuf_size;
 	int bsize;
 	int cpu;
 	int ret;
@@ -1651,14 +1726,13 @@ struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 	if (!zalloc_cpumask_var(&buffer->cpumask, GFP_KERNEL))
 		goto fail_free_buffer;
 
-	/* Default buffer page size - one system page */
-	buffer->subbuf_order = 0;
-	buffer->subbuf_size = PAGE_SIZE - BUF_PAGE_HDR_SIZE;
+	buffer->subbuf_order = order;
+	subbuf_size = (PAGE_SIZE << order);
+	buffer->subbuf_size = subbuf_size - BUF_PAGE_HDR_SIZE;
 
 	/* Max payload is buffer page size - header (8bytes) */
 	buffer->max_data_size = buffer->subbuf_size - (sizeof(u32) * 2);
 
-	nr_pages = DIV_ROUND_UP(size, buffer->subbuf_size);
 	buffer->flags = flags;
 	buffer->clock = trace_clock_local;
 	buffer->reader_lock_key = key;
@@ -1666,10 +1740,6 @@ struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 	init_irq_work(&buffer->irq_work.work, rb_wake_up_waiters);
 	init_waitqueue_head(&buffer->irq_work.waiters);
 
-	/* need at least two pages */
-	if (nr_pages < 2)
-		nr_pages = 2;
-
 	buffer->cpus = nr_cpu_ids;
 
 	bsize = sizeof(void *) * nr_cpu_ids;
@@ -1678,6 +1748,46 @@ struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 	if (!buffer->buffers)
 		goto fail_free_cpumask;
 
+	/* If start/end are specified, then that overrides size */
+	if (start && end) {
+		unsigned long ptr;
+		int n;
+
+		size = end - start;
+		size = size / nr_cpu_ids;
+		nr_pages = (size - sizeof(struct ring_buffer_meta)) /
+			(subbuf_size + sizeof(int));
+		/* Need at least two pages plus the reader page */
+		if (nr_pages < 3)
+			goto fail_free_buffers;
+
+ again:
+		/* Make sure that the size fits aligned */
+		for (n = 0, ptr = start; n < nr_cpu_ids; n++) {
+			ptr += sizeof(struct ring_buffer_meta) +
+				sizeof(int) * nr_pages;
+			ptr = ALIGN(ptr, subbuf_size);
+			ptr += subbuf_size * nr_pages;
+		}
+		if (ptr > end) {
+			if (nr_pages <= 3)
+				goto fail_free_buffers;
+			nr_pages--;
+			goto again;
+		}
+
+		/* nr_pages should not count the reader page */
+		nr_pages--;
+		buffer->range_addr_start = start;
+		buffer->range_addr_end = end;
+	} else {
+
+		/* need at least two pages */
+		nr_pages = DIV_ROUND_UP(size, buffer->subbuf_size);
+		if (nr_pages < 2)
+			nr_pages = 2;
+	}
+
 	cpu = raw_smp_processor_id();
 	cpumask_set_cpu(cpu, buffer->cpumask);
 	buffer->buffers[cpu] = rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
@@ -1706,8 +1816,49 @@ struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 	kfree(buffer);
 	return NULL;
 }
+
+/**
+ * __ring_buffer_alloc - allocate a new ring_buffer
+ * @size: the size in bytes per cpu that is needed.
+ * @flags: attributes to set for the ring buffer.
+ * @key: ring buffer reader_lock_key.
+ *
+ * Currently the only flag that is available is the RB_FL_OVERWRITE
+ * flag. This flag means that the buffer will overwrite old data
+ * when the buffer wraps. If this flag is not set, the buffer will
+ * drop data when the tail hits the head.
+ */
+struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
+					struct lock_class_key *key)
+{
+	/* Default buffer page size - one system page */
+	return alloc_buffer(size, flags, 0, 0, 0,key);
+
+}
 EXPORT_SYMBOL_GPL(__ring_buffer_alloc);
 
+/**
+ * __ring_buffer_alloc_range - allocate a new ring_buffer from existing memory
+ * @size: the size in bytes per cpu that is needed.
+ * @flags: attributes to set for the ring buffer.
+ * @start: start of allocated range
+ * @range_size: size of allocated range
+ * @order: sub-buffer order
+ * @key: ring buffer reader_lock_key.
+ *
+ * Currently the only flag that is available is the RB_FL_OVERWRITE
+ * flag. This flag means that the buffer will overwrite old data
+ * when the buffer wraps. If this flag is not set, the buffer will
+ * drop data when the tail hits the head.
+ */
+struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flags,
+					       int order, unsigned long start,
+					       unsigned long range_size,
+					       struct lock_class_key *key)
+{
+	return alloc_buffer(size, flags, order, start, start + range_size, key);
+}
+
 /**
  * ring_buffer_free - free a ring buffer.
  * @buffer: the buffer to free.
-- 
2.43.0




* [PATCH 3/8] tracing: Create "boot_mapped" instance for memory mapped buffer
  2024-03-06  1:59 [PATCH 0/8] tracing: Persistent traces across a reboot or crash Steven Rostedt
  2024-03-06  1:59 ` [PATCH 1/8] ring-buffer: Allow mapped field to be set without mapping Steven Rostedt
  2024-03-06  1:59 ` [PATCH 2/8] ring-buffer: Add ring_buffer_alloc_range() Steven Rostedt
@ 2024-03-06  1:59 ` Steven Rostedt
  2024-03-06  1:59 ` [PATCH 4/8] HACK: Hard code in mapped tracing buffer address Steven Rostedt
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Steven Rostedt @ 2024-03-06  1:59 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf,
	Baoquan He, Borislav Petkov, Paul E. McKenney, David Howells

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Add two global variables trace_buffer_start and trace_buffer_size. If they
are both set, then a "boot_mapped" instance will be created using the
memory specified by these variables as its ring buffer.

The instance will exist in:

  /sys/kernel/tracing/instances/boot_mapped

Note, because the ring buffer is using a defined memory range, it will
act just like a memory mapped ring buffer. It will not have a snapshot
buffer, as it can't swap out the buffer. The snapshot files, as well as
any tracers that use a snapshot buffer, will not be present in the
boot_mapped instance.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 include/linux/trace.h |  7 +++++
 kernel/trace/trace.c  | 65 ++++++++++++++++++++++++++++++++++++-------
 kernel/trace/trace.h  |  3 ++
 3 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index fdcd76b7be83..75dab6bb88c9 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -33,6 +33,13 @@ struct trace_array;
 int register_ftrace_export(struct trace_export *export);
 int unregister_ftrace_export(struct trace_export *export);
 
+/*
+ * If the below are set, then a "boot_mapped" tracing instance will
+ * be created using this memory for its ring buffer.
+ */
+extern unsigned long trace_buffer_start;
+extern unsigned long trace_buffer_size;
+
 /**
  * trace_array_puts - write a constant string into the trace buffer.
  * @tr:    The trace array to write to
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ff0b0a999171..ff986d2a4bd0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4900,6 +4900,11 @@ static int tracing_open(struct inode *inode, struct file *file)
 static bool
 trace_ok_for_array(struct tracer *t, struct trace_array *tr)
 {
+#ifdef CONFIG_TRACER_SNAPSHOT
+	/* arrays with mapped buffer range do not have snapshots */
+	if (tr->range_addr_start && t->use_max_tr)
+		return false;
+#endif
 	return (tr->flags & TRACE_ARRAY_FL_GLOBAL) || t->allow_instances;
 }
 
@@ -8670,11 +8675,13 @@ tracing_init_tracefs_percpu(struct trace_array *tr, long cpu)
 				tr, cpu, &tracing_entries_fops);
 
 #ifdef CONFIG_TRACER_SNAPSHOT
-	trace_create_cpu_file("snapshot", TRACE_MODE_WRITE, d_cpu,
-				tr, cpu, &snapshot_fops);
+	if (!tr->range_addr_start) {
+		trace_create_cpu_file("snapshot", TRACE_MODE_WRITE, d_cpu,
+				      tr, cpu, &snapshot_fops);
 
-	trace_create_cpu_file("snapshot_raw", TRACE_MODE_READ, d_cpu,
-				tr, cpu, &snapshot_raw_fops);
+		trace_create_cpu_file("snapshot_raw", TRACE_MODE_READ, d_cpu,
+				      tr, cpu, &snapshot_raw_fops);
+	}
 #endif
 }
 
@@ -9211,7 +9218,18 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size
 
 	buf->tr = tr;
 
-	buf->buffer = ring_buffer_alloc(size, rb_flags);
+	if (tr->range_addr_start && tr->range_addr_size) {
+		buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0,
+						      tr->range_addr_start,
+						      tr->range_addr_size);
+		/*
+		 * This is basically the same as a mapped buffer,
+		 * with the same restrictions.
+		 */
+		tr->mapped++;
+	} else {
+		buf->buffer = ring_buffer_alloc(size, rb_flags);
+	}
 	if (!buf->buffer)
 		return -ENOMEM;
 
@@ -9248,6 +9266,10 @@ static int allocate_trace_buffers(struct trace_array *tr, int size)
 		return ret;
 
 #ifdef CONFIG_TRACER_MAX_TRACE
+	/* Fix mapped buffer trace arrays do not have snapshot buffers */
+	if (tr->range_addr_start)
+		return 0;
+
 	ret = allocate_trace_buffer(tr, &tr->max_buffer,
 				    allocate_snapshot ? size : 1);
 	if (MEM_FAIL(ret, "Failed to allocate trace buffer\n")) {
@@ -9348,7 +9370,9 @@ static int trace_array_create_dir(struct trace_array *tr)
 }
 
 static struct trace_array *
-trace_array_create_systems(const char *name, const char *systems)
+trace_array_create_systems(const char *name, const char *systems,
+			   unsigned long range_addr_start,
+			   unsigned long range_addr_size)
 {
 	struct trace_array *tr;
 	int ret;
@@ -9374,6 +9398,10 @@ trace_array_create_systems(const char *name, const char *systems)
 			goto out_free_tr;
 	}
 
+	/* Only for boot up memory mapped ring buffers */
+	tr->range_addr_start = range_addr_start;
+	tr->range_addr_size = range_addr_size;
+
 	tr->trace_flags = global_trace.trace_flags & ~ZEROED_TRACE_FLAGS;
 
 	cpumask_copy(tr->tracing_cpumask, cpu_all_mask);
@@ -9431,9 +9459,24 @@ trace_array_create_systems(const char *name, const char *systems)
 
 static struct trace_array *trace_array_create(const char *name)
 {
-	return trace_array_create_systems(name, NULL);
+	return trace_array_create_systems(name, NULL, 0, 0);
+}
+
+unsigned long trace_buffer_start;
+unsigned long trace_buffer_size;
+
+static int __init trace_range_tr(void)
+{
+	if (!trace_buffer_start || !trace_buffer_size)
+		return 0;
+
+	trace_array_create_systems("boot_mapped", NULL,
+				   trace_buffer_start, trace_buffer_size);
+	return 0;
 }
 
+fs_initcall(trace_range_tr);
+
 static int instance_mkdir(const char *name)
 {
 	struct trace_array *tr;
@@ -9485,7 +9528,7 @@ struct trace_array *trace_array_get_by_name(const char *name, const char *system
 			goto out_unlock;
 	}
 
-	tr = trace_array_create_systems(name, systems);
+	tr = trace_array_create_systems(name, systems, 0, 0);
 
 	if (IS_ERR(tr))
 		tr = NULL;
@@ -9678,8 +9721,10 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
 		MEM_FAIL(1, "Could not allocate function filter files");
 
 #ifdef CONFIG_TRACER_SNAPSHOT
-	trace_create_file("snapshot", TRACE_MODE_WRITE, d_tracer,
-			  tr, &snapshot_fops);
+	if (!tr->range_addr_start) {
+		trace_create_file("snapshot", TRACE_MODE_WRITE, d_tracer,
+				  tr, &snapshot_fops);
+	}
 #endif
 
 	trace_create_file("error_log", TRACE_MODE_WRITE, d_tracer,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 749a182dab48..d22d7c3b770a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -338,6 +338,9 @@ struct trace_array {
 	unsigned int		snapshot;
 	unsigned int		mapped;
 	unsigned long		max_latency;
+	/* The below is for memory mapped ring buffer */
+	unsigned long		range_addr_start;
+	unsigned long		range_addr_size;
 #ifdef CONFIG_FSNOTIFY
 	struct dentry		*d_max_latency;
 	struct work_struct	fsnotify_work;
-- 
2.43.0




* [PATCH 4/8] HACK: Hard code in mapped tracing buffer address
  2024-03-06  1:59 [PATCH 0/8] tracing: Persistent traces across a reboot or crash Steven Rostedt
                   ` (2 preceding siblings ...)
  2024-03-06  1:59 ` [PATCH 3/8] tracing: Create "boot_mapped" instance for memory mapped buffer Steven Rostedt
@ 2024-03-06  1:59 ` Steven Rostedt
  2024-03-06  1:59 ` [PATCH 5/8] ring-buffer: Add ring_buffer_meta data Steven Rostedt
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Steven Rostedt @ 2024-03-06  1:59 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf,
	Baoquan He, Borislav Petkov, Paul E. McKenney, David Howells

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Do not submit!

This is for testing purposes only. It hard codes an address that I was
using to store the ring buffer range. How the memory actually gets mapped
will be another project.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 arch/x86/kernel/setup.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 84201071dfac..dcba729349d3 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -26,6 +26,8 @@
 #include <linux/swiotlb.h>
 #include <linux/random.h>
 
+#include <linux/trace.h>
+
 #include <uapi/linux/mount.h>
 
 #include <xen/xen.h>
@@ -1106,6 +1108,24 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	arch_reserve_crashkernel();
 
+	trace_buffer_size = 12582912;
+	{
+		phys_addr_t ftrace_addr;
+		unsigned long phys_start = 0x285400000;
+		unsigned long phys_end = phys_start + trace_buffer_size + 1024*1024;
+
+		ftrace_addr = memblock_phys_alloc_range(trace_buffer_size, 4096,
+							phys_start, phys_end);
+		if (ftrace_addr) {
+			printk("MEMORY ALLOC %lx-%lx\n", (long)ftrace_addr,
+			       (long)ftrace_addr + trace_buffer_size);
+			trace_buffer_start = (unsigned long)__va(ftrace_addr);
+			printk("MEMORY ADDR %lx-%lx\n", trace_buffer_start,
+			       trace_buffer_start + trace_buffer_size);
+		} else
+			printk("MEMORY FAILED\n");
+	}
+
 	memblock_find_dma_reserve();
 
 	if (!early_xdbc_setup_hardware())
-- 
2.43.0




* [PATCH 5/8] ring-buffer: Add ring_buffer_meta data
  2024-03-06  1:59 [PATCH 0/8] tracing: Persistent traces across a reboot or crash Steven Rostedt
                   ` (3 preceding siblings ...)
  2024-03-06  1:59 ` [PATCH 4/8] HACK: Hard code in mapped tracing buffer address Steven Rostedt
@ 2024-03-06  1:59 ` Steven Rostedt
  2024-03-08 10:42   ` kernel test robot
  2024-03-08 10:54   ` kernel test robot
  2024-03-06  1:59 ` [PATCH 6/8] ring-buffer: Add output of ring buffer meta page Steven Rostedt
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 18+ messages in thread
From: Steven Rostedt @ 2024-03-06  1:59 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf,
	Baoquan He, Borislav Petkov, Paul E. McKenney, David Howells

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Populate the ring_buffer_meta array. It holds the pointer to the
head_buffer (next to read), the commit_buffer (next to write), the size
of the sub-buffers, the number of sub-buffers, and an array that keeps
track of the order of the sub-buffers.

This information will be stored in the persistent memory to help
reconstruct the ring buffer on reboot.
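
Per CPU, the reserved region then looks like this (addresses grow to the
right; this matches rb_range_align_subbuf() and rb_range_buffer() in the
patch below):

  [ ring_buffer_meta ][ buffers[nr_subbufs] ] ..pad.. [ subbuf 0 ][ subbuf 1 ]...
                                                      ^ aligned to subbuf_size

so the address of logical sub-buffer idx follows directly from the meta
data:

  ptr  = (unsigned long)meta;
  ptr += sizeof(struct ring_buffer_meta) + sizeof(int) * meta->nr_subbufs;
  ptr  = ALIGN(ptr, meta->subbuf_size);           /* first sub-buffer */
  ptr += meta->subbuf_size * meta->buffers[idx];  /* order array lookup */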

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c | 207 ++++++++++++++++++++++++++++++++-----
 1 file changed, 182 insertions(+), 25 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 367597dc766b..5a90ada49366 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -42,6 +42,11 @@
 static void update_pages_handler(struct work_struct *work);
 
 struct ring_buffer_meta {
+	unsigned long	head_buffer;
+	unsigned long	commit_buffer;
+	__u32		subbuf_size;
+	__u32		nr_subbufs;
+	int		buffers[];
 };
 
 /*
@@ -497,6 +502,7 @@ struct ring_buffer_per_cpu {
 	struct mutex			mapping_lock;
 	unsigned long			*subbuf_ids;	/* ID to subbuf addr */
 	struct trace_buffer_meta	*meta_page;
+	struct ring_buffer_meta		*ring_meta;
 
 	/* ring buffer pages to update, > 0 to add, < 0 to remove */
 	long				nr_pages_to_update;
@@ -1206,6 +1212,11 @@ static void rb_head_page_activate(struct ring_buffer_per_cpu *cpu_buffer)
 	 * Set the previous list pointer to have the HEAD flag.
 	 */
 	rb_set_list_to_head(head->list.prev);
+
+	if (cpu_buffer->ring_meta) {
+		struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+		meta->head_buffer = (unsigned long)head->page;
+	}
 }
 
 static void rb_list_head_clear(struct list_head *list)
@@ -1453,50 +1464,124 @@ rb_range_align_subbuf(unsigned long addr, int subbuf_size, int nr_subbufs)
 }
 
 /*
- * Return a specific sub-buffer for a given @cpu defined by @idx.
+ * Return the ring_buffer_meta for a given @cpu.
  */
-static void *rb_range_buffer(struct trace_buffer *buffer, int cpu, int nr_pages, int idx)
+static void *rb_range_meta(struct trace_buffer *buffer, int nr_pages, int cpu)
 {
-	unsigned long ptr;
 	int subbuf_size = buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
+	unsigned long ptr = buffer->range_addr_start;
+	struct ring_buffer_meta *meta;
 	int nr_subbufs;
 
-	/* Include the reader page */
-	nr_subbufs = nr_pages + 1;
+	if (!ptr)
+		return NULL;
+
+	/* When nr_pages passed in is zero, the first meta has already been initialized */
+	if (!nr_pages) {
+		meta = (struct ring_buffer_meta *)ptr;
+		nr_subbufs = meta->nr_subbufs;
+	} else {
+		meta = NULL;
+		/* Include the reader page */
+		nr_subbufs = nr_pages + 1;
+	}
 
 	/*
 	 * The first chunk may not be subbuffer aligned, where as
 	 * the rest of the chunks are.
 	 */
-	ptr = buffer->range_addr_start;
-	ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs);
 	if (cpu) {
-		unsigned long p;
-
-		ptr += subbuf_size * nr_subbufs;
-
-		/* Save the beginning of this CPU chunk */
-		p = ptr;
-
 		ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs);
+		ptr += subbuf_size * nr_subbufs;
 
 		if (cpu > 1) {
 			unsigned long size;
+			unsigned long p;
 
+			/* Save the beginning of this CPU chunk */
+			p = ptr;
+			ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs);
 			ptr += subbuf_size * nr_subbufs;
 
 			/* Now all chunks after this are the same size */
 			size = ptr - p;
 			ptr += size * (cpu - 2);
-
-			ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs);
 		}
 	}
-	if (ptr + subbuf_size * nr_subbufs > buffer->range_addr_end)
+	return (void *)ptr;
+}
+
+static void *rb_subbufs_from_meta(struct ring_buffer_meta *meta)
+{
+	int subbuf_size = meta->subbuf_size;
+	unsigned long ptr;
+
+	ptr = (unsigned long)meta;
+	ptr = rb_range_align_subbuf(ptr, subbuf_size, meta->nr_subbufs);
+
+	return (void *)ptr;
+}
+
+/*
+ * Return a specific sub-buffer for a given @cpu defined by @idx.
+ */
+static void *rb_range_buffer(struct ring_buffer_per_cpu *cpu_buffer, int idx)
+{
+	struct ring_buffer_meta *meta;
+	unsigned long ptr;
+	int subbuf_size;
+
+	meta = rb_range_meta(cpu_buffer->buffer, 0, cpu_buffer->cpu);
+	if (!meta)
+		return NULL;
+
+	if (WARN_ON_ONCE(idx >= meta->nr_subbufs))
 		return NULL;
+
+	subbuf_size = meta->subbuf_size;
+
+	/* Map this buffer to the order that's in meta->buffers[] */
+	idx = meta->buffers[idx];
+
+	ptr = (unsigned long)rb_subbufs_from_meta(meta);
+
+	ptr += subbuf_size * idx;
+	if (ptr + subbuf_size > cpu_buffer->buffer->range_addr_end)
+		return NULL;
+
 	return (void *)ptr;
 }
 
+static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
+{
+	struct ring_buffer_meta *meta;
+	void *subbuf;
+	int cpu;
+
+	for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
+		meta = rb_range_meta(buffer, nr_pages, cpu);
+
+		meta->nr_subbufs = nr_pages + 1;
+		meta->subbuf_size = buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
+
+		subbuf = rb_subbufs_from_meta(meta);
+
+		/*
+		 * The buffers[] array holds the order of the sub-buffers
+		 * that are after the meta data. The sub-buffers may
+		 * be swapped out when read and inserted into a different
+		 * location of the ring buffer. Although their addresses
+		 * remain the same, the buffers[] array contains the
+		 * index into the sub-buffers holding their actual order.
+		 */
+		for (int i = 0; i < meta->nr_subbufs; i++) {
+			meta->buffers[i] = i;
+			rb_init_page(subbuf);
+			subbuf += meta->subbuf_size;
+		}
+	}
+}
+
 static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		long nr_pages, struct list_head *pages)
 {
@@ -1537,7 +1622,6 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		set_current_oom_origin();
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page;
-		int cpu = cpu_buffer->cpu;
 
 		bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 				    mflags, cpu_to_node(cpu_buffer->cpu));
@@ -1550,10 +1634,11 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 
 		if (buffer->range_addr_start) {
 			/* A range was given. Use that for the buffer page */
-			bpage->page = rb_range_buffer(buffer, cpu, nr_pages, i + 1);
+			bpage->page = rb_range_buffer(cpu_buffer, i + 1);
 			if (!bpage->page)
 				goto free_pages;
 			bpage->range = 1;
+			bpage->id = i + 1;
 		} else {
 			page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
 						mflags | __GFP_ZERO,
@@ -1561,9 +1646,9 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 			if (!page)
 				goto free_pages;
 			bpage->page = page_address(page);
+			rb_init_page(bpage->page);
 		}
 		bpage->order = cpu_buffer->buffer->subbuf_order;
-		rb_init_page(bpage->page);
 
 		if (user_thread && fatal_signal_pending(current))
 			goto free_pages;
@@ -1644,7 +1729,13 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
 	cpu_buffer->reader_page = bpage;
 
 	if (buffer->range_addr_start) {
-		bpage->page = rb_range_buffer(buffer, cpu, nr_pages, 0);
+		/*
+		 * Range mapped buffers have the same restrictions as memory
+		 * mapped ones do.
+		 */
+		cpu_buffer->mapped = 1;
+		cpu_buffer->ring_meta = rb_range_meta(buffer, nr_pages, cpu);
+		bpage->page = rb_range_buffer(cpu_buffer, 0);
 		if (!bpage->page)
 			goto fail_free_reader;
 		bpage->range = 1;
@@ -1654,8 +1745,8 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
 		if (!page)
 			goto fail_free_reader;
 		bpage->page = page_address(page);
+		rb_init_page(bpage->page);
 	}
-	rb_init_page(bpage->page);
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
 	INIT_LIST_HEAD(&cpu_buffer->new_pages);
@@ -1669,6 +1760,10 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
 	cpu_buffer->tail_page = cpu_buffer->commit_page = cpu_buffer->head_page;
 
 	rb_head_page_activate(cpu_buffer);
+	if (cpu_buffer->ring_meta) {
+		struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+		meta->commit_buffer = meta->head_buffer;
+	}
 
 	return cpu_buffer;
 
@@ -1780,6 +1875,8 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
 		nr_pages--;
 		buffer->range_addr_start = start;
 		buffer->range_addr_end = end;
+
+		rb_range_meta_init(buffer, nr_pages);
 	} else {
 
 		/* need at least two pages */
@@ -2464,6 +2561,52 @@ static void rb_inc_iter(struct ring_buffer_iter *iter)
 	iter->next_event = 0;
 }
 
+/* Return the index into the sub-buffers for a given sub-buffer */
+static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf)
+{
+	void *subbuf_array;
+
+	subbuf_array = (void *)meta + sizeof(int) * meta->nr_subbufs;
+	subbuf_array = (void *)ALIGN((unsigned long)subbuf_array, meta->subbuf_size);
+	return (subbuf - subbuf_array) / meta->subbuf_size;
+}
+
+static void rb_update_meta_head(struct ring_buffer_per_cpu *cpu_buffer,
+				struct buffer_page *next_page)
+{
+	struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+	unsigned long old_head = (unsigned long)next_page->page;
+	unsigned long new_head;
+
+	rb_inc_page(&next_page);
+	new_head = (unsigned long)next_page->page;
+
+	/*
+	 * Only move it forward once, if something else came in and
+	 * moved it forward, then we don't want to touch it.
+	 */
+	(void)cmpxchg(&meta->head_buffer, old_head, new_head);
+}
+
+static void rb_update_meta_reader(struct ring_buffer_per_cpu *cpu_buffer,
+				  struct buffer_page *reader)
+{
+	struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+	void *old_reader = cpu_buffer->reader_page->page;
+	void *new_reader = reader->page;
+	int id;
+
+	id = reader->id;
+	cpu_buffer->reader_page->id = id;
+	reader->id = 0;
+
+	meta->buffers[0] = rb_meta_subbuf_idx(meta, new_reader);
+	meta->buffers[id] = rb_meta_subbuf_idx(meta, old_reader);
+
+	/* The head pointer is the one after the reader */
+	rb_update_meta_head(cpu_buffer, reader);
+}
+
 /*
  * rb_handle_head_page - writer hit the head page
  *
@@ -2513,6 +2656,8 @@ rb_handle_head_page(struct ring_buffer_per_cpu *cpu_buffer,
 		local_sub(rb_page_commit(next_page), &cpu_buffer->entries_bytes);
 		local_inc(&cpu_buffer->pages_lost);
 
+		if (cpu_buffer->ring_meta)
+			rb_update_meta_head(cpu_buffer, next_page);
 		/*
 		 * The entries will be zeroed out when we move the
 		 * tail page.
@@ -3074,6 +3219,10 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
 		local_set(&cpu_buffer->commit_page->page->commit,
 			  rb_page_write(cpu_buffer->commit_page));
 		rb_inc_page(&cpu_buffer->commit_page);
+		if (cpu_buffer->ring_meta) {
+			struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+			meta->commit_buffer = cpu_buffer->commit_page->page;
+		}
 		/* add barrier to keep gcc from optimizing too much */
 		barrier();
 	}
@@ -4691,6 +4840,9 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 	if (!ret)
 		goto spin;
 
+	if (cpu_buffer->ring_meta)
+		rb_update_meta_reader(cpu_buffer, reader);
+
 	/*
 	 * Yay! We succeeded in replacing the page.
 	 *
@@ -5381,11 +5533,16 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
 	cpu_buffer->lost_events = 0;
 	cpu_buffer->last_overrun = 0;
 
-	if (cpu_buffer->mapped)
-		rb_update_meta_page(cpu_buffer);
-
 	rb_head_page_activate(cpu_buffer);
 	cpu_buffer->pages_removed = 0;
+
+	if (cpu_buffer->mapped) {
+		rb_update_meta_page(cpu_buffer);
+		if (cpu_buffer->ring_meta) {
+			struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+			meta->commit_buffer = meta->head_buffer;
+		}
+	}
 }
 
 /* Must have disabled the cpu buffer then done a synchronize_rcu */
-- 
2.43.0




* [PATCH 6/8] ring-buffer: Add output of ring buffer meta page
  2024-03-06  1:59 [PATCH 0/8] tracing: Persistent traces across a reboot or crash Steven Rostedt
                   ` (4 preceding siblings ...)
  2024-03-06  1:59 ` [PATCH 5/8] ring-buffer: Add ring_buffer_meta data Steven Rostedt
@ 2024-03-06  1:59 ` Steven Rostedt
  2024-03-06  1:59 ` [PATCH 7/8] ring-buffer: Add test if range of boot buffer is valid Steven Rostedt
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Steven Rostedt @ 2024-03-06  1:59 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf,
	Baoquan He, Borislav Petkov, Paul E. McKenney, David Howells

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Add a buffer_meta per-cpu file for the trace instance that is mapped to
boot memory. This shows the current meta data and can be used by user
space tools to record the current mappings to help reconstruct the
ring buffer after a reboot.

It does not expose any virtual addresses, just indexes into the sub-buffer
pages.
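
Reading the per-cpu file produces output along these lines (the values
here are made up for illustration):

  head_buffer:   4
  commit_buffer: 4
  subbuf_size:   4096
  nr_subbufs:    9
  buffer[0]:     8
  buffer[1]:     0
  buffer[2]:     1
  ...
  buffer[8]:     3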

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c | 77 ++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace.c       | 30 ++++++++++++++-
 kernel/trace/trace.h       |  2 +
 3 files changed, 107 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 5a90ada49366..1e06ebe36ad1 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -31,6 +31,8 @@
 #include <asm/local64.h>
 #include <asm/local.h>
 
+#include "trace.h"
+
 /*
  * The "absolute" timestamp in the buffer is only 59 bits.
  * If a clock has the 5 MSBs set, it needs to be saved and
@@ -1582,6 +1584,81 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
 	}
 }
 
+static void *rbm_start(struct seq_file *m, loff_t *pos)
+{
+	struct ring_buffer_per_cpu *cpu_buffer = m->private;
+	struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+	unsigned long val;
+
+	if (!meta)
+		return NULL;
+
+	if (*pos > meta->nr_subbufs)
+		return NULL;
+
+	val = *pos;
+	val++;
+
+	return (void *)val;
+}
+
+static void *rbm_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	(*pos)++;
+
+	return rbm_start(m, pos);
+}
+
+static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf);
+
+static int rbm_show(struct seq_file *m, void *v)
+{
+	struct ring_buffer_per_cpu *cpu_buffer = m->private;
+	struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+	unsigned long val = (unsigned long)v;
+
+	if (val == 1) {
+		seq_printf(m, "head_buffer:   %d\n",
+			   rb_meta_subbuf_idx(meta, (void *)meta->head_buffer));
+		seq_printf(m, "commit_buffer: %d\n",
+			   rb_meta_subbuf_idx(meta, (void *)meta->commit_buffer));
+		seq_printf(m, "subbuf_size:   %d\n", meta->subbuf_size);
+		seq_printf(m, "nr_subbufs:    %d\n", meta->nr_subbufs);
+		return 0;
+	}
+
+	val -= 2;
+	seq_printf(m, "buffer[%ld]:    %d\n", val, meta->buffers[val]);
+
+	return 0;
+}
+
+static void rbm_stop(struct seq_file *m, void *p)
+{
+}
+
+static const struct seq_operations rb_meta_seq_ops = {
+	.start		= rbm_start,
+	.next		= rbm_next,
+	.show		= rbm_show,
+	.stop		= rbm_stop,
+};
+
+int ring_buffer_meta_seq_init(struct file *file, struct trace_buffer *buffer, int cpu)
+{
+	struct seq_file *m;
+	int ret;
+
+	ret = seq_open(file, &rb_meta_seq_ops);
+	if (ret)
+		return ret;
+
+	m = file->private_data;
+	m->private = buffer->buffers[cpu];
+
+	return 0;
+}
+
 static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		long nr_pages, struct list_head *pages)
 {
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ff986d2a4bd0..b4a7960aed98 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4997,7 +4997,7 @@ static int show_traces_open(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static int show_traces_release(struct inode *inode, struct file *file)
+static int tracing_seq_release(struct inode *inode, struct file *file)
 {
 	struct trace_array *tr = inode->i_private;
 
@@ -5038,7 +5038,7 @@ static const struct file_operations show_traces_fops = {
 	.open		= show_traces_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= show_traces_release,
+	.release	= tracing_seq_release,
 };
 
 static ssize_t
@@ -6840,6 +6840,22 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf,
 	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
 }
 
+static int tracing_buffer_meta_open(struct inode *inode, struct file *filp)
+{
+	struct trace_array *tr = inode->i_private;
+	int cpu = tracing_get_cpu(inode);
+	int ret;
+
+	ret = tracing_check_open_get_tr(tr);
+	if (ret)
+		return ret;
+
+	ret = ring_buffer_meta_seq_init(filp, tr->array_buffer.buffer, cpu);
+	if (ret < 0)
+		__trace_array_put(tr);
+	return ret;
+}
+
 static ssize_t
 tracing_free_buffer_write(struct file *filp, const char __user *ubuf,
 			  size_t cnt, loff_t *ppos)
@@ -7416,6 +7432,13 @@ static const struct file_operations tracing_entries_fops = {
 	.release	= tracing_release_generic_tr,
 };
 
+static const struct file_operations tracing_buffer_meta_fops = {
+	.open		= tracing_buffer_meta_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= tracing_seq_release,
+};
+
 static const struct file_operations tracing_total_entries_fops = {
 	.open		= tracing_open_generic_tr,
 	.read		= tracing_total_entries_read,
@@ -8674,6 +8697,9 @@ tracing_init_tracefs_percpu(struct trace_array *tr, long cpu)
 	trace_create_cpu_file("buffer_size_kb", TRACE_MODE_READ, d_cpu,
 				tr, cpu, &tracing_entries_fops);
 
+	if (tr->range_addr_start)
+		trace_create_cpu_file("buffer_meta", TRACE_MODE_READ, d_cpu,
+				      tr, cpu, &tracing_buffer_meta_fops);
 #ifdef CONFIG_TRACER_SNAPSHOT
 	if (!tr->range_addr_start) {
 		trace_create_cpu_file("snapshot", TRACE_MODE_WRITE, d_cpu,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index d22d7c3b770a..ccff4891c2ac 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -644,6 +644,8 @@ trace_buffer_lock_reserve(struct trace_buffer *buffer,
 			  unsigned long len,
 			  unsigned int trace_ctx);
 
+int ring_buffer_meta_seq_init(struct file *file, struct trace_buffer *buffer, int cpu);
+
 struct trace_entry *tracing_get_trace_entry(struct trace_array *tr,
 						struct trace_array_cpu *data);
 
-- 
2.43.0




* [PATCH 7/8] ring-buffer: Add test if range of boot buffer is valid
  2024-03-06  1:59 [PATCH 0/8] tracing: Persistent traces across a reboot or crash Steven Rostedt
                   ` (5 preceding siblings ...)
  2024-03-06  1:59 ` [PATCH 6/8] ring-buffer: Add output of ring buffer meta page Steven Rostedt
@ 2024-03-06  1:59 ` Steven Rostedt
  2024-03-06  1:59 ` [PATCH 8/8] ring-buffer: Validate boot range memory events Steven Rostedt
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Steven Rostedt @ 2024-03-06  1:59 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf,
	Baoquan He, Borislav Petkov, Paul E. McKenney, David Howells

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Add a test against the ring buffer memory range to see if it has valid
data. The ring_buffer_meta structure is given a new field called
"first_buffer" which holds the address of the first sub-buffer. This is
used to both determine if the other fields are valid as well as finding
the offset between the old addresses of the sub-buffer from the previous
boot to the new addresses of the current boot.

Since the values for nr_subbufs and subbuf_size are expected to be the
same, check if the values in the meta page match the values calculated.

Take the range of the first_buffer and the total size of all the buffers
and make sure the saved head_buffer and commit_buffer fall within that
range.

Iterate through all the sub-buffers to make sure that the value in each
sub-buffer's "commit" field (the field that holds the amount of data on
the sub-buffer) is within the end of the sub-buffer. Also check the
index array to make sure that all the indexes are within nr_subbufs.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c | 142 ++++++++++++++++++++++++++++++++++---
 1 file changed, 134 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 1e06ebe36ad1..e74185a4d864 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -44,6 +44,7 @@
 static void update_pages_handler(struct work_struct *work);
 
 struct ring_buffer_meta {
+	unsigned long	first_buffer;
 	unsigned long	head_buffer;
 	unsigned long	commit_buffer;
 	__u32		subbuf_size;
@@ -1554,20 +1555,101 @@ static void *rb_range_buffer(struct ring_buffer_per_cpu *cpu_buffer, int idx)
 	return (void *)ptr;
 }
 
+/*
+ * See if the existing memory contains valid ring buffer data.
+ * As the previous kernel must be the same as this kernel, all
+ * the calculations (size of buffers and number of buffers)
+ * must be the same.
+ */
+static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu,
+			  struct trace_buffer *buffer, int nr_pages)
+{
+	int subbuf_size = buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
+	struct buffer_data_page *subbuf;
+	unsigned long buffers_start;
+	unsigned long buffers_end;
+
+	/* The subbuffer's size and number of subbuffers must match */
+	if (meta->subbuf_size != subbuf_size ||
+	    meta->nr_subbufs != nr_pages + 1) {
+		pr_info("Ring buffer boot meta [%d] mismatch of subbuf_size/nr_pages\n", cpu);
+		return false;
+	}
+
+	buffers_start = meta->first_buffer;
+	buffers_end = meta->first_buffer + (subbuf_size * meta->nr_subbufs);
+
+	/* Is the head and commit buffers within the range of buffers? */
+	if (meta->head_buffer < buffers_start ||
+	    meta->head_buffer >= buffers_end) {
+		pr_info("Ring buffer boot meta [%d] head buffer out of range\n", cpu);
+		return false;
+	}
+
+	if (meta->commit_buffer < buffers_start ||
+	    meta->commit_buffer >= buffers_end) {
+		pr_info("Ring buffer boot meta [%d] commit buffer out of range\n", cpu);
+		return false;
+	}
+
+	subbuf = rb_subbufs_from_meta(meta);
+
+	/* Is the meta buffers and the subbufs themselves have correct data? */
+	for (int i = 0; i < meta->nr_subbufs; i++) {
+		if (meta->buffers[i] < 0 ||
+		    meta->buffers[i] >= meta->nr_subbufs) {
+			pr_info("Ring buffer boot meta [%d] array out of range\n", cpu);
+			return false;
+		}
+
+		if ((unsigned)local_read(&subbuf->commit) > subbuf_size) {
+			pr_info("Ring buffer boot meta [%d] buffer invalid commit\n", cpu);
+			return false;
+		}
+
+		subbuf = (void *)subbuf + subbuf_size;
+	}
+
+	pr_info("Ring buffer meta is from previous boot!\n");
+	return true;
+}
+
 static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
 {
 	struct ring_buffer_meta *meta;
+	unsigned long delta;
 	void *subbuf;
 	int cpu;
 
 	for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
+		void *next_meta;
+
 		meta = rb_range_meta(buffer, nr_pages, cpu);
 
+		if (rb_meta_valid(meta, cpu, buffer, nr_pages)) {
+			/* Make the mappings match the current address */
+			subbuf = rb_subbufs_from_meta(meta);
+			delta = (unsigned long)subbuf - meta->first_buffer;
+			meta->first_buffer += delta;
+			meta->head_buffer += delta;
+			meta->commit_buffer += delta;
+			continue;
+		}
+
+		if (cpu < nr_cpu_ids - 1)
+			next_meta = rb_range_meta(buffer, nr_pages, cpu + 1);
+		else
+			next_meta = (void *)buffer->range_addr_end;
+
+		memset(meta, 0, next_meta - (void *)meta);
+
 		meta->nr_subbufs = nr_pages + 1;
 		meta->subbuf_size = buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
 
 		subbuf = rb_subbufs_from_meta(meta);
 
+		meta->first_buffer = (unsigned long)subbuf;
+
 		/*
 		 * The buffers[] array holds the order of the sub-buffers
 		 * that are after the meta data. The sub-buffers may
@@ -1659,10 +1741,26 @@ int ring_buffer_meta_seq_init(struct file *file, struct trace_buffer *buffer, in
 	return 0;
 }
 
+/* Map the buffer_pages to the previous head and commit pages */
+static void rb_meta_buffer_update(struct ring_buffer_per_cpu *cpu_buffer,
+				  struct buffer_page *bpage)
+{
+	struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+
+	if (meta->head_buffer == (unsigned long)bpage->page)
+		cpu_buffer->head_page = bpage;
+
+	if (meta->commit_buffer == (unsigned long)bpage->page) {
+		cpu_buffer->commit_page = bpage;
+		cpu_buffer->tail_page = bpage;
+	}
+}
+
 static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		long nr_pages, struct list_head *pages)
 {
 	struct trace_buffer *buffer = cpu_buffer->buffer;
+	struct ring_buffer_meta *meta = NULL;
 	struct buffer_page *bpage, *tmp;
 	bool user_thread = current->mm != NULL;
 	gfp_t mflags;
@@ -1697,6 +1795,10 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	 */
 	if (user_thread)
 		set_current_oom_origin();
+
+	if (buffer->range_addr_start)
+		meta = rb_range_meta(buffer, nr_pages, cpu_buffer->cpu);
+
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page;
 
@@ -1709,11 +1811,14 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 
 		list_add_tail(&bpage->list, pages);
 
-		if (buffer->range_addr_start) {
+		if (meta) {
 			/* A range was given. Use that for the buffer page */
 			bpage->page = rb_range_buffer(cpu_buffer, i + 1);
 			if (!bpage->page)
 				goto free_pages;
+			/* If this is valid from a previous boot */
+			if (meta->head_buffer)
+				rb_meta_buffer_update(cpu_buffer, bpage);
 			bpage->range = 1;
 			bpage->id = i + 1;
 		} else {
@@ -1775,6 +1880,7 @@ static struct ring_buffer_per_cpu *
 rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
+	struct ring_buffer_meta *meta;
 	struct buffer_page *bpage;
 	struct page *page;
 	int ret;
@@ -1815,6 +1921,8 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
 		bpage->page = rb_range_buffer(cpu_buffer, 0);
 		if (!bpage->page)
 			goto fail_free_reader;
+		if (cpu_buffer->ring_meta->head_buffer)
+			rb_meta_buffer_update(cpu_buffer, bpage);
 		bpage->range = 1;
 	} else {
 		page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL | __GFP_ZERO,
@@ -1832,14 +1940,32 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
 	if (ret < 0)
 		goto fail_free_reader;
 
-	cpu_buffer->head_page
-		= list_entry(cpu_buffer->pages, struct buffer_page, list);
-	cpu_buffer->tail_page = cpu_buffer->commit_page = cpu_buffer->head_page;
+	/* If the boot meta was valid then this has already been updated */
+	meta = cpu_buffer->ring_meta;
+	if (!meta || !meta->head_buffer ||
+	    !cpu_buffer->head_page || !cpu_buffer->commit_page || !cpu_buffer->tail_page) {
+		if (meta && meta->head_buffer &&
+		    (cpu_buffer->head_page || cpu_buffer->commit_page || cpu_buffer->tail_page)) {
+			pr_warn("Ring buffer meta buffers not all mapped\n");
+			if (!cpu_buffer->head_page)
+				pr_warn("   Missing head_page\n");
+			if (!cpu_buffer->commit_page)
+				pr_warn("   Missing commit_page\n");
+			if (!cpu_buffer->tail_page)
+				pr_warn("   Missing tail_page\n");
+		}
 
-	rb_head_page_activate(cpu_buffer);
-	if (cpu_buffer->ring_meta) {
-		struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
-		meta->commit_buffer = meta->head_buffer;
+		cpu_buffer->head_page
+			= list_entry(cpu_buffer->pages, struct buffer_page, list);
+		cpu_buffer->tail_page = cpu_buffer->commit_page = cpu_buffer->head_page;
+
+		rb_head_page_activate(cpu_buffer);
+
+		if (cpu_buffer->ring_meta)
+			meta->commit_buffer = meta->head_buffer;
+	} else {
+		/* The valid meta buffer still needs to activate the head page */
+		rb_head_page_activate(cpu_buffer);
 	}
 
 	return cpu_buffer;
-- 
2.43.0




* [PATCH 8/8] ring-buffer: Validate boot range memory events
  2024-03-06  1:59 [PATCH 0/8] tracing: Persistent traces across a reboot or crash Steven Rostedt
                   ` (6 preceding siblings ...)
  2024-03-06  1:59 ` [PATCH 7/8] ring-buffer: Add test if range of boot buffer is valid Steven Rostedt
@ 2024-03-06  1:59 ` Steven Rostedt
  2024-03-08 15:39   ` kernel test robot
  2024-03-06  2:01 ` [POC] !!! Re: [PATCH 0/8] tracing: Persistent traces across a reboot or crash Steven Rostedt
  2024-03-09 18:27 ` Kees Cook
  9 siblings, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2024-03-06  1:59 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf,
	Baoquan He, Borislav Petkov, Paul E. McKenney, David Howells

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Make sure all the events in each of the sub-buffers that were mapped in a
memory region are valid. This moves the code that walks the buffers for
time-stamp validation out of the CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS
ifdef block so that it can also be used to validate the content. Only the
ring buffer event meta data is checked, not the data payload.

This has a second purpose as well. The buffer_page structure that points
to the data sub-buffers keeps a count of the number of events on each
sub-buffer. This validation pass updates that counter too. The counter is
used when reading the buffer to know whether the ring buffer is empty.
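
To illustrate the accounting (a simplified sketch with made-up types, not
the kernel's actual structures), a per-sub-buffer entry count is what lets
a reader decide whether anything is left to consume:

	/* Sketch only: illustrative types, not the ring buffer's own. */
	struct subbuf_count {
		unsigned long	entries;	/* events written to this sub-buffer */
		unsigned long	read;		/* events already consumed from it */
	};

	/* A sub-buffer is exhausted once every written event was consumed. */
	static bool subbuf_exhausted(const struct subbuf_count *s)
	{
		return s->read == s->entries;
	}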

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c | 222 +++++++++++++++++++++++++++----------
 1 file changed, 165 insertions(+), 57 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index e74185a4d864..f7b511935fcf 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1610,10 +1610,171 @@ static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu,
 		subbuf = (void *)subbuf + subbuf_size;
 	}
 
-	pr_info("Ring buffer meta is from previous boot!\n");
 	return true;
 }
 
+static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf);
+
+#ifdef CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS
+static DEFINE_PER_CPU(atomic_t, checking);
+static atomic_t ts_dump;
+
+#define buffer_warn_return(fmt, ...)					\
+	do {								\
+		/* If another report is happening, ignore this one */	\
+		if (atomic_inc_return(&ts_dump) != 1) {			\
+			atomic_dec(&ts_dump);				\
+			goto out;					\
+		}							\
+		atomic_inc(&cpu_buffer->record_disabled);		\
+		pr_warn(fmt, ##__VA_ARGS__);				\
+		dump_buffer_page(bpage, info, tail);			\
+		atomic_dec(&ts_dump);					\
+		/* There are some cases during boot up where this can happen */ \
+		if (WARN_ON_ONCE(system_state != SYSTEM_BOOTING))	\
+			/* Do not re-enable checking */			\
+			return;						\
+	} while (0)
+#else
+#define buffer_warn_return(fmt, ...) do { } while (0)
+#endif
+
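+/*
+ * Walk the events on a sub-buffer, accumulating the time stamps along the
+ * way and reporting the final one via @timestamp. Returns the number of
+ * events found, or -1 if an unknown event type is hit (which means the
+ * buffer content is corrupted).
+ */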
+static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu,
+			       unsigned long long *timestamp, bool warn)
+{
+	struct ring_buffer_event *event;
+	u64 ts, delta;
+	int events = 0;
+	int e;
+
+	ts = dpage->time_stamp;
+
+	for (e = 0; e < tail; e += rb_event_length(event)) {
+
+		event = (struct ring_buffer_event *)(dpage->data + e);
+
+		switch (event->type_len) {
+
+		case RINGBUF_TYPE_TIME_EXTEND:
+			delta = rb_event_time_stamp(event);
+			ts += delta;
+			break;
+
+		case RINGBUF_TYPE_TIME_STAMP:
+			delta = rb_event_time_stamp(event);
+			delta = rb_fix_abs_ts(delta, ts);
+			if (warn && delta < ts) {
+				buffer_warn_return("[CPU: %d]ABSOLUTE TIME WENT BACKWARDS: last ts: %lld absolute ts: %lld\n",
+						   cpu, ts, delta);
+			}
+			ts = delta;
+			break;
+
+		case RINGBUF_TYPE_PADDING:
+			if (event->time_delta == 1)
+				break;
+			fallthrough;
+		case RINGBUF_TYPE_DATA:
+			events++;
+			ts += event->time_delta;
+			break;
+
+		default:
+			return -1;
+		}
+	}
+	*timestamp = ts;
+	return events;
+}
+
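+/* Validate a sub-buffer up to its commit offset; a negative return means corruption */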
+static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu)
+{
+	unsigned long long ts;
+	int tail;
+
+	tail = local_read(&dpage->commit);
+	return rb_read_data_buffer(dpage, tail, cpu, &ts, false);
+}
+
+/* If the meta data has been validated, now validate the events */
+static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+	struct buffer_page *head_page;
+	unsigned long entry_bytes = 0;
+	unsigned long entries = 0;
+	int ret;
+	int i;
+
+	if (!meta || !meta->head_buffer)
+		return;
+
+	/* Do the reader page first */
+	ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu);
+	if (ret < 0) {
+		pr_info("Ring buffer meta [%d] invalid reader page\n", cpu_buffer->cpu);
+		goto invalid;
+	}
+	entries += ret;
+	entry_bytes += local_read(&cpu_buffer->reader_page->page->commit);
+	local_set(&cpu_buffer->reader_page->entries, ret);
+
+	head_page = cpu_buffer->head_page;
+
+	/* If both the head and commit are on the reader_page then we are done. */
+	if (head_page == cpu_buffer->reader_page &&
+	    head_page == cpu_buffer->commit_page)
+		goto done;
+
+	/* Iterate until finding the commit page */
+	for (i = 0; i < meta->nr_subbufs + 1; i++, rb_inc_page(&head_page)) {
+
+		/* Reader page has already been done */
+		if (head_page == cpu_buffer->reader_page)
+			continue;
+
+		ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu);
+		if (ret < 0) {
+			pr_info("Ring buffer meta [%d] invalid buffer page\n",
+				cpu_buffer->cpu);
+			goto invalid;
+		}
+		entries += ret;
+		entry_bytes += local_read(&head_page->page->commit);
+		local_set(&head_page->entries, ret);
+
+		if (head_page == cpu_buffer->commit_page)
+			break;
+	}
+
+	if (head_page != cpu_buffer->commit_page) {
+		pr_info("Ring buffer meta [%d] commit page not found\n",
+			cpu_buffer->cpu);
+		goto invalid;
+	}
+ done:
+	local_set(&cpu_buffer->entries, entries);
+	local_set(&cpu_buffer->entries_bytes, entry_bytes);
+
+	pr_info("Ring buffer meta [%d] is from previous boot!\n", cpu_buffer->cpu);
+	return;
+
+ invalid:
+	/* The content of the buffers are invalid, reset the meta data */
+	meta->head_buffer = 0;
+	meta->commit_buffer = 0;
+
+	/* Reset the reader page */
+	local_set(&cpu_buffer->reader_page->entries, 0);
+	local_set(&cpu_buffer->reader_page->page->commit, 0);
+
+	/*
+	 * Reset all the subbuffers. Note that head_page may not have been
+	 * set yet if the reader page failed validation, so start over from
+	 * the current head of the ring.
+	 */
+	head_page = cpu_buffer->head_page;
+	for (i = 0; i < meta->nr_subbufs - 1; i++, rb_inc_page(&head_page)) {
+		local_set(&head_page->entries, 0);
+		local_set(&head_page->page->commit, 0);
+	}
+}
+
 static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
 {
 	struct ring_buffer_meta *meta;
@@ -1691,8 +1852,6 @@ static void *rbm_next(struct seq_file *m, void *v, loff_t *pos)
 	return rbm_start(m, pos);
 }
 
-static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf);
-
 static int rbm_show(struct seq_file *m, void *v)
 {
 	struct ring_buffer_per_cpu *cpu_buffer = m->private;
@@ -1940,6 +2099,8 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
 	if (ret < 0)
 		goto fail_free_reader;
 
+	rb_meta_validate_events(cpu_buffer);
+
 	/* If the boot meta was valid then this has already been updated */
 	meta = cpu_buffer->ring_meta;
 	if (!meta || !meta->head_buffer ||
@@ -3844,26 +4005,6 @@ static void dump_buffer_page(struct buffer_data_page *bpage,
 	pr_warn("expected end:0x%lx last event actually ended at:0x%x\n", tail, e);
 }
 
-static DEFINE_PER_CPU(atomic_t, checking);
-static atomic_t ts_dump;
-
-#define buffer_warn_return(fmt, ...)					\
-	do {								\
-		/* If another report is happening, ignore this one */	\
-		if (atomic_inc_return(&ts_dump) != 1) {			\
-			atomic_dec(&ts_dump);				\
-			goto out;					\
-		}							\
-		atomic_inc(&cpu_buffer->record_disabled);		\
-		pr_warn(fmt, ##__VA_ARGS__);				\
-		dump_buffer_page(bpage, info, tail);			\
-		atomic_dec(&ts_dump);					\
-		/* There's some cases in boot up that this can happen */ \
-		if (WARN_ON_ONCE(system_state != SYSTEM_BOOTING))	\
-			/* Do not re-enable checking */			\
-			return;						\
-	} while (0)
-
 /*
  * Check if the current event time stamp matches the deltas on
  * the buffer page.
@@ -3902,41 +4043,8 @@ static void check_buffer(struct ring_buffer_per_cpu *cpu_buffer,
 	if (atomic_inc_return(this_cpu_ptr(&checking)) != 1)
 		goto out;
 
-	ts = bpage->time_stamp;
-
-	for (e = 0; e < tail; e += rb_event_length(event)) {
-
-		event = (struct ring_buffer_event *)(bpage->data + e);
-
-		switch (event->type_len) {
-
-		case RINGBUF_TYPE_TIME_EXTEND:
-			delta = rb_event_time_stamp(event);
-			ts += delta;
-			break;
-
-		case RINGBUF_TYPE_TIME_STAMP:
-			delta = rb_event_time_stamp(event);
-			delta = rb_fix_abs_ts(delta, ts);
-			if (delta < ts) {
-				buffer_warn_return("[CPU: %d]ABSOLUTE TIME WENT BACKWARDS: last ts: %lld absolute ts: %lld\n",
-						   cpu_buffer->cpu, ts, delta);
-			}
-			ts = delta;
-			break;
+	rb_read_data_buffer(bpage, tail, cpu_buffer->cpu, &ts, true);
 
-		case RINGBUF_TYPE_PADDING:
-			if (event->time_delta == 1)
-				break;
-			fallthrough;
-		case RINGBUF_TYPE_DATA:
-			ts += event->time_delta;
-			break;
-
-		default:
-			RB_WARN_ON(cpu_buffer, 1);
-		}
-	}
 	if ((full && ts > info->ts) ||
 	    (!full && ts + info->delta != info->ts)) {
 		buffer_warn_return("[CPU: %d]TIME DOES NOT MATCH expected:%lld actual:%lld delta:%lld before:%lld after:%lld%s context:%s\n",
-- 
2.43.0




* [POC] !!! Re: [PATCH 0/8] tracing: Persistent traces across a reboot or crash
  2024-03-06  1:59 [PATCH 0/8] tracing: Persistent traces across a reboot or crash Steven Rostedt
                   ` (7 preceding siblings ...)
  2024-03-06  1:59 ` [PATCH 8/8] ring-buffer: Validate boot range memory events Steven Rostedt
@ 2024-03-06  2:01 ` Steven Rostedt
  2024-03-06  2:01   ` Steven Rostedt
  2024-03-09 18:27 ` Kees Cook
  9 siblings, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2024-03-06  2:01 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf,
	Baoquan He, Borislav Petkov, Paul E. McKenney, David Howells

I forgot to add [POC] to the topic.

All these patches are a proof of concept.

-- Steve


* [POC] !!! Re: [PATCH 0/8] tracing: Persistent traces across a reboot or crash
  2024-03-06  2:01 ` [POC] !!! Re: [PATCH 0/8] tracing: Persistent traces across a reboot or crash Steven Rostedt
@ 2024-03-06  2:01   ` Steven Rostedt
  0 siblings, 0 replies; 18+ messages in thread
From: Steven Rostedt @ 2024-03-06  2:01 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf,
	Baoquan He, Borislav Petkov, Paul E. McKenney, David Howells

I forgot to add [POC] to the subject :-p

All these patches are a proof of concept.

-- Steve


* Re: [PATCH 5/8] ring-buffer: Add ring_buffer_meta data
  2024-03-06  1:59 ` [PATCH 5/8] ring-buffer: Add ring_buffer_meta data Steven Rostedt
@ 2024-03-08 10:42   ` kernel test robot
  2024-03-08 10:54   ` kernel test robot
  1 sibling, 0 replies; 18+ messages in thread
From: kernel test robot @ 2024-03-08 10:42 UTC (permalink / raw)
  To: Steven Rostedt, linux-kernel, linux-trace-kernel
  Cc: oe-kbuild-all, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Andrew Morton, Linux Memory Management List, Vincent Donnefort,
	Joel Fernandes, Daniel Bristot de Oliveira, Ingo Molnar,
	Peter Zijlstra, suleiman, Thomas Gleixner, Vineeth Pillai,
	Youssef Esmat, Beau Belgrave, Alexander Graf, Baoquan He,
	Borislav Petkov, Paul E. McKenney, David Howells

Hi Steven,

kernel test robot noticed the following build warnings:

[auto build test WARNING on next-20240305]
[cannot apply to tip/x86/core akpm-mm/mm-everything linus/master v6.8-rc7 v6.8-rc6 v6.8-rc5 v6.8-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Steven-Rostedt/ring-buffer-Allow-mapped-field-to-be-set-without-mapping/20240306-100047
base:   next-20240305
patch link:    https://lore.kernel.org/r/20240306020006.100449500%40goodmis.org
patch subject: [PATCH 5/8] ring-buffer: Add ring_buffer_meta data
config: sh-defconfig (https://download.01.org/0day-ci/archive/20240308/202403081831.EWSQPo2a-lkp@intel.com/config)
compiler: sh4-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240308/202403081831.EWSQPo2a-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202403081831.EWSQPo2a-lkp@intel.com/

All warnings (new ones prefixed by >>):

   kernel/trace/ring_buffer.c: In function 'rb_set_commit_to_write':
>> kernel/trace/ring_buffer.c:3224:45: warning: assignment to 'long unsigned int' from 'struct buffer_data_page *' makes integer from pointer without a cast [-Wint-conversion]
    3224 |                         meta->commit_buffer = cpu_buffer->commit_page->page;
         |                                             ^


vim +3224 kernel/trace/ring_buffer.c

  3192	
  3193	static __always_inline void
  3194	rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
  3195	{
  3196		unsigned long max_count;
  3197	
  3198		/*
  3199		 * We only race with interrupts and NMIs on this CPU.
  3200		 * If we own the commit event, then we can commit
  3201		 * all others that interrupted us, since the interruptions
  3202		 * are in stack format (they finish before they come
  3203		 * back to us). This allows us to do a simple loop to
  3204		 * assign the commit to the tail.
  3205		 */
  3206	 again:
  3207		max_count = cpu_buffer->nr_pages * 100;
  3208	
  3209		while (cpu_buffer->commit_page != READ_ONCE(cpu_buffer->tail_page)) {
  3210			if (RB_WARN_ON(cpu_buffer, !(--max_count)))
  3211				return;
  3212			if (RB_WARN_ON(cpu_buffer,
  3213				       rb_is_reader_page(cpu_buffer->tail_page)))
  3214				return;
  3215			/*
  3216			 * No need for a memory barrier here, as the update
  3217			 * of the tail_page did it for this page.
  3218			 */
  3219			local_set(&cpu_buffer->commit_page->page->commit,
  3220				  rb_page_write(cpu_buffer->commit_page));
  3221			rb_inc_page(&cpu_buffer->commit_page);
  3222			if (cpu_buffer->ring_meta) {
  3223				struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
> 3224				meta->commit_buffer = cpu_buffer->commit_page->page;
  3225			}
  3226			/* add barrier to keep gcc from optimizing too much */
  3227			barrier();
  3228		}
  3229		while (rb_commit_index(cpu_buffer) !=
  3230		       rb_page_write(cpu_buffer->commit_page)) {
  3231	
  3232			/* Make sure the readers see the content of what is committed. */
  3233			smp_wmb();
  3234			local_set(&cpu_buffer->commit_page->page->commit,
  3235				  rb_page_write(cpu_buffer->commit_page));
  3236			RB_WARN_ON(cpu_buffer,
  3237				   local_read(&cpu_buffer->commit_page->page->commit) &
  3238				   ~RB_WRITE_MASK);
  3239			barrier();
  3240		}
  3241	
  3242		/* again, keep gcc from optimizing */
  3243		barrier();
  3244	
  3245		/*
  3246		 * If an interrupt came in just after the first while loop
  3247		 * and pushed the tail page forward, we will be left with
  3248		 * a dangling commit that will never go forward.
  3249		 */
  3250		if (unlikely(cpu_buffer->commit_page != READ_ONCE(cpu_buffer->tail_page)))
  3251			goto again;
  3252	}
  3253	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH 5/8] ring-buffer: Add ring_buffer_meta data
  2024-03-06  1:59 ` [PATCH 5/8] ring-buffer: Add ring_buffer_meta data Steven Rostedt
  2024-03-08 10:42   ` kernel test robot
@ 2024-03-08 10:54   ` kernel test robot
  1 sibling, 0 replies; 18+ messages in thread
From: kernel test robot @ 2024-03-08 10:54 UTC (permalink / raw)
  To: Steven Rostedt, linux-kernel, linux-trace-kernel
  Cc: llvm, oe-kbuild-all, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Linux Memory Management List,
	Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
	Ingo Molnar, Peter Zijlstra, suleiman, Thomas Gleixner,
	Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf,
	Baoquan He, Borislav Petkov, Paul E. McKenney, David Howells

Hi Steven,

kernel test robot noticed the following build errors:

[auto build test ERROR on next-20240305]
[cannot apply to tip/x86/core akpm-mm/mm-everything linus/master v6.8-rc7 v6.8-rc6 v6.8-rc5 v6.8-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Steven-Rostedt/ring-buffer-Allow-mapped-field-to-be-set-without-mapping/20240306-100047
base:   next-20240305
patch link:    https://lore.kernel.org/r/20240306020006.100449500%40goodmis.org
patch subject: [PATCH 5/8] ring-buffer: Add ring_buffer_meta data
config: s390-defconfig (https://download.01.org/0day-ci/archive/20240308/202403081843.QYKJKYk4-lkp@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 503c55e17037436dcd45ac69dea8967e67e3f5e8)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240308/202403081843.QYKJKYk4-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202403081843.QYKJKYk4-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from kernel/trace/ring_buffer.c:8:
   In file included from include/linux/trace_events.h:6:
   In file included from include/linux/ring_buffer.h:5:
   In file included from include/linux/mm.h:2208:
   include/linux/vmstat.h:508:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     508 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     509 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:515:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     515 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     516 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:522:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     522 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:527:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     527 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     528 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:536:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     536 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     537 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   In file included from kernel/trace/ring_buffer.c:8:
   In file included from include/linux/trace_events.h:10:
   In file included from include/linux/perf_event.h:62:
   In file included from include/linux/security.h:35:
   include/linux/bpf.h:736:48: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_arg_type') [-Wenum-enum-conversion]
     736 |         ARG_PTR_TO_MAP_VALUE_OR_NULL    = PTR_MAYBE_NULL | ARG_PTR_TO_MAP_VALUE,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~
   include/linux/bpf.h:737:43: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_arg_type') [-Wenum-enum-conversion]
     737 |         ARG_PTR_TO_MEM_OR_NULL          = PTR_MAYBE_NULL | ARG_PTR_TO_MEM,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
   include/linux/bpf.h:738:43: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_arg_type') [-Wenum-enum-conversion]
     738 |         ARG_PTR_TO_CTX_OR_NULL          = PTR_MAYBE_NULL | ARG_PTR_TO_CTX,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
   include/linux/bpf.h:739:45: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_arg_type') [-Wenum-enum-conversion]
     739 |         ARG_PTR_TO_SOCKET_OR_NULL       = PTR_MAYBE_NULL | ARG_PTR_TO_SOCKET,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~
   include/linux/bpf.h:740:44: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_arg_type') [-Wenum-enum-conversion]
     740 |         ARG_PTR_TO_STACK_OR_NULL        = PTR_MAYBE_NULL | ARG_PTR_TO_STACK,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~
   include/linux/bpf.h:741:45: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_arg_type') [-Wenum-enum-conversion]
     741 |         ARG_PTR_TO_BTF_ID_OR_NULL       = PTR_MAYBE_NULL | ARG_PTR_TO_BTF_ID,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~
   include/linux/bpf.h:745:38: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_arg_type') [-Wenum-enum-conversion]
     745 |         ARG_PTR_TO_UNINIT_MEM           = MEM_UNINIT | ARG_PTR_TO_MEM,
         |                                           ~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
   include/linux/bpf.h:747:45: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_arg_type') [-Wenum-enum-conversion]
     747 |         ARG_PTR_TO_FIXED_SIZE_MEM       = MEM_FIXED_SIZE | ARG_PTR_TO_MEM,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
   include/linux/bpf.h:770:48: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_return_type') [-Wenum-enum-conversion]
     770 |         RET_PTR_TO_MAP_VALUE_OR_NULL    = PTR_MAYBE_NULL | RET_PTR_TO_MAP_VALUE,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~
   include/linux/bpf.h:771:45: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_return_type') [-Wenum-enum-conversion]
     771 |         RET_PTR_TO_SOCKET_OR_NULL       = PTR_MAYBE_NULL | RET_PTR_TO_SOCKET,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~
   include/linux/bpf.h:772:47: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_return_type') [-Wenum-enum-conversion]
     772 |         RET_PTR_TO_TCP_SOCK_OR_NULL     = PTR_MAYBE_NULL | RET_PTR_TO_TCP_SOCK,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~
   include/linux/bpf.h:773:50: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_return_type') [-Wenum-enum-conversion]
     773 |         RET_PTR_TO_SOCK_COMMON_OR_NULL  = PTR_MAYBE_NULL | RET_PTR_TO_SOCK_COMMON,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/bpf.h:775:49: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_return_type') [-Wenum-enum-conversion]
     775 |         RET_PTR_TO_DYNPTR_MEM_OR_NULL   = PTR_MAYBE_NULL | RET_PTR_TO_MEM,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
   include/linux/bpf.h:776:45: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_return_type') [-Wenum-enum-conversion]
     776 |         RET_PTR_TO_BTF_ID_OR_NULL       = PTR_MAYBE_NULL | RET_PTR_TO_BTF_ID,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~
   include/linux/bpf.h:777:43: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_return_type') [-Wenum-enum-conversion]
     777 |         RET_PTR_TO_BTF_ID_TRUSTED       = PTR_TRUSTED    | RET_PTR_TO_BTF_ID,
         |                                           ~~~~~~~~~~~    ^ ~~~~~~~~~~~~~~~~~
   include/linux/bpf.h:888:44: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_reg_type') [-Wenum-enum-conversion]
     888 |         PTR_TO_MAP_VALUE_OR_NULL        = PTR_MAYBE_NULL | PTR_TO_MAP_VALUE,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~
   include/linux/bpf.h:889:42: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_reg_type') [-Wenum-enum-conversion]
     889 |         PTR_TO_SOCKET_OR_NULL           = PTR_MAYBE_NULL | PTR_TO_SOCKET,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~
   include/linux/bpf.h:890:46: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_reg_type') [-Wenum-enum-conversion]
     890 |         PTR_TO_SOCK_COMMON_OR_NULL      = PTR_MAYBE_NULL | PTR_TO_SOCK_COMMON,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~
   include/linux/bpf.h:891:44: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_reg_type') [-Wenum-enum-conversion]
     891 |         PTR_TO_TCP_SOCK_OR_NULL         = PTR_MAYBE_NULL | PTR_TO_TCP_SOCK,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~
   include/linux/bpf.h:892:42: warning: bitwise operation between different enumeration types ('enum bpf_type_flag' and 'enum bpf_reg_type') [-Wenum-enum-conversion]
     892 |         PTR_TO_BTF_ID_OR_NULL           = PTR_MAYBE_NULL | PTR_TO_BTF_ID,
         |                                           ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~
>> kernel/trace/ring_buffer.c:3224:24: error: incompatible pointer to integer conversion assigning to 'unsigned long' from 'struct buffer_data_page *' [-Wint-conversion]
    3224 |                         meta->commit_buffer = cpu_buffer->commit_page->page;
         |                                             ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   25 warnings and 1 error generated.


vim +3224 kernel/trace/ring_buffer.c

  3192	
  3193	static __always_inline void
  3194	rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
  3195	{
  3196		unsigned long max_count;
  3197	
  3198		/*
  3199		 * We only race with interrupts and NMIs on this CPU.
  3200		 * If we own the commit event, then we can commit
  3201		 * all others that interrupted us, since the interruptions
  3202		 * are in stack format (they finish before they come
  3203		 * back to us). This allows us to do a simple loop to
  3204		 * assign the commit to the tail.
  3205		 */
  3206	 again:
  3207		max_count = cpu_buffer->nr_pages * 100;
  3208	
  3209		while (cpu_buffer->commit_page != READ_ONCE(cpu_buffer->tail_page)) {
  3210			if (RB_WARN_ON(cpu_buffer, !(--max_count)))
  3211				return;
  3212			if (RB_WARN_ON(cpu_buffer,
  3213				       rb_is_reader_page(cpu_buffer->tail_page)))
  3214				return;
  3215			/*
  3216			 * No need for a memory barrier here, as the update
  3217			 * of the tail_page did it for this page.
  3218			 */
  3219			local_set(&cpu_buffer->commit_page->page->commit,
  3220				  rb_page_write(cpu_buffer->commit_page));
  3221			rb_inc_page(&cpu_buffer->commit_page);
  3222			if (cpu_buffer->ring_meta) {
  3223				struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
> 3224				meta->commit_buffer = cpu_buffer->commit_page->page;
  3225			}
  3226			/* add barrier to keep gcc from optimizing too much */
  3227			barrier();
  3228		}
  3229		while (rb_commit_index(cpu_buffer) !=
  3230		       rb_page_write(cpu_buffer->commit_page)) {
  3231	
  3232			/* Make sure the readers see the content of what is committed. */
  3233			smp_wmb();
  3234			local_set(&cpu_buffer->commit_page->page->commit,
  3235				  rb_page_write(cpu_buffer->commit_page));
  3236			RB_WARN_ON(cpu_buffer,
  3237				   local_read(&cpu_buffer->commit_page->page->commit) &
  3238				   ~RB_WRITE_MASK);
  3239			barrier();
  3240		}
  3241	
  3242		/* again, keep gcc from optimizing */
  3243		barrier();
  3244	
  3245		/*
  3246		 * If an interrupt came in just after the first while loop
  3247		 * and pushed the tail page forward, we will be left with
  3248		 * a dangling commit that will never go forward.
  3249		 */
  3250		if (unlikely(cpu_buffer->commit_page != READ_ONCE(cpu_buffer->tail_page)))
  3251			goto again;
  3252	}
  3253	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH 8/8] ring-buffer: Validate boot range memory events
  2024-03-06  1:59 ` [PATCH 8/8] ring-buffer: Validate boot range memory events Steven Rostedt
@ 2024-03-08 15:39   ` kernel test robot
  0 siblings, 0 replies; 18+ messages in thread
From: kernel test robot @ 2024-03-08 15:39 UTC (permalink / raw)
  To: Steven Rostedt, linux-kernel, linux-trace-kernel
  Cc: oe-kbuild-all, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Andrew Morton, Linux Memory Management List, Vincent Donnefort,
	Joel Fernandes, Daniel Bristot de Oliveira, Ingo Molnar,
	Peter Zijlstra, suleiman, Thomas Gleixner, Vineeth Pillai,
	Youssef Esmat, Beau Belgrave, Alexander Graf, Baoquan He,
	Borislav Petkov, Paul E. McKenney, David Howells

Hi Steven,

kernel test robot noticed the following build errors:

[auto build test ERROR on next-20240305]
[cannot apply to tip/x86/core akpm-mm/mm-everything linus/master v6.8-rc7 v6.8-rc6 v6.8-rc5 v6.8-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Steven-Rostedt/ring-buffer-Allow-mapped-field-to-be-set-without-mapping/20240306-100047
base:   next-20240305
patch link:    https://lore.kernel.org/r/20240306020006.586558735%40goodmis.org
patch subject: [PATCH 8/8] ring-buffer: Validate boot range memory events
config: microblaze-allmodconfig (https://download.01.org/0day-ci/archive/20240308/202403082327.SiOUrqxy-lkp@intel.com/config)
compiler: microblaze-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240308/202403082327.SiOUrqxy-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202403082327.SiOUrqxy-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   kernel/trace/ring_buffer.c: In function 'rb_read_data_buffer':
>> kernel/trace/ring_buffer.c:1629:29: error: 'cpu_buffer' undeclared (first use in this function)
    1629 |                 atomic_inc(&cpu_buffer->record_disabled);               \
         |                             ^~~~~~~~~~
   kernel/trace/ring_buffer.c:1667:33: note: in expansion of macro 'buffer_warn_return'
    1667 |                                 buffer_warn_return("[CPU: %d]ABSOLUTE TIME WENT BACKWARDS: last ts: %lld absolute ts: %lld\n",
         |                                 ^~~~~~~~~~~~~~~~~~
   kernel/trace/ring_buffer.c:1629:29: note: each undeclared identifier is reported only once for each function it appears in
    1629 |                 atomic_inc(&cpu_buffer->record_disabled);               \
         |                             ^~~~~~~~~~
   kernel/trace/ring_buffer.c:1667:33: note: in expansion of macro 'buffer_warn_return'
    1667 |                                 buffer_warn_return("[CPU: %d]ABSOLUTE TIME WENT BACKWARDS: last ts: %lld absolute ts: %lld\n",
         |                                 ^~~~~~~~~~~~~~~~~~
>> kernel/trace/ring_buffer.c:1631:17: error: implicit declaration of function 'dump_buffer_page'; did you mean 'free_buffer_page'? [-Werror=implicit-function-declaration]
    1631 |                 dump_buffer_page(bpage, info, tail);                    \
         |                 ^~~~~~~~~~~~~~~~
   kernel/trace/ring_buffer.c:1667:33: note: in expansion of macro 'buffer_warn_return'
    1667 |                                 buffer_warn_return("[CPU: %d]ABSOLUTE TIME WENT BACKWARDS: last ts: %lld absolute ts: %lld\n",
         |                                 ^~~~~~~~~~~~~~~~~~
>> kernel/trace/ring_buffer.c:1631:34: error: 'bpage' undeclared (first use in this function); did you mean 'dpage'?
    1631 |                 dump_buffer_page(bpage, info, tail);                    \
         |                                  ^~~~~
   kernel/trace/ring_buffer.c:1667:33: note: in expansion of macro 'buffer_warn_return'
    1667 |                                 buffer_warn_return("[CPU: %d]ABSOLUTE TIME WENT BACKWARDS: last ts: %lld absolute ts: %lld\n",
         |                                 ^~~~~~~~~~~~~~~~~~
>> kernel/trace/ring_buffer.c:1631:41: error: 'info' undeclared (first use in this function)
    1631 |                 dump_buffer_page(bpage, info, tail);                    \
         |                                         ^~~~
   kernel/trace/ring_buffer.c:1667:33: note: in expansion of macro 'buffer_warn_return'
    1667 |                                 buffer_warn_return("[CPU: %d]ABSOLUTE TIME WENT BACKWARDS: last ts: %lld absolute ts: %lld\n",
         |                                 ^~~~~~~~~~~~~~~~~~
>> kernel/trace/ring_buffer.c:1636:25: warning: 'return' with no value, in function returning non-void [-Wreturn-type]
    1636 |                         return;                                         \
         |                         ^~~~~~
   kernel/trace/ring_buffer.c:1667:33: note: in expansion of macro 'buffer_warn_return'
    1667 |                                 buffer_warn_return("[CPU: %d]ABSOLUTE TIME WENT BACKWARDS: last ts: %lld absolute ts: %lld\n",
         |                                 ^~~~~~~~~~~~~~~~~~
   kernel/trace/ring_buffer.c:1642:12: note: declared here
    1642 | static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu,
         |            ^~~~~~~~~~~~~~~~~~~
>> kernel/trace/ring_buffer.c:1667:33: error: label 'out' used but not defined
    1667 |                                 buffer_warn_return("[CPU: %d]ABSOLUTE TIME WENT BACKWARDS: last ts: %lld absolute ts: %lld\n",
         |                                 ^~~~~~~~~~~~~~~~~~
   kernel/trace/ring_buffer.c: In function 'rb_set_commit_to_write':
   kernel/trace/ring_buffer.c:3588:45: warning: assignment to 'long unsigned int' from 'struct buffer_data_page *' makes integer from pointer without a cast [-Wint-conversion]
    3588 |                         meta->commit_buffer = cpu_buffer->commit_page->page;
         |                                             ^
   kernel/trace/ring_buffer.c: At top level:
>> kernel/trace/ring_buffer.c:3957:13: warning: conflicting types for 'dump_buffer_page'; have 'void(struct buffer_data_page *, struct rb_event_info *, long unsigned int)'
    3957 | static void dump_buffer_page(struct buffer_data_page *bpage,
         |             ^~~~~~~~~~~~~~~~
>> kernel/trace/ring_buffer.c:3957:13: error: static declaration of 'dump_buffer_page' follows non-static declaration
   kernel/trace/ring_buffer.c:1631:17: note: previous implicit declaration of 'dump_buffer_page' with type 'void(struct buffer_data_page *, struct rb_event_info *, long unsigned int)'
    1631 |                 dump_buffer_page(bpage, info, tail);                    \
         |                 ^~~~~~~~~~~~~~~~
   kernel/trace/ring_buffer.c:1667:33: note: in expansion of macro 'buffer_warn_return'
    1667 |                                 buffer_warn_return("[CPU: %d]ABSOLUTE TIME WENT BACKWARDS: last ts: %lld absolute ts: %lld\n",
         |                                 ^~~~~~~~~~~~~~~~~~
   kernel/trace/ring_buffer.c: In function 'check_buffer':
>> kernel/trace/ring_buffer.c:4046:9: error: 'ret' undeclared (first use in this function); did you mean 'net'?
    4046 |         ret = rb_read_data_buffer(bpage, tail, &ts, cpu_buffer->cpu, true);
         |         ^~~
         |         net
>> kernel/trace/ring_buffer.c:4046:48: warning: passing argument 3 of 'rb_read_data_buffer' makes integer from pointer without a cast [-Wint-conversion]
    4046 |         ret = rb_read_data_buffer(bpage, tail, &ts, cpu_buffer->cpu, true);
         |                                                ^~~
         |                                                |
         |                                                u64 * {aka long long unsigned int *}
   kernel/trace/ring_buffer.c:1642:78: note: expected 'int' but argument is of type 'u64 *' {aka 'long long unsigned int *'}
    1642 | static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu,
         |                                                                          ~~~~^~~
>> kernel/trace/ring_buffer.c:4046:63: warning: passing argument 4 of 'rb_read_data_buffer' makes pointer from integer without a cast [-Wint-conversion]
    4046 |         ret = rb_read_data_buffer(bpage, tail, &ts, cpu_buffer->cpu, true);
         |                                                     ~~~~~~~~~~^~~~~
         |                                                               |
         |                                                               int
   kernel/trace/ring_buffer.c:1643:52: note: expected 'long long unsigned int *' but argument is of type 'int'
    1643 |                                unsigned long long *timestamp, bool warn)
         |                                ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~
>> kernel/trace/ring_buffer.c:4020:13: warning: unused variable 'e' [-Wunused-variable]
    4020 |         int e;
         |             ^
>> kernel/trace/ring_buffer.c:4018:17: warning: unused variable 'delta' [-Wunused-variable]
    4018 |         u64 ts, delta;
         |                 ^~~~~
>> kernel/trace/ring_buffer.c:4016:35: warning: unused variable 'event' [-Wunused-variable]
    4016 |         struct ring_buffer_event *event;
         |                                   ^~~~~
   cc1: some warnings being treated as errors


vim +/cpu_buffer +1629 kernel/trace/ring_buffer.c

  1621	
  1622	#define buffer_warn_return(fmt, ...)					\
  1623		do {								\
  1624			/* If another report is happening, ignore this one */	\
  1625			if (atomic_inc_return(&ts_dump) != 1) {			\
  1626				atomic_dec(&ts_dump);				\
  1627				goto out;					\
  1628			}							\
> 1629			atomic_inc(&cpu_buffer->record_disabled);		\
  1630			pr_warn(fmt, ##__VA_ARGS__);				\
> 1631			dump_buffer_page(bpage, info, tail);			\
  1632			atomic_dec(&ts_dump);					\
  1633			/* There's some cases in boot up that this can happen */ \
  1634			if (WARN_ON_ONCE(system_state != SYSTEM_BOOTING))	\
  1635				/* Do not re-enable checking */			\
> 1636				return;						\
  1637		} while (0)
  1638	#else
  1639	#define buffer_warn_return(fmt, ...) do { } while (0)
  1640	#endif
  1641	
  1642	static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu,
> 1643				       unsigned long long *timestamp, bool warn)
  1644	{
  1645		struct ring_buffer_event *event;
  1646		u64 ts, delta;
  1647		int events = 0;
  1648		int e;
  1649	
  1650		ts = dpage->time_stamp;
  1651	
  1652		for (e = 0; e < tail; e += rb_event_length(event)) {
  1653	
  1654			event = (struct ring_buffer_event *)(dpage->data + e);
  1655	
  1656			switch (event->type_len) {
  1657	
  1658			case RINGBUF_TYPE_TIME_EXTEND:
  1659				delta = rb_event_time_stamp(event);
  1660				ts += delta;
  1661				break;
  1662	
  1663			case RINGBUF_TYPE_TIME_STAMP:
  1664				delta = rb_event_time_stamp(event);
  1665				delta = rb_fix_abs_ts(delta, ts);
  1666				if (warn && delta < ts) {
> 1667					buffer_warn_return("[CPU: %d]ABSOLUTE TIME WENT BACKWARDS: last ts: %lld absolute ts: %lld\n",
  1668							   cpu, ts, delta);
  1669				}
  1670				ts = delta;
  1671				break;
  1672	
  1673			case RINGBUF_TYPE_PADDING:
  1674				if (event->time_delta == 1)
  1675					break;
  1676				fallthrough;
  1677			case RINGBUF_TYPE_DATA:
  1678				events++;
  1679				ts += event->time_delta;
  1680				break;
  1681	
  1682			default:
  1683				return -1;
  1684			}
  1685		}
  1686		*timestamp = ts;
  1687		return events;
  1688	}
  1689	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH 0/8] tracing: Persistent traces across a reboot or crash
  2024-03-06  1:59 [PATCH 0/8] tracing: Persistent traces across a reboot or crash Steven Rostedt
                   ` (8 preceding siblings ...)
  2024-03-06  2:01 ` [POC] !!! Re: [PATCH 0/8] tracing: Persistent traces across a reboot or crash Steven Rostedt
@ 2024-03-09 18:27 ` Kees Cook
  2024-03-09 18:51   ` Steven Rostedt
  9 siblings, 1 reply; 18+ messages in thread
From: Kees Cook @ 2024-03-09 18:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Joel Fernandes,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Daniel Bristot de Oliveira, Ingo Molnar,
	Peter Zijlstra, suleiman, Thomas Gleixner, Vineeth Pillai,
	Youssef Esmat, Beau Belgrave, Alexander Graf, Baoquan He,
	Borislav Petkov, Paul E. McKenney, David Howells

On Tue, Mar 05, 2024 at 08:59:10PM -0500, Steven Rostedt wrote:
> This is a way to map a ring buffer instance across reboots.

As mentioned on Fedi, check out the persistent storage subsystem
(pstore)[1]. It already does what you're starting to construct for RAM
backends (but also supports reed-solomon ECC), and supports several
other backends including EFI storage (which is default enabled on at
least Fedora[2]), block devices, etc. It has an existing mechanism for
handling reservations (including via device tree), and supports multiple
"frontends" including the Oops handler, console output, and even ftrace
which does per-cpu recording and event reconstruction (Joel wrote this
frontend).

It should be pretty straightforward to implement a new frontend if the
ftrace one isn't flexible enough. It's a bit clunky still to add one,
but search for "ftrace" in fs/pstore/ram.c to see how to plumb a new
frontend into the RAM backend.

I continue to want to lift the frontend configuration options up into
the pstore core, since it would avoid a bunch of redundancy, but this is
where we are currently. :)

-Kees

[1] CONFIG_PSTORE et. al. in fs/pstore/ https://docs.kernel.org/admin-guide/ramoops.html
[2] https://www.freedesktop.org/software/systemd/man/latest/systemd-pstore.service.html

-- 
Kees Cook


* Re: [PATCH 0/8] tracing: Persistent traces across a reboot or crash
  2024-03-09 18:27 ` Kees Cook
@ 2024-03-09 18:51   ` Steven Rostedt
  2024-03-09 20:40     ` Kees Cook
  0 siblings, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2024-03-09 18:51 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, linux-trace-kernel, Joel Fernandes,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Daniel Bristot de Oliveira, Ingo Molnar,
	Peter Zijlstra, suleiman, Thomas Gleixner, Vineeth Pillai,
	Youssef Esmat, Beau Belgrave, Alexander Graf, Baoquan He,
	Borislav Petkov, Paul E. McKenney, David Howells

On Sat, 9 Mar 2024 10:27:47 -0800
Kees Cook <keescook@chromium.org> wrote:

> On Tue, Mar 05, 2024 at 08:59:10PM -0500, Steven Rostedt wrote:
> > This is a way to map a ring buffer instance across reboots.  
> 
> As mentioned on Fedi, check out the persistent storage subsystem
> (pstore)[1]. It already does what you're starting to construct for RAM
> backends (but also supports reed-solomon ECC), and supports several
> other backends including EFI storage (which is default enabled on at
> least Fedora[2]), block devices, etc. It has an existing mechanism for
> handling reservations (including via device tree), and supports multiple
> "frontends" including the Oops handler, console output, and even ftrace
> which does per-cpu recording and event reconstruction (Joel wrote this
> frontend).

Mathieu was telling me about the pmem infrastructure.

This patch set doesn't care where the memory comes from. You just give
it an address and size, and it will do the rest.
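
Roughly, "an address and size" just means the region gets mapped so the
ring buffer code can treat it as ordinary memory. A minimal sketch, with a
hypothetical helper name (not the actual function in this series):

	#include <linux/io.h>

	/* Map a reserved physical range write-back cached and hand the
	 * virtual address to the buffer setup code. Illustrative only. */
	static void *map_trace_region(phys_addr_t addr, size_t size)
	{
		return memremap(addr, size, MEMREMAP_WB);
	}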

> 
> It should be pretty straightforward to implement a new frontend if the
> ftrace one isn't flexible enough. It's a bit clunky still to add one,
> but search for "ftrace" in fs/pstore/ram.c to see how to plumb a new
> frontend into the RAM backend.
> 
> I continue to want to lift the frontend configuration options up into
> the pstore core, since it would avoid a bunch of redundancy, but this is
> where we are currently. :)

Thanks for the info. We use pstore on ChromeOS, but it is currently
restricted to 1MB which is too small for the tracing buffers. From what
I understand, it's also in a specific location where there's only 1MB
available for contiguous memory.

I'm looking at finding a way to get consistent memory outside that
range. That's what I'll be doing next week ;-)

But this code was just to see if I could get a single contiguous range
of memory mapped to ftrace, and this patch set does exactly that.

> 
> -Kees
> 
> [1] CONFIG_PSTORE et. al. in fs/pstore/ https://docs.kernel.org/admin-guide/ramoops.html
> [2] https://www.freedesktop.org/software/systemd/man/latest/systemd-pstore.service.html
> 

Thanks!

-- Steve


* Re: [PATCH 0/8] tracing: Persistent traces across a reboot or crash
  2024-03-09 18:51   ` Steven Rostedt
@ 2024-03-09 20:40     ` Kees Cook
  2024-03-20  0:44       ` Steven Rostedt
  0 siblings, 1 reply; 18+ messages in thread
From: Kees Cook @ 2024-03-09 20:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Joel Fernandes,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Daniel Bristot de Oliveira, Ingo Molnar,
	Peter Zijlstra, suleiman, Thomas Gleixner, Vineeth Pillai,
	Youssef Esmat, Beau Belgrave, Alexander Graf, Baoquan He,
	Borislav Petkov, Paul E. McKenney, David Howells

On Sat, Mar 09, 2024 at 01:51:16PM -0500, Steven Rostedt wrote:
> On Sat, 9 Mar 2024 10:27:47 -0800
> Kees Cook <keescook@chromium.org> wrote:
> 
> > On Tue, Mar 05, 2024 at 08:59:10PM -0500, Steven Rostedt wrote:
> > > This is a way to map a ring buffer instance across reboots.  
> > 
> > As mentioned on Fedi, check out the persistent storage subsystem
> > (pstore)[1]. It already does what you're starting to construct for RAM
> > backends (but also supports reed-solomon ECC), and supports several
> > other backends including EFI storage (which is default enabled on at
> > least Fedora[2]), block devices, etc. It has an existing mechanism for
> > handling reservations (including via device tree), and supports multiple
> > "frontends" including the Oops handler, console output, and even ftrace
> > which does per-cpu recording and event reconstruction (Joel wrote this
> > frontend).
> 
> Mathieu was telling me about the pmem infrastructure.

I use nvdimm to back my RAM backend testing with qemu so I can examine
the storage "externally":

RAM_SIZE=16384
NVDIMM_SIZE=200
MAX_SIZE=$(( RAM_SIZE + NVDIMM_SIZE ))
...
qemu-system-x86_64 \
	...
        -machine pc,nvdimm=on \
        -m ${RAM_SIZE}M,slots=2,maxmem=${MAX_SIZE}M \
        -object memory-backend-file,id=mem1,share=on,mem-path=$IMAGES/x86/nvdimm.img,size=${NVDIMM_SIZE}M,align=128M \
        -device nvdimm,id=nvdimm1,memdev=mem1,label-size=1M \
	...
        -append 'console=uart,io,0x3f8,115200n8 loglevel=8 root=/dev/vda1 ro ramoops.mem_size=1048576 ramoops.ecc=1 ramoops.mem_address=0x440000000 ramoops.console_size=16384 ramoops.ftrace_size=16384 ramoops.pmsg_size=16384 ramoops.record_size=32768 panic=-1 init=/root/resume.sh '"$@"


The part I'd like to get wired up sanely is having pstore find the
nvdimm area automatically, but it never quite happened:
https://lore.kernel.org/lkml/CAGXu5jLtmb3qinZnX3rScUJLUFdf+pRDVPjy=CS4KUtW9tLHtw@mail.gmail.com/

> Thanks for the info. We use pstore on ChromeOS, but it is currently
> restricted to 1MB which is too small for the tracing buffers. From what
> I understand, it's also in a specific location where there's only 1MB
> available for contiguous memory.

That's the area that is specifically hardware backed with persistent
RAM.

> I'm looking at finding a way to get consistent memory outside that
> range. That's what I'll be doing next week ;-)
> 
> But this code was just to see if I could get a single contiguous range
> of memory mapped to ftrace, and this patch set does exactly that.

Well, please take a look at pstore. It should be able to do everything
you mention already; it just needs a way to define multiple regions if
you want to use an area outside of the persistent ram area defined by
Chrome OS's platform driver.

-Kees

-- 
Kees Cook


* Re: [PATCH 0/8] tracing: Persistent traces across a reboot or crash
  2024-03-09 20:40     ` Kees Cook
@ 2024-03-20  0:44       ` Steven Rostedt
  0 siblings, 0 replies; 18+ messages in thread
From: Steven Rostedt @ 2024-03-20  0:44 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, linux-trace-kernel, Joel Fernandes,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Vincent Donnefort, Daniel Bristot de Oliveira, Ingo Molnar,
	Peter Zijlstra, suleiman, Thomas Gleixner, Vineeth Pillai,
	Youssef Esmat, Beau Belgrave, Alexander Graf, Baoquan He,
	Borislav Petkov, Paul E. McKenney, David Howells

On Sat, 9 Mar 2024 12:40:51 -0800
Kees Cook <keescook@chromium.org> wrote:

> The part I'd like to get wired up sanely is having pstore find the
> nvdimm area automatically, but it never quite happened:
> https://lore.kernel.org/lkml/CAGXu5jLtmb3qinZnX3rScUJLUFdf+pRDVPjy=CS4KUtW9tLHtw@mail.gmail.com/

The automatic detection is what I'm looking for.

> 
> > Thanks for the info. We use pstore on ChromeOS, but it is currently
> > restricted to 1MB which is too small for the tracing buffers. From what
> > I understand, it's also in a specific location where there's only 1MB
> > available for contiguous memory.  
> 
> That's the area that is specifically hardware backed with persistent
> RAM.
> 
> > I'm looking at finding a way to get consistent memory outside that
> > range. That's what I'll be doing next week ;-)
> > 
> > But this code was just to see if I could get a single contiguous range
> > of memory mapped to ftrace, and this patch set does exactly that.  
> 
> Well, please take a look at pstore. It should be able to do everything
> you mention already; it just needs a way to define multiple regions if
> you want to use an area outside of the persistent ram area defined by
> Chrome OS's platform driver.

I'm not exactly sure how to use pstore here. At boot up I just need some
consistent memory reserved for the tracing buffer. It just needs to be the
same location at every boot up.
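
For illustration, pinning down a fixed physical range early in boot could
look like the sketch below. The address is made up and would have to be one
the platform leaves untouched across a warm reboot, and the call must run
early in the boot sequence (e.g. from setup_arch()):

	#include <linux/memblock.h>
	#include <linux/printk.h>

	/* Sketch only: keep the page allocator away from a fixed range so
	 * its contents can survive into the next boot on warm resets. */
	static void __init reserve_trace_mem(void)
	{
		/* 0x285400000 is a made-up address, 12MB a made-up size */
		if (memblock_reserve(0x285400000ULL, 12 << 20))
			pr_warn("trace region reservation failed\n");
	}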

I don't need a front end. If you mean a way to access it from user space,
the front end is the tracefs directory, as I need all the features that the
tracefs directory gives.

I'm going to look to see how pstore is set up in ChromeOS and see if I can
use whatever it does to allocate another location.

-- Steve


Thread overview: 18+ messages
2024-03-06  1:59 [PATCH 0/8] tracing: Persistent traces across a reboot or crash Steven Rostedt
2024-03-06  1:59 ` [PATCH 1/8] ring-buffer: Allow mapped field to be set without mapping Steven Rostedt
2024-03-06  1:59 ` [PATCH 2/8] ring-buffer: Add ring_buffer_alloc_range() Steven Rostedt
2024-03-06  1:59 ` [PATCH 3/8] tracing: Create "boot_mapped" instance for memory mapped buffer Steven Rostedt
2024-03-06  1:59 ` [PATCH 4/8] HACK: Hard code in mapped tracing buffer address Steven Rostedt
2024-03-06  1:59 ` [PATCH 5/8] ring-buffer: Add ring_buffer_meta data Steven Rostedt
2024-03-08 10:42   ` kernel test robot
2024-03-08 10:54   ` kernel test robot
2024-03-06  1:59 ` [PATCH 6/8] ring-buffer: Add output of ring buffer meta page Steven Rostedt
2024-03-06  1:59 ` [PATCH 7/8] ring-buffer: Add test if range of boot buffer is valid Steven Rostedt
2024-03-06  1:59 ` [PATCH 8/8] ring-buffer: Validate boot range memory events Steven Rostedt
2024-03-08 15:39   ` kernel test robot
2024-03-06  2:01 ` [POC] !!! Re: [PATCH 0/8] tracing: Persistent traces across a reboot or crash Steven Rostedt
2024-03-06  2:01   ` Steven Rostedt
2024-03-09 18:27 ` Kees Cook
2024-03-09 18:51   ` Steven Rostedt
2024-03-09 20:40     ` Kees Cook
2024-03-20  0:44       ` Steven Rostedt
