public inbox for linux-trace-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/3] ring-buffer: Making persistent ring buffers robust
@ 2026-02-26 13:38 Masami Hiramatsu (Google)
  2026-02-26 13:38 ` [PATCH v5 1/3] ring-buffer: Flush and stop persistent ring buffer on panic Masami Hiramatsu (Google)
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-02-26 13:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel

Hi,

Here is the 5th version of improvement patches for making persistent
ring buffers robust to failures.
The previous version is here:

https://lore.kernel.org/all/177191595057.435720.10116014529518464418.stgit@mhiramat.tok.corp.google.com/

This version fixes multiple issues in the previous version.
 - [1/3] Use appropriate functions for flushing cache and
   stopping event recrod.
 - [2/3] More fixes for masking commit fields and do not move
   unrelated rb_commit_index().
 - [3/3] Show total number of discarded pages instead of
    showing errors on each page.

Thank you,

---

Masami Hiramatsu (Google) (3):
      ring-buffer: Flush and stop persistent ring buffer on panic
      ring-buffer: Handle RB_MISSED_* flags on commit field correctly
      ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer


 kernel/trace/ring_buffer.c |   73 ++++++++++++++++++++++++++++++--------------
 1 file changed, 49 insertions(+), 24 deletions(-)

--
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v5 1/3] ring-buffer: Flush and stop persistent ring buffer on panic
  2026-02-26 13:38 [PATCH v5 0/3] ring-buffer: Making persistent ring buffers robust Masami Hiramatsu (Google)
@ 2026-02-26 13:38 ` Masami Hiramatsu (Google)
  2026-02-26 17:25   ` kernel test robot
  2026-02-26 18:48   ` kernel test robot
  2026-02-26 13:38 ` [PATCH v5 2/3] ring-buffer: Handle RB_MISSED_* flags on commit field correctly Masami Hiramatsu (Google)
  2026-02-26 13:38 ` [PATCH v5 3/3] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer Masami Hiramatsu (Google)
  2 siblings, 2 replies; 11+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-02-26 13:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

On real hardware, panic and machine reboot may not flush hardware cache
to memory. This means the persistent ring buffer, which relies on a
coherent state of memory, may not have its events written to the buffer
and they may be lost. Moreover, there may be inconsistency with the
counters which are used for validation of the integrity of the
persistent ring buffer which may cause all data to be discarded.

To avoid this issue, stop recording of the ring buffer on panic and
flush the cache of the ring buffer's memory.

Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v5:
   - Use ring_buffer_record_off() instead of ring_buffer_record_disable().
   - Use flush_cache_all() to ensure flush all cache.
 Changes in v3:
   - update patch description.
---
 kernel/trace/ring_buffer.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index f16f053ef77d..0eb6e6595f37 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -6,6 +6,7 @@
  */
 #include <linux/sched/isolation.h>
 #include <linux/trace_recursion.h>
+#include <linux/panic_notifier.h>
 #include <linux/trace_events.h>
 #include <linux/ring_buffer.h>
 #include <linux/trace_clock.h>
@@ -589,6 +590,7 @@ struct trace_buffer {
 
 	unsigned long			range_addr_start;
 	unsigned long			range_addr_end;
+	struct notifier_block		flush_nb;
 
 	struct ring_buffer_meta		*meta;
 
@@ -2471,6 +2473,15 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
 	kfree(cpu_buffer);
 }
 
+static int rb_flush_buffer_cb(struct notifier_block *nb, unsigned long event, void *data)
+{
+	struct trace_buffer *buffer = container_of(nb, struct trace_buffer, flush_nb);
+
+	ring_buffer_record_off(buffer);
+	flush_cache_all();
+	return NOTIFY_DONE;
+}
+
 static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
 					 int order, unsigned long start,
 					 unsigned long end,
@@ -2590,6 +2601,12 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
 
 	mutex_init(&buffer->mutex);
 
+	/* Persistent ring buffer needs to flush cache before reboot. */
+	if (start & end) {
+		buffer->flush_nb.notifier_call = rb_flush_buffer_cb;
+		atomic_notifier_chain_register(&panic_notifier_list, &buffer->flush_nb);
+	}
+
 	return_ptr(buffer);
 
  fail_free_buffers:
@@ -2677,6 +2694,9 @@ ring_buffer_free(struct trace_buffer *buffer)
 {
 	int cpu;
 
+	if (buffer->range_addr_start && buffer->range_addr_end)
+		atomic_notifier_chain_unregister(&panic_notifier_list, &buffer->flush_nb);
+
 	cpuhp_state_remove_instance(CPUHP_TRACE_RB_PREPARE, &buffer->node);
 
 	irq_work_sync(&buffer->irq_work.work);


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v5 2/3] ring-buffer: Handle RB_MISSED_* flags on commit field correctly
  2026-02-26 13:38 [PATCH v5 0/3] ring-buffer: Making persistent ring buffers robust Masami Hiramatsu (Google)
  2026-02-26 13:38 ` [PATCH v5 1/3] ring-buffer: Flush and stop persistent ring buffer on panic Masami Hiramatsu (Google)
@ 2026-02-26 13:38 ` Masami Hiramatsu (Google)
  2026-03-05 18:03   ` Steven Rostedt
  2026-02-26 13:38 ` [PATCH v5 3/3] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer Masami Hiramatsu (Google)
  2 siblings, 1 reply; 11+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-02-26 13:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Since the MSBs of rb_data_page::commit are used for storing
RB_MISSED_EVENTS and RB_MISSED_STORED, we need to mask out those bits
when it is used for finding the size of data pages.

Fixes: 5f3b6e839f3c ("ring-buffer: Validate boot range memory events")
Fixes: 5b7be9c709e1 ("ring-buffer: Add test to validate the time stamp deltas")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v5:
   - Do not move rb_commit_index().
   - Fix verify_event() and rb_cpu_meta_valid() too.
 Changes in v4:
   - Fix to move rb_commit_index() after ring_buffer_per_cpu definition.
---
 kernel/trace/ring_buffer.c |   27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0eb6e6595f37..2e9b8ce6b4dc 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -395,6 +395,12 @@ static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage)
 	return local_read(&bpage->page->commit);
 }
 
+/* Size is determined by what has been committed */
+static __always_inline unsigned int rb_page_size(struct buffer_page *bpage)
+{
+	return rb_page_commit(bpage) & ~RB_MISSED_MASK;
+}
+
 static void free_buffer_page(struct buffer_page *bpage)
 {
 	/* Range pages are not to be freed */
@@ -676,7 +682,7 @@ static void verify_event(struct ring_buffer_per_cpu *cpu_buffer,
 	do {
 		if (page == tail_page || WARN_ON_ONCE(stop++ > 100))
 			done = true;
-		commit = local_read(&page->page->commit);
+		commit = rb_page_size(page);
 		write = local_read(&page->write);
 		if (addr >= (unsigned long)&page->page->data[commit] &&
 		    addr < (unsigned long)&page->page->data[write])
@@ -1820,13 +1826,16 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
 
 	/* Is the meta buffers and the subbufs themselves have correct data? */
 	for (i = 0; i < meta->nr_subbufs; i++) {
+		unsigned long commit;
+
 		if (meta->buffers[i] < 0 ||
 		    meta->buffers[i] >= meta->nr_subbufs) {
 			pr_info("Ring buffer boot meta [%d] array out of range\n", cpu);
 			return false;
 		}
 
-		if ((unsigned)local_read(&subbuf->commit) > subbuf_size) {
+		commit = local_read(&subbuf->commit) & ~RB_MISSED_MASK;
+		if (commit > subbuf_size) {
 			pr_info("Ring buffer boot meta [%d] buffer invalid commit\n", cpu);
 			return false;
 		}
@@ -1907,7 +1916,7 @@ static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu)
 	u64 delta;
 	int tail;
 
-	tail = local_read(&dpage->commit);
+	tail = local_read(&dpage->commit) & ~RB_MISSED_MASK;
 	return rb_read_data_buffer(dpage, tail, cpu, &ts, &delta);
 }
 
@@ -1934,7 +1943,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 		goto invalid;
 	}
 	entries += ret;
-	entry_bytes += local_read(&cpu_buffer->reader_page->page->commit);
+	entry_bytes += rb_page_size(cpu_buffer->reader_page);
 	local_set(&cpu_buffer->reader_page->entries, ret);
 
 	ts = head_page->page->time_stamp;
@@ -2054,7 +2063,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 			local_inc(&cpu_buffer->pages_touched);
 
 		entries += ret;
-		entry_bytes += local_read(&head_page->page->commit);
+		entry_bytes += rb_page_size(head_page);
 		local_set(&cpu_buffer->head_page->entries, ret);
 
 		if (head_page == cpu_buffer->commit_page)
@@ -3256,12 +3265,6 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
 	return NULL;
 }
 
-/* Size is determined by what has been committed */
-static __always_inline unsigned rb_page_size(struct buffer_page *bpage)
-{
-	return rb_page_commit(bpage) & ~RB_MISSED_MASK;
-}
-
 static __always_inline unsigned
 rb_commit_index(struct ring_buffer_per_cpu *cpu_buffer)
 {
@@ -4432,7 +4435,7 @@ static void check_buffer(struct ring_buffer_per_cpu *cpu_buffer,
 
 	if (tail == CHECK_FULL_PAGE) {
 		full = true;
-		tail = local_read(&bpage->commit);
+		tail = local_read(&bpage->commit) & ~RB_MISSED_MASK;
 	} else if (info->add_timestamp &
 		   (RB_ADD_STAMP_FORCE | RB_ADD_STAMP_ABSOLUTE)) {
 		/* Ignore events with absolute time stamps */


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v5 3/3] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
  2026-02-26 13:38 [PATCH v5 0/3] ring-buffer: Making persistent ring buffers robust Masami Hiramatsu (Google)
  2026-02-26 13:38 ` [PATCH v5 1/3] ring-buffer: Flush and stop persistent ring buffer on panic Masami Hiramatsu (Google)
  2026-02-26 13:38 ` [PATCH v5 2/3] ring-buffer: Handle RB_MISSED_* flags on commit field correctly Masami Hiramatsu (Google)
@ 2026-02-26 13:38 ` Masami Hiramatsu (Google)
  2 siblings, 0 replies; 11+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-02-26 13:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Skip invalid sub-buffers when validating the persistent ring buffer
instead of discarding the entire ring buffer. Also, mark there are
missed events on the discarded buffer.

If the cache data in memory fails to be synchronized during a reboot,
the persistent ring buffer may become partially corrupted, but other
sub-buffers may still contain readable event data. Only discard the
subbuffersa that ar found to be corrupted.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
  Changes in v5:
  - Instead of showing errors for each page, just show the number
    of discarded pages at last.
  Changes in v3:
  - Record missed data event on commit.
---
 kernel/trace/ring_buffer.c |   28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 2e9b8ce6b4dc..cfe1e15f1a93 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1927,6 +1927,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 	struct buffer_page *head_page, *orig_head;
 	unsigned long entry_bytes = 0;
 	unsigned long entries = 0;
+	int discarded = 0;
 	int ret;
 	u64 ts;
 	int i;
@@ -2053,19 +2054,19 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 
 		ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu);
 		if (ret < 0) {
-			pr_info("Ring buffer meta [%d] invalid buffer page\n",
-				cpu_buffer->cpu);
-			goto invalid;
-		}
-
-		/* If the buffer has content, update pages_touched */
-		if (ret)
-			local_inc(&cpu_buffer->pages_touched);
-
-		entries += ret;
-		entry_bytes += rb_page_size(head_page);
-		local_set(&cpu_buffer->head_page->entries, ret);
+			discarded++;
+			/* Instead of discard whole ring buffer, discard only this sub-buffer. */
+			local_set(&head_page->entries, 0);
+			local_set(&head_page->page->commit, RB_MISSED_EVENTS);
+		} else {
+			/* If the buffer has content, update pages_touched */
+			if (ret)
+				local_inc(&cpu_buffer->pages_touched);
 
+			entries += ret;
+			entry_bytes += rb_page_size(head_page);
+			local_set(&cpu_buffer->head_page->entries, ret);
+		}
 		if (head_page == cpu_buffer->commit_page)
 			break;
 	}
@@ -2079,7 +2080,8 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 	local_set(&cpu_buffer->entries, entries);
 	local_set(&cpu_buffer->entries_bytes, entry_bytes);
 
-	pr_info("Ring buffer meta [%d] is from previous boot!\n", cpu_buffer->cpu);
+	pr_info("Ring buffer meta [%d] is from previous boot! (%d pages discarded)\n",
+		cpu_buffer->cpu, discarded);
 	return;
 
  invalid:


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 1/3] ring-buffer: Flush and stop persistent ring buffer on panic
  2026-02-26 13:38 ` [PATCH v5 1/3] ring-buffer: Flush and stop persistent ring buffer on panic Masami Hiramatsu (Google)
@ 2026-02-26 17:25   ` kernel test robot
  2026-02-26 18:48   ` kernel test robot
  1 sibling, 0 replies; 11+ messages in thread
From: kernel test robot @ 2026-02-26 17:25 UTC (permalink / raw)
  To: Masami Hiramatsu (Google), Steven Rostedt
  Cc: llvm, oe-kbuild-all, Masami Hiramatsu, Mathieu Desnoyers,
	linux-kernel, linux-trace-kernel

Hi Masami,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v7.0-rc1 next-20260226]
[cannot apply to trace/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Masami-Hiramatsu-Google/ring-buffer-Flush-and-stop-persistent-ring-buffer-on-panic/20260226-222418
base:   linus/master
patch link:    https://lore.kernel.org/r/177211311593.419230.2212568977306190482.stgit%40mhiramat.tok.corp.google.com
patch subject: [PATCH v5 1/3] ring-buffer: Flush and stop persistent ring buffer on panic
config: sparc64-defconfig (https://download.01.org/0day-ci/archive/20260227/202602270132.zddhkLDS-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260227/202602270132.zddhkLDS-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602270132.zddhkLDS-lkp@intel.com/

All errors (new ones prefixed by >>):

>> kernel/trace/ring_buffer.c:2481:2: error: call to undeclared function 'flush_cache_all'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    2481 |         flush_cache_all();
         |         ^
   kernel/trace/ring_buffer.c:2481:2: note: did you mean 'flush_dcache_page'?
   arch/sparc/include/asm/cacheflush_64.h:51:20: note: 'flush_dcache_page' declared here
      51 | static inline void flush_dcache_page(struct page *page)
         |                    ^
   1 error generated.


vim +/flush_cache_all +2481 kernel/trace/ring_buffer.c

  2475	
  2476	static int rb_flush_buffer_cb(struct notifier_block *nb, unsigned long event, void *data)
  2477	{
  2478		struct trace_buffer *buffer = container_of(nb, struct trace_buffer, flush_nb);
  2479	
  2480		ring_buffer_record_off(buffer);
> 2481		flush_cache_all();
  2482		return NOTIFY_DONE;
  2483	}
  2484	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 1/3] ring-buffer: Flush and stop persistent ring buffer on panic
  2026-02-26 13:38 ` [PATCH v5 1/3] ring-buffer: Flush and stop persistent ring buffer on panic Masami Hiramatsu (Google)
  2026-02-26 17:25   ` kernel test robot
@ 2026-02-26 18:48   ` kernel test robot
  2026-02-27  7:44     ` Masami Hiramatsu
  1 sibling, 1 reply; 11+ messages in thread
From: kernel test robot @ 2026-02-26 18:48 UTC (permalink / raw)
  To: Masami Hiramatsu (Google), Steven Rostedt
  Cc: oe-kbuild-all, Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel

Hi Masami,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v7.0-rc1 next-20260226]
[cannot apply to trace/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Masami-Hiramatsu-Google/ring-buffer-Flush-and-stop-persistent-ring-buffer-on-panic/20260226-222418
base:   linus/master
patch link:    https://lore.kernel.org/r/177211311593.419230.2212568977306190482.stgit%40mhiramat.tok.corp.google.com
patch subject: [PATCH v5 1/3] ring-buffer: Flush and stop persistent ring buffer on panic
config: sparc-randconfig-001-20260227 (https://download.01.org/0day-ci/archive/20260227/202602270244.3JWhusi4-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 11.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260227/202602270244.3JWhusi4-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602270244.3JWhusi4-lkp@intel.com/

All errors (new ones prefixed by >>):

   kernel/trace/ring_buffer.c: In function 'rb_flush_buffer_cb':
>> kernel/trace/ring_buffer.c:2481:9: error: implicit declaration of function 'flush_cache_all'; did you mean 'flush_cache_page'? [-Werror=implicit-function-declaration]
    2481 |         flush_cache_all();
         |         ^~~~~~~~~~~~~~~
         |         flush_cache_page
   cc1: some warnings being treated as errors


vim +2481 kernel/trace/ring_buffer.c

  2475	
  2476	static int rb_flush_buffer_cb(struct notifier_block *nb, unsigned long event, void *data)
  2477	{
  2478		struct trace_buffer *buffer = container_of(nb, struct trace_buffer, flush_nb);
  2479	
  2480		ring_buffer_record_off(buffer);
> 2481		flush_cache_all();
  2482		return NOTIFY_DONE;
  2483	}
  2484	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 1/3] ring-buffer: Flush and stop persistent ring buffer on panic
  2026-02-26 18:48   ` kernel test robot
@ 2026-02-27  7:44     ` Masami Hiramatsu
  0 siblings, 0 replies; 11+ messages in thread
From: Masami Hiramatsu @ 2026-02-27  7:44 UTC (permalink / raw)
  To: kernel test robot
  Cc: Steven Rostedt, oe-kbuild-all, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel

On Fri, 27 Feb 2026 02:48:26 +0800
kernel test robot <lkp@intel.com> wrote:

> Hi Masami,
> 
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on linus/master]
> [also build test ERROR on v7.0-rc1 next-20260226]
> [cannot apply to trace/for-next]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Masami-Hiramatsu-Google/ring-buffer-Flush-and-stop-persistent-ring-buffer-on-panic/20260226-222418
> base:   linus/master
> patch link:    https://lore.kernel.org/r/177211311593.419230.2212568977306190482.stgit%40mhiramat.tok.corp.google.com
> patch subject: [PATCH v5 1/3] ring-buffer: Flush and stop persistent ring buffer on panic
> config: sparc-randconfig-001-20260227 (https://download.01.org/0day-ci/archive/20260227/202602270244.3JWhusi4-lkp@intel.com/config)
> compiler: sparc64-linux-gcc (GCC) 11.5.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260227/202602270244.3JWhusi4-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202602270244.3JWhusi4-lkp@intel.com/
> 
> All errors (new ones prefixed by >>):
> 
>    kernel/trace/ring_buffer.c: In function 'rb_flush_buffer_cb':
> >> kernel/trace/ring_buffer.c:2481:9: error: implicit declaration of function 'flush_cache_all'; did you mean 'flush_cache_page'? [-Werror=implicit-function-declaration]
>     2481 |         flush_cache_all();
>          |         ^~~~~~~~~~~~~~~
>          |         flush_cache_page
>    cc1: some warnings being treated as errors

I guess this is a miss of the sparc. Anyway, I will add asm/ring_buffer.h
to define a wrapper macro. At least I need to sync the cache on arm64 but
flush_cache_all() is a dummy on arm64. I would like to use dcache_clear_pop()
instead.

Thanks,

> 
> 
> vim +2481 kernel/trace/ring_buffer.c
> 
>   2475	
>   2476	static int rb_flush_buffer_cb(struct notifier_block *nb, unsigned long event, void *data)
>   2477	{
>   2478		struct trace_buffer *buffer = container_of(nb, struct trace_buffer, flush_nb);
>   2479	
>   2480		ring_buffer_record_off(buffer);
> > 2481		flush_cache_all();
>   2482		return NOTIFY_DONE;
>   2483	}
>   2484	
> 
> -- 
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 2/3] ring-buffer: Handle RB_MISSED_* flags on commit field correctly
  2026-02-26 13:38 ` [PATCH v5 2/3] ring-buffer: Handle RB_MISSED_* flags on commit field correctly Masami Hiramatsu (Google)
@ 2026-03-05 18:03   ` Steven Rostedt
  2026-03-06  8:46     ` Masami Hiramatsu
  0 siblings, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2026-03-05 18:03 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel

On Thu, 26 Feb 2026 22:38:43 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:

> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> Since the MSBs of rb_data_page::commit are used for storing
> RB_MISSED_EVENTS and RB_MISSED_STORED, we need to mask out those bits
> when it is used for finding the size of data pages.
> 
> Fixes: 5f3b6e839f3c ("ring-buffer: Validate boot range memory events")
> Fixes: 5b7be9c709e1 ("ring-buffer: Add test to validate the time stamp deltas")
> Cc: stable@vger.kernel.org

This is unneeded for the current way things work.

The missed events flags are added when a page is read, so the commits in
the write buffer should never have those flags set. If they did, the ring
buffer code itself would break.

But as patch 3 is adding a flag, you should likely merge this and patch 3
together, as the only way that flag would get set is if the validator set
it on a previous boot. And then this would be needed for subsequent boots
that did not reset the buffer.

Hmm, I don't think we even need to do that! Because if it is set, it would
simply warn again that a page is invalid, and I think we *want* that! As it
would preserve that pages were invalid and not be cleared with a simple
reboot.

-- Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 2/3] ring-buffer: Handle RB_MISSED_* flags on commit field correctly
  2026-03-05 18:03   ` Steven Rostedt
@ 2026-03-06  8:46     ` Masami Hiramatsu
  2026-03-06 14:59       ` Steven Rostedt
  0 siblings, 1 reply; 11+ messages in thread
From: Masami Hiramatsu @ 2026-03-06  8:46 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel

On Thu, 5 Mar 2026 13:03:48 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 26 Feb 2026 22:38:43 +0900
> "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> 
> > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > 
> > Since the MSBs of rb_data_page::commit are used for storing
> > RB_MISSED_EVENTS and RB_MISSED_STORED, we need to mask out those bits
> > when it is used for finding the size of data pages.
> > 
> > Fixes: 5f3b6e839f3c ("ring-buffer: Validate boot range memory events")
> > Fixes: 5b7be9c709e1 ("ring-buffer: Add test to validate the time stamp deltas")
> > Cc: stable@vger.kernel.org
> 
> This is unneeded for the current way things work.
> 
> The missed events flags are added when a page is read, so the commits in
> the write buffer should never have those flags set. If they did, the ring
> buffer code itself would break.

Hmm, but commit ca296d32ece3 ("tracing: ring_buffer: Rewind persistent 
ring buffer on reboot") may change it. Maybe we should treat it while
unwinding it?

> 
> But as patch 3 is adding a flag, you should likely merge this and patch 3
> together, as the only way that flag would get set is if the validator set
> it on a previous boot. And then this would be needed for subsequent boots
> that did not reset the buffer.

It is OK to combine these 2 patches. But my question is that when the flag
must be checked and when it must be ignored. Since the flags are encoded
to commit, if that is used for limiting or indexing inside the page,
we must mask the flag or check the max size to avoid accessing outside of
the subpage.

> 
> Hmm, I don't think we even need to do that! Because if it is set, it would
> simply warn again that a page is invalid, and I think we *want* that! As it
> would preserve that pages were invalid and not be cleared with a simple
> reboot.

OK, then I don't mark it, but just invalidate the subpage.

Thanks,

> 
> -- Steve
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 2/3] ring-buffer: Handle RB_MISSED_* flags on commit field correctly
  2026-03-06  8:46     ` Masami Hiramatsu
@ 2026-03-06 14:59       ` Steven Rostedt
  2026-03-10 10:11         ` Masami Hiramatsu
  0 siblings, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2026-03-06 14:59 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel

On Fri, 6 Mar 2026 17:46:09 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> On Thu, 5 Mar 2026 13:03:48 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Thu, 26 Feb 2026 22:38:43 +0900
> > "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> >   
> > > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > > 
> > > Since the MSBs of rb_data_page::commit are used for storing
> > > RB_MISSED_EVENTS and RB_MISSED_STORED, we need to mask out those bits
> > > when it is used for finding the size of data pages.
> > > 
> > > Fixes: 5f3b6e839f3c ("ring-buffer: Validate boot range memory events")
> > > Fixes: 5b7be9c709e1 ("ring-buffer: Add test to validate the time stamp deltas")
> > > Cc: stable@vger.kernel.org  
> > 
> > This is unneeded for the current way things work.
> > 
> > The missed events flags are added when a page is read, so the commits in
> > the write buffer should never have those flags set. If they did, the ring
> > buffer code itself would break.  
> 
> Hmm, but commit ca296d32ece3 ("tracing: ring_buffer: Rewind persistent 
> ring buffer on reboot") may change it. Maybe we should treat it while
> unwinding it?

Might change what?

> 
> > 
> > But as patch 3 is adding a flag, you should likely merge this and patch 3
> > together, as the only way that flag would get set is if the validator set
> > it on a previous boot. And then this would be needed for subsequent boots
> > that did not reset the buffer.  
> 
> It is OK to combine these 2 patches. But my question is that when the flag
> must be checked and when it must be ignored. Since the flags are encoded
> to commit, if that is used for limiting or indexing inside the page,
> we must mask the flag or check the max size to avoid accessing outside of
> the subpage.
> 
> > 
> > Hmm, I don't think we even need to do that! Because if it is set, it would
> > simply warn again that a page is invalid, and I think we *want* that! As it
> > would preserve that pages were invalid and not be cleared with a simple
> > reboot.  
> 
> OK, then I don't mark it, but just invalidate the subpage.

No, I mean you can mark it, but then just have the validator skip it.
Something like:

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 156ed19fb569..c98ab86cf384 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1950,6 +1951,10 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 	rb_dec_page(&head_page);
 	for (i = 0; i < meta->nr_subbufs + 1; i++, rb_dec_page(&head_page)) {
 
+		/* Skip if this buffer was flagged as bad before */
+		if (local_read(&head_page->page->commit) ==  RB_MISSED_EVENTS)
+			continue;
+
 		/* Rewind until tail (writer) page. */
 		if (head_page == cpu_buffer->tail_page)
 			break;

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 2/3] ring-buffer: Handle RB_MISSED_* flags on commit field correctly
  2026-03-06 14:59       ` Steven Rostedt
@ 2026-03-10 10:11         ` Masami Hiramatsu
  0 siblings, 0 replies; 11+ messages in thread
From: Masami Hiramatsu @ 2026-03-10 10:11 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel

On Fri, 6 Mar 2026 09:59:35 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 6 Mar 2026 17:46:09 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> 
> > On Thu, 5 Mar 2026 13:03:48 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > On Thu, 26 Feb 2026 22:38:43 +0900
> > > "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> > >   
> > > > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > > > 
> > > > Since the MSBs of rb_data_page::commit are used for storing
> > > > RB_MISSED_EVENTS and RB_MISSED_STORED, we need to mask out those bits
> > > > when it is used for finding the size of data pages.
> > > > 
> > > > Fixes: 5f3b6e839f3c ("ring-buffer: Validate boot range memory events")
> > > > Fixes: 5b7be9c709e1 ("ring-buffer: Add test to validate the time stamp deltas")
> > > > Cc: stable@vger.kernel.org  
> > > 
> > > This is unneeded for the current way things work.
> > > 
> > > The missed events flags are added when a page is read, so the commits in
> > > the write buffer should never have those flags set. If they did, the ring
> > > buffer code itself would break.  
> > 
> > Hmm, but commit ca296d32ece3 ("tracing: ring_buffer: Rewind persistent 
> > ring buffer on reboot") may change it. Maybe we should treat it while
> > unwinding it?
> 
> Might change what?

So the above commit removes clearing (resetting) meta->commit after read the
page (thus the missed events flags will be there). But those pages will be
recovered (rewound) after reboot. Thus the validation runs on those pages
(and detect invalid pages)

> 
> > 
> > > 
> > > But as patch 3 is adding a flag, you should likely merge this and patch 3
> > > together, as the only way that flag would get set is if the validator set
> > > it on a previous boot. And then this would be needed for subsequent boots
> > > that did not reset the buffer.  
> > 
> > It is OK to combine these 2 patches. But my question is that when the flag
> > must be checked and when it must be ignored. Since the flags are encoded
> > to commit, if that is used for limiting or indexing inside the page,
> > we must mask the flag or check the max size to avoid accessing outside of
> > the subpage.
> > 
> > > 
> > > Hmm, I don't think we even need to do that! Because if it is set, it would
> > > simply warn again that a page is invalid, and I think we *want* that! As it
> > > would preserve that pages were invalid and not be cleared with a simple
> > > reboot.  
> > 
> > OK, then I don't mark it, but just invalidate the subpage.
> 
> No, I mean you can mark it, but then just have the validator skip it.
> Something like:

This may not work because RB_MISSED_EVENTS are added, instead of set.
So the commit will be `original_commit + RB_MISSED_EVENTS`.

Anyway, in v8, any of invalid (out of range) commit value is treated
as corrupted in rb_validate_buffer().

Thanks,

> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 156ed19fb569..c98ab86cf384 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -1950,6 +1951,10 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
>  	rb_dec_page(&head_page);
>  	for (i = 0; i < meta->nr_subbufs + 1; i++, rb_dec_page(&head_page)) {
>  
> +		/* Skip if this buffer was flagged as bad before */
> +		if (local_read(&head_page->page->commit) ==  RB_MISSED_EVENTS)
> +			continue;
> +
>  		/* Rewind until tail (writer) page. */
>  		if (head_page == cpu_buffer->tail_page)
>  			break;
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-03-10 10:11 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-26 13:38 [PATCH v5 0/3] ring-buffer: Making persistent ring buffers robust Masami Hiramatsu (Google)
2026-02-26 13:38 ` [PATCH v5 1/3] ring-buffer: Flush and stop persistent ring buffer on panic Masami Hiramatsu (Google)
2026-02-26 17:25   ` kernel test robot
2026-02-26 18:48   ` kernel test robot
2026-02-27  7:44     ` Masami Hiramatsu
2026-02-26 13:38 ` [PATCH v5 2/3] ring-buffer: Handle RB_MISSED_* flags on commit field correctly Masami Hiramatsu (Google)
2026-03-05 18:03   ` Steven Rostedt
2026-03-06  8:46     ` Masami Hiramatsu
2026-03-06 14:59       ` Steven Rostedt
2026-03-10 10:11         ` Masami Hiramatsu
2026-02-26 13:38 ` [PATCH v5 3/3] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer Masami Hiramatsu (Google)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox