Linux Trace Kernel

Linux Trace Kernel
 help / color / mirror / Atom feed

* Re: [PATCH v4 0/5] mm: zone lock tracepoint instrumentation
From: Matthew Wilcox @ 2026-03-09 14:47 UTC (permalink / raw)
  To: Dmitry Ilvokhin
  Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Rafael J. Wysocki, Pavel Machek, Len Brown, Brendan Jackman,
	Johannes Weiner, Zi Yan, Oscar Salvador, Qi Zheng, Shakeel Butt,
	linux-kernel, linux-mm, linux-trace-kernel, linux-pm
In-Reply-To: <aa7XdpIVtLFS3FIu@shell.ilvokhin.com>

On Mon, Mar 09, 2026 at 02:21:42PM +0000, Dmitry Ilvokhin wrote:
> On Mon, Mar 09, 2026 at 01:10:46PM +0000, Matthew Wilcox wrote:
> > On Fri, Feb 27, 2026 at 04:00:22PM +0000, Dmitry Ilvokhin wrote:
> > > This patch series adds dedicated tracepoint instrumentation to
> > > zone lock, following the existing mmap_lock tracing model.
> > 
> > I don't like this at all.  We have CONFIG_LOCK_STAT.  That should be
> > improved insted of coming up with one-offs for every single lock
> > that someone deems "special".
> 
> Thanks for the feedback, Matthew.
> 
> CONFIG_LOCK_STAT provides useful statistics, but it is primarily a
> debug facility and is generally too heavyweight for the production
> environments.

Yes, agreed.  I think that is what needs to change.

> The motivation for this series was to provide lightweight observability
> for the zone lock in production workloads.

I read that.  But first it was the mmap lock.  Now it's the zone lock.
Which lock will be next?  This is too heavyweight a procedure to
follow for each lock of interest.

> I agree that improving generic lock instrumentation would be preferable.
> I did consider whether something similar could be done generically for
> spinlocks, but the unlock path there is typically just a single atomic
> store, so adding generic lightweight instrumentation without affecting
> the fast path is difficult.

This is why we have tracepoint_enabled() and friends.  But ... LOCK_STAT
doesn't affect the unlock path at all.  It only changes the acquire side
to call lock_acquired() (and lock_contended() if the trylock failed).

> In parallel, I've been experimenting with improving observability for
> sleepable locks by adding a contended_release tracepoint, which would
> allow correlating lock holders and waiters in a more generic way. I've
> posted an RFC here:
> 
> https://lore.kernel.org/all/cover.1772642407.git.d@ilvokhin.com/
> 
> I'd appreciate feedback on whether that direction makes sense for
> improving the generic lock tracing infrastructure.

It seems fine to me, but I don't have the depth in the locking code to
give it the thorough review it deserves.

^ permalink raw reply

* Re: [PATCH v4 0/5] mm: zone lock tracepoint instrumentation
From: Dmitry Ilvokhin @ 2026-03-09 14:21 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Rafael J. Wysocki, Pavel Machek, Len Brown, Brendan Jackman,
	Johannes Weiner, Zi Yan, Oscar Salvador, Qi Zheng, Shakeel Butt,
	linux-kernel, linux-mm, linux-trace-kernel, linux-pm
In-Reply-To: <aa7G1nD7Rd9F4eBH@casper.infradead.org>

On Mon, Mar 09, 2026 at 01:10:46PM +0000, Matthew Wilcox wrote:
> On Fri, Feb 27, 2026 at 04:00:22PM +0000, Dmitry Ilvokhin wrote:
> > Zone lock contention can significantly impact allocation and
> > reclaim latency, as it is a central synchronization point in
> > the page allocator and reclaim paths. Improved visibility into
> > its behavior is therefore important for diagnosing performance
> > issues in memory-intensive workloads.
> > 
> > On some production workloads at Meta, we have observed noticeable
> > zone lock contention. Deeper analysis of lock holders and waiters
> > is currently difficult with existing instrumentation.
> > 
> > While generic lock contention_begin/contention_end tracepoints
> > cover the slow path, they do not provide sufficient visibility
> > into lock hold times. In particular, the lack of a release-side
> > event makes it difficult to identify long lock holders and
> > correlate them with waiters. As a result, distinguishing between
> > short bursts of contention and pathological long hold times
> > requires additional instrumentation.
> > 
> > This patch series adds dedicated tracepoint instrumentation to
> > zone lock, following the existing mmap_lock tracing model.
> 
> I don't like this at all.  We have CONFIG_LOCK_STAT.  That should be
> improved insted of coming up with one-offs for every single lock
> that someone deems "special".

Thanks for the feedback, Matthew.

CONFIG_LOCK_STAT provides useful statistics, but it is primarily a
debug facility and is generally too heavyweight for the production
environments.

The motivation for this series was to provide lightweight observability
for the zone lock in production workloads.

I agree that improving generic lock instrumentation would be preferable.
I did consider whether something similar could be done generically for
spinlocks, but the unlock path there is typically just a single atomic
store, so adding generic lightweight instrumentation without affecting
the fast path is difficult.

In parallel, I've been experimenting with improving observability for
sleepable locks by adding a contended_release tracepoint, which would
allow correlating lock holders and waiters in a more generic way. I've
posted an RFC here:

https://lore.kernel.org/all/cover.1772642407.git.d@ilvokhin.com/

I'd appreciate feedback on whether that direction makes sense for
improving the generic lock tracing infrastructure.

^ permalink raw reply

* Re: [PATCH v7 2/2] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
From: Steven Rostedt @ 2026-03-09 13:53 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel
In-Reply-To: <20260309223244.3be47d8abbb91b87680da2a3@kernel.org>

On Mon, 9 Mar 2026 22:32:44 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> > > Hmm, but if the kernel crash and reboot when it sets RB_MISSED_EVENTS,
> > > we will see the bit is set and commit size is different.   
> > 
> > The RB_MISSED_EVENTS is only set on the reader page.  
> 
> When is it reset after read? I think I removed that with commit ca296d32ece3 ("tracing: ring_buffer: Rewind persistent ring buffer on reboot"). So the
> flags will be remains until the page is reused (as a writer page).

And that shouldn't be a problem.

> 
> > 
> > If the kernel crashes no boot up while reading the validated buffer,
> > then that's a bit more than what this is supposed to handle.  
> 
> But the above commit recovers the subbufs which has been read in the
> previous boot. This means commit field of those recovered subbufs
> can have those flags.

Yes, and since those flags are only set by the validator, if the validator
sees them set, it should skip processing the subbuffer. It knows it was
already flagged as corrupt.

> 
> >   
> > > 
> > > Note, I think the reader_page RB_MISSED_EVENTS flag is not cleared after
> > > read. commit ca296d32ece3 ("tracing: ring_buffer: Rewind persistent
> > > ring buffer on reboot") drops clearing commit field for unwinding the
> > > buffer.  
> > 
> > But that should be fine, as it's only read only. Once tracing is
> > started, it should be reset.  
> 
> IIUC, RB_MISSED_* flags are using 30 and 31 th bits and committed
> bytes counter is usually use 0-11 th bits. So other bits should be
> cleared. 
> 
> So, if "commit & RB_MISSED_MASK" is under the subbuf_size, this
> subbuf looks OK for checking its entries(timestamp data).
> e.g.

But it shouldn't be.  The only way the RB_MISSED flag gets set in the
writer portion is if on a previous boot the validator detected that the
subbuffer was corrupted. If they are set for any other reason, the
subbuffer should be considered corrupted anyway.

-- Steve

^ permalink raw reply

* Re: [PATCH v7 2/2] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
From: Masami Hiramatsu @ 2026-03-09 13:32 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel
In-Reply-To: <20260308221901.545e7b8b@robin>

Hi Steve,

On Sun, 8 Mar 2026 22:19:01 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 9 Mar 2026 08:53:17 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> 
> > > Also, I mentioned that if the commit == RB_MISSED_EVENTS, then we know
> > > the sub buffer was corrupted and should be skipped.  
> > 
> > Yes, if RB_MISSED_EVENTS bit is set, the commit field is out of range.
> > That is checked in rb_validate_buffer().
> > 
> > > 
> > > And honestly, the commit should never be greater than the subbuf_size,
> > > even if corrupted. As we are only worried about corruption due to cache
> > > not writing out. That should not corrupt the commit size (now we can
> > > ignore the flags and use page size instead).  
> > 
> > Hmm, but if the kernel crash and reboot when it sets RB_MISSED_EVENTS,
> > we will see the bit is set and commit size is different. 
> 
> The RB_MISSED_EVENTS is only set on the reader page.

When is it reset after read? I think I removed that with commit ca296d32ece3 ("tracing: ring_buffer: Rewind persistent ring buffer on reboot"). So the
flags will be remains until the page is reused (as a writer page).

> 
> If the kernel crashes no boot up while reading the validated buffer,
> then that's a bit more than what this is supposed to handle.

But the above commit recovers the subbufs which has been read in the
previous boot. This means commit field of those recovered subbufs
can have those flags.

> 
> > 
> > Note, I think the reader_page RB_MISSED_EVENTS flag is not cleared after
> > read. commit ca296d32ece3 ("tracing: ring_buffer: Rewind persistent
> > ring buffer on reboot") drops clearing commit field for unwinding the
> > buffer.
> 
> But that should be fine, as it's only read only. Once tracing is
> started, it should be reset.

IIUC, RB_MISSED_* flags are using 30 and 31 th bits and committed
bytes counter is usually use 0-11 th bits. So other bits should be
cleared. 

So, if "commit & RB_MISSED_MASK" is under the subbuf_size, this
subbuf looks OK for checking its entries(timestamp data).
e.g.

unsigned long commit;

commit = local_read(subbuf->commit) & ~RB_MISSED_MASK;
if (commit > meta->subbuf_size)
  return -EINVAL;
return ;

Thank you,

> 
> -- Steve


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH v4 0/5] mm: zone lock tracepoint instrumentation
From: Matthew Wilcox @ 2026-03-09 13:10 UTC (permalink / raw)
  To: Dmitry Ilvokhin
  Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Rafael J. Wysocki, Pavel Machek, Len Brown, Brendan Jackman,
	Johannes Weiner, Zi Yan, Oscar Salvador, Qi Zheng, Shakeel Butt,
	linux-kernel, linux-mm, linux-trace-kernel, linux-pm
In-Reply-To: <cover.1772206930.git.d@ilvokhin.com>

On Fri, Feb 27, 2026 at 04:00:22PM +0000, Dmitry Ilvokhin wrote:
> Zone lock contention can significantly impact allocation and
> reclaim latency, as it is a central synchronization point in
> the page allocator and reclaim paths. Improved visibility into
> its behavior is therefore important for diagnosing performance
> issues in memory-intensive workloads.
> 
> On some production workloads at Meta, we have observed noticeable
> zone lock contention. Deeper analysis of lock holders and waiters
> is currently difficult with existing instrumentation.
> 
> While generic lock contention_begin/contention_end tracepoints
> cover the slow path, they do not provide sufficient visibility
> into lock hold times. In particular, the lack of a release-side
> event makes it difficult to identify long lock holders and
> correlate them with waiters. As a result, distinguishing between
> short bursts of contention and pathological long hold times
> requires additional instrumentation.
> 
> This patch series adds dedicated tracepoint instrumentation to
> zone lock, following the existing mmap_lock tracing model.

I don't like this at all.  We have CONFIG_LOCK_STAT.  That should be
improved insted of coming up with one-offs for every single lock
that someone deems "special".

^ permalink raw reply

* Re: [PATCH 09/12] dt-bindings: input: Document hid-over-spi DT schema
From: Dmitry Torokhov @ 2026-03-09  5:44 UTC (permalink / raw)
  To: Val Packett
  Cc: Jingyuan Liang, Jiri Kosina, Benjamin Tissoires, Jonathan Corbet,
	Mark Brown, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, linux-input,
	linux-doc, linux-kernel, linux-spi, linux-trace-kernel,
	devicetree, hbarnor, Dmitry Antipov, Jarrett Schultz
In-Reply-To: <1cc6de61-8b56-492e-ab78-e3aa448f58ad@packett.cool>

On Sat, Mar 07, 2026 at 04:25:44AM -0300, Val Packett wrote:
> 
> On 3/3/26 3:13 AM, Jingyuan Liang wrote:
> > Documentation describes the required and optional properties for
> > implementing Device Tree for a Microsoft G6 Touch Digitizer that
> > supports HID over SPI Protocol 1.0 specification.
> > […]
> > +properties:
> > +  compatible:
> > +    oneOf:
> > +      - items:
> > +          - enum:
> > +              - microsoft,g6-touch-digitizer
> > +          - const: hid-over-spi
> > +      - description: Just "hid-over-spi" alone is allowed, but not recommended.
> > […]
> > +required:
> > +  - compatible
> > +  - interrupts
> > +  - reset-gpios
> 
> Why is reset required? Is it so implausible on some device implementing the
> spec there wouldn't be a reset gpio?

No, because it is mandated by the spec:

"HID SPI peripheral must provide a dedicated reset line, driven by the
HOST, which, when toggled (pulled LOW for at least 10ms, normally HIGH),
will have the effect of resetting the device. If a HID SPI peripheral is
enumerated via ACPI, the device ASL configuration must expose an ACPI
FLDR (_RST) method to control this line."

The spec also states that the host must initiate reset during
initialization of the device.

> 
> > +  - vdd-supply
> Linux makes up a dummy regulator if DT doesn't provide one, so can
> regulators even be required?

There is still a supply line to the chip even if it is not exposed to
the OS control. So as far as chip is concerned the supply is required.

Thanks.

-- 
Dmitry

^ permalink raw reply

* [PATCH v8 2/2] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
From: Masami Hiramatsu (Google) @ 2026-03-09  5:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel
In-Reply-To: <177303264034.767813.5345788067082238396.stgit@mhiramat.tok.corp.google.com>

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Skip invalid sub-buffers when validating the persistent ring buffer
instead of discarding the entire ring buffer. Only skipped buffers
are invalidated (cleared).

If the cache data in memory fails to be synchronized during a reboot,
the persistent ring buffer may become partially corrupted, but other
sub-buffers may still contain readable event data. Only discard the
subbuffersa that ar found to be corrupted.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
  Changes in v8:
  - Add comment in rb_valudate_buffer()
  - Clear the RB_MISSED_* flags in rb_valudate_buffer() instead of
    skipping subbuf.
  - Remove unused subbuf local variable from rb_cpu_meta_valid().
  Changes in v7:
  - Combined with Handling RB_MISSED_* flags patch, focus on validation at boot.
  - Remove checking subbuffer data when validating metadata, because it should be done
    later.
  - Do not mark the discarded sub buffer page but just reset it.
  Changes in v6:
  - Show invalid page detection message once per CPU.
  Changes in v5:
  - Instead of showing errors for each page, just show the number
    of discarded pages at last.
  Changes in v3:
  - Record missed data event on commit.
---
 kernel/trace/ring_buffer.c |   80 ++++++++++++++++++++++++--------------------
 1 file changed, 44 insertions(+), 36 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 156ed19fb569..a86a036b4100 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -396,6 +396,12 @@ static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage)
 	return local_read(&bpage->page->commit);
 }
 
+/* Size is determined by what has been committed */
+static __always_inline unsigned int rb_page_size(struct buffer_page *bpage)
+{
+	return rb_page_commit(bpage) & ~RB_MISSED_MASK;
+}
+
 static void free_buffer_page(struct buffer_page *bpage)
 {
 	/* Range pages are not to be freed */
@@ -1791,7 +1797,6 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
 			      unsigned long *subbuf_mask)
 {
 	int subbuf_size = PAGE_SIZE;
-	struct buffer_data_page *subbuf;
 	unsigned long buffers_start;
 	unsigned long buffers_end;
 	int i;
@@ -1815,11 +1820,12 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
 		return false;
 	}
 
-	subbuf = rb_subbufs_from_meta(meta);
-
 	bitmap_clear(subbuf_mask, 0, meta->nr_subbufs);
 
-	/* Is the meta buffers and the subbufs themselves have correct data? */
+	/*
+	 * Ensure the meta::buffers array has correct data. The data in each subbufs
+	 * are checked later in rb_meta_validate_events().
+	 */
 	for (i = 0; i < meta->nr_subbufs; i++) {
 		if (meta->buffers[i] < 0 ||
 		    meta->buffers[i] >= meta->nr_subbufs) {
@@ -1827,18 +1833,12 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
 			return false;
 		}
 
-		if ((unsigned)local_read(&subbuf->commit) > subbuf_size) {
-			pr_info("Ring buffer boot meta [%d] buffer invalid commit\n", cpu);
-			return false;
-		}
-
 		if (test_bit(meta->buffers[i], subbuf_mask)) {
 			pr_info("Ring buffer boot meta [%d] array has duplicates\n", cpu);
 			return false;
 		}
 
 		set_bit(meta->buffers[i], subbuf_mask);
-		subbuf = (void *)subbuf + subbuf_size;
 	}
 
 	return true;
@@ -1902,13 +1902,22 @@ static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu
 	return events;
 }
 
-static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu)
+static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu,
+			      struct ring_buffer_cpu_meta *meta)
 {
 	unsigned long long ts;
+	unsigned long tail;
 	u64 delta;
-	int tail;
 
-	tail = local_read(&dpage->commit);
+	/*
+	 * When a sub-buffer is recovered from a read, the commit value may
+	 * have RB_MISSED_* bits set, as these bits are reset on reuse.
+	 * Even after clearing these bits, a commit value greater than the
+	 * subbuf_suze is considered invalid.
+	 */
+	tail = local_read(&dpage->commit) & ~RB_MISSED_MASK;
+	if (tail > meta->subbuf_size)
+		return -1;
 	return rb_read_data_buffer(dpage, tail, cpu, &ts, &delta);
 }
 
@@ -1919,6 +1928,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 	struct buffer_page *head_page, *orig_head;
 	unsigned long entry_bytes = 0;
 	unsigned long entries = 0;
+	int discarded = 0;
 	int ret;
 	u64 ts;
 	int i;
@@ -1929,13 +1939,13 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 	orig_head = head_page = cpu_buffer->head_page;
 
 	/* Do the reader page first */
-	ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu);
+	ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu, meta);
 	if (ret < 0) {
 		pr_info("Ring buffer reader page is invalid\n");
 		goto invalid;
 	}
 	entries += ret;
-	entry_bytes += local_read(&cpu_buffer->reader_page->page->commit);
+	entry_bytes += rb_page_size(cpu_buffer->reader_page);
 	local_set(&cpu_buffer->reader_page->entries, ret);
 
 	ts = head_page->page->time_stamp;
@@ -1964,7 +1974,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 			break;
 
 		/* Stop rewind if the page is invalid. */
-		ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu);
+		ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
 		if (ret < 0)
 			break;
 
@@ -2043,21 +2053,24 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 		if (head_page == cpu_buffer->reader_page)
 			continue;
 
-		ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu);
+		ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
 		if (ret < 0) {
-			pr_info("Ring buffer meta [%d] invalid buffer page\n",
-				cpu_buffer->cpu);
-			goto invalid;
-		}
-
-		/* If the buffer has content, update pages_touched */
-		if (ret)
-			local_inc(&cpu_buffer->pages_touched);
-
-		entries += ret;
-		entry_bytes += local_read(&head_page->page->commit);
-		local_set(&cpu_buffer->head_page->entries, ret);
+			if (!discarded)
+				pr_info("Ring buffer meta [%d] invalid buffer page detected\n",
+					cpu_buffer->cpu);
+			discarded++;
+			/* Instead of discard whole ring buffer, discard only this sub-buffer. */
+			local_set(&head_page->entries, 0);
+			local_set(&head_page->page->commit, 0);
+		} else {
+			/* If the buffer has content, update pages_touched */
+			if (ret)
+				local_inc(&cpu_buffer->pages_touched);
 
+			entries += ret;
+			entry_bytes += rb_page_size(head_page);
+			local_set(&cpu_buffer->head_page->entries, ret);
+		}
 		if (head_page == cpu_buffer->commit_page)
 			break;
 	}
@@ -2071,7 +2084,8 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 	local_set(&cpu_buffer->entries, entries);
 	local_set(&cpu_buffer->entries_bytes, entry_bytes);
 
-	pr_info("Ring buffer meta [%d] is from previous boot!\n", cpu_buffer->cpu);
+	pr_info("Ring buffer meta [%d] is from previous boot! (%d pages discarded)\n",
+		cpu_buffer->cpu, discarded);
 	return;
 
  invalid:
@@ -3258,12 +3272,6 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
 	return NULL;
 }
 
-/* Size is determined by what has been committed */
-static __always_inline unsigned rb_page_size(struct buffer_page *bpage)
-{
-	return rb_page_commit(bpage) & ~RB_MISSED_MASK;
-}
-
 static __always_inline unsigned
 rb_commit_index(struct ring_buffer_per_cpu *cpu_buffer)
 {


^ permalink raw reply related

* [PATCH v8 1/2] ring-buffer: Flush and stop persistent ring buffer on panic
From: Masami Hiramatsu (Google) @ 2026-03-09  5:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel
In-Reply-To: <177303264034.767813.5345788067082238396.stgit@mhiramat.tok.corp.google.com>

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

On real hardware, panic and machine reboot may not flush hardware cache
to memory. This means the persistent ring buffer, which relies on a
coherent state of memory, may not have its events written to the buffer
and they may be lost. Moreover, there may be inconsistency with the
counters which are used for validation of the integrity of the
persistent ring buffer which may cause all data to be discarded.

To avoid this issue, stop recording of the ring buffer on panic and
flush the cache of the ring buffer's memory.

Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v6:
   - Introduce asm/ring_buffer.h for arch_ring_buffer_flush_range().
   - Use flush_cache_vmap() instead of flush_cache_all().
 Changes in v5:
   - Use ring_buffer_record_off() instead of ring_buffer_record_disable().
   - Use flush_cache_all() to ensure flush all cache.
 Changes in v3:
   - update patch description.
---
 arch/alpha/include/asm/Kbuild        |    1 +
 arch/arc/include/asm/Kbuild          |    1 +
 arch/arm/include/asm/Kbuild          |    1 +
 arch/arm64/include/asm/ring_buffer.h |   10 ++++++++++
 arch/csky/include/asm/Kbuild         |    1 +
 arch/hexagon/include/asm/Kbuild      |    1 +
 arch/loongarch/include/asm/Kbuild    |    1 +
 arch/m68k/include/asm/Kbuild         |    1 +
 arch/microblaze/include/asm/Kbuild   |    1 +
 arch/mips/include/asm/Kbuild         |    1 +
 arch/nios2/include/asm/Kbuild        |    1 +
 arch/openrisc/include/asm/Kbuild     |    1 +
 arch/parisc/include/asm/Kbuild       |    1 +
 arch/powerpc/include/asm/Kbuild      |    1 +
 arch/riscv/include/asm/Kbuild        |    1 +
 arch/s390/include/asm/Kbuild         |    1 +
 arch/sh/include/asm/Kbuild           |    1 +
 arch/sparc/include/asm/Kbuild        |    1 +
 arch/um/include/asm/Kbuild           |    1 +
 arch/x86/include/asm/Kbuild          |    1 +
 arch/xtensa/include/asm/Kbuild       |    1 +
 include/asm-generic/ring_buffer.h    |   13 +++++++++++++
 kernel/trace/ring_buffer.c           |   22 ++++++++++++++++++++++
 23 files changed, 65 insertions(+)
 create mode 100644 arch/arm64/include/asm/ring_buffer.h
 create mode 100644 include/asm-generic/ring_buffer.h

diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index 483965c5a4de..b154b4e3dfa8 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -5,4 +5,5 @@ generic-y += agp.h
 generic-y += asm-offsets.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
 generic-y += text-patching.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 4c69522e0328..483caacc6988 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -5,5 +5,6 @@ generic-y += extable.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += parport.h
+generic-y += ring_buffer.h
 generic-y += user.h
 generic-y += text-patching.h
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 03657ff8fbe3..decad5f2c826 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -3,6 +3,7 @@ generic-y += early_ioremap.h
 generic-y += extable.h
 generic-y += flat.h
 generic-y += parport.h
+generic-y += ring_buffer.h
 
 generated-y += mach-types.h
 generated-y += unistd-nr.h
diff --git a/arch/arm64/include/asm/ring_buffer.h b/arch/arm64/include/asm/ring_buffer.h
new file mode 100644
index 000000000000..62316c406888
--- /dev/null
+++ b/arch/arm64/include/asm/ring_buffer.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_ARM64_RING_BUFFER_H
+#define _ASM_ARM64_RING_BUFFER_H
+
+#include <asm/cacheflush.h>
+
+/* Flush D-cache on persistent ring buffer */
+#define arch_ring_buffer_flush_range(start, end)	dcache_clean_pop(start, end)
+
+#endif /* _ASM_ARM64_RING_BUFFER_H */
diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild
index 3a5c7f6e5aac..7dca0c6cdc84 100644
--- a/arch/csky/include/asm/Kbuild
+++ b/arch/csky/include/asm/Kbuild
@@ -9,6 +9,7 @@ generic-y += qrwlock.h
 generic-y += qrwlock_types.h
 generic-y += qspinlock.h
 generic-y += parport.h
+generic-y += ring_buffer.h
 generic-y += user.h
 generic-y += vmlinux.lds.h
 generic-y += text-patching.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index 1efa1e993d4b..0f887d4238ed 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -5,4 +5,5 @@ generic-y += extable.h
 generic-y += iomap.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
 generic-y += text-patching.h
diff --git a/arch/loongarch/include/asm/Kbuild b/arch/loongarch/include/asm/Kbuild
index 9034b583a88a..7e92957baf6a 100644
--- a/arch/loongarch/include/asm/Kbuild
+++ b/arch/loongarch/include/asm/Kbuild
@@ -10,5 +10,6 @@ generic-y += qrwlock.h
 generic-y += user.h
 generic-y += ioctl.h
 generic-y += mmzone.h
+generic-y += ring_buffer.h
 generic-y += statfs.h
 generic-y += text-patching.h
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index b282e0dd8dc1..62543bf305ff 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -3,5 +3,6 @@ generated-y += syscall_table.h
 generic-y += extable.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
 generic-y += spinlock.h
 generic-y += text-patching.h
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index 7178f990e8b3..0030309b47ad 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += extable.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += parport.h
+generic-y += ring_buffer.h
 generic-y += syscalls.h
 generic-y += tlb.h
 generic-y += user.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 684569b2ecd6..9771c3d85074 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -12,5 +12,6 @@ generic-y += mcs_spinlock.h
 generic-y += parport.h
 generic-y += qrwlock.h
 generic-y += qspinlock.h
+generic-y += ring_buffer.h
 generic-y += user.h
 generic-y += text-patching.h
diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild
index 28004301c236..0a2530964413 100644
--- a/arch/nios2/include/asm/Kbuild
+++ b/arch/nios2/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += cmpxchg.h
 generic-y += extable.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
 generic-y += spinlock.h
 generic-y += user.h
 generic-y += text-patching.h
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index cef49d60d74c..8aa34621702d 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -8,4 +8,5 @@ generic-y += spinlock_types.h
 generic-y += spinlock.h
 generic-y += qrwlock_types.h
 generic-y += qrwlock.h
+generic-y += ring_buffer.h
 generic-y += user.h
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index 4fb596d94c89..d48d158f7241 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -4,4 +4,5 @@ generated-y += syscall_table_64.h
 generic-y += agp.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
 generic-y += user.h
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 2e23533b67e3..805b5aeebb6f 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -5,4 +5,5 @@ generated-y += syscall_table_spu.h
 generic-y += agp.h
 generic-y += mcs_spinlock.h
 generic-y += qrwlock.h
+generic-y += ring_buffer.h
 generic-y += early_ioremap.h
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index bd5fc9403295..7721b63642f4 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -14,5 +14,6 @@ generic-y += ticket_spinlock.h
 generic-y += qrwlock.h
 generic-y += qrwlock_types.h
 generic-y += qspinlock.h
+generic-y += ring_buffer.h
 generic-y += user.h
 generic-y += vmlinux.lds.h
diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index 80bad7de7a04..0c1fc47c3ba0 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -7,3 +7,4 @@ generated-y += unistd_nr.h
 generic-y += asm-offsets.h
 generic-y += mcs_spinlock.h
 generic-y += mmzone.h
+generic-y += ring_buffer.h
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 4d3f10ed8275..f0403d3ee8ab 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -3,4 +3,5 @@ generated-y += syscall_table.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += parport.h
+generic-y += ring_buffer.h
 generic-y += text-patching.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 17ee8a273aa6..49c6bb326b75 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -4,4 +4,5 @@ generated-y += syscall_table_64.h
 generic-y += agp.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
 generic-y += text-patching.h
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index 1b9b82bbe322..2a1629ba8140 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -17,6 +17,7 @@ generic-y += module.lds.h
 generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
+generic-y += ring_buffer.h
 generic-y += runtime-const.h
 generic-y += softirq_stack.h
 generic-y += switch_to.h
diff --git a/arch/x86/include/asm/Kbuild b/arch/x86/include/asm/Kbuild
index 4566000e15c4..078fd2c0d69d 100644
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -14,3 +14,4 @@ generic-y += early_ioremap.h
 generic-y += fprobe.h
 generic-y += mcs_spinlock.h
 generic-y += mmzone.h
+generic-y += ring_buffer.h
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index 13fe45dea296..e57af619263a 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -6,5 +6,6 @@ generic-y += mcs_spinlock.h
 generic-y += parport.h
 generic-y += qrwlock.h
 generic-y += qspinlock.h
+generic-y += ring_buffer.h
 generic-y += user.h
 generic-y += text-patching.h
diff --git a/include/asm-generic/ring_buffer.h b/include/asm-generic/ring_buffer.h
new file mode 100644
index 000000000000..dbe94f5785f9
--- /dev/null
+++ b/include/asm-generic/ring_buffer.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Generiuc arch dependent ring_buffer macros.
+ */
+#ifndef __ASM_GENERIC_RING_BUFFER_H__
+#define __ASM_GENERIC_RING_BUFFER_H__
+
+#include <linux/cacheflush.h>
+
+/* Flush cache on ring buffer range if needed */
+#define arch_ring_buffer_flush_range(start, end)	flush_cache_vmap(start, end)
+
+#endif /* __ASM_GENERIC_RING_BUFFER_H__ */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index f16f053ef77d..156ed19fb569 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -6,6 +6,7 @@
  */
 #include <linux/sched/isolation.h>
 #include <linux/trace_recursion.h>
+#include <linux/panic_notifier.h>
 #include <linux/trace_events.h>
 #include <linux/ring_buffer.h>
 #include <linux/trace_clock.h>
@@ -30,6 +31,7 @@
 #include <linux/oom.h>
 #include <linux/mm.h>
 
+#include <asm/ring_buffer.h>
 #include <asm/local64.h>
 #include <asm/local.h>
 #include <asm/setup.h>
@@ -589,6 +591,7 @@ struct trace_buffer {
 
 	unsigned long			range_addr_start;
 	unsigned long			range_addr_end;
+	struct notifier_block		flush_nb;
 
 	struct ring_buffer_meta		*meta;
 
@@ -2471,6 +2474,16 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
 	kfree(cpu_buffer);
 }
 
+/* Stop recording on a persistent buffer and flush cache if needed. */
+static int rb_flush_buffer_cb(struct notifier_block *nb, unsigned long event, void *data)
+{
+	struct trace_buffer *buffer = container_of(nb, struct trace_buffer, flush_nb);
+
+	ring_buffer_record_off(buffer);
+	arch_ring_buffer_flush_range(buffer->range_addr_start, buffer->range_addr_end);
+	return NOTIFY_DONE;
+}
+
 static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
 					 int order, unsigned long start,
 					 unsigned long end,
@@ -2590,6 +2603,12 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
 
 	mutex_init(&buffer->mutex);
 
+	/* Persistent ring buffer needs to flush cache before reboot. */
+	if (start & end) {
+		buffer->flush_nb.notifier_call = rb_flush_buffer_cb;
+		atomic_notifier_chain_register(&panic_notifier_list, &buffer->flush_nb);
+	}
+
 	return_ptr(buffer);
 
  fail_free_buffers:
@@ -2677,6 +2696,9 @@ ring_buffer_free(struct trace_buffer *buffer)
 {
 	int cpu;
 
+	if (buffer->range_addr_start && buffer->range_addr_end)
+		atomic_notifier_chain_unregister(&panic_notifier_list, &buffer->flush_nb);
+
 	cpuhp_state_remove_instance(CPUHP_TRACE_RB_PREPARE, &buffer->node);
 
 	irq_work_sync(&buffer->irq_work.work);


^ permalink raw reply related

* [PATCH v8 0/2] ring-buffer: Making persistent ring buffers robust
From: Masami Hiramatsu (Google) @ 2026-03-09  5:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel

Hi,

Here is the 8th version of improvement patches for making persistent
ring buffers robust to failures.
The previous version is here:

https://lore.kernel.org/all/177289358078.248514.14947007976699929481.stgit@devnote2/

In this version, I cleaned up and add comments, remove unused var,
and clear RB_MISSED_* flags when reading recovered subbufs.[1/2]

Thank you,

---

Masami Hiramatsu (Google) (2):
      ring-buffer: Flush and stop persistent ring buffer on panic
      ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer


 arch/alpha/include/asm/Kbuild        |    1 
 arch/arc/include/asm/Kbuild          |    1 
 arch/arm/include/asm/Kbuild          |    1 
 arch/arm64/include/asm/ring_buffer.h |   10 +++
 arch/csky/include/asm/Kbuild         |    1 
 arch/hexagon/include/asm/Kbuild      |    1 
 arch/loongarch/include/asm/Kbuild    |    1 
 arch/m68k/include/asm/Kbuild         |    1 
 arch/microblaze/include/asm/Kbuild   |    1 
 arch/mips/include/asm/Kbuild         |    1 
 arch/nios2/include/asm/Kbuild        |    1 
 arch/openrisc/include/asm/Kbuild     |    1 
 arch/parisc/include/asm/Kbuild       |    1 
 arch/powerpc/include/asm/Kbuild      |    1 
 arch/riscv/include/asm/Kbuild        |    1 
 arch/s390/include/asm/Kbuild         |    1 
 arch/sh/include/asm/Kbuild           |    1 
 arch/sparc/include/asm/Kbuild        |    1 
 arch/um/include/asm/Kbuild           |    1 
 arch/x86/include/asm/Kbuild          |    1 
 arch/xtensa/include/asm/Kbuild       |    1 
 include/asm-generic/ring_buffer.h    |   13 ++++
 kernel/trace/ring_buffer.c           |  102 ++++++++++++++++++++++------------
 23 files changed, 109 insertions(+), 36 deletions(-)
 create mode 100644 arch/arm64/include/asm/ring_buffer.h
 create mode 100644 include/asm-generic/ring_buffer.h

--
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* [syzbot] [trace?] WARNING in ring_buffer_nr_dirty_pages
From: syzbot @ 2026-03-09  3:18 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, mathieu.desnoyers, mhiramat,
	rostedt, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    ecc64d2dc9ff Merge tag 'sysctl-7.00-fixes-rc3' of git://gi..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=109bf0ba580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=163cf0fb07ea84d3
dashboard link: https://syzkaller.appspot.com/bug?extid=855a87cfc76c52ce7a59
compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/23b82e01c157/disk-ecc64d2d.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/e98b1b4abbdf/vmlinux-ecc64d2d.xz
kernel image: https://storage.googleapis.com/syzbot-assets/41e18c482bbb/bzImage-ecc64d2d.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+855a87cfc76c52ce7a59@syzkaller.appspotmail.com

Invalid ELF header magic: != \x7fELF
------------[ cut here ]------------
read > cnt + 1
WARNING: kernel/trace/ring_buffer.c:788 at ring_buffer_nr_dirty_pages+0x207/0x290 kernel/trace/ring_buffer.c:788, CPU#1: syz.1.4985/28848
Modules linked in:
CPU: 1 UID: 0 PID: 28848 Comm: syz.1.4985 Tainted: G     U       L      syzkaller #0 PREEMPT(full) 
Tainted: [U]=USER, [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2026
RIP: 0010:ring_buffer_nr_dirty_pages+0x207/0x290 kernel/trace/ring_buffer.c:788
Code: 83 c3 01 48 89 ee 48 89 df e8 95 24 fc ff 48 39 eb 72 11 31 db eb cc e8 67 2a fc ff 90 0f 0b 90 31 db eb bf e8 5a 2a fc ff 90 <0f> 0b 90 31 db eb b2 4c 89 e7 e8 4a af 67 00 e9 f8 fe ff ff 4c 89
RSP: 0018:ffffc900052875c0 EFLAGS: 00010087
RAX: 000000000000014d RBX: 0000000000000001 RCX: ffffc90007725000
RDX: 0000000000080000 RSI: ffffffff820be6e6 RDI: ffff88805e5f0000
RBP: 0000000000000002 R08: 0000000000000006 R09: 0000000000000001
R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
R13: ffff88813fea5000 R14: ffff88813fe5a4b0 R15: ffff88813fea5000
FS:  00007f8292df66c0(0000) GS:ffff88812444e000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2e2ebff8 CR3: 00000000621de000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 full_hit kernel/trace/ring_buffer.c:810 [inline]
 rb_watermark_hit+0x1f5/0x300 kernel/trace/ring_buffer.c:907
 ring_buffer_poll_wait+0x2b2/0x490 kernel/trace/ring_buffer.c:1064
 trace_poll kernel/trace/trace.c:5895 [inline]
 tracing_buffers_poll+0x1cf/0x250 kernel/trace/trace.c:7841
 vfs_poll include/linux/poll.h:82 [inline]
 select_poll_one fs/select.c:480 [inline]
 do_select+0xd54/0x1850 fs/select.c:536
 core_sys_select+0x55b/0xbb0 fs/select.c:677
 kern_select+0x20c/0x270 fs/select.c:718
 __do_sys_select fs/select.c:725 [inline]
 __se_sys_select fs/select.c:722 [inline]
 __x64_sys_select+0xbd/0x160 fs/select.c:722
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f8294b9c799
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f8292df6028 EFLAGS: 00000246 ORIG_RAX: 0000000000000017
RAX: ffffffffffffffda RBX: 00007f8294e15fa0 RCX: 00007f8294b9c799
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000e
RBP: 00007f8294c32bd9 R08: 0000000000000000 R09: 0000000000000000
R10: 00002000000002c0 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f8294e16038 R14: 00007f8294e15fa0 R15: 00007ffe7dd14a88
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply

* Re: [RFC PATCH 1/2] locking: add mutex_lock_nospin()
From: Yafang Shao @ 2026-03-09  2:34 UTC (permalink / raw)
  To: David Laight, Steven Rostedt
  Cc: Waiman Long, Peter Zijlstra, mingo, will, boqun, mhiramat,
	mark.rutland, mathieu.desnoyers, linux-kernel, linux-trace-kernel,
	bpf
In-Reply-To: <20260306100047.7c4725de@pumpkin>

On Fri, Mar 6, 2026 at 6:00 PM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Fri, 6 Mar 2026 10:22:11 +0800
> Yafang Shao <laoar.shao@gmail.com> wrote:
>
> > On Thu, Mar 5, 2026 at 9:20 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Thu, 5 Mar 2026 13:40:27 +0800
> > > Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > > Exactly. ftrace is intended for debugging and should not significantly
> > > > impact real workloads. Therefore, it's reasonable to make it sleep if
> > > > it cannot acquire the lock immediately, rather than spinning and
> > > > consuming CPU cycles.
> > >
> > > Actually, ftrace is more than just debugging. It is the infrastructure for
> > > live kernel patching as well.
> >
> > good to know.
> >
> > >
> > > >
> > > > >
> > > > > BTW, you should expand the commit log of patch 1 to include the
> > > > > rationale of why we should add this feature to mutex as the information
> > > > > in the cover letter won't get included in the git log if this patch
> > > > > series is merged. You should also elaborate in comment on under what
> > > > > conditions should this this new mutex API be used.
> > > >
> > > > Sure.  I will update it.
> > > >
> > > > BTW, these issues are notably hard to find. I suspect there are other
> > > > locks out there with the same problem.
> > >
> > > As I mentioned, I'm not against the change. I just want to make sure the
> > > rationale is strong enough to make the change.
> > >
> > > One thing that should be modified with your patch is the name. "nospin"
> > > references the implementation of the mutex. Instead it should be called
> > > something like: "noncritical" or "slowpath" stating that the grabbing of
> > > this mutex is not of a critical section.
> > >
> > > Maybe an entirely new interface should be defined:
> > >
> > >
> > > struct slow_mutex;
> >
> > Is it necessary to define a new structure for this slow mutex? We
> > could simply reuse the existing struct mutex instead. Alternatively,
> > should we add some new flags to this slow_mutex for debugging
> > purposes?
> >
> > >
> > > slow_mutex_lock()
> > > slow_mutex_unlock()
> >
> > These two APIs appear sufficient to handle this use case.
>
> Don't semaphores still exist?

While semaphores may present similar challenges, I'm not currently
aware of specific instances that share this exact issue. Should we
encounter any problematic semaphores in production workloads, we can
address them at that time.

> IIRC they always sleep.
>
> Although I wonder if the mutex need to be held for as long at it is.
> ISTR one of the tracebacks was one the 'address to name' lookup,
> that code will be slow.
> Since the mutex can't be held across the multiple reads that are done
> to read the full list of tracepoints it must surely be possible to
> release it across the name lookup?

That's a great point, though I'm not entirely certain at the moment.
Perhaps Steven can provide further insight.

-- 
Regards
Yafang

^ permalink raw reply

* Re: [PATCH v7 2/2] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
From: Steven Rostedt @ 2026-03-09  2:19 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel
In-Reply-To: <20260309095307.19a504c6880407bbf36b2cca@kernel.org>

On Mon, 9 Mar 2026 09:53:07 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> > > 
> > > I just realized that the origin didn't have correct grammar. But we
> > > still check the subbufs, why remove that comment?
> > > 
> > > The original should have said:
> > > 
> > > 	/* Do the meta buffers and subbufs have correct data? */  
> > 
> > I just removed the data check from this loop, so I think this should
> > focus on checking metadata itself. The data is checked later.  
> 
> Other checks in the loop are;
> 
> - the entries in meta::buffers[] are inside correct range.
> - the duplicated entries in the meta::buffers[].
> 
> So this only checks the meta::buffers[] (index array) now.
> 
> /*
>  * Ensure the meta::buffers have correct data. The data in each subbufs are
>  * checked later in rb_meta_validate_events().
>  */
> 
> This will be more clear.

Sure.

-- Steve

^ permalink raw reply

* Re: [PATCH v7 2/2] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
From: Steven Rostedt @ 2026-03-09  2:19 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel
In-Reply-To: <20260309085317.6679cf91151767eff7130cc4@kernel.org>

On Mon, 9 Mar 2026 08:53:17 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> > > @@ -1827,11 +1833,6 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
> > >  			return false;
> > >  		}
> > >  
> > > -		if ((unsigned)local_read(&subbuf->commit) > subbuf_size) {
> > > -			pr_info("Ring buffer boot meta [%d] buffer invalid commit\n", cpu);
> > > -			return false;
> > > -		}  
> > 
> > This should still be checked, although it doesn't need to fail the loop
> > but instead continue to the next buffer.  
> 
> We already have another check of the data in the loop in
> rb_meta_validate_events() so data corruption should be
> handled there.

Hmm, OK.

> 
> > 
> > Also, I mentioned that if the commit == RB_MISSED_EVENTS, then we know
> > the sub buffer was corrupted and should be skipped.  
> 
> Yes, if RB_MISSED_EVENTS bit is set, the commit field is out of range.
> That is checked in rb_validate_buffer().
> 
> > 
> > And honestly, the commit should never be greater than the subbuf_size,
> > even if corrupted. As we are only worried about corruption due to cache
> > not writing out. That should not corrupt the commit size (now we can
> > ignore the flags and use page size instead).  
> 
> Hmm, but if the kernel crash and reboot when it sets RB_MISSED_EVENTS,
> we will see the bit is set and commit size is different. 

The RB_MISSED_EVENTS is only set on the reader page.

If the kernel crashes no boot up while reading the validated buffer,
then that's a bit more than what this is supposed to handle.

> 
> Note, I think the reader_page RB_MISSED_EVENTS flag is not cleared after
> read. commit ca296d32ece3 ("tracing: ring_buffer: Rewind persistent
> ring buffer on reboot") drops clearing commit field for unwinding the
> buffer.

But that should be fine, as it's only read only. Once tracing is
started, it should be reset.

-- Steve

^ permalink raw reply

* Re: [PATCH v13 00/32] Tracefs support for pKVM
From: Steven Rostedt @ 2026-03-09  1:45 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: mhiramat, mathieu.desnoyers, linux-trace-kernel, maz,
	oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, kvmarm,
	linux-arm-kernel, jstultz, qperret, will, aneesh.kumar,
	kernel-team, linux-kernel
In-Reply-To: <20260306143536.339777-1-vdonnefort@google.com>

Hi Vincent,

Unfortunately, this doesn't apply cleanly to 7.0-rc3. Can you rebase on that?

Thanks,

-- Steve

^ permalink raw reply

* Re: [PATCH v7 2/2] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
From: Masami Hiramatsu @ 2026-03-09  0:53 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel
In-Reply-To: <20260309085317.6679cf91151767eff7130cc4@kernel.org>

On Mon, 9 Mar 2026 08:53:17 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> On Sat, 7 Mar 2026 10:27:11 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Sat,  7 Mar 2026 23:26:38 +0900
> > "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> > 
> > >  kernel/trace/ring_buffer.c |   63 +++++++++++++++++++++++---------------------
> > >  1 file changed, 33 insertions(+), 30 deletions(-)
> > > 
> > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > > index b6f3ac99834f..8599de5cf59b 100644
> > > --- a/kernel/trace/ring_buffer.c
> > > +++ b/kernel/trace/ring_buffer.c
> > > @@ -396,6 +396,12 @@ static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage)
> > >  	return local_read(&bpage->page->commit);
> > >  }
> > >  
> > > +/* Size is determined by what has been committed */
> > > +static __always_inline unsigned int rb_page_size(struct buffer_page *bpage)
> > > +{
> > > +	return rb_page_commit(bpage) & ~RB_MISSED_MASK;
> > > +}
> > > +
> > >  static void free_buffer_page(struct buffer_page *bpage)
> > >  {
> > >  	/* Range pages are not to be freed */
> > > @@ -1819,7 +1825,7 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
> > >  
> > >  	bitmap_clear(subbuf_mask, 0, meta->nr_subbufs);
> > >  
> > > -	/* Is the meta buffers and the subbufs themselves have correct data? */
> > > +	/* Is the meta buffers themselves have correct data? */
> > 
> > I just realized that the origin didn't have correct grammar. But we
> > still check the subbufs, why remove that comment?
> > 
> > The original should have said:
> > 
> > 	/* Do the meta buffers and subbufs have correct data? */
> 
> I just removed the data check from this loop, so I think this should
> focus on checking metadata itself. The data is checked later.

Other checks in the loop are;

- the entries in meta::buffers[] are inside correct range.
- the duplicated entries in the meta::buffers[].

So this only checks the meta::buffers[] (index array) now.

/*
 * Ensure the meta::buffers have correct data. The data in each subbufs are
 * checked later in rb_meta_validate_events().
 */

This will be more clear.

> 
> > 
> > >  	for (i = 0; i < meta->nr_subbufs; i++) {
> > >  		if (meta->buffers[i] < 0 ||
> > >  		    meta->buffers[i] >= meta->nr_subbufs) {
> > > @@ -1827,11 +1833,6 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
> > >  			return false;
> > >  		}
> > >  
> > > -		if ((unsigned)local_read(&subbuf->commit) > subbuf_size) {
> > > -			pr_info("Ring buffer boot meta [%d] buffer invalid commit\n", cpu);
> > > -			return false;
> > > -		}
> > 
> > This should still be checked, although it doesn't need to fail the loop
> > but instead continue to the next buffer.
> 
> We already have another check of the data in the loop in
> rb_meta_validate_events() so data corruption should be
> handled there.
> 
> > 
> > Also, I mentioned that if the commit == RB_MISSED_EVENTS, then we know
> > the sub buffer was corrupted and should be skipped.
> 
> Yes, if RB_MISSED_EVENTS bit is set, the commit field is out of range.
> That is checked in rb_validate_buffer().
> 
> > 
> > And honestly, the commit should never be greater than the subbuf_size,
> > even if corrupted. As we are only worried about corruption due to cache
> > not writing out. That should not corrupt the commit size (now we can
> > ignore the flags and use page size instead).
> 
> Hmm, but if the kernel crash and reboot when it sets RB_MISSED_EVENTS,
> we will see the bit is set and commit size is different. 
> 
> Note, I think the reader_page RB_MISSED_EVENTS flag is not cleared after
> read. commit ca296d32ece3 ("tracing: ring_buffer: Rewind persistent
> ring buffer on reboot") drops clearing commit field for unwinding the
> buffer.
> 
> @@ -5342,7 +5440,6 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
>          */
>         local_set(&cpu_buffer->reader_page->write, 0);
>         local_set(&cpu_buffer->reader_page->entries, 0);
> -       local_set(&cpu_buffer->reader_page->page->commit, 0);
>         cpu_buffer->reader_page->real_end = 0;
>  
> Should we clear the RB_MISSED_* bits here?

Ah, no. ignore this. If there is a sudden reboot, the broken
commit will be there anyway. But we can recover it.

Thank you,

> 
> Thanks,
> 
> > 
> > So, perhaps we should invalidate the entire buffer if the commit part
> > is corrupted, as that is a major corruption.
> > 
> > -- Steve
> > 
> 
> 
> -- 
> Masami Hiramatsu (Google) <mhiramat@kernel.org>


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH v7 2/2] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
From: Masami Hiramatsu @ 2026-03-08 23:53 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel
In-Reply-To: <20260307102711.50932648@robin>

On Sat, 7 Mar 2026 10:27:11 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Sat,  7 Mar 2026 23:26:38 +0900
> "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> 
> >  kernel/trace/ring_buffer.c |   63 +++++++++++++++++++++++---------------------
> >  1 file changed, 33 insertions(+), 30 deletions(-)
> > 
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index b6f3ac99834f..8599de5cf59b 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -396,6 +396,12 @@ static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage)
> >  	return local_read(&bpage->page->commit);
> >  }
> >  
> > +/* Size is determined by what has been committed */
> > +static __always_inline unsigned int rb_page_size(struct buffer_page *bpage)
> > +{
> > +	return rb_page_commit(bpage) & ~RB_MISSED_MASK;
> > +}
> > +
> >  static void free_buffer_page(struct buffer_page *bpage)
> >  {
> >  	/* Range pages are not to be freed */
> > @@ -1819,7 +1825,7 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
> >  
> >  	bitmap_clear(subbuf_mask, 0, meta->nr_subbufs);
> >  
> > -	/* Is the meta buffers and the subbufs themselves have correct data? */
> > +	/* Is the meta buffers themselves have correct data? */
> 
> I just realized that the origin didn't have correct grammar. But we
> still check the subbufs, why remove that comment?
> 
> The original should have said:
> 
> 	/* Do the meta buffers and subbufs have correct data? */

I just removed the data check from this loop, so I think this should
focus on checking metadata itself. The data is checked later.

> 
> >  	for (i = 0; i < meta->nr_subbufs; i++) {
> >  		if (meta->buffers[i] < 0 ||
> >  		    meta->buffers[i] >= meta->nr_subbufs) {
> > @@ -1827,11 +1833,6 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
> >  			return false;
> >  		}
> >  
> > -		if ((unsigned)local_read(&subbuf->commit) > subbuf_size) {
> > -			pr_info("Ring buffer boot meta [%d] buffer invalid commit\n", cpu);
> > -			return false;
> > -		}
> 
> This should still be checked, although it doesn't need to fail the loop
> but instead continue to the next buffer.

We already have another check of the data in the loop in
rb_meta_validate_events() so data corruption should be
handled there.

> 
> Also, I mentioned that if the commit == RB_MISSED_EVENTS, then we know
> the sub buffer was corrupted and should be skipped.

Yes, if RB_MISSED_EVENTS bit is set, the commit field is out of range.
That is checked in rb_validate_buffer().

> 
> And honestly, the commit should never be greater than the subbuf_size,
> even if corrupted. As we are only worried about corruption due to cache
> not writing out. That should not corrupt the commit size (now we can
> ignore the flags and use page size instead).

Hmm, but if the kernel crash and reboot when it sets RB_MISSED_EVENTS,
we will see the bit is set and commit size is different. 

Note, I think the reader_page RB_MISSED_EVENTS flag is not cleared after
read. commit ca296d32ece3 ("tracing: ring_buffer: Rewind persistent
ring buffer on reboot") drops clearing commit field for unwinding the
buffer.

@@ -5342,7 +5440,6 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
         */
        local_set(&cpu_buffer->reader_page->write, 0);
        local_set(&cpu_buffer->reader_page->entries, 0);
-       local_set(&cpu_buffer->reader_page->page->commit, 0);
        cpu_buffer->reader_page->real_end = 0;
 
Should we clear the RB_MISSED_* bits here?

Thanks,

> 
> So, perhaps we should invalidate the entire buffer if the commit part
> is corrupted, as that is a major corruption.
> 
> -- Steve
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH v7 2/2] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
From: Steven Rostedt @ 2026-03-07 15:27 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel
In-Reply-To: <177289359843.248514.164858607457269337.stgit@devnote2>

On Sat,  7 Mar 2026 23:26:38 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:

>  kernel/trace/ring_buffer.c |   63 +++++++++++++++++++++++---------------------
>  1 file changed, 33 insertions(+), 30 deletions(-)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index b6f3ac99834f..8599de5cf59b 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -396,6 +396,12 @@ static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage)
>  	return local_read(&bpage->page->commit);
>  }
>  
> +/* Size is determined by what has been committed */
> +static __always_inline unsigned int rb_page_size(struct buffer_page *bpage)
> +{
> +	return rb_page_commit(bpage) & ~RB_MISSED_MASK;
> +}
> +
>  static void free_buffer_page(struct buffer_page *bpage)
>  {
>  	/* Range pages are not to be freed */
> @@ -1819,7 +1825,7 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
>  
>  	bitmap_clear(subbuf_mask, 0, meta->nr_subbufs);
>  
> -	/* Is the meta buffers and the subbufs themselves have correct data? */
> +	/* Is the meta buffers themselves have correct data? */

I just realized that the origin didn't have correct grammar. But we
still check the subbufs, why remove that comment?

The original should have said:

	/* Do the meta buffers and subbufs have correct data? */

>  	for (i = 0; i < meta->nr_subbufs; i++) {
>  		if (meta->buffers[i] < 0 ||
>  		    meta->buffers[i] >= meta->nr_subbufs) {
> @@ -1827,11 +1833,6 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
>  			return false;
>  		}
>  
> -		if ((unsigned)local_read(&subbuf->commit) > subbuf_size) {
> -			pr_info("Ring buffer boot meta [%d] buffer invalid commit\n", cpu);
> -			return false;
> -		}

This should still be checked, although it doesn't need to fail the loop
but instead continue to the next buffer.

Also, I mentioned that if the commit == RB_MISSED_EVENTS, then we know
the sub buffer was corrupted and should be skipped.

And honestly, the commit should never be greater than the subbuf_size,
even if corrupted. As we are only worried about corruption due to cache
not writing out. That should not corrupt the commit size (now we can
ignore the flags and use page size instead).

So, perhaps we should invalidate the entire buffer if the commit part
is corrupted, as that is a major corruption.

-- Steve

^ permalink raw reply

* [PATCH v7 2/2] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
From: Masami Hiramatsu (Google) @ 2026-03-07 14:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel
In-Reply-To: <177289358078.248514.14947007976699929481.stgit@devnote2>

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Skip invalid sub-buffers when validating the persistent ring buffer
instead of discarding the entire ring buffer. Only skipped buffers
are invalidated (cleared).

If the cache data in memory fails to be synchronized during a reboot,
the persistent ring buffer may become partially corrupted, but other
sub-buffers may still contain readable event data. Only discard the
subbuffersa that ar found to be corrupted.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
  Changes in v7:
  - Combined with Handling RB_MISSED_* flags patch, focus on validation at boot.
  - Remove checking subbuffer data when validating metadata, because it should be done
    later.
  - Do not mark the discarded sub buffer page but just reset it.
  Changes in v6:
  - Show invalid page detection message once per CPU.
  Changes in v5:
  - Instead of showing errors for each page, just show the number
    of discarded pages at last.
  Changes in v3:
  - Record missed data event on commit.
---
 kernel/trace/ring_buffer.c |   63 +++++++++++++++++++++++---------------------
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index b6f3ac99834f..8599de5cf59b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -396,6 +396,12 @@ static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage)
 	return local_read(&bpage->page->commit);
 }
 
+/* Size is determined by what has been committed */
+static __always_inline unsigned int rb_page_size(struct buffer_page *bpage)
+{
+	return rb_page_commit(bpage) & ~RB_MISSED_MASK;
+}
+
 static void free_buffer_page(struct buffer_page *bpage)
 {
 	/* Range pages are not to be freed */
@@ -1819,7 +1825,7 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
 
 	bitmap_clear(subbuf_mask, 0, meta->nr_subbufs);
 
-	/* Is the meta buffers and the subbufs themselves have correct data? */
+	/* Is the meta buffers themselves have correct data? */
 	for (i = 0; i < meta->nr_subbufs; i++) {
 		if (meta->buffers[i] < 0 ||
 		    meta->buffers[i] >= meta->nr_subbufs) {
@@ -1827,11 +1833,6 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
 			return false;
 		}
 
-		if ((unsigned)local_read(&subbuf->commit) > subbuf_size) {
-			pr_info("Ring buffer boot meta [%d] buffer invalid commit\n", cpu);
-			return false;
-		}
-
 		if (test_bit(meta->buffers[i], subbuf_mask)) {
 			pr_info("Ring buffer boot meta [%d] array has duplicates\n", cpu);
 			return false;
@@ -1902,13 +1903,16 @@ static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu
 	return events;
 }
 
-static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu)
+static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu,
+			      struct ring_buffer_cpu_meta *meta)
 {
 	unsigned long long ts;
 	u64 delta;
 	int tail;
 
 	tail = local_read(&dpage->commit);
+	if (tail <= 0 || tail > meta->subbuf_size)
+		return -1;
 	return rb_read_data_buffer(dpage, tail, cpu, &ts, &delta);
 }
 
@@ -1919,6 +1923,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 	struct buffer_page *head_page, *orig_head;
 	unsigned long entry_bytes = 0;
 	unsigned long entries = 0;
+	int discarded = 0;
 	int ret;
 	u64 ts;
 	int i;
@@ -1929,13 +1934,13 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 	orig_head = head_page = cpu_buffer->head_page;
 
 	/* Do the reader page first */
-	ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu);
+	ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu, meta);
 	if (ret < 0) {
 		pr_info("Ring buffer reader page is invalid\n");
 		goto invalid;
 	}
 	entries += ret;
-	entry_bytes += local_read(&cpu_buffer->reader_page->page->commit);
+	entry_bytes += rb_page_size(cpu_buffer->reader_page);
 	local_set(&cpu_buffer->reader_page->entries, ret);
 
 	ts = head_page->page->time_stamp;
@@ -1964,7 +1969,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 			break;
 
 		/* Stop rewind if the page is invalid. */
-		ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu);
+		ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
 		if (ret < 0)
 			break;
 
@@ -2043,21 +2048,24 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 		if (head_page == cpu_buffer->reader_page)
 			continue;
 
-		ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu);
+		ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
 		if (ret < 0) {
-			pr_info("Ring buffer meta [%d] invalid buffer page\n",
-				cpu_buffer->cpu);
-			goto invalid;
-		}
-
-		/* If the buffer has content, update pages_touched */
-		if (ret)
-			local_inc(&cpu_buffer->pages_touched);
-
-		entries += ret;
-		entry_bytes += local_read(&head_page->page->commit);
-		local_set(&cpu_buffer->head_page->entries, ret);
+			if (!discarded)
+				pr_info("Ring buffer meta [%d] invalid buffer page detected\n",
+					cpu_buffer->cpu);
+			discarded++;
+			/* Instead of discard whole ring buffer, discard only this sub-buffer. */
+			local_set(&head_page->entries, 0);
+			local_set(&head_page->page->commit, RB_MISSED_EVENTS);
+		} else {
+			/* If the buffer has content, update pages_touched */
+			if (ret)
+				local_inc(&cpu_buffer->pages_touched);
 
+			entries += ret;
+			entry_bytes += rb_page_size(head_page);
+			local_set(&cpu_buffer->head_page->entries, ret);
+		}
 		if (head_page == cpu_buffer->commit_page)
 			break;
 	}
@@ -2071,7 +2079,8 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 	local_set(&cpu_buffer->entries, entries);
 	local_set(&cpu_buffer->entries_bytes, entry_bytes);
 
-	pr_info("Ring buffer meta [%d] is from previous boot!\n", cpu_buffer->cpu);
+	pr_info("Ring buffer meta [%d] is from previous boot! (%d pages discarded)\n",
+		cpu_buffer->cpu, discarded);
 	return;
 
  invalid:
@@ -3258,12 +3267,6 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
 	return NULL;
 }
 
-/* Size is determined by what has been committed */
-static __always_inline unsigned rb_page_size(struct buffer_page *bpage)
-{
-	return rb_page_commit(bpage) & ~RB_MISSED_MASK;
-}
-
 static __always_inline unsigned
 rb_commit_index(struct ring_buffer_per_cpu *cpu_buffer)
 {


^ permalink raw reply related

* [PATCH v7 1/2] ring-buffer: Flush and stop persistent ring buffer on panic
From: Masami Hiramatsu (Google) @ 2026-03-07 14:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel
In-Reply-To: <177289358078.248514.14947007976699929481.stgit@devnote2>

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

On real hardware, panic and machine reboot may not flush hardware cache
to memory. This means the persistent ring buffer, which relies on a
coherent state of memory, may not have its events written to the buffer
and they may be lost. Moreover, there may be inconsistency with the
counters which are used for validation of the integrity of the
persistent ring buffer which may cause all data to be discarded.

To avoid this issue, stop recording of the ring buffer on panic and
flush the cache of the ring buffer's memory.

Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 arch/alpha/include/asm/Kbuild        |    1 +
 arch/arc/include/asm/Kbuild          |    1 +
 arch/arm/include/asm/Kbuild          |    1 +
 arch/arm64/include/asm/ring_buffer.h |   10 ++++++++++
 arch/csky/include/asm/Kbuild         |    1 +
 arch/hexagon/include/asm/Kbuild      |    1 +
 arch/loongarch/include/asm/Kbuild    |    1 +
 arch/m68k/include/asm/Kbuild         |    1 +
 arch/microblaze/include/asm/Kbuild   |    1 +
 arch/mips/include/asm/Kbuild         |    1 +
 arch/nios2/include/asm/Kbuild        |    1 +
 arch/openrisc/include/asm/Kbuild     |    1 +
 arch/parisc/include/asm/Kbuild       |    1 +
 arch/powerpc/include/asm/Kbuild      |    1 +
 arch/riscv/include/asm/Kbuild        |    1 +
 arch/s390/include/asm/Kbuild         |    1 +
 arch/sh/include/asm/Kbuild           |    1 +
 arch/sparc/include/asm/Kbuild        |    1 +
 arch/um/include/asm/Kbuild           |    1 +
 arch/x86/include/asm/Kbuild          |    1 +
 arch/xtensa/include/asm/Kbuild       |    1 +
 include/asm-generic/ring_buffer.h    |   13 +++++++++++++
 kernel/trace/ring_buffer.c           |   22 ++++++++++++++++++++++
 23 files changed, 65 insertions(+)
 create mode 100644 arch/arm64/include/asm/ring_buffer.h
 create mode 100644 include/asm-generic/ring_buffer.h

diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index 483965c5a4de..b154b4e3dfa8 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -5,4 +5,5 @@ generic-y += agp.h
 generic-y += asm-offsets.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
 generic-y += text-patching.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 4c69522e0328..483caacc6988 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -5,5 +5,6 @@ generic-y += extable.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += parport.h
+generic-y += ring_buffer.h
 generic-y += user.h
 generic-y += text-patching.h
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 03657ff8fbe3..decad5f2c826 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -3,6 +3,7 @@ generic-y += early_ioremap.h
 generic-y += extable.h
 generic-y += flat.h
 generic-y += parport.h
+generic-y += ring_buffer.h
 
 generated-y += mach-types.h
 generated-y += unistd-nr.h
diff --git a/arch/arm64/include/asm/ring_buffer.h b/arch/arm64/include/asm/ring_buffer.h
new file mode 100644
index 000000000000..62316c406888
--- /dev/null
+++ b/arch/arm64/include/asm/ring_buffer.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_ARM64_RING_BUFFER_H
+#define _ASM_ARM64_RING_BUFFER_H
+
+#include <asm/cacheflush.h>
+
+/* Flush D-cache on persistent ring buffer */
+#define arch_ring_buffer_flush_range(start, end)	dcache_clean_pop(start, end)
+
+#endif /* _ASM_ARM64_RING_BUFFER_H */
diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild
index 3a5c7f6e5aac..7dca0c6cdc84 100644
--- a/arch/csky/include/asm/Kbuild
+++ b/arch/csky/include/asm/Kbuild
@@ -9,6 +9,7 @@ generic-y += qrwlock.h
 generic-y += qrwlock_types.h
 generic-y += qspinlock.h
 generic-y += parport.h
+generic-y += ring_buffer.h
 generic-y += user.h
 generic-y += vmlinux.lds.h
 generic-y += text-patching.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index 1efa1e993d4b..0f887d4238ed 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -5,4 +5,5 @@ generic-y += extable.h
 generic-y += iomap.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
 generic-y += text-patching.h
diff --git a/arch/loongarch/include/asm/Kbuild b/arch/loongarch/include/asm/Kbuild
index 9034b583a88a..7e92957baf6a 100644
--- a/arch/loongarch/include/asm/Kbuild
+++ b/arch/loongarch/include/asm/Kbuild
@@ -10,5 +10,6 @@ generic-y += qrwlock.h
 generic-y += user.h
 generic-y += ioctl.h
 generic-y += mmzone.h
+generic-y += ring_buffer.h
 generic-y += statfs.h
 generic-y += text-patching.h
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index b282e0dd8dc1..62543bf305ff 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -3,5 +3,6 @@ generated-y += syscall_table.h
 generic-y += extable.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
 generic-y += spinlock.h
 generic-y += text-patching.h
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index 7178f990e8b3..0030309b47ad 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += extable.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += parport.h
+generic-y += ring_buffer.h
 generic-y += syscalls.h
 generic-y += tlb.h
 generic-y += user.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 684569b2ecd6..9771c3d85074 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -12,5 +12,6 @@ generic-y += mcs_spinlock.h
 generic-y += parport.h
 generic-y += qrwlock.h
 generic-y += qspinlock.h
+generic-y += ring_buffer.h
 generic-y += user.h
 generic-y += text-patching.h
diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild
index 28004301c236..0a2530964413 100644
--- a/arch/nios2/include/asm/Kbuild
+++ b/arch/nios2/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += cmpxchg.h
 generic-y += extable.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
 generic-y += spinlock.h
 generic-y += user.h
 generic-y += text-patching.h
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index cef49d60d74c..8aa34621702d 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -8,4 +8,5 @@ generic-y += spinlock_types.h
 generic-y += spinlock.h
 generic-y += qrwlock_types.h
 generic-y += qrwlock.h
+generic-y += ring_buffer.h
 generic-y += user.h
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index 4fb596d94c89..d48d158f7241 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -4,4 +4,5 @@ generated-y += syscall_table_64.h
 generic-y += agp.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
 generic-y += user.h
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 2e23533b67e3..805b5aeebb6f 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -5,4 +5,5 @@ generated-y += syscall_table_spu.h
 generic-y += agp.h
 generic-y += mcs_spinlock.h
 generic-y += qrwlock.h
+generic-y += ring_buffer.h
 generic-y += early_ioremap.h
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index bd5fc9403295..7721b63642f4 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -14,5 +14,6 @@ generic-y += ticket_spinlock.h
 generic-y += qrwlock.h
 generic-y += qrwlock_types.h
 generic-y += qspinlock.h
+generic-y += ring_buffer.h
 generic-y += user.h
 generic-y += vmlinux.lds.h
diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index 80bad7de7a04..0c1fc47c3ba0 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -7,3 +7,4 @@ generated-y += unistd_nr.h
 generic-y += asm-offsets.h
 generic-y += mcs_spinlock.h
 generic-y += mmzone.h
+generic-y += ring_buffer.h
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 4d3f10ed8275..f0403d3ee8ab 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -3,4 +3,5 @@ generated-y += syscall_table.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += parport.h
+generic-y += ring_buffer.h
 generic-y += text-patching.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 17ee8a273aa6..49c6bb326b75 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -4,4 +4,5 @@ generated-y += syscall_table_64.h
 generic-y += agp.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
 generic-y += text-patching.h
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index 1b9b82bbe322..2a1629ba8140 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -17,6 +17,7 @@ generic-y += module.lds.h
 generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
+generic-y += ring_buffer.h
 generic-y += runtime-const.h
 generic-y += softirq_stack.h
 generic-y += switch_to.h
diff --git a/arch/x86/include/asm/Kbuild b/arch/x86/include/asm/Kbuild
index 4566000e15c4..078fd2c0d69d 100644
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -14,3 +14,4 @@ generic-y += early_ioremap.h
 generic-y += fprobe.h
 generic-y += mcs_spinlock.h
 generic-y += mmzone.h
+generic-y += ring_buffer.h
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index 13fe45dea296..e57af619263a 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -6,5 +6,6 @@ generic-y += mcs_spinlock.h
 generic-y += parport.h
 generic-y += qrwlock.h
 generic-y += qspinlock.h
+generic-y += ring_buffer.h
 generic-y += user.h
 generic-y += text-patching.h
diff --git a/include/asm-generic/ring_buffer.h b/include/asm-generic/ring_buffer.h
new file mode 100644
index 000000000000..dbe94f5785f9
--- /dev/null
+++ b/include/asm-generic/ring_buffer.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Generiuc arch dependent ring_buffer macros.
+ */
+#ifndef __ASM_GENERIC_RING_BUFFER_H__
+#define __ASM_GENERIC_RING_BUFFER_H__
+
+#include <linux/cacheflush.h>
+
+/* Flush cache on ring buffer range if needed */
+#define arch_ring_buffer_flush_range(start, end)	flush_cache_vmap(start, end)
+
+#endif /* __ASM_GENERIC_RING_BUFFER_H__ */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 17d0ea0cc3e6..b6f3ac99834f 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -6,6 +6,7 @@
  */
 #include <linux/sched/isolation.h>
 #include <linux/trace_recursion.h>
+#include <linux/panic_notifier.h>
 #include <linux/trace_events.h>
 #include <linux/ring_buffer.h>
 #include <linux/trace_clock.h>
@@ -30,6 +31,7 @@
 #include <linux/oom.h>
 #include <linux/mm.h>
 
+#include <asm/ring_buffer.h>
 #include <asm/local64.h>
 #include <asm/local.h>
 #include <asm/setup.h>
@@ -589,6 +591,7 @@ struct trace_buffer {
 
 	unsigned long			range_addr_start;
 	unsigned long			range_addr_end;
+	struct notifier_block		flush_nb;
 
 	struct ring_buffer_meta		*meta;
 
@@ -2471,6 +2474,16 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
 	kfree(cpu_buffer);
 }
 
+/* Stop recording on a persistent buffer and flush cache if needed. */
+static int rb_flush_buffer_cb(struct notifier_block *nb, unsigned long event, void *data)
+{
+	struct trace_buffer *buffer = container_of(nb, struct trace_buffer, flush_nb);
+
+	ring_buffer_record_off(buffer);
+	arch_ring_buffer_flush_range(buffer->range_addr_start, buffer->range_addr_end);
+	return NOTIFY_DONE;
+}
+
 static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
 					 int order, unsigned long start,
 					 unsigned long end,
@@ -2590,6 +2603,12 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
 
 	mutex_init(&buffer->mutex);
 
+	/* Persistent ring buffer needs to flush cache before reboot. */
+	if (start & end) {
+		buffer->flush_nb.notifier_call = rb_flush_buffer_cb;
+		atomic_notifier_chain_register(&panic_notifier_list, &buffer->flush_nb);
+	}
+
 	return_ptr(buffer);
 
  fail_free_buffers:
@@ -2677,6 +2696,9 @@ ring_buffer_free(struct trace_buffer *buffer)
 {
 	int cpu;
 
+	if (buffer->range_addr_start && buffer->range_addr_end)
+		atomic_notifier_chain_unregister(&panic_notifier_list, &buffer->flush_nb);
+
 	cpuhp_state_remove_instance(CPUHP_TRACE_RB_PREPARE, &buffer->node);
 
 	irq_work_sync(&buffer->irq_work.work);


^ permalink raw reply related

* [PATCH v7 0/2] ring-buffer: Making persistent ring buffers robust
From: Masami Hiramatsu (Google) @ 2026-03-07 14:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel

Hi,

Here is the 7th version of improvement patches for making persistent
ring buffers robust to failures. The previous version is here:

https://lore.kernel.org/all/177218401821.1988514.5579163042147205021.stgit@mhiramat.tok.corp.google.com/

In this version, I dropped Handle RB_MISSED_* flags patch and partially
include required change to the last one[2/2], it also removes a redundant
subbuffer data check in per-cpu metadata validation and do NOT mark
RB_MISSED_EVENTS bit on discarded pages.

Thank you,

---

Masami Hiramatsu (Google) (2):
      ring-buffer: Flush and stop persistent ring buffer on panic
      ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer


 arch/alpha/include/asm/Kbuild        |    1 
 arch/arc/include/asm/Kbuild          |    1 
 arch/arm/include/asm/Kbuild          |    1 
 arch/arm64/include/asm/ring_buffer.h |   10 ++++
 arch/csky/include/asm/Kbuild         |    1 
 arch/hexagon/include/asm/Kbuild      |    1 
 arch/loongarch/include/asm/Kbuild    |    1 
 arch/m68k/include/asm/Kbuild         |    1 
 arch/microblaze/include/asm/Kbuild   |    1 
 arch/mips/include/asm/Kbuild         |    1 
 arch/nios2/include/asm/Kbuild        |    1 
 arch/openrisc/include/asm/Kbuild     |    1 
 arch/parisc/include/asm/Kbuild       |    1 
 arch/powerpc/include/asm/Kbuild      |    1 
 arch/riscv/include/asm/Kbuild        |    1 
 arch/s390/include/asm/Kbuild         |    1 
 arch/sh/include/asm/Kbuild           |    1 
 arch/sparc/include/asm/Kbuild        |    1 
 arch/um/include/asm/Kbuild           |    1 
 arch/x86/include/asm/Kbuild          |    1 
 arch/xtensa/include/asm/Kbuild       |    1 
 include/asm-generic/ring_buffer.h    |   13 +++++
 kernel/trace/ring_buffer.c           |   85 ++++++++++++++++++++++------------
 23 files changed, 98 insertions(+), 30 deletions(-)
 create mode 100644 arch/arm64/include/asm/ring_buffer.h
 create mode 100644 include/asm-generic/ring_buffer.h

--
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* [PATCH] lib/bootconfig: fix typo "budy" in _xbc_exit() comment
From: Josh Law @ 2026-03-07 12:26 UTC (permalink / raw)
  To: mhiramat, akpm; +Cc: linux-kernel, linux-trace-kernel

From: Josh Law <objecting@objecting.org>

Fix a spelling error in the _xbc_exit() kernel-doc comment where
"budy system" was written instead of "buddy system".

Signed-off-by: Josh Law <objecting@objecting.org>
---
 lib/bootconfig.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index 449369a60846..2bcd5c2aa87e 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -911,7 +911,7 @@ static int __init xbc_parse_tree(void)
 
 /**
  * _xbc_exit() - Clean up all parsed bootconfig
- * @early: Set true if this is called before budy system is initialized.
+ * @early: Set true if this is called before buddy system is initialized.
  *
  * This clears all data structures of parsed bootconfig on memory.
  * If you need to reuse xbc_init() with new boot config, you can
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH net-next v3 01/13] devlink: expose devlink instance index over netlink
From: Jiri Pirko @ 2026-03-07  7:52 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, edumazet, pabeni, horms, donald.hunter, corbet,
	skhan, saeedm, leon, tariqt, mbloch, przemyslaw.kitszel, mschmidt,
	andrew+netdev, rostedt, mhiramat, mathieu.desnoyers, chuck.lever,
	matttbe, cjubran, daniel.zahka, linux-doc, linux-rdma,
	linux-trace-kernel
In-Reply-To: <20260306193253.6d7d2383@kernel.org>

Sat, Mar 07, 2026 at 04:32:53AM +0100, kuba@kernel.org wrote:
>On Wed,  4 Mar 2026 17:00:10 +0100 Jiri Pirko wrote:
>> +      -
>> +        name: index
>> +        type: uint
>> +        doc: Unique devlink instance index.
>
>AI complains on patch 6 that the index is truncated because it's saved
>to a u32. Let's add:
>
>        checks:
>           max: u32-max
>
>here and the policy will take care of the check, you can then remove
>the explicit checks too

Okay. Thanks!

>-- 
>pw-bot: cr

^ permalink raw reply

* Re: [PATCH 09/12] dt-bindings: input: Document hid-over-spi DT schema
From: Val Packett @ 2026-03-07  7:25 UTC (permalink / raw)
  To: Jingyuan Liang, Jiri Kosina, Benjamin Tissoires, Jonathan Corbet,
	Mark Brown, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Dmitry Torokhov, Rob Herring, Krzysztof Kozlowski, Conor Dooley
  Cc: linux-input, linux-doc, linux-kernel, linux-spi,
	linux-trace-kernel, devicetree, hbarnor, Dmitry Antipov,
	Jarrett Schultz
In-Reply-To: <20260303-send-upstream-v1-9-1515ba218f3d@chromium.org>


On 3/3/26 3:13 AM, Jingyuan Liang wrote:
> Documentation describes the required and optional properties for
> implementing Device Tree for a Microsoft G6 Touch Digitizer that
> supports HID over SPI Protocol 1.0 specification.
> […]
> +properties:
> +  compatible:
> +    oneOf:
> +      - items:
> +          - enum:
> +              - microsoft,g6-touch-digitizer
> +          - const: hid-over-spi
> +      - description: Just "hid-over-spi" alone is allowed, but not recommended.
> […]
> +required:
> +  - compatible
> +  - interrupts
> +  - reset-gpios

Why is reset required? Is it so implausible on some device implementing 
the spec there wouldn't be a reset gpio?

> +  - vdd-supply
Linux makes up a dummy regulator if DT doesn't provide one, so can 
regulators even be required?
> […]
> +        compatible = "hid-over-spi";
Not following your own recommendation from above :)
> +        reg = <0x0>;
> +        interrupts-extended = <&gpio 42 IRQ_TYPE_EDGE_FALLING>;
> +        reset-gpios = <&gpio 27 GPIO_ACTIVE_LOW>;
> +        vdd-supply = <&pm8350c_l3>;
> +        pinctrl-names = "default";
> +        pinctrl-0 = <&ts_d6_reset_assert &ts_d6_int_bias>;

Heh, "reset_assert" is a name implying it would actually set the value 
from the pinctrl properties, which is what had to be done before 
reset-gpios were supported. But now reset-gpios are supported.


Thanks,
~val


P.S. happy to see work on this happen again!


^ permalink raw reply

* Re: [PATCH net-next v3 01/13] devlink: expose devlink instance index over netlink
From: Jakub Kicinski @ 2026-03-07  3:32 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, edumazet, pabeni, horms, donald.hunter, corbet,
	skhan, saeedm, leon, tariqt, mbloch, przemyslaw.kitszel, mschmidt,
	andrew+netdev, rostedt, mhiramat, mathieu.desnoyers, chuck.lever,
	matttbe, cjubran, daniel.zahka, linux-doc, linux-rdma,
	linux-trace-kernel
In-Reply-To: <20260304160022.6114-2-jiri@resnulli.us>

On Wed,  4 Mar 2026 17:00:10 +0100 Jiri Pirko wrote:
> +      -
> +        name: index
> +        type: uint
> +        doc: Unique devlink instance index.

AI complains on patch 6 that the index is truncated because it's saved
to a u32. Let's add:

        checks:
           max: u32-max

here and the policy will take care of the check, you can then remove
the explicit checks too
-- 
pw-bot: cr

^ permalink raw reply

* [PATCH v2] tracing: Fix trace_buf_size= cmdline parameter with sizes >= 2G
From: Calvin Owens @ 2026-03-07  3:19 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers
In-Reply-To: <20260306213730.566b9511@robin>

Some of the sizing logic through tracer_alloc_buffers() uses int
internally, causing unexpected behavior if the user passes a value that
does not fit in an int (on my x86 machine, the result is uselessly tiny
buffers).

Fix by plumbing the parameter's real type (unsigned long) through to the
ring buffer allocation functions, which already use unsigned long.

It has always been possible to create larger ring buffers via the sysfs
interface: this only affects the cmdline parameter.

Fixes: 73c5162aa362 ("tracing: keep ring buffer to minimum size till used")
Signed-off-by: Calvin Owens <calvin@wbinvd.org>
---
Changes in v2:
* Leave scratch_size as an int in allocate_trace_buffer(), undo callee
  changes required after making it unsigned long [Steven]

 kernel/trace/trace.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 1e7c032a72d2..ebd996f8710e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9350,7 +9350,7 @@ static void setup_trace_scratch(struct trace_array *tr,
 }
 
 static int
-allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size)
+allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, unsigned long size)
 {
 	enum ring_buffer_flags rb_flags;
 	struct trace_scratch *tscratch;
@@ -9405,7 +9405,7 @@ static void free_trace_buffer(struct array_buffer *buf)
 	}
 }
 
-static int allocate_trace_buffers(struct trace_array *tr, int size)
+static int allocate_trace_buffers(struct trace_array *tr, unsigned long size)
 {
 	int ret;
 
@@ -10769,7 +10769,7 @@ __init static void enable_instances(void)
 
 __init static int tracer_alloc_buffers(void)
 {
-	int ring_buf_size;
+	unsigned long ring_buf_size;
 	int ret = -ENOMEM;
 
 
-- 
2.47.3


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox