Linux Trace Kernel


* Re: [PATCH v2] mm: vmscan: rework lru_shrink and write_folio tracepoints
From: Andrew Morton @ 2026-05-08 23:47 UTC (permalink / raw)
  To: qiwu.chen
  Cc: rostedt, mhiramat, hannes, david, mhocko, willy,
	linux-trace-kernel, linux-mm, qiwu.chen
In-Reply-To: <20260506083652.100160-1-qiwu.chen@transsion.com>

On Wed,  6 May 2026 16:36:52 +0800 "qiwu.chen" <qiwuchen55@gmail.com> wrote:

> From: "qiwu.chen" <qiwuchen55@gmail.com>
> Signed-off-by: qiwu.chen <qiwu.chen@transsion.com>

Which should we use?  If it's the transsion.com address (which I
assumed) then this can be communicated by placing an explicit From:
line at start-of-changelog.



* Re: [PATCH v6 01/43] KVM: guest_memfd: Introduce per-gmem attributes, use to guard user mappings
From: Ackerley Tng @ 2026-05-08 23:36 UTC (permalink / raw)
  To: Ackerley Tng via B4 Relay, aik, andrew.jones, binbin.wu, brauner,
	chao.p.peng, david, ira.weiny, jmattson, jthoughton, michael.roth,
	oupton, pankaj.gupta, qperret, rick.p.edgecombe, rientjes,
	shivankg, steven.price, tabba, willy, wyihan, yan.y.zhao,
	forkloop, pratyush, suzuki.poulose, aneesh.kumar, liam,
	Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
	Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka
  Cc: kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <20260507-gmem-inplace-conversion-v6-1-91ab5a8b19a4@google.com>

Ackerley Tng via B4 Relay <devnull+ackerleytng.google.com@kernel.org>
writes:

>
> [...snip...]
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 69c9d6d546b28..5011d38820d0d 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -4,6 +4,7 @@
>  #include <linux/falloc.h>
>  #include <linux/fs.h>
>  #include <linux/kvm_host.h>
> +#include <linux/maple_tree.h>
>  #include <linux/mempolicy.h>
>  #include <linux/pseudo_fs.h>
>  #include <linux/pagemap.h>
> @@ -33,6 +34,13 @@ struct gmem_inode {
>  	struct list_head gmem_file_list;
>
>  	u64 flags;
> +	/*
> +	 * Every index in this inode, whether memory is populated or
> +	 * not, is tracked in attributes. The entire range of indices,
> +	 * corresponding to the size of this inode, is represented in
> +	 * this maple tree.

Concretely, if the entire guest_memfd is 2M in size, indices [0, 511] are
represented with some value, either 0 (SHARED) or
KVM_MEMORY_ATTRIBUTE_PRIVATE. [512, ULONG_MAX] is also defined in the
tree, as NULL.

Since guest_memfd uses xa_mk_value(0) to store the value 0 ("SHARED"),
that makes 0 distinct from NULL, which works for guest_memfd.
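
The arithmetic and the 0-versus-NULL distinction above can be sketched in
userspace C. This is only an illustration: the 4 KiB page size and the
last_index() helper are assumptions of the sketch, and xa_mk_value() /
xa_to_value() are re-declared here to mirror the kernel's tagged-value
encoding rather than taken from the kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Mirrors the kernel's xa_mk_value(): shift left and set bit 0 as a tag. */
static inline void *xa_mk_value(unsigned long v)
{
	return (void *)((v << 1) | 1);
}

/* Mirrors the kernel's xa_to_value(): drop the tag bit again. */
static inline unsigned long xa_to_value(const void *entry)
{
	return (unsigned long)entry >> 1;
}

/* Hypothetical helper: last page index covered by an inode of 'size' bytes. */
static inline unsigned long last_index(uint64_t size)
{
	return (unsigned long)(size >> PAGE_SHIFT) - 1;
}
```

With a 2M inode, last_index() yields 511, and xa_mk_value(0) is a non-NULL
entry, so a stored "SHARED" can be told apart from an index the tree holds
as NULL.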


(Liam and I discussed this off-list due to an email configuration issue)

> +	 */
> +	struct maple_tree attributes;
>  };
>
>
> [...snip...]
>


* Re: [PATCH v1 1/2] spi: qcom-geni: trace: Add trace events for Qualcomm GENI SPI
From: Trilok Soni @ 2026-05-08 23:14 UTC (permalink / raw)
  To: Mark Brown, Praveen Talari
  Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel, linux-arm-msm, linux-spi,
	Mukesh Kumar Savaliya, Aniket Randive,
	chandana.chiluveru, jyothi.seerapu
In-Reply-To: <af3spostNgoRU0Vv@sirena.co.uk>

On 5/8/2026 7:01 AM, Mark Brown wrote:
> On Thu, May 07, 2026 at 11:03:39PM +0530, Praveen Talari wrote:
>> On 07-05-2026 13:43, Mark Brown wrote:
> 
>>> By generic I mean this should not be driver specific at all.
> 
>> I hope these changes are fine. Please let me know if you have any concerns
>> or feedback.
> 
> The data tracepoints look plausible but I would expect them to be
> generated by the core, they'll be there for everything so I'd expect
> them to work for everything.

I agree here. Praveen - this is similar to the suggestion I made for i2c
internally.

---Trilok Soni



* Re: [PATCH 0/2] tools/bootconfig: render kernel.* subtree as a cmdline string
From: Andrew Morton @ 2026-05-08 21:56 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Masami Hiramatsu, linux-kernel, linux-trace-kernel, paulmck, oss,
	kernel-team
In-Reply-To: <20260508-bootconfig_using_tools-v1-0-1132219aa773@debian.org>

On Fri, 08 May 2026 06:55:02 -0700 Breno Leitao <leitao@debian.org> wrote:

> Add a bootconfig -> kernel cmdline rendering capability shared between
> the kernel parser library and the userspace tools/bootconfig binary.
> 
> The new userspace mode "tools/bootconfig -C <file>" walks a bootconfig
> file's "kernel" subtree and prints it as a flat, space-separated
> cmdline string suitable for direct use as (or appending to) a kernel
> command line.
> 
> This series prepares tools/bootconfig and lib/bootconfig.c for an
> upcoming feature that lets the kernel build render an embedded
> bootconfig file's "kernel" subtree to a flat cmdline string and embed
> it in the kernel image.
> 
> The follow-up series (sent separately) wires this into setup_arch() so
> early_param() handlers see values supplied via CONFIG_BOOT_CONFIG_EMBED_FILE,
> following Masami's suggestion in [1]
> 
> These two patches are pure groundwork. They add no kernel feature,
> change no runtime behavior, and are useful on their own (the new
> "tools/bootconfig -C" mode lets anyone render a .bootconfig file to
> a cmdline string from the shell).
> 
> Landing them independently lets the follow-up series focus on the
> kernel-side plumbing without dragging the refactor and tool addition
> through the same review cycle.

I'll assume that Masami will process this, although
`scripts/get_maintainer.pl lib/bootconfig.c' doesn't mention a git
tree.

https://sashiko.dev/#/patchset/20260508-bootconfig_using_tools-v1-0-1132219aa773@debian.org
says a bunch of picky things which seem pretty ignorable to me.  Your
call ;)
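
For readers unfamiliar with the rendering described in the quoted cover
letter, here is a minimal userspace sketch of the idea. It is not the code
from the series: struct bc_entry, render_cmdline(), the flattened key/value
input, and the choice to quote every value are all assumptions made for
illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened bootconfig entry; value may be NULL for a bare key. */
struct bc_entry {
	const char *key;
	const char *value;
};

/*
 * Render every "kernel."-prefixed entry into buf as a space-separated
 * cmdline string. Returns the length the full string needs (snprintf
 * style), or -1 on a formatting error.
 */
static int render_cmdline(const struct bc_entry *e, size_t n,
			  char *buf, size_t size)
{
	static const char prefix[] = "kernel.";
	const size_t plen = sizeof(prefix) - 1;
	size_t used = 0;
	size_t i;

	if (size)
		buf[0] = '\0';

	for (i = 0; i < n; i++) {
		const char *sep = used ? " " : "";
		char *dst = used < size ? buf + used : NULL;
		size_t room = used < size ? size - used : 0;
		int ret;

		/* Skip anything outside the "kernel" subtree. */
		if (strncmp(e[i].key, prefix, plen) != 0)
			continue;

		if (e[i].value)
			ret = snprintf(dst, room, "%s%s=\"%s\"", sep,
				       e[i].key + plen, e[i].value);
		else
			ret = snprintf(dst, room, "%s%s", sep, e[i].key + plen);
		if (ret < 0)
			return -1;
		used += (size_t)ret;
	}
	return (int)used;
}
```

Returning the needed length even when the buffer is too small lets a caller
size a second buffer and retry, which is the usual snprintf()-style contract.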


* Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
From: Ackerley Tng @ 2026-05-08 21:21 UTC (permalink / raw)
  To: Sean Christopherson, Michael Roth
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jthoughton, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, tabba, willy,
	wyihan, yan.y.zhao, forkloop, pratyush, suzuki.poulose,
	aneesh.kumar, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Jason Gunthorpe,
	Vlastimil Babka, kvm, linux-kernel, linux-trace-kernel, linux-doc,
	linux-kselftest, linux-mm
In-Reply-To: <af4gJ6xZ3e7UXOuO@google.com>

Sean Christopherson <seanjc@google.com> writes:

>
> [...snip...]
>
>
> Summarizing this week's PUCK call[*]:
>
> Scrap PRESERVE and ZERO, and simply rely on vendor specific semantics.
>
>
> [...snip...]
>

Thanks for the summary! Please see v6 here:

https://lore.kernel.org/all/20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4@google.com/T/


* Re: [PATCH 2/2] selftests/mm: add zone->lock tracepoint verification test
From: David Hildenbrand (Arm) @ 2026-05-08 20:15 UTC (permalink / raw)
  To: hawk, Andrew Morton, linux-mm
  Cc: Vlastimil Babka, Steven Rostedt, Suren Baghdasaryan, Michal Hocko,
	Zi Yan, Lorenzo Stoakes, Shuah Khan, linux-kernel,
	linux-trace-kernel, kernel-team
In-Reply-To: <20260508162207.3315781-2-hawk@kernel.org>

On 5/8/26 18:22, hawk@kernel.org wrote:
> From: Jesper Dangaard Brouer <hawk@kernel.org>
> 
> Add a selftest to verify the kmem:mm_zone_lock_contended,
> kmem:mm_zone_locked, and kmem:mm_zone_lock_unlock tracepoints.
> 
> The test has two components:
> 
> zone_lock_contention.c - a workload that spawns threads doing rapid
> page allocation and freeing to generate zone->lock contention. It
> shrinks PCP lists via percpu_pagelist_high_fraction to force frequent
> free_pcppages_bulk() and rmqueue_bulk() calls.
> 
> test_zone_lock_tracepoints.sh - uses bpftrace to verify tracepoints
> exist, have the expected fields, fire under load, and that wait_ns
> is populated when contention occurs.
> 
> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
> ---
>  tools/testing/selftests/mm/Makefile           |   2 +
>  .../mm/test_zone_lock_tracepoints.sh          | 212 ++++++++++++++++++
>  .../selftests/mm/zone_lock_contention.c       | 166 ++++++++++++++

This really looks excessive and ... not really how we usually treat tracepoints?

I don't know about others, but I don't think this is really what we want as a MM
selftest.

-- 
Cheers,

David


* [PATCH] tracing: Avoid NULL return from hist_field_name() on truncation
From: David Carlier @ 2026-05-08 19:57 UTC (permalink / raw)
  To: linux-trace-kernel
  Cc: rostedt, mhiramat, mathieu.desnoyers, zanussi, pengpeng,
	linux-kernel, David Carlier

hist_field_name() returns "" everywhere except the fully-qualified
VAR_REF/EXPR case, where snprintf() truncation returns NULL early
and bypasses the bottom NULL->"" guard. Callers don't expect NULL:
strcat(expr, hist_field_name(field, 0)) at trace_events_hist.c:1758
and the strcmp() in the sort-key match loop at :4804 both deref it.

system and event_name are bounded by MAX_EVENT_NAME_LEN, but the
field name on a VAR_REF is kstrdup'd from a histogram variable
name parsed out of the trigger string and has no length cap, so
a long enough var name in a fully qualified reference can reach
the truncation path.

Keep the length check but leave field_name as "" on overflow.
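
The pattern can be tried in userspace with the sketch below. It is a
stand-in, not the kernel function: qualified_name(), the 16-byte buffer,
and the "%s.%s.%s" format are assumptions chosen so that truncation is
easy to trigger.

```c
#include <stdio.h>

#define FULL_NAME_LEN 16	/* deliberately small for the demonstration */

/*
 * Hypothetical stand-in for the fully qualified branch of
 * hist_field_name(): on truncation the result stays "" instead of
 * becoming NULL, so callers can pass it straight to strcat()/strcmp().
 */
static const char *qualified_name(const char *system, const char *event,
				  const char *name)
{
	static char full_name[FULL_NAME_LEN];
	const char *field_name = "";
	int len;

	len = snprintf(full_name, sizeof(full_name), "%s.%s.%s",
		       system, event, name);
	/* Only use the buffer when the whole name fitted. */
	if (len >= 0 && (size_t)len < sizeof(full_name))
		field_name = full_name;

	return field_name;
}
```

A short name comes back fully qualified, while an over-long one comes back
as an empty string rather than a NULL pointer.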

Fixes: 5ec1d1e97de1 ("tracing: Rebuild full_name on each hist_field_name() call")
Signed-off-by: David Carlier <devnexen@gmail.com>
---
 kernel/trace/trace_events_hist.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 0dbbf6cca9bc..eb2c2bc8bc3d 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1369,10 +1369,8 @@ static const char *hist_field_name(struct hist_field *field,
 			len = snprintf(full_name, sizeof(full_name), fmt,
 				       field->system, field->event_name,
 				       field->name);
-			if (len >= sizeof(full_name))
-				return NULL;
-
-			field_name = full_name;
+			if (len < sizeof(full_name))
+				field_name = full_name;
 		} else
 			field_name = field->name;
 	} else if (field->flags & HIST_FIELD_FL_TIMESTAMP)
-- 
2.53.0


* Re: [PATCH 2/5] docs: fix repeated word 'that' across documentation
From: David Howells @ 2026-05-08 19:43 UTC (permalink / raw)
  To: Adrien Reynard
  Cc: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Jonathan Corbet, Shuah Khan, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, David Howells,
	Paulo Alcantara, Masami Hiramatsu,
	open list:READ-COPY UPDATE (RCU), open list:DOCUMENTATION,
	open list, open list:DRIVER CORE, KOBJECTS, DEBUGFS AND SYSFS,
	open list:FILESYSTEMS [NETFS LIBRARY],
	open list:FILESYSTEMS [NETFS LIBRARY], open list:TRACING
In-Reply-To: <20260508163759.16231-1-reynard.adrien.08@gmail.com>

Adrien Reynard <reynard.adrien.08@gmail.com> wrote:

> -  three states, we know that that CPU has exited any previous RCU

This is arguably correct.  The two 'that' words are functionally different.
If you look at another language, say Hungarian, they are different words
(e.g. 'hogy' vs 'az').

David



* Re: [PATCH 2/5] docs: fix repeated word 'that' across documentation
From: David Laight @ 2026-05-08 18:26 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Adrien Reynard, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Jonathan Corbet, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, David Howells,
	Paulo Alcantara, Masami Hiramatsu,
	open list:READ-COPY UPDATE (RCU), open list:DOCUMENTATION,
	open list, open list:DRIVER CORE, KOBJECTS, DEBUGFS AND SYSFS,
	open list:FILESYSTEMS [NETFS LIBRARY],
	open list:FILESYSTEMS [NETFS LIBRARY], open list:TRACING
In-Reply-To: <1501caea-8cff-4968-aca6-e8d4b20e0e80@linuxfoundation.org>

On Fri, 8 May 2026 11:15:28 -0600
Shuah Khan <skhan@linuxfoundation.org> wrote:

> On 5/8/26 10:37, Adrien Reynard wrote:
> 
> Missing commit log in all your patches - I don't see patch 1/5 in
> my Inbox.
> 
> > Signed-off-by: Adrien Reynard <reynard.adrien.08@gmail.com>
> > ---
> >   Documentation/RCU/rcu.rst                          | 2 +-
> >   Documentation/driver-api/driver-model/overview.rst | 2 +-
> >   Documentation/filesystems/netfs_library.rst        | 2 +-
> >   Documentation/trace/histogram-design.rst           | 2 +-
> >   Documentation/trace/histogram.rst                  | 2 +-
> >   5 files changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/Documentation/RCU/rcu.rst b/Documentation/RCU/rcu.rst
> > index bf6617b330a7..320ad3292b75 100644
> > --- a/Documentation/RCU/rcu.rst
> > +++ b/Documentation/RCU/rcu.rst
> > @@ -32,7 +32,7 @@ Frequently Asked Questions
> >     Just as with spinlocks, RCU readers are not permitted to
> >     block, switch to user-mode execution, or enter the idle loop.
> >     Therefore, as soon as a CPU is seen passing through any of these
> > -  three states, we know that that CPU has exited any previous RCU
> > +  three states, we know that CPU has exited any previous RCU  
> 
> The original intent might have been to say, "that cpu", so adding
> the missing comma after the first "that" or change "that" to "the"
> would make sense.
...

I don't think adding a comma would be correct.
The clause splits as 'we know that' 'that CPU' and the repeated 'that'
is absolutely correct.
Maybe 'that CPU' could be replaced by 'it'; but it can be difficult to
work out what back references like 'it' refer to.

You can re-order it, as (say):
	Therefore we know that as soon as a CPU is seen passing through any of these
	three states it has exited any previous RCU read-side critical sections.

But just because some grammar book says you shouldn't have repeated words
doesn't mean there aren't exceptions.

The sign writer was doing a new sign for the 'Pig and Whistle'.
Unfortunately the gaps between Pig and and and and and Whistle
ended up visibly different.

-- David




* Re: [PATCH 1/2] mm/page_alloc: add tracepoints for zone->lock acquisitions
From: Dmitry Ilvokhin @ 2026-05-08 18:07 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE)
  Cc: Andrew Morton, hawk, Matthew Wilcox, linux-mm, Steven Rostedt,
	Suren Baghdasaryan, Michal Hocko, Zi Yan, David Hildenbrand,
	Lorenzo Stoakes, Shuah Khan, linux-kernel, linux-trace-kernel,
	kernel-team
In-Reply-To: <4f61457e-deff-430f-8a1e-d3c33c925db3@kernel.org>

On Fri, May 08, 2026 at 07:40:51PM +0200, Vlastimil Babka (SUSE) wrote:
> On 5/8/26 7:38 PM, Vlastimil Babka (SUSE) wrote:
> > On 5/8/26 7:29 PM, Andrew Morton wrote:
> >> On Fri,  8 May 2026 18:22:06 +0200 hawk@kernel.org wrote:
> >>
> >>> Add tracepoints to the page allocator fast paths that acquire
> >>> zone->lock, allowing diagnosis of lock contention in production.
> >>
> >> Thanks, I'm surprised we haven't done this yet.
> > 
> > There was a recent attempt [1]. It wasn't welcomed because it wasn't a generic solution.
> > 
> > [1] https://lore.kernel.org/all/cover.1772206930.git.d@ilvokhin.com/
> 
> And this is the generic solution I think?
> 
> https://lore.kernel.org/all/cover.1777999826.git.d@ilvokhin.com/

Thanks for cc'ing me, Vlastimil.

Yes, this is an attempt at a generic solution for tracing contended
locks, including spinlocks, so it should also cover the use case
proposed in this patchset.

In fact, zone->lock contention was one of the primary motivations for
this work.


* Re: [PATCH 2/5] docs: fix repeated word 'that' across documentation
From: Paul E. McKenney @ 2026-05-08 17:52 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Shuah Khan, Adrien Reynard, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Jonathan Corbet, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, David Howells, Paulo Alcantara,
	Masami Hiramatsu, open list:READ-COPY UPDATE (RCU),
	open list:DOCUMENTATION, open list,
	open list:DRIVER CORE, KOBJECTS, DEBUGFS AND SYSFS,
	open list:FILESYSTEMS [NETFS LIBRARY],
	open list:FILESYSTEMS [NETFS LIBRARY], open list:TRACING
In-Reply-To: <5f68ac30-21ac-494b-a140-2307e236f0a2@infradead.org>

On Fri, May 08, 2026 at 10:40:49AM -0700, Randy Dunlap wrote:
> 
> 
> On 5/8/26 10:15 AM, Shuah Khan wrote:
> > On 5/8/26 10:37, Adrien Reynard wrote:
> > 
> > Missing commit log in all your patches - I don't see patch 1/5 in
> > my Inbox.
> > 
> >> Signed-off-by: Adrien Reynard <reynard.adrien.08@gmail.com>
> >> ---
> >>   Documentation/RCU/rcu.rst                          | 2 +-
> >>   Documentation/driver-api/driver-model/overview.rst | 2 +-
> >>   Documentation/filesystems/netfs_library.rst        | 2 +-
> >>   Documentation/trace/histogram-design.rst           | 2 +-
> >>   Documentation/trace/histogram.rst                  | 2 +-
> >>   5 files changed, 5 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/Documentation/RCU/rcu.rst b/Documentation/RCU/rcu.rst
> >> index bf6617b330a7..320ad3292b75 100644
> >> --- a/Documentation/RCU/rcu.rst
> >> +++ b/Documentation/RCU/rcu.rst
> >> @@ -32,7 +32,7 @@ Frequently Asked Questions
> >>     Just as with spinlocks, RCU readers are not permitted to
> >>     block, switch to user-mode execution, or enter the idle loop.
> >>     Therefore, as soon as a CPU is seen passing through any of these
> >> -  three states, we know that that CPU has exited any previous RCU
> >> +  three states, we know that CPU has exited any previous RCU
> > 
> > The original intent might have been to say, "that cpu", so adding
> > the missing comma after the first "that" or change "that" to "the"
> > would make sense.
> 
> Not a comma, please.
> I don't see a problem with "that that," but "that the" could also be OK.

This CPU was already mentioned.  So if for whatever reason we cannot
stomach "that that", then "that this" would be better than "that the".

I suppose that false positives from simple grammar checkers might be
sufficient reason, but in this brave new world of LLMs, shouldn't we
be hoping for better?  ;-)

						Thanx, Paul

> >>     read-side critical sections.  So, if we remove an item from a
> >>     linked list, and then wait until all CPUs have switched context,
> >>     executed in user mode, or executed in the idle loop, we can
> 
> 
> -- 
> ~Randy
> 


* Re: [PATCH 2/5] docs: fix repeated word 'that' across documentation
From: Randy Dunlap @ 2026-05-08 17:40 UTC (permalink / raw)
  To: Shuah Khan, Adrien Reynard, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Jonathan Corbet, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, David Howells,
	Paulo Alcantara, Masami Hiramatsu,
	open list:READ-COPY UPDATE (RCU), open list:DOCUMENTATION,
	open list, open list:DRIVER CORE, KOBJECTS, DEBUGFS AND SYSFS,
	open list:FILESYSTEMS [NETFS LIBRARY],
	open list:FILESYSTEMS [NETFS LIBRARY], open list:TRACING
In-Reply-To: <1501caea-8cff-4968-aca6-e8d4b20e0e80@linuxfoundation.org>



On 5/8/26 10:15 AM, Shuah Khan wrote:
> On 5/8/26 10:37, Adrien Reynard wrote:
> 
> Missing commit log in all your patches - I don't see patch 1/5 in
> my Inbox.
> 
>> Signed-off-by: Adrien Reynard <reynard.adrien.08@gmail.com>
>> ---
>>   Documentation/RCU/rcu.rst                          | 2 +-
>>   Documentation/driver-api/driver-model/overview.rst | 2 +-
>>   Documentation/filesystems/netfs_library.rst        | 2 +-
>>   Documentation/trace/histogram-design.rst           | 2 +-
>>   Documentation/trace/histogram.rst                  | 2 +-
>>   5 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/RCU/rcu.rst b/Documentation/RCU/rcu.rst
>> index bf6617b330a7..320ad3292b75 100644
>> --- a/Documentation/RCU/rcu.rst
>> +++ b/Documentation/RCU/rcu.rst
>> @@ -32,7 +32,7 @@ Frequently Asked Questions
>>     Just as with spinlocks, RCU readers are not permitted to
>>     block, switch to user-mode execution, or enter the idle loop.
>>     Therefore, as soon as a CPU is seen passing through any of these
>> -  three states, we know that that CPU has exited any previous RCU
>> +  three states, we know that CPU has exited any previous RCU
> 
> The original intent might have been to say, "that cpu", so adding
> the missing comma after the first "that" or change "that" to "the"
> would make sense.

Not a comma, please.
I don't see a problem with "that that," but "that the" could also be OK.

> 
> 
>>     read-side critical sections.  So, if we remove an item from a
>>     linked list, and then wait until all CPUs have switched context,
>>     executed in user mode, or executed in the idle loop, we can


-- 
~Randy



* Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
From: Sean Christopherson @ 2026-05-08 17:40 UTC (permalink / raw)
  To: Michael Roth
  Cc: Ackerley Tng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
	david, ira.weiny, jmattson, jthoughton, oupton, pankaj.gupta,
	qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
	tabba, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Baoquan He, Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm
In-Reply-To: <foi2zvv5qrfdcspnx4fstrvzl74m6xp6zrsw5omlbprxh4jrhx@vxnwk7fr46gu>

On Wed, Apr 29, 2026, Michael Roth wrote:
> On Fri, Apr 24, 2026 at 12:08:45PM -0700, Ackerley Tng wrote:
> > Michael Roth <michael.roth@amd.com> writes:
> > 
> > Thank you for your patches!
> > 
> > >
> > > [...snip...]
> > >
> > >>
> > >> I also did some minor updates (prefixed with a "[squash]" tag) to advertise
> > >> the KVM_SET_MEMORY_ATTRIBUTES2_PRESERVED flag so it can be used by
> > >
> > > Though I'm not sure how we deal with it if SNP/TDX at some point become
> > > capable of using the PRESERVED flag *after* populate... but maybe that's
> > > too unlikely to worry about? If we wanted to address it though, we could
> > > have both PRESERVED and PRESERVED_BEFORE_LAUNCH so they can be
> > > enumerated separately from the start.
> > >
> > 
> > Not sure how likely it is, but if SNP and TDX can honor PRESERVE
> > semantics after populate, I think we could implement support under a new
> > flag like CIPHER.
> 
> That works, but it still makes things *slightly* awkward due to special-casing
> the PRESERVE semantics for 1 guest type vs. another.

Summarizing this week's PUCK call[*]:

Scrap PRESERVE and ZERO, and simply rely on vendor specific semantics.

My desire to enforce PRESERVE and ZERO semantics and avoid relying on vendor specific
behavior (i.e. on trusted firmware semantics) is a pipe dream.  Unless KVM does
a truly insane amount of per-gfn tracking, KVM can't know the state of memory for
a given page, and so can't guarantee PRESERVE or ZERO will be honored.

If userspace requests PRESERVE, just because it's _possible_ to preserve contents
(e.g. during the pre-boot phase on TDX), doesn't mean the contents are _guaranteed_
to be preserved.  If userspace doesn't actually ADD the memory to the guest's
initial image, then the contents won't be preserved.  Ditto for SNP.

To guarantee PRESERVE, KVM would need to track per-gfn information to know if the
memory was actually preserved.  And enforcing PRESERVE would be all kinds of crazy;
KVM would have to kill the VM or something?  And that would still require userspace
to be aware of vendor specific details.

The same holds true for ZERO.  On a private=>shared conversion, KVM can't guarantee
the memory is zeroed by trusted firmware unless KVM tracks, per-gfn, whether or
not the memory was actually fully assigned to the guest.  E.g. if userspace does
shared=>private and then private=>shared(ZERO), without the memory being faulted
into the guest, then the TDX-Module won't have "seen" the page and so won't have
zeroed it on the private=>shared conversion.

And trying to special case SNP's "validated CPUID" behavior, where memory can be
preserved on private=>shared after a failed shared=>private, would also require
tracking that the page was never actually converted to private.

Note, regarding ZERO, someone (Mike? Ackerley?) pointed out that userspace typically
doesn't rely on the kernel to zero memory, and so supporting ZERO for private=>shared
isn't really all that valuable in the first place.

[*] https://drive.google.com/file/d/1w0ifzh5PmNViJ1SKru9jK9x52MybXSNa/view?usp=drive_link


* Re: [PATCH 1/2] mm/page_alloc: add tracepoints for zone->lock acquisitions
From: Vlastimil Babka (SUSE) @ 2026-05-08 17:40 UTC (permalink / raw)
  To: Andrew Morton, hawk, Dmitry Ilvokhin, Matthew Wilcox
  Cc: linux-mm, Steven Rostedt, Suren Baghdasaryan, Michal Hocko,
	Zi Yan, David Hildenbrand, Lorenzo Stoakes, Shuah Khan,
	linux-kernel, linux-trace-kernel, kernel-team
In-Reply-To: <832e4333-4079-4865-8ad8-3dd8868fb964@kernel.org>

On 5/8/26 7:38 PM, Vlastimil Babka (SUSE) wrote:
> On 5/8/26 7:29 PM, Andrew Morton wrote:
>> On Fri,  8 May 2026 18:22:06 +0200 hawk@kernel.org wrote:
>>
>>> Add tracepoints to the page allocator fast paths that acquire
>>> zone->lock, allowing diagnosis of lock contention in production.
>>
>> Thanks, I'm surprised we haven't done this yet.
> 
> There was a recent attempt [1]. It wasn't welcomed because it wasn't a generic solution.
> 
> [1] https://lore.kernel.org/all/cover.1772206930.git.d@ilvokhin.com/

And this is the generic solution I think?

https://lore.kernel.org/all/cover.1777999826.git.d@ilvokhin.com/

>> Unfortunately "mm: use spinlock guards for zone lock" messed this up
>> (https://lore.kernel.org/all/cover.1777462630.git.d@ilvokhin.com/).
>>
>> So please let's give it a few days for reviewers to comment then redo
>> against mm.git's mm-unstable branch?
> 



* Re: [PATCH 1/2] mm/page_alloc: add tracepoints for zone->lock acquisitions
From: Vlastimil Babka (SUSE) @ 2026-05-08 17:38 UTC (permalink / raw)
  To: Andrew Morton, hawk, Dmitry Ilvokhin, Matthew Wilcox
  Cc: linux-mm, Steven Rostedt, Suren Baghdasaryan, Michal Hocko,
	Zi Yan, David Hildenbrand, Lorenzo Stoakes, Shuah Khan,
	linux-kernel, linux-trace-kernel, kernel-team
In-Reply-To: <20260508102948.b1c687e623fabec65580f258@linux-foundation.org>

On 5/8/26 7:29 PM, Andrew Morton wrote:
> On Fri,  8 May 2026 18:22:06 +0200 hawk@kernel.org wrote:
> 
>> Add tracepoints to the page allocator fast paths that acquire
>> zone->lock, allowing diagnosis of lock contention in production.
> 
> Thanks, I'm surprised we haven't done this yet.

There was a recent attempt [1]. It wasn't welcomed because it wasn't a generic solution.

[1] https://lore.kernel.org/all/cover.1772206930.git.d@ilvokhin.com/

> Unfortunately "mm: use spinlock guards for zone lock" messed this up
> (https://lore.kernel.org/all/cover.1777462630.git.d@ilvokhin.com/).
> 
> So please let's give it a few days for reviewers to comment then redo
> against mm.git's mm-unstable branch?



* Re: [PATCH 1/2] mm/page_alloc: add tracepoints for zone->lock acquisitions
From: Andrew Morton @ 2026-05-08 17:29 UTC (permalink / raw)
  To: hawk
  Cc: linux-mm, Vlastimil Babka, Steven Rostedt, Suren Baghdasaryan,
	Michal Hocko, Zi Yan, David Hildenbrand, Lorenzo Stoakes,
	Shuah Khan, linux-kernel, linux-trace-kernel, kernel-team
In-Reply-To: <20260508162207.3315781-1-hawk@kernel.org>

On Fri,  8 May 2026 18:22:06 +0200 hawk@kernel.org wrote:

> Add tracepoints to the page allocator fast paths that acquire
> zone->lock, allowing diagnosis of lock contention in production.

Thanks, I'm surprised we haven't done this yet.

Unfortunately "mm: use spinlock guards for zone lock" messed this up
(https://lore.kernel.org/all/cover.1777462630.git.d@ilvokhin.com/).

So please let's give it a few days for reviewers to comment then redo
against mm.git's mm-unstable branch?


* Re: [PATCH v1 1/2] serial: qcom-geni: trace: Add tracepoint support for Qualcomm GENI serial
From: Steven Rostedt @ 2026-05-08 17:25 UTC (permalink / raw)
  To: Praveen Talari
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Greg Kroah-Hartman,
	Jiri Slaby, Konrad Dybcio, linux-kernel, linux-trace-kernel,
	linux-arm-msm, linux-serial, Mukesh Kumar Savaliya,
	Aniket Randive, chandana.chiluveru, jyothi.seerapu
In-Reply-To: <20260506-add-tracepoints-for-qcom-geni-serial-v1-1-544b22612e08@oss.qualcomm.com>

On Wed, 06 May 2026 22:54:44 +0530
Praveen Talari <praveen.talari@oss.qualcomm.com> wrote:

> +TRACE_EVENT(geni_serial_tx_data,
> +	    TP_PROTO(struct device *dev, const u8 *buf, unsigned int len),
> +	    TP_ARGS(dev, buf, len),
> +
> +	    TP_STRUCT__entry(__string(name, dev_name(dev))
> +			     __field(unsigned int, len)
> +			     __dynamic_array(u8, data, len)
> +	    ),
> +
> +	    TP_fast_assign(__assign_str(name);
> +			   __entry->len = len;
> +			   memcpy(__get_dynamic_array(data), buf, len);
> +	    ),
> +
> +	    TP_printk("%s: tx_len=%u data=%s",
> +		      __get_str(name), __entry->len,
> +		      __print_hex(__get_dynamic_array(data), __entry->len))
> +);
> +
> +TRACE_EVENT(geni_serial_rx_data,
> +	    TP_PROTO(struct device *dev, const u8 *buf, unsigned int len),
> +	    TP_ARGS(dev, buf, len),
> +
> +	    TP_STRUCT__entry(__string(name, dev_name(dev))
> +			     __field(unsigned int, len)
> +			     __dynamic_array(u8, data, len)
> +	    ),
> +
> +	    TP_fast_assign(__assign_str(name);
> +			   __entry->len = len;
> +			   memcpy(__get_dynamic_array(data), buf, len);
> +	    ),
> +
> +	    TP_printk("%s: rx_len=%u data=%s",

Do you really need to say "tx_len" and "rx_len"? Could it just be "len",
with the name of the tracepoint showing what it is?

Each TRACE_EVENT() is really just a:

  DECLARE_EVENT_CLASS() followed by a DEFINE_EVENT()

underneath.

And each TRACE_EVENT() costs around 5K in size, where most of that is in
the DECLARE_EVENT_CLASS() portion. Thus, you can save some memory by using
DECLARE_EVENT_CLASS() and then define the above two events with
DEFINE_EVENT().
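
A sketch of what that could look like for the two events quoted above.
This is a fragment for a kernel trace header, not a standalone program.
The class name geni_serial_data is hypothetical; the fields and
assignments are copied from the quoted patch, with the print format
switched to a shared "len=".

```c
/* One class carries the entry layout, assignment, and print format. */
DECLARE_EVENT_CLASS(geni_serial_data,	/* hypothetical class name */
	    TP_PROTO(struct device *dev, const u8 *buf, unsigned int len),
	    TP_ARGS(dev, buf, len),

	    TP_STRUCT__entry(__string(name, dev_name(dev))
			     __field(unsigned int, len)
			     __dynamic_array(u8, data, len)
	    ),

	    TP_fast_assign(__assign_str(name);
			   __entry->len = len;
			   memcpy(__get_dynamic_array(data), buf, len);
	    ),

	    TP_printk("%s: len=%u data=%s",
		      __get_str(name), __entry->len,
		      __print_hex(__get_dynamic_array(data), __entry->len))
);

/* Each event then only names the class and repeats the prototype. */
DEFINE_EVENT(geni_serial_data, geni_serial_tx_data,
	    TP_PROTO(struct device *dev, const u8 *buf, unsigned int len),
	    TP_ARGS(dev, buf, len)
);

DEFINE_EVENT(geni_serial_data, geni_serial_rx_data,
	    TP_PROTO(struct device *dev, const u8 *buf, unsigned int len),
	    TP_ARGS(dev, buf, len)
);
```

Callers are unchanged: trace_geni_serial_tx_data() and
trace_geni_serial_rx_data() are still generated, one per DEFINE_EVENT().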

-- Steve


> +		      __get_str(name), __entry->len,
> +		      __print_hex(__get_dynamic_array(data), __entry->len))
> +);
> +



* Re: [PATCH 2/5] docs: fix repeated word 'that' across documentation
From: Shuah Khan @ 2026-05-08 17:15 UTC (permalink / raw)
  To: Adrien Reynard, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Jonathan Corbet, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, David Howells,
	Paulo Alcantara, Masami Hiramatsu,
	open list:READ-COPY UPDATE (RCU), open list:DOCUMENTATION,
	open list, open list:DRIVER CORE, KOBJECTS, DEBUGFS AND SYSFS,
	open list:FILESYSTEMS [NETFS LIBRARY],
	open list:FILESYSTEMS [NETFS LIBRARY], open list:TRACING,
	Shuah Khan
In-Reply-To: <20260508163759.16231-1-reynard.adrien.08@gmail.com>

On 5/8/26 10:37, Adrien Reynard wrote:

The commit log is missing in all your patches. Also, I don't see patch 1/5
in my Inbox.

> Signed-off-by: Adrien Reynard <reynard.adrien.08@gmail.com>
> ---
>   Documentation/RCU/rcu.rst                          | 2 +-
>   Documentation/driver-api/driver-model/overview.rst | 2 +-
>   Documentation/filesystems/netfs_library.rst        | 2 +-
>   Documentation/trace/histogram-design.rst           | 2 +-
>   Documentation/trace/histogram.rst                  | 2 +-
>   5 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/RCU/rcu.rst b/Documentation/RCU/rcu.rst
> index bf6617b330a7..320ad3292b75 100644
> --- a/Documentation/RCU/rcu.rst
> +++ b/Documentation/RCU/rcu.rst
> @@ -32,7 +32,7 @@ Frequently Asked Questions
>     Just as with spinlocks, RCU readers are not permitted to
>     block, switch to user-mode execution, or enter the idle loop.
>     Therefore, as soon as a CPU is seen passing through any of these
> -  three states, we know that that CPU has exited any previous RCU
> +  three states, we know that CPU has exited any previous RCU

The original intent might have been to say "that CPU", so adding the
missing comma after the first "that", or changing the second "that" to
"the", would make sense.


>     read-side critical sections.  So, if we remove an item from a
>     linked list, and then wait until all CPUs have switched context,
>     executed in user mode, or executed in the idle loop, we can
> diff --git a/Documentation/driver-api/driver-model/overview.rst b/Documentation/driver-api/driver-model/overview.rst
> index b3f447bf9f07..c1966d506d55 100644
> --- a/Documentation/driver-api/driver-model/overview.rst
> +++ b/Documentation/driver-api/driver-model/overview.rst
> @@ -55,7 +55,7 @@ struct pci_dev now looks like this::
>   Note first that the struct device dev within the struct pci_dev is
>   statically allocated. This means only one allocation on device discovery.
>   
> -Note also that that struct device dev is not necessarily defined at the
> +Note also that struct device dev is not necessarily defined at the

Same comment here: replace "that" with "the" or add the missing comma.

>   front of the pci_dev structure.  This is to make people think about what
>   they're doing when switching between the bus driver and the global driver,
>   and to discourage meaningless and incorrect casts between the two.
> diff --git a/Documentation/filesystems/netfs_library.rst b/Documentation/filesystems/netfs_library.rst
> index ddd799df6ce3..4033de4535ac 100644
> --- a/Documentation/filesystems/netfs_library.rst
> +++ b/Documentation/filesystems/netfs_library.rst
> @@ -626,7 +626,7 @@ A number of members are available for access/use by the filesystem:
>   
>      These are set by the filesystem or the cache in ->prepare_read() or
>      ->prepare_write() for each subrequest to indicate the maximum number of
> -   bytes and, optionally, the maximum number of segments (if not 0) that that
> +   bytes and, optionally, the maximum number of segments (if not 0) that

Same here.

>      subrequest can support.
>   
>    * ``submit_extendable_to``
> diff --git a/Documentation/trace/histogram-design.rst b/Documentation/trace/histogram-design.rst
> index e92f56ebd0b5..949bbfdb0f16 100644
> --- a/Documentation/trace/histogram-design.rst
> +++ b/Documentation/trace/histogram-design.rst
> @@ -738,7 +738,7 @@ creates its own variable, wakeup_lat, but nothing yet uses it::
>   
>   Looking at the sched_waking 'hist_debug' output, in addition to the
>   normal key and value hist_fields, in the val fields section we see a
> -field with the HIST_FIELD_FL_VAR flag, which indicates that that field
> +field with the HIST_FIELD_FL_VAR flag, which indicates that field

Same here


>   represents a variable.  Note that in addition to the variable name,
>   contained in the var.name field, it includes the var.idx, which is the
>   index into the tracing_map_elt.vars[] array of the actual variable
> diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst
> index 340bcb5099e7..5b303fabdf32 100644
> --- a/Documentation/trace/histogram.rst
> +++ b/Documentation/trace/histogram.rst
> @@ -1700,7 +1700,7 @@ to that rule is that any variable used in an expression is essentially
>   'read-once' - once it's used by an expression in a subsequent event,
>   it's reset to its 'unset' state, which means it can't be used again
>   unless it's set again.  This ensures not only that an event doesn't
> -use an uninitialized variable in a calculation, but that that variable
> +use an uninitialized variable in a calculation, but that variable

Same here

>   is used only once and not for any unrelated subsequent match.
>   
>   The basic syntax for saving a variable is to simply prefix a unique

thanks,
-- Shuah


^ permalink raw reply

* [PATCH 2/5] docs: fix repeated word 'that' across documentation
From: Adrien Reynard @ 2026-05-08 16:37 UTC (permalink / raw)
  To: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Jonathan Corbet, Shuah Khan, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, David Howells,
	Paulo Alcantara, Masami Hiramatsu,
	open list:READ-COPY UPDATE (RCU), open list:DOCUMENTATION,
	open list, open list:DRIVER CORE, KOBJECTS, DEBUGFS AND SYSFS,
	open list:FILESYSTEMS [NETFS LIBRARY],
	open list:FILESYSTEMS [NETFS LIBRARY], open list:TRACING
  Cc: Adrien Reynard

Signed-off-by: Adrien Reynard <reynard.adrien.08@gmail.com>
---
 Documentation/RCU/rcu.rst                          | 2 +-
 Documentation/driver-api/driver-model/overview.rst | 2 +-
 Documentation/filesystems/netfs_library.rst        | 2 +-
 Documentation/trace/histogram-design.rst           | 2 +-
 Documentation/trace/histogram.rst                  | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/RCU/rcu.rst b/Documentation/RCU/rcu.rst
index bf6617b330a7..320ad3292b75 100644
--- a/Documentation/RCU/rcu.rst
+++ b/Documentation/RCU/rcu.rst
@@ -32,7 +32,7 @@ Frequently Asked Questions
   Just as with spinlocks, RCU readers are not permitted to
   block, switch to user-mode execution, or enter the idle loop.
   Therefore, as soon as a CPU is seen passing through any of these
-  three states, we know that that CPU has exited any previous RCU
+  three states, we know that CPU has exited any previous RCU
   read-side critical sections.  So, if we remove an item from a
   linked list, and then wait until all CPUs have switched context,
   executed in user mode, or executed in the idle loop, we can
diff --git a/Documentation/driver-api/driver-model/overview.rst b/Documentation/driver-api/driver-model/overview.rst
index b3f447bf9f07..c1966d506d55 100644
--- a/Documentation/driver-api/driver-model/overview.rst
+++ b/Documentation/driver-api/driver-model/overview.rst
@@ -55,7 +55,7 @@ struct pci_dev now looks like this::
 Note first that the struct device dev within the struct pci_dev is
 statically allocated. This means only one allocation on device discovery.
 
-Note also that that struct device dev is not necessarily defined at the
+Note also that struct device dev is not necessarily defined at the
 front of the pci_dev structure.  This is to make people think about what
 they're doing when switching between the bus driver and the global driver,
 and to discourage meaningless and incorrect casts between the two.
diff --git a/Documentation/filesystems/netfs_library.rst b/Documentation/filesystems/netfs_library.rst
index ddd799df6ce3..4033de4535ac 100644
--- a/Documentation/filesystems/netfs_library.rst
+++ b/Documentation/filesystems/netfs_library.rst
@@ -626,7 +626,7 @@ A number of members are available for access/use by the filesystem:
 
    These are set by the filesystem or the cache in ->prepare_read() or
    ->prepare_write() for each subrequest to indicate the maximum number of
-   bytes and, optionally, the maximum number of segments (if not 0) that that
+   bytes and, optionally, the maximum number of segments (if not 0) that
    subrequest can support.
 
  * ``submit_extendable_to``
diff --git a/Documentation/trace/histogram-design.rst b/Documentation/trace/histogram-design.rst
index e92f56ebd0b5..949bbfdb0f16 100644
--- a/Documentation/trace/histogram-design.rst
+++ b/Documentation/trace/histogram-design.rst
@@ -738,7 +738,7 @@ creates its own variable, wakeup_lat, but nothing yet uses it::
 
 Looking at the sched_waking 'hist_debug' output, in addition to the
 normal key and value hist_fields, in the val fields section we see a
-field with the HIST_FIELD_FL_VAR flag, which indicates that that field
+field with the HIST_FIELD_FL_VAR flag, which indicates that field
 represents a variable.  Note that in addition to the variable name,
 contained in the var.name field, it includes the var.idx, which is the
 index into the tracing_map_elt.vars[] array of the actual variable
diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst
index 340bcb5099e7..5b303fabdf32 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -1700,7 +1700,7 @@ to that rule is that any variable used in an expression is essentially
 'read-once' - once it's used by an expression in a subsequent event,
 it's reset to its 'unset' state, which means it can't be used again
 unless it's set again.  This ensures not only that an event doesn't
-use an uninitialized variable in a calculation, but that that variable
+use an uninitialized variable in a calculation, but that variable
 is used only once and not for any unrelated subsequent match.
 
 The basic syntax for saving a variable is to simply prefix a unique
-- 
2.54.0


^ permalink raw reply related

* [PATCH 2/2] selftests/mm: add zone->lock tracepoint verification test
From: hawk @ 2026-05-08 16:22 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Vlastimil Babka, Steven Rostedt, Suren Baghdasaryan, Michal Hocko,
	Zi Yan, David Hildenbrand, Lorenzo Stoakes, Shuah Khan,
	linux-kernel, linux-trace-kernel, kernel-team, hawk
In-Reply-To: <20260508162207.3315781-1-hawk@kernel.org>

From: Jesper Dangaard Brouer <hawk@kernel.org>

Add a selftest to verify the kmem:mm_zone_lock_contended,
kmem:mm_zone_locked, and kmem:mm_zone_lock_unlock tracepoints.

The test has two components:

zone_lock_contention.c - a workload that spawns threads doing rapid
page allocation and freeing to generate zone->lock contention. It
shrinks PCP lists via percpu_pagelist_high_fraction to force frequent
free_pcppages_bulk() and rmqueue_bulk() calls.

test_zone_lock_tracepoints.sh - uses bpftrace to verify tracepoints
exist, have the expected fields, fire under load, and that wait_ns
is populated when contention occurs.

Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
---
 tools/testing/selftests/mm/Makefile           |   2 +
 .../mm/test_zone_lock_tracepoints.sh          | 212 ++++++++++++++++++
 .../selftests/mm/zone_lock_contention.c       | 166 ++++++++++++++
 3 files changed, 380 insertions(+)
 create mode 100755 tools/testing/selftests/mm/test_zone_lock_tracepoints.sh
 create mode 100644 tools/testing/selftests/mm/zone_lock_contention.c

diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index cd24596cdd27..af6cfdf3c8a0 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -106,6 +106,7 @@ TEST_GEN_FILES += guard-regions
 TEST_GEN_FILES += merge
 TEST_GEN_FILES += rmap
 TEST_GEN_FILES += folio_split_race_test
+TEST_GEN_FILES += zone_lock_contention
 
 ifneq ($(ARCH),arm64)
 TEST_GEN_FILES += soft-dirty
@@ -173,6 +174,7 @@ TEST_PROGS += ksft_thp.sh
 TEST_PROGS += ksft_userfaultfd.sh
 TEST_PROGS += ksft_vma_merge.sh
 TEST_PROGS += ksft_vmalloc.sh
+TEST_PROGS += test_zone_lock_tracepoints.sh
 
 TEST_FILES := test_vmalloc.sh
 TEST_FILES += test_hmm.sh
diff --git a/tools/testing/selftests/mm/test_zone_lock_tracepoints.sh b/tools/testing/selftests/mm/test_zone_lock_tracepoints.sh
new file mode 100755
index 000000000000..7fa3dab1f6c5
--- /dev/null
+++ b/tools/testing/selftests/mm/test_zone_lock_tracepoints.sh
@@ -0,0 +1,212 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# test_zone_lock_tracepoints.sh - Verify mm_zone_lock tracepoints fire
+#
+# Generates zone->lock contention and uses bpftrace to verify that the
+# kmem:mm_zone_lock_contended, kmem:mm_zone_locked, and
+# kmem:mm_zone_lock_unlock tracepoints activate and produce output.
+#
+# Requirements: bpftrace, root privileges, CONFIG_FTRACE=y
+#
+# Usage: ./test_zone_lock_tracepoints.sh [duration_sec]
+#        Default duration: 5 seconds
+#
+# For running in a VM via virtme-ng:
+#   make -C tools/testing/selftests/mm zone_lock_contention
+#   vng --cpus 4 --memory 2G \
+#       --rwdir tools/testing/selftests/mm \
+#       --exec "cd tools/testing/selftests/mm && ./test_zone_lock_tracepoints.sh 5"
+
+set -e
+
+DURATION=${1:-5}
+TESTDIR="$(cd "$(dirname "$0")" && pwd)"
+WORKLOAD="$TESTDIR/zone_lock_contention"
+NR_THREADS=4
+PASS=0
+FAIL=0
+SKIP=0
+
+# --- helpers ---
+
+pass() { echo "PASS: $1"; PASS=$((PASS + 1)); }
+fail() { echo "FAIL: $1"; FAIL=$((FAIL + 1)); }
+skip() { echo "SKIP: $1"; SKIP=$((SKIP + 1)); }
+
+check_root() {
+	if [ "$(id -u)" -ne 0 ]; then
+		echo "ERROR: must run as root"
+		exit 4  # ksft SKIP
+	fi
+}
+
+check_bpftrace() {
+	if ! command -v bpftrace >/dev/null 2>&1; then
+		echo "SKIP: bpftrace not found"
+		exit 4
+	fi
+}
+
+check_workload() {
+	if [ ! -x "$WORKLOAD" ]; then
+		echo "SKIP: $WORKLOAD not found, run 'make -C tools/testing/selftests/mm' first"
+		exit 4
+	fi
+}
+
+check_tracepoint_exists() {
+	local tp="$1"
+	if [ ! -d "/sys/kernel/tracing/events/kmem/$tp" ]; then
+		skip "$tp tracepoint not in kernel"
+		return 1
+	fi
+	return 0
+}
+
+# --- Test 1: verify tracepoints exist in tracefs ---
+
+test_tracepoints_exist() {
+	echo "--- Test 1: tracepoints exist in tracefs ---"
+	for tp in mm_zone_lock_contended mm_zone_locked mm_zone_lock_unlock; do
+		if check_tracepoint_exists "$tp"; then
+			pass "$tp exists"
+		fi
+	done
+}
+
+# --- Test 2: verify format fields ---
+
+test_tracepoint_fields() {
+	echo "--- Test 2: tracepoint format fields ---"
+	local fmt
+
+	if [ -f /sys/kernel/tracing/events/kmem/mm_zone_lock_contended/format ]; then
+		fmt=$(cat /sys/kernel/tracing/events/kmem/mm_zone_lock_contended/format)
+		for field in node_id name count caller; do
+			if echo "$fmt" | grep -q "field.*$field"; then
+				pass "mm_zone_lock_contended has field '$field'"
+			else
+				fail "mm_zone_lock_contended missing field '$field'"
+			fi
+		done
+	fi
+
+	if [ -f /sys/kernel/tracing/events/kmem/mm_zone_locked/format ]; then
+		fmt=$(cat /sys/kernel/tracing/events/kmem/mm_zone_locked/format)
+		for field in node_id name count contended caller wait_ns; do
+			if echo "$fmt" | grep -q "field.*$field"; then
+				pass "mm_zone_locked has field '$field'"
+			else
+				fail "mm_zone_locked missing field '$field'"
+			fi
+		done
+	fi
+}
+
+# --- Test 3: bpftrace counts tracepoint hits under load ---
+
+test_bpftrace_counts() {
+	echo "--- Test 3: bpftrace tracepoint activation under contention ---"
+
+	if ! check_tracepoint_exists mm_zone_locked; then
+		return
+	fi
+
+	local BPFTRACE_OUT
+	BPFTRACE_OUT=$(mktemp /tmp/zone_lock_bt.XXXXXX)
+
+	# bpftrace one-liner: count hits per tracepoint
+	bpftrace -e '
+		tracepoint:kmem:mm_zone_lock_contended { @contended = count(); }
+		tracepoint:kmem:mm_zone_locked          { @locked = count(); }
+		tracepoint:kmem:mm_zone_lock_unlock     { @unlock = count(); }
+	' -c "$WORKLOAD $DURATION $NR_THREADS" > "$BPFTRACE_OUT" 2>&1 &
+	local BT_PID=$!
+
+	# Wait for bpftrace + workload to finish
+	wait $BT_PID 2>/dev/null || true
+
+	echo "bpftrace output:"
+	cat "$BPFTRACE_OUT"
+
+	# Check that mm_zone_locked fired (it fires on every acquisition)
+	if grep -q '@locked: [0-9]' "$BPFTRACE_OUT"; then
+		pass "mm_zone_locked tracepoint fired"
+	else
+		fail "mm_zone_locked tracepoint did NOT fire"
+	fi
+
+	# Check that mm_zone_lock_unlock fired
+	if grep -q '@unlock: [0-9]' "$BPFTRACE_OUT"; then
+		pass "mm_zone_lock_unlock tracepoint fired"
+	else
+		fail "mm_zone_lock_unlock tracepoint did NOT fire"
+	fi
+
+	# contended may or may not fire depending on actual contention
+	if grep -q '@contended: [0-9]' "$BPFTRACE_OUT"; then
+		pass "mm_zone_lock_contended tracepoint fired (contention detected)"
+	else
+		skip "mm_zone_lock_contended did not fire (no contention observed)"
+	fi
+
+	rm -f "$BPFTRACE_OUT"
+}
+
+# --- Test 4: bpftrace verifies wait_ns > 0 when contended ---
+
+test_wait_ns() {
+	echo "--- Test 4: wait_ns is populated when contended ---"
+
+	if ! check_tracepoint_exists mm_zone_locked; then
+		return
+	fi
+
+	local BPFTRACE_OUT
+	BPFTRACE_OUT=$(mktemp /tmp/zone_lock_wait.XXXXXX)
+
+	bpftrace -e '
+		tracepoint:kmem:mm_zone_locked /args->contended/ {
+			@has_wait_ns = count();
+			@wait_ns = hist(args->wait_ns);
+		}
+	' -c "$WORKLOAD $DURATION $NR_THREADS" > "$BPFTRACE_OUT" 2>&1 &
+	local BT_PID=$!
+
+	wait $BT_PID 2>/dev/null || true
+
+	echo "bpftrace wait_ns output:"
+	cat "$BPFTRACE_OUT"
+
+	if grep -q '@has_wait_ns: [0-9]' "$BPFTRACE_OUT"; then
+		pass "wait_ns populated on contended acquisitions"
+	else
+		skip "no contended acquisitions observed for wait_ns check"
+	fi
+
+	rm -f "$BPFTRACE_OUT"
+}
+
+# --- Main ---
+
+echo "=== zone->lock tracepoint selftest ==="
+echo "Duration: ${DURATION}s, Threads: ${NR_THREADS}"
+echo
+
+check_root
+check_bpftrace
+check_workload
+
+test_tracepoints_exist
+test_tracepoint_fields
+test_bpftrace_counts
+test_wait_ns
+
+echo
+echo "=== Results: $PASS passed, $FAIL failed, $SKIP skipped ==="
+
+if [ "$FAIL" -gt 0 ]; then
+	exit 1
+fi
+exit 0
diff --git a/tools/testing/selftests/mm/zone_lock_contention.c b/tools/testing/selftests/mm/zone_lock_contention.c
new file mode 100644
index 000000000000..35ddad7670b1
--- /dev/null
+++ b/tools/testing/selftests/mm/zone_lock_contention.c
@@ -0,0 +1,166 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * zone_lock_contention.c - Generate zone->lock contention for tracepoint testing
+ *
+ * Spawns multiple threads that rapidly allocate and free pages to force
+ * PCP (per-cpu pageset) drains and refills, which acquire zone->lock via
+ * free_pcppages_bulk() and rmqueue_bulk().
+ *
+ * Reducing percpu_pagelist_high_fraction makes PCP lists smaller, causing
+ * more frequent zone->lock acquisitions and thus more contention.
+ *
+ * Usage: zone_lock_contention [duration_sec] [nr_threads]
+ *        Defaults: 5 seconds, 4 threads
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <pthread.h>
+#include <sys/mman.h>
+#include <errno.h>
+#include <time.h>
+
+/* Each thread mmaps/touches/munmaps in a loop to churn pages */
+#define CHUNK_SIZE	(2 * 1024 * 1024)	/* 2 MB per iteration */
+#define PAGE_SZ		4096
+
+static volatile int stop;
+
+struct thread_stats {
+	unsigned long iterations;
+	unsigned long pages_touched;
+};
+
+static void *churn_thread(void *arg)
+{
+	struct thread_stats *stats = arg;
+	unsigned long iter = 0;
+	unsigned long pages = 0;
+
+	while (!stop) {
+		char *p;
+		size_t i;
+
+		p = mmap(NULL, CHUNK_SIZE, PROT_READ | PROT_WRITE,
+			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
+		if (p == MAP_FAILED) {
+			perror("mmap");
+			break;
+		}
+
+		/* Touch every page to ensure allocation */
+		for (i = 0; i < CHUNK_SIZE; i += PAGE_SZ)
+			p[i] = 1;
+
+		pages += CHUNK_SIZE / PAGE_SZ;
+
+		/* Free pages back - forces PCP drain */
+		munmap(p, CHUNK_SIZE);
+		iter++;
+	}
+
+	stats->iterations = iter;
+	stats->pages_touched = pages;
+	return NULL;
+}
+
+static int write_sysctl(const char *path, const char *val)
+{
+	FILE *f = fopen(path, "w");
+
+	if (!f)
+		return -1;
+	fputs(val, f);
+	fclose(f);
+	return 0;
+}
+
+static int read_sysctl(const char *path, char *buf, size_t len)
+{
+	FILE *f = fopen(path, "r");
+
+	if (!f)
+		return -1;
+	if (!fgets(buf, len, f)) {
+		fclose(f);
+		return -1;
+	}
+	fclose(f);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	int duration = 5;
+	int nr_threads = 4;
+	char orig_fraction[32] = "";
+	const char *sysctl_path = "/proc/sys/vm/percpu_pagelist_high_fraction";
+	pthread_t *threads;
+	struct thread_stats *stats;
+	unsigned long total_iter = 0, total_pages = 0;
+	int i;
+
+	if (argc > 1)
+		duration = atoi(argv[1]);
+	if (argc > 2)
+		nr_threads = atoi(argv[2]);
+
+	if (duration <= 0 || nr_threads <= 0) {
+		fprintf(stderr, "Usage: %s [duration_sec] [nr_threads]\n", argv[0]);
+		return 1;
+	}
+
+	printf("zone_lock_contention: %d threads, %d seconds\n",
+	       nr_threads, duration);
+
+	/* Shrink PCP lists to force more zone->lock acquisitions */
+	read_sysctl(sysctl_path, orig_fraction, sizeof(orig_fraction));
+	if (write_sysctl(sysctl_path, "100") < 0)
+		fprintf(stderr, "WARNING: cannot write %s (not root?)\n",
+			sysctl_path);
+	else
+		printf("Set percpu_pagelist_high_fraction=100 (was %s)\n",
+		       orig_fraction);
+
+	threads = calloc(nr_threads, sizeof(*threads));
+	stats = calloc(nr_threads, sizeof(*stats));
+	if (!threads || !stats) {
+		perror("calloc");
+		return 1;
+	}
+
+	for (i = 0; i < nr_threads; i++) {
+		if (pthread_create(&threads[i], NULL, churn_thread, &stats[i])) {
+			perror("pthread_create");
+			return 1;
+		}
+	}
+
+	sleep(duration);
+	stop = 1;
+
+	for (i = 0; i < nr_threads; i++) {
+		pthread_join(threads[i], NULL);
+		total_iter += stats[i].iterations;
+		total_pages += stats[i].pages_touched;
+	}
+
+	printf("Total: %lu iterations, %lu pages (%lu MB) churned\n",
+	       total_iter, total_pages,
+	       (total_pages * PAGE_SZ) / (1024 * 1024));
+
+	/* Restore original sysctl */
+	if (orig_fraction[0]) {
+		/* Strip trailing newline */
+		orig_fraction[strcspn(orig_fraction, "\n")] = '\0';
+		write_sysctl(sysctl_path, orig_fraction);
+		printf("Restored percpu_pagelist_high_fraction=%s\n",
+		       orig_fraction);
+	}
+
+	free(threads);
+	free(stats);
+	return 0;
+}
-- 
2.43.0


^ permalink raw reply related

* [PATCH 1/2] mm/page_alloc: add tracepoints for zone->lock acquisitions
From: hawk @ 2026-05-08 16:22 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Vlastimil Babka, Steven Rostedt, Suren Baghdasaryan, Michal Hocko,
	Zi Yan, David Hildenbrand, Lorenzo Stoakes, Shuah Khan,
	linux-kernel, linux-trace-kernel, kernel-team, hawk

From: Jesper Dangaard Brouer <hawk@kernel.org>

Add tracepoints to the page allocator fast paths that acquire
zone->lock, allowing diagnosis of lock contention in production.

Three tracepoints are introduced:
  kmem:mm_zone_lock_contended - fires when trylock fails (lock is held)
  kmem:mm_zone_locked         - fires on every acquisition
  kmem:mm_zone_lock_unlock    - fires on every release

Each event records the NUMA node, zone name, batch count, and caller.
The mm_zone_locked event additionally records wait_ns: the time spent
spinning when contended, measured via local_clock() with IRQs disabled
to ensure accurate same-CPU timestamps.

The lock/unlock paths are wrapped in __zone_lock()/__zone_unlock()
helpers that use trylock-first to separate the contended and
uncontended cases.  Only the fast paths (free_pcppages_bulk,
rmqueue_bulk, free_one_page) are covered.  Other zone->lock holders
such as compaction, page isolation, and memory hotplug are not
instrumented.

For minimum overhead in production, enable only mm_zone_lock_contended
which fires only on actual contention.  Enable mm_zone_locked for
wait-time analysis, and add mm_zone_lock_unlock for hold-time
measurement.

Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
---
 include/trace/events/kmem.h | 101 ++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c             |  50 +++++++++++++++---
 2 files changed, 145 insertions(+), 6 deletions(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index cd7920c81f85..870c68c70d57 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -458,6 +458,107 @@ TRACE_EVENT(rss_stat,
 		__print_symbolic(__entry->member, TRACE_MM_PAGES),
 		__entry->size)
 	);
+
+/*
+ * Tracepoints for zone->lock on the page allocator fast paths only.
+ * Other code paths that acquire zone->lock (compaction, isolation,
+ * memory hotplug, vmstat, etc.) are not covered here.
+ *
+ * Three events:
+ *   mm_zone_lock_contended - trylock failed, about to spin
+ *   mm_zone_locked         - lock acquired, includes wait_ns when
+ *                            contended (zero otherwise)
+ *   mm_zone_lock_unlock    - lock released
+ *
+ * For production use with minimum overhead, enable only
+ * mm_zone_lock_contended -- it fires only when trylock detects the
+ * lock is already held.
+ *
+ * For wait-time analysis, enable mm_zone_locked -- its wait_ns
+ * field gives the spin duration directly.  Adding unlock allows
+ * hold-time measurement, at the cost of one event per acquisition.
+ */
+TRACE_EVENT(mm_zone_lock_contended,
+
+	TP_PROTO(struct zone *zone, int count, unsigned long caller),
+
+	TP_ARGS(zone, count, caller),
+
+	TP_STRUCT__entry(
+		__field(	int,		node_id		)
+		__string(	name,		zone->name	)
+		__field(	int,		count		)
+		__field(	unsigned long,	caller		)
+	),
+
+	TP_fast_assign(
+		__entry->node_id = zone_to_nid(zone);
+		__assign_str(name);
+		__entry->count = count;
+		__entry->caller = caller;
+	),
+
+	TP_printk("node=%d zone=%-8s count=%-5d caller=%pS",
+		  __entry->node_id, __get_str(name),
+		  __entry->count, (void *)__entry->caller)
+);
+
+TRACE_EVENT(mm_zone_locked,
+
+	TP_PROTO(struct zone *zone, int count, bool contended,
+		 unsigned long caller, u64 wait_ns),
+
+	TP_ARGS(zone, count, contended, caller, wait_ns),
+
+	TP_STRUCT__entry(
+		__field(	int,		node_id		)
+		__string(	name,		zone->name	)
+		__field(	int,		count		)
+		__field(	bool,		contended	)
+		__field(	unsigned long,	caller		)
+		__field(	u64,		wait_ns		)
+	),
+
+	TP_fast_assign(
+		__entry->node_id = zone_to_nid(zone);
+		__assign_str(name);
+		__entry->count = count;
+		__entry->contended = contended;
+		__entry->caller = caller;
+		__entry->wait_ns = wait_ns;
+	),
+
+	TP_printk("node=%d zone=%-8s count=%-5d contended=%d caller=%pS wait=%llu ns",
+		  __entry->node_id, __get_str(name),
+		  __entry->count, __entry->contended,
+		  (void *)__entry->caller, __entry->wait_ns)
+);
+
+TRACE_EVENT(mm_zone_lock_unlock,
+
+	TP_PROTO(struct zone *zone, int count, unsigned long caller),
+
+	TP_ARGS(zone, count, caller),
+
+	TP_STRUCT__entry(
+		__field(	int,		node_id		)
+		__string(	name,		zone->name	)
+		__field(	int,		count		)
+		__field(	unsigned long,	caller		)
+	),
+
+	TP_fast_assign(
+		__entry->node_id = zone_to_nid(zone);
+		__assign_str(name);
+		__entry->count = count;
+		__entry->caller = caller;
+	),
+
+	TP_printk("node=%d zone=%-8s count=%-5d caller=%pS",
+		  __entry->node_id, __get_str(name),
+		  __entry->count, (void *)__entry->caller)
+);
+
 #endif /* _TRACE_KMEM_H */
 
 /* This part must be outside protection */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 227d58dc3de6..08018e9beab4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -19,6 +19,7 @@
 #include <linux/highmem.h>
 #include <linux/interrupt.h>
 #include <linux/jiffies.h>
+#include <linux/sched/clock.h>
 #include <linux/compiler.h>
 #include <linux/kernel.h>
 #include <linux/kasan.h>
@@ -1447,6 +1448,43 @@ bool free_pages_prepare(struct page *page, unsigned int order)
 	return __free_pages_prepare(page, order, FPI_NONE);
 }
 
+/*
+ * Helper functions for locking zone->lock with tracepoints.
+ *
+ * This makes it easier to diagnose locking issues and contention in
+ * production environments.  The @count parameter indicates the number
+ * of pages being freed or allocated in the batch operation.
+ *
+ * For minimum overhead attach to kmem:mm_zone_lock_contended, which
+ * only gets activated when trylock detects lock is contended.
+ */
+static inline void
+__zone_lock(struct zone *zone, int count, unsigned long *flags)
+	__acquires(&zone->lock)
+{
+	unsigned long caller = _RET_IP_;
+	u64 wait_start, wait_time = 0;
+	bool contended;
+
+	local_irq_save(*flags);
+	contended = !spin_trylock(&zone->lock);
+	if (contended) {
+		wait_start = local_clock();
+		trace_mm_zone_lock_contended(zone, count, caller);
+		spin_lock(&zone->lock);
+		wait_time = local_clock() - wait_start;
+	}
+	trace_mm_zone_locked(zone, count, contended, caller, wait_time);
+}
+
+static inline void
+__zone_unlock(struct zone *zone, int count, unsigned long *flags)
+	__releases(&zone->lock)
+{
+	trace_mm_zone_lock_unlock(zone, count, _RET_IP_);
+	spin_unlock_irqrestore(&zone->lock, *flags);
+}
+
 /*
  * Frees a number of pages from the PCP lists
  * Assumes all pages on list are in same zone.
@@ -1469,7 +1507,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	/* Ensure requested pindex is drained first. */
 	pindex = pindex - 1;
 
-	spin_lock_irqsave(&zone->lock, flags);
+	__zone_lock(zone, count, &flags);
 
 	while (count > 0) {
 		struct list_head *list;
@@ -1502,7 +1540,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 		} while (count > 0 && !list_empty(list));
 	}
 
-	spin_unlock_irqrestore(&zone->lock, flags);
+	__zone_unlock(zone, count, &flags);
 }
 
 /* Split a multi-block free page into its individual pageblocks. */
@@ -1551,7 +1589,7 @@ static void free_one_page(struct zone *zone, struct page *page,
 			return;
 		}
 	} else {
-		spin_lock_irqsave(&zone->lock, flags);
+		__zone_lock(zone, 1 << order, &flags);
 	}
 
 	/* The lock succeeded. Process deferred pages. */
@@ -1569,7 +1607,7 @@ static void free_one_page(struct zone *zone, struct page *page,
 		}
 	}
 	split_large_buddy(zone, page, pfn, order, fpi_flags);
-	spin_unlock_irqrestore(&zone->lock, flags);
+	__zone_unlock(zone, 1 << order, &flags);
 
 	__count_vm_events(PGFREE, 1 << order);
 }
@@ -2525,7 +2563,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		if (!spin_trylock_irqsave(&zone->lock, flags))
 			return 0;
 	} else {
-		spin_lock_irqsave(&zone->lock, flags);
+		__zone_lock(zone, count, &flags);
 	}
 	for (i = 0; i < count; ++i) {
 		struct page *page = __rmqueue(zone, order, migratetype,
@@ -2545,7 +2583,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		 */
 		list_add_tail(&page->pcp_list, list);
 	}
-	spin_unlock_irqrestore(&zone->lock, flags);
+	__zone_unlock(zone, i, &flags);
 
 	return i;
 }
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH v1 1/2] spi: qcom-geni: trace: Add trace events for Qualcomm GENI SPI
From: Mark Brown @ 2026-05-08 14:01 UTC (permalink / raw)
  To: Praveen Talari
  Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel, linux-arm-msm, linux-spi,
	MukeshKumarSavaliyamukesh.savaliya, AniketRandiveaniket.randive,
	chandana.chiluveru, jyothi.seerapu
In-Reply-To: <59e36f20-891d-4a58-8cc4-6822d03daa23@oss.qualcomm.com>

On Thu, May 07, 2026 at 11:03:39PM +0530, Praveen Talari wrote:
> On 07-05-2026 13:43, Mark Brown wrote:

> > By generic I mean this should not be driver specific at all.

> I hope these changes are fine. Please let me know if you have any concerns
> or feedback.

The data tracepoints look plausible but I would expect them to be
generated by the core, they'll be there for everything so I'd expect
them to work for everything.

^ permalink raw reply

* Re: [PATCH 2/3] init: use static buffers for bootconfig extra command line
From: Breno Leitao @ 2026-05-08 13:59 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Andrew Morton, oss, paulmck, linux-trace-kernel, linux-kernel,
	kernel-team
In-Reply-To: <20260429172721.c89072381aa98d1090ad383f@kernel.org>

Hello Masami,

On Wed, Apr 29, 2026 at 05:27:21PM +0900, Masami Hiramatsu wrote:
> On Fri, 17 Apr 2026 08:38:16 -0700
> Breno Leitao <leitao@debian.org> wrote:
> > On Fri, Apr 17, 2026 at 10:44:36AM +0900, Masami Hiramatsu wrote:
> > > On Wed, 15 Apr 2026 03:51:11 -0700
> > > Breno Leitao <leitao@debian.org> wrote:
> > >
> > > But if we can do it, should we continue using bootconfig? I mean
> > > it is easy to make a tool (or add a feature in tools/bootconfig)
> > > which converts bootconfig file to command line string and embeds
> > > it in the kernel. Hmm.
> >
> > Sure, you are talking about a tool that embeds it in the kernel binary,
> > something like:
> >
> >
> > 0) Get a kernel and define CONFIG_BOOT_CONFIG_EMBED_FILE=".bootconfig"
> >
> > 1) Add an option in tools/bootconfig to convert bootconfig (.bootconfig)
> >    to a cmdline string ($ bootconfig -C kernel .bootconfig).
> >    Something like:
> >    # tools/bootconfig/bootconfig -C kernel .bootconfig
> >      mem=2G loglevel=7 debug nokaslr %
> >
> > 2) At kernel build time, run that tool on .bootconfig and embed the
> >    resulting string into the kernel image as a .init.rodata symbol
> >    (embedded_kernel_cmdline[]).
> >
> >    # gdb -batch -ex 'x/s &embedded_kernel_cmdline' vmlinux
> >    0xffffffff87e108f8:    "mem=2G loglevel=7 debug nokaslr "

> Yeah, I think this looks good to me.

Thank you for the feedback. I've begun working on the bootconfig patches
following the approach outlined in Step 1 above. Note that I've
simplified the -C option by removing the "kernel" argument mentioned in
the earlier example.

The patch series is available here:

https://lore.kernel.org/all/20260508-bootconfig_using_tools-v1-0-1132219aa773@debian.org/

I appreciate your continued support.
--breno

^ permalink raw reply

* [PATCH 2/2] tools/bootconfig: render kernel.* subtree as cmdline string with -C
From: Breno Leitao @ 2026-05-08 13:55 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton
  Cc: linux-kernel, linux-trace-kernel, paulmck, oss, Breno Leitao,
	kernel-team
In-Reply-To: <20260508-bootconfig_using_tools-v1-0-1132219aa773@debian.org>

Add a -C option that finds the "kernel" subtree of a bootconfig file
and prints it as a flat, space-separated cmdline string by calling the
shared xbc_snprint_cmdline() renderer. An empty or absent kernel.*
subtree produces empty output and exits successfully.

This lets the kernel build embed a bootconfig file as a plain cmdline
string at build time, so embedded bootconfig values can reach
parse_early_param() during architecture setup without parsing the
bootconfig at runtime.

The renderer is intentionally limited to the kernel.* subtree: that is
the only thing the kernel build needs to embed; init.* and other
subtrees keep going through the runtime parser.

Example of this new mode:
	# cat /tmp/test.bconf
	kernel {
		foo = bar
		baz = "hello world"
		arr = 1, 2
	}
	init.foo = nope

	# ./tools/bootconfig/bootconfig -C /tmp/test.bconf
	foo=bar baz="hello world" arr=1 arr=2 %
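
The output above follows from two rendering rules: a value is quoted only
when it contains whitespace, and each array element is emitted as a
repeated key=value token. As a rough, standalone illustration of those
rules (not the code in this series), the sketch below works on a
hypothetical flat struct kv array instead of the real xbc_node tree; the
names struct kv, space_left() and render_cmdline() are made up for the
example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened view of one pair under the kernel.* subtree. */
struct kv {
	const char *key;	/* key with the "kernel." prefix already removed */
	const char *val;	/* NULL for a key that carries no value */
};

/* Remaining space in the buffer, clamped to 0 once pos passes the end. */
static size_t space_left(size_t pos, size_t size)
{
	return size > pos ? size - pos : 0;
}

/*
 * Render n pairs as space-terminated tokens. Returns the length the full
 * string needs (snprintf semantics), so buf == NULL with size == 0 can be
 * used to size the output before allocating.
 */
static int render_cmdline(char *buf, size_t size, const struct kv *kv, size_t n)
{
	size_t pos = 0;
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		size_t left = space_left(pos, size);
		char *dst = left ? buf + pos : NULL;
		const char *q;

		assert(kv[i].key != NULL);
		if (!kv[i].val) {
			/* Key without a value: emit the bare key. */
			ret = snprintf(dst, left, "%s ", kv[i].key);
		} else {
			/* Quote only when the value contains whitespace. */
			q = strpbrk(kv[i].val, " \t\r\n") ? "\"" : "";
			ret = snprintf(dst, left, "%s=%s%s%s ",
				       kv[i].key, q, kv[i].val, q);
		}
		if (ret < 0)
			return ret;
		pos += (size_t)ret;
	}
	return (int)pos;
}
```

Feeding it the four pairs from the example (foo=bar, baz=hello world,
arr=1, arr=2) yields the same string as shown above, including the
trailing space; an array is simply passed as one struct kv entry per
element with the same key.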

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 tools/bootconfig/main.c | 60 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 52 insertions(+), 8 deletions(-)

diff --git a/tools/bootconfig/main.c b/tools/bootconfig/main.c
index 643f707b8f1da..e1bfab044fbcb 100644
--- a/tools/bootconfig/main.c
+++ b/tools/bootconfig/main.c
@@ -286,7 +286,41 @@ static int init_xbc_with_error(char *buf, int len)
 	return ret;
 }
 
-static int show_xbc(const char *path, bool list)
+static int show_xbc_kernel_cmdline(void)
+{
+	struct xbc_node *root;
+	char *buf = NULL;
+	int len, ret;
+
+	root = xbc_find_node("kernel");
+	if (!root)
+		return 0;	/* no kernel.* keys: emit empty output */
+
+	len = xbc_snprint_cmdline(NULL, 0, root);
+	if (len < 0) {
+		pr_err("Failed to size cmdline output: %d\n", len);
+		return len;
+	}
+	if (len == 0)
+		return 0;
+
+	buf = malloc(len + 1);
+	if (!buf)
+		return -ENOMEM;
+
+	ret = xbc_snprint_cmdline(buf, len + 1, root);
+	if (ret < 0) {
+		pr_err("Failed to render cmdline output: %d\n", ret);
+		free(buf);
+		return ret;
+	}
+
+	fputs(buf, stdout);
+	free(buf);
+	return 0;
+}
+
+static int show_xbc(const char *path, bool list, bool render_cmdline)
 {
 	int ret, fd;
 	char *buf = NULL;
@@ -322,11 +356,14 @@ static int show_xbc(const char *path, bool list)
 		if (init_xbc_with_error(buf, ret) < 0)
 			goto out;
 	}
-	if (list)
+	if (render_cmdline)
+		ret = show_xbc_kernel_cmdline();
+	else if (list)
 		xbc_show_list();
 	else
 		xbc_show_compact_tree();
-	ret = 0;
+	if (ret > 0)
+		ret = 0;
 out:
 	free(buf);
 
@@ -486,7 +523,10 @@ static int usage(void)
 		" Options:\n"
 		"		-a <config>: Apply boot config to initrd\n"
 		"		-d : Delete boot config file from initrd\n"
-		"		-l : list boot config in initrd or file\n\n"
+		"		-l : list boot config in initrd or file\n"
+		"		-C : render the kernel.* subtree as a flat cmdline\n"
+		"		     string (suitable for embedding in a kernel image)\n"
+		"		     and print it to stdout\n\n"
 		" If no option is given, show the bootconfig in the given file.\n");
 	return -1;
 }
@@ -495,10 +535,11 @@ int main(int argc, char **argv)
 {
 	char *path = NULL;
 	char *apply = NULL;
+	bool render_cmdline = false;
 	bool delete = false, list = false;
 	int opt;
 
-	while ((opt = getopt(argc, argv, "hda:l")) != -1) {
+	while ((opt = getopt(argc, argv, "hda:lC")) != -1) {
 		switch (opt) {
 		case 'd':
 			delete = true;
@@ -509,14 +550,17 @@ int main(int argc, char **argv)
 		case 'l':
 			list = true;
 			break;
+		case 'C':
+			render_cmdline = true;
+			break;
 		case 'h':
 		default:
 			return usage();
 		}
 	}
 
-	if ((apply && delete) || (delete && list) || (apply && list)) {
-		pr_err("Error: You can give one of -a, -d or -l at once.\n");
+	if ((!!apply + !!delete + !!list + !!render_cmdline) > 1) {
+		pr_err("Error: You can give one of -a, -d, -l or -C at once.\n");
 		return usage();
 	}
 
@@ -532,5 +576,5 @@ int main(int argc, char **argv)
 	else if (delete)
 		return delete_xbc(path);
 
-	return show_xbc(path, list);
+	return show_xbc(path, list, render_cmdline);
 }

-- 
2.53.0-Meta


^ permalink raw reply related

* [PATCH 1/2] bootconfig: move xbc_snprint_cmdline() to lib/bootconfig.c
From: Breno Leitao @ 2026-05-08 13:55 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton
  Cc: linux-kernel, linux-trace-kernel, paulmck, oss, Breno Leitao,
	kernel-team
In-Reply-To: <20260508-bootconfig_using_tools-v1-0-1132219aa773@debian.org>

Move xbc_snprint_cmdline() from init/main.c to lib/bootconfig.c so the
function (and its xbc_namebuf scratch buffer) becomes part of the shared
parser library. tools/bootconfig already compiles lib/bootconfig.c
directly, which lets a follow-up patch reuse the same renderer in the
userspace tool to convert a bootconfig file into a flat cmdline string
at build time.

No functional change.
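
For readers unfamiliar with the calling convention, the renderer reports
the length it would need (snprintf semantics), which lets a caller size
the output with a NULL buffer before allocating. The sketch below shows
that two-pass pattern in plain userspace C; render_pair() is a
hypothetical stand-in for a renderer with this contract, not a function
from lib/bootconfig.c.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a renderer with snprintf-style return values. */
static int render_pair(char *buf, size_t size, const char *key, const char *val)
{
	return snprintf(buf, size, "%s=%s ", key, val);
}

/*
 * Size the output with a NULL/0 call, allocate, then render for real.
 * Returns a malloc()ed string the caller must free(), or NULL on failure.
 */
static char *render_pair_alloc(const char *key, const char *val)
{
	char *buf;
	int len, ret;

	assert(key != NULL && val != NULL);

	/* Pass 1: nothing is written, only the required length comes back. */
	len = render_pair(NULL, 0, key, val);
	if (len < 0)
		return NULL;

	/* One extra byte for the terminating NUL. */
	buf = malloc((size_t)len + 1);
	if (!buf)
		return NULL;

	/* Pass 2: render into the exactly sized buffer. */
	ret = render_pair(buf, (size_t)len + 1, key, val);
	if (ret != len) {
		free(buf);
		return NULL;
	}
	return buf;
}
```

The same shape applies to any renderer that keeps counting after the
buffer is full: the first call costs a walk over the data but avoids
guessing a buffer size.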

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/linux/bootconfig.h |  3 +++
 init/main.c                | 45 -------------------------------------
 lib/bootconfig.c           | 56 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 45 deletions(-)

diff --git a/include/linux/bootconfig.h b/include/linux/bootconfig.h
index 692a5acc2ffc4..1c7f3b74ffcf3 100644
--- a/include/linux/bootconfig.h
+++ b/include/linux/bootconfig.h
@@ -265,6 +265,9 @@ static inline struct xbc_node * __init xbc_node_get_subkey(struct xbc_node *node
 int __init xbc_node_compose_key_after(struct xbc_node *root,
 			struct xbc_node *node, char *buf, size_t size);
 
+/* Render key/value pairs under @root as a flat cmdline string */
+int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root);
+
 /**
  * xbc_node_compose_key() - Compose full key string of the XBC node
  * @node: An XBC node.
diff --git a/init/main.c b/init/main.c
index 96f93bb06c490..e363232b428b4 100644
--- a/init/main.c
+++ b/init/main.c
@@ -324,51 +324,6 @@ static void * __init get_boot_config_from_initrd(size_t *_size)
 
 #ifdef CONFIG_BOOT_CONFIG
 
-static char xbc_namebuf[XBC_KEYLEN_MAX] __initdata;
-
-#define rest(dst, end) ((end) > (dst) ? (end) - (dst) : 0)
-
-static int __init xbc_snprint_cmdline(char *buf, size_t size,
-				      struct xbc_node *root)
-{
-	struct xbc_node *knode, *vnode;
-	char *end = buf + size;
-	const char *val, *q;
-	int ret;
-
-	xbc_node_for_each_key_value(root, knode, val) {
-		ret = xbc_node_compose_key_after(root, knode,
-					xbc_namebuf, XBC_KEYLEN_MAX);
-		if (ret < 0)
-			return ret;
-
-		vnode = xbc_node_get_child(knode);
-		if (!vnode) {
-			ret = snprintf(buf, rest(buf, end), "%s ", xbc_namebuf);
-			if (ret < 0)
-				return ret;
-			buf += ret;
-			continue;
-		}
-		xbc_array_for_each_value(vnode, val) {
-			/*
-			 * For prettier and more readable /proc/cmdline, only
-			 * quote the value when necessary, i.e. when it contains
-			 * whitespace.
-			 */
-			q = strpbrk(val, " \t\r\n") ? "\"" : "";
-			ret = snprintf(buf, rest(buf, end), "%s=%s%s%s ",
-				       xbc_namebuf, q, val, q);
-			if (ret < 0)
-				return ret;
-			buf += ret;
-		}
-	}
-
-	return buf - (end - size);
-}
-#undef rest
-
 /* Make an extra command line under given key word */
 static char * __init xbc_make_cmdline(const char *key)
 {
diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index c470b93d5dbc2..f445b7703fdd9 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -408,6 +408,62 @@ const char * __init xbc_node_find_next_key_value(struct xbc_node *root,
 		return "";	/* No value key */
 }
 
+static char xbc_namebuf[XBC_KEYLEN_MAX] __initdata;
+
+#define rest(dst, end) ((end) > (dst) ? (end) - (dst) : 0)
+
+/**
+ * xbc_snprint_cmdline() - Render bootconfig keys under @root as a cmdline string
+ * @buf: Destination buffer (may be NULL when @size is 0 to query the length)
+ * @size: Size of @buf in bytes
+ * @root: Subtree root whose key=value pairs should be rendered
+ *
+ * Walk all key/value pairs under @root and emit them as a space-separated
+ * cmdline string into @buf. Values containing whitespace are quoted with
+ * double quotes. Returns the number of bytes that would be written if @buf
+ * were large enough (matching snprintf semantics), or a negative errno on
+ * failure.
+ */
+int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
+{
+	struct xbc_node *knode, *vnode;
+	char *end = buf + size;
+	const char *val, *q;
+	int ret;
+
+	xbc_node_for_each_key_value(root, knode, val) {
+		ret = xbc_node_compose_key_after(root, knode,
+					xbc_namebuf, XBC_KEYLEN_MAX);
+		if (ret < 0)
+			return ret;
+
+		vnode = xbc_node_get_child(knode);
+		if (!vnode) {
+			ret = snprintf(buf, rest(buf, end), "%s ", xbc_namebuf);
+			if (ret < 0)
+				return ret;
+			buf += ret;
+			continue;
+		}
+		xbc_array_for_each_value(vnode, val) {
+			/*
+			 * For prettier and more readable /proc/cmdline, only
+			 * quote the value when necessary, i.e. when it contains
+			 * whitespace.
+			 */
+			q = strpbrk(val, " \t\r\n") ? "\"" : "";
+			ret = snprintf(buf, rest(buf, end), "%s=%s%s%s ",
+				       xbc_namebuf, q, val, q);
+			if (ret < 0)
+				return ret;
+			buf += ret;
+		}
+	}
+
+	return buf - (end - size);
+}
+#undef rest
+
 /* XBC parse and tree build */
 
 static int __init xbc_init_node(struct xbc_node *node, char *data, uint16_t flag)

-- 
2.53.0-Meta


^ permalink raw reply related
