Re: [PATCH v3 00/16] mm: Some cleanups for page allocator APIs

The Linux Kernel Mailing List

* Re: [PATCH v3 00/16] mm: Some cleanups for page allocator APIs
       [not found] <20260629-alloc-trylock-v3-0-57bef0eadbc2@google.com>
@ 2026-06-29 14:00 ` Mike Rapoport
  2026-06-29 14:30   ` Brendan Jackman
       [not found] ` <20260629-alloc-trylock-v3-4-57bef0eadbc2@google.com>
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: Mike Rapoport @ 2026-06-29 14:00 UTC (permalink / raw)
  To: Brendan Jackman
  Cc: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Hao Li, Christoph Lameter, David Rientjes,
	Roman Gushchin, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt, Harry Yoo (Oracle), Gregory Price,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, Sean Christopherson, Paolo Bonzini, kvm,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Robin Holt, Steve Wahl, Arnd Bergmann,
	Greg Kroah-Hartman, Dimitris Michailidis, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni

Hi Brendan,

On Mon, Jun 29, 2026 at 01:11:49PM +0000, Brendan Jackman wrote:
> 
> Some tweaks and cleanups for page allocator entrypoint and flags. This
> is motivated by preparation for __GFP_UNMAPPED [1] (which will probably
> become ALLOC_UNMAPPED in its next iteration), but all this is supposed
> to be an improvement to the codebase in its own right: unifying code
> paths, reducing API surface, and removing GFP flags.
> 
> Tested:
> 
> - KVM, mm, and BPF selftests in a QEMU VM
> 
> - kunit.py on x86_64
> 
> - For the ALLOC_NO_CODETAG bits I just booted a VM and read
>   /proc/allocinfo. I confirmed that if I remove ALLOC_NO_CODETAG, the
>   kernel crashes in early boot, so I was at least booting code that
>   depends on this logic.

Heads up before the full kbuild report:

CI has tested the following submission:
Status:     FAILURE
Name:       [v3,00/16] mm: Some cleanups for page allocator APIs
Patchwork:  https://patchwork.kernel.org/project/linux-mm/list/?series=1118244&state=*
Matrix:     https://github.com/linux-mm/linux-mm/actions/runs/28375636866

> I used Google's internal version of Antigravity (AI coding harness) to
> do the repetitive bits, those commits are marked with Assisted-by, the
> rest is manual.
> 
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 00/16] mm: Some cleanups for page allocator APIs
  2026-06-29 14:00 ` [PATCH v3 00/16] mm: Some cleanups for page allocator APIs Mike Rapoport
@ 2026-06-29 14:30   ` Brendan Jackman
  2026-06-29 15:05     ` Brendan Jackman
  0 siblings, 1 reply; 31+ messages in thread
From: Brendan Jackman @ 2026-06-29 14:30 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Hao Li, Christoph Lameter, David Rientjes,
	Roman Gushchin, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt, Harry Yoo (Oracle), Gregory Price,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, Sean Christopherson, Paolo Bonzini, kvm,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Robin Holt, Steve Wahl, Arnd Bergmann,
	Greg Kroah-Hartman, Dimitris Michailidis, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni

On Mon, 29 Jun 2026 at 16:00, Mike Rapoport <rppt@kernel.org> wrote:
>
> Hi Brendan,
>
> On Mon, Jun 29, 2026 at 01:11:49PM +0000, Brendan Jackman wrote:
> >
> > Some tweaks and cleanups for page allocator entrypoint and flags. This
> > is motivated by preparation for __GFP_UNMAPPED [1] (which will probably
> > become ALLOC_UNMAPPED in its next iteration), but all this is supposed
> > to be an improvement to the codebase in its own right: unifying code
> > paths, reducing API surface, and removing GFP flags.
> >
> > Tested:
> >
> > - KVM, mm, and BPF selftests in a QEMU VM
> >
> > - kunit.py on x86_64
> >
> > - For the ALLOC_NO_CODETAG bits I just booted a VM and read
> >   /proc/allocinfo. I confirmed that if I remove ALLOC_NO_CODETAG, the
> >   kernel crashes in early boot, so I was at least booting code that
> >   depends on this logic.
>
> Heads up before the full kbuild report:
>
> CI has tested the following submission:
> Status:     FAILURE
> Name:       [v3,00/16] mm: Some cleanups for page allocator APIs
> Patchwork:  https://patchwork.kernel.org/project/linux-mm/list/?series=1118244&state=*
> Matrix:     https://github.com/linux-mm/linux-mm/actions/runs/28375636866

Agh, thanks, I broke the build for CMA.

I thought I had this covered in my local test scripts. I will fix that
first then I'll send a fixup for the patch.


* Re: [PATCH v3 00/16] mm: Some cleanups for page allocator APIs
  2026-06-29 14:30   ` Brendan Jackman
@ 2026-06-29 15:05     ` Brendan Jackman
  0 siblings, 0 replies; 31+ messages in thread
From: Brendan Jackman @ 2026-06-29 15:05 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Hao Li, Christoph Lameter, David Rientjes,
	Roman Gushchin, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt, Harry Yoo (Oracle), Gregory Price,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, Sean Christopherson, Paolo Bonzini, kvm,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Robin Holt, Steve Wahl, Arnd Bergmann,
	Greg Kroah-Hartman, Dimitris Michailidis, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni

On Mon, 29 Jun 2026 at 16:30, Brendan Jackman <jackmanb@google.com> wrote:
>
> On Mon, 29 Jun 2026 at 16:00, Mike Rapoport <rppt@kernel.org> wrote:
> >
> > Hi Brendan,
> >
> > On Mon, Jun 29, 2026 at 01:11:49PM +0000, Brendan Jackman wrote:
> > >
> > > Some tweaks and cleanups for page allocator entrypoint and flags. This
> > > is motivated by preparation for __GFP_UNMAPPED [1] (which will probably
> > > become ALLOC_UNMAPPED in its next iteration), but all this is supposed
> > > to be an improvement to the codebase in its own right: unifying code
> > > paths, reducing API surface, and removing GFP flags.
> > >
> > > Tested:
> > >
> > > - KVM, mm, and BPF selftests in a QEMU VM
> > >
> > > - kunit.py on x86_64
> > >
> > > - For the ALLOC_NO_CODETAG bits I just booted a VM and read
> > >   /proc/allocinfo. I confirmed that if I remove ALLOC_NO_CODETAG, the
> > >   kernel crashes in early boot, so I was at least booting code that
> > >   depends on this logic.
> >
> > Heads up before the full kbuild report:
> >
> > CI has tested the following submission:
> > Status:     FAILURE
> > Name:       [v3,00/16] mm: Some cleanups for page allocator APIs
> > Patchwork:  https://patchwork.kernel.org/project/linux-mm/list/?series=1118244&state=*
> > Matrix:     https://github.com/linux-mm/linux-mm/actions/runs/28375636866
>
> Agh, thanks, I broke the build for CMA.
>
> I thought I had this covered in my local test scripts. I will fix that
> first then I'll send a fixup for the patch.

OK my scripts are indeed checking CMA, the issue is that I didn't
build with NUMA_BALANCING.

I guess Suren was right[0] and I really should build allmodconfig (at
least vmlinux) before sending patches. I was a bit skeptical that this
was an especially useful config to build, but now I realise it just
maximises coverage, even if it does so under a pretty arbitrary
configuration.

[0] https://lore.kernel.org/all/CAJuCfpHAMaK2sZYSgS750CvgksCKEfOmLzZasXdBup+CrS-3Pg@mail.gmail.com/

Here are the fixes:

diff --git i/mm/migrate.c w/mm/migrate.c
index d9b23909d716c..8e0a6fb3f6618 100644
--- i/mm/migrate.c
+++ w/mm/migrate.c
@@ -49,6 +49,7 @@
 #include <trace/events/migrate.h>

 #include "internal.h"
+#include "page_alloc.h"
 #include "swap.h"

 static const struct movable_operations *offline_movable_ops;
diff --git i/mm/page_reporting.c w/mm/page_reporting.c
index 7418f2e500bb4..c7325704c3202 100644
--- i/mm/page_reporting.c
+++ w/mm/page_reporting.c
@@ -8,6 +8,7 @@
 #include <linux/delay.h>
 #include <linux/scatterlist.h>

+#include "page_alloc.h"
 #include "page_reporting.h"
 #include "internal.h"

diff --git i/mm/shuffle.c w/mm/shuffle.c
index fb1393b8b3a9d..82a2c7725a08a 100644
--- i/mm/shuffle.c
+++ w/mm/shuffle.c
@@ -7,6 +7,7 @@
 #include <linux/random.h>
 #include <linux/moduleparam.h>
 #include "internal.h"
+#include "page_alloc.h"
 #include "shuffle.h"

 DEFINE_STATIC_KEY_FALSE(page_alloc_shuffle_key);


[parent not found: <20260629-alloc-trylock-v3-4-57bef0eadbc2@google.com>]

* Re: [PATCH v3 04/16] mm: Split out internal page_alloc.h
       [not found] ` <20260629-alloc-trylock-v3-4-57bef0eadbc2@google.com>
@ 2026-06-29 14:16   ` sashiko-bot
  2026-06-30 13:54   ` Vlastimil Babka (SUSE)
  1 sibling, 0 replies; 31+ messages in thread
From: sashiko-bot @ 2026-06-29 14:16 UTC (permalink / raw)
  To: Brendan Jackman; +Cc: linux-kernel, linux-rt-devel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] Missing inclusion of `page_alloc.h` causes build failures for multiple configurations.
--

commit 34a7c3e538f8ec840a6b68715addef408e469ff0
Author: Brendan Jackman <jackmanb@google.com>

mm: Split out internal page_alloc.h

internal.h is a bit bloated, seems like time for a page_alloc.h.
This moves definitions supporting mm/page_alloc.c into the new header.

Link: https://lore.kernel.org/all/41e92bab-6882-401a-8de9-154adbdcfb36@kernel.org/

> diff --git a/mm/internal.h b/mm/internal.h
> index 8ce59c5664497..c22284f04fc9e 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
[ ... ]
> -static inline unsigned int buddy_order(struct page *page)
> -{
> -	/* PageBuddy() must be checked by the caller */
> -	return page_private(page);
> -}

[Severity: Low]
Will this cause a build failure when shuffle page allocator or page reporting
features are enabled?

It looks like shuffle_valid_page() in mm/shuffle.c and page_reporting_drain()
in mm/page_reporting.c use buddy_order() but don't include the new
page_alloc.h.

[ ... ]
> -extern void __free_pages_core(struct page *page, unsigned int order,
> -		enum meminit_context context);

[Severity: Low]
Does mm/kmsan/init.c need to include page_alloc.h now?

It seems do_collection() uses __free_pages_core() and might fail to build
when the kernel is configured for kmsan if the header is missing.

[ ... ]
> -#define ALLOC_CMA		 0x80 /* allow allocations from CMA areas */

[Severity: Low]
Could this break the build for migration and page reporting configurations?

I am noticing that migrate_balanced_pgdat() in mm/migrate.c and
page_reporting_process_zone() in mm/page_reporting.c use ALLOC_CMA but neither
of these files appears to include page_alloc.h.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260629-alloc-trylock-v3-0-57bef0eadbc2@google.com?part=4


* Re: [PATCH v3 04/16] mm: Split out internal page_alloc.h
       [not found] ` <20260629-alloc-trylock-v3-4-57bef0eadbc2@google.com>
  2026-06-29 14:16   ` [PATCH v3 04/16] mm: Split out internal page_alloc.h sashiko-bot
@ 2026-06-30 13:54   ` Vlastimil Babka (SUSE)
  1 sibling, 0 replies; 31+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-30 13:54 UTC (permalink / raw)
  To: Brendan Jackman, Andrew Morton, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
	Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel

On 6/29/26 15:11, Brendan Jackman wrote:
> internal.h is a bit bloated, seems like time for a page_alloc.h.
> 
> Where it wasn't obvious, the heuristic for deciding what goes into this
> new header was "does it support/correspond to a definition in
> mm/page_alloc.c?"
> 
> Only need to include it from 15 .c files out of 164 so this does seem
> like a genuine reduction in scopes, which is nice. And there's no
> circular internal.h<->page_alloc.h dependency, so it seems worthwhile to
> split this up before that inevitably emerges!
> 
> Suggested-by: "David Hildenbrand (Arm)" <david@kernel.org>
> Link: https://lore.kernel.org/all/41e92bab-6882-401a-8de9-154adbdcfb36@kernel.org/
> Signed-off-by: Brendan Jackman <jackmanb@google.com>

Cool.

Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>



[parent not found: <20260629-alloc-trylock-v3-11-57bef0eadbc2@google.com>]

* Re: [PATCH v3 11/16] sgi-xp: Use higher-level allocator API
       [not found] ` <20260629-alloc-trylock-v3-11-57bef0eadbc2@google.com>
@ 2026-06-29 15:04   ` sashiko-bot
  2026-06-29 18:47   ` Steve Wahl
  1 sibling, 0 replies; 31+ messages in thread
From: sashiko-bot @ 2026-06-29 15:04 UTC (permalink / raw)
  To: Brendan Jackman; +Cc: linux-rt-devel, linux-kernel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] xpc_init_mq_node() incorrectly passes the node ID (nid) instead of the loop variable (cpu) to xpc_create_gru_mq_uv().
--

commit 364bc05827d5201ce117b2c0d83f21a9b64f85f3
Author: Brendan Jackman <jackmanb@google.com>

sgi-xp: Use higher-level allocator API

This commit replaces __alloc_pages_node() with alloc_pages_node() in
xpc_create_gru_mq_uv(). The __alloc_pages_node() function is being phased
out, and alloc_pages_node() works identically here while allowing
NUMA_NO_NODE if it were needed.
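
As a rough illustration of the difference described above, here is a
userspace model (not kernel code). All model_* names, MODEL_BAD_NODE and
the small MAX_NUMNODES value are assumptions made up for this sketch; the
real helpers live in include/linux/gfp.h and return a struct page pointer
rather than a node number.

```c
#include <assert.h>

#define NUMA_NO_NODE   (-1)
#define MAX_NUMNODES   4    /* hypothetical small value for the model */
#define MODEL_BAD_NODE (-2) /* hypothetical: marks what VM_BUG_ON() would catch */

/* Hypothetical stand-in for numa_mem_id(): the caller's local memory node. */
static int model_local_node(void)
{
	return 0;
}

/*
 * Models the strict variant: the node must already be valid.
 * Returns the node that would be used, or MODEL_BAD_NODE where the
 * real code has VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES).
 */
static int model_strict_alloc_node(int nid)
{
	if (nid < 0 || nid >= MAX_NUMNODES)
		return MODEL_BAD_NODE;
	return nid;
}

/* Models the higher-level variant: NUMA_NO_NODE means "use the local node". */
static int model_alloc_pages_node(int nid)
{
	if (nid == NUMA_NO_NODE)
		nid = model_local_node();
	assert(nid != NUMA_NO_NODE);
	return model_strict_alloc_node(nid);
}
```

In this model, model_alloc_pages_node(NUMA_NO_NODE) resolves to the local
node, while model_strict_alloc_node(NUMA_NO_NODE) reports the bad-node
condition. For a valid node both behave the same, which is why the
conversion is expected to be a no-op for this caller.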

> diff --git a/drivers/misc/sgi-xp/xpc_uv.c b/drivers/misc/sgi-xp/xpc_uv.c
> index 772c787268932..aacff70204241 100644
> --- a/drivers/misc/sgi-xp/xpc_uv.c
> +++ b/drivers/misc/sgi-xp/xpc_uv.c
> @@ -170,7 +170,7 @@ xpc_create_gru_mq_uv(unsigned int mq_size, int cpu, char *irq_name,
>  	mq->mmr_blade = uv_cpu_to_blade_id(cpu);
>  
>  	nid = cpu_to_node(cpu);

[Severity: High]
This is a pre-existing issue, but while reviewing this code, I noticed a
potential out-of-bounds access issue.

In xpc_init_mq_node(), the node ID (nid) is incorrectly passed instead of the
loop variable (cpu) to xpc_create_gru_mq_uv():

drivers/misc/sgi-xp/xpc_uv.c:xpc_init_mq_node() {
    ...
		xpc_activate_mq_uv =
			xpc_create_gru_mq_uv(XPC_ACTIVATE_MQ_SIZE_UV, nid,
					     XPC_ACTIVATE_IRQ_NAME,
					     xpc_handle_activate_IRQ_uv);
    ...
}

Because xpc_create_gru_mq_uv() expects a valid CPU index, passing a NUMA node
ID means the cpu parameter here contains a node ID. Since NUMA node IDs and
CPU IDs are in different ID spaces, cpu_to_node() will access the numa_node
per-CPU variable using a node ID instead of a CPU ID.

If the node ID is greater than or equal to nr_cpu_ids, will this result in an
out-of-bounds access causing a kernel panic during module load?

Even if it is within bounds, does this result in routing interrupts and
allocating memory using the topology of the wrong CPU?
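
To make the ID-space concern concrete, here is a small userspace model
(not the driver code). The four-CPU, two-node topology table and every
model_* name are assumptions invented for this sketch; the real
cpu_to_node() reads a per-CPU variable and does no bounds checking.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4

/* Hypothetical topology: CPUs 0-1 sit on node 0, CPUs 2-3 on node 1. */
static const int model_cpu_node[MODEL_NR_CPUS] = { 0, 0, 1, 1 };

/* Returns the node of @cpu, or -1 if @cpu is outside the table. */
static int model_cpu_to_node(int cpu)
{
	if (cpu < 0 || cpu >= MODEL_NR_CPUS)
		return -1;
	return model_cpu_node[cpu];
}

/* Returns the first CPU that belongs to @nid, or -1 if there is none. */
static int model_first_cpu_of_node(int nid)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_cpu_node[cpu] == nid)
			return cpu;
	return -1;
}
```

With this table, treating node ID 1 as a CPU index selects CPU 1, which
lives on node 0, so the lookup silently lands on the wrong node. Mapping
the node to one of its CPUs first, as model_first_cpu_of_node() does,
gives the intended result. Whether the driver should be changed this way
is a separate question for its maintainers.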

> -	page = __alloc_pages_node(nid,
> +	page = alloc_pages_node(nid,
>  				      GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
>  				      pg_order);
>  	if (page == NULL) {

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260629-alloc-trylock-v3-0-57bef0eadbc2@google.com?part=11


* Re: [PATCH v3 11/16] sgi-xp: Use higher-level allocator API
       [not found] ` <20260629-alloc-trylock-v3-11-57bef0eadbc2@google.com>
  2026-06-29 15:04   ` [PATCH v3 11/16] sgi-xp: Use higher-level allocator API sashiko-bot
@ 2026-06-29 18:47   ` Steve Wahl
  1 sibling, 0 replies; 31+ messages in thread
From: Steve Wahl @ 2026-06-29 18:47 UTC (permalink / raw)
  To: Brendan Jackman
  Cc: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
	Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
	Robin Holt, Steve Wahl, Arnd Bergmann, Greg Kroah-Hartman

Acked-by: Steve Wahl <steve.wahl@hpe.com>

On Mon, Jun 29, 2026 at 01:12:00PM +0000, Brendan Jackman wrote:
> The difference between __alloc_pages_node() and alloc_pages_node() is
> that the latter allows you to pass NUMA_NO_NODE.
> 
> The former is going away and the latter works fine here so switch over.
> 
> No functional change intended.
> 
> Cc: Robin Holt <robinmholt@gmail.com>
> Cc: Steve Wahl <steve.wahl@hpe.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Assisted-by: Gemini:unknown-model
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
>  drivers/misc/sgi-xp/xpc_uv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/sgi-xp/xpc_uv.c b/drivers/misc/sgi-xp/xpc_uv.c
> index 772c787268932..aacff70204241 100644
> --- a/drivers/misc/sgi-xp/xpc_uv.c
> +++ b/drivers/misc/sgi-xp/xpc_uv.c
> @@ -170,7 +170,7 @@ xpc_create_gru_mq_uv(unsigned int mq_size, int cpu, char *irq_name,
>  	mq->mmr_blade = uv_cpu_to_blade_id(cpu);
>  
>  	nid = cpu_to_node(cpu);
> -	page = __alloc_pages_node(nid,
> +	page = alloc_pages_node(nid,
>  				      GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
>  				      pg_order);
>  	if (page == NULL) {
> 
> -- 
> 2.54.0
> 

-- 
Steve Wahl, Hewlett Packard Enterprise


[parent not found: <20260629-alloc-trylock-v3-13-57bef0eadbc2@google.com>]

* Re: [PATCH v3 13/16] mm: Remove __alloc_pages_node()
       [not found] ` <20260629-alloc-trylock-v3-13-57bef0eadbc2@google.com>
@ 2026-06-29 15:27   ` sashiko-bot
  0 siblings, 0 replies; 31+ messages in thread
From: sashiko-bot @ 2026-06-29 15:27 UTC (permalink / raw)
  To: Brendan Jackman; +Cc: linux-kernel, linux-rt-devel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Incomplete API refactoring leaves `__folio_alloc_node()` and its external callers vulnerable to the CPU hotplug hazard the patch intends to fix.
--

commit 1c3f10897a9f9fbcd231ced2981c00e238f22315
Author: Brendan Jackman <jackmanb@google.com>

mm: Remove __alloc_pages_node()

This commit removes the __alloc_pages_node() API to reduce the API
surface and prevent issues where callers using cpu_to_node() pass
NUMA_NO_NODE during CPU hotplug, leading to memory corruption or panics.

> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 01d6d2591f49e..3bf55a5f9143e 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
[ ... ]
> @@ -256,21 +256,6 @@ static inline void warn_if_node_offline(int this_node, gfp_t gfp_mask)
>  	dump_stack();
>  }
>  
> -/*
> - * Allocate pages, preferring the node given as nid. The node must be valid and
> - * online. For more general interface, see alloc_pages_node().
> - */
> -static inline struct page *
> -__alloc_pages_node_noprof(int nid, gfp_t gfp_mask, unsigned int order)
> -{
> -	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
> -	warn_if_node_offline(nid, gfp_mask);
> -
> -	return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
> -}
> -
> -#define  __alloc_pages_node(...)		alloc_hooks(__alloc_pages_node_noprof(__VA_ARGS__))
> -
>  static inline
>  struct folio *__folio_alloc_node_noprof(gfp_t gfp, unsigned int order, int nid)
>  {

[Severity: High]
This is a pre-existing issue, but do __folio_alloc_node_noprof() and its
wrappers need the same treatment?

I noticed it still lacks the NUMA_NO_NODE fallback, which can be triggered by 
existing external callers. For instance, in net/sunrpc/svc.c, 
svc_pool_map_get_node() uses cpu_to_node(), which can return NUMA_NO_NODE (-1) 
during CPU hotplug:

net/sunrpc/svc.c:svc_pool_map_get_node() {
    ...
    if (m->mode == SVC_POOL_PERCPU)
        return cpu_to_node(m->pool_to[pidx]);
    ...
}

This returned node ID is then passed directly to __folio_alloc_node() in 
svc_prepare_thread():

net/sunrpc/svc.c:svc_prepare_thread() {
    ...
    rqstp->rq_scratch_folio = __folio_alloc_node(GFP_KERNEL, 0, node);
    if (!rqstp->rq_scratch_folio)
        goto out_enomem;
    ...
}

Could this trigger the same VM_BUG_ON(nid < 0) or out-of-bounds array access 
that this commit aims to resolve?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260629-alloc-trylock-v3-0-57bef0eadbc2@google.com?part=13


[parent not found: <20260629-alloc-trylock-v3-9-57bef0eadbc2@google.com>]

* RE: -EXT-[PATCH v3 09/16] KVM: VMX: Use higher-level allocator API
       [not found] ` <20260629-alloc-trylock-v3-9-57bef0eadbc2@google.com>
@ 2026-06-29 15:31   ` Soderlund, David
  0 siblings, 0 replies; 31+ messages in thread
From: Soderlund, David @ 2026-06-29 15:31 UTC (permalink / raw)
  To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
	Muchun Song, Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
	Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-rt-devel@lists.linux.dev,
	Sean Christopherson, Paolo Bonzini, kvm@vger.kernel.org

Can someone please remove me from this list?  I've requested a couple of times.

  Thanks,
     David

-----Original Message-----
From: Brendan Jackman <jackmanb@google.com> 
Sent: Monday, June 29, 2026 7:12 AM
To: Andrew Morton <akpm@linux-foundation.org>; Vlastimil Babka <vbabka@kernel.org>; Suren Baghdasaryan <surenb@google.com>; Michal Hocko <mhocko@suse.com>; Johannes Weiner <hannes@cmpxchg.org>; Zi Yan <ziy@nvidia.com>; Muchun Song <muchun.song@linux.dev>; Oscar Salvador <osalvador@suse.de>; David Hildenbrand <david@kernel.org>; Lorenzo Stoakes <ljs@kernel.org>; Liam R. Howlett <liam@infradead.org>; Mike Rapoport <rppt@kernel.org>; Matthew Brost <matthew.brost@intel.com>; Joshua Hahn <joshua.hahnjy@gmail.com>; Rakie Kim <rakie.kim@sk.com>; Byungchul Park <byungchul@sk.com>; Ying Huang <ying.huang@linux.alibaba.com>; Alistair Popple <apopple@nvidia.com>; Hao Li <hao.li@linux.dev>; Christoph Lameter <cl@gentwo.org>; David Rientjes <rientjes@google.com>; Roman Gushchin <roman.gushchin@linux.dev>; Sebastian Andrzej Siewior <bigeasy@linutronix.de>; Clark Williams <clrkwllms@kernel.org>; Steven Rostedt <rostedt@goodmis.org>
Cc: Harry Yoo (Oracle) <harry@kernel.org>; Gregory Price <gourry@gourry.net>; Johannes Weiner <hannes@cmpxchg.org>; Alexei Starovoitov <ast@kernel.org>; Matthew Wilcox <willy@infradead.org>; Hao Ge <hao.ge@linux.dev>; linux-mm@kvack.org; linux-kernel@vger.kernel.org; linux-rt-devel@lists.linux.dev; Brendan Jackman <jackmanb@google.com>; Sean Christopherson <seanjc@google.com>; Paolo Bonzini <pbonzini@redhat.com>; kvm@vger.kernel.org
Subject: -EXT-[PATCH v3 09/16] KVM: VMX: Use higher-level allocator API


The difference between __alloc_pages_node() and alloc_pages_node() is that the latter allows you to pass NUMA_NO_NODE.

The former is going away and the latter works fine here so switch over.

No functional change intended.

Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Assisted-by: Gemini:unknown-version
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2325be57d3d75..ad6a7fc6a54da 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3028,7 +3028,7 @@ struct vmcs *alloc_vmcs_cpu(bool shadow, int cpu, gfp_t flags)
        struct page *pages;
        struct vmcs *vmcs;

-       pages = __alloc_pages_node(node, flags, 0);
+       pages = alloc_pages_node(node, flags, 0);
        if (!pages)
                return NULL;
        vmcs = page_address(pages);

--
2.54.0




[parent not found: <20260629-alloc-trylock-v3-16-57bef0eadbc2@google.com>]

* Re: [PATCH v3 16/16] mm: remove the __GFP_NO_OBJ_EXT flag
       [not found] ` <20260629-alloc-trylock-v3-16-57bef0eadbc2@google.com>
@ 2026-06-29 16:02   ` sashiko-bot
  2026-06-30 10:04     ` Brendan Jackman
  0 siblings, 1 reply; 31+ messages in thread
From: sashiko-bot @ 2026-06-29 16:02 UTC (permalink / raw)
  To: Brendan Jackman; +Cc: linux-rt-devel, linux-kernel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Incomplete removal of `__GFP_NO_OBJ_EXT` flag leaves the bit reserved in the core kernel.
--

commit d2006cc547e302003b4c27943f81bc40cdd90223
Author: Brendan Jackman <jackmanb@google.com>

mm: remove the __GFP_NO_OBJ_EXT flag

This commit removes the __GFP_NO_OBJ_EXT flag from the tools copy of
gfp_types.h. Its previous users have been converted to use
SLAB_ALLOC_NO_RECURSE or ALLOC_NO_CODETAG instead.

> diff --git a/tools/include/linux/gfp_types.h b/tools/include/linux/gfp_types.h
> index 6c75df30a281d..a93b8bd200b76 100644
> --- a/tools/include/linux/gfp_types.h
> +++ b/tools/include/linux/gfp_types.h
> @@ -55,7 +55,6 @@ enum {
>  #ifdef CONFIG_LOCKDEP
>  	___GFP_NOLOCKDEP_BIT,
>  #endif
> -	___GFP_NO_OBJ_EXT_BIT,

[Severity: Medium]
Is it possible that the core kernel header updates were missed? The
changes in this patch only modify the tools copy of the header, which
causes it to diverge from the actual kernel definitions.

Looking at include/linux/gfp_types.h, ___GFP_NO_OBJ_EXT_BIT remains
reserved in the core kernel. Additionally, the trace event mapping
TRACE_GFP_EM(NO_OBJ_EXT) is left behind as dead code in
include/trace/events/mmflags.h.

Could the removal be applied to those files as well so the bit is fully
freed in the core kernel page allocator as described in the commit
message?

>  	___GFP_LAST_BIT
>  };

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260629-alloc-trylock-v3-0-57bef0eadbc2@google.com?part=16


* Re: [PATCH v3 16/16] mm: remove the __GFP_NO_OBJ_EXT flag
  2026-06-29 16:02   ` [PATCH v3 16/16] mm: remove the __GFP_NO_OBJ_EXT flag sashiko-bot
@ 2026-06-30 10:04     ` Brendan Jackman
  0 siblings, 0 replies; 31+ messages in thread
From: Brendan Jackman @ 2026-06-30 10:04 UTC (permalink / raw)
  To: sashiko-reviews, Brendan Jackman; +Cc: linux-rt-devel, linux-kernel

On Mon Jun 29, 2026 at 4:02 PM UTC, sashiko-bot wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Medium] Incomplete removal of `__GFP_NO_OBJ_EXT` flag leaves the bit reserved in the core kernel.
> --
>
> commit d2006cc547e302003b4c27943f81bc40cdd90223
> Author: Brendan Jackman <jackmanb@google.com>
>
> mm: remove the __GFP_NO_OBJ_EXT flag
>
> This commit removes the __GFP_NO_OBJ_EXT flag from the tools copy of
> gfp_types.h. Its previous users have been converted to use
> SLAB_ALLOC_NO_RECURSE or ALLOC_NO_CODETAG instead.
>
>> diff --git a/tools/include/linux/gfp_types.h b/tools/include/linux/gfp_types.h
>> index 6c75df30a281d..a93b8bd200b76 100644
>> --- a/tools/include/linux/gfp_types.h
>> +++ b/tools/include/linux/gfp_types.h
>> @@ -55,7 +55,6 @@ enum {
>>  #ifdef CONFIG_LOCKDEP
>>  	___GFP_NOLOCKDEP_BIT,
>>  #endif
>> -	___GFP_NO_OBJ_EXT_BIT,
>
> [Severity: Medium]
> Is it possible that the core kernel header updates were missed? The
> changes in this patch only modify the tools copy of the header, which
> causes it to diverge from the actual kernel definitions.
>
> Looking at include/linux/gfp_types.h, ___GFP_NO_OBJ_EXT_BIT remains
> reserved in the core kernel. 

LOL, Sashiko is correct, I did not actually delete __GFP_NO_OBJ_EXT in
this patch.


[parent not found: <20260629-alloc-trylock-v3-15-57bef0eadbc2@google.com>]

* Re: [PATCH v3 15/16] mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG
       [not found] ` <20260629-alloc-trylock-v3-15-57bef0eadbc2@google.com>
@ 2026-06-29 15:56   ` sashiko-bot
  2026-06-30  4:34     ` Hao Ge
  2026-06-30  1:55   ` Hao Ge
  1 sibling, 1 reply; 31+ messages in thread
From: sashiko-bot @ 2026-06-29 15:56 UTC (permalink / raw)
  To: Brendan Jackman; +Cc: linux-kernel, linux-rt-devel

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [Low] The fallback `#else` definition of `pgalloc_tag_add` was not updated to accept `unsigned int alloc_flags`.

Pre-existing issues:
- [Critical] TOCTOU race in `__alloc_tag_add_early_pfn` leading to an out-of-bounds array write.
--

commit ccee7f6d4bcf9d00da38c878e97dd852c5e649b3
Author: Brendan Jackman <jackmanb@google.com>

mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG

This commit replaces the mm-internal __GFP_NO_CODETAG flag with a
new alloc_flags bit, ALLOC_NO_CODETAG. It plumbs alloc_flags through
several allocator functions to allow passing this flag directly to the
alloc_tag code.

> diff --git a/mm/alloc_tag.c b/mm/alloc_tag.c
> index d9be1cf5187d9..a32a94e759b94 100644
> --- a/mm/alloc_tag.c
> +++ b/mm/alloc_tag.c
[ ... ]
> @@ -806,7 +795,8 @@ static void __init __alloc_tag_add_early_pfn(unsigned long pfn)
>  	do {
>  		pool = READ_ONCE(current_pfn_pool);
>  		if (!pool || atomic_read(&pool->count) >= PFN_POOL_SIZE) {
> -			struct page *new_page = alloc_page(__GFP_HIGH | __GFP_NO_CODETAG);
> +			struct page *new_page = __alloc_pages(__GFP_HIGH, 0, numa_mem_id(),
> +							      NULL, ALLOC_NO_CODETAG);
>  			struct pfn_pool *new;

[Severity: Critical]
This is a pre-existing issue, but is there a time-of-check to time-of-use
race condition here?

If multiple CPUs are active during the deferred init window, could they
concurrently observe the same remaining capacity?

CPU1:
    if (!pool || atomic_read(&pool->count) >= PFN_POOL_SIZE)

CPU2:
    if (!pool || atomic_read(&pool->count) >= PFN_POOL_SIZE)

If both see 1 remaining slot, won't they both proceed to call
atomic_fetch_add() on the pool counter further down in this function?
This could cause an out-of-bounds array write at pool->pfns[PFN_POOL_SIZE].

[ ... ]
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 026f33f217036..803b32e5a5e47 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
[ ... ]
> @@ -1263,17 +1263,17 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
>  		alloc_tag_add_early_pfn(page_to_pfn(page), alloc_flags);
>  		if (task->alloc_tag)
>  			alloc_tag_set_inaccurate(task->alloc_tag);
>  	}
>  }
>  
>  static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
> -				   unsigned int nr, gfp_t gfp_flags)
> +				   unsigned int nr, unsigned int alloc_flags)
>  {
>  	if (mem_alloc_profiling_enabled())
> -		__pgalloc_tag_add(page, task, nr, gfp_flags);
> +		__pgalloc_tag_add(page, task, nr, alloc_flags);
>  }

[Severity: Low]
Does this change miss updating the fallback definition of pgalloc_tag_add()
when memory allocation profiling is disabled?

If the kernel is compiled without mem-alloc profiling, the dummy function
signature appears to still use the old type:

static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
                   unsigned int nr, gfp_t gfp_flags) {}

Since callers like post_alloc_hook() are now passing an unsigned int
for alloc_flags, won't this cause a Sparse type mismatch warning because
gfp_t is a restricted bitwise type?
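
For context, a minimal sketch of why this would be expected to show up only
under Sparse, assuming the usual kernel arrangement in which gfp_t is a
bitwise-annotated typedef of unsigned int. The DEMO_* and demo_* names are
hypothetical stand-ins, and the stubs return their argument only so that the
example has something to check (the real stubs are empty):

```c
#include <assert.h>

/*
 * Sparse defines __CHECKER__; only then does the typedef below become a
 * distinct "restricted" type. A normal compiler sees plain unsigned int.
 * DEMO_BITWISE / DEMO_FORCE stand in for the kernel's __bitwise / __force.
 */
#ifdef __CHECKER__
#define DEMO_BITWISE __attribute__((bitwise))
#define DEMO_FORCE   __attribute__((force))
#else
#define DEMO_BITWISE
#define DEMO_FORCE
#endif

typedef unsigned int DEMO_BITWISE demo_gfp_t;

#define DEMO_ALLOC_NO_CODETAG 0x1000u /* value taken from the patch */

/* Stale fallback stub: parameter is still typed as a gfp mask. */
static inline unsigned int demo_stub_old(demo_gfp_t gfp_flags)
{
	return (DEMO_FORCE unsigned int)gfp_flags;
}

/* Updated fallback stub: parameter is a plain alloc_flags word. */
static inline unsigned int demo_stub_new(unsigned int alloc_flags)
{
	return alloc_flags;
}

/*
 * Both calls build with a regular compiler. Under Sparse, passing a plain
 * unsigned int to demo_stub_old() is the kind of call that would be
 * expected to draw a type warning.
 */
static void demo_run(void)
{
	unsigned int alloc_flags = DEMO_ALLOC_NO_CODETAG;

	assert(demo_stub_old(alloc_flags) == alloc_flags);
	assert(demo_stub_new(alloc_flags) == alloc_flags);
}
```

In a regular build the two signatures are interchangeable, which is
consistent with the Low severity given above.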

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260629-alloc-trylock-v3-0-57bef0eadbc2@google.com?part=15


* Re: [PATCH v3 15/16] mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG
  2026-06-29 15:56   ` [PATCH v3 15/16] mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG sashiko-bot
@ 2026-06-30  4:34     ` Hao Ge
  0 siblings, 0 replies; 31+ messages in thread
From: Hao Ge @ 2026-06-30  4:34 UTC (permalink / raw)
  To: sashiko-reviews, Brendan Jackman
  Cc: linux-kernel, linux-rt-devel, Andrew Morton, Suren Baghdasaryan


CC Andrew and Suren


On 2026/6/29 23:56, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
>
> New issues:
> - [Low] The fallback `#else` definition of `pgalloc_tag_add` was not updated to accept `unsigned int alloc_flags`.
>
> Pre-existing issues:
> - [Critical] TOCTOU race in `__alloc_tag_add_early_pfn` leading to an out-of-bounds array write.
> --
>
> commit ccee7f6d4bcf9d00da38c878e97dd852c5e649b3
> Author: Brendan Jackman <jackmanb@google.com>
>
> mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG
>
> This commit replaces the mm-internal __GFP_NO_CODETAG flag with a
> new alloc_flags bit, ALLOC_NO_CODETAG. It plumbs alloc_flags through
> several allocator functions to allow passing this flag directly to the
> alloc_tag code.
>
>> diff --git a/mm/alloc_tag.c b/mm/alloc_tag.c
>> index d9be1cf5187d9..a32a94e759b94 100644
>> --- a/mm/alloc_tag.c
>> +++ b/mm/alloc_tag.c
> [ ... ]
>> @@ -806,7 +795,8 @@ static void __init __alloc_tag_add_early_pfn(unsigned long pfn)
>>   	do {
>>   		pool = READ_ONCE(current_pfn_pool);
>>   		if (!pool || atomic_read(&pool->count) >= PFN_POOL_SIZE) {
>> -			struct page *new_page = alloc_page(__GFP_HIGH | __GFP_NO_CODETAG);
>> +			struct page *new_page = __alloc_pages(__GFP_HIGH, 0, numa_mem_id(),
>> +							      NULL, ALLOC_NO_CODETAG);
>>   			struct pfn_pool *new;
> [Severity: Critical]
> This is a pre-existing issue, but is there a time-of-check to time-of-use
> race condition here?
>
> If multiple CPUs are active during the deferred init window, could they
> concurrently observe the same remaining capacity?
>
> CPU1:
>      if (!pool || atomic_read(&pool->count) >= PFN_POOL_SIZE)
>
> CPU2:
>      if (!pool || atomic_read(&pool->count) >= PFN_POOL_SIZE)
>
> If both see 1 remaining slot, won't they both proceed to call
> atomic_fetch_add() on the pool counter further down in this function?
> This could cause an out-of-bounds array write at pool->pfns[PFN_POOL_SIZE].


Interesting.

I didn't get this warning from Sashiko when I submitted the patch
earlier, and I'm not sure what changed on its end to trigger it now.

Unless I've overlooked some detail, I believe this is a false positive.

Even if both CPUs see 1 remaining slot and both skip the pool-growing
branch, they still have to go through atomic_cmpxchg() to actually
claim it.

static void __init __alloc_tag_add_early_pfn(unsigned long pfn)
{
     struct pfn_pool *pool;
     int idx;

     do {
         pool = READ_ONCE(current_pfn_pool);
         if (!pool || atomic_read(&pool->count) >= PFN_POOL_SIZE) {
         .........
         }
         idx = atomic_read(&pool->count);
         if (idx >= PFN_POOL_SIZE)
             continue;
         if (atomic_cmpxchg(&pool->count, idx, idx + 1) == idx)
             break;
     } while (1);

     pool->pfns[idx] = pfn;
}


Only one wins; the other gets a CAS failure and loops back.

There's no path where both write to pfns[PFN_POOL_SIZE].
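
As a userspace sketch of the same claim pattern, using C11 atomics rather
than the kernel's atomic_t API: demo_pool, DEMO_POOL_SIZE and
demo_pool_add() are made-up names, and the pool-growing branch is left
out, so a full pool simply reports failure here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_POOL_SIZE 4 /* hypothetical; stands in for PFN_POOL_SIZE */

struct demo_pool {
	atomic_int count;
	unsigned long pfns[DEMO_POOL_SIZE];
};

/*
 * Store pfn in the next free slot. Returns false when the pool is full
 * (the kernel code would install a fresh pool and retry instead).
 */
static bool demo_pool_add(struct demo_pool *pool, unsigned long pfn)
{
	int idx;

	for (;;) {
		int expected;

		idx = atomic_load(&pool->count);
		if (idx >= DEMO_POOL_SIZE)
			return false;

		/*
		 * The exchange succeeds only if count still equals the
		 * snapshot, so two racing callers cannot both claim idx.
		 */
		expected = idx;
		if (atomic_compare_exchange_strong(&pool->count, &expected,
						   idx + 1))
			break;
	}

	/* idx was bounds-checked before the successful exchange. */
	assert(idx < DEMO_POOL_SIZE);
	pool->pfns[idx] = pfn;
	return true;
}
```

The index used for the write is always the value that was bounds-checked
and then successfully exchanged, which is why a stale read in the earlier
capacity check cannot turn into an out-of-bounds store.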


Given that, I think we can ignore this Sashiko report and move forward
with Brendan's patch.


Thanks

Best Regards

Hao


> [ ... ]
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 026f33f217036..803b32e5a5e47 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
> [ ... ]
>> @@ -1263,17 +1263,17 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
>>   		alloc_tag_add_early_pfn(page_to_pfn(page), alloc_flags);
>>   		if (task->alloc_tag)
>>   			alloc_tag_set_inaccurate(task->alloc_tag);
>>   	}
>>   }
>>   
>>   static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
>> -				   unsigned int nr, gfp_t gfp_flags)
>> +				   unsigned int nr, unsigned int alloc_flags)
>>   {
>>   	if (mem_alloc_profiling_enabled())
>> -		__pgalloc_tag_add(page, task, nr, gfp_flags);
>> +		__pgalloc_tag_add(page, task, nr, alloc_flags);
>>   }
> [Severity: Low]
> Does this change miss updating the fallback definition of pgalloc_tag_add()
> when memory allocation profiling is disabled?
>
> If the kernel is compiled without mem-alloc profiling, the dummy function
> signature appears to still use the old type:
>
> static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
>                     unsigned int nr, gfp_t gfp_flags) {}
>
> Since callers like post_alloc_hook() are now passing an unsigned int
> for alloc_flags, won't this cause a Sparse type mismatch warning because
> gfp_t is a restricted bitwise type?
>


* Re: [PATCH v3 15/16] mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG
       [not found] ` <20260629-alloc-trylock-v3-15-57bef0eadbc2@google.com>
  2026-06-29 15:56   ` [PATCH v3 15/16] mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG sashiko-bot
@ 2026-06-30  1:55   ` Hao Ge
  2026-06-30 10:10     ` Brendan Jackman
  2026-06-30 12:01     ` Brendan Jackman
  1 sibling, 2 replies; 31+ messages in thread
From: Hao Ge @ 2026-06-30  1:55 UTC (permalink / raw)
  To: Brendan Jackman
  Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
	Matthew Wilcox, linux-mm, linux-kernel, linux-rt-devel,
	Vlastimil Babka, Andrew Morton, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, David Hildenbrand,
	Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Hao Li, Christoph Lameter, David Rientjes,
	Roman Gushchin, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

Hi Brendan


On 2026/6/29 21:12, Brendan Jackman wrote:
> Now that alloc_pages has an entrypoint that allows passing alloc_flags,
> we can take advantage of this to start removing GFP flags that are only
> used for mm-internal stuff.
>
> This requires also plumbing the alloc_flags into some more of the
> allocator code, in particular __alloc_pages[_noprof]() gets an
> alloc_flags arg to go along with its callees, and we now need to pass
> those flags deeper into the allocator so they can reach the alloc_tag
> code.
>
> No functional change intended.
>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
>   mm/alloc_tag.c       | 22 ++++++----------------
>   mm/compaction.c      |  4 ++--
>   mm/internal.h        |  1 -
>   mm/page_alloc.c      | 42 ++++++++++++++++++++++++------------------
>   mm/page_alloc.h      | 17 +++++++++++++++--
>   mm/page_frag_cache.c |  4 ++--
>   6 files changed, 49 insertions(+), 41 deletions(-)
>
> diff --git a/mm/alloc_tag.c b/mm/alloc_tag.c
> index d9be1cf5187d9..a32a94e759b94 100644
> --- a/mm/alloc_tag.c
> +++ b/mm/alloc_tag.c
> @@ -15,6 +15,8 @@
>   #include <linux/vmalloc.h>
>   #include <linux/kmemleak.h>
>   
> +#include "internal.h"


Should we include page_alloc.h here, as we call __alloc_pages later in 
this file?


> +
>   #define ALLOCINFO_FILE_NAME		"allocinfo"
>   #define MODULE_ALLOC_TAG_VMAP_SIZE	(100000UL * sizeof(struct alloc_tag))
>   #define SECTION_START(NAME)		(CODETAG_SECTION_START_PREFIX NAME)
> @@ -783,19 +785,6 @@ struct pfn_pool {
>   
>   #define PFN_POOL_SIZE			((PAGE_SIZE - offsetof(struct pfn_pool, pfns)) / \
>   					 sizeof(unsigned long))
> -
> -/*
> - * Skip early PFN recording for a page allocation.  Reuses the
> - * %__GFP_NO_OBJ_EXT bit.  Used by __alloc_tag_add_early_pfn() to avoid
> - * recursion when allocating pages for the early PFN tracking list
> - * itself.
> - *
> - * Codetags of the pages allocated with __GFP_NO_CODETAG should be
> - * cleared (via clear_page_tag_ref()) before freeing the pages to prevent
> - * alloc_tag_sub_check() from triggering a warning.
> - */
> -#define __GFP_NO_CODETAG		__GFP_NO_OBJ_EXT
> -
>   static struct pfn_pool *current_pfn_pool __initdata;
>   
>   static void __init __alloc_tag_add_early_pfn(unsigned long pfn)
> @@ -806,7 +795,8 @@ static void __init __alloc_tag_add_early_pfn(unsigned long pfn)
>   	do {
>   		pool = READ_ONCE(current_pfn_pool);
>   		if (!pool || atomic_read(&pool->count) >= PFN_POOL_SIZE) {
> -			struct page *new_page = alloc_page(__GFP_HIGH | __GFP_NO_CODETAG);
> +			struct page *new_page = __alloc_pages(__GFP_HIGH, 0, numa_mem_id(),
> +							      NULL, ALLOC_NO_CODETAG);
>   			struct pfn_pool *new;
>   
>   			if (!new_page) {
> @@ -837,7 +827,7 @@ typedef void alloc_tag_add_func(unsigned long pfn);
>   static alloc_tag_add_func __rcu *alloc_tag_add_early_pfn_ptr __refdata =
>   	RCU_INITIALIZER(__alloc_tag_add_early_pfn);
>   
> -void alloc_tag_add_early_pfn(unsigned long pfn, gfp_t gfp_flags)
> +void alloc_tag_add_early_pfn(unsigned long pfn, unsigned int alloc_flags)


alloc_tag_add_early_pfn() has three occurrences across the codebase:

1. Definition in mm/alloc_tag.c:830:

void alloc_tag_add_early_pfn(unsigned long pfn, unsigned int alloc_flags)

2. Declaration in include/linux/alloc_tag.h:166:

void alloc_tag_add_early_pfn(unsigned long pfn, gfp_t gfp_flags)

3. Static inline stub in include/linux/alloc_tag.h:170:

static inline void alloc_tag_add_early_pfn(unsigned long pfn, gfp_t gfp_flags) {}

This patch updates the definition in alloc_tag.c to take unsigned int
alloc_flags, but the two declarations in alloc_tag.h are left with the
old gfp_t gfp_flags signature.

These should be updated to match.
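
A rough sketch of the shape this could take, buildable in userspace. The
DEMO_PROFILING guard is a hypothetical stand-in for whatever condition
alloc_tag.h really uses, demo_recorded and demo_run() exist only for
illustration, and the function body is reduced to the flag check from
this patch:

```c
#include <assert.h>

#define ALLOC_NO_CODETAG 0x1000 /* value taken from the patch */
#define DEMO_PROFILING 1        /* hypothetical stand-in for the real guard */

#ifdef DEMO_PROFILING
/* Declaration, as it would appear in the header. */
void alloc_tag_add_early_pfn(unsigned long pfn, unsigned int alloc_flags);
#else
/* No-op stub with the same parameter list. */
static inline void alloc_tag_add_early_pfn(unsigned long pfn,
					   unsigned int alloc_flags) {}
#endif

#ifdef DEMO_PROFILING
static unsigned long demo_recorded; /* number of PFNs "recorded" */

/* Definition: parameter list matches the declaration above. */
void alloc_tag_add_early_pfn(unsigned long pfn, unsigned int alloc_flags)
{
	(void)pfn;

	/* Skip allocations for the tracking list itself to avoid recursion. */
	if (alloc_flags & ALLOC_NO_CODETAG)
		return;

	demo_recorded++;
}

/* One ordinary call is counted; one flagged call is skipped. */
static void demo_run(void)
{
	alloc_tag_add_early_pfn(1, 0);
	alloc_tag_add_early_pfn(2, ALLOC_NO_CODETAG);
	assert(demo_recorded == 1);
}
#endif
```

With all three signatures taking unsigned int alloc_flags, callers see
the same parameter type whether or not the feature is built in.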


>   {
>   	alloc_tag_add_func *alloc_tag_add;
>   
> @@ -845,7 +835,7 @@ void alloc_tag_add_early_pfn(unsigned long pfn, gfp_t gfp_flags)
>   		return;
>   
>   	/* Skip allocations for the tracking list itself to avoid recursion. */
> -	if (gfp_flags & __GFP_NO_CODETAG)
> +	if (alloc_flags & ALLOC_NO_CODETAG)
>   		return;
>   
>   	rcu_read_lock();
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 7d80735502d9a..4b2318fad4eb5 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -83,7 +83,7 @@ static inline bool is_via_compact_memory(int order) { return false; }
>   
>   static struct page *mark_allocated_noprof(struct page *page, unsigned int order, gfp_t gfp_flags)
>   {
> -	post_alloc_hook(page, order, __GFP_MOVABLE);
> +	post_alloc_hook(page, order, __GFP_MOVABLE, ALLOC_DEFAULT);
>   	set_page_refcounted(page);
>   	return page;
>   }
> @@ -1851,7 +1851,7 @@ static struct folio *compaction_alloc_noprof(struct folio *src, unsigned long da
>   	}
>   	dst = (struct folio *)freepage;
>   
> -	post_alloc_hook(&dst->page, order, __GFP_MOVABLE);
> +	post_alloc_hook(&dst->page, order, __GFP_MOVABLE, ALLOC_DEFAULT);
>   	set_page_refcounted(&dst->page);
>   	if (order)
>   		prep_compound_page(&dst->page, order);
> diff --git a/mm/internal.h b/mm/internal.h
> index c22284f04fc9e..369c656c63fa8 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1237,7 +1237,6 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>   enum ttu_flags;
>   struct tlbflush_unmap_batch;
>   
> -
>   /*
>    * only for MM internal work items which do not depend on
>    * any allocations or locks which might depend on allocations
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 026f33f217036..803b32e5a5e47 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1249,7 +1249,7 @@ void __clear_page_tag_ref(struct page *page)
>   /* Should be called only if mem_alloc_profiling_enabled() */
>   static noinline
>   void __pgalloc_tag_add(struct page *page, struct task_struct *task,
> -		       unsigned int nr, gfp_t gfp_flags)
> +		       unsigned int nr, unsigned int alloc_flags)
>   {
>   	union pgtag_ref_handle handle;
>   	union codetag_ref ref;
> @@ -1263,17 +1263,17 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
>   		 * page_ext is not available yet, record the pfn so we can
>   		 * clear the tag ref later when page_ext is initialized.
>   		 */
> -		alloc_tag_add_early_pfn(page_to_pfn(page), gfp_flags);
> +		alloc_tag_add_early_pfn(page_to_pfn(page), alloc_flags);
>   		if (task->alloc_tag)
>   			alloc_tag_set_inaccurate(task->alloc_tag);
>   	}
>   }
>   
>   static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
> -				   unsigned int nr, gfp_t gfp_flags)
> +				   unsigned int nr, unsigned int alloc_flags)


Same situation as alloc_tag_add_early_pfn(): the #else stub at
mm/page_alloc.c:1309 still uses gfp_t gfp_flags instead of unsigned int
alloc_flags.


>   {
>   	if (mem_alloc_profiling_enabled())
> -		__pgalloc_tag_add(page, task, nr, gfp_flags);
> +		__pgalloc_tag_add(page, task, nr, alloc_flags);
>   }
>   
>   /* Should be called only if mem_alloc_profiling_enabled() */
> @@ -1810,7 +1810,7 @@ static inline bool should_skip_init(gfp_t flags)
>   }
>   
>   inline void post_alloc_hook(struct page *page, unsigned int order,
> -				gfp_t gfp_flags)
> +				gfp_t gfp_flags, unsigned int alloc_flags)
>   {
>   	const bool zero_tags = gfp_flags & __GFP_ZEROTAGS;
>   	bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) &&
> @@ -1861,13 +1861,13 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
>   
>   	set_page_owner(page, order, gfp_flags);
>   	page_table_check_alloc(page, order);
> -	pgalloc_tag_add(page, current, 1 << order, gfp_flags);
> +	pgalloc_tag_add(page, current, 1 << order, alloc_flags);
>   }
>   
>   static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
>   							unsigned int alloc_flags)
>   {
> -	post_alloc_hook(page, order, gfp_flags);
> +	post_alloc_hook(page, order, gfp_flags, alloc_flags);
>   
>   	if (order && (gfp_flags & __GFP_COMP))
>   		prep_compound_page(page, order);
> @@ -4791,8 +4791,12 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>   	 * The fast path uses conservative alloc_flags to succeed only until
>   	 * kswapd needs to be woken up, and to avoid the cost of setting up
>   	 * alloc_flags precisely. So we do that now.
> +	 *
> +	 * Can't just or alloc_flags if it contains WMARK bits, but those flags
> +	 * shouldn't be set in ac->alloc_flags.
>   	 */
> -	alloc_flags = alloc_flags_slowpath(gfp_mask, order);
> +	VM_WARN_ON(ac->alloc_flags & ALLOC_WMARK_MASK);
> +	alloc_flags = ac->alloc_flags | alloc_flags_slowpath(gfp_mask, order);
>   
>   	/*
>   	 * We need to recalculate the starting point for the zonelist iterator
> @@ -4834,7 +4838,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>   	reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
>   	if (reserve_flags)
>   		alloc_flags = alloc_flags_cma(gfp_mask, reserve_flags) |
> -					  (alloc_flags & ALLOC_KSWAPD);
> +				ac->alloc_flags | (alloc_flags & ALLOC_KSWAPD);
>   
>   	/*
>   	 * Reset the nodemask and zonelist iterators if memory policies can be
> @@ -5236,7 +5240,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
>   	return nr_populated;
>   
>   failed:
> -	page = __alloc_pages_noprof(gfp, 0, preferred_nid, nodemask);
> +	page = __alloc_pages_noprof(gfp, 0, preferred_nid, nodemask, ALLOC_DEFAULT);
>   	if (page)
>   		page_array[nr_populated++] = page;
>   	goto out;
> @@ -5344,11 +5348,13 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
>   {
>   	struct page *page;
>   	gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
> -	struct alloc_context ac = { };
> +	struct alloc_context ac = {
> +		.alloc_flags = alloc_flags,
> +	};
>   	unsigned int fastpath_alloc_flags = alloc_flags;
>   
>   	/* Other flags could be supported later if needed. */
> -	if (WARN_ON(alloc_flags & ~ALLOC_NOLOCK))
> +	if (WARN_ON(alloc_flags & ~(ALLOC_NOLOCK | ALLOC_NO_CODETAG)))
>   		return NULL;
>   
>   	if (!alloc_order_allowed(gfp, order, alloc_flags))
> @@ -5417,12 +5423,12 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
>   EXPORT_SYMBOL(__alloc_frozen_pages_noprof);
>   
>   struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
> -		int preferred_nid, nodemask_t *nodemask)
> +		int preferred_nid, nodemask_t *nodemask, unsigned int alloc_flags)
>   {
>   	struct page *page;
>   
>   	page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask,
> -					   ALLOC_DEFAULT);
> +					   alloc_flags);
>   	if (page)
>   		set_page_refcounted(page);
>   	return page;
> @@ -5436,7 +5442,7 @@ struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask, unsigned int order
>   	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
>   	warn_if_node_offline(nid, gfp_mask);
>   
> -	return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
> +	return __alloc_pages_noprof(gfp_mask, order, nid, NULL, ALLOC_DEFAULT);
>   }
>   EXPORT_SYMBOL(alloc_pages_node_noprof);
>   
> @@ -5444,7 +5450,7 @@ struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_
>   		nodemask_t *nodemask)
>   {
>   	struct page *page = __alloc_pages_noprof(gfp | __GFP_COMP, order,
> -					preferred_nid, nodemask);
> +					preferred_nid, nodemask, ALLOC_DEFAULT);
>   	return page_rmappable_folio(page);
>   }
>   EXPORT_SYMBOL(__folio_alloc_noprof);
> @@ -7126,7 +7132,7 @@ static void split_free_frozen_pages(struct list_head *list, gfp_t gfp_mask)
>   		list_for_each_entry_safe(page, next, &list[order], lru) {
>   			int i;
>   
> -			post_alloc_hook(page, order, gfp_mask);
> +			post_alloc_hook(page, order, gfp_mask, ALLOC_DEFAULT);
>   			if (!order)
>   				continue;
>   
> @@ -7331,7 +7337,7 @@ int alloc_contig_frozen_range_noprof(unsigned long start, unsigned long end,
>   		struct page *head = pfn_to_page(start);
>   
>   		check_new_pages(head, order);
> -		prep_new_page(head, order, gfp_mask, 0);
> +		prep_new_page(head, order, gfp_mask, ALLOC_DEFAULT);
>   	} else {
>   		ret = -EINVAL;
>   		WARN(true, "PFN range: requested [%lu, %lu), allocated [%lu, %lu)\n",
> diff --git a/mm/page_alloc.h b/mm/page_alloc.h
> index 2058cbdca56e7..2614bff6795b0 100644
> --- a/mm/page_alloc.h
> +++ b/mm/page_alloc.h
> @@ -49,6 +49,16 @@
>   #define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
>   #define ALLOC_NOLOCK		0x400 /* Only use spin_trylock in allocation path */
>   #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
> +/*
> + * Skip early PFN recording for a page allocation.  Used by
> + * __alloc_tag_add_early_pfn() to avoid recursion when allocating pages for the
> + * early PFN tracking list itself.
> + *
> + * Codetags of the pages allocated with __GFP_NO_CODETAG should be
> + * cleared (via clear_page_tag_ref()) before freeing the pages to prevent
> + * alloc_tag_sub_check() from triggering a warning.
> + */


I originally wrote this lengthy comment because the logic lived inside
alloc_tag.c, and I wanted to document all the context to avoid confusion
when revisiting this code later on.

We've since replaced __GFP_NO_CODETAG with ALLOC_NO_CODETAG, a generic
alloc_flags bit defined in page_alloc.h, so the original long comment is
no longer accurate.

Given that, I suggest updating it to the following:

/*
 * Avoid alloc_tag recursion for internal allocations.
 * Callers must clear_page_tag_ref() before
 * freeing to avoid warnings from alloc_tag_sub_check().
 */


Thanks

Best Regards

Hao


> +#define ALLOC_NO_CODETAG       0x1000
>   
>   /* Flags that allow allocations below the min watermark. */
>   #define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
> @@ -84,6 +94,8 @@ struct alloc_context {
>   	 */
>   	enum zone_type highest_zoneidx;
>   	bool spread_dirty_pages;
> +	/* Only flags that are global to the whole allocation go here. */
> +	unsigned int alloc_flags;
>   };
>   
>   /*
> @@ -214,7 +226,8 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>   extern void __free_pages_core(struct page *page, unsigned int order,
>   		enum meminit_context context);
>   
> -void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags);
> +void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags,
> +		     unsigned int alloc_flags);
>   extern bool free_pages_prepare(struct page *page, unsigned int order);
>   
>   extern int user_min_free_kbytes;
> @@ -245,7 +258,7 @@ struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned
>   void free_frozen_pages_nolock(struct page *page, unsigned int order);
>   
>   struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
> -		nodemask_t *nodemask);
> +		nodemask_t *nodemask, unsigned int alloc_flags);
>   #define __alloc_pages(...)			alloc_hooks(__alloc_pages_noprof(__VA_ARGS__))
>   
>   extern void zone_pcp_reset(struct zone *zone);
> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
> index a1077cef3a791..e63efe78b7d4b 100644
> --- a/mm/page_frag_cache.c
> +++ b/mm/page_frag_cache.c
> @@ -57,10 +57,10 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>   	gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) |  __GFP_COMP |
>   		   __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
>   	page = __alloc_pages(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER,
> -			     numa_mem_id(), NULL);
> +			     numa_mem_id(), NULL, ALLOC_DEFAULT);
>   #endif
>   	if (unlikely(!page)) {
> -		page = __alloc_pages(gfp, 0, numa_mem_id(), NULL);
> +		page = __alloc_pages(gfp, 0, numa_mem_id(), NULL, ALLOC_DEFAULT);
>   		order = 0;
>   	}
>   
>


* Re: [PATCH v3 15/16] mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG
  2026-06-30  1:55   ` Hao Ge
@ 2026-06-30 10:10     ` Brendan Jackman
  2026-06-30 12:01     ` Brendan Jackman
  1 sibling, 0 replies; 31+ messages in thread
From: Brendan Jackman @ 2026-06-30 10:10 UTC (permalink / raw)
  To: Hao Ge, Brendan Jackman
  Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
	Matthew Wilcox, linux-mm, linux-kernel, linux-rt-devel,
	Vlastimil Babka, Andrew Morton, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, David Hildenbrand,
	Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Hao Li, Christoph Lameter, David Rientjes,
	Roman Gushchin, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Tue Jun 30, 2026 at 1:55 AM UTC, Hao Ge wrote:
> Hi Brendan
>
>
> On 2026/6/29 21:12, Brendan Jackman wrote:
>> Now that alloc_pages has an entrypoint that allows passing alloc_flags,
>> we can take advantage of this to start removing GFP flags that are only
>> used for mm-internal stuff.
>>
>> This requires also plumbing the alloc_flags into some more of the
>> allocator code, in particular __alloc_pages[_noprof]() gets an
>> alloc_flags arg to go along with its callees, and we now need to pass
>> those flags deeper into the allocator so they can reach the alloc_tag
>> code.
>>
>> No functional change intended.
>>
>> Signed-off-by: Brendan Jackman <jackmanb@google.com>
>> ---
>>   mm/alloc_tag.c       | 22 ++++++----------------
>>   mm/compaction.c      |  4 ++--
>>   mm/internal.h        |  1 -
>>   mm/page_alloc.c      | 42 ++++++++++++++++++++++++------------------
>>   mm/page_alloc.h      | 17 +++++++++++++++--
>>   mm/page_frag_cache.c |  4 ++--
>>   6 files changed, 49 insertions(+), 41 deletions(-)
>>
>> diff --git a/mm/alloc_tag.c b/mm/alloc_tag.c
>> index d9be1cf5187d9..a32a94e759b94 100644
>> --- a/mm/alloc_tag.c
>> +++ b/mm/alloc_tag.c
>> @@ -15,6 +15,8 @@
>>   #include <linux/vmalloc.h>
>>   #include <linux/kmemleak.h>
>>   
>> +#include "internal.h"
>
>
> Should we include page_alloc.h here, as we call __alloc_pages later in 
> this file?

Yeah, there are a few build failures due to me not doing a broad enough
build. From now on I will just wait for allmodconfig instead of trying
to be clever with my build tests, sorry about this.

Also, this suggests that I have not actually re-tested the alloc_tag
code since v3 so I must repeat the test described in my cover letter (I
just manually enable the feature and check the kernel boots) for v4.

>> +
>>   #define ALLOCINFO_FILE_NAME		"allocinfo"
>>   #define MODULE_ALLOC_TAG_VMAP_SIZE	(100000UL * sizeof(struct alloc_tag))
>>   #define SECTION_START(NAME)		(CODETAG_SECTION_START_PREFIX NAME)
>> @@ -783,19 +785,6 @@ struct pfn_pool {
>>   
>>   #define PFN_POOL_SIZE			((PAGE_SIZE - offsetof(struct pfn_pool, pfns)) / \
>>   					 sizeof(unsigned long))
>> -
>> -/*
>> - * Skip early PFN recording for a page allocation.  Reuses the
>> - * %__GFP_NO_OBJ_EXT bit.  Used by __alloc_tag_add_early_pfn() to avoid
>> - * recursion when allocating pages for the early PFN tracking list
>> - * itself.
>> - *
>> - * Codetags of the pages allocated with __GFP_NO_CODETAG should be
>> - * cleared (via clear_page_tag_ref()) before freeing the pages to prevent
>> - * alloc_tag_sub_check() from triggering a warning.
>> - */
>> -#define __GFP_NO_CODETAG		__GFP_NO_OBJ_EXT
>> -
>>   static struct pfn_pool *current_pfn_pool __initdata;
>>   
>>   static void __init __alloc_tag_add_early_pfn(unsigned long pfn)
>> @@ -806,7 +795,8 @@ static void __init __alloc_tag_add_early_pfn(unsigned long pfn)
>>   	do {
>>   		pool = READ_ONCE(current_pfn_pool);
>>   		if (!pool || atomic_read(&pool->count) >= PFN_POOL_SIZE) {
>> -			struct page *new_page = alloc_page(__GFP_HIGH | __GFP_NO_CODETAG);
>> +			struct page *new_page = __alloc_pages(__GFP_HIGH, 0, numa_mem_id(),
>> +							      NULL, ALLOC_NO_CODETAG);
>>   			struct pfn_pool *new;
>>   
>>   			if (!new_page) {
>> @@ -837,7 +827,7 @@ typedef void alloc_tag_add_func(unsigned long pfn);
>>   static alloc_tag_add_func __rcu *alloc_tag_add_early_pfn_ptr __refdata =
>>   	RCU_INITIALIZER(__alloc_tag_add_early_pfn);
>>   
>> -void alloc_tag_add_early_pfn(unsigned long pfn, gfp_t gfp_flags)
>> +void alloc_tag_add_early_pfn(unsigned long pfn, unsigned int alloc_flags)
>
>
> alloc_tag_add_early_pfn() has three occurrences across the codebase:
>
> 1. Definition in mm/alloc_tag.c:830:
>
> void alloc_tag_add_early_pfn(unsigned long pfn, unsigned int alloc_flags)
>
> 2. Declaration in include/linux/alloc_tag.h:166:
>
> void alloc_tag_add_early_pfn(unsigned long pfn, gfp_t gfp_flags)
>
> 3. Static inline stub in include/linux/alloc_tag.h:170:
>
> static inline void alloc_tag_add_early_pfn(unsigned long pfn, gfp_t 
> gfp_flags) {}
>
> This patch updates the definition in alloc_tag.c to take unsigned int 
> alloc_flags,
>
> but the two declarations in alloc_tag.h are left with the old gfp_t 
> gfp_flags signature
>
> These should be updated to match.

Yeah ditto, sorry about this and thanks for the review.


* Re: [PATCH v3 15/16] mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG
  2026-06-30  1:55   ` Hao Ge
  2026-06-30 10:10     ` Brendan Jackman
@ 2026-06-30 12:01     ` Brendan Jackman
  1 sibling, 0 replies; 31+ messages in thread
From: Brendan Jackman @ 2026-06-30 12:01 UTC (permalink / raw)
  To: Hao Ge, Brendan Jackman
  Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
	Matthew Wilcox, linux-mm, linux-kernel, linux-rt-devel,
	Vlastimil Babka, Andrew Morton, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, David Hildenbrand,
	Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Hao Li, Christoph Lameter, David Rientjes,
	Roman Gushchin, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Tue Jun 30, 2026 at 1:55 AM UTC, Hao Ge wrote:
>> +/*
>> + * Skip early PFN recording for a page allocation.  Used by
>> + * __alloc_tag_add_early_pfn() to avoid recursion when allocating pages for the
>> + * early PFN tracking list itself.
>> + *
>> + * Codetags of the pages allocated with __GFP_NO_CODETAG should be
>> + * cleared (via clear_page_tag_ref()) before freeing the pages to prevent
>> + * alloc_tag_sub_check() from triggering a warning.
>> + */
>
>
> I originally wrote this lengthy comment because the logic lives inside 
> alloc_tag.c.
>
> I wanted to document all the context to avoid confusion when revisiting 
> this code later on.
>
> We've since replaced __GFP_NO_CODETAG with ALLOC_NO_CODETAG, a generic 
> alloc_flags bit defined in page_alloc.h.
>
> The original long comment is no longer accurate:
>
> Given that, I suggest updating it to the following:
>
> /*
>
>   * Avoid alloc_tag recursion for internal allocations.
>
>   * Callers must clear_page_tag_ref() before
>   * freeing to avoid warnings from alloc_tag_sub_check().
>
>   */

Thanks for the context, pasting this in for v4!


[parent not found: <20260629-alloc-trylock-v3-1-57bef0eadbc2@google.com>]

* Re: [PATCH v3 01/16] mm/page_alloc: rename ALLOC_TRYLOCK -> ALLOC_NOLOCK
       [not found] ` <20260629-alloc-trylock-v3-1-57bef0eadbc2@google.com>
@ 2026-06-30 12:27   ` Vlastimil Babka (SUSE)
  0 siblings, 0 replies; 31+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-30 12:27 UTC (permalink / raw)
  To: Brendan Jackman, Andrew Morton, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
	Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel

On 6/29/26 15:11, Brendan Jackman wrote:
> It's confusing that the function is called "nolock" but the flag is
> called "trylock", align them.
> 
> The function's terminology is more visible and has more mindshare so use that.
> 
> Suggested-by: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
> Link: https://lore.kernel.org/linux-mm/2399b3ad-4eac-4a14-94c3-27e9f07972a1@kernel.org/
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>

Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>


^ permalink raw reply	[flat|nested] 31+ messages in thread

[parent not found: <20260629-alloc-trylock-v3-2-57bef0eadbc2@google.com>]

* Re: [PATCH v3 02/16] mm/page_alloc: some renames to clarify alloc_flags scopes
       [not found] ` <20260629-alloc-trylock-v3-2-57bef0eadbc2@google.com>
@ 2026-06-30 12:38   ` Vlastimil Babka (SUSE)
  2026-06-30 17:25     ` Brendan Jackman
  0 siblings, 1 reply; 31+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-30 12:38 UTC (permalink / raw)
  To: Brendan Jackman, Andrew Morton, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
	Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel

On 6/29/26 15:11, Brendan Jackman wrote:
> It's pretty confusing that:
> 
> - The slowpath and fastpath have a totally distinct set of alloc_flags.
> 
> - gfp_to_alloc_flags() sounds generic but it only influences the
>   slowpath.
> 
> Rename some variables to highlight which alloc_flags are
> fastpath-specific. Rename gfp_to_alloc_flags() to highlight that it's
> slowpath-specific.
> 
> gfp_to_alloc_flags_cma() and gfp_to_alloc_flags_nonblocking() currently
> have perfectly harmless names, but to keep the naming consistent also
> rename those to the alloc_flags_*() pattern (which already exists for
> alloc_flags_nofragment()).

How annoying that alloc_flags_nofragment() doesn't have gfp as the first
parameter, unlike others.
Oh well, must resist too much OCD :)

Uh, more annoyingly, alloc_flags_cma() takes alloc_flags and returns
augmented alloc flags, so there's stuff like

*alloc_flags = alloc_flags_cma(gfp_mask, *alloc_flags);

Since we're unifying, could it be made to work additively like the others? Then:

*alloc_flags |= alloc_flags_cma(gfp_mask);

> Signed-off-by: Brendan Jackman <jackmanb@google.com>

Otherwise, LGTM.
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 02/16] mm/page_alloc: some renames to clarify alloc_flags scopes
  2026-06-30 12:38   ` [PATCH v3 02/16] mm/page_alloc: some renames to clarify alloc_flags scopes Vlastimil Babka (SUSE)
@ 2026-06-30 17:25     ` Brendan Jackman
  0 siblings, 0 replies; 31+ messages in thread
From: Brendan Jackman @ 2026-06-30 17:25 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE), Brendan Jackman, Andrew Morton,
	Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
	Muchun Song, Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
	Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
	Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel

On Tue Jun 30, 2026 at 12:38 PM UTC, Vlastimil Babka (SUSE) wrote:
> On 6/29/26 15:11, Brendan Jackman wrote:
>> It's pretty confusing that:
>> 
>> - The slowpath and fastpath have a totally distinct set of alloc_flags.
>> 
>> - gfp_to_alloc_flags() sounds generic but it only influences the
>>   slowpath.
>> 
>> Rename some variables to highlight which alloc_flags are
>> fastpath-specific. Rename gfp_to_alloc_flags() to highlight that it's
>> slowpath-specific.
>> 
>> gfp_to_alloc_flags_cma() and gfp_to_alloc_flags_nonblocking() currently
>> have perfectly harmless names, but to keep the naming consistent also
>> rename those to the alloc_flags_*() pattern (which already exists for
>> alloc_flags_nofragment()).
>
> How annoying that alloc_flags_nofragment() doesn't have gfp as the first
> parameter, unlike others.
> Oh well, must resist too much OCD :)
>
> Uh, more annoyingly, alloc_flags_cma() takes alloc_flags and returns
> augmented alloc flags, so there's stuff like
>
> *alloc_flags = alloc_flags_cma(gfp_mask, *alloc_flags);
>
> Since we're unifying, could it be made to work additively like the others? Then:
>
> *alloc_flags |= alloc_flags_cma(gfp_mask);

Sure, I can chuck this on as an extra patch.
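
For reference, a rough userspace sketch of the two call-site shapes
discussed above. Everything prefixed toy_/TOY_ (names, bit values, the
"allows CMA" predicate) is made up for illustration and is not the
kernel's code; it only shows that the "return augmented flags" style and
the additive "|=" style should end up with the same result.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up bit values for illustration; not the kernel's definitions. */
#define TOY_ALLOC_WMARK_LOW 0x01u
#define TOY_ALLOC_CMA       0x80u
#define TOY_GFP_MOVABLE     0x08u

/* Toy stand-in for "this request may be served from CMA". */
static bool toy_gfp_allows_cma(unsigned int gfp)
{
	return (gfp & TOY_GFP_MOVABLE) != 0;
}

/* Current shape: takes alloc_flags, returns the augmented set. */
static unsigned int toy_alloc_flags_cma_augment(unsigned int gfp,
						unsigned int alloc_flags)
{
	if (toy_gfp_allows_cma(gfp))
		alloc_flags |= TOY_ALLOC_CMA;
	return alloc_flags;
}

/* Suggested shape: returns only the bits to OR in. */
static unsigned int toy_alloc_flags_cma(unsigned int gfp)
{
	return toy_gfp_allows_cma(gfp) ? TOY_ALLOC_CMA : 0u;
}

/* Both call-site styles end up with the same flags. */
static void toy_check_equivalence(unsigned int gfp)
{
	unsigned int a = TOY_ALLOC_WMARK_LOW;
	unsigned int b = TOY_ALLOC_WMARK_LOW;

	a = toy_alloc_flags_cma_augment(gfp, a);
	b |= toy_alloc_flags_cma(gfp);
	assert(a == b);
}
```

The additive form also means the helper no longer needs to see the
caller's existing flags at all, which matches the other alloc_flags_*()
helpers named in the patch.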

^ permalink raw reply	[flat|nested] 31+ messages in thread

[parent not found: <20260629-alloc-trylock-v3-3-57bef0eadbc2@google.com>]

* Re: [PATCH v3 03/16] mm: name some args in a function declaration
       [not found] ` <20260629-alloc-trylock-v3-3-57bef0eadbc2@google.com>
@ 2026-06-30 12:43   ` Vlastimil Babka (SUSE)
  0 siblings, 0 replies; 31+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-30 12:43 UTC (permalink / raw)
  To: Brendan Jackman, Andrew Morton, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
	Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel

On 6/29/26 15:11, Brendan Jackman wrote:
> Checkpatch complains about this, a later patch will move the code, fix
> it so that checkpatch doesn't complain about that patch. Do it in a
> separate patch so the "move the code" patch is trivial to review using
> Git's diff colouring.
> 
> Signed-off-by: Brendan Jackman <jackmanb@google.com>

Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>

> ---
>  mm/internal.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 2237eee030cba..8ce59c5664497 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -919,8 +919,8 @@ extern bool free_pages_prepare(struct page *page, unsigned int order);
>  
>  extern int user_min_free_kbytes;
>  
> -struct page *__alloc_frozen_pages_noprof(gfp_t, unsigned int order, int nid,
> -		nodemask_t *);
> +struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order, int nid,
> +		nodemask_t *nodemask);
>  #define __alloc_frozen_pages(...) \
>  	alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__))
>  void free_frozen_pages(struct page *page, unsigned int order);
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

[parent not found: <20260629-alloc-trylock-v3-5-57bef0eadbc2@google.com>]

* Re: [PATCH v3 05/16] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
       [not found] ` <20260629-alloc-trylock-v3-5-57bef0eadbc2@google.com>
@ 2026-06-29 14:29   ` sashiko-bot
  2026-06-29 15:27     ` Brendan Jackman
  2026-06-30 13:36   ` Harry Yoo
  2026-06-30 16:16   ` Vlastimil Babka (SUSE)
  2 siblings, 1 reply; 31+ messages in thread
From: sashiko-bot @ 2026-06-29 14:29 UTC (permalink / raw)
  To: Brendan Jackman; +Cc: linux-rt-devel, linux-kernel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Spurious allocation failures for ALLOC_NOLOCK on fragmented systems
--

commit e9001ddaf8c952ebe08e5042f720b09e0b4a5da8
Author: Brendan Jackman <jackmanb@google.com>

mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()

This commit unifies the core allocator entry points by pulling the ALLOC_NOLOCK
control up into __alloc_frozen_pages_noprof(). It turns the nolock variant
into a thin wrapper that just sets the ALLOC_NOLOCK flag and handles
NUMA_NO_NODE.

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a3ba63c7f9199..8d409d075e3e9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c

[ ... ]

> @@ -5310,9 +5384,9 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
>  	fastpath_alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
>  	fastpath_alloc_flags |= alloc_flags_nonblocking(gfp, order) & ALLOC_HIGHATOMIC;
>  
> -	/* First allocation attempt */
> +	/* First allocation attempt (or, for nolock, only attempt) */
>  	page = get_page_from_freelist(alloc_gfp, order, fastpath_alloc_flags, &ac);
> -	if (likely(page))
> +	if (likely(page) || (alloc_flags & ALLOC_NOLOCK))
>  		goto out;

[Severity: High]
Could this lead to spurious allocation failures for ALLOC_NOLOCK requests
on fragmented systems?

In the pre-patch code, alloc_frozen_pages_nolock_noprof() bypassed
alloc_flags_nofragment(), allowing it to successfully steal pages from
other migratetypes when its preferred migratetype was empty.

The new unified __alloc_frozen_pages_noprof() fastpath unconditionally
applies the ALLOC_NOFRAGMENT policy to fastpath_alloc_flags. When the
requested migratetype is empty, __rmqueue() respects this flag by
skipping __rmqueue_steal() and returning NULL.

Since normal allocations fall back to the slowpath (which clears
ALLOC_NOFRAGMENT and retries), they succeed. However, ALLOC_NOLOCK
explicitly bails out after the first attempt here, permanently failing the
allocation. This might cause systems using bpf or tracing facilities under
memory fragmentation to silently drop allocations when hit from atomic
contexts.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260629-alloc-trylock-v3-0-57bef0eadbc2@google.com?part=5

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 05/16] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
  2026-06-29 14:29   ` [PATCH v3 05/16] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof() sashiko-bot
@ 2026-06-29 15:27     ` Brendan Jackman
  0 siblings, 0 replies; 31+ messages in thread
From: Brendan Jackman @ 2026-06-29 15:27 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-rt-devel, linux-kernel

On Mon, 29 Jun 2026 at 16:29, <sashiko-bot@kernel.org> wrote:
>
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [High] Spurious allocation failures for ALLOC_NOLOCK on fragmented systems
> --
>
> commit e9001ddaf8c952ebe08e5042f720b09e0b4a5da8
> Author: Brendan Jackman <jackmanb@google.com>
>
> mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
>
> This commit unifies the core allocator entry points by pulling the ALLOC_NOLOCK
> control up into __alloc_frozen_pages_noprof(). It turns the nolock variant
> into a thin wrapper that just sets the ALLOC_NOLOCK flag and handles
> NUMA_NO_NODE.
>
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index a3ba63c7f9199..8d409d075e3e9 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
>
> [ ... ]
>
> > @@ -5310,9 +5384,9 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
> >       fastpath_alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
> >       fastpath_alloc_flags |= alloc_flags_nonblocking(gfp, order) & ALLOC_HIGHATOMIC;
> >
> > -     /* First allocation attempt */
> > +     /* First allocation attempt (or, for nolock, only attempt) */
> >       page = get_page_from_freelist(alloc_gfp, order, fastpath_alloc_flags, &ac);
> > -     if (likely(page))
> > +     if (likely(page) || (alloc_flags & ALLOC_NOLOCK))
> >               goto out;
>
> [Severity: High]
> Could this lead to spurious allocation failures for ALLOC_NOLOCK requests
> on fragmented systems?
>
> In the pre-patch code, alloc_frozen_pages_nolock_noprof() bypassed
> alloc_flags_nofragment(), allowing it to successfully steal pages from
> other migratetypes when its preferred migratetype was empty.

Oh, yep. Given "no functional change intended" this should not set
ALLOC_NOFRAGMENT when ac.alloc_flags has ALLOC_NOLOCK.

(Is this intentional? Do we really want to allow "opportunistic"
allocations, which are very much allowed to fail, to fragment memory?
It doesn't look like this was discussed when the nolock
functionality was added).
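
A toy userspace model of the failure mode, for illustration only. The
toy_/TOY_ names, the two-counter "zone" and the flag values are all
invented here, and the model assumes the fastpath always wants
NOFRAGMENT (the real alloc_flags_nofragment() is conditional). It just
shows why a single NOFRAGMENT attempt with no slowpath can fail even
though pages of another migratetype are free.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up flag values; not the kernel's definitions. */
#define TOY_ALLOC_NOLOCK     0x1u
#define TOY_ALLOC_NOFRAGMENT 0x2u

/* Free page counts for the requested migratetype and for all others. */
struct toy_zone {
	unsigned int free_preferred;
	unsigned int free_other;
};

/* Take a page; stealing from other types is refused under NOFRAGMENT. */
static bool toy_rmqueue(struct toy_zone *z, unsigned int flags)
{
	if (z->free_preferred) {
		z->free_preferred--;
		return true;
	}
	if (flags & TOY_ALLOC_NOFRAGMENT)
		return false;
	if (z->free_other) {
		z->free_other--;
		return true;
	}
	return false;
}

static bool toy_alloc(struct toy_zone *z, unsigned int alloc_flags,
		      bool nolock_skips_nofragment)
{
	unsigned int fast = alloc_flags;

	/* Simplification: the model always wants NOFRAGMENT on the fastpath. */
	if (!(nolock_skips_nofragment && (alloc_flags & TOY_ALLOC_NOLOCK)))
		fast |= TOY_ALLOC_NOFRAGMENT;

	if (toy_rmqueue(z, fast))
		return true;
	if (alloc_flags & TOY_ALLOC_NOLOCK)
		return false;	/* nolock: the first attempt is the only one */

	/* "Slowpath": retry with fragmentation avoidance dropped. */
	return toy_rmqueue(z, alloc_flags & ~TOY_ALLOC_NOFRAGMENT);
}

static void toy_nofragment_demo(void)
{
	struct toy_zone z = { .free_preferred = 0, .free_other = 1 };

	assert(!toy_alloc(&z, TOY_ALLOC_NOLOCK, false)); /* v3 behaviour */
	assert(toy_alloc(&z, TOY_ALLOC_NOLOCK, true));   /* with the fix */
}
```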

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 05/16] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
       [not found] ` <20260629-alloc-trylock-v3-5-57bef0eadbc2@google.com>
  2026-06-29 14:29   ` [PATCH v3 05/16] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof() sashiko-bot
@ 2026-06-30 13:36   ` Harry Yoo
  2026-06-30 15:34     ` Vlastimil Babka (SUSE)
  2026-06-30 17:04     ` Brendan Jackman
  2026-06-30 16:16   ` Vlastimil Babka (SUSE)
  2 siblings, 2 replies; 31+ messages in thread
From: Harry Yoo @ 2026-06-30 13:36 UTC (permalink / raw)
  To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
	Muchun Song, Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
	Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Gregory Price, Alexei Starovoitov, Matthew Wilcox, Hao Ge,
	linux-mm, linux-kernel, linux-rt-devel


On 6/29/26 10:11 PM, Brendan Jackman wrote:
> Currently the core allocator code is controlled by ALLOC_NOLOCK, but the
> main entry point function is significantly different from the normal
> __alloc_frozen_pages_nolock(), this is tiring when reading the code.
> 
> Plumb the ALLOC_NOLOCK control one layer up in the call stack: create
> an alloc_flags argument to __alloc_frozen_pages_nolock() (which is only
> exposed to mm/) and then turn the nolock variant into a thin wrapper
> that just sets that flag (as well as handling NUMA_NO_NODE, similar to
> how some of the wrappers in gfp.h do).
> 
> Rationale that this doesn't change anything:
>
> 1. Simple bits: A bunch of the nolock-specific handling is just moved to
>    the new alloc_order_allowed(), alloc_trylock_allowed() and
>    gfp_trylock.

Right.

> 2. __alloc_frozen_pages_noprof() has some extra logic that wasn't
>    previously in the nolock variant:
> 
>    a. Application of gfp_allowed_mask; this only affects early boot, and
>       only flags that affect the slowpath get changed here.

gfp_allowed_mask clears __GFP_RECLAIM, and that means now allocations
with GFP_KERNEL during early boot would see
gfpflags_allow_spinning() = false.

The helper is not used in the page allocator, but is used in
memcg/stackdepot/page_owner.

>    b. Application of current_gfp_context() - also only affects the
>       slowpath

PF_MEMALLOC_PIN affects the fast path, but ALLOC_NOLOCK users
won't be affected.

What about alloc_flags_nofragment/nonblocking()?

> 3. The slowpath itself: this is now just explicitly skipped under
>    !ALLOC_TRYLOCK.

Right.

> Ulterior motive: adding an alloc_flags arg to the allocator's
> mm-internal entrypoint can later be used to do more allocation
> customisation without needing to create new GFP flags.
> 
> While adding this flag to a bunch of places, create ALLOC_DEFAULT to
> avoid a mysterious literal 0 in most places.
>
> alloc_frozen_pages_noprof() is defined above the alloc flags

The function is defined below the alloc flags, no?

> so just leave that as a slightly messy
> exception instead of trying to fully reorder mm/internal.h for that one
> case.
> 
> No functional change intended.
> 
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
>  mm/hugetlb.c    |   3 +-
>  mm/mempolicy.c  |  10 ++--
>  mm/page_alloc.c | 178 +++++++++++++++++++++++++++++---------------------------
>  mm/page_alloc.h |   6 +-
>  mm/slub.c       |   6 +-
>  5 files changed, 108 insertions(+), 95 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a3ba63c7f9199..8d409d075e3e9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5271,24 +5271,98 @@ void free_pages_bulk(struct page **page_array, unsigned long nr_pages)
>  	}
>  }
>  
> +static inline bool alloc_trylock_allowed(void)
> +{
> +	/*
> +	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
> +	 * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
> +	 * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
> +	 * mark the task as the owner of another rt_spin_lock which will
> +	 * confuse PI logic, so return immediately if called from hard IRQ or
> +	 * NMI.
> +	 *
> +	 * Note, irqs_disabled() case is ok. This function can be called
> +	 * from raw_spin_lock_irqsave region.
> +	 */
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
> +		return false;
> +
> +	/* On UP, spin_trylock() always succeeds even when it is locked */
> +	if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
> +		return false;

Except for deferred_pages_enabled(), it's not specific to the page
allocator. SLUB has

	/*
	 * See the comment for the same check in
	 * alloc_frozen_pages_nolock_noprof()
	 */

... and repeats the same thing as above.

Perhaps let's factor it out into a helper
rather than trying not to forget to update the other place?
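
Roughly what I have in mind, as a userspace sketch. struct toy_ctx and
all toy_* names are hypothetical; the context fields stand in for
IS_ENABLED(CONFIG_PREEMPT_RT), IS_ENABLED(CONFIG_SMP), in_nmi(),
in_hardirq() and deferred_pages_enabled(). It is not a proposal for the
actual helper name or location.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the config options and context checks quoted above. */
struct toy_ctx {
	bool preempt_rt;
	bool smp;
	bool in_nmi;
	bool in_hardirq;
	bool deferred_pages;
};

/* Hypothetical shared helper: is spin_trylock() usable in this context? */
static bool toy_trylock_context_ok(const struct toy_ctx *c)
{
	/* PREEMPT_RT: trylock is unsafe from NMI and hard IRQ. */
	if (c->preempt_rt && (c->in_nmi || c->in_hardirq))
		return false;
	/* UP: trylock always "succeeds", so NMI context cannot rely on it. */
	if (!c->smp && c->in_nmi)
		return false;
	return true;
}

/* The page allocator adds its own deferred-init condition on top. */
static bool toy_page_alloc_nolock_allowed(const struct toy_ctx *c)
{
	return toy_trylock_context_ok(c) && !c->deferred_pages;
}

/* The slab side would only need the shared part. */
static bool toy_slab_nolock_allowed(const struct toy_ctx *c)
{
	return toy_trylock_context_ok(c);
}

static void toy_helper_demo(void)
{
	struct toy_ctx rt_irq = { .preempt_rt = true, .smp = true,
				  .in_hardirq = true };
	struct toy_ctx early = { .smp = true, .deferred_pages = true };

	assert(!toy_page_alloc_nolock_allowed(&rt_irq));
	assert(!toy_slab_nolock_allowed(&rt_irq));
	assert(!toy_page_alloc_nolock_allowed(&early));
	assert(toy_slab_nolock_allowed(&early));
}
```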

> +	/* Bailout, since _deferred_grow_zone() needs to take a lock */
> +	if (deferred_pages_enabled())
> +		return false;
> +
> +	return true;
> +}


-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 05/16] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
  2026-06-30 13:36   ` Harry Yoo
@ 2026-06-30 15:34     ` Vlastimil Babka (SUSE)
  2026-06-30 16:56       ` Brendan Jackman
  2026-06-30 17:04     ` Brendan Jackman
  1 sibling, 1 reply; 31+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-30 15:34 UTC (permalink / raw)
  To: Harry Yoo, Brendan Jackman, Andrew Morton, Suren Baghdasaryan,
	Michal Hocko, Johannes Weiner, Zi Yan, Muchun Song,
	Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
	Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Gregory Price, Alexei Starovoitov, Matthew Wilcox, Hao Ge,
	linux-mm, linux-kernel, linux-rt-devel

On 6/30/26 15:36, Harry Yoo wrote:
> 
> 
> On 6/29/26 10:11 PM, Brendan Jackman wrote:
>> Currently the core allocator code is controlled by ALLOC_NOLOCK, but the
>> main entry point function is significantly different from the normal
>> __alloc_frozen_pages_nolock(), this is tiring when reading the code.
>> 
>> Plumb the ALLOC_NOLOCK control one layer up in the call stack: create
>> an alloc_flags argument to __alloc_frozen_pages_nolock() (which is only
>> exposed to mm/) and then turn the nolock variant into a thin wrapper
>> that just sets that flag (as well as handling NUMA_NO_NODE, similar to
>> how some of the wrappers in gfp.h do).
>> 
>> Rationale that this doesn't change anything:
>>
>> 1. Simple bits: A bunch of the nolock-specific handling is just moved to
>>    the new alloc_order_allowed(), alloc_trylock_allowed() and
>>    gfp_trylock.
> 
> Right.
> 
>> 2. __alloc_frozen_pages_noprof() has some extra logic that wasn't
>>    previously in the nolock variant:
>> 
>>    a. Application of gfp_allowed_mask; this only affects early boot, and
>>       only flags that affect the slowpath get changed here.
> 
> gfp_allowed_mask clears __GFP_RECLAIM, and that means now allocations
> with GFP_KERNEL during early boot would see
> gfpflags_allow_spinning() = false.

Is it a problem though? non-nolock allocations were affected before (the
masking existed for those already) and will be affected now the same, and
_nolock() allocations don't pass __GFP_RECLAIM in the first place, so the
masking can't affect them?
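
To spell the argument out with a toy userspace model: the bit values and
toy_/TOY_ names below are made up, and the sketch assumes (as I believe
is the case) that the early-boot mask strips the reclaim, IO and FS bits
and that gfpflags_allow_spinning() just tests the reclaim bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up bit values; not the kernel's gfp definitions. */
#define TOY_GFP_DIRECT_RECLAIM 0x01u
#define TOY_GFP_KSWAPD_RECLAIM 0x02u
#define TOY_GFP_IO             0x04u
#define TOY_GFP_FS             0x08u
#define TOY_GFP_NOWARN         0x10u

#define TOY_GFP_RECLAIM   (TOY_GFP_DIRECT_RECLAIM | TOY_GFP_KSWAPD_RECLAIM)
#define TOY_GFP_KERNEL    (TOY_GFP_RECLAIM | TOY_GFP_IO | TOY_GFP_FS)
#define TOY_GFP_BOOT_MASK (~(TOY_GFP_RECLAIM | TOY_GFP_IO | TOY_GFP_FS))

/* Model of the "may this allocation spin on locks?" test. */
static bool toy_allow_spinning(unsigned int gfp)
{
	return (gfp & TOY_GFP_RECLAIM) != 0;
}

/* Model of applying the early-boot allowed mask. */
static unsigned int toy_early_boot_mask(unsigned int gfp)
{
	return gfp & TOY_GFP_BOOT_MASK;
}

static void toy_mask_demo(void)
{
	/* A nolock-style request carries no reclaim/IO/FS bits. */
	unsigned int nolock_gfp = TOY_GFP_NOWARN;

	/* Normal allocations: masked the same way before and after. */
	assert(toy_allow_spinning(TOY_GFP_KERNEL));
	assert(!toy_allow_spinning(toy_early_boot_mask(TOY_GFP_KERNEL)));

	/* Nolock allocations: the mask has nothing to clear. */
	assert(toy_early_boot_mask(nolock_gfp) == nolock_gfp);
}
```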

> The helper is not used in the page allocator, but is used in
> memcg/stackdepot/page_owner.
> 
>>    b. Application of current_gfp_context() - also only affects the
>>       slowpath
> 
> PF_MEMALLOC_PIN affects the fast path, but ALLOC_NOLOCK users
> won't be affected.

And it wouldn't be wrong if they were? It only clears __GFP_MOVABLE?

> What about alloc_flags_nofragment/nonblocking()?

ALLOC_NOFRAGMENT due to e.g. defrag_mode could be a problem indeed, if
there's no slowpath. Make ALLOC_NOLOCK override it?

nonblocking() is probably fine?

> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 05/16] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
  2026-06-30 15:34     ` Vlastimil Babka (SUSE)
@ 2026-06-30 16:56       ` Brendan Jackman
  0 siblings, 0 replies; 31+ messages in thread
From: Brendan Jackman @ 2026-06-30 16:56 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE), Harry Yoo, Brendan Jackman, Andrew Morton,
	Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
	Muchun Song, Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
	Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Gregory Price, Alexei Starovoitov, Matthew Wilcox, Hao Ge,
	linux-mm, linux-kernel, linux-rt-devel

On Tue Jun 30, 2026 at 3:34 PM UTC, Vlastimil Babka (SUSE) wrote:
> On 6/30/26 15:36, Harry Yoo wrote:
>> 
>> 
>> On 6/29/26 10:11 PM, Brendan Jackman wrote:
>>> Currently the core allocator code is controlled by ALLOC_NOLOCK, but the
>>> main entry point function is significantly different from the normal
>>> __alloc_frozen_pages_nolock(), this is tiring when reading the code.
>>> 
>>> Plumb the ALLOC_NOLOCK control one layer up in the call stack: create
>>> an alloc_flags argument to __alloc_frozen_pages_nolock() (which is only
>>> exposed to mm/) and then turn the nolock variant into a thin wrapper
>>> that just sets that flag (as well as handling NUMA_NO_NODE, similar to
>>> how some of the wrappers in gfp.h do).
>>> 
>>> Rationale that this doesn't change anything:
>>>
>>> 1. Simple bits: A bunch of the nolock-specific handling is just moved to
>>>    the new alloc_order_allowed(), alloc_trylock_allowed() and
>>>    gfp_trylock.
>> 
>> Right.
>> 
>>> 2. __alloc_frozen_pages_noprof() has some extra logic that wasn't
>>>    previously in the nolock variant:
>>> 
>>>    a. Application of gfp_allowed_mask; this only affects early boot, and
>>>       only flags that affect the slowpath get changed here.
>> 
>> gfp_allowed_mask clears __GFP_RECLAIM, and that means now allocations
>> with GFP_KERNEL during early boot would see
>> gfpflags_allow_spinning() = false.
>
> Is it a problem though? non-nolock allocations were affected before (the
> masking existed for those already) and will be affected now the same, and
> _nolock() allocations don't pass __GFP_RECLAIM in the first place, so the
> masking can't affect them?

This was my thinking too.

>> The helper is not used in the page allocator, but is used in
>> memcg/stackdepot/page_owner.
>> 
>>>    b. Application of current_gfp_context() - also only affects the
>>>       slowpath
>> 
>> PF_MEMALLOC_PIN affects the fast path, but ALLOC_NOLOCK users
>> won't be affected.
>
> And it wouldn't be wrong if they were? It only clears __GFP_MOVABLE?
>
>> What about alloc_flags_nofragment/nonblocking()?
>
> ALLOC_NOFRAGMENT due to e.g. defrag_mode could be a problem indeed, if
> there's no slowpath. Make ALLOC_NOLOCK override it?

Yeah calling alloc_flags_nofragment() here is a bug in the patch,
and Sashiko also complained: 

https://lore.kernel.org/all/20260629142921.9A05A1F000E9@smtp.kernel.org/

Like I said in the reply to that thread I think maybe we _do_ want to
set ALLOC_NOFRAGMENT for nolock allocations? But, that is a functional
change, it doesn't belong in this series.

> nonblocking() is probably fine?

Yeah, I believe this is fine.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 05/16] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
  2026-06-30 13:36   ` Harry Yoo
  2026-06-30 15:34     ` Vlastimil Babka (SUSE)
@ 2026-06-30 17:04     ` Brendan Jackman
  1 sibling, 0 replies; 31+ messages in thread
From: Brendan Jackman @ 2026-06-30 17:04 UTC (permalink / raw)
  To: Harry Yoo, Brendan Jackman, Andrew Morton, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
	Muchun Song, Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
	Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Gregory Price, Alexei Starovoitov, Matthew Wilcox, Hao Ge,
	linux-mm, linux-kernel, linux-rt-devel

On Tue Jun 30, 2026 at 1:36 PM UTC, Harry Yoo wrote:
>> Ulterior motive: adding an alloc_flags arg to the allocator's
>> mm-internal entrypoint can later be used to do more allocation
>> customisation without needing to create new GFP flags.
>> 
>> While adding this flag to a bunch of places, create ALLOC_DEFAULT to
>> avoid a mysterious literal 0 in most places.
>>
>> alloc_frozen_pages_noprof() is defined above the alloc flags
>
> The function is defined below the alloc flags, no?

Yep this paragraph is stale since I created mm/page_alloc.h, will remove
it.

>> so just leave that as a slightly messy
>> exception instead of trying to fully reorder mm/internal.h for that one
>> case.
>> 
>> No functional change intended.
>> 
>> Signed-off-by: Brendan Jackman <jackmanb@google.com>
>> ---
>>  mm/hugetlb.c    |   3 +-
>>  mm/mempolicy.c  |  10 ++--
>>  mm/page_alloc.c | 178 +++++++++++++++++++++++++++++---------------------------
>>  mm/page_alloc.h |   6 +-
>>  mm/slub.c       |   6 +-
>>  5 files changed, 108 insertions(+), 95 deletions(-)
>> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index a3ba63c7f9199..8d409d075e3e9 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5271,24 +5271,98 @@ void free_pages_bulk(struct page **page_array, unsigned long nr_pages)
>>  	}
>>  }
>>  
>> +static inline bool alloc_trylock_allowed(void)
>> +{
>> +	/*
>> +	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
>> +	 * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
>> +	 * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
>> +	 * mark the task as the owner of another rt_spin_lock which will
>> +	 * confuse PI logic, so return immediately if called from hard IRQ or
>> +	 * NMI.
>> +	 *
>> +	 * Note, irqs_disabled() case is ok. This function can be called
>> +	 * from raw_spin_lock_irqsave region.
>> +	 */
>> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
>> +		return false;
>> +
>> +	/* On UP, spin_trylock() always succeeds even when it is locked */
>> +	if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
>> +		return false;
>
> Except for deferred_pages_enabled(), it's not specific to the page
> allocator. SLUB has
>
> 	/*
> 	 * See the comment for the same check in
> 	 * alloc_frozen_pages_nolock_noprof()
> 	 */
>
> ... and repeats the same thing as above.
>
> Perhaps let's factor it out into a helper
> rather than trying not to forget to update the other place?

Hm, not sure about this. I think I would say it's a "coincidence" that
these two bits of code look the same? Like, page_alloc.c uses
spin_trylock() so you can't do alloc_pages_nolock() from IRQ on
PREEMPT_RT. slub.c ALSO uses spin_trylock(), so you ALSO can't use
kmalloc_nolock() in those scenarios. But those are two different facts
that just happen to be isomorphic? Putting them into a shared helper
would kinda imply that these are part of a single system with inherently
coupled constraints.

I dunno I'm being a bit of a ponderous philosopher there, I don't have
particularly strong feelings. But I'd lean towards leaving this out of
the patchset since the potential deduplication isn't really related to
the other cleanups anyway.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 05/16] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
       [not found] ` <20260629-alloc-trylock-v3-5-57bef0eadbc2@google.com>
  2026-06-29 14:29   ` [PATCH v3 05/16] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof() sashiko-bot
  2026-06-30 13:36   ` Harry Yoo
@ 2026-06-30 16:16   ` Vlastimil Babka (SUSE)
  2026-06-30 18:47     ` Brendan Jackman
  2 siblings, 1 reply; 31+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-30 16:16 UTC (permalink / raw)
  To: Brendan Jackman, Andrew Morton, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
	Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel

On 6/29/26 15:11, Brendan Jackman wrote:
> Currently the core allocator code is controlled by ALLOC_NOLOCK, but the
> main entry point function is significantly different from the normal

Let's mention it explicitly, alloc_frozen_pages_nolock_noprof().

> __alloc_frozen_pages_nolock(), this is tiring when reading the code.

You mean __alloc_frozen_pages_noprof()?

> 
> Plumb the ALLOC_NOLOCK control one layer up in the call stack: create
> an alloc_flags argument to __alloc_frozen_pages_nolock() (which is only

Again __alloc_frozen_pages_noprof()

> exposed to mm/) and then turn the nolock variant into a thin wrapper
> that just sets that flag (as well as handling NUMA_NO_NODE, similar to
> how some of the wrappers in gfp.h do).
> 
> Rationale that this doesn't change anything:
> 
> 1. Simple bits: A bunch of the nolock-specific handling is just moved to
>    the new alloc_order_allowed(), alloc_trylock_allowed() and
>    gfp_trylock.

Should be alloc_nolock_allowed() and gfp_nolock

> 2. __alloc_frozen_pages_noprof() has some extra logic that wasn't
>    previously in the nolock variant:
> 
>    a. Application of gfp_allowed_mask; this only affects early boot, and
>       only flags that affect the slowpath get changed here.

As discussed in reply to Harry, I'd mention the flags excluded by
GFP_BOOT_MASK are not usable by _nolock() anyway.

>    b. Application of current_gfp_context() - also only affects the
>       slowpath
> 
> 3. The slowpath itself: this is now just explicitly skipped under
>    !ALLOC_TRYLOCK.

ALLOC_NOLOCK.

> 
> Ulterior motive: adding an alloc_flags arg to the allocator's
> mm-internal entrypoint can later be used to do more allocation
> customisation without needing to create new GFP flags.
> 
> While adding this flag to a bunch of places, create ALLOC_DEFAULT to
> avoid a mysterious literal 0 in most places.


> alloc_frozen_pages_noprof()
> is defined above the alloc flags so just leave that as a slightly messy
> exception instead of trying to fully reorder mm/internal.h for that one
> case.

This no longer applies in v3?

> No functional change intended.
> 
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
>  mm/hugetlb.c    |   3 +-
>  mm/mempolicy.c  |  10 ++--
>  mm/page_alloc.c | 178 +++++++++++++++++++++++++++++---------------------------
>  mm/page_alloc.h |   6 +-
>  mm/slub.c       |   6 +-
>  5 files changed, 108 insertions(+), 95 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f7925624c4d2e..dfcfcfa4715bf 100644

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a3ba63c7f9199..8d409d075e3e9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5222,7 +5222,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
>  		}
>  		nr_account++;
>  
> -		prep_new_page(page, 0, gfp, 0);
> +		prep_new_page(page, 0, gfp, ALLOC_DEFAULT);
>  		set_page_refcounted(page);
>  		page_array[nr_populated++] = page;
>  	}
> @@ -5271,24 +5271,98 @@ void free_pages_bulk(struct page **page_array, unsigned long nr_pages)
>  	}
>  }
>  
> -/*
> - * This is the 'heart' of the zoned buddy allocator.
> - */
> -struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
> -		int preferred_nid, nodemask_t *nodemask)
> +static inline bool alloc_order_allowed(gfp_t gfp, unsigned int order,
> +				       unsigned int alloc_flags)
>  {
> -	struct page *page;
> -	unsigned int fastpath_alloc_flags = ALLOC_WMARK_LOW;
> -	gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
> -	struct alloc_context ac = { };
> +	if (alloc_flags & ALLOC_NOLOCK)
> +		return pcp_allowed_order(order);
>  
>  	/*
>  	 * There are several places where we assume that the order value is sane
>  	 * so bail out early if the request is out of bound.
>  	 */
> -	if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
> +	return !(WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp));
> +}
> +
> +static inline bool alloc_trylock_allowed(void)

alloc_nolock_allowed()

> +{
> +	/*
> +	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
> +	 * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
> +	 * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
> +	 * mark the task as the owner of another rt_spin_lock which will
> +	 * confuse PI logic, so return immediately if called from hard IRQ or
> +	 * NMI.
> +	 *
> +	 * Note, irqs_disabled() case is ok. This function can be called
> +	 * from raw_spin_lock_irqsave region.
> +	 */
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
> +		return false;
> +
> +	/* On UP, spin_trylock() always succeeds even when it is locked */
> +	if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
> +		return false;
> +
> +	/* Bailout, since _deferred_grow_zone() needs to take a lock */
> +	if (deferred_pages_enabled())
> +		return false;
> +
> +	return true;
> +}
> +
> +/*
> + * GFP flags to set for ALLOC_NOLOCK i.e. alloc_pages_nolock().
> + *
> + * Do not specify __GFP_DIRECT_RECLAIM, since direct claim is not allowed.
> + * Do not specify __GFP_KSWAPD_RECLAIM either, since wake up of kswapd
> + * is not safe in arbitrary context.
> + *
> + * These two are the conditions for gfpflags_allow_spinning() being true.
> + *
> + * Specify __GFP_NOWARN since failing alloc_pages_nolock() is not a reason
> + * to warn. Also warn would trigger printk() which is unsafe from
> + * various contexts. We cannot use printk_deferred_enter() to mitigate,
> + * since the running context is unknown.
> + *
> + * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
> + * is safe in any context. Also zeroing the page is mandatory for
> + * BPF use cases.
> + *
> + * Though __GFP_NOMEMALLOC is not checked in the code path below,
> + * specify it here to highlight that alloc_pages_nolock()
> + * doesn't want to deplete reserves.
> + */
> +static const gfp_t gfp_nolock = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC |
> +				__GFP_COMP;
> +
> +/*
> + * This is the 'heart' of the zoned buddy allocator.
> + */
> +struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
> +		int preferred_nid, nodemask_t *nodemask, unsigned int alloc_flags)
> +{
> +	struct page *page;
> +	gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
> +	struct alloc_context ac = { };
> +	unsigned int fastpath_alloc_flags = alloc_flags;
> +
> +	/* Other flags could be supported later if needed. */
> +	if (WARN_ON(alloc_flags & ~ALLOC_NOLOCK))
>  		return NULL;
>  
> +	if (!alloc_order_allowed(gfp, order, alloc_flags))
> +		return NULL;
> +
> +	if (alloc_flags & ALLOC_NOLOCK) {
> +		VM_WARN_ON_ONCE(gfp & ~__GFP_ACCOUNT);
> +		if (!alloc_trylock_allowed())
> +			return NULL;
> +		gfp |= gfp_nolock;

I think we could do a
		fastpath_alloc_flags |= ALLOC_WMARK_MIN;

to make it explicit, even though it's a no-op (the value is 0) and
alloc_frozen_pages_nolock_noprof() didn't do it.
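
To spell out why that is a no-op, here is a minimal standalone sketch, not
the kernel code. It assumes the ordering of enum zone_watermarks (WMARK_MIN
first, so its value is 0) and the ALLOC_NO_WATERMARKS-derived mask. The
ALLOC_NOLOCK bit value and the helpers fastpath_flags() and wmark_index()
are made up for illustration.

```c
#include <assert.h>

/* Same ordering as the kernel's enum zone_watermarks: WMARK_MIN is 0. */
enum zone_watermarks { WMARK_MIN, WMARK_LOW, WMARK_HIGH, WMARK_PROMO, NR_WMARK };

#define ALLOC_DEFAULT		0u
#define ALLOC_WMARK_MIN		WMARK_MIN
#define ALLOC_WMARK_LOW		WMARK_LOW
#define ALLOC_NO_WATERMARKS	0x04u
/* The low bits of alloc_flags index the watermark array. */
#define ALLOC_WMARK_MASK	(ALLOC_NO_WATERMARKS - 1)
/* Hypothetical bit value, chosen only for this sketch. */
#define ALLOC_NOLOCK		0x1000u

static_assert(ALLOC_WMARK_MIN == 0, "ORing in ALLOC_WMARK_MIN changes nothing");

/* Hypothetical helper: pick the fastpath watermark from the caller's flags. */
static unsigned int fastpath_flags(unsigned int alloc_flags)
{
	unsigned int flags = alloc_flags;

	if (alloc_flags & ALLOC_NOLOCK)
		flags |= ALLOC_WMARK_MIN;	/* no-op, only documents intent */
	else
		flags |= ALLOC_WMARK_LOW;

	return flags;
}

/* Hypothetical helper: extract the watermark index from the flags. */
static unsigned int wmark_index(unsigned int flags)
{
	return flags & ALLOC_WMARK_MASK;
}
```

With these definitions the nolock case keeps its flags unchanged and indexes
the min watermark, while the default case indexes the low watermark.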

> +	} else {
> +		fastpath_alloc_flags |= ALLOC_WMARK_LOW;
> +	}
> +
>  	gfp &= gfp_allowed_mask;
>  	/*
>  	 * Apply scoped allocation constraints. This is mainly about GFP_NOFS
> @@ -5310,9 +5384,9 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
>  	fastpath_alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
>  	fastpath_alloc_flags |= alloc_flags_nonblocking(gfp, order) & ALLOC_HIGHATOMIC;
>  
> -	/* First allocation attempt */
> +	/* First allocation attempt (or, for nolock, only attempt) */
>  	page = get_page_from_freelist(alloc_gfp, order, fastpath_alloc_flags, &ac);
> -	if (likely(page))
> +	if (likely(page) || (alloc_flags & ALLOC_NOLOCK))
>  		goto out;
>  
>  	alloc_gfp = gfp;
> @@ -5329,7 +5403,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
>  out:
>  	if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
>  	    unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
> -		free_frozen_pages(page, order);
> +		__free_frozen_pages(page, order,
> +				    alloc_flags & ALLOC_NOLOCK ? FPI_TRYLOCK : 0);
>  		page = NULL;
>  	}
>  
> @@ -5345,7 +5420,8 @@ struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
>  {
>  	struct page *page;
>  
> -	page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask);
> +	page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask,
> +					   ALLOC_DEFAULT);
>  	if (page)
>  		set_page_refcounted(page);
>  	return page;
> @@ -7875,80 +7951,10 @@ static bool __free_unaccepted(struct page *page)
>  
>  struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned int order)
>  {
> -	/*
> -	 * Do not specify __GFP_DIRECT_RECLAIM, since direct claim is not allowed.
> -	 * Do not specify __GFP_KSWAPD_RECLAIM either, since wake up of kswapd
> -	 * is not safe in arbitrary context.
> -	 *
> -	 * These two are the conditions for gfpflags_allow_spinning() being true.
> -	 *
> -	 * Specify __GFP_NOWARN since failing alloc_pages_nolock() is not a reason
> -	 * to warn. Also warn would trigger printk() which is unsafe from
> -	 * various contexts. We cannot use printk_deferred_enter() to mitigate,
> -	 * since the running context is unknown.
> -	 *
> -	 * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
> -	 * is safe in any context. Also zeroing the page is mandatory for
> -	 * BPF use cases.
> -	 *
> -	 * Though __GFP_NOMEMALLOC is not checked in the code path below,
> -	 * specify it here to highlight that alloc_pages_nolock()
> -	 * doesn't want to deplete reserves.
> -	 */
> -	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC | __GFP_COMP
> -			| gfp_flags;
> -	unsigned int alloc_flags = ALLOC_NOLOCK;
> -	struct alloc_context ac = { };
> -	struct page *page;
> -
> -	VM_WARN_ON_ONCE(gfp_flags & ~__GFP_ACCOUNT);
> -	/*
> -	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
> -	 * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
> -	 * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
> -	 * mark the task as the owner of another rt_spin_lock which will
> -	 * confuse PI logic, so return immediately if called from hard IRQ or
> -	 * NMI.
> -	 *
> -	 * Note, irqs_disabled() case is ok. This function can be called
> -	 * from raw_spin_lock_irqsave region.
> -	 */
> -	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
> -		return NULL;
> -
> -	/* On UP, spin_trylock() always succeeds even when it is locked */
> -	if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
> -		return NULL;
> -
> -	if (!pcp_allowed_order(order))
> -		return NULL;
> -
> -	/* Bailout, since _deferred_grow_zone() needs to take a lock */
> -	if (deferred_pages_enabled())
> -		return NULL;
> -
>  	if (nid == NUMA_NO_NODE)
>  		nid = numa_node_id();
>  
> -	prepare_alloc_pages(alloc_gfp, order, nid, NULL, &ac,
> -			    &alloc_gfp, &alloc_flags);
> -
> -	/*
> -	 * Best effort allocation from percpu free list.
> -	 * If it's empty attempt to spin_trylock zone->lock.
> -	 */
> -	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
> -
> -	/* Unlike regular alloc_pages() there is no __alloc_pages_slowpath(). */
> -
> -	if (memcg_kmem_online() && page && (gfp_flags & __GFP_ACCOUNT) &&
> -	    unlikely(__memcg_kmem_charge_page(page, alloc_gfp, order) != 0)) {
> -		__free_frozen_pages(page, order, FPI_TRYLOCK);
> -		page = NULL;
> -	}
> -	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
> -	kmsan_alloc_page(page, order, alloc_gfp);
> -	return page;
> +	return __alloc_frozen_pages_noprof(gfp_flags, order, nid, NULL, ALLOC_NOLOCK);
>  }
>  /**
>   * alloc_pages_nolock - opportunistic reentrant allocation from any context
> diff --git a/mm/page_alloc.h b/mm/page_alloc.h
> index 3250d44f96457..e16f905f859a7 100644
> --- a/mm/page_alloc.h
> +++ b/mm/page_alloc.h
> @@ -11,6 +11,7 @@
>  #include <linux/nodemask.h>
>  #include <linux/types.h>
>  
> +#define ALLOC_DEFAULT		0
>  /* The ALLOC_WMARK bits are used as an index to zone->watermark */
>  #define ALLOC_WMARK_MIN		WMARK_MIN
>  #define ALLOC_WMARK_LOW		WMARK_LOW
> @@ -219,7 +220,7 @@ extern bool free_pages_prepare(struct page *page, unsigned int order);
>  extern int user_min_free_kbytes;
>  
>  struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order, int nid,
> -		nodemask_t *nodemask);
> +		nodemask_t *nodemask, unsigned int alloc_flags);
>  #define __alloc_frozen_pages(...) \
>  	alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__))
>  void free_frozen_pages(struct page *page, unsigned int order);
> @@ -230,7 +231,8 @@ struct page *alloc_frozen_pages_noprof(gfp_t, unsigned int order);
>  #else
>  static inline struct page *alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order)
>  {
> -	return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL);
> +	return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL,
> +					   0 /* ALLOC_DEFAULT */);

Can use ALLOC_DEFAULT now.

>  }
>  #endif
>  
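
For readers skimming the diff, the overall shape of the unification can be
modelled in a few lines. This is only a toy sketch under stated assumptions:
struct toy_pool, toy_alloc_core() and the two wrappers are hypothetical
stand-ins for the real allocator, and the ALLOC_NOLOCK bit value is made up.
It shows one core entry point that takes alloc_flags, rejects unknown flags,
and skips the slowpath for nolock callers, with thin wrappers that only pick
the flag.

```c
#include <stddef.h>

#define ALLOC_DEFAULT	0u
#define ALLOC_NOLOCK	0x1000u	/* hypothetical bit value for this sketch */

/* Hypothetical stand-in for allocator state: pages reachable per path. */
struct toy_pool {
	int fast_free;	/* available without blocking */
	int slow_free;	/* available only via the slowpath */
};

static int toy_page;	/* dummy object so that success is non-NULL */

static void *toy_alloc_core(struct toy_pool *pool, unsigned int alloc_flags)
{
	/* Other flags could be supported later; reject them for now. */
	if (alloc_flags & ~ALLOC_NOLOCK)
		return NULL;

	/* First allocation attempt (or, for nolock, the only attempt). */
	if (pool->fast_free > 0) {
		pool->fast_free--;
		return &toy_page;
	}
	if (alloc_flags & ALLOC_NOLOCK)
		return NULL;

	/* Slowpath: only reached by callers that did not ask for nolock. */
	if (pool->slow_free > 0) {
		pool->slow_free--;
		return &toy_page;
	}
	return NULL;
}

/* Thin wrappers: the only difference is the flag they pass down. */
static void *toy_alloc(struct toy_pool *pool)
{
	return toy_alloc_core(pool, ALLOC_DEFAULT);
}

static void *toy_alloc_nolock(struct toy_pool *pool)
{
	return toy_alloc_core(pool, ALLOC_NOLOCK);
}
```

The point of the design is that the nolock-specific behaviour lives in one
place, so a later flag can reuse the same entry point instead of growing
another near-duplicate function.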

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 05/16] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
  2026-06-30 16:16   ` Vlastimil Babka (SUSE)
@ 2026-06-30 18:47     ` Brendan Jackman
  0 siblings, 0 replies; 31+ messages in thread
From: Brendan Jackman @ 2026-06-30 18:47 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE), Brendan Jackman, Andrew Morton,
	Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
	Muchun Song, Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
	Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
	Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel

On Tue Jun 30, 2026 at 4:16 PM UTC, Vlastimil Babka (SUSE) wrote:
> On 6/29/26 15:11, Brendan Jackman wrote:
>> Currently the core allocator code is controlled by ALLOC_NOLOCK, but the
>> main entry point function is significantly different from the normal
>
> Let's mention it explicitly, alloc_frozen_pages_nolock_noprof().
>
>> __alloc_frozen_pages_nolock(), this is tiring when reading the code.
>
> You mean __alloc_frozen_pages_noprof()?
>
>> 
>> Plumb the ALLOC_NOLOCK control one layer up in the call stack: create
>> an alloc_flags argument to __alloc_frozen_pages_nolock() (which is only
>
> Again __alloc_frozen_pages_noprof()
>
>> exposed to mm/) and then turn the nolock variant into a thin wrapper
>> that just sets that flag (as well as handling NUMA_NO_NODE, similar to
>> how some of the wrappers in gfp.h do).
>> 
>> Rationale that this doesn't change anything:
>> 
>> 1. Simple bits: A bunch of the nolock-specific handling is just moved to
>>    the new alloc_order_allowed(), alloc_trylock_allowed() and
>>    gfp_trylock.
>
> Should be alloc_nolock_allowed() and gfp_nolock
>
>> 2. __alloc_frozen_pages_noprof() has some extra logic that wasn't
>>    previously in the nolock variant:
>> 
>>    a. Application of gfp_allowed_mask; this only affects early boot, and
>>       only flags that affect the slowpath get changed here.
>
> As discussed in reply to Harry, I'd mention the flags excluded by
> GFP_BOOT_MASK are not usable by _nolock() anyway.
>
>>    b. Application of current_gfp_context() - also only affects the
>>       slowpath
>> 
>> 3. The slowpath itself: this is now just explicitly skipped under
>>    !ALLOC_TRYLOCK.
>
> ALLOC_NOLOCK.
>
>> 
>> Ulterior motive: adding an alloc_flags arg to the allocator's
>> mm-internal entrypoint can later be used to do more allocation
>> customisation without needing to create new GFP flags.
>> 
>> While adding this flag to a bunch of places, create ALLOC_DEFAULT to
>> avoid a mysterious literal 0 in most places.
>
>
>> alloc_frozen_pages_noprof()
>> is defined above the alloc flags so just leave that as a slightly messy
>> exception instead of trying to fully reorder mm/internal.h for that one
>> case.
>
> This no longer applies in v3?
>
>> No functional change intended.
>> 
>> Signed-off-by: Brendan Jackman <jackmanb@google.com>
>> ---
>>  mm/hugetlb.c    |   3 +-
>>  mm/mempolicy.c  |  10 ++--
>>  mm/page_alloc.c | 178 +++++++++++++++++++++++++++++---------------------------
>>  mm/page_alloc.h |   6 +-
>>  mm/slub.c       |   6 +-
>>  5 files changed, 108 insertions(+), 95 deletions(-)
>> 
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index f7925624c4d2e..dfcfcfa4715bf 100644
>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index a3ba63c7f9199..8d409d075e3e9 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5222,7 +5222,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
>>  		}
>>  		nr_account++;
>>  
>> -		prep_new_page(page, 0, gfp, 0);
>> +		prep_new_page(page, 0, gfp, ALLOC_DEFAULT);
>>  		set_page_refcounted(page);
>>  		page_array[nr_populated++] = page;
>>  	}
>> @@ -5271,24 +5271,98 @@ void free_pages_bulk(struct page **page_array, unsigned long nr_pages)
>>  	}
>>  }
>>  
>> -/*
>> - * This is the 'heart' of the zoned buddy allocator.
>> - */
>> -struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
>> -		int preferred_nid, nodemask_t *nodemask)
>> +static inline bool alloc_order_allowed(gfp_t gfp, unsigned int order,
>> +				       unsigned int alloc_flags)
>>  {
>> -	struct page *page;
>> -	unsigned int fastpath_alloc_flags = ALLOC_WMARK_LOW;
>> -	gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
>> -	struct alloc_context ac = { };
>> +	if (alloc_flags & ALLOC_NOLOCK)
>> +		return pcp_allowed_order(order);
>>  
>>  	/*
>>  	 * There are several places where we assume that the order value is sane
>>  	 * so bail out early if the request is out of bound.
>>  	 */
>> -	if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
>> +	return !(WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp));
>> +}
>> +
>> +static inline bool alloc_trylock_allowed(void)
>
> alloc_nolock_allowed()
>
>> +{
>> +	/*
>> +	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
>> +	 * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
>> +	 * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
>> +	 * mark the task as the owner of another rt_spin_lock which will
>> +	 * confuse PI logic, so return immediately if called from hard IRQ or
>> +	 * NMI.
>> +	 *
>> +	 * Note, irqs_disabled() case is ok. This function can be called
>> +	 * from raw_spin_lock_irqsave region.
>> +	 */
>> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
>> +		return false;
>> +
>> +	/* On UP, spin_trylock() always succeeds even when it is locked */
>> +	if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
>> +		return false;
>> +
>> +	/* Bailout, since _deferred_grow_zone() needs to take a lock */
>> +	if (deferred_pages_enabled())
>> +		return false;
>> +
>> +	return true;
>> +}
>> +
>> +/*
>> + * GFP flags to set for ALLOC_NOLOCK i.e. alloc_pages_nolock().
>> + *
>> + * Do not specify __GFP_DIRECT_RECLAIM, since direct claim is not allowed.
>> + * Do not specify __GFP_KSWAPD_RECLAIM either, since wake up of kswapd
>> + * is not safe in arbitrary context.
>> + *
>> + * These two are the conditions for gfpflags_allow_spinning() being true.
>> + *
>> + * Specify __GFP_NOWARN since failing alloc_pages_nolock() is not a reason
>> + * to warn. Also warn would trigger printk() which is unsafe from
>> + * various contexts. We cannot use printk_deferred_enter() to mitigate,
>> + * since the running context is unknown.
>> + *
>> + * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
>> + * is safe in any context. Also zeroing the page is mandatory for
>> + * BPF use cases.
>> + *
>> + * Though __GFP_NOMEMALLOC is not checked in the code path below,
>> + * specify it here to highlight that alloc_pages_nolock()
>> + * doesn't want to deplete reserves.
>> + */
>> +static const gfp_t gfp_nolock = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC |
>> +				__GFP_COMP;
>> +
>> +/*
>> + * This is the 'heart' of the zoned buddy allocator.
>> + */
>> +struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
>> +		int preferred_nid, nodemask_t *nodemask, unsigned int alloc_flags)
>> +{
>> +	struct page *page;
>> +	gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
>> +	struct alloc_context ac = { };
>> +	unsigned int fastpath_alloc_flags = alloc_flags;
>> +
>> +	/* Other flags could be supported later if needed. */
>> +	if (WARN_ON(alloc_flags & ~ALLOC_NOLOCK))
>>  		return NULL;
>>  
>> +	if (!alloc_order_allowed(gfp, order, alloc_flags))
>> +		return NULL;
>> +
>> +	if (alloc_flags & ALLOC_NOLOCK) {
>> +		VM_WARN_ON_ONCE(gfp & ~__GFP_ACCOUNT);
>> +		if (!alloc_trylock_allowed())
>> +			return NULL;
>> +		gfp |= gfp_nolock;
>
> I think we could do a
> 		fastpath_alloc_flags |= ALLOC_WMARK_MIN;
>
> to make it explicit, even though it's a no-op (the value is 0) and
> alloc_frozen_pages_nolock_noprof() didn't do it.
>
>> +	} else {
>> +		fastpath_alloc_flags |= ALLOC_WMARK_LOW;
>> +	}
>> +
>>  	gfp &= gfp_allowed_mask;
>>  	/*
>>  	 * Apply scoped allocation constraints. This is mainly about GFP_NOFS
>> @@ -5310,9 +5384,9 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
>>  	fastpath_alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
>>  	fastpath_alloc_flags |= alloc_flags_nonblocking(gfp, order) & ALLOC_HIGHATOMIC;
>>  
>> -	/* First allocation attempt */
>> +	/* First allocation attempt (or, for nolock, only attempt) */
>>  	page = get_page_from_freelist(alloc_gfp, order, fastpath_alloc_flags, &ac);
>> -	if (likely(page))
>> +	if (likely(page) || (alloc_flags & ALLOC_NOLOCK))
>>  		goto out;
>>  
>>  	alloc_gfp = gfp;
>> @@ -5329,7 +5403,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
>>  out:
>>  	if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
>>  	    unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
>> -		free_frozen_pages(page, order);
>> +		__free_frozen_pages(page, order,
>> +				    alloc_flags & ALLOC_NOLOCK ? FPI_TRYLOCK : 0);
>>  		page = NULL;
>>  	}
>>  
>> @@ -5345,7 +5420,8 @@ struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
>>  {
>>  	struct page *page;
>>  
>> -	page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask);
>> +	page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask,
>> +					   ALLOC_DEFAULT);
>>  	if (page)
>>  		set_page_refcounted(page);
>>  	return page;
>> @@ -7875,80 +7951,10 @@ static bool __free_unaccepted(struct page *page)
>>  
>>  struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned int order)
>>  {
>> -	/*
>> -	 * Do not specify __GFP_DIRECT_RECLAIM, since direct claim is not allowed.
>> -	 * Do not specify __GFP_KSWAPD_RECLAIM either, since wake up of kswapd
>> -	 * is not safe in arbitrary context.
>> -	 *
>> -	 * These two are the conditions for gfpflags_allow_spinning() being true.
>> -	 *
>> -	 * Specify __GFP_NOWARN since failing alloc_pages_nolock() is not a reason
>> -	 * to warn. Also warn would trigger printk() which is unsafe from
>> -	 * various contexts. We cannot use printk_deferred_enter() to mitigate,
>> -	 * since the running context is unknown.
>> -	 *
>> -	 * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
>> -	 * is safe in any context. Also zeroing the page is mandatory for
>> -	 * BPF use cases.
>> -	 *
>> -	 * Though __GFP_NOMEMALLOC is not checked in the code path below,
>> -	 * specify it here to highlight that alloc_pages_nolock()
>> -	 * doesn't want to deplete reserves.
>> -	 */
>> -	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC | __GFP_COMP
>> -			| gfp_flags;
>> -	unsigned int alloc_flags = ALLOC_NOLOCK;
>> -	struct alloc_context ac = { };
>> -	struct page *page;
>> -
>> -	VM_WARN_ON_ONCE(gfp_flags & ~__GFP_ACCOUNT);
>> -	/*
>> -	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
>> -	 * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
>> -	 * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
>> -	 * mark the task as the owner of another rt_spin_lock which will
>> -	 * confuse PI logic, so return immediately if called from hard IRQ or
>> -	 * NMI.
>> -	 *
>> -	 * Note, irqs_disabled() case is ok. This function can be called
>> -	 * from raw_spin_lock_irqsave region.
>> -	 */
>> -	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
>> -		return NULL;
>> -
>> -	/* On UP, spin_trylock() always succeeds even when it is locked */
>> -	if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
>> -		return NULL;
>> -
>> -	if (!pcp_allowed_order(order))
>> -		return NULL;
>> -
>> -	/* Bailout, since _deferred_grow_zone() needs to take a lock */
>> -	if (deferred_pages_enabled())
>> -		return NULL;
>> -
>>  	if (nid == NUMA_NO_NODE)
>>  		nid = numa_node_id();
>>  
>> -	prepare_alloc_pages(alloc_gfp, order, nid, NULL, &ac,
>> -			    &alloc_gfp, &alloc_flags);
>> -
>> -	/*
>> -	 * Best effort allocation from percpu free list.
>> -	 * If it's empty attempt to spin_trylock zone->lock.
>> -	 */
>> -	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
>> -
>> -	/* Unlike regular alloc_pages() there is no __alloc_pages_slowpath(). */
>> -
>> -	if (memcg_kmem_online() && page && (gfp_flags & __GFP_ACCOUNT) &&
>> -	    unlikely(__memcg_kmem_charge_page(page, alloc_gfp, order) != 0)) {
>> -		__free_frozen_pages(page, order, FPI_TRYLOCK);
>> -		page = NULL;
>> -	}
>> -	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
>> -	kmsan_alloc_page(page, order, alloc_gfp);
>> -	return page;
>> +	return __alloc_frozen_pages_noprof(gfp_flags, order, nid, NULL, ALLOC_NOLOCK);
>>  }
>>  /**
>>   * alloc_pages_nolock - opportunistic reentrant allocation from any context
>> diff --git a/mm/page_alloc.h b/mm/page_alloc.h
>> index 3250d44f96457..e16f905f859a7 100644
>> --- a/mm/page_alloc.h
>> +++ b/mm/page_alloc.h
>> @@ -11,6 +11,7 @@
>>  #include <linux/nodemask.h>
>>  #include <linux/types.h>
>>  
>> +#define ALLOC_DEFAULT		0
>>  /* The ALLOC_WMARK bits are used as an index to zone->watermark */
>>  #define ALLOC_WMARK_MIN		WMARK_MIN
>>  #define ALLOC_WMARK_LOW		WMARK_LOW
>> @@ -219,7 +220,7 @@ extern bool free_pages_prepare(struct page *page, unsigned int order);
>>  extern int user_min_free_kbytes;
>>  
>>  struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order, int nid,
>> -		nodemask_t *nodemask);
>> +		nodemask_t *nodemask, unsigned int alloc_flags);
>>  #define __alloc_frozen_pages(...) \
>>  	alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__))
>>  void free_frozen_pages(struct page *page, unsigned int order);
>> @@ -230,7 +231,8 @@ struct page *alloc_frozen_pages_noprof(gfp_t, unsigned int order);
>>  #else
>>  static inline struct page *alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order)
>>  {
>> -	return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL);
>> +	return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL,
>> +					   0 /* ALLOC_DEFAULT */);
>
> Can use ALLOC_DEFAULT now.

Thanks and ack to all of these.

Will mention the ALLOC_WMARK_MIN thing in the commit message too.

^ permalink raw reply	[flat|nested] 31+ messages in thread

[parent not found: <20260629-alloc-trylock-v3-6-57bef0eadbc2@google.com>]

* Re: [PATCH v3 06/16] mm/page_alloc: relax GFP WARN in nolock allocs
       [not found] ` <20260629-alloc-trylock-v3-6-57bef0eadbc2@google.com>
@ 2026-06-30 13:52   ` Harry Yoo
  2026-06-30 16:42   ` Vlastimil Babka (SUSE)
  1 sibling, 0 replies; 31+ messages in thread
From: Harry Yoo @ 2026-06-30 13:52 UTC (permalink / raw)
  To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
	Muchun Song, Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
	Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Gregory Price, Alexei Starovoitov, Matthew Wilcox, Hao Ge,
	linux-mm, linux-kernel, linux-rt-devel


On 6/29/26 10:11 PM, Brendan Jackman wrote:
> This WARN forbids setting other flags than __GFP_ACCOUNT but we
> unconditionally set the ones in gfp_nolock so they are certainly fine
> for the caller to set.
> 
> There are other GFP flags that are almost certainly fine to set here;
> Willy noted GFP_HIGHMEM, GFP_DMA, GFP_MOVABLE and GFP_HARDWALL. But,
> nolock allocation is rather special, so be conservative to try and
> ensure we have a chance to think carefully before nontrivial new
> usecases arise.
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Link: https://lore.kernel.org/linux-mm/ajS96fWbG4dzP3u3@casper.infradead.org/
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---

Acked-by: Harry Yoo (Oracle) <harry@kernel.org>

-- 
Cheers,
Harry / Hyeonggon

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 06/16] mm/page_alloc: relax GFP WARN in nolock allocs
       [not found] ` <20260629-alloc-trylock-v3-6-57bef0eadbc2@google.com>
  2026-06-30 13:52   ` [PATCH v3 06/16] mm/page_alloc: relax GFP WARN in nolock allocs Harry Yoo
@ 2026-06-30 16:42   ` Vlastimil Babka (SUSE)
  1 sibling, 0 replies; 31+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-30 16:42 UTC (permalink / raw)
  To: Brendan Jackman, Andrew Morton, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
	Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel

On 6/29/26 15:11, Brendan Jackman wrote:
> This WARN forbids setting other flags than __GFP_ACCOUNT but we
> unconditionally set the ones in gfp_nolock so they are certainly fine
> for the caller to set.
> 
> There are other GFP flags that are almost certainly fine to set here;
> Willy noted GFP_HIGHMEM, GFP_DMA, GFP_MOVABLE and GFP_HARDWALL. But,
> nolock allocation is rather special, so be conservative to try and
> ensure we have a chance to think carefully before nontrivial new
> usecases arise.
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Link: https://lore.kernel.org/linux-mm/ajS96fWbG4dzP3u3@casper.infradead.org/
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>

Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>

> ---
>  mm/page_alloc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8d409d075e3e9..9cb3f1665b41b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5355,7 +5355,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
>  		return NULL;
>  
>  	if (alloc_flags & ALLOC_NOLOCK) {
> -		VM_WARN_ON_ONCE(gfp & ~__GFP_ACCOUNT);
> +		/* Certain other flags could be supported later if needed. */
> +		VM_WARN_ON_ONCE(gfp & ~(__GFP_ACCOUNT | gfp_nolock));
>  		if (!alloc_trylock_allowed())
>  			return NULL;
>  		gfp |= gfp_nolock;
> 
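
The effect of the relaxed check can be illustrated with a small standalone
sketch. The TOY_GFP_* bit values and both helper functions are hypothetical
and exist only for this example; the real flags have different values. The
sketch compares the old mask, which accepts only the accounting flag, with
the new one, which also accepts the flags that get ORed in unconditionally.

```c
#include <stdbool.h>

typedef unsigned int toy_gfp;

/* Hypothetical bit values; the real GFP flags are defined differently. */
#define TOY_GFP_ACCOUNT		0x01u
#define TOY_GFP_NOWARN		0x02u
#define TOY_GFP_ZERO		0x04u
#define TOY_GFP_NOMEMALLOC	0x08u
#define TOY_GFP_COMP		0x10u
#define TOY_GFP_HIGHMEM		0x20u

/* Flags that the nolock path sets regardless of what the caller passed. */
static const toy_gfp gfp_nolock = TOY_GFP_NOWARN | TOY_GFP_ZERO |
				  TOY_GFP_NOMEMALLOC | TOY_GFP_COMP;

/* Old check: anything other than the accounting flag would warn. */
static bool nolock_gfp_ok_strict(toy_gfp gfp)
{
	return !(gfp & ~TOY_GFP_ACCOUNT);
}

/* Relaxed check: flags that are set anyway are harmless to pass in. */
static bool nolock_gfp_ok_relaxed(toy_gfp gfp)
{
	return !(gfp & ~(TOY_GFP_ACCOUNT | gfp_nolock));
}
```

A caller passing TOY_GFP_ZERO would trip the strict check but not the relaxed
one, while TOY_GFP_HIGHMEM is still rejected by both, which matches the
conservative stance in the commit message.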


^ permalink raw reply	[flat|nested] 31+ messages in thread

[parent not found: <20260629-alloc-trylock-v3-7-57bef0eadbc2@google.com>]

* Re: [PATCH v3 07/16] mm: move some stuff to mm/page_alloc.h
       [not found] ` <20260629-alloc-trylock-v3-7-57bef0eadbc2@google.com>
@ 2026-06-30 16:42   ` Vlastimil Babka (SUSE)
  0 siblings, 0 replies; 31+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-30 16:42 UTC (permalink / raw)
  To: Brendan Jackman, Andrew Morton, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
	Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel

On 6/29/26 15:11, Brendan Jackman wrote:
> Some of this stuff in the public header is only used internally so
> shrink the scope to avoid silently growing new users.
> 
> drain_local_pages() is still used from kernel/power/snapshot.c so that
> needs to stay behind.
> 
> Signed-off-by: Brendan Jackman <jackmanb@google.com>

Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>

> ---
>  include/linux/gfp.h | 26 --------------------------
>  mm/page_alloc.h     | 28 ++++++++++++++++++++++++++++
>  mm/vmstat.c         |  1 +
>  3 files changed, 29 insertions(+), 26 deletions(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index cdf95a9f0b87c..01d6d2591f49e 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -17,28 +17,6 @@ struct mempolicy;
>  #define __default_gfp(a,b,...) b
>  #define default_gfp(...) __default_gfp(,##__VA_ARGS__,GFP_KERNEL)
>  
> -/* Convert GFP flags to their corresponding migrate type */
> -#define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
> -#define GFP_MOVABLE_SHIFT 3
> -
> -static inline int gfp_migratetype(const gfp_t gfp_flags)
> -{
> -	VM_WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK);
> -	BUILD_BUG_ON((1UL << GFP_MOVABLE_SHIFT) != ___GFP_MOVABLE);
> -	BUILD_BUG_ON((___GFP_MOVABLE >> GFP_MOVABLE_SHIFT) != MIGRATE_MOVABLE);
> -	BUILD_BUG_ON((___GFP_RECLAIMABLE >> GFP_MOVABLE_SHIFT) != MIGRATE_RECLAIMABLE);
> -	BUILD_BUG_ON(((___GFP_MOVABLE | ___GFP_RECLAIMABLE) >>
> -		      GFP_MOVABLE_SHIFT) != MIGRATE_HIGHATOMIC);
> -
> -	if (unlikely(page_group_by_mobility_disabled))
> -		return MIGRATE_UNMOVABLE;
> -
> -	/* Group based on mobility */
> -	return (__force unsigned long)(gfp_flags & GFP_MOVABLE_MASK) >> GFP_MOVABLE_SHIFT;
> -}
> -#undef GFP_MOVABLE_MASK
> -#undef GFP_MOVABLE_SHIFT
> -
>  static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
>  {
>  	return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
> @@ -395,10 +373,6 @@ extern void free_pages(unsigned long addr, unsigned int order);
>  #define __free_page(page) __free_pages((page), 0)
>  #define free_page(addr) free_pages((addr), 0)
>  
> -void page_alloc_init_cpuhp(void);
> -bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp);
> -void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
> -void drain_all_pages(struct zone *zone);
>  void drain_local_pages(struct zone *zone);
>  
>  void page_alloc_init_late(void);
> diff --git a/mm/page_alloc.h b/mm/page_alloc.h
> index e16f905f859a7..af83764788b96 100644
> --- a/mm/page_alloc.h
> +++ b/mm/page_alloc.h
> @@ -266,6 +266,34 @@ static inline bool free_area_empty(struct free_area *area, int migratetype)
>  	return list_empty(&area->free_list[migratetype]);
>  }
>  
> +/* Convert GFP flags to their corresponding migrate type */
> +#define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
> +#define GFP_MOVABLE_SHIFT 3
> +
> +static inline int gfp_migratetype(const gfp_t gfp_flags)
> +{
> +	VM_WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK);
> +	BUILD_BUG_ON((1UL << GFP_MOVABLE_SHIFT) != ___GFP_MOVABLE);
> +	BUILD_BUG_ON((___GFP_MOVABLE >> GFP_MOVABLE_SHIFT) != MIGRATE_MOVABLE);
> +	BUILD_BUG_ON((___GFP_RECLAIMABLE >> GFP_MOVABLE_SHIFT) != MIGRATE_RECLAIMABLE);
> +	BUILD_BUG_ON(((___GFP_MOVABLE | ___GFP_RECLAIMABLE) >>
> +		      GFP_MOVABLE_SHIFT) != MIGRATE_HIGHATOMIC);
> +
> +	if (unlikely(page_group_by_mobility_disabled))
> +		return MIGRATE_UNMOVABLE;
> +
> +	/* Group based on mobility */
> +	return (__force unsigned long)(gfp_flags & GFP_MOVABLE_MASK) >> GFP_MOVABLE_SHIFT;
> +}
> +#undef GFP_MOVABLE_MASK
> +#undef GFP_MOVABLE_SHIFT
> +
> +bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp);
> +void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
> +void drain_all_pages(struct zone *zone);
> +void drain_local_pages(struct zone *zone);
> +
> +void page_alloc_init_cpuhp(void);
>  void page_alloc_sysctl_init(void);
>  
>  #endif /* __MM_PAGE_ALLOC_H */
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 7b93fbf9af092..3b5cb1031f720 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -30,6 +30,7 @@
>  #include <linux/sched/isolation.h>
>  
>  #include "internal.h"
> +#include "page_alloc.h"
>  
>  #ifdef CONFIG_PROC_FS
>  #ifdef CONFIG_NUMA
> 



end of thread, other threads:[~2026-06-30 18:48 UTC | newest]

Thread overview: 31+ messages
     [not found] <20260629-alloc-trylock-v3-0-57bef0eadbc2@google.com>
2026-06-29 14:00 ` [PATCH v3 00/16] mm: Some cleanups for page allocator APIs Mike Rapoport
2026-06-29 14:30   ` Brendan Jackman
2026-06-29 15:05     ` Brendan Jackman
     [not found] ` <20260629-alloc-trylock-v3-4-57bef0eadbc2@google.com>
2026-06-29 14:16   ` [PATCH v3 04/16] mm: Split out internal page_alloc.h sashiko-bot
2026-06-30 13:54   ` Vlastimil Babka (SUSE)
     [not found] ` <20260629-alloc-trylock-v3-11-57bef0eadbc2@google.com>
2026-06-29 15:04   ` [PATCH v3 11/16] sgi-xp: Use higher-level allocator API sashiko-bot
2026-06-29 18:47   ` Steve Wahl
     [not found] ` <20260629-alloc-trylock-v3-13-57bef0eadbc2@google.com>
2026-06-29 15:27   ` [PATCH v3 13/16] mm: Remove __alloc_pages_node() sashiko-bot
     [not found] ` <20260629-alloc-trylock-v3-9-57bef0eadbc2@google.com>
2026-06-29 15:31   ` -EXT-[PATCH v3 09/16] KVM: VMX: Use higher-level allocator API Soderlund, David
     [not found] ` <20260629-alloc-trylock-v3-16-57bef0eadbc2@google.com>
2026-06-29 16:02   ` [PATCH v3 16/16] mm: remove the __GFP_NO_OBJ_EXT flag sashiko-bot
2026-06-30 10:04     ` Brendan Jackman
     [not found] ` <20260629-alloc-trylock-v3-15-57bef0eadbc2@google.com>
2026-06-29 15:56   ` [PATCH v3 15/16] mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG sashiko-bot
2026-06-30  4:34     ` Hao Ge
2026-06-30  1:55   ` Hao Ge
2026-06-30 10:10     ` Brendan Jackman
2026-06-30 12:01     ` Brendan Jackman
     [not found] ` <20260629-alloc-trylock-v3-1-57bef0eadbc2@google.com>
2026-06-30 12:27   ` [PATCH v3 01/16] mm/page_alloc: rename ALLOC_TRYLOCK -> ALLOC_NOLOCK Vlastimil Babka (SUSE)
     [not found] ` <20260629-alloc-trylock-v3-2-57bef0eadbc2@google.com>
2026-06-30 12:38   ` [PATCH v3 02/16] mm/page_alloc: some renames to clarify alloc_flags scopes Vlastimil Babka (SUSE)
2026-06-30 17:25     ` Brendan Jackman
     [not found] ` <20260629-alloc-trylock-v3-3-57bef0eadbc2@google.com>
2026-06-30 12:43   ` [PATCH v3 03/16] mm: name some args in a function declaration Vlastimil Babka (SUSE)
     [not found] ` <20260629-alloc-trylock-v3-5-57bef0eadbc2@google.com>
2026-06-29 14:29   ` [PATCH v3 05/16] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof() sashiko-bot
2026-06-29 15:27     ` Brendan Jackman
2026-06-30 13:36   ` Harry Yoo
2026-06-30 15:34     ` Vlastimil Babka (SUSE)
2026-06-30 16:56       ` Brendan Jackman
2026-06-30 17:04     ` Brendan Jackman
2026-06-30 16:16   ` Vlastimil Babka (SUSE)
2026-06-30 18:47     ` Brendan Jackman
     [not found] ` <20260629-alloc-trylock-v3-6-57bef0eadbc2@google.com>
2026-06-30 13:52   ` [PATCH v3 06/16] mm/page_alloc: relax GFP WARN in nolock allocs Harry Yoo
2026-06-30 16:42   ` Vlastimil Babka (SUSE)
     [not found] ` <20260629-alloc-trylock-v3-7-57bef0eadbc2@google.com>
2026-06-30 16:42   ` [PATCH v3 07/16] mm: move some stuff to mm/page_alloc.h Vlastimil Babka (SUSE)
