[PATCH v4 00/18] mm: Some cleanups for page allocator APIs

* [PATCH v4 00/18] mm: Some cleanups for page allocator APIs
@ 2026-07-02  9:49 Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 01/18] mm/page_alloc: rename ALLOC_TRYLOCK -> ALLOC_NOLOCK Brendan Jackman
                   ` (17 more replies)
  0 siblings, 18 replies; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed, JP Kobryn, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, Sean Christopherson, Paolo Bonzini, kvm,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Robin Holt, Steve Wahl, Arnd Bergmann,
	Greg Kroah-Hartman, Dimitris Michailidis, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni

This is based on mm-new and depends on moving alloc_tag to mm/ [0].

[0] https://lore.kernel.org/all/aj5QBtJcphPElczI@lucifer/

Some tweaks and cleanups for page allocator entrypoints and flags. This
is motivated by preparation for __GFP_UNMAPPED [1] (which will probably
become ALLOC_UNMAPPED in its next iteration), but all this is supposed
to be an improvement to the codebase in its own right: unifying code
paths, reducing API surface, and removing GFP flags.

[1] https://lore.kernel.org/all/20260320-page_alloc-unmapped-v2-0-28bf1bd54f41@google.com/

This started with unifying __alloc_frozen_pages[_nolock]_noprof() and
expanded from there.

Unifying the nolock allocator entrypoint with the normal allocator
entrypoint means adding an alloc_flags argument to the latter (only
exposed within mm/). This presents an opportunity to take advantage of
that arg to remove some GFP flags, if we add that alloc_flags arg a bit
more broadly to allocator entrypoints.
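
As a rough userspace illustration of that idea (not the kernel code),
the sketch below funnels a normal and a nolock wrapper into one internal
entrypoint that takes alloc_flags. All demo_* names and the DEMO_*
macros are hypothetical stand-ins; only the ALLOC_DEFAULT and
ALLOC_NOLOCK names (and the 0x400 value) are taken from this series.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for ALLOC_DEFAULT / ALLOC_NOLOCK. */
#define DEMO_ALLOC_DEFAULT 0x0u
#define DEMO_ALLOC_NOLOCK  0x400u

/* Toy "page" that just records how it was requested. */
struct demo_page {
	unsigned int order;
	unsigned int alloc_flags;
};

/* Single internal entrypoint: behaviour is selected via alloc_flags. */
static struct demo_page *demo_alloc_internal(unsigned int order,
					     unsigned int alloc_flags)
{
	struct demo_page *page = malloc(sizeof(*page));

	if (!page)
		return NULL;
	page->order = order;
	page->alloc_flags = alloc_flags;
	return page;
}

/* Thin wrappers replace what used to be two separate code paths. */
static struct demo_page *demo_alloc(unsigned int order)
{
	return demo_alloc_internal(order, DEMO_ALLOC_DEFAULT);
}

static struct demo_page *demo_alloc_nolock(unsigned int order)
{
	return demo_alloc_internal(order, DEMO_ALLOC_NOLOCK);
}
```

The point of the shape is that any further internal-only behaviour can
be expressed as another alloc_flags bit rather than another entrypoint
or another GFP flag.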

To distinguish between mm-internal and "public" allocator entrypoints,
it makes sense to use the __ prefix. There are already some public APIs
with that prefix. For *alloc_pages*, just removing those variants seems
like a nice cleanup anyway, so do that. For get_free_pages, the "__"
variant is the _only_ variant and it's very widely used, so it doesn't
seem worthwhile to modify that. Therefore, scope this "__" change
specifically to the *alloc_pages* API, which means we leave the
*folio_alloc* API untouched too, even though that could probably be
cleaned up if so desired.

Tested:

- KVM, mm, and BPF selftests in a QEMU VM

- kunit.py on x86_64

- For the ALLOC_NO_CODETAG bits I just booted a VM and read
  /proc/allocinfo. I confirmed that if I remove ALLOC_NO_CODETAG, the
  kernel crashes in early boot, so I was at least booting code that
  depends on this logic.

I used Google's internal version of Antigravity (AI coding harness) to
do the repetitive bits; those commits are marked with Assisted-by. The
rest is manual.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
Changes in v4:
- Fixed some (harmless) missing applications of ac->alloc_flags (local
  Sashiko)
- Fixed various build issues.
   - Note that Sashiko pointed out a KMSAN build issue [2]. I have
     fixed it, but KMSAN builds are currently broken by objtool [3]. At least
     mm/kmsan/init.c compiles.
     [2] https://lore.kernel.org/all/20260629141642.628271F00A3D@smtp.kernel.org/
     [3] https://lore.kernel.org/all/20260630104434.GC751831@noisy.programming.kicks-ass.net/t/#u
- Avoided setting ALLOC_NOFRAGMENT under ALLOC_NOLOCK (Sashiko, Harry)
- Added patch to tweak alloc_flags_cma() interface (Vlastimil)
- More commit message fixups (various)
- Added patch to create can_spin_trylock() (Harry)
- Link to v3: https://patch.msgid.link/20260629-alloc-trylock-v3-0-57bef0eadbc2@google.com

Changes in v3:
- Created mm/page_alloc.h
- Fixed EXPORT_SYMBOL() issues
- Reworded commit messages per Sashiko's pointers
- Dropped rename of alloc_flags arg in prepare_alloc_pages() (Suren)
- Renamed gfp_to_alloc_flags_nonblocking() too after rebasing onto:
  https://lore.kernel.org/all/20260623004600.113347-1-jp.kobryn@linux.dev/
- Link to v2: https://patch.msgid.link/20260622-alloc-trylock-v2-0-31f31367d420@google.com

Changes in v2:
- Fixed up whitespace in nolock unification patch
- Introduced ALLOC_DEFAULT to replace literal 0 for alloc_flags
- All other patches are new
- Link to v1: https://patch.msgid.link/20260617-alloc-trylock-v1-1-83fd7858832e@google.com

---
Brendan Jackman (17):
      mm/page_alloc: rename ALLOC_TRYLOCK -> ALLOC_NOLOCK
      mm/page_alloc: some renames to clarify alloc_flags scopes
      mm: name some args in a function declaration
      mm: Split out internal page_alloc.h
      mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
      mm/page_alloc: relax GFP WARN in nolock allocs
      mm: move some stuff to mm/page_alloc.h
      perf/x86/intel: Use higher-level allocator API
      KVM: VMX: Use higher-level allocator API
      x86/virt: Use higher-level allocator API
      sgi-xp: Use higher-level allocator API
      net/funeth: Switch to higher-level allocator API
      mm: Remove __alloc_pages_node()
      mm: Move __alloc_pages() to mm/page_alloc.h
      mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG
      mm/page_alloc: drop alloc_flags arg from alloc_flags_cma()
      mm: factor out can_spin_trylock()

Vlastimil Babka (SUSE) (1):
      mm: remove the __GFP_NO_OBJ_EXT flag

 Documentation/admin-guide/cgroup-v1/cpusets.rst  |   2 +-
 Documentation/admin-guide/mm/transhuge.rst       |   2 +-
 MAINTAINERS                                      |   1 +
 arch/x86/events/intel/ds.c                       |   6 +-
 arch/x86/kvm/vmx/vmx.c                           |   2 +-
 arch/x86/virt/hw.c                               |   2 +-
 drivers/misc/sgi-xp/xpc_uv.c                     |   5 +-
 drivers/net/ethernet/fungible/funeth/funeth_rx.c |   2 +-
 include/linux/alloc_tag.h                        |   4 +-
 include/linux/gfp.h                              |  54 +---
 include/linux/gfp_types.h                        |   7 -
 include/linux/skbuff.h                           |   2 +-
 include/trace/events/mmflags.h                   |  10 +-
 mm/alloc_tag.c                                   |  23 +-
 mm/compaction.c                                  |   5 +-
 mm/hugetlb.c                                     |   4 +-
 mm/internal.h                                    | 276 ++------------------
 mm/khugepaged.c                                  |   1 +
 mm/kmsan/init.c                                  |   2 +-
 mm/memory-failure.c                              |   1 +
 mm/memory_hotplug.c                              |   1 +
 mm/mempolicy.c                                   |  11 +-
 mm/migrate.c                                     |   1 +
 mm/mm_init.c                                     |   1 +
 mm/page_alloc.c                                  | 269 ++++++++++---------
 mm/page_alloc.h                                  | 313 +++++++++++++++++++++++
 mm/page_frag_cache.c                             |   6 +-
 mm/page_isolation.c                              |   1 +
 mm/page_owner.c                                  |   2 +-
 mm/page_reporting.c                              |   1 +
 mm/show_mem.c                                    |   1 +
 mm/shuffle.c                                     |   1 +
 mm/slub.c                                        |  17 +-
 mm/swap.c                                        |   1 +
 mm/vmscan.c                                      |   1 +
 mm/vmstat.c                                      |   1 +
 tools/include/linux/gfp_types.h                  |   7 -
 37 files changed, 537 insertions(+), 509 deletions(-)
---
base-commit: 581a9d8f214b3f9d8777e66795ca86dbd9ca44a8
change-id: 20260617-alloc-trylock-14ad37dab337

Best regards,
--  
Brendan Jackman <jackmanb@google.com>


* [PATCH v4 01/18] mm/page_alloc: rename ALLOC_TRYLOCK -> ALLOC_NOLOCK
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 02/18] mm/page_alloc: some renames to clarify alloc_flags scopes Brendan Jackman
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed

It's confusing that the function is called "nolock" but the flag is
called "trylock", so align them.

The function's terminology is more visible and has more mindshare so use that.
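
For readers unfamiliar with the pattern behind the flag, here is a
minimal userspace sketch of "only trylock when the nolock flag is set",
loosely modelled on the rmqueue_bulk() hunk below. It uses a C11
atomic_flag as a toy spinlock instead of the kernel's zone->lock; the
demo_* names and DEMO_* macros are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_ALLOC_DEFAULT 0x0u
#define DEMO_ALLOC_NOLOCK  0x400u /* same bit as ALLOC_NOLOCK above */

/* Toy zone: a spinlock plus a counter of free pages. */
struct demo_zone {
	atomic_flag lock;
	int free_pages;
};

/* Returns true if the lock was acquired, false if it was already held. */
static bool demo_trylock(struct demo_zone *z)
{
	return !atomic_flag_test_and_set_explicit(&z->lock,
						  memory_order_acquire);
}

/* Spin until the lock is acquired. */
static void demo_lock(struct demo_zone *z)
{
	while (!demo_trylock(z))
		;
}

static void demo_unlock(struct demo_zone *z)
{
	atomic_flag_clear_explicit(&z->lock, memory_order_release);
}

/* Takes one page; returns 1 on success, 0 on failure or lock contention. */
static int demo_take_page(struct demo_zone *z, unsigned int alloc_flags)
{
	int taken = 0;

	if (alloc_flags & DEMO_ALLOC_NOLOCK) {
		/* Nolock callers must never spin: give up if contended. */
		if (!demo_trylock(z))
			return 0;
	} else {
		demo_lock(z);
	}

	if (z->free_pages > 0) {
		z->free_pages--;
		taken = 1;
	}
	demo_unlock(z);
	return taken;
}
```

So "nolock" describes the caller's contract (it never waits on a lock),
while "trylock" describes the mechanism used to honour it.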

Suggested-by: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
Link: https://lore.kernel.org/linux-mm/2399b3ad-4eac-4a14-94c3-27e9f07972a1@kernel.org/
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 mm/internal.h   |  2 +-
 mm/page_alloc.c | 10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index fa4fb69444ecd..a2b09a13735bf 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1480,7 +1480,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_NOFRAGMENT	  0x0
 #endif
 #define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
-#define ALLOC_TRYLOCK		0x400 /* Only use spin_trylock in allocation path */
+#define ALLOC_NOLOCK		0x400 /* Only use spin_trylock in allocation path */
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
 /* Flags that allow allocations below the min watermark. */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 762d9b6bc792f..6004fe6583d47 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2530,7 +2530,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 	unsigned long flags;
 	int i;
 
-	if (unlikely(alloc_flags & ALLOC_TRYLOCK)) {
+	if (unlikely(alloc_flags & ALLOC_NOLOCK)) {
 		if (!spin_trylock_irqsave(&zone->lock, flags))
 			return 0;
 	} else {
@@ -3218,7 +3218,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 
 	do {
 		page = NULL;
-		if (unlikely(alloc_flags & ALLOC_TRYLOCK)) {
+		if (unlikely(alloc_flags & ALLOC_NOLOCK)) {
 			if (!spin_trylock_irqsave(&zone->lock, flags))
 				return NULL;
 		} else {
@@ -5059,7 +5059,7 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 	 * Don't invoke should_fail logic, since it may call
 	 * get_random_u32() and printk() which need to spin_lock.
 	 */
-	if (!(*alloc_flags & ALLOC_TRYLOCK) &&
+	if (!(*alloc_flags & ALLOC_NOLOCK) &&
 	    should_fail_alloc_page(gfp_mask, order))
 		return false;
 
@@ -7804,7 +7804,7 @@ static bool cond_accept_memory(struct zone *zone, unsigned int order,
 		return false;
 
 	/* Bailout, since try_to_accept_memory_one() needs to take a lock */
-	if (alloc_flags & ALLOC_TRYLOCK)
+	if (alloc_flags & ALLOC_NOLOCK)
 		return false;
 
 	wmark = promo_wmark_pages(zone);
@@ -7896,7 +7896,7 @@ struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned
 	 */
 	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC | __GFP_COMP
 			| gfp_flags;
-	unsigned int alloc_flags = ALLOC_TRYLOCK;
+	unsigned int alloc_flags = ALLOC_NOLOCK;
 	struct alloc_context ac = { };
 	struct page *page;
 

-- 
2.54.0


* [PATCH v4 02/18] mm/page_alloc: some renames to clarify alloc_flags scopes
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 01/18] mm/page_alloc: rename ALLOC_TRYLOCK -> ALLOC_NOLOCK Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 03/18] mm: name some args in a function declaration Brendan Jackman
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed, JP Kobryn

It's pretty confusing that:

- The slowpath and fastpath have a totally distinct set of alloc_flags.

- gfp_to_alloc_flags() sounds generic but it only influences the
  slowpath.

Rename some variables to highlight which alloc_flags are
fastpath-specific. Rename gfp_to_alloc_flags() to highlight that it's
slowpath-specific.

gfp_to_alloc_flags_cma() and gfp_to_alloc_flags_nonblocking() currently
have perfectly harmless names, but to keep the naming consistent also
rename those to the alloc_flags_*() pattern (which already exists for
alloc_flags_nofragment()).
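
To make the fastpath/slowpath distinction concrete, here is a small
sketch of two separate flag computations, loosely following the
starting values visible in the diff below (ALLOC_WMARK_LOW for the
fastpath, ALLOC_WMARK_MIN | ALLOC_CPUSET for the slowpath). The DEMO_*
values and demo_* helpers are illustrative assumptions, not the
kernel's definitions.

```c
/* Illustrative flag values only; the real ones live in mm/internal.h. */
#define DEMO_WMARK_MIN 0x1u
#define DEMO_WMARK_LOW 0x2u
#define DEMO_CPUSET    0x40u
#define DEMO_KSWAPD    0x800u

/* Hypothetical stand-in for __GFP_KSWAPD_RECLAIM. */
#define DEMO_GFP_KSWAPD_RECLAIM 0x1u

/* Fastpath: starts from the low watermark and ignores most of gfp. */
static unsigned int demo_alloc_flags_fastpath(void)
{
	return DEMO_WMARK_LOW;
}

/* Slowpath: recomputed from scratch, it does not extend the fastpath set. */
static unsigned int demo_alloc_flags_slowpath(unsigned int gfp)
{
	unsigned int alloc_flags = DEMO_WMARK_MIN | DEMO_CPUSET;

	if (gfp & DEMO_GFP_KSWAPD_RECLAIM)
		alloc_flags |= DEMO_KSWAPD;
	return alloc_flags;
}
```

Because the slowpath value replaces rather than extends the fastpath
one, a generic-sounding name for either helper or variable is easy to
misread, which is what the renames address.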

Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Acked-by: JP Kobryn <jp.kobryn@linux.dev>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 include/linux/skbuff.h |  2 +-
 mm/page_alloc.c        | 28 ++++++++++++++--------------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 22eda1d54a0e8..4431b026e429d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3573,7 +3573,7 @@ static inline struct page *__dev_alloc_pages_noprof(gfp_t gfp_mask,
 	 * 3.  If requesting a order 0 page it will not be compound
 	 *     due to the check to see if order has a value in prep_new_page
 	 * 4.  __GFP_MEMALLOC is ignored if __GFP_NOMEMALLOC is set due to
-	 *     code in gfp_to_alloc_flags that should be enforcing this.
+	 *     code in alloc_flags_slowpath() that should be enforcing this.
 	 */
 	gfp_mask |= __GFP_COMP | __GFP_MEMALLOC;
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6004fe6583d47..df1345cde301f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3774,8 +3774,8 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
 }
 
 /* Must be called after current_gfp_context() which can change gfp_mask */
-static inline unsigned int gfp_to_alloc_flags_cma(gfp_t gfp_mask,
-						  unsigned int alloc_flags)
+static inline unsigned int alloc_flags_cma(gfp_t gfp_mask,
+					   unsigned int alloc_flags)
 {
 #ifdef CONFIG_CMA
 	if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
@@ -4474,7 +4474,7 @@ static void wake_all_kswapds(unsigned int order, gfp_t gfp_mask,
 }
 
 static inline unsigned int
-gfp_to_alloc_flags_nonblocking(gfp_t gfp_mask, unsigned int order)
+alloc_flags_nonblocking(gfp_t gfp_mask, unsigned int order)
 {
 	unsigned int alloc_flags = 0;
 
@@ -4497,7 +4497,7 @@ gfp_to_alloc_flags_nonblocking(gfp_t gfp_mask, unsigned int order)
 }
 
 static inline unsigned int
-gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
+alloc_flags_slowpath(gfp_t gfp_mask, unsigned int order)
 {
 	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
 
@@ -4512,7 +4512,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
 		alloc_flags |= ALLOC_KSWAPD;
 
-	alloc_flags |= gfp_to_alloc_flags_nonblocking(gfp_mask, order);
+	alloc_flags |= alloc_flags_nonblocking(gfp_mask, order);
 
 	if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
 		/*
@@ -4525,7 +4525,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 	} else if (unlikely(rt_or_dl_task(current)) && in_task())
 		alloc_flags |= ALLOC_MIN_RESERVE;
 
-	alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, alloc_flags);
+	alloc_flags = alloc_flags_cma(gfp_mask, alloc_flags);
 
 	if (defrag_mode)
 		alloc_flags |= ALLOC_NOFRAGMENT;
@@ -4791,7 +4791,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * kswapd needs to be woken up, and to avoid the cost of setting up
 	 * alloc_flags precisely. So we do that now.
 	 */
-	alloc_flags = gfp_to_alloc_flags(gfp_mask, order);
+	alloc_flags = alloc_flags_slowpath(gfp_mask, order);
 
 	/*
 	 * We need to recalculate the starting point for the zonelist iterator
@@ -4832,7 +4832,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 
 	reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
 	if (reserve_flags)
-		alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, reserve_flags) |
+		alloc_flags = alloc_flags_cma(gfp_mask, reserve_flags) |
 					  (alloc_flags & ALLOC_KSWAPD);
 
 	/*
@@ -5063,7 +5063,7 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 	    should_fail_alloc_page(gfp_mask, order))
 		return false;
 
-	*alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, *alloc_flags);
+	*alloc_flags = alloc_flags_cma(gfp_mask, *alloc_flags);
 
 	/* Dirty zone balancing only done in the fast path */
 	ac->spread_dirty_pages = (gfp_mask & __GFP_WRITE);
@@ -5277,7 +5277,7 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
 		int preferred_nid, nodemask_t *nodemask)
 {
 	struct page *page;
-	unsigned int alloc_flags = ALLOC_WMARK_LOW;
+	unsigned int fastpath_alloc_flags = ALLOC_WMARK_LOW;
 	gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
 	struct alloc_context ac = { };
 
@@ -5299,18 +5299,18 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
 	gfp = current_gfp_context(gfp);
 	alloc_gfp = gfp;
 	if (!prepare_alloc_pages(gfp, order, preferred_nid, nodemask, &ac,
-			&alloc_gfp, &alloc_flags))
+			&alloc_gfp, &fastpath_alloc_flags))
 		return NULL;
 
 	/*
 	 * Forbid the first pass from falling back to types that fragment
 	 * memory until all local zones are considered.
 	 */
-	alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
-	alloc_flags |= gfp_to_alloc_flags_nonblocking(gfp, order) & ALLOC_HIGHATOMIC;
+	fastpath_alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
+	fastpath_alloc_flags |= alloc_flags_nonblocking(gfp, order) & ALLOC_HIGHATOMIC;
 
 	/* First allocation attempt */
-	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
+	page = get_page_from_freelist(alloc_gfp, order, fastpath_alloc_flags, &ac);
 	if (likely(page))
 		goto out;
 

-- 
2.54.0


* [PATCH v4 03/18] mm: name some args in a function declaration
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 01/18] mm/page_alloc: rename ALLOC_TRYLOCK -> ALLOC_NOLOCK Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 02/18] mm/page_alloc: some renames to clarify alloc_flags scopes Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 04/18] mm: Split out internal page_alloc.h Brendan Jackman
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed

Checkpatch complains about the unnamed arguments in this declaration. A
later patch will move the code, so fix it now so that checkpatch doesn't
complain about that patch. Do it in a separate patch so the "move the
code" patch is trivial to review using Git's diff colouring.

Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 mm/internal.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index a2b09a13735bf..1e252678bbc91 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -919,8 +919,8 @@ extern bool free_pages_prepare(struct page *page, unsigned int order);
 
 extern int user_min_free_kbytes;
 
-struct page *__alloc_frozen_pages_noprof(gfp_t, unsigned int order, int nid,
-		nodemask_t *);
+struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order, int nid,
+		nodemask_t *nodemask);
 #define __alloc_frozen_pages(...) \
 	alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__))
 void free_frozen_pages(struct page *page, unsigned int order);

-- 
2.54.0


* [PATCH v4 04/18] mm: Split out internal page_alloc.h
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
                   ` (2 preceding siblings ...)
  2026-07-02  9:49 ` [PATCH v4 03/18] mm: name some args in a function declaration Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 05/18] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof() Brendan Jackman
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed

internal.h is a bit bloated; it seems like time for a page_alloc.h.

Where it wasn't obvious, the heuristic for deciding what goes into this
new header was "does it support/correspond to a definition in
mm/page_alloc.c?"

It only needs to be included from ~20 .c files out of ~150, so this does
seem like a genuine reduction in scope, which is nice. And there's no
circular internal.h<->page_alloc.h dependency, so it seems worthwhile to
split this up before that inevitably emerges!

Suggested-by: "David Hildenbrand (Arm)" <david@kernel.org>
Link: https://lore.kernel.org/all/41e92bab-6882-401a-8de9-154adbdcfb36@kernel.org/
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 MAINTAINERS          |   1 +
 mm/compaction.c      |   1 +
 mm/hugetlb.c         |   1 +
 mm/internal.h        | 252 -----------------------------------------------
 mm/khugepaged.c      |   1 +
 mm/kmsan/init.c      |   2 +-
 mm/memory-failure.c  |   1 +
 mm/memory_hotplug.c  |   1 +
 mm/mempolicy.c       |   1 +
 mm/migrate.c         |   1 +
 mm/mm_init.c         |   1 +
 mm/page_alloc.c      |   1 +
 mm/page_alloc.h      | 269 +++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/page_frag_cache.c |   2 +-
 mm/page_isolation.c  |   1 +
 mm/page_owner.c      |   2 +-
 mm/page_reporting.c  |   1 +
 mm/show_mem.c        |   1 +
 mm/shuffle.c         |   1 +
 mm/slub.c            |   1 +
 mm/swap.c            |   1 +
 mm/vmscan.c          |   1 +
 22 files changed, 289 insertions(+), 255 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2acb019516eb7..4852e069f4381 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17171,6 +17171,7 @@ F:	mm/debug_page_alloc.c
 F:	mm/debug_page_ref.c
 F:	mm/fail_page_alloc.c
 F:	mm/page_alloc.c
+F:	mm/page_alloc.h
 F:	mm/page_ext.c
 F:	mm/page_frag_cache.c
 F:	mm/page_isolation.c
diff --git a/mm/compaction.c b/mm/compaction.c
index f08765ade014c..7d80735502d9a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -24,6 +24,7 @@
 #include <linux/page_owner.h>
 #include <linux/psi.h>
 #include <linux/cpuset.h>
+#include "page_alloc.h"
 #include "internal.h"
 
 #ifdef CONFIG_COMPACTION
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 231de5012eba8..1d6203912d896 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -47,6 +47,7 @@
 #include <linux/node.h>
 #include <linux/page_owner.h>
 #include "internal.h"
+#include "page_alloc.h"
 #include "hugetlb_vmemmap.h"
 #include "hugetlb_cma.h"
 #include "hugetlb_internal.h"
diff --git a/mm/internal.h b/mm/internal.h
index 1e252678bbc91..7e3b2386e274b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -658,165 +658,6 @@ extern int defrag_mode;
 void setup_per_zone_wmarks(void);
 void calculate_min_free_kbytes(void);
 int __meminit init_per_zone_wmark_min(void);
-void page_alloc_sysctl_init(void);
-
-/*
- * Structure for holding the mostly immutable allocation parameters passed
- * between functions involved in allocations, including the alloc_pages*
- * family of functions.
- *
- * nodemask, migratetype and highest_zoneidx are initialized only once in
- * __alloc_pages() and then never change.
- *
- * zonelist, preferred_zone and highest_zoneidx are set first in
- * __alloc_pages() for the fast path, and might be later changed
- * in __alloc_pages_slowpath(). All other functions pass the whole structure
- * by a const pointer.
- */
-struct alloc_context {
-	struct zonelist *zonelist;
-	const nodemask_t *nodemask;
-	struct zoneref *preferred_zoneref;
-	int migratetype;
-
-	/*
-	 * highest_zoneidx represents highest usable zone index of
-	 * the allocation request. Due to the nature of the zone,
-	 * memory on lower zone than the highest_zoneidx will be
-	 * protected by lowmem_reserve[highest_zoneidx].
-	 *
-	 * highest_zoneidx is also used by reclaim/compaction to limit
-	 * the target zone since higher zone than this index cannot be
-	 * usable for this allocation request.
-	 */
-	enum zone_type highest_zoneidx;
-	bool spread_dirty_pages;
-};
-
-/*
- * This function returns the order of a free page in the buddy system. In
- * general, page_zone(page)->lock must be held by the caller to prevent the
- * page from being allocated in parallel and returning garbage as the order.
- * If a caller does not hold page_zone(page)->lock, it must guarantee that the
- * page cannot be allocated or merged in parallel. Alternatively, it must
- * handle invalid values gracefully, and use buddy_order_unsafe() below.
- */
-static inline unsigned int buddy_order(struct page *page)
-{
-	/* PageBuddy() must be checked by the caller */
-	return page_private(page);
-}
-
-/*
- * Like buddy_order(), but for callers who cannot afford to hold the zone lock.
- * PageBuddy() should be checked first by the caller to minimize race window,
- * and invalid values must be handled gracefully.
- *
- * READ_ONCE is used so that if the caller assigns the result into a local
- * variable and e.g. tests it for valid range before using, the compiler cannot
- * decide to remove the variable and inline the page_private(page) multiple
- * times, potentially observing different values in the tests and the actual
- * use of the result.
- */
-#define buddy_order_unsafe(page)	READ_ONCE(page_private(page))
-
-/*
- * This function checks whether a page is free && is the buddy
- * we can coalesce a page and its buddy if
- * (a) the buddy is not in a hole (check before calling!) &&
- * (b) the buddy is in the buddy system &&
- * (c) a page and its buddy have the same order &&
- * (d) a page and its buddy are in the same zone.
- *
- * For recording whether a page is in the buddy system, we set PageBuddy.
- * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
- *
- * For recording page's order, we use page_private(page).
- */
-static inline bool page_is_buddy(struct page *page, struct page *buddy,
-				 unsigned int order)
-{
-	if (!page_is_guard(buddy) && !PageBuddy(buddy))
-		return false;
-
-	if (buddy_order(buddy) != order)
-		return false;
-
-	/*
-	 * zone check is done late to avoid uselessly calculating
-	 * zone/node ids for pages that could never merge.
-	 */
-	if (page_zone_id(page) != page_zone_id(buddy))
-		return false;
-
-	VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
-
-	return true;
-}
-
-/*
- * Locate the struct page for both the matching buddy in our
- * pair (buddy1) and the combined O(n+1) page they form (page).
- *
- * 1) Any buddy B1 will have an order O twin B2 which satisfies
- * the following equation:
- *     B2 = B1 ^ (1 << O)
- * For example, if the starting buddy (buddy2) is #8 its order
- * 1 buddy is #10:
- *     B2 = 8 ^ (1 << 1) = 8 ^ 2 = 10
- *
- * 2) Any buddy B will have an order O+1 parent P which
- * satisfies the following equation:
- *     P = B & ~(1 << O)
- *
- * Assumption: *_mem_map is contiguous at least up to MAX_PAGE_ORDER
- */
-static inline unsigned long
-__find_buddy_pfn(unsigned long page_pfn, unsigned int order)
-{
-	return page_pfn ^ (1 << order);
-}
-
-/*
- * Find the buddy of @page and validate it.
- * @page: The input page
- * @pfn: The pfn of the page, it saves a call to page_to_pfn() when the
- *       function is used in the performance-critical __free_one_page().
- * @order: The order of the page
- * @buddy_pfn: The output pointer to the buddy pfn, it also saves a call to
- *             page_to_pfn().
- *
- * The found buddy can be a non PageBuddy, out of @page's zone, or its order is
- * not the same as @page. The validation is necessary before use it.
- *
- * Return: the found buddy page or NULL if not found.
- */
-static inline struct page *find_buddy_page_pfn(struct page *page,
-			unsigned long pfn, unsigned int order, unsigned long *buddy_pfn)
-{
-	unsigned long __buddy_pfn = __find_buddy_pfn(pfn, order);
-	struct page *buddy;
-
-	buddy = page + (__buddy_pfn - pfn);
-	if (buddy_pfn)
-		*buddy_pfn = __buddy_pfn;
-
-	if (page_is_buddy(page, buddy, order))
-		return buddy;
-	return NULL;
-}
-
-extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
-				unsigned long end_pfn, struct zone *zone);
-
-static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
-				unsigned long end_pfn, struct zone *zone)
-{
-	if (zone->contiguous)
-		return pfn_to_page(start_pfn);
-
-	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
-}
 
 void set_zone_contiguous(struct zone *zone);
 bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
@@ -831,8 +672,6 @@ extern int __isolate_free_page(struct page *page, unsigned int order);
 extern void __putback_isolated_page(struct page *page, unsigned int order,
 				    int mt);
 extern void memblock_free_pages(unsigned long pfn, unsigned int order);
-extern void __free_pages_core(struct page *page, unsigned int order,
-		enum meminit_context context);
 
 /*
  * This will have no effect, other than possibly generating a warning, if the
@@ -914,40 +753,6 @@ static inline void init_compound_tail(struct page *tail,
 	prep_compound_tail(tail, head, order);
 }
 
-void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags);
-extern bool free_pages_prepare(struct page *page, unsigned int order);
-
-extern int user_min_free_kbytes;
-
-struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order, int nid,
-		nodemask_t *nodemask);
-#define __alloc_frozen_pages(...) \
-	alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__))
-void free_frozen_pages(struct page *page, unsigned int order);
-void free_unref_folios(struct folio_batch *fbatch);
-
-#ifdef CONFIG_NUMA
-struct page *alloc_frozen_pages_noprof(gfp_t, unsigned int order);
-#else
-static inline struct page *alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order)
-{
-	return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL);
-}
-#endif
-
-#define alloc_frozen_pages(...) \
-	alloc_hooks(alloc_frozen_pages_noprof(__VA_ARGS__))
-
-struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned int order);
-#define alloc_frozen_pages_nolock(...) \
-	alloc_hooks(alloc_frozen_pages_nolock_noprof(__VA_ARGS__))
-void free_frozen_pages_nolock(struct page *page, unsigned int order);
-
-extern void zone_pcp_reset(struct zone *zone);
-extern void zone_pcp_disable(struct zone *zone);
-extern void zone_pcp_enable(struct zone *zone);
-extern void zone_pcp_init(struct zone *zone);
-
 extern void *memmap_alloc(phys_addr_t size, phys_addr_t align,
 			  phys_addr_t min_addr,
 			  int nid, bool exact_nid);
@@ -1101,23 +906,6 @@ static inline void init_cma_pageblock(struct page *page)
 }
 #endif
 
-enum fallback_result {
-	/* Found suitable migratetype, *mt_out is valid. */
-	FALLBACK_FOUND,
-	/* No fallback found in requested order. */
-	FALLBACK_EMPTY,
-	/* Passed @claimable, but claiming whole block is a bad idea. */
-	FALLBACK_NOCLAIM,
-};
-enum fallback_result
-find_suitable_fallback(struct free_area *area, unsigned int order,
-		       int migratetype, bool claimable, int *mt_out);
-
-static inline bool free_area_empty(struct free_area *area, int migratetype)
-{
-	return list_empty(&area->free_list[migratetype]);
-}
-
 /* mm/util.c */
 struct anon_vma *folio_anon_vma(const struct folio *folio);
 
@@ -1445,46 +1233,6 @@ extern unsigned long  __must_check vm_mmap_pgoff(struct file *, unsigned long,
 unsigned long reclaim_pages(struct list_head *folio_list);
 unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 					    struct list_head *folio_list);
-/* The ALLOC_WMARK bits are used as an index to zone->watermark */
-#define ALLOC_WMARK_MIN		WMARK_MIN
-#define ALLOC_WMARK_LOW		WMARK_LOW
-#define ALLOC_WMARK_HIGH	WMARK_HIGH
-#define ALLOC_NO_WATERMARKS	0x04 /* don't check watermarks at all */
-
-/* Mask to get the watermark bits */
-#define ALLOC_WMARK_MASK	(ALLOC_NO_WATERMARKS-1)
-
-/*
- * Only MMU archs have async oom victim reclaim - aka oom_reaper so we
- * cannot assume a reduced access to memory reserves is sufficient for
- * !MMU
- */
-#ifdef CONFIG_MMU
-#define ALLOC_OOM		0x08
-#else
-#define ALLOC_OOM		ALLOC_NO_WATERMARKS
-#endif
-
-#define ALLOC_NON_BLOCK		 0x10 /* Caller cannot block. Allow access
-				       * to 25% of the min watermark or
-				       * 62.5% if __GFP_HIGH is set.
-				       */
-#define ALLOC_MIN_RESERVE	 0x20 /* __GFP_HIGH set. Allow access to 50%
-				       * of the min watermark.
-				       */
-#define ALLOC_CPUSET		 0x40 /* check for correct cpuset */
-#define ALLOC_CMA		 0x80 /* allow allocations from CMA areas */
-#ifdef CONFIG_ZONE_DMA32
-#define ALLOC_NOFRAGMENT	0x100 /* avoid mixing pageblock types */
-#else
-#define ALLOC_NOFRAGMENT	  0x0
-#endif
-#define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
-#define ALLOC_NOLOCK		0x400 /* Only use spin_trylock in allocation path */
-#define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
-
-/* Flags that allow allocations below the min watermark. */
-#define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
 
 enum ttu_flags;
 struct tlbflush_unmap_batch;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 617bca76db49b..58e14d1543ecb 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -26,6 +26,7 @@
 
 #include <asm/tlb.h>
 #include "internal.h"
+#include "page_alloc.h"
 #include "mm_slot.h"
 
 enum scan_result {
diff --git a/mm/kmsan/init.c b/mm/kmsan/init.c
index b14ce3417e65e..4983b6e9f7c99 100644
--- a/mm/kmsan/init.c
+++ b/mm/kmsan/init.c
@@ -13,7 +13,7 @@
 #include <linux/mm.h>
 #include <linux/memblock.h>
 
-#include "../internal.h"
+#include "../page_alloc.h"
 
 #define NUM_FUTURE_RANGES 128
 struct start_end_pair {
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 4916ab1453257..bf717ec595087 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -66,6 +66,7 @@
 #include <trace/events/memory-failure.h>
 
 #include "swap.h"
+#include "page_alloc.h"
 #include "internal.h"
 
 static int sysctl_memory_failure_early_kill __read_mostly;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8b137328dcf01..11ab2f7bc7f3b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -40,6 +40,7 @@
 #include <asm/tlbflush.h>
 
 #include "internal.h"
+#include "page_alloc.h"
 #include "shuffle.h"
 
 enum {
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bba65898aee17..948264407dee3 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -119,6 +119,7 @@
 #include <linux/memory.h>
 
 #include "internal.h"
+#include "page_alloc.h"
 
 /* Internal flags */
 #define MPOL_MF_DISCONTIG_OK (MPOL_MF_INTERNAL << 0)	/* Skip checks for continuous vmas */
diff --git a/mm/migrate.c b/mm/migrate.c
index 7301e424f8d81..b24f900b06c84 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -49,6 +49,7 @@
 #include <trace/events/migrate.h>
 
 #include "internal.h"
+#include "page_alloc.h"
 #include "swap.h"
 
 static const struct movable_operations *offline_movable_ops;
diff --git a/mm/mm_init.c b/mm/mm_init.c
index b7ff4d6f397bd..86b5243692c39 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -33,6 +33,7 @@
 #include <linux/kexec_handover.h>
 #include <linux/hugetlb.h>
 #include "internal.h"
+#include "page_alloc.h"
 #include "slab.h"
 #include "shuffle.h"
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index df1345cde301f..85cee8a0031f2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -56,6 +56,7 @@
 #include <linux/pgalloc_tag.h>
 #include <asm/div64.h>
 #include "internal.h"
+#include "page_alloc.h"
 #include "shuffle.h"
 #include "page_reporting.h"
 
diff --git a/mm/page_alloc.h b/mm/page_alloc.h
new file mode 100644
index 0000000000000..3250d44f96457
--- /dev/null
+++ b/mm/page_alloc.h
@@ -0,0 +1,269 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * mm-internal API for the page (buddy) allocator. Public API lives in
+ * include/linux/gfp.h.
+ */
+#ifndef __MM_PAGE_ALLOC_H
+#define __MM_PAGE_ALLOC_H
+
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/nodemask.h>
+#include <linux/types.h>
+
+/* The ALLOC_WMARK bits are used as an index to zone->watermark */
+#define ALLOC_WMARK_MIN		WMARK_MIN
+#define ALLOC_WMARK_LOW		WMARK_LOW
+#define ALLOC_WMARK_HIGH	WMARK_HIGH
+#define ALLOC_NO_WATERMARKS	0x04 /* don't check watermarks at all */
+
+/* Mask to get the watermark bits */
+#define ALLOC_WMARK_MASK	(ALLOC_NO_WATERMARKS-1)
+
+/*
+ * Only MMU archs have async oom victim reclaim - aka oom_reaper so we
+ * cannot assume a reduced access to memory reserves is sufficient for
+ * !MMU
+ */
+#ifdef CONFIG_MMU
+#define ALLOC_OOM		0x08
+#else
+#define ALLOC_OOM		ALLOC_NO_WATERMARKS
+#endif
+
+#define ALLOC_NON_BLOCK		 0x10 /* Caller cannot block. Allow access
+				       * to 25% of the min watermark or
+				       * 62.5% if __GFP_HIGH is set.
+				       */
+#define ALLOC_MIN_RESERVE	 0x20 /* __GFP_HIGH set. Allow access to 50%
+				       * of the min watermark.
+				       */
+#define ALLOC_CPUSET		 0x40 /* check for correct cpuset */
+#define ALLOC_CMA		 0x80 /* allow allocations from CMA areas */
+#ifdef CONFIG_ZONE_DMA32
+#define ALLOC_NOFRAGMENT	0x100 /* avoid mixing pageblock types */
+#else
+#define ALLOC_NOFRAGMENT	  0x0
+#endif
+#define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
+#define ALLOC_NOLOCK		0x400 /* Only use spin_trylock in allocation path */
+#define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
+
+/* Flags that allow allocations below the min watermark. */
+#define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
+
+/*
+ * Structure for holding the mostly immutable allocation parameters passed
+ * between functions involved in allocations, including the alloc_pages*
+ * family of functions.
+ *
+ * nodemask, migratetype and highest_zoneidx are initialized only once in
+ * __alloc_pages() and then never change.
+ *
+ * zonelist, preferred_zone and highest_zoneidx are set first in
+ * __alloc_pages() for the fast path, and might be later changed
+ * in __alloc_pages_slowpath(). All other functions pass the whole structure
+ * by a const pointer.
+ */
+struct alloc_context {
+	struct zonelist *zonelist;
+	const nodemask_t *nodemask;
+	struct zoneref *preferred_zoneref;
+	int migratetype;
+
+	/*
+	 * highest_zoneidx represents highest usable zone index of
+	 * the allocation request. Due to the nature of the zone,
+	 * memory on lower zone than the highest_zoneidx will be
+	 * protected by lowmem_reserve[highest_zoneidx].
+	 *
+	 * highest_zoneidx is also used by reclaim/compaction to limit
+	 * the target zone since higher zone than this index cannot be
+	 * usable for this allocation request.
+	 */
+	enum zone_type highest_zoneidx;
+	bool spread_dirty_pages;
+};
+
+/*
+ * This function returns the order of a free page in the buddy system. In
+ * general, page_zone(page)->lock must be held by the caller to prevent the
+ * page from being allocated in parallel and returning garbage as the order.
+ * If a caller does not hold page_zone(page)->lock, it must guarantee that the
+ * page cannot be allocated or merged in parallel. Alternatively, it must
+ * handle invalid values gracefully, and use buddy_order_unsafe() below.
+ */
+static inline unsigned int buddy_order(struct page *page)
+{
+	/* PageBuddy() must be checked by the caller */
+	return page_private(page);
+}
+
+/*
+ * Like buddy_order(), but for callers who cannot afford to hold the zone lock.
+ * PageBuddy() should be checked first by the caller to minimize race window,
+ * and invalid values must be handled gracefully.
+ *
+ * READ_ONCE is used so that if the caller assigns the result into a local
+ * variable and e.g. tests it for valid range before using, the compiler cannot
+ * decide to remove the variable and inline the page_private(page) multiple
+ * times, potentially observing different values in the tests and the actual
+ * use of the result.
+ */
+#define buddy_order_unsafe(page)	READ_ONCE(page_private(page))
+
+/*
+ * This function checks whether a page is free && is the buddy
+ * we can coalesce a page and its buddy if
+ * (a) the buddy is not in a hole (check before calling!) &&
+ * (b) the buddy is in the buddy system &&
+ * (c) a page and its buddy have the same order &&
+ * (d) a page and its buddy are in the same zone.
+ *
+ * For recording whether a page is in the buddy system, we set PageBuddy.
+ * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
+ *
+ * For recording page's order, we use page_private(page).
+ */
+static inline bool page_is_buddy(struct page *page, struct page *buddy,
+				 unsigned int order)
+{
+	if (!page_is_guard(buddy) && !PageBuddy(buddy))
+		return false;
+
+	if (buddy_order(buddy) != order)
+		return false;
+
+	/*
+	 * zone check is done late to avoid uselessly calculating
+	 * zone/node ids for pages that could never merge.
+	 */
+	if (page_zone_id(page) != page_zone_id(buddy))
+		return false;
+
+	VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
+
+	return true;
+}
+
+/*
+ * Locate the struct page for both the matching buddy in our
+ * pair (buddy1) and the combined O(n+1) page they form (page).
+ *
+ * 1) Any buddy B1 will have an order O twin B2 which satisfies
+ * the following equation:
+ *     B2 = B1 ^ (1 << O)
+ * For example, if the starting buddy (buddy2) is #8 its order
+ * 1 buddy is #10:
+ *     B2 = 8 ^ (1 << 1) = 8 ^ 2 = 10
+ *
+ * 2) Any buddy B will have an order O+1 parent P which
+ * satisfies the following equation:
+ *     P = B & ~(1 << O)
+ *
+ * Assumption: *_mem_map is contiguous at least up to MAX_PAGE_ORDER
+ */
+static inline unsigned long
+__find_buddy_pfn(unsigned long page_pfn, unsigned int order)
+{
+	return page_pfn ^ (1 << order);
+}
+
+/*
+ * Find the buddy of @page and validate it.
+ * @page: The input page
+ * @pfn: The pfn of the page, it saves a call to page_to_pfn() when the
+ *       function is used in the performance-critical __free_one_page().
+ * @order: The order of the page
+ * @buddy_pfn: The output pointer to the buddy pfn, it also saves a call to
+ *             page_to_pfn().
+ *
+ * The found buddy can be a non PageBuddy, out of @page's zone, or its order is
+ * not the same as @page. The validation is necessary before use it.
+ *
+ * Return: the found buddy page or NULL if not found.
+ */
+static inline struct page *find_buddy_page_pfn(struct page *page,
+			unsigned long pfn, unsigned int order, unsigned long *buddy_pfn)
+{
+	unsigned long __buddy_pfn = __find_buddy_pfn(pfn, order);
+	struct page *buddy;
+
+	buddy = page + (__buddy_pfn - pfn);
+	if (buddy_pfn)
+		*buddy_pfn = __buddy_pfn;
+
+	if (page_is_buddy(page, buddy, order))
+		return buddy;
+	return NULL;
+}
+
+extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
+				unsigned long end_pfn, struct zone *zone);
+
+static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
+				unsigned long end_pfn, struct zone *zone)
+{
+	if (zone->contiguous)
+		return pfn_to_page(start_pfn);
+
+	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
+}
+
+extern void __free_pages_core(struct page *page, unsigned int order,
+		enum meminit_context context);
+
+void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags);
+extern bool free_pages_prepare(struct page *page, unsigned int order);
+
+extern int user_min_free_kbytes;
+
+struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order, int nid,
+		nodemask_t *nodemask);
+#define __alloc_frozen_pages(...) \
+	alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__))
+void free_frozen_pages(struct page *page, unsigned int order);
+void free_unref_folios(struct folio_batch *fbatch);
+
+#ifdef CONFIG_NUMA
+struct page *alloc_frozen_pages_noprof(gfp_t, unsigned int order);
+#else
+static inline struct page *alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order)
+{
+	return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL);
+}
+#endif
+
+#define alloc_frozen_pages(...) \
+	alloc_hooks(alloc_frozen_pages_noprof(__VA_ARGS__))
+
+struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned int order);
+#define alloc_frozen_pages_nolock(...) \
+	alloc_hooks(alloc_frozen_pages_nolock_noprof(__VA_ARGS__))
+void free_frozen_pages_nolock(struct page *page, unsigned int order);
+
+extern void zone_pcp_reset(struct zone *zone);
+extern void zone_pcp_disable(struct zone *zone);
+extern void zone_pcp_enable(struct zone *zone);
+extern void zone_pcp_init(struct zone *zone);
+
+enum fallback_result {
+	/* Found suitable migratetype, *mt_out is valid. */
+	FALLBACK_FOUND,
+	/* No fallback found in requested order. */
+	FALLBACK_EMPTY,
+	/* Passed @claimable, but claiming whole block is a bad idea. */
+	FALLBACK_NOCLAIM,
+};
+enum fallback_result
+find_suitable_fallback(struct free_area *area, unsigned int order,
+		       int migratetype, bool claimable, int *mt_out);
+
+static inline bool free_area_empty(struct free_area *area, int migratetype)
+{
+	return list_empty(&area->free_list[migratetype]);
+}
+
+void page_alloc_sysctl_init(void);
+
+#endif /* __MM_PAGE_ALLOC_H */
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index d2423f30577e4..a1077cef3a791 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -18,7 +18,7 @@
 #include <linux/init.h>
 #include <linux/mm.h>
 #include <linux/page_frag_cache.h>
-#include "internal.h"
+#include "page_alloc.h"
 
 static unsigned long encoded_page_create(struct page *page, unsigned int order,
 					 bool pfmemalloc)
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 32ce8a7d9df35..e5dfc7bf49446 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -11,6 +11,7 @@
 #include <linux/page_owner.h>
 #include <linux/migrate.h>
 #include "internal.h"
+#include "page_alloc.h"
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/page_isolation.h>
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 26d6ab6530ce0..e399ebed27234 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -13,7 +13,7 @@
 #include <linux/memcontrol.h>
 #include <linux/sched/clock.h>
 
-#include "internal.h"
+#include "page_alloc.h"
 
 /*
  * TODO: teach PAGE_OWNER_STACK_DEPTH (__dump_page_owner and save_stack)
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 7418f2e500bb4..c7325704c3202 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -8,6 +8,7 @@
 #include <linux/delay.h>
 #include <linux/scatterlist.h>
 
+#include "page_alloc.h"
 #include "page_reporting.h"
 #include "internal.h"
 
diff --git a/mm/show_mem.c b/mm/show_mem.c
index 1b721a8ade67d..d1288b4c2b640 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -16,6 +16,7 @@
 #include <linux/vmstat.h>
 
 #include "internal.h"
+#include "page_alloc.h"
 #include "swap.h"
 
 atomic_long_t _totalram_pages __read_mostly;
diff --git a/mm/shuffle.c b/mm/shuffle.c
index fb1393b8b3a9d..82a2c7725a08a 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -7,6 +7,7 @@
 #include <linux/random.h>
 #include <linux/moduleparam.h>
 #include "internal.h"
+#include "page_alloc.h"
 #include "shuffle.h"
 
 DEFINE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
diff --git a/mm/slub.c b/mm/slub.c
index 9ec774dc70096..877021e69cc41 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -53,6 +53,7 @@
 #include <trace/events/kmem.h>
 
 #include "internal.h"
+#include "page_alloc.h"
 
 /*
  * Lock order:
diff --git a/mm/swap.c b/mm/swap.c
index 58e4eff698cc4..d25131305c94c 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -39,6 +39,7 @@
 #include <linux/buffer_head.h>
 
 #include "internal.h"
+#include "page_alloc.h"
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/pagemap.h>
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f40cfe9d703b3..2407a80918a7d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -66,6 +66,7 @@
 #include <linux/sched/sysctl.h>
 
 #include "internal.h"
+#include "page_alloc.h"
 #include "swap.h"
 
 #define CREATE_TRACE_POINTS

-- 
2.54.0


* [PATCH v4 05/18] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
                   ` (3 preceding siblings ...)
  2026-07-02  9:49 ` [PATCH v4 04/18] mm: Split out internal page_alloc.h Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 06/18] mm/page_alloc: relax GFP WARN in nolock allocs Brendan Jackman
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed

Currently the core allocator code is controlled by ALLOC_NOLOCK, but the
nolock entry point function is significantly different from the normal
__alloc_frozen_pages_noprof(), which is tiring when reading the code.

Plumb the ALLOC_NOLOCK control one layer up in the call stack: add an
alloc_flags argument to __alloc_frozen_pages_noprof() (which is only
exposed to mm/) and then turn the nolock variant into a thin wrapper
that just sets that flag (as well as handling NUMA_NO_NODE, similar to
how some of the wrappers in gfp.h do).

For consistency, set ALLOC_WMARK_MIN explicitly in fastpath_alloc_flags
for the new ALLOC_NOLOCK path. This was already "done" silently in
alloc_frozen_pages_nolock_noprof(): ALLOC_WMARK_MIN is 0.
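
As a quick userspace illustration of why OR-ing in ALLOC_WMARK_MIN is a
no-op: the sketch below copies the relevant defines from this patch. The
enum is an assumption that mirrors the order of enum zone_watermarks in
include/linux/mmzone.h, and wmark_index() is a hypothetical helper that
does not exist in the kernel.

```c
#include <assert.h>

/* Assumed to mirror the start of enum zone_watermarks (WMARK_MIN == 0). */
enum { WMARK_MIN, WMARK_LOW, WMARK_HIGH };

/* Copied from mm/page_alloc.h as introduced by this series. */
#define ALLOC_WMARK_MIN		WMARK_MIN
#define ALLOC_WMARK_LOW		WMARK_LOW
#define ALLOC_WMARK_HIGH	WMARK_HIGH
#define ALLOC_NO_WATERMARKS	0x04
#define ALLOC_WMARK_MASK	(ALLOC_NO_WATERMARKS-1)
#define ALLOC_NOLOCK		0x400

/*
 * Hypothetical helper: extract the watermark index that a set of
 * alloc_flags selects. The low two bits are an index, not a bitmask.
 */
static unsigned int wmark_index(unsigned int alloc_flags)
{
	return alloc_flags & ALLOC_WMARK_MASK;
}

/* Hypothetical self-check: setting ALLOC_WMARK_MIN leaves the flags as-is. */
static void check_wmark_min_is_noop(void)
{
	assert((ALLOC_NOLOCK | ALLOC_WMARK_MIN) == ALLOC_NOLOCK);
	assert(wmark_index(ALLOC_NOLOCK) == WMARK_MIN);
}
```

So a flags word containing only ALLOC_NOLOCK already selects the min
watermark. Setting ALLOC_WMARK_MIN explicitly only documents the intent.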

Rationale that this doesn't change anything:

1. Simple bits: A bunch of the nolock-specific handling is just moved to
   the new alloc_order_allowed(), alloc_nolock_allowed() and
   gfp_nolock.

2. __alloc_frozen_pages_noprof() has some extra logic that wasn't
   previously in the nolock variant:

   a. Application of gfp_allowed_mask. This only affects early boot,
      only flags that affect the slowpath get changed here, and the
      nolock allocation path isn't allowed to use the GFP_BOOT_MASK
      flags.

   b. Application of current_gfp_context(). This also only affects the
      slowpath.

3. The slowpath itself: this is now just explicitly skipped under
   ALLOC_NOLOCK.
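
The resulting control flow can be modelled in userspace roughly as
follows. This is only a sketch of the shape of the unified entry point,
not the kernel code: struct model, model_alloc() and the fast_ok/slow_ok
knobs are hypothetical stand-ins for get_page_from_freelist() and
__alloc_pages_slowpath(), and the order, GFP and context checks are left
out.

```c
#include <stdbool.h>
#include <stddef.h>

#define ALLOC_DEFAULT	0
#define ALLOC_NOLOCK	0x400

/* Hypothetical knobs describing whether each path would find a page. */
struct model {
	bool fast_ok;
	bool slow_ok;
	int slow_calls;	/* how often the modelled slowpath was entered */
};

static int model_page;	/* stands in for a struct page */

static void *model_alloc(struct model *m, unsigned int alloc_flags)
{
	void *page;

	/* Only ALLOC_NOLOCK may be passed in by callers for now. */
	if (alloc_flags & ~ALLOC_NOLOCK)
		return NULL;

	/* First allocation attempt (or, for nolock, the only attempt). */
	page = m->fast_ok ? &model_page : NULL;
	if (page || (alloc_flags & ALLOC_NOLOCK))
		return page;

	/* Regular callers fall through to the slowpath. */
	m->slow_calls++;
	return m->slow_ok ? &model_page : NULL;
}
```

With fast_ok false and slow_ok true, a call with ALLOC_NOLOCK returns
NULL without touching the slowpath, while a call with ALLOC_DEFAULT
enters it and succeeds.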

Ulterior motive: adding an alloc_flags arg to the allocator's
mm-internal entrypoint can later be used to do more allocation
customisation without needing to create new GFP flags.

No functional change intended.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 mm/hugetlb.c    |   3 +-
 mm/mempolicy.c  |  10 +--
 mm/page_alloc.c | 192 +++++++++++++++++++++++++++++---------------------------
 mm/page_alloc.h |   6 +-
 mm/slub.c       |   6 +-
 5 files changed, 117 insertions(+), 100 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1d6203912d896..076036bd3dcdb 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1806,7 +1806,8 @@ static struct folio *alloc_buddy_frozen_folio(int order, gfp_t gfp_mask,
 	if (alloc_try_hard)
 		gfp_mask |= __GFP_RETRY_MAYFAIL;
 
-	folio = (struct folio *)__alloc_frozen_pages(gfp_mask, order, nid, nmask);
+	folio = (struct folio *)__alloc_frozen_pages(gfp_mask, order, nid, nmask,
+						     ALLOC_DEFAULT);
 
 	/*
 	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 948264407dee3..914f81863db5a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2417,9 +2417,11 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
 	 */
 	preferred_gfp = gfp | __GFP_NOWARN;
 	preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
-	page = __alloc_frozen_pages_noprof(preferred_gfp, order, nid, nodemask);
+	page = __alloc_frozen_pages_noprof(preferred_gfp, order, nid, nodemask,
+					   ALLOC_DEFAULT);
 	if (!page)
-		page = __alloc_frozen_pages_noprof(gfp, order, nid, NULL);
+		page = __alloc_frozen_pages_noprof(gfp, order, nid, NULL,
+						   ALLOC_DEFAULT);
 
 	return page;
 }
@@ -2467,7 +2469,7 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
 			 */
 			page = __alloc_frozen_pages_noprof(
 				gfp | __GFP_THISNODE | __GFP_NORETRY, order,
-				nid, NULL);
+				nid, NULL, ALLOC_DEFAULT);
 			if (page || !(gfp & __GFP_DIRECT_RECLAIM))
 				return page;
 			/*
@@ -2479,7 +2481,7 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
 		}
 	}
 
-	page = __alloc_frozen_pages_noprof(gfp, order, nid, nodemask);
+	page = __alloc_frozen_pages_noprof(gfp, order, nid, nodemask, ALLOC_DEFAULT);
 
 	if (unlikely(pol->mode == MPOL_INTERLEAVE ||
 		     pol->mode == MPOL_WEIGHTED_INTERLEAVE) && page) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 85cee8a0031f2..f47a848555077 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5222,7 +5222,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
 		}
 		nr_account++;
 
-		prep_new_page(page, 0, gfp, 0);
+		prep_new_page(page, 0, gfp, ALLOC_DEFAULT);
 		set_page_refcounted(page);
 		page_array[nr_populated++] = page;
 	}
@@ -5271,24 +5271,99 @@ void free_pages_bulk(struct page **page_array, unsigned long nr_pages)
 	}
 }
 
-/*
- * This is the 'heart' of the zoned buddy allocator.
- */
-struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
-		int preferred_nid, nodemask_t *nodemask)
+static inline bool alloc_order_allowed(gfp_t gfp, unsigned int order,
+				       unsigned int alloc_flags)
 {
-	struct page *page;
-	unsigned int fastpath_alloc_flags = ALLOC_WMARK_LOW;
-	gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
-	struct alloc_context ac = { };
+	if (alloc_flags & ALLOC_NOLOCK)
+		return pcp_allowed_order(order);
 
 	/*
 	 * There are several places where we assume that the order value is sane
 	 * so bail out early if the request is out of bound.
 	 */
-	if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
+	return !(WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp));
+}
+
+static inline bool alloc_nolock_allowed(void)
+{
+	/*
+	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
+	 * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
+	 * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
+	 * mark the task as the owner of another rt_spin_lock which will
+	 * confuse PI logic, so return immediately if called from hard IRQ or
+	 * NMI.
+	 *
+	 * Note, irqs_disabled() case is ok. This function can be called
+	 * from raw_spin_lock_irqsave region.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
+		return false;
+
+	/* On UP, spin_trylock() always succeeds even when it is locked */
+	if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
+		return false;
+
+	/* Bailout, since _deferred_grow_zone() needs to take a lock */
+	if (deferred_pages_enabled())
+		return false;
+
+	return true;
+}
+
+/*
+ * GFP flags to set for ALLOC_NOLOCK i.e. alloc_pages_nolock().
+ *
+ * Do not specify __GFP_DIRECT_RECLAIM, since direct reclaim is not allowed.
+ * Do not specify __GFP_KSWAPD_RECLAIM either, since wake up of kswapd
+ * is not safe in arbitrary context.
+ *
+ * These two are the conditions for gfpflags_allow_spinning() being true.
+ *
+ * Specify __GFP_NOWARN since failing alloc_pages_nolock() is not a reason
+ * to warn. Also warn would trigger printk() which is unsafe from
+ * various contexts. We cannot use printk_deferred_enter() to mitigate,
+ * since the running context is unknown.
+ *
+ * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
+ * is safe in any context. Also zeroing the page is mandatory for
+ * BPF use cases.
+ *
+ * Though __GFP_NOMEMALLOC is not checked in the code path below,
+ * specify it here to highlight that alloc_pages_nolock()
+ * doesn't want to deplete reserves.
+ */
+static const gfp_t gfp_nolock = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC |
+				__GFP_COMP;
+
+/*
+ * This is the 'heart' of the zoned buddy allocator.
+ */
+struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
+		int preferred_nid, nodemask_t *nodemask, unsigned int alloc_flags)
+{
+	struct page *page;
+	gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
+	struct alloc_context ac = { };
+	unsigned int fastpath_alloc_flags = alloc_flags;
+
+	/* Other flags could be supported later if needed. */
+	if (WARN_ON(alloc_flags & ~ALLOC_NOLOCK))
 		return NULL;
 
+	if (!alloc_order_allowed(gfp, order, alloc_flags))
+		return NULL;
+
+	if (alloc_flags & ALLOC_NOLOCK) {
+		VM_WARN_ON_ONCE(gfp & ~__GFP_ACCOUNT);
+		if (!alloc_nolock_allowed())
+			return NULL;
+		gfp |= gfp_nolock;
+		fastpath_alloc_flags |= ALLOC_WMARK_MIN;
+	} else {
+		fastpath_alloc_flags |= ALLOC_WMARK_LOW;
+	}
+
 	gfp &= gfp_allowed_mask;
 	/*
 	 * Apply scoped allocation constraints. This is mainly about GFP_NOFS
@@ -5303,16 +5378,19 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
 			&alloc_gfp, &fastpath_alloc_flags))
 		return NULL;
 
-	/*
-	 * Forbid the first pass from falling back to types that fragment
-	 * memory until all local zones are considered.
-	 */
-	fastpath_alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
+	if (!(alloc_flags & ALLOC_NOLOCK)) {
+		/*
+		 * Forbid the first pass from falling back to types that
+		 * fragment memory until all local zones are considered.
+		 */
+		fastpath_alloc_flags |= alloc_flags_nofragment(
+			zonelist_zone(ac.preferred_zoneref), gfp);
+	}
 	fastpath_alloc_flags |= alloc_flags_nonblocking(gfp, order) & ALLOC_HIGHATOMIC;
 
-	/* First allocation attempt */
+	/* First allocation attempt (or, for nolock, only attempt) */
 	page = get_page_from_freelist(alloc_gfp, order, fastpath_alloc_flags, &ac);
-	if (likely(page))
+	if (likely(page) || (alloc_flags & ALLOC_NOLOCK))
 		goto out;
 
 	alloc_gfp = gfp;
@@ -5329,7 +5407,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
 out:
 	if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
 	    unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
-		free_frozen_pages(page, order);
+		__free_frozen_pages(page, order,
+				    alloc_flags & ALLOC_NOLOCK ? FPI_TRYLOCK : 0);
 		page = NULL;
 	}
 
@@ -5345,7 +5424,8 @@ struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
 {
 	struct page *page;
 
-	page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask);
+	page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask,
+					   ALLOC_DEFAULT);
 	if (page)
 		set_page_refcounted(page);
 	return page;
@@ -7875,80 +7955,10 @@ static bool __free_unaccepted(struct page *page)
 
 struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned int order)
 {
-	/*
-	 * Do not specify __GFP_DIRECT_RECLAIM, since direct claim is not allowed.
-	 * Do not specify __GFP_KSWAPD_RECLAIM either, since wake up of kswapd
-	 * is not safe in arbitrary context.
-	 *
-	 * These two are the conditions for gfpflags_allow_spinning() being true.
-	 *
-	 * Specify __GFP_NOWARN since failing alloc_pages_nolock() is not a reason
-	 * to warn. Also warn would trigger printk() which is unsafe from
-	 * various contexts. We cannot use printk_deferred_enter() to mitigate,
-	 * since the running context is unknown.
-	 *
-	 * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
-	 * is safe in any context. Also zeroing the page is mandatory for
-	 * BPF use cases.
-	 *
-	 * Though __GFP_NOMEMALLOC is not checked in the code path below,
-	 * specify it here to highlight that alloc_pages_nolock()
-	 * doesn't want to deplete reserves.
-	 */
-	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC | __GFP_COMP
-			| gfp_flags;
-	unsigned int alloc_flags = ALLOC_NOLOCK;
-	struct alloc_context ac = { };
-	struct page *page;
-
-	VM_WARN_ON_ONCE(gfp_flags & ~__GFP_ACCOUNT);
-	/*
-	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
-	 * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
-	 * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
-	 * mark the task as the owner of another rt_spin_lock which will
-	 * confuse PI logic, so return immediately if called from hard IRQ or
-	 * NMI.
-	 *
-	 * Note, irqs_disabled() case is ok. This function can be called
-	 * from raw_spin_lock_irqsave region.
-	 */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
-		return NULL;
-
-	/* On UP, spin_trylock() always succeeds even when it is locked */
-	if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
-		return NULL;
-
-	if (!pcp_allowed_order(order))
-		return NULL;
-
-	/* Bailout, since _deferred_grow_zone() needs to take a lock */
-	if (deferred_pages_enabled())
-		return NULL;
-
 	if (nid == NUMA_NO_NODE)
 		nid = numa_node_id();
 
-	prepare_alloc_pages(alloc_gfp, order, nid, NULL, &ac,
-			    &alloc_gfp, &alloc_flags);
-
-	/*
-	 * Best effort allocation from percpu free list.
-	 * If it's empty attempt to spin_trylock zone->lock.
-	 */
-	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
-
-	/* Unlike regular alloc_pages() there is no __alloc_pages_slowpath(). */
-
-	if (memcg_kmem_online() && page && (gfp_flags & __GFP_ACCOUNT) &&
-	    unlikely(__memcg_kmem_charge_page(page, alloc_gfp, order) != 0)) {
-		__free_frozen_pages(page, order, FPI_TRYLOCK);
-		page = NULL;
-	}
-	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
-	kmsan_alloc_page(page, order, alloc_gfp);
-	return page;
+	return __alloc_frozen_pages_noprof(gfp_flags, order, nid, NULL, ALLOC_NOLOCK);
 }
 /**
  * alloc_pages_nolock - opportunistic reentrant allocation from any context
diff --git a/mm/page_alloc.h b/mm/page_alloc.h
index 3250d44f96457..a4f4b325381ad 100644
--- a/mm/page_alloc.h
+++ b/mm/page_alloc.h
@@ -11,6 +11,7 @@
 #include <linux/nodemask.h>
 #include <linux/types.h>
 
+#define ALLOC_DEFAULT		0
 /* The ALLOC_WMARK bits are used as an index to zone->watermark */
 #define ALLOC_WMARK_MIN		WMARK_MIN
 #define ALLOC_WMARK_LOW		WMARK_LOW
@@ -219,7 +220,7 @@ extern bool free_pages_prepare(struct page *page, unsigned int order);
 extern int user_min_free_kbytes;
 
 struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order, int nid,
-		nodemask_t *nodemask);
+		nodemask_t *nodemask, unsigned int alloc_flags);
 #define __alloc_frozen_pages(...) \
 	alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__))
 void free_frozen_pages(struct page *page, unsigned int order);
@@ -230,7 +231,8 @@ struct page *alloc_frozen_pages_noprof(gfp_t, unsigned int order);
 #else
 static inline struct page *alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order)
 {
-	return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL);
+	return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL,
+					   ALLOC_DEFAULT);
 }
 #endif
 
diff --git a/mm/slub.c b/mm/slub.c
index 877021e69cc41..3989b4758ae0a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3292,7 +3292,8 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
 	else if (node == NUMA_NO_NODE)
 		page = alloc_frozen_pages(flags, order);
 	else
-		page = __alloc_frozen_pages(flags, order, node, NULL);
+		page = __alloc_frozen_pages(flags, order, node, NULL,
+					    ALLOC_DEFAULT);
 
 	if (!page)
 		return NULL;
@@ -5302,7 +5303,8 @@ static void *___kmalloc_large_node(size_t size, gfp_t flags, int node)
 	if (node == NUMA_NO_NODE)
 		page = alloc_frozen_pages_noprof(flags, order);
 	else
-		page = __alloc_frozen_pages_noprof(flags, order, node, NULL);
+		page = __alloc_frozen_pages_noprof(flags, order, node, NULL,
+						   ALLOC_DEFAULT);
 
 	if (page) {
 		ptr = page_address(page);

-- 
2.54.0



* [PATCH v4 06/18] mm/page_alloc: relax GFP WARN in nolock allocs
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
                   ` (4 preceding siblings ...)
  2026-07-02  9:49 ` [PATCH v4 05/18] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof() Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 07/18] mm: move some stuff to mm/page_alloc.h Brendan Jackman
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed

This WARN forbids setting any flag other than __GFP_ACCOUNT, but we
unconditionally set the flags in gfp_nolock anyway, so those are
certainly fine for the caller to set.

There are other GFP flags that are almost certainly fine to set here;
Willy noted GFP_HIGHMEM, GFP_DMA, GFP_MOVABLE and GFP_HARDWALL. However,
nolock allocation is rather special, so stay conservative for now to
make sure we get a chance to think carefully before nontrivial new
use cases arise.
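
As a rough illustration of the mask change (not the kernel code itself),
the sketch below compares the old and new checks. All DEMO_* names and
bit values are hypothetical stand-ins, and the contents of
DEMO_GFP_NOLOCK are an assumption based on the flags the old nolock
entry point ORed in unconditionally.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for illustration only; the real flag values are
 * defined by the kernel and differ from these.
 */
typedef unsigned int demo_gfp_t;
#define DEMO_GFP_ACCOUNT	0x01u
#define DEMO_GFP_NOWARN		0x02u
#define DEMO_GFP_ZERO		0x04u
#define DEMO_GFP_NOMEMALLOC	0x08u
#define DEMO_GFP_COMP		0x10u
#define DEMO_GFP_HIGHMEM	0x20u

/* Assumption: the set of flags the allocator ORs in for nolock callers. */
#define DEMO_GFP_NOLOCK \
	(DEMO_GFP_NOWARN | DEMO_GFP_ZERO | DEMO_GFP_NOMEMALLOC | DEMO_GFP_COMP)

/* Old check: anything besides the accounting flag trips the warning. */
static bool demo_old_check_warns(demo_gfp_t gfp)
{
	return (gfp & ~DEMO_GFP_ACCOUNT) != 0;
}

/* New check: flags that get set anyway no longer trip the warning. */
static bool demo_new_check_warns(demo_gfp_t gfp)
{
	return (gfp & ~(DEMO_GFP_ACCOUNT | DEMO_GFP_NOLOCK)) != 0;
}

/* A few representative cases, verified with assert(). */
void demo_check_masks(void)
{
	assert(demo_old_check_warns(DEMO_GFP_ZERO));
	assert(!demo_new_check_warns(DEMO_GFP_ZERO));
	assert(!demo_new_check_warns(DEMO_GFP_ACCOUNT | DEMO_GFP_NOWARN));
	assert(demo_new_check_warns(DEMO_GFP_HIGHMEM));
}
```

Calling demo_check_masks() from a test harness exercises both checks. A
flag outside both sets (DEMO_GFP_HIGHMEM here) still warns, which is the
conservative behaviour described above.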

Suggested-by: Matthew Wilcox <willy@infradead.org>
Link: https://lore.kernel.org/linux-mm/ajS96fWbG4dzP3u3@casper.infradead.org/
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Acked-by: Harry Yoo (Oracle) <harry@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f47a848555077..c2839959d7908 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5355,7 +5355,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
 		return NULL;
 
 	if (alloc_flags & ALLOC_NOLOCK) {
-		VM_WARN_ON_ONCE(gfp & ~__GFP_ACCOUNT);
+		/* Certain other flags could be supported later if needed. */
+		VM_WARN_ON_ONCE(gfp & ~(__GFP_ACCOUNT | gfp_nolock));
 		if (!alloc_nolock_allowed())
 			return NULL;
 		gfp |= gfp_nolock;

-- 
2.54.0



* [PATCH v4 07/18] mm: move some stuff to mm/page_alloc.h
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
                   ` (5 preceding siblings ...)
  2026-07-02  9:49 ` [PATCH v4 06/18] mm/page_alloc: relax GFP WARN in nolock allocs Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02 10:28   ` sashiko-bot
  2026-07-02  9:49 ` [PATCH v4 08/18] perf/x86/intel: Use higher-level allocator API Brendan Jackman
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed

Some of the declarations in the public header include/linux/gfp.h are
only used inside mm/, so move them to mm/page_alloc.h to avoid silently
growing new users.

drain_local_pages() is still used from kernel/power/snapshot.c so its
declaration needs to stay in the public header.

Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 include/linux/gfp.h | 26 --------------------------
 mm/page_alloc.h     | 28 ++++++++++++++++++++++++++++
 mm/vmstat.c         |  1 +
 3 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index cdf95a9f0b87c..01d6d2591f49e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -17,28 +17,6 @@ struct mempolicy;
 #define __default_gfp(a,b,...) b
 #define default_gfp(...) __default_gfp(,##__VA_ARGS__,GFP_KERNEL)
 
-/* Convert GFP flags to their corresponding migrate type */
-#define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
-#define GFP_MOVABLE_SHIFT 3
-
-static inline int gfp_migratetype(const gfp_t gfp_flags)
-{
-	VM_WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK);
-	BUILD_BUG_ON((1UL << GFP_MOVABLE_SHIFT) != ___GFP_MOVABLE);
-	BUILD_BUG_ON((___GFP_MOVABLE >> GFP_MOVABLE_SHIFT) != MIGRATE_MOVABLE);
-	BUILD_BUG_ON((___GFP_RECLAIMABLE >> GFP_MOVABLE_SHIFT) != MIGRATE_RECLAIMABLE);
-	BUILD_BUG_ON(((___GFP_MOVABLE | ___GFP_RECLAIMABLE) >>
-		      GFP_MOVABLE_SHIFT) != MIGRATE_HIGHATOMIC);
-
-	if (unlikely(page_group_by_mobility_disabled))
-		return MIGRATE_UNMOVABLE;
-
-	/* Group based on mobility */
-	return (__force unsigned long)(gfp_flags & GFP_MOVABLE_MASK) >> GFP_MOVABLE_SHIFT;
-}
-#undef GFP_MOVABLE_MASK
-#undef GFP_MOVABLE_SHIFT
-
 static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
 {
 	return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
@@ -395,10 +373,6 @@ extern void free_pages(unsigned long addr, unsigned int order);
 #define __free_page(page) __free_pages((page), 0)
 #define free_page(addr) free_pages((addr), 0)
 
-void page_alloc_init_cpuhp(void);
-bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp);
-void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
-void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);
 
 void page_alloc_init_late(void);
diff --git a/mm/page_alloc.h b/mm/page_alloc.h
index a4f4b325381ad..6bb50bae3ba27 100644
--- a/mm/page_alloc.h
+++ b/mm/page_alloc.h
@@ -266,6 +266,34 @@ static inline bool free_area_empty(struct free_area *area, int migratetype)
 	return list_empty(&area->free_list[migratetype]);
 }
 
+/* Convert GFP flags to their corresponding migrate type */
+#define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
+#define GFP_MOVABLE_SHIFT 3
+
+static inline int gfp_migratetype(const gfp_t gfp_flags)
+{
+	VM_WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK);
+	BUILD_BUG_ON((1UL << GFP_MOVABLE_SHIFT) != ___GFP_MOVABLE);
+	BUILD_BUG_ON((___GFP_MOVABLE >> GFP_MOVABLE_SHIFT) != MIGRATE_MOVABLE);
+	BUILD_BUG_ON((___GFP_RECLAIMABLE >> GFP_MOVABLE_SHIFT) != MIGRATE_RECLAIMABLE);
+	BUILD_BUG_ON(((___GFP_MOVABLE | ___GFP_RECLAIMABLE) >>
+		      GFP_MOVABLE_SHIFT) != MIGRATE_HIGHATOMIC);
+
+	if (unlikely(page_group_by_mobility_disabled))
+		return MIGRATE_UNMOVABLE;
+
+	/* Group based on mobility */
+	return (__force unsigned long)(gfp_flags & GFP_MOVABLE_MASK) >> GFP_MOVABLE_SHIFT;
+}
+#undef GFP_MOVABLE_MASK
+#undef GFP_MOVABLE_SHIFT
+
+bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp);
+void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
+void drain_all_pages(struct zone *zone);
+void drain_local_pages(struct zone *zone);
+
+void page_alloc_init_cpuhp(void);
 void page_alloc_sysctl_init(void);
 
 #endif /* __MM_PAGE_ALLOC_H */
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 7b93fbf9af092..3b5cb1031f720 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -30,6 +30,7 @@
 #include <linux/sched/isolation.h>
 
 #include "internal.h"
+#include "page_alloc.h"
 
 #ifdef CONFIG_PROC_FS
 #ifdef CONFIG_NUMA

-- 
2.54.0



* [PATCH v4 08/18] perf/x86/intel: Use higher-level allocator API
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
                   ` (6 preceding siblings ...)
  2026-07-02  9:49 ` [PATCH v4 07/18] mm: move some stuff to mm/page_alloc.h Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 09/18] KVM: VMX: " Brendan Jackman
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark

The difference between __alloc_pages_node() and alloc_pages_node() is
that the latter allows you to pass NUMA_NO_NODE.

The former is going away and the latter works fine here so switch over.

No functional change intended.
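
A minimal userspace sketch of the difference between the two variants
follows. It is not the kernel implementation: the demo_* names,
DEMO_MAX_NUMNODES and the fixed local node are hypothetical, and only
the NUMA_NO_NODE value (-1) matches the kernel.

```c
#include <assert.h>

#define NUMA_NO_NODE		(-1)	/* same value the kernel uses */
#define DEMO_MAX_NUMNODES	4	/* hypothetical node count */

/* Hypothetical stand-in for numa_mem_id(): the caller's local node. */
static int demo_local_node = 1;

/* Strict variant: the caller must already hold a valid node id. */
int demo_strict_nid(int nid)
{
	assert(nid >= 0 && nid < DEMO_MAX_NUMNODES);
	return nid;
}

/* Lenient variant: NUMA_NO_NODE means "use the local node". */
int demo_lenient_nid(int nid)
{
	if (nid == NUMA_NO_NODE)
		nid = demo_local_node;
	return demo_strict_nid(nid);
}
```

For any valid node id both helpers resolve to the same node, which is
why switching a caller that already passes a valid node is not expected
to change behaviour.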

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: James Clark <james.clark@linaro.org>
Assisted-by: Gemini:unknown-version
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 arch/x86/events/intel/ds.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 91a093d8cf2e7..70be80211d823 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -832,7 +832,7 @@ static void *dsalloc_pages(size_t size, gfp_t flags, int cpu)
 	int node = cpu_to_node(cpu);
 	struct page *page;
 
-	page = __alloc_pages_node(node, flags | __GFP_ZERO, order);
+	page = alloc_pages_node(node, flags | __GFP_ZERO, order);
 	return page ? page_address(page) : NULL;
 }
 
@@ -1088,9 +1088,9 @@ void init_arch_pebs_on_cpu(int cpu)
 
 	/*
 	 * 4KB-aligned pointer of the output buffer
-	 * (__alloc_pages_node() return page aligned address)
+	 * (alloc_pages_node() returns page aligned address)
 	 * Buffer Size = 4KB * 2^SIZE
-	 * contiguous physical buffer (__alloc_pages_node() with order)
+	 * contiguous physical buffer (alloc_pages_node() with order)
 	 */
 	arch_pebs_base = virt_to_phys(cpuc->pebs_vaddr) | PEBS_BUFFER_SHIFT;
 	wrmsrq_on_cpu(cpu, MSR_IA32_PEBS_BASE, arch_pebs_base);

-- 
2.54.0



* [PATCH v4 09/18] KVM: VMX: Use higher-level allocator API
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
                   ` (7 preceding siblings ...)
  2026-07-02  9:49 ` [PATCH v4 08/18] perf/x86/intel: Use higher-level allocator API Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 10/18] x86/virt: " Brendan Jackman
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed, Sean Christopherson, Paolo Bonzini, kvm

The difference between __alloc_pages_node() and alloc_pages_node() is
that the latter allows you to pass NUMA_NO_NODE.

The former is going away and the latter works fine here so switch over.

No functional change intended.

Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Assisted-by: Gemini:unknown-version
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2325be57d3d75..ad6a7fc6a54da 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3028,7 +3028,7 @@ struct vmcs *alloc_vmcs_cpu(bool shadow, int cpu, gfp_t flags)
 	struct page *pages;
 	struct vmcs *vmcs;
 
-	pages = __alloc_pages_node(node, flags, 0);
+	pages = alloc_pages_node(node, flags, 0);
 	if (!pages)
 		return NULL;
 	vmcs = page_address(pages);

-- 
2.54.0



* [PATCH v4 10/18] x86/virt: Use higher-level allocator API
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
                   ` (8 preceding siblings ...)
  2026-07-02  9:49 ` [PATCH v4 09/18] KVM: VMX: " Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 11/18] sgi-xp: " Brendan Jackman
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin

The difference between __alloc_pages_node() and alloc_pages_node() is
that the latter allows you to pass NUMA_NO_NODE.

The former is going away and the latter works fine here so switch over.

No functional change intended.

Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Assisted-by: Gemini:unknown-version
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 arch/x86/virt/hw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c
index 7e9091c640be0..a236447ac7a26 100644
--- a/arch/x86/virt/hw.c
+++ b/arch/x86/virt/hw.c
@@ -196,7 +196,7 @@ static __init int __x86_vmx_init(void)
 		struct page *page;
 		struct vmcs *vmcs;
 
-		page = __alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
+		page = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
 		if (WARN_ON_ONCE(!page)) {
 			x86_vmx_exit();
 			return -ENOMEM;

-- 
2.54.0



* [PATCH v4 11/18] sgi-xp: Use higher-level allocator API
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
                   ` (9 preceding siblings ...)
  2026-07-02  9:49 ` [PATCH v4 10/18] x86/virt: " Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02 10:54   ` sashiko-bot
  2026-07-02  9:49 ` [PATCH v4 12/18] net/funeth: Switch to " Brendan Jackman
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed, Robin Holt, Steve Wahl, Arnd Bergmann,
	Greg Kroah-Hartman

The difference between __alloc_pages_node() and alloc_pages_node() is
that the latter allows you to pass NUMA_NO_NODE.

The former is going away and the latter works fine here so switch over.

No functional change intended.

Cc: Robin Holt <robinmholt@gmail.com>
Cc: Steve Wahl <steve.wahl@hpe.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Assisted-by: Gemini:unknown-model
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 drivers/misc/sgi-xp/xpc_uv.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/sgi-xp/xpc_uv.c b/drivers/misc/sgi-xp/xpc_uv.c
index 772c787268932..79c2f00ed4d70 100644
--- a/drivers/misc/sgi-xp/xpc_uv.c
+++ b/drivers/misc/sgi-xp/xpc_uv.c
@@ -170,9 +170,8 @@ xpc_create_gru_mq_uv(unsigned int mq_size, int cpu, char *irq_name,
 	mq->mmr_blade = uv_cpu_to_blade_id(cpu);
 
 	nid = cpu_to_node(cpu);
-	page = __alloc_pages_node(nid,
-				      GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
-				      pg_order);
+	page = alloc_pages_node(nid, GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
+				pg_order);
 	if (page == NULL) {
 		dev_err(xpc_part, "xpc_create_gru_mq_uv() failed to alloc %d "
 			"bytes of memory on nid=%d for GRU mq\n", mq_size, nid);

-- 
2.54.0



* [PATCH v4 12/18] net/funeth: Switch to higher-level allocator API
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
                   ` (10 preceding siblings ...)
  2026-07-02  9:49 ` [PATCH v4 11/18] sgi-xp: " Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 13/18] mm: Remove __alloc_pages_node() Brendan Jackman
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed, Dimitris Michailidis, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni

The difference between __alloc_pages_node() and alloc_pages_node() is
that the latter allows you to pass NUMA_NO_NODE.

The former is going away and the latter works fine here so switch over.

No functional change intended.

Cc: Dimitris Michailidis <dmichail@fungible.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Assisted-by: Gemini:unknown-version
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 drivers/net/ethernet/fungible/funeth/funeth_rx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/fungible/funeth/funeth_rx.c b/drivers/net/ethernet/fungible/funeth/funeth_rx.c
index 7e2584895de39..d7000017ac2bd 100644
--- a/drivers/net/ethernet/fungible/funeth/funeth_rx.c
+++ b/drivers/net/ethernet/fungible/funeth/funeth_rx.c
@@ -103,7 +103,7 @@ static int funeth_alloc_page(struct funeth_rxq *q, struct funeth_rxbuf *rb,
 	if (cache_get(q, rb))
 		return 0;
 
-	p = __alloc_pages_node(node, gfp | __GFP_NOWARN, 0);
+	p = alloc_pages_node(node, gfp | __GFP_NOWARN, 0);
 	if (unlikely(!p))
 		return -ENOMEM;
 

-- 
2.54.0



* [PATCH v4 13/18] mm: Remove __alloc_pages_node()
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
                   ` (11 preceding siblings ...)
  2026-07-02  9:49 ` [PATCH v4 12/18] net/funeth: Switch to " Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02 11:11   ` sashiko-bot
  2026-07-02  9:49 ` [PATCH v4 14/18] mm: Move __alloc_pages() to mm/page_alloc.h Brendan Jackman
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed

There were only a few users, which have been removed. The only advantage
of this API over alloc_pages_node() is avoiding a single conditional
branch. The disadvantages are:

1. More API surface, more sources of confusion, more maintenance.

2. Worse impact of CPU hotplug bugs: most users of __alloc_pages_node()
   were passing the result of cpu_to_node(), which returns NUMA_NO_NODE
   if the CPU has been hotplugged out. If one of these paths fails to
   protect against a concurrent hotplug, page_alloc.c will use
   NUMA_NO_NODE as an index into NODE_DATA() and cause memory corruption
   or some similarly horrible failure. With alloc_pages_node(), the code
   might just work fine.

Ulterior motive: this frees up the __* variants of the allocator APIs to
serve specifically for use as mm-internal API.
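
The hotplug hazard in item 2 can be sketched in userspace as below. This
is only an illustration under stated assumptions: the demo_* names, the
CPU-to-node table and the node array are all hypothetical, and the
strict lookup returns NULL where unchecked kernel code would index out
of bounds.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE	(-1)	/* same value the kernel uses */
#define DEMO_NR_NODES	2	/* hypothetical */
#define DEMO_NR_CPUS	4	/* hypothetical */

struct demo_pgdat {
	int node_id;
};

static struct demo_pgdat demo_node_data[DEMO_NR_NODES] = { { 0 }, { 1 } };

/* Hypothetical CPU-to-node table; CPU 3 has been hotplugged out. */
static const int demo_cpu_node[DEMO_NR_CPUS] = { 0, 0, 1, NUMA_NO_NODE };

int demo_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	return demo_cpu_node[cpu];
}

/*
 * Strict lookup: an unchecked -1 index would read before the array.
 * This demo refuses and returns NULL instead of doing that.
 */
struct demo_pgdat *demo_lookup_strict(int nid)
{
	if (nid < 0 || nid >= DEMO_NR_NODES)
		return NULL;
	return &demo_node_data[nid];
}

/* Lenient lookup: NUMA_NO_NODE falls back to a local node (node 0 here). */
struct demo_pgdat *demo_lookup_lenient(int nid)
{
	if (nid == NUMA_NO_NODE)
		nid = 0;
	return demo_lookup_strict(nid);
}
```

With the lenient lookup, a caller racing with hotplug of CPU 3 still
gets usable node data, which mirrors the "might just work fine" case
above.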

Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 include/linux/gfp.h | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 01d6d2591f49e..3bf55a5f9143e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -256,21 +256,6 @@ static inline void warn_if_node_offline(int this_node, gfp_t gfp_mask)
 	dump_stack();
 }
 
-/*
- * Allocate pages, preferring the node given as nid. The node must be valid and
- * online. For more general interface, see alloc_pages_node().
- */
-static inline struct page *
-__alloc_pages_node_noprof(int nid, gfp_t gfp_mask, unsigned int order)
-{
-	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
-	warn_if_node_offline(nid, gfp_mask);
-
-	return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
-}
-
-#define  __alloc_pages_node(...)		alloc_hooks(__alloc_pages_node_noprof(__VA_ARGS__))
-
 static inline
 struct folio *__folio_alloc_node_noprof(gfp_t gfp, unsigned int order, int nid)
 {
@@ -293,7 +278,10 @@ static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
 	if (nid == NUMA_NO_NODE)
 		nid = numa_mem_id();
 
-	return __alloc_pages_node_noprof(nid, gfp_mask, order);
+	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
+	warn_if_node_offline(nid, gfp_mask);
+
+	return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
 }
 
 #define  alloc_pages_node(...)			alloc_hooks(alloc_pages_node_noprof(__VA_ARGS__))

-- 
2.54.0



* [PATCH v4 14/18] mm: Move __alloc_pages() to mm/page_alloc.h
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
                   ` (12 preceding siblings ...)
  2026-07-02  9:49 ` [PATCH v4 13/18] mm: Remove __alloc_pages_node() Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 15/18] mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG Brendan Jackman
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed

It's no longer used outside of mm/.

As a result, __alloc_pages_noprof() is no longer visible from gfp.h, so
the definition of alloc_pages_node_noprof() has to move into the .c
file.

Also remove references to this API from the documentation tree.
Referring to the specific function name was already questionable, and
now that the function is not even public it definitely seems wrong.

Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 Documentation/admin-guide/cgroup-v1/cpusets.rst |  2 +-
 Documentation/admin-guide/mm/transhuge.rst      |  2 +-
 include/linux/gfp.h                             | 16 +---------------
 mm/page_alloc.c                                 | 13 ++++++++++++-
 mm/page_alloc.h                                 |  4 ++++
 5 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v1/cpusets.rst b/Documentation/admin-guide/cgroup-v1/cpusets.rst
index c7909e5ac1361..52a213aff04e5 100644
--- a/Documentation/admin-guide/cgroup-v1/cpusets.rst
+++ b/Documentation/admin-guide/cgroup-v1/cpusets.rst
@@ -284,7 +284,7 @@ take action.
 ==>
     Unless this feature is enabled by writing "1" to the special file
     /dev/cpuset/memory_pressure_enabled, the hook in the rebalance
-    code of __alloc_pages() for this metric reduces to simply noticing
+    code of the page allocator for this metric reduces to simply noticing
     that the cpuset_memory_pressure_enabled flag is zero.  So only
     systems that enable this feature will compute the metric.
 
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 23f8d13c2629d..16f37135ed80d 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -761,7 +761,7 @@ compact_fail
 	but failed.
 
 It is possible to establish how long the stalls were using the function
-tracer to record how long was spent in __alloc_pages() and
+tracer to record how long was spent in the page allocator and
 using the mm_page_alloc tracepoint to identify which allocations were
 for huge pages.
 
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 3bf55a5f9143e..4d57e9c0bf204 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -204,10 +204,6 @@ static inline void arch_free_page(struct page *page, int order) { }
 static inline void arch_alloc_page(struct page *page, int order) { }
 #endif
 
-struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
-		nodemask_t *nodemask);
-#define __alloc_pages(...)			alloc_hooks(__alloc_pages_noprof(__VA_ARGS__))
-
 struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
 		nodemask_t *nodemask);
 #define __folio_alloc(...)			alloc_hooks(__folio_alloc_noprof(__VA_ARGS__))
@@ -272,17 +268,7 @@ struct folio *__folio_alloc_node_noprof(gfp_t gfp, unsigned int order, int nid)
  * prefer the current CPU's closest node. Otherwise node must be valid and
  * online.
  */
-static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
-						   unsigned int order)
-{
-	if (nid == NUMA_NO_NODE)
-		nid = numa_mem_id();
-
-	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
-	warn_if_node_offline(nid, gfp_mask);
-
-	return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
-}
+struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask, unsigned int order);
 
 #define  alloc_pages_node(...)			alloc_hooks(alloc_pages_node_noprof(__VA_ARGS__))
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c2839959d7908..f68b2b138a2e8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5431,7 +5431,18 @@ struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
 		set_page_refcounted(page);
 	return page;
 }
-EXPORT_SYMBOL(__alloc_pages_noprof);
+
+struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask, unsigned int order)
+{
+	if (nid == NUMA_NO_NODE)
+		nid = numa_mem_id();
+
+	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
+	warn_if_node_offline(nid, gfp_mask);
+
+	return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
+}
+EXPORT_SYMBOL(alloc_pages_node_noprof);
 
 struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
 		nodemask_t *nodemask)
diff --git a/mm/page_alloc.h b/mm/page_alloc.h
index 6bb50bae3ba27..3b8a4709b1497 100644
--- a/mm/page_alloc.h
+++ b/mm/page_alloc.h
@@ -244,6 +244,10 @@ struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned
 	alloc_hooks(alloc_frozen_pages_nolock_noprof(__VA_ARGS__))
 void free_frozen_pages_nolock(struct page *page, unsigned int order);
 
+struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
+		nodemask_t *nodemask);
+#define __alloc_pages(...)			alloc_hooks(__alloc_pages_noprof(__VA_ARGS__))
+
 extern void zone_pcp_reset(struct zone *zone);
 extern void zone_pcp_disable(struct zone *zone);
 extern void zone_pcp_enable(struct zone *zone);

-- 
2.54.0



* [PATCH v4 15/18] mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
                   ` (13 preceding siblings ...)
  2026-07-02  9:49 ` [PATCH v4 14/18] mm: Move __alloc_pages() to mm/page_alloc.h Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 16/18] mm: remove the __GFP_NO_OBJ_EXT flag Brendan Jackman
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed

Now that alloc_pages has an entrypoint that allows passing alloc_flags,
we can take advantage of this to start removing GFP flags that are only
used for mm-internal stuff.

This also requires plumbing the alloc_flags into some more of the
allocator code. In particular, __alloc_pages[_noprof]() gets an
alloc_flags arg to match its callees, and the flags must now be passed
deeper into the allocator so they can reach the alloc_tag code.
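
As a rough illustration of the plumbing described above, here is a
minimal userspace sketch. All DEMO_*/demo_* names are hypothetical
stand-ins rather than kernel APIs; only the 0x1000 value mirrors the
ALLOC_NO_CODETAG definition in this patch, and DEMO_ALLOC_DEFAULT being
zero is an assumption. It shows an entry point forwarding alloc_flags to
a post-allocation hook, which skips tag recording when the caller opted
out:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the allocator-internal flags. */
#define DEMO_ALLOC_DEFAULT     0x0u     /* assumed to be zero */
#define DEMO_ALLOC_NO_CODETAG  0x1000u  /* same value as in this patch */

/* Toy replacement for the alloc_tag bookkeeping. */
struct demo_tracker {
	size_t recorded;  /* number of allocations recorded so far */
};

/* Runs after each allocation; skips recording when the caller opted out. */
static void demo_post_alloc_hook(struct demo_tracker *t, unsigned int alloc_flags)
{
	if (alloc_flags & DEMO_ALLOC_NO_CODETAG)
		return;
	t->recorded++;
}

/* Entry point: forwards the caller's alloc_flags down to the hook. */
static void demo_alloc(struct demo_tracker *t, unsigned int alloc_flags)
{
	demo_post_alloc_hook(t, alloc_flags);
}
```

In this model, a call with DEMO_ALLOC_DEFAULT bumps the counter, while a
call with DEMO_ALLOC_NO_CODETAG leaves it unchanged, which is the
property the tracking code relies on to avoid recursing into itself.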

While moving the flag definition into page_alloc.h, also update the
comment per Hao's suggestion.

No functional change intended.

Link: https://lore.kernel.org/all/b4916118-3537-4e19-8bc8-1d103dd0d225@linux.dev/
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 include/linux/alloc_tag.h |  4 ++--
 mm/alloc_tag.c            | 23 +++++++--------------
 mm/compaction.c           |  4 ++--
 mm/internal.h             |  1 -
 mm/page_alloc.c           | 52 +++++++++++++++++++++++++++--------------------
 mm/page_alloc.h           | 14 +++++++++++--
 mm/page_frag_cache.c      |  4 ++--
 7 files changed, 55 insertions(+), 47 deletions(-)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 068ba2e77c5d6..fcf90e6b24204 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -163,11 +163,11 @@ static inline void alloc_tag_sub_check(union codetag_ref *ref)
 {
 	WARN_ONCE(ref && !ref->ct, "alloc_tag was not set\n");
 }
-void alloc_tag_add_early_pfn(unsigned long pfn, gfp_t gfp_flags);
+void alloc_tag_add_early_pfn(unsigned long pfn, unsigned int alloc_flags);
 #else
 static inline void alloc_tag_add_check(union codetag_ref *ref, struct alloc_tag *tag) {}
 static inline void alloc_tag_sub_check(union codetag_ref *ref) {}
-static inline void alloc_tag_add_early_pfn(unsigned long pfn, gfp_t gfp_flags) {}
+static inline void alloc_tag_add_early_pfn(unsigned long pfn, unsigned int alloc_flags) {}
 #endif
 
 /* Caller should verify both ref and tag to be valid */
diff --git a/mm/alloc_tag.c b/mm/alloc_tag.c
index d9be1cf5187d9..cf65e9992fda3 100644
--- a/mm/alloc_tag.c
+++ b/mm/alloc_tag.c
@@ -15,6 +15,9 @@
 #include <linux/vmalloc.h>
 #include <linux/kmemleak.h>
 
+#include "internal.h"
+#include "page_alloc.h"
+
 #define ALLOCINFO_FILE_NAME		"allocinfo"
 #define MODULE_ALLOC_TAG_VMAP_SIZE	(100000UL * sizeof(struct alloc_tag))
 #define SECTION_START(NAME)		(CODETAG_SECTION_START_PREFIX NAME)
@@ -783,19 +786,6 @@ struct pfn_pool {
 
 #define PFN_POOL_SIZE			((PAGE_SIZE - offsetof(struct pfn_pool, pfns)) / \
 					 sizeof(unsigned long))
-
-/*
- * Skip early PFN recording for a page allocation.  Reuses the
- * %__GFP_NO_OBJ_EXT bit.  Used by __alloc_tag_add_early_pfn() to avoid
- * recursion when allocating pages for the early PFN tracking list
- * itself.
- *
- * Codetags of the pages allocated with __GFP_NO_CODETAG should be
- * cleared (via clear_page_tag_ref()) before freeing the pages to prevent
- * alloc_tag_sub_check() from triggering a warning.
- */
-#define __GFP_NO_CODETAG		__GFP_NO_OBJ_EXT
-
 static struct pfn_pool *current_pfn_pool __initdata;
 
 static void __init __alloc_tag_add_early_pfn(unsigned long pfn)
@@ -806,7 +796,8 @@ static void __init __alloc_tag_add_early_pfn(unsigned long pfn)
 	do {
 		pool = READ_ONCE(current_pfn_pool);
 		if (!pool || atomic_read(&pool->count) >= PFN_POOL_SIZE) {
-			struct page *new_page = alloc_page(__GFP_HIGH | __GFP_NO_CODETAG);
+			struct page *new_page = __alloc_pages(__GFP_HIGH, 0, numa_mem_id(),
+							      NULL, ALLOC_NO_CODETAG);
 			struct pfn_pool *new;
 
 			if (!new_page) {
@@ -837,7 +828,7 @@ typedef void alloc_tag_add_func(unsigned long pfn);
 static alloc_tag_add_func __rcu *alloc_tag_add_early_pfn_ptr __refdata =
 	RCU_INITIALIZER(__alloc_tag_add_early_pfn);
 
-void alloc_tag_add_early_pfn(unsigned long pfn, gfp_t gfp_flags)
+void alloc_tag_add_early_pfn(unsigned long pfn, unsigned int alloc_flags)
 {
 	alloc_tag_add_func *alloc_tag_add;
 
@@ -845,7 +836,7 @@ void alloc_tag_add_early_pfn(unsigned long pfn, gfp_t gfp_flags)
 		return;
 
 	/* Skip allocations for the tracking list itself to avoid recursion. */
-	if (gfp_flags & __GFP_NO_CODETAG)
+	if (alloc_flags & ALLOC_NO_CODETAG)
 		return;
 
 	rcu_read_lock();
diff --git a/mm/compaction.c b/mm/compaction.c
index 7d80735502d9a..4b2318fad4eb5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -83,7 +83,7 @@ static inline bool is_via_compact_memory(int order) { return false; }
 
 static struct page *mark_allocated_noprof(struct page *page, unsigned int order, gfp_t gfp_flags)
 {
-	post_alloc_hook(page, order, __GFP_MOVABLE);
+	post_alloc_hook(page, order, __GFP_MOVABLE, ALLOC_DEFAULT);
 	set_page_refcounted(page);
 	return page;
 }
@@ -1851,7 +1851,7 @@ static struct folio *compaction_alloc_noprof(struct folio *src, unsigned long da
 	}
 	dst = (struct folio *)freepage;
 
-	post_alloc_hook(&dst->page, order, __GFP_MOVABLE);
+	post_alloc_hook(&dst->page, order, __GFP_MOVABLE, ALLOC_DEFAULT);
 	set_page_refcounted(&dst->page);
 	if (order)
 		prep_compound_page(&dst->page, order);
diff --git a/mm/internal.h b/mm/internal.h
index 7e3b2386e274b..3c00eaf5f45a4 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1237,7 +1237,6 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 enum ttu_flags;
 struct tlbflush_unmap_batch;
 
-
 /*
  * only for MM internal work items which do not depend on
  * any allocations or locks which might depend on allocations
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f68b2b138a2e8..cfaf16244f56d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1249,7 +1249,7 @@ void __clear_page_tag_ref(struct page *page)
 /* Should be called only if mem_alloc_profiling_enabled() */
 static noinline
 void __pgalloc_tag_add(struct page *page, struct task_struct *task,
-		       unsigned int nr, gfp_t gfp_flags)
+		       unsigned int nr, unsigned int alloc_flags)
 {
 	union pgtag_ref_handle handle;
 	union codetag_ref ref;
@@ -1263,17 +1263,17 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
 		 * page_ext is not available yet, record the pfn so we can
 		 * clear the tag ref later when page_ext is initialized.
 		 */
-		alloc_tag_add_early_pfn(page_to_pfn(page), gfp_flags);
+		alloc_tag_add_early_pfn(page_to_pfn(page), alloc_flags);
 		if (task->alloc_tag)
 			alloc_tag_set_inaccurate(task->alloc_tag);
 	}
 }
 
 static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
-				   unsigned int nr, gfp_t gfp_flags)
+				   unsigned int nr, unsigned int alloc_flags)
 {
 	if (mem_alloc_profiling_enabled())
-		__pgalloc_tag_add(page, task, nr, gfp_flags);
+		__pgalloc_tag_add(page, task, nr, alloc_flags);
 }
 
 /* Should be called only if mem_alloc_profiling_enabled() */
@@ -1306,7 +1306,7 @@ static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
 #else /* CONFIG_MEM_ALLOC_PROFILING */
 
 static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
-				   unsigned int nr, gfp_t gfp_flags) {}
+				   unsigned int nr, unsigned int alloc_flags) {}
 static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
 static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr) {}
 
@@ -1810,7 +1810,7 @@ static inline bool should_skip_init(gfp_t flags)
 }
 
 inline void post_alloc_hook(struct page *page, unsigned int order,
-				gfp_t gfp_flags)
+				gfp_t gfp_flags, unsigned int alloc_flags)
 {
 	const bool zero_tags = gfp_flags & __GFP_ZEROTAGS;
 	bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) &&
@@ -1861,13 +1861,13 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 
 	set_page_owner(page, order, gfp_flags);
 	page_table_check_alloc(page, order);
-	pgalloc_tag_add(page, current, 1 << order, gfp_flags);
+	pgalloc_tag_add(page, current, 1 << order, alloc_flags);
 }
 
 static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
 							unsigned int alloc_flags)
 {
-	post_alloc_hook(page, order, gfp_flags);
+	post_alloc_hook(page, order, gfp_flags, alloc_flags);
 
 	if (order && (gfp_flags & __GFP_COMP))
 		prep_compound_page(page, order);
@@ -4078,7 +4078,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	 */
 	page = get_page_from_freelist((gfp_mask | __GFP_HARDWALL) &
 				      ~__GFP_DIRECT_RECLAIM, order,
-				      ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
+				      ac->alloc_flags|ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
 	if (page)
 		goto out;
 
@@ -4124,7 +4124,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 		 */
 		if (gfp_mask & __GFP_NOFAIL)
 			page = __alloc_pages_cpuset_fallback(gfp_mask, order,
-					ALLOC_NO_WATERMARKS, ac);
+					ac->alloc_flags|ALLOC_NO_WATERMARKS, ac);
 	}
 out:
 	mutex_unlock(&oom_lock);
@@ -4791,8 +4791,12 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * The fast path uses conservative alloc_flags to succeed only until
 	 * kswapd needs to be woken up, and to avoid the cost of setting up
 	 * alloc_flags precisely. So we do that now.
+	 *
+	 * Can't just or alloc_flags if it contains WMARK bits, but those flags
+	 * shouldn't be set in ac->alloc_flags.
 	 */
-	alloc_flags = alloc_flags_slowpath(gfp_mask, order);
+	VM_WARN_ON(ac->alloc_flags & ALLOC_WMARK_MASK);
+	alloc_flags = ac->alloc_flags | alloc_flags_slowpath(gfp_mask, order);
 
 	/*
 	 * We need to recalculate the starting point for the zonelist iterator
@@ -4834,7 +4838,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
 	if (reserve_flags)
 		alloc_flags = alloc_flags_cma(gfp_mask, reserve_flags) |
-					  (alloc_flags & ALLOC_KSWAPD);
+				ac->alloc_flags | (alloc_flags & ALLOC_KSWAPD);
 
 	/*
 	 * Reset the nodemask and zonelist iterators if memory policies can be
@@ -5003,6 +5007,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * we always retry
 	 */
 	if (unlikely(nofail)) {
+		unsigned int alloc_flags = ac->alloc_flags | ALLOC_MIN_RESERVE;
+
 		/*
 		 * Lacking direct_reclaim we can't do anything to reclaim memory,
 		 * we disregard these unreasonable nofail requests and still
@@ -5018,7 +5024,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		 * could deplete whole memory reserves which would just make
 		 * the situation worse.
 		 */
-		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE, ac);
+		page = __alloc_pages_cpuset_fallback(gfp_mask, order, alloc_flags, ac);
 		if (page)
 			goto got_pg;
 
@@ -5236,7 +5242,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
 	return nr_populated;
 
 failed:
-	page = __alloc_pages_noprof(gfp, 0, preferred_nid, nodemask);
+	page = __alloc_pages_noprof(gfp, 0, preferred_nid, nodemask, ALLOC_DEFAULT);
 	if (page)
 		page_array[nr_populated++] = page;
 	goto out;
@@ -5344,11 +5350,13 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
 {
 	struct page *page;
 	gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
-	struct alloc_context ac = { };
+	struct alloc_context ac = {
+		.alloc_flags = alloc_flags,
+	};
 	unsigned int fastpath_alloc_flags = alloc_flags;
 
 	/* Other flags could be supported later if needed. */
-	if (WARN_ON(alloc_flags & ~ALLOC_NOLOCK))
+	if (WARN_ON(alloc_flags & ~(ALLOC_NOLOCK | ALLOC_NO_CODETAG)))
 		return NULL;
 
 	if (!alloc_order_allowed(gfp, order, alloc_flags))
@@ -5421,12 +5429,12 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
 EXPORT_SYMBOL(__alloc_frozen_pages_noprof);
 
 struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
-		int preferred_nid, nodemask_t *nodemask)
+		int preferred_nid, nodemask_t *nodemask, unsigned int alloc_flags)
 {
 	struct page *page;
 
 	page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask,
-					   ALLOC_DEFAULT);
+					   alloc_flags);
 	if (page)
 		set_page_refcounted(page);
 	return page;
@@ -5440,7 +5448,7 @@ struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask, unsigned int order
 	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
 	warn_if_node_offline(nid, gfp_mask);
 
-	return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
+	return __alloc_pages_noprof(gfp_mask, order, nid, NULL, ALLOC_DEFAULT);
 }
 EXPORT_SYMBOL(alloc_pages_node_noprof);
 
@@ -5448,7 +5456,7 @@ struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_
 		nodemask_t *nodemask)
 {
 	struct page *page = __alloc_pages_noprof(gfp | __GFP_COMP, order,
-					preferred_nid, nodemask);
+					preferred_nid, nodemask, ALLOC_DEFAULT);
 	return page_rmappable_folio(page);
 }
 EXPORT_SYMBOL(__folio_alloc_noprof);
@@ -7130,7 +7138,7 @@ static void split_free_frozen_pages(struct list_head *list, gfp_t gfp_mask)
 		list_for_each_entry_safe(page, next, &list[order], lru) {
 			int i;
 
-			post_alloc_hook(page, order, gfp_mask);
+			post_alloc_hook(page, order, gfp_mask, ALLOC_DEFAULT);
 			if (!order)
 				continue;
 
@@ -7335,7 +7343,7 @@ int alloc_contig_frozen_range_noprof(unsigned long start, unsigned long end,
 		struct page *head = pfn_to_page(start);
 
 		check_new_pages(head, order);
-		prep_new_page(head, order, gfp_mask, 0);
+		prep_new_page(head, order, gfp_mask, ALLOC_DEFAULT);
 	} else {
 		ret = -EINVAL;
 		WARN(true, "PFN range: requested [%lu, %lu), allocated [%lu, %lu)\n",
diff --git a/mm/page_alloc.h b/mm/page_alloc.h
index 3b8a4709b1497..06f8b6f150cdf 100644
--- a/mm/page_alloc.h
+++ b/mm/page_alloc.h
@@ -49,6 +49,13 @@
 #define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
 #define ALLOC_NOLOCK		0x400 /* Only use spin_trylock in allocation path */
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
+/*
+ * Avoid alloc_tag recursion for internal allocations.
+ *
+ * Callers must clear_page_tag_ref() before freeing to avoid warnings from
+ * alloc_tag_sub_check().
+ */
+#define ALLOC_NO_CODETAG       0x1000
 
 /* Flags that allow allocations below the min watermark. */
 #define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
@@ -84,6 +91,8 @@ struct alloc_context {
 	 */
 	enum zone_type highest_zoneidx;
 	bool spread_dirty_pages;
+	/* Only flags that are global to the whole allocation go here. */
+	unsigned int alloc_flags;
 };
 
 /*
@@ -214,7 +223,8 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
 extern void __free_pages_core(struct page *page, unsigned int order,
 		enum meminit_context context);
 
-void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags);
+void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags,
+		     unsigned int alloc_flags);
 extern bool free_pages_prepare(struct page *page, unsigned int order);
 
 extern int user_min_free_kbytes;
@@ -245,7 +255,7 @@ struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned
 void free_frozen_pages_nolock(struct page *page, unsigned int order);
 
 struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
-		nodemask_t *nodemask);
+		nodemask_t *nodemask, unsigned int alloc_flags);
 #define __alloc_pages(...)			alloc_hooks(__alloc_pages_noprof(__VA_ARGS__))
 
 extern void zone_pcp_reset(struct zone *zone);
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index a1077cef3a791..e63efe78b7d4b 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -57,10 +57,10 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
 	gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) |  __GFP_COMP |
 		   __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
 	page = __alloc_pages(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER,
-			     numa_mem_id(), NULL);
+			     numa_mem_id(), NULL, ALLOC_DEFAULT);
 #endif
 	if (unlikely(!page)) {
-		page = __alloc_pages(gfp, 0, numa_mem_id(), NULL);
+		page = __alloc_pages(gfp, 0, numa_mem_id(), NULL, ALLOC_DEFAULT);
 		order = 0;
 	}
 

-- 
2.54.0



* [PATCH v4 16/18] mm: remove the __GFP_NO_OBJ_EXT flag
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
                   ` (14 preceding siblings ...)
  2026-07-02  9:49 ` [PATCH v4 15/18] mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 17/18] mm/page_alloc: drop alloc_flags arg from alloc_flags_cma() Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 18/18] mm: factor out can_spin_trylock() Brendan Jackman
  17 siblings, 0 replies; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed

From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>

All users of the flag are converted to SLAB_ALLOC_NO_RECURSE or
ALLOC_NO_CODETAG (from __GFP_NO_CODETAG which reused the NO_OBJ_EXT bit).
Free up the flag bit.

Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
[Rebased onto __GFP_NO_CODETAG removal]
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 include/linux/gfp_types.h       |  7 -------
 include/trace/events/mmflags.h  | 10 +---------
 tools/include/linux/gfp_types.h |  7 -------
 3 files changed, 1 insertion(+), 23 deletions(-)

diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 463b551d12d99..190191411009f 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -55,7 +55,6 @@ enum {
 #ifdef CONFIG_LOCKDEP
 	___GFP_NOLOCKDEP_BIT,
 #endif
-	___GFP_NO_OBJ_EXT_BIT,
 	___GFP_LAST_BIT
 };
 
@@ -96,7 +95,6 @@ enum {
 #else
 #define ___GFP_NOLOCKDEP	0
 #endif
-#define ___GFP_NO_OBJ_EXT       BIT(___GFP_NO_OBJ_EXT_BIT)
 
 /*
  * Physical address zone modifiers (see linux/mmzone.h - low four bits)
@@ -138,17 +136,12 @@ enum {
  *
  * %__GFP_ACCOUNT causes the allocation to be accounted to the active
  * cgroup context.
- *
- * %__GFP_NO_OBJ_EXT causes slab allocation to have no object extension.
- * mark_obj_codetag_empty() should be called upon freeing for objects allocated
- * with this flag to indicate that their NULL tags are expected and normal.
  */
 #define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE)
 #define __GFP_WRITE	((__force gfp_t)___GFP_WRITE)
 #define __GFP_HARDWALL   ((__force gfp_t)___GFP_HARDWALL)
 #define __GFP_THISNODE	((__force gfp_t)___GFP_THISNODE)
 #define __GFP_ACCOUNT	((__force gfp_t)___GFP_ACCOUNT)
-#define __GFP_NO_OBJ_EXT   ((__force gfp_t)___GFP_NO_OBJ_EXT)
 
 /**
  * DOC: Watermark modifiers
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index a6e5a44c9b429..c1a05ff0feab0 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -54,18 +54,10 @@
 # define TRACE_GFP_FLAGS_LOCKDEP
 #endif
 
-#ifdef CONFIG_SLAB_OBJ_EXT
-# define TRACE_GFP_FLAGS_SLAB			\
-	TRACE_GFP_EM(NO_OBJ_EXT)
-#else
-# define TRACE_GFP_FLAGS_SLAB
-#endif
-
 #define TRACE_GFP_FLAGS				\
 	TRACE_GFP_FLAGS_GENERAL			\
 	TRACE_GFP_FLAGS_KASAN			\
-	TRACE_GFP_FLAGS_LOCKDEP			\
-	TRACE_GFP_FLAGS_SLAB
+	TRACE_GFP_FLAGS_LOCKDEP
 
 #undef TRACE_GFP_EM
 #define TRACE_GFP_EM(a) TRACE_DEFINE_ENUM(___GFP_##a##_BIT);
diff --git a/tools/include/linux/gfp_types.h b/tools/include/linux/gfp_types.h
index 6c75df30a281d..a93b8bd200b76 100644
--- a/tools/include/linux/gfp_types.h
+++ b/tools/include/linux/gfp_types.h
@@ -55,7 +55,6 @@ enum {
 #ifdef CONFIG_LOCKDEP
 	___GFP_NOLOCKDEP_BIT,
 #endif
-	___GFP_NO_OBJ_EXT_BIT,
 	___GFP_LAST_BIT
 };
 
@@ -96,7 +95,6 @@ enum {
 #else
 #define ___GFP_NOLOCKDEP	0
 #endif
-#define ___GFP_NO_OBJ_EXT       BIT(___GFP_NO_OBJ_EXT_BIT)
 
 /*
  * Physical address zone modifiers (see linux/mmzone.h - low four bits)
@@ -137,17 +135,12 @@ enum {
  * node with no fallbacks or placement policy enforcements.
  *
  * %__GFP_ACCOUNT causes the allocation to be accounted to kmemcg.
- *
- * %__GFP_NO_OBJ_EXT causes slab allocation to have no object extension.
- * mark_obj_codetag_empty() should be called upon freeing for objects allocated
- * with this flag to indicate that their NULL tags are expected and normal.
  */
 #define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE)
 #define __GFP_WRITE	((__force gfp_t)___GFP_WRITE)
 #define __GFP_HARDWALL   ((__force gfp_t)___GFP_HARDWALL)
 #define __GFP_THISNODE	((__force gfp_t)___GFP_THISNODE)
 #define __GFP_ACCOUNT	((__force gfp_t)___GFP_ACCOUNT)
-#define __GFP_NO_OBJ_EXT   ((__force gfp_t)___GFP_NO_OBJ_EXT)
 
 /**
  * DOC: Watermark modifiers

-- 
2.54.0



* [PATCH v4 17/18] mm/page_alloc: drop alloc_flags arg from alloc_flags_cma()
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
                   ` (15 preceding siblings ...)
  2026-07-02  9:49 ` [PATCH v4 16/18] mm: remove the __GFP_NO_OBJ_EXT flag Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02  9:49 ` [PATCH v4 18/18] mm: factor out can_spin_trylock() Brendan Jackman
  17 siblings, 0 replies; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed

To align the style with other alloc_flags_*() functions, drop this
additive argument and have the callers OR in the result themselves.

Note that you can't always freely OR alloc_flags together like these
callers do (because the WMARK bits encode an enum), but this is fine for
ALLOC_CMA, just like it's fine for e.g. ALLOC_NON_BLOCK returned by
alloc_flags_nonblocking() and OR'd in by its caller.
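
The distinction between enum-encoded bits and independent flag bits can
be illustrated with a small userspace sketch. The DEMO_* names and
values below are hypothetical, not the kernel's definitions: the low two
bits are assumed to hold a watermark index, and DEMO_CMA is assumed to
be an independent single-bit flag.

```c
#include <stdbool.h>

/* Hypothetical layout: the low bits hold a watermark index, not flags. */
#define DEMO_WMARK_MIN   0x0u
#define DEMO_WMARK_LOW   0x1u
#define DEMO_WMARK_HIGH  0x2u
#define DEMO_WMARK_MASK  0x3u
/* An independent single-bit flag (illustrative value). */
#define DEMO_CMA         0x80u

/* Extract the watermark index encoded in the low bits. */
static unsigned int demo_wmark_index(unsigned int flags)
{
	return flags & DEMO_WMARK_MASK;
}

/* OR-ing is only safe when the added flags carry no watermark bits. */
static bool demo_safe_to_or(unsigned int extra)
{
	return (extra & DEMO_WMARK_MASK) == 0;
}

/* Combine flags, refusing additions that would corrupt the index. */
static unsigned int demo_combine(unsigned int base, unsigned int extra)
{
	return demo_safe_to_or(extra) ? (base | extra) : base;
}
```

With these values, OR-ing DEMO_WMARK_LOW and DEMO_WMARK_HIGH produces
index 3, which is neither of the two, whereas OR-ing in DEMO_CMA leaves
the index untouched.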

Suggested-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Link: https://lore.kernel.org/all/5dcdd1ef-21ad-4ed0-9e8a-0e5cf96b4392@kernel.org/
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 mm/page_alloc.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cfaf16244f56d..c3b246e67ed14 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3775,14 +3775,13 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
 }
 
 /* Must be called after current_gfp_context() which can change gfp_mask */
-static inline unsigned int alloc_flags_cma(gfp_t gfp_mask,
-					   unsigned int alloc_flags)
+static inline unsigned int alloc_flags_cma(gfp_t gfp_mask)
 {
 #ifdef CONFIG_CMA
 	if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
+		return ALLOC_CMA;
 #endif
-	return alloc_flags;
+	return ALLOC_DEFAULT;
 }
 
 /*
@@ -4526,7 +4525,7 @@ alloc_flags_slowpath(gfp_t gfp_mask, unsigned int order)
 	} else if (unlikely(rt_or_dl_task(current)) && in_task())
 		alloc_flags |= ALLOC_MIN_RESERVE;
 
-	alloc_flags = alloc_flags_cma(gfp_mask, alloc_flags);
+	alloc_flags |= alloc_flags_cma(gfp_mask);
 
 	if (defrag_mode)
 		alloc_flags |= ALLOC_NOFRAGMENT;
@@ -4837,7 +4836,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 
 	reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
 	if (reserve_flags)
-		alloc_flags = alloc_flags_cma(gfp_mask, reserve_flags) |
+		alloc_flags = alloc_flags_cma(gfp_mask) | reserve_flags |
 				ac->alloc_flags | (alloc_flags & ALLOC_KSWAPD);
 
 	/*
@@ -5070,7 +5069,7 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 	    should_fail_alloc_page(gfp_mask, order))
 		return false;
 
-	*alloc_flags = alloc_flags_cma(gfp_mask, *alloc_flags);
+	*alloc_flags |= alloc_flags_cma(gfp_mask);
 
 	/* Dirty zone balancing only done in the fast path */
 	ac->spread_dirty_pages = (gfp_mask & __GFP_WRITE);

-- 
2.54.0



* [PATCH v4 18/18] mm: factor out can_spin_trylock()
  2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
                   ` (16 preceding siblings ...)
  2026-07-02  9:49 ` [PATCH v4 17/18] mm/page_alloc: drop alloc_flags arg from alloc_flags_cma() Brendan Jackman
@ 2026-07-02  9:49 ` Brendan Jackman
  2026-07-02 12:28   ` sashiko-bot
  17 siblings, 1 reply; 23+ messages in thread
From: Brendan Jackman @ 2026-07-02  9:49 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
	Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
	linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
	Yosry Ahmed

Deduplicate checks for whether the current context is safe for
spin_trylock().

Does this function really belong in mm/internal.h, or is it generic? Not
sure. If someone ends up duplicating this logic elsewhere in the kernel,
that would be a shame. But if it goes in some generic header, someone
treats it as documentation of where spin_trylock() is guaranteed safe,
and it then emerges that there are other subtle preconditions that
didn't affect the mm usecase, that would be worse. So, just be
conservative and keep it local.
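
For readers who want to reason about the deduplicated check outside the
kernel, the following is a userspace model only. struct demo_ctx and
demo_can_spin_trylock() are hypothetical names; the booleans stand in
for IS_ENABLED(CONFIG_PREEMPT_RT), IS_ENABLED(CONFIG_SMP), in_nmi() and
in_hardirq(), which cannot be evaluated in userspace.

```c
#include <stdbool.h>

/* Hypothetical model: config options and context as plain booleans. */
struct demo_ctx {
	bool preempt_rt;  /* stands in for IS_ENABLED(CONFIG_PREEMPT_RT) */
	bool smp;         /* stands in for IS_ENABLED(CONFIG_SMP) */
	bool in_nmi;      /* stands in for in_nmi() */
	bool in_hardirq;  /* stands in for in_hardirq() */
};

/* Mirrors the two bail-out conditions of the helper in this patch. */
static bool demo_can_spin_trylock(const struct demo_ctx *c)
{
	/* On PREEMPT_RT, neither NMI nor hard IRQ context is acceptable. */
	if (c->preempt_rt && (c->in_nmi || c->in_hardirq))
		return false;

	/* On UP, a trylock in NMI context cannot be trusted. */
	if (!c->smp && c->in_nmi)
		return false;

	return true;
}
```

The model makes the resulting truth table easy to check: hard IRQ
context is rejected only on PREEMPT_RT, while NMI context is rejected on
PREEMPT_RT and on UP builds.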

Suggested-by: Harry Yoo <harry@kernel.org>
Link: https://lore.kernel.org/all/397859cb-b127-4cc6-9c71-044afc99bf0c@kernel.org/
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 mm/internal.h   | 23 +++++++++++++++++++++++
 mm/page_alloc.c | 17 +----------------
 mm/slub.c       | 10 +---------
 3 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 3c00eaf5f45a4..e6f300693ffd7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1715,4 +1715,27 @@ static inline void mm_prepare_for_swap_entries(struct mm_struct *mm)
 	}
 }
 
+static inline bool can_spin_trylock(void)
+{
+	/*
+	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
+	 * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
+	 * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
+	 * mark the task as the owner of another rt_spin_lock which will
+	 * confuse PI logic, so return immediately if called from hard IRQ or
+	 * NMI.
+	 *
+	 * Note, irqs_disabled() case is ok. spin_trylock() can be called
+	 * from raw_spin_lock_irqsave region.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
+		return false;
+
+	/* On UP, spin_trylock() always succeeds even when it is locked */
+	if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
+		return false;
+
+	return true;
+}
+
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c3b246e67ed14..a63733dac659e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5291,22 +5291,7 @@ static inline bool alloc_order_allowed(gfp_t gfp, unsigned int order,
 
 static inline bool alloc_nolock_allowed(void)
 {
-	/*
-	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
-	 * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
-	 * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
-	 * mark the task as the owner of another rt_spin_lock which will
-	 * confuse PI logic, so return immediately if called from hard IRQ or
-	 * NMI.
-	 *
-	 * Note, irqs_disabled() case is ok. This function can be called
-	 * from raw_spin_lock_irqsave region.
-	 */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
-		return false;
-
-	/* On UP, spin_trylock() always succeeds even when it is locked */
-	if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
+	if (!can_spin_trylock())
 		return false;
 
 	/* Bailout, since _deferred_grow_zone() needs to take a lock */
diff --git a/mm/slub.c b/mm/slub.c
index 3989b4758ae0a..b19dc46de73c5 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5408,15 +5408,7 @@ static void *__kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_f
 	if (unlikely(!size))
 		return ZERO_SIZE_PTR;
 
-	/*
-	 * See the comment for the same check in
-	 * alloc_frozen_pages_nolock_noprof()
-	 */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
-		return NULL;
-
-	/* On UP, spin_trylock() always succeeds even when it is locked */
-	if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
+	if (!can_spin_trylock())
 		return NULL;
 
 retry:

-- 
2.54.0



* Re: [PATCH v4 07/18] mm: move some stuff to mm/page_alloc.h
  2026-07-02  9:49 ` [PATCH v4 07/18] mm: move some stuff to mm/page_alloc.h Brendan Jackman
@ 2026-07-02 10:28   ` sashiko-bot
  0 siblings, 0 replies; 23+ messages in thread
From: sashiko-bot @ 2026-07-02 10:28 UTC (permalink / raw)
  To: Brendan Jackman; +Cc: linux-rt-devel, linux-kernel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] Duplicate declaration of `drain_local_pages()` in `mm/page_alloc.h` contradicts the commit message.
--

commit 54f4dc2300a55c6c9cef3e3916ba57e09ad07e39
Author: Brendan Jackman <jackmanb@google.com>

mm: move some stuff to mm/page_alloc.h

This commit shrinks the scope of several internal functions by moving their
declarations from the public include/linux/gfp.h header to the internal
mm/page_alloc.h header.

> diff --git a/mm/page_alloc.h b/mm/page_alloc.h
> index a4f4b325381ad..6bb50bae3ba27 100644
> --- a/mm/page_alloc.h
> +++ b/mm/page_alloc.h

[ ... ]

> @@ -266,6 +266,34 @@ static inline bool free_area_empty(struct free_area *area, int migratetype)
>  	return list_empty(&area->free_list[migratetype]);
>  }
>  
> +/* Convert GFP flags to their corresponding migrate type */
> +#define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)

[ ... ]

> +bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp);
> +void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
> +void drain_all_pages(struct zone *zone);
> +void drain_local_pages(struct zone *zone);

[Severity: Low]
The commit message notes that drain_local_pages() is still used externally
and needs to stay behind in include/linux/gfp.h. Since it was successfully
left in the public header, is this declaration in mm/page_alloc.h an
accidental duplicate?

> +
> +void page_alloc_init_cpuhp(void);
>  void page_alloc_sysctl_init(void);
>  
>  #endif /* __MM_PAGE_ALLOC_H */

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260702-alloc-trylock-v4-0-0af8ff387e80@google.com?part=7


* Re: [PATCH v4 11/18] sgi-xp: Use higher-level allocator API
  2026-07-02  9:49 ` [PATCH v4 11/18] sgi-xp: " Brendan Jackman
@ 2026-07-02 10:54   ` sashiko-bot
  0 siblings, 0 replies; 23+ messages in thread
From: sashiko-bot @ 2026-07-02 10:54 UTC (permalink / raw)
  To: Brendan Jackman; +Cc: linux-rt-devel, linux-kernel

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [High] xpc_init_mq_node() incorrectly passes a NUMA node ID (nid) in place of a CPU ID to xpc_create_gru_mq_uv().
- [High] xpc_init_mq_node() fails to catch empty CPU masks, leaving global MQ pointers NULL and reporting false success.
--

commit 4df0c24c50fd7156f7d0c39d45d908827e5063f5
Author: Brendan Jackman <jackmanb@google.com>

sgi-xp: Use higher-level allocator API

This patch updates the memory allocation in xpc_create_gru_mq_uv() by
replacing __alloc_pages_node() with alloc_pages_node(). It aligns with
the deprecation of the former while preserving the existing NUMA-aware
memory allocation behavior.

> diff --git a/drivers/misc/sgi-xp/xpc_uv.c b/drivers/misc/sgi-xp/xpc_uv.c
> index 772c787268932..79c2f00ed4d70 100644
> --- a/drivers/misc/sgi-xp/xpc_uv.c
> +++ b/drivers/misc/sgi-xp/xpc_uv.c
[ ... ]
> @@ -170,9 +170,8 @@ xpc_create_gru_mq_uv(unsigned int mq_size, int cpu, char *irq_name,
>  	mq->mmr_blade = uv_cpu_to_blade_id(cpu);
>  
>  	nid = cpu_to_node(cpu);
> -	page = __alloc_pages_node(nid,
> -				      GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
> -				      pg_order);
> +	page = alloc_pages_node(nid, GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
> +				pg_order);

[Severity: High]
This isn't a bug introduced by this patch, but looking at how this function
is called from xpc_init_mq_node(), is the correct CPU being passed in?

drivers/misc/sgi-xp/xpc_uv.c:xpc_init_mq_node() {
	for_each_cpu(cpu, cpumask_of_node(nid)) {
		xpc_activate_mq_uv =
			xpc_create_gru_mq_uv(XPC_ACTIVATE_MQ_SIZE_UV, nid,
					     XPC_ACTIVATE_IRQ_NAME,
					     xpc_handle_activate_IRQ_uv);
        ...

It appears the loop-invariant nid is passed as the cpu argument instead of
the cpu variable. Since xpc_create_gru_mq_uv() uses that argument for
uv_cpu_to_blade_id(), cpu_to_node(), and xpc_get_gru_mq_irq_uv(), could this
initialize the message queue with the wrong topology resources and scramble
routing and IRQ configuration?

[Severity: High]
This is also a pre-existing issue, but does xpc_init_mq_node() properly
handle online NUMA nodes with no CPUs?

If the cpumask for the node is empty, the for_each_cpu() loop will not
execute:

drivers/misc/sgi-xp/xpc_uv.c:xpc_init_mq_node() {
	for_each_cpu(cpu, cpumask_of_node(nid)) {
		...
		if (!IS_ERR(xpc_activate_mq_uv))
			break;
	}
	if (IS_ERR(xpc_activate_mq_uv)) {
		cpus_read_unlock();
		return PTR_ERR(xpc_activate_mq_uv);
	}
    ...

Since xpc_activate_mq_uv is initially NULL, IS_ERR(NULL) evaluates to false.
Will this bypass the error check, report false success, and leave the
pointer NULL, leading to a panic when it is dereferenced later in the driver
lifecycle?
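
The IS_ERR(NULL) point can be checked with a small userspace model. This is
only a sketch: the model_* names and MODEL_MAX_ERRNO are stand-ins invented
for illustration, written to mirror the kernel's IS_ERR(), IS_ERR_OR_NULL(),
ERR_PTR() and MAX_ERRNO from include/linux/err.h. It is not the driver's
code and not a proposed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's MAX_ERRNO. */
#define MODEL_MAX_ERRNO 4095

/* Mirrors IS_ERR(): true only for the top MODEL_MAX_ERRNO addresses. */
static bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Mirrors IS_ERR_OR_NULL(): additionally treats NULL as a failure. */
static bool model_is_err_or_null(const void *ptr)
{
	return ptr == NULL || model_is_err(ptr);
}

/* Mirrors ERR_PTR(): encodes a negative errno value as a pointer. */
static void *model_err_ptr(long error)
{
	assert(error < 0 && error >= -MODEL_MAX_ERRNO);
	return (void *)(intptr_t)error;
}
```

In this model a NULL pointer passes the model_is_err() check, so a pointer
that is never assigned inside the loop would be reported as success. Only a
check in the style of model_is_err_or_null(), or an explicit test for the
"loop never ran" case, catches it.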

>  	if (page == NULL) {
>  		dev_err(xpc_part, "xpc_create_gru_mq_uv() failed to alloc %d "

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260702-alloc-trylock-v4-0-0af8ff387e80@google.com?part=11


* Re: [PATCH v4 13/18] mm: Remove __alloc_pages_node()
  2026-07-02  9:49 ` [PATCH v4 13/18] mm: Remove __alloc_pages_node() Brendan Jackman
@ 2026-07-02 11:11   ` sashiko-bot
  0 siblings, 0 replies; 23+ messages in thread
From: sashiko-bot @ 2026-07-02 11:11 UTC (permalink / raw)
  To: Brendan Jackman; +Cc: linux-kernel, linux-rt-devel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] The patch removes `__alloc_pages_node` because it is unsafe to use with `cpu_to_node()`, which can return `NUMA_NO_NODE` (-1) during CPU hotplug, but leaves the equivalent `__folio_alloc_node` untouched, which suffers from the exact same vulnerability.
--

commit 584eaae6f16d0eb43140d0b202665ffc9212c61e
Author: Brendan Jackman <jackmanb@google.com>

mm: Remove __alloc_pages_node()

This commit removes __alloc_pages_node() since it is prone to memory
corruption on CPU hotplug if callers pass NUMA_NO_NODE from cpu_to_node().
Removing it leaves the __* variants strictly for internal memory management.

> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 01d6d2591f49e..3bf55a5f9143e 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
[ ... ]
> @@ -256,21 +256,6 @@ static inline void warn_if_node_offline(int this_node, gfp_t gfp_mask)
>  	dump_stack();
>  }
>  
> -/*
> - * Allocate pages, preferring the node given as nid. The node must be valid and
> - * online. For more general interface, see alloc_pages_node().
> - */
> -static inline struct page *
> -__alloc_pages_node_noprof(int nid, gfp_t gfp_mask, unsigned int order)
> -{
> -	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
> -	warn_if_node_offline(nid, gfp_mask);
> -
> -	return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
> -}
> -
> -#define  __alloc_pages_node(...)		alloc_hooks(__alloc_pages_node_noprof(__VA_ARGS__))
> -
>  static inline
>  struct folio *__folio_alloc_node_noprof(gfp_t gfp, unsigned int order, int nid)
>  {

[Severity: High]
This isn't a bug introduced by this patch, but does __folio_alloc_node_noprof()
suffer from the exact same vulnerability that motivated removing
__alloc_pages_node_noprof()?

If NUMA_NO_NODE (-1) is passed here, it trips VM_BUG_ON(nid < 0) on debug
builds. On non-debug builds the check is compiled out, leading to
out-of-bounds array accesses when evaluating warn_if_node_offline() and
NODE_DATA(-1).

Looking at net/sunrpc/svc.c, there is a path that passes the result of
cpu_to_node() directly to __folio_alloc_node() without checking for
NUMA_NO_NODE:

net/sunrpc/svc.c:svc_prepare_thread() {
    ...
    node = svc_pool_map_get_node(pool->sp_id);
    rqstp->rq_scratch_folio = __folio_alloc_node(GFP_KERNEL, 0, node);
    ...
}

During concurrent CPU hotplug, svc_pool_map_get_node() can return NUMA_NO_NODE.
Should __folio_alloc_node() be removed or fixed in this patch as well to fully
eliminate this class of bug?
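
For illustration, the difference between the checked and unchecked entry
points can be modelled in userspace. This is a sketch under assumptions:
every model_* name and MODEL_MAX_NUMNODES is hypothetical, and the fallback
is written after the way alloc_pages_node() has, to my knowledge, resolved
NUMA_NO_NODE via numa_mem_id() in mainline before this series. It makes no
claim about what the fix for __folio_alloc_node() should look like.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NUMA_NO_NODE (-1)	/* same value as the kernel's NUMA_NO_NODE */
#define MODEL_MAX_NUMNODES 4	/* hypothetical; the real value is config-dependent */

/* Hypothetical stand-in for numa_mem_id(): pretend node 0 is local. */
static int model_numa_mem_id(void)
{
	return 0;
}

/* The precondition the __ variants rely on: nid is a usable array index. */
static bool model_nid_in_range(int nid)
{
	return nid >= 0 && nid < MODEL_MAX_NUMNODES;
}

/* Resolve NUMA_NO_NODE first, so nothing downstream indexes with -1. */
static int model_resolve_nid(int nid)
{
	if (nid == MODEL_NUMA_NO_NODE)
		nid = model_numa_mem_id();
	assert(model_nid_in_range(nid));	/* plays the role of VM_BUG_ON() */
	return nid;
}
```

A caller that might see NUMA_NO_NODE would go through something like
model_resolve_nid() rather than handing the raw value to a helper that
only asserts on it.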

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260702-alloc-trylock-v4-0-0af8ff387e80@google.com?part=13


* Re: [PATCH v4 18/18] mm: factor out can_spin_trylock()
  2026-07-02  9:49 ` [PATCH v4 18/18] mm: factor out can_spin_trylock() Brendan Jackman
@ 2026-07-02 12:28   ` sashiko-bot
  0 siblings, 0 replies; 23+ messages in thread
From: sashiko-bot @ 2026-07-02 12:28 UTC (permalink / raw)
  To: Brendan Jackman; +Cc: linux-kernel, linux-rt-devel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] The refactoring to deduplicate spin_trylock safety checks into can_spin_trylock() missed __free_frozen_pages(), leaving a pre-existing UP NMI corruption bug unpatched.
--

Note: The format of this report has been altered to provide a free-form
summary of the findings due to recitation restrictions.

commit a99a7a468132ab177e1233c7e2f5644523ace0a7
Author: Brendan Jackman <jackmanb@google.com>
Subject: mm: factor out can_spin_trylock()

This commit introduces can_spin_trylock() to deduplicate checks for whether 
the current context is safe for spin_trylock(). It updates 
alloc_nolock_allowed() in mm/page_alloc.c and __kmalloc_nolock_noprof() in 
mm/slub.c to use this newly factored helper.

Link: https://lore.kernel.org/all/397859cb-b127-4cc6-9c71-044afc99bf0c@kernel.org/

[Severity: High]

This isn't a bug introduced by this patch, but the deduplication refactoring 
appears to have missed an open-coded check in __free_frozen_pages() in 
mm/page_alloc.c. 

Could this open-coded check also be updated to use the new 
can_spin_trylock() helper to resolve a pre-existing UP NMI safety issue?

When __free_frozen_pages() handles FPI_TRYLOCK, it explicitly checks for
CONFIG_PREEMPT_RT and NMI/hardirq context. On UP systems where CONFIG_SMP 
is disabled, spin_trylock() and its variants are defined to unconditionally 
succeed because there is no actual lock state tracked.

If a caller invokes free_pages_nolock() from NMI context on a UP system,
the current RT-specific check does not trigger. Execution then proceeds to
free_one_page(), where spin_trylock_irqsave() falsely succeeds even if the
NMI interrupted another task that was actively modifying the free list under
the zone lock.

This allows concurrent modification of the free list, which can lead to 
corruption and kernel panics.

Updating the check in __free_frozen_pages() to use !can_spin_trylock()
would catch the UP NMI case and safely route the free to the lockless
add_page_to_zone_llist() path instead.
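
The gap between the two checks can be shown with a userspace model. This is
a sketch, not kernel code: struct ctx_model and both functions are
hypothetical names, with plain booleans standing in for
IS_ENABLED(CONFIG_PREEMPT_RT), IS_ENABLED(CONFIG_SMP), in_nmi() and
in_hardirq(). The second function follows the logic of can_spin_trylock()
as it appears in the patch. The first models the RT-only check that this
report describes in __free_frozen_pages().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of build config and execution context. */
struct ctx_model {
	bool preempt_rt;	/* IS_ENABLED(CONFIG_PREEMPT_RT) */
	bool smp;		/* IS_ENABLED(CONFIG_SMP) */
	bool in_nmi;		/* in_nmi() */
	bool in_hardirq;	/* in_hardirq() */
};

/* RT-only check: rejects NMI/hardirq on PREEMPT_RT, nothing else. */
static bool model_rt_only_check(const struct ctx_model *c)
{
	assert(c != NULL);
	return !(c->preempt_rt && (c->in_nmi || c->in_hardirq));
}

/* Same conditions as can_spin_trylock() in the patch. */
static bool model_can_spin_trylock(const struct ctx_model *c)
{
	assert(c != NULL);
	if (c->preempt_rt && (c->in_nmi || c->in_hardirq))
		return false;

	/* On UP, spin_trylock() always succeeds even when it is locked. */
	if (!c->smp && c->in_nmi)
		return false;

	return true;
}
```

For a non-RT UP build in NMI context, model_rt_only_check() returns true
while model_can_spin_trylock() returns false, which is the case described
above. For every other combination of the four inputs the two agree.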

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260702-alloc-trylock-v4-0-0af8ff387e80@google.com?part=18



Thread overview: 23+ messages
2026-07-02  9:49 [PATCH v4 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
2026-07-02  9:49 ` [PATCH v4 01/18] mm/page_alloc: rename ALLOC_TRYLOCK -> ALLOC_NOLOCK Brendan Jackman
2026-07-02  9:49 ` [PATCH v4 02/18] mm/page_alloc: some renames to clarify alloc_flags scopes Brendan Jackman
2026-07-02  9:49 ` [PATCH v4 03/18] mm: name some args in a function declaration Brendan Jackman
2026-07-02  9:49 ` [PATCH v4 04/18] mm: Split out internal page_alloc.h Brendan Jackman
2026-07-02  9:49 ` [PATCH v4 05/18] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof() Brendan Jackman
2026-07-02  9:49 ` [PATCH v4 06/18] mm/page_alloc: relax GFP WARN in nolock allocs Brendan Jackman
2026-07-02  9:49 ` [PATCH v4 07/18] mm: move some stuff to mm/page_alloc.h Brendan Jackman
2026-07-02 10:28   ` sashiko-bot
2026-07-02  9:49 ` [PATCH v4 08/18] perf/x86/intel: Use higher-level allocator API Brendan Jackman
2026-07-02  9:49 ` [PATCH v4 09/18] KVM: VMX: " Brendan Jackman
2026-07-02  9:49 ` [PATCH v4 10/18] x86/virt: " Brendan Jackman
2026-07-02  9:49 ` [PATCH v4 11/18] sgi-xp: " Brendan Jackman
2026-07-02 10:54   ` sashiko-bot
2026-07-02  9:49 ` [PATCH v4 12/18] net/funeth: Switch to " Brendan Jackman
2026-07-02  9:49 ` [PATCH v4 13/18] mm: Remove __alloc_pages_node() Brendan Jackman
2026-07-02 11:11   ` sashiko-bot
2026-07-02  9:49 ` [PATCH v4 14/18] mm: Move __alloc_pages() to mm/page_alloc.h Brendan Jackman
2026-07-02  9:49 ` [PATCH v4 15/18] mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG Brendan Jackman
2026-07-02  9:49 ` [PATCH v4 16/18] mm: remove the __GFP_NO_OBJ_EXT flag Brendan Jackman
2026-07-02  9:49 ` [PATCH v4 17/18] mm/page_alloc: drop alloc_flags arg from alloc_flags_cma() Brendan Jackman
2026-07-02  9:49 ` [PATCH v4 18/18] mm: factor out can_spin_trylock() Brendan Jackman
2026-07-02 12:28   ` sashiko-bot
