* [PATCH v5 00/18] mm: Some cleanups for page allocator APIs
@ 2026-07-03 12:31 Brendan Jackman
2026-07-03 12:31 ` [PATCH v5 01/18] mm/page_alloc: rename ALLOC_TRYLOCK -> ALLOC_NOLOCK Brendan Jackman
` (18 more replies)
0 siblings, 19 replies; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed, JP Kobryn, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
James Clark, Sean Christopherson, Paolo Bonzini, kvm,
Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Robin Holt, Steve Wahl, Arnd Bergmann,
Greg Kroah-Hartman, Dimitris Michailidis, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Based on mm-new.
This depends on moving alloc_tag to mm/:
https://lore.kernel.org/all/aj5QBtJcphPElczI@lucifer/
Some tweaks and cleanups for the page allocator's entrypoints and flags.
This is motivated by preparation for __GFP_UNMAPPED [1] (which will probably
become ALLOC_UNMAPPED in its next iteration), but all of this is intended
to be an improvement to the codebase in its own right: unifying code
paths, reducing API surface, and removing GFP flags.
[1] https://lore.kernel.org/all/20260320-page_alloc-unmapped-v2-0-28bf1bd54f41@google.com/
This started with unifying __alloc_frozen_pages[_nolock]_noprof() and
expanded from there.
Unifying the nolock allocator entrypoint with the normal allocator
entrypoint means adding an alloc_flags argument to the latter (which is
only exposed within mm/). If we add that alloc_flags argument a bit more
broadly to the allocator entrypoints, it also presents an opportunity to
remove some GFP flags.
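To make the unification idea concrete, here is a minimal userspace sketch under stated assumptions: a single internal body selects its nolock behaviour from alloc_flags, and the normal and nolock wrappers differ only in the flag they pass. The toy_* functions and struct toy_alloc_decision are hypothetical; the ALLOC_NOLOCK value is taken from mm/internal.h, and ALLOC_DEFAULT is assumed to be 0 because it was introduced to replace a literal 0. Only two of the ALLOC_NOLOCK checks from patch 01 are modelled (skipping should_fail_alloc_page() and only trylocking the zone lock); the real entrypoint does far more.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

#define ALLOC_DEFAULT 0x0u   /* assumed value: replaces a literal 0 */
#define ALLOC_NOLOCK  0x400u /* value as defined in mm/internal.h */

/* What the unified body would decide for a request (toy model only). */
struct toy_alloc_decision {
	bool may_inject_failure; /* consult should_fail_alloc_page()? */
	bool trylock_only;       /* only spin_trylock the zone lock? */
};

/*
 * Hypothetical stand-in for the single mm/-internal entrypoint: one body,
 * with the nolock behaviour selected by alloc_flags rather than by a
 * separate copy of the function.
 */
static struct toy_alloc_decision
toy_alloc_frozen_pages(gfp_t gfp, unsigned int order, int nid,
		       unsigned int alloc_flags)
{
	struct toy_alloc_decision d;

	(void)gfp;
	(void)order;
	(void)nid;
	d.trylock_only = alloc_flags & ALLOC_NOLOCK;
	/* Fault injection may need locks, so nolock callers skip it. */
	d.may_inject_failure = !d.trylock_only;
	return d;
}

/* Thin wrappers: the normal path passes ALLOC_DEFAULT... */
static struct toy_alloc_decision
toy_alloc_pages(gfp_t gfp, unsigned int order, int nid)
{
	return toy_alloc_frozen_pages(gfp, order, nid, ALLOC_DEFAULT);
}

/* ...and the nolock path (same argument order as the nolock API) passes ALLOC_NOLOCK. */
static struct toy_alloc_decision
toy_alloc_pages_nolock(gfp_t gfp, int nid, unsigned int order)
{
	return toy_alloc_frozen_pages(gfp, order, nid, ALLOC_NOLOCK);
}
```

For example, toy_alloc_pages_nolock() reports trylock_only set and fault injection skipped, while toy_alloc_pages() reports the opposite, even though both run the same body.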
To distinguish between mm-internal and "public" allocator entrypoints,
it makes sense to use the __ prefix. There are already some public APIs
with that prefix. For *alloc_pages*, just removing those variants seems
like a nice cleanup anyway, so do that. For get_free_pages, the "__"
variant is the _only_ variant and it's very widely used, so it doesn't
seem worthwhile to modify that. Therefore, scope this "__" change
specifically to the *alloc_pages* API, which means we leave the
*folio_alloc* API untouched too, even though that could probably be
cleaned up if so desired.
Tested:
- KVM, mm, and BPF selftests in a QEMU VM
- kunit.py on x86_64
- For the ALLOC_NO_CODETAG bits I just booted a VM and read
/proc/allocinfo. I confirmed that if I remove ALLOC_NO_CODETAG, the
kernel crashes in early boot, so I was at least booting code that
depends on this logic.
I used Google's internal version of Antigravity (an AI coding harness) to
do the repetitive bits; those commits are marked with Assisted-by. The
rest is manual.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
Changes in v5:
- Just trivial non-functional fixes.
- Link to v4: https://patch.msgid.link/20260702-alloc-trylock-v4-0-0af8ff387e80@google.com
Changes in v4:
- Fixed some (harmless) missing applications of ac->alloc_flags (local
Sashiko)
- Fixed various build issues.
- Note that Sashiko pointed out a KMSAN build issue [2]. I have
fixed it, but KMSAN builds are currently broken by objtool [3]. At least
mm/kmsan/init.c compiles.
[2] https://lore.kernel.org/all/20260629141642.628271F00A3D@smtp.kernel.org/
[3] https://lore.kernel.org/all/20260630104434.GC751831@noisy.programming.kicks-ass.net/t/#u
- Avoided setting ALLOC_NOFRAGMENT under ALLOC_NOLOCK (Sashiko, Harry)
- Added patch to tweak alloc_flags_cma() interface (Vlastimil)
- More commit message fixups (various)
- Added patch to create can_spin_trylock() (Harry)
- Link to v3: https://patch.msgid.link/20260629-alloc-trylock-v3-0-57bef0eadbc2@google.com
Changes in v3:
- Created mm/page_alloc.h
- Fixed EXPORT_SYMBOL() issues
- Reworded commit messages per Sashiko's pointers
- Dropped rename of alloc_flags arg in prepare_alloc_pages() (Suren)
- Renamed gfp_to_alloc_flags_nonblocking() too after rebasing onto:
https://lore.kernel.org/all/20260623004600.113347-1-jp.kobryn@linux.dev/
- Link to v2: https://patch.msgid.link/20260622-alloc-trylock-v2-0-31f31367d420@google.com
Changes in v2:
- Fixed up whitespace in nolock unification patch
- Introduced ALLOC_DEFAULT to replace literal 0 for alloc_flags
- All other patches are new
- Link to v1: https://patch.msgid.link/20260617-alloc-trylock-v1-1-83fd7858832e@google.com
---
Brendan Jackman (17):
mm/page_alloc: rename ALLOC_TRYLOCK -> ALLOC_NOLOCK
mm/page_alloc: some renames to clarify alloc_flags scopes
mm: name some args in a function declaration
mm: Split out internal page_alloc.h
mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
mm/page_alloc: relax GFP WARN in nolock allocs
mm: move some stuff to mm/page_alloc.h
perf/x86/intel: Use higher-level allocator API
KVM: VMX: Use higher-level allocator API
x86/virt: Use higher-level allocator API
sgi-xp: Use higher-level allocator API
net/funeth: Switch to higher-level allocator API
mm: Remove __alloc_pages_node()
mm: Move __alloc_pages() to mm/page_alloc.h
mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG
mm/page_alloc: drop alloc_flags arg from alloc_flags_cma()
mm: factor out can_spin_trylock()
Vlastimil Babka (SUSE) (1):
mm: remove the __GFP_NO_OBJ_EXT flag
Documentation/admin-guide/cgroup-v1/cpusets.rst | 2 +-
Documentation/admin-guide/mm/transhuge.rst | 2 +-
MAINTAINERS | 1 +
arch/x86/events/intel/ds.c | 6 +-
arch/x86/kvm/vmx/vmx.c | 2 +-
arch/x86/virt/hw.c | 2 +-
drivers/misc/sgi-xp/xpc_uv.c | 5 +-
drivers/net/ethernet/fungible/funeth/funeth_rx.c | 2 +-
include/linux/alloc_tag.h | 4 +-
include/linux/gfp.h | 54 +---
include/linux/gfp_types.h | 7 -
include/linux/skbuff.h | 2 +-
include/trace/events/mmflags.h | 10 +-
mm/alloc_tag.c | 23 +-
mm/compaction.c | 5 +-
mm/hugetlb.c | 4 +-
mm/internal.h | 275 ++------------------
mm/khugepaged.c | 1 +
mm/kmsan/init.c | 2 +-
mm/memory-failure.c | 1 +
mm/memory_hotplug.c | 1 +
mm/mempolicy.c | 11 +-
mm/migrate.c | 1 +
mm/mm_init.c | 1 +
mm/page_alloc.c | 269 ++++++++++---------
mm/page_alloc.h | 312 +++++++++++++++++++++++
mm/page_frag_cache.c | 6 +-
mm/page_isolation.c | 1 +
mm/page_owner.c | 2 +-
mm/page_reporting.c | 1 +
mm/show_mem.c | 1 +
mm/shuffle.c | 1 +
mm/slub.c | 17 +-
mm/swap.c | 1 +
mm/vmscan.c | 1 +
mm/vmstat.c | 1 +
tools/include/linux/gfp_types.h | 7 -
37 files changed, 536 insertions(+), 508 deletions(-)
---
base-commit: 32af3ff0925368eff29b2fed62f154150eb5dc10
change-id: 20260617-alloc-trylock-14ad37dab337
Best regards,
--
Brendan Jackman <jackmanb@google.com>
* [PATCH v5 01/18] mm/page_alloc: rename ALLOC_TRYLOCK -> ALLOC_NOLOCK
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 13:59 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 02/18] mm/page_alloc: some renames to clarify alloc_flags scopes Brendan Jackman
` (17 subsequent siblings)
18 siblings, 1 reply; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed
It's confusing that the function is called "nolock" but the flag is
called "trylock"; align them.
The function's terminology is more visible and has more mindshare, so use that.
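As the flag's comment says, ALLOC_NOLOCK means the allocation path only ever trylocks zone->lock and backs off when that fails, instead of spinning. The userspace sketch below illustrates that shape, modelled loosely on the check in rmqueue_bulk(). struct toy_zone and the toy_* helpers are hypothetical, and the C11 atomic_flag lock is only a stand-in for a kernel spinlock (there is no IRQ-disabling equivalent here).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define ALLOC_NOLOCK 0x400u /* value as defined in mm/internal.h */

/* Toy stand-in for a zone and its zone->lock; not kernel code. */
struct toy_zone {
	atomic_flag lock;
	unsigned long nr_free;
};

static bool toy_trylock(struct toy_zone *z)
{
	/* test_and_set returns the previous state: false means we took it. */
	return !atomic_flag_test_and_set(&z->lock);
}

static void toy_lock(struct toy_zone *z)
{
	while (atomic_flag_test_and_set(&z->lock))
		; /* spin until the holder releases it */
}

static void toy_unlock(struct toy_zone *z)
{
	atomic_flag_clear(&z->lock);
}

/*
 * Same shape as the ALLOC_NOLOCK check in rmqueue_bulk(): a nolock caller
 * never spins on the lock; on contention it gives up and takes 0 pages.
 */
static unsigned int toy_take_pages(struct toy_zone *z, unsigned int count,
				   unsigned int alloc_flags)
{
	unsigned int taken;

	if (alloc_flags & ALLOC_NOLOCK) {
		if (!toy_trylock(z))
			return 0;
	} else {
		toy_lock(z);
	}

	taken = count < z->nr_free ? count : (unsigned int)z->nr_free;
	z->nr_free -= taken;
	toy_unlock(z);
	return taken;
}
```

The point of the design is that a nolock caller (for example one running in a context where spinning could deadlock) sees a plain allocation failure rather than waiting on a lock it may never get.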
Suggested-by: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
Link: https://lore.kernel.org/linux-mm/2399b3ad-4eac-4a14-94c3-27e9f07972a1@kernel.org/
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
mm/internal.h | 2 +-
mm/page_alloc.c | 10 +++++-----
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index fa4fb69444ecd..a2b09a13735bf 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1480,7 +1480,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
#define ALLOC_NOFRAGMENT 0x0
#endif
#define ALLOC_HIGHATOMIC 0x200 /* Allows access to MIGRATE_HIGHATOMIC */
-#define ALLOC_TRYLOCK 0x400 /* Only use spin_trylock in allocation path */
+#define ALLOC_NOLOCK 0x400 /* Only use spin_trylock in allocation path */
#define ALLOC_KSWAPD 0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
/* Flags that allow allocations below the min watermark. */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 762d9b6bc792f..6004fe6583d47 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2530,7 +2530,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
unsigned long flags;
int i;
- if (unlikely(alloc_flags & ALLOC_TRYLOCK)) {
+ if (unlikely(alloc_flags & ALLOC_NOLOCK)) {
if (!spin_trylock_irqsave(&zone->lock, flags))
return 0;
} else {
@@ -3218,7 +3218,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
do {
page = NULL;
- if (unlikely(alloc_flags & ALLOC_TRYLOCK)) {
+ if (unlikely(alloc_flags & ALLOC_NOLOCK)) {
if (!spin_trylock_irqsave(&zone->lock, flags))
return NULL;
} else {
@@ -5059,7 +5059,7 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
* Don't invoke should_fail logic, since it may call
* get_random_u32() and printk() which need to spin_lock.
*/
- if (!(*alloc_flags & ALLOC_TRYLOCK) &&
+ if (!(*alloc_flags & ALLOC_NOLOCK) &&
should_fail_alloc_page(gfp_mask, order))
return false;
@@ -7804,7 +7804,7 @@ static bool cond_accept_memory(struct zone *zone, unsigned int order,
return false;
/* Bailout, since try_to_accept_memory_one() needs to take a lock */
- if (alloc_flags & ALLOC_TRYLOCK)
+ if (alloc_flags & ALLOC_NOLOCK)
return false;
wmark = promo_wmark_pages(zone);
@@ -7896,7 +7896,7 @@ struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned
*/
gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC | __GFP_COMP
| gfp_flags;
- unsigned int alloc_flags = ALLOC_TRYLOCK;
+ unsigned int alloc_flags = ALLOC_NOLOCK;
struct alloc_context ac = { };
struct page *page;
--
2.54.0
* [PATCH v5 02/18] mm/page_alloc: some renames to clarify alloc_flags scopes
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
2026-07-03 12:31 ` [PATCH v5 01/18] mm/page_alloc: rename ALLOC_TRYLOCK -> ALLOC_NOLOCK Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 14:01 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 03/18] mm: name some args in a function declaration Brendan Jackman
` (16 subsequent siblings)
18 siblings, 1 reply; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed, JP Kobryn
It's pretty confusing that:
- The slowpath and fastpath have a totally distinct set of alloc_flags.
- gfp_to_alloc_flags() sounds generic but it only influences the
slowpath.
Rename some variables to highlight which alloc_flags are
fastpath-specific. Rename gfp_to_alloc_flags() to highlight that it's
slowpath-specific.
gfp_to_alloc_flags_cma() and gfp_to_alloc_flags_nonblocking() currently
have perfectly harmless names, but to keep the naming consistent also
rename those to the alloc_flags_*() pattern (which already exists for
alloc_flags_nofragment()).
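A rough model of the two scopes, under simplified rules: the fast path starts from ALLOC_WMARK_LOW, gets ALLOC_CMA via alloc_flags_cma(), and takes only ALLOC_HIGHATOMIC from alloc_flags_nonblocking(), while alloc_flags_slowpath() starts from ALLOC_WMARK_MIN | ALLOC_CPUSET and takes everything. The TOY_GFP_* bits and the toy_* helpers are hypothetical; the ALLOC_* values are copied from mm/internal.h, with WMARK_MIN = 0 and WMARK_LOW = 1 assumed from enum zone_watermarks. The nonblocking rules here are a simplification, and the sketch omits alloc_flags_nofragment(), the cpuset handling in prepare_alloc_pages(), and the RT-task and reserve logic.

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Toy GFP bits: hypothetical values, NOT those in include/linux/gfp_types.h. */
#define TOY_GFP_HIGH           0x1u
#define TOY_GFP_DIRECT_RECLAIM 0x2u
#define TOY_GFP_KSWAPD_RECLAIM 0x4u
#define TOY_GFP_MOVABLE        0x8u

/* ALLOC_* values from mm/internal.h; the two WMARK values are assumed. */
#define ALLOC_WMARK_MIN   0x0u
#define ALLOC_WMARK_LOW   0x1u
#define ALLOC_NON_BLOCK   0x10u
#define ALLOC_MIN_RESERVE 0x20u
#define ALLOC_CPUSET      0x40u
#define ALLOC_CMA         0x80u
#define ALLOC_HIGHATOMIC  0x200u
#define ALLOC_KSWAPD      0x800u

/* Movable requests may use CMA areas. */
static unsigned int toy_alloc_flags_cma(gfp_t gfp, unsigned int alloc_flags)
{
	if (gfp & TOY_GFP_MOVABLE)
		alloc_flags |= ALLOC_CMA;
	return alloc_flags;
}

/* Simplified nonblocking rules; the real helper may differ in detail. */
static unsigned int toy_alloc_flags_nonblocking(gfp_t gfp, unsigned int order)
{
	unsigned int alloc_flags = 0;

	if (gfp & TOY_GFP_HIGH)
		alloc_flags |= ALLOC_MIN_RESERVE;
	if (!(gfp & TOY_GFP_DIRECT_RECLAIM)) {
		alloc_flags |= ALLOC_NON_BLOCK;
		if (order > 0 && (alloc_flags & ALLOC_MIN_RESERVE))
			alloc_flags |= ALLOC_HIGHATOMIC;
	}
	return alloc_flags;
}

/* Slowpath scope: min watermark, cpuset, kswapd, all nonblocking bits, CMA. */
static unsigned int toy_alloc_flags_slowpath(gfp_t gfp, unsigned int order)
{
	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;

	if (gfp & TOY_GFP_KSWAPD_RECLAIM)
		alloc_flags |= ALLOC_KSWAPD;
	alloc_flags |= toy_alloc_flags_nonblocking(gfp, order);
	return toy_alloc_flags_cma(gfp, alloc_flags);
}

/* Fastpath scope: low watermark, CMA, and only ALLOC_HIGHATOMIC borrowed. */
static unsigned int toy_fastpath_alloc_flags(gfp_t gfp, unsigned int order)
{
	unsigned int alloc_flags = toy_alloc_flags_cma(gfp, ALLOC_WMARK_LOW);

	alloc_flags |= toy_alloc_flags_nonblocking(gfp, order) & ALLOC_HIGHATOMIC;
	return alloc_flags;
}
```

For a high-priority, non-reclaiming order-1 request, the two scopes disagree on everything except ALLOC_HIGHATOMIC: the slow path additionally gets ALLOC_CPUSET, ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE, which is the kind of divergence the renames are meant to make visible.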
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Acked-by: JP Kobryn <jp.kobryn@linux.dev>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
include/linux/skbuff.h | 2 +-
mm/page_alloc.c | 28 ++++++++++++++--------------
2 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 22eda1d54a0e8..4431b026e429d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3573,7 +3573,7 @@ static inline struct page *__dev_alloc_pages_noprof(gfp_t gfp_mask,
* 3. If requesting a order 0 page it will not be compound
* due to the check to see if order has a value in prep_new_page
* 4. __GFP_MEMALLOC is ignored if __GFP_NOMEMALLOC is set due to
- * code in gfp_to_alloc_flags that should be enforcing this.
+ * code in alloc_flags_slowpath() that should be enforcing this.
*/
gfp_mask |= __GFP_COMP | __GFP_MEMALLOC;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6004fe6583d47..df1345cde301f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3774,8 +3774,8 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
}
/* Must be called after current_gfp_context() which can change gfp_mask */
-static inline unsigned int gfp_to_alloc_flags_cma(gfp_t gfp_mask,
- unsigned int alloc_flags)
+static inline unsigned int alloc_flags_cma(gfp_t gfp_mask,
+ unsigned int alloc_flags)
{
#ifdef CONFIG_CMA
if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
@@ -4474,7 +4474,7 @@ static void wake_all_kswapds(unsigned int order, gfp_t gfp_mask,
}
static inline unsigned int
-gfp_to_alloc_flags_nonblocking(gfp_t gfp_mask, unsigned int order)
+alloc_flags_nonblocking(gfp_t gfp_mask, unsigned int order)
{
unsigned int alloc_flags = 0;
@@ -4497,7 +4497,7 @@ gfp_to_alloc_flags_nonblocking(gfp_t gfp_mask, unsigned int order)
}
static inline unsigned int
-gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
+alloc_flags_slowpath(gfp_t gfp_mask, unsigned int order)
{
unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
@@ -4512,7 +4512,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
if (gfp_mask & __GFP_KSWAPD_RECLAIM)
alloc_flags |= ALLOC_KSWAPD;
- alloc_flags |= gfp_to_alloc_flags_nonblocking(gfp_mask, order);
+ alloc_flags |= alloc_flags_nonblocking(gfp_mask, order);
if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
/*
@@ -4525,7 +4525,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
} else if (unlikely(rt_or_dl_task(current)) && in_task())
alloc_flags |= ALLOC_MIN_RESERVE;
- alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, alloc_flags);
+ alloc_flags = alloc_flags_cma(gfp_mask, alloc_flags);
if (defrag_mode)
alloc_flags |= ALLOC_NOFRAGMENT;
@@ -4791,7 +4791,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* kswapd needs to be woken up, and to avoid the cost of setting up
* alloc_flags precisely. So we do that now.
*/
- alloc_flags = gfp_to_alloc_flags(gfp_mask, order);
+ alloc_flags = alloc_flags_slowpath(gfp_mask, order);
/*
* We need to recalculate the starting point for the zonelist iterator
@@ -4832,7 +4832,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
if (reserve_flags)
- alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, reserve_flags) |
+ alloc_flags = alloc_flags_cma(gfp_mask, reserve_flags) |
(alloc_flags & ALLOC_KSWAPD);
/*
@@ -5063,7 +5063,7 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
should_fail_alloc_page(gfp_mask, order))
return false;
- *alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, *alloc_flags);
+ *alloc_flags = alloc_flags_cma(gfp_mask, *alloc_flags);
/* Dirty zone balancing only done in the fast path */
ac->spread_dirty_pages = (gfp_mask & __GFP_WRITE);
@@ -5277,7 +5277,7 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
int preferred_nid, nodemask_t *nodemask)
{
struct page *page;
- unsigned int alloc_flags = ALLOC_WMARK_LOW;
+ unsigned int fastpath_alloc_flags = ALLOC_WMARK_LOW;
gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
struct alloc_context ac = { };
@@ -5299,18 +5299,18 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
gfp = current_gfp_context(gfp);
alloc_gfp = gfp;
if (!prepare_alloc_pages(gfp, order, preferred_nid, nodemask, &ac,
- &alloc_gfp, &alloc_flags))
+ &alloc_gfp, &fastpath_alloc_flags))
return NULL;
/*
* Forbid the first pass from falling back to types that fragment
* memory until all local zones are considered.
*/
- alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
- alloc_flags |= gfp_to_alloc_flags_nonblocking(gfp, order) & ALLOC_HIGHATOMIC;
+ fastpath_alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
+ fastpath_alloc_flags |= alloc_flags_nonblocking(gfp, order) & ALLOC_HIGHATOMIC;
/* First allocation attempt */
- page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
+ page = get_page_from_freelist(alloc_gfp, order, fastpath_alloc_flags, &ac);
if (likely(page))
goto out;
--
2.54.0
* [PATCH v5 03/18] mm: name some args in a function declaration
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
2026-07-03 12:31 ` [PATCH v5 01/18] mm/page_alloc: rename ALLOC_TRYLOCK -> ALLOC_NOLOCK Brendan Jackman
2026-07-03 12:31 ` [PATCH v5 02/18] mm/page_alloc: some renames to clarify alloc_flags scopes Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 14:02 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 04/18] mm: Split out internal page_alloc.h Brendan Jackman
` (15 subsequent siblings)
18 siblings, 1 reply; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed
Checkpatch complains about the unnamed arguments in this declaration.
A later patch moves the code, so fix it now to keep checkpatch from
complaining about that patch. Do it in a separate patch so that the
"move the code" patch is trivial to review using Git's diff colouring.
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
mm/internal.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index a2b09a13735bf..1e252678bbc91 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -919,8 +919,8 @@ extern bool free_pages_prepare(struct page *page, unsigned int order);
extern int user_min_free_kbytes;
-struct page *__alloc_frozen_pages_noprof(gfp_t, unsigned int order, int nid,
- nodemask_t *);
+struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order, int nid,
+ nodemask_t *nodemask);
#define __alloc_frozen_pages(...) \
alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__))
void free_frozen_pages(struct page *page, unsigned int order);
--
2.54.0
* [PATCH v5 04/18] mm: Split out internal page_alloc.h
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
` (2 preceding siblings ...)
2026-07-03 12:31 ` [PATCH v5 03/18] mm: name some args in a function declaration Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 14:07 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 05/18] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof() Brendan Jackman
` (14 subsequent siblings)
18 siblings, 1 reply; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed
internal.h is a bit bloated; it seems like time for a page_alloc.h.
Where it wasn't obvious, the heuristic for deciding what goes into this
new header was "does it support/correspond to a definition in
mm/page_alloc.c?"
The new header only needs to be included from ~20 .c files out of ~150,
so this does seem like a genuine reduction in scope, which is nice. And
there's no circular internal.h<->page_alloc.h dependency yet, so it seems
worthwhile to split this up before one inevitably emerges!
Suggested-by: "David Hildenbrand (Arm)" <david@kernel.org>
Link: https://lore.kernel.org/all/41e92bab-6882-401a-8de9-154adbdcfb36@kernel.org/
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
MAINTAINERS | 1 +
mm/compaction.c | 1 +
mm/hugetlb.c | 1 +
mm/internal.h | 252 -----------------------------------------------
mm/khugepaged.c | 1 +
mm/kmsan/init.c | 2 +-
mm/memory-failure.c | 1 +
mm/memory_hotplug.c | 1 +
mm/mempolicy.c | 1 +
mm/migrate.c | 1 +
mm/mm_init.c | 1 +
mm/page_alloc.c | 1 +
mm/page_alloc.h | 269 +++++++++++++++++++++++++++++++++++++++++++++++++++
mm/page_frag_cache.c | 2 +-
mm/page_isolation.c | 1 +
mm/page_owner.c | 2 +-
mm/page_reporting.c | 1 +
mm/show_mem.c | 1 +
mm/shuffle.c | 1 +
mm/slub.c | 1 +
mm/swap.c | 1 +
mm/vmscan.c | 1 +
22 files changed, 289 insertions(+), 255 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 29c302e9c17ba..b359ff4e0a1a6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17171,6 +17171,7 @@ F: mm/debug_page_alloc.c
F: mm/debug_page_ref.c
F: mm/fail_page_alloc.c
F: mm/page_alloc.c
+F: mm/page_alloc.h
F: mm/page_ext.c
F: mm/page_frag_cache.c
F: mm/page_isolation.c
diff --git a/mm/compaction.c b/mm/compaction.c
index f08765ade014c..7d80735502d9a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -24,6 +24,7 @@
#include <linux/page_owner.h>
#include <linux/psi.h>
#include <linux/cpuset.h>
+#include "page_alloc.h"
#include "internal.h"
#ifdef CONFIG_COMPACTION
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 391739ca7f711..0f51b36773f59 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -47,6 +47,7 @@
#include <linux/node.h>
#include <linux/page_owner.h>
#include "internal.h"
+#include "page_alloc.h"
#include "hugetlb_vmemmap.h"
#include "hugetlb_cma.h"
#include "hugetlb_internal.h"
diff --git a/mm/internal.h b/mm/internal.h
index 1e252678bbc91..7e3b2386e274b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -658,165 +658,6 @@ extern int defrag_mode;
void setup_per_zone_wmarks(void);
void calculate_min_free_kbytes(void);
int __meminit init_per_zone_wmark_min(void);
-void page_alloc_sysctl_init(void);
-
-/*
- * Structure for holding the mostly immutable allocation parameters passed
- * between functions involved in allocations, including the alloc_pages*
- * family of functions.
- *
- * nodemask, migratetype and highest_zoneidx are initialized only once in
- * __alloc_pages() and then never change.
- *
- * zonelist, preferred_zone and highest_zoneidx are set first in
- * __alloc_pages() for the fast path, and might be later changed
- * in __alloc_pages_slowpath(). All other functions pass the whole structure
- * by a const pointer.
- */
-struct alloc_context {
- struct zonelist *zonelist;
- const nodemask_t *nodemask;
- struct zoneref *preferred_zoneref;
- int migratetype;
-
- /*
- * highest_zoneidx represents highest usable zone index of
- * the allocation request. Due to the nature of the zone,
- * memory on lower zone than the highest_zoneidx will be
- * protected by lowmem_reserve[highest_zoneidx].
- *
- * highest_zoneidx is also used by reclaim/compaction to limit
- * the target zone since higher zone than this index cannot be
- * usable for this allocation request.
- */
- enum zone_type highest_zoneidx;
- bool spread_dirty_pages;
-};
-
-/*
- * This function returns the order of a free page in the buddy system. In
- * general, page_zone(page)->lock must be held by the caller to prevent the
- * page from being allocated in parallel and returning garbage as the order.
- * If a caller does not hold page_zone(page)->lock, it must guarantee that the
- * page cannot be allocated or merged in parallel. Alternatively, it must
- * handle invalid values gracefully, and use buddy_order_unsafe() below.
- */
-static inline unsigned int buddy_order(struct page *page)
-{
- /* PageBuddy() must be checked by the caller */
- return page_private(page);
-}
-
-/*
- * Like buddy_order(), but for callers who cannot afford to hold the zone lock.
- * PageBuddy() should be checked first by the caller to minimize race window,
- * and invalid values must be handled gracefully.
- *
- * READ_ONCE is used so that if the caller assigns the result into a local
- * variable and e.g. tests it for valid range before using, the compiler cannot
- * decide to remove the variable and inline the page_private(page) multiple
- * times, potentially observing different values in the tests and the actual
- * use of the result.
- */
-#define buddy_order_unsafe(page) READ_ONCE(page_private(page))
-
-/*
- * This function checks whether a page is free && is the buddy
- * we can coalesce a page and its buddy if
- * (a) the buddy is not in a hole (check before calling!) &&
- * (b) the buddy is in the buddy system &&
- * (c) a page and its buddy have the same order &&
- * (d) a page and its buddy are in the same zone.
- *
- * For recording whether a page is in the buddy system, we set PageBuddy.
- * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
- *
- * For recording page's order, we use page_private(page).
- */
-static inline bool page_is_buddy(struct page *page, struct page *buddy,
- unsigned int order)
-{
- if (!page_is_guard(buddy) && !PageBuddy(buddy))
- return false;
-
- if (buddy_order(buddy) != order)
- return false;
-
- /*
- * zone check is done late to avoid uselessly calculating
- * zone/node ids for pages that could never merge.
- */
- if (page_zone_id(page) != page_zone_id(buddy))
- return false;
-
- VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
-
- return true;
-}
-
-/*
- * Locate the struct page for both the matching buddy in our
- * pair (buddy1) and the combined O(n+1) page they form (page).
- *
- * 1) Any buddy B1 will have an order O twin B2 which satisfies
- * the following equation:
- * B2 = B1 ^ (1 << O)
- * For example, if the starting buddy (buddy2) is #8 its order
- * 1 buddy is #10:
- * B2 = 8 ^ (1 << 1) = 8 ^ 2 = 10
- *
- * 2) Any buddy B will have an order O+1 parent P which
- * satisfies the following equation:
- * P = B & ~(1 << O)
- *
- * Assumption: *_mem_map is contiguous at least up to MAX_PAGE_ORDER
- */
-static inline unsigned long
-__find_buddy_pfn(unsigned long page_pfn, unsigned int order)
-{
- return page_pfn ^ (1 << order);
-}
-
-/*
- * Find the buddy of @page and validate it.
- * @page: The input page
- * @pfn: The pfn of the page, it saves a call to page_to_pfn() when the
- * function is used in the performance-critical __free_one_page().
- * @order: The order of the page
- * @buddy_pfn: The output pointer to the buddy pfn, it also saves a call to
- * page_to_pfn().
- *
- * The found buddy can be a non PageBuddy, out of @page's zone, or its order is
- * not the same as @page. The validation is necessary before use it.
- *
- * Return: the found buddy page or NULL if not found.
- */
-static inline struct page *find_buddy_page_pfn(struct page *page,
- unsigned long pfn, unsigned int order, unsigned long *buddy_pfn)
-{
- unsigned long __buddy_pfn = __find_buddy_pfn(pfn, order);
- struct page *buddy;
-
- buddy = page + (__buddy_pfn - pfn);
- if (buddy_pfn)
- *buddy_pfn = __buddy_pfn;
-
- if (page_is_buddy(page, buddy, order))
- return buddy;
- return NULL;
-}
-
-extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
- unsigned long end_pfn, struct zone *zone);
-
-static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
- unsigned long end_pfn, struct zone *zone)
-{
- if (zone->contiguous)
- return pfn_to_page(start_pfn);
-
- return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
-}
void set_zone_contiguous(struct zone *zone);
bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
@@ -831,8 +672,6 @@ extern int __isolate_free_page(struct page *page, unsigned int order);
extern void __putback_isolated_page(struct page *page, unsigned int order,
int mt);
extern void memblock_free_pages(unsigned long pfn, unsigned int order);
-extern void __free_pages_core(struct page *page, unsigned int order,
- enum meminit_context context);
/*
* This will have no effect, other than possibly generating a warning, if the
@@ -914,40 +753,6 @@ static inline void init_compound_tail(struct page *tail,
prep_compound_tail(tail, head, order);
}
-void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags);
-extern bool free_pages_prepare(struct page *page, unsigned int order);
-
-extern int user_min_free_kbytes;
-
-struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order, int nid,
- nodemask_t *nodemask);
-#define __alloc_frozen_pages(...) \
- alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__))
-void free_frozen_pages(struct page *page, unsigned int order);
-void free_unref_folios(struct folio_batch *fbatch);
-
-#ifdef CONFIG_NUMA
-struct page *alloc_frozen_pages_noprof(gfp_t, unsigned int order);
-#else
-static inline struct page *alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order)
-{
- return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL);
-}
-#endif
-
-#define alloc_frozen_pages(...) \
- alloc_hooks(alloc_frozen_pages_noprof(__VA_ARGS__))
-
-struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned int order);
-#define alloc_frozen_pages_nolock(...) \
- alloc_hooks(alloc_frozen_pages_nolock_noprof(__VA_ARGS__))
-void free_frozen_pages_nolock(struct page *page, unsigned int order);
-
-extern void zone_pcp_reset(struct zone *zone);
-extern void zone_pcp_disable(struct zone *zone);
-extern void zone_pcp_enable(struct zone *zone);
-extern void zone_pcp_init(struct zone *zone);
-
extern void *memmap_alloc(phys_addr_t size, phys_addr_t align,
phys_addr_t min_addr,
int nid, bool exact_nid);
@@ -1101,23 +906,6 @@ static inline void init_cma_pageblock(struct page *page)
}
#endif
-enum fallback_result {
- /* Found suitable migratetype, *mt_out is valid. */
- FALLBACK_FOUND,
- /* No fallback found in requested order. */
- FALLBACK_EMPTY,
- /* Passed @claimable, but claiming whole block is a bad idea. */
- FALLBACK_NOCLAIM,
-};
-enum fallback_result
-find_suitable_fallback(struct free_area *area, unsigned int order,
- int migratetype, bool claimable, int *mt_out);
-
-static inline bool free_area_empty(struct free_area *area, int migratetype)
-{
- return list_empty(&area->free_list[migratetype]);
-}
-
/* mm/util.c */
struct anon_vma *folio_anon_vma(const struct folio *folio);
@@ -1445,46 +1233,6 @@ extern unsigned long __must_check vm_mmap_pgoff(struct file *, unsigned long,
unsigned long reclaim_pages(struct list_head *folio_list);
unsigned int reclaim_clean_pages_from_list(struct zone *zone,
struct list_head *folio_list);
-/* The ALLOC_WMARK bits are used as an index to zone->watermark */
-#define ALLOC_WMARK_MIN WMARK_MIN
-#define ALLOC_WMARK_LOW WMARK_LOW
-#define ALLOC_WMARK_HIGH WMARK_HIGH
-#define ALLOC_NO_WATERMARKS 0x04 /* don't check watermarks at all */
-
-/* Mask to get the watermark bits */
-#define ALLOC_WMARK_MASK (ALLOC_NO_WATERMARKS-1)
-
-/*
- * Only MMU archs have async oom victim reclaim - aka oom_reaper so we
- * cannot assume a reduced access to memory reserves is sufficient for
- * !MMU
- */
-#ifdef CONFIG_MMU
-#define ALLOC_OOM 0x08
-#else
-#define ALLOC_OOM ALLOC_NO_WATERMARKS
-#endif
-
-#define ALLOC_NON_BLOCK 0x10 /* Caller cannot block. Allow access
- * to 25% of the min watermark or
- * 62.5% if __GFP_HIGH is set.
- */
-#define ALLOC_MIN_RESERVE 0x20 /* __GFP_HIGH set. Allow access to 50%
- * of the min watermark.
- */
-#define ALLOC_CPUSET 0x40 /* check for correct cpuset */
-#define ALLOC_CMA 0x80 /* allow allocations from CMA areas */
-#ifdef CONFIG_ZONE_DMA32
-#define ALLOC_NOFRAGMENT 0x100 /* avoid mixing pageblock types */
-#else
-#define ALLOC_NOFRAGMENT 0x0
-#endif
-#define ALLOC_HIGHATOMIC 0x200 /* Allows access to MIGRATE_HIGHATOMIC */
-#define ALLOC_NOLOCK 0x400 /* Only use spin_trylock in allocation path */
-#define ALLOC_KSWAPD 0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
-
-/* Flags that allow allocations below the min watermark. */
-#define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
enum ttu_flags;
struct tlbflush_unmap_batch;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 617bca76db49b..58e14d1543ecb 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -26,6 +26,7 @@
#include <asm/tlb.h>
#include "internal.h"
+#include "page_alloc.h"
#include "mm_slot.h"
enum scan_result {
diff --git a/mm/kmsan/init.c b/mm/kmsan/init.c
index b14ce3417e65e..4983b6e9f7c99 100644
--- a/mm/kmsan/init.c
+++ b/mm/kmsan/init.c
@@ -13,7 +13,7 @@
#include <linux/mm.h>
#include <linux/memblock.h>
-#include "../internal.h"
+#include "../page_alloc.h"
#define NUM_FUTURE_RANGES 128
struct start_end_pair {
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 4916ab1453257..bf717ec595087 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -66,6 +66,7 @@
#include <trace/events/memory-failure.h>
#include "swap.h"
+#include "page_alloc.h"
#include "internal.h"
static int sysctl_memory_failure_early_kill __read_mostly;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8b137328dcf01..11ab2f7bc7f3b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -40,6 +40,7 @@
#include <asm/tlbflush.h>
#include "internal.h"
+#include "page_alloc.h"
#include "shuffle.h"
enum {
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bba65898aee17..948264407dee3 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -119,6 +119,7 @@
#include <linux/memory.h>
#include "internal.h"
+#include "page_alloc.h"
/* Internal flags */
#define MPOL_MF_DISCONTIG_OK (MPOL_MF_INTERNAL << 0) /* Skip checks for continuous vmas */
diff --git a/mm/migrate.c b/mm/migrate.c
index a786549551e3d..db50e7b66fbf8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -49,6 +49,7 @@
#include <trace/events/migrate.h>
#include "internal.h"
+#include "page_alloc.h"
#include "swap.h"
static const struct movable_operations *offline_movable_ops;
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 07a8c74cf7ade..537664974ab1c 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -33,6 +33,7 @@
#include <linux/kexec_handover.h>
#include <linux/hugetlb.h>
#include "internal.h"
+#include "page_alloc.h"
#include "slab.h"
#include "shuffle.h"
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index df1345cde301f..85cee8a0031f2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -56,6 +56,7 @@
#include <linux/pgalloc_tag.h>
#include <asm/div64.h>
#include "internal.h"
+#include "page_alloc.h"
#include "shuffle.h"
#include "page_reporting.h"
diff --git a/mm/page_alloc.h b/mm/page_alloc.h
new file mode 100644
index 0000000000000..3250d44f96457
--- /dev/null
+++ b/mm/page_alloc.h
@@ -0,0 +1,269 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * mm-internal API for the page (buddy) allocator. Public API lives in
+ * include/linux/gfp.h.
+ */
+#ifndef __MM_PAGE_ALLOC_H
+#define __MM_PAGE_ALLOC_H
+
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/nodemask.h>
+#include <linux/types.h>
+
+/* The ALLOC_WMARK bits are used as an index to zone->watermark */
+#define ALLOC_WMARK_MIN WMARK_MIN
+#define ALLOC_WMARK_LOW WMARK_LOW
+#define ALLOC_WMARK_HIGH WMARK_HIGH
+#define ALLOC_NO_WATERMARKS 0x04 /* don't check watermarks at all */
+
+/* Mask to get the watermark bits */
+#define ALLOC_WMARK_MASK (ALLOC_NO_WATERMARKS-1)
+
+/*
+ * Only MMU archs have async oom victim reclaim - aka oom_reaper so we
+ * cannot assume a reduced access to memory reserves is sufficient for
+ * !MMU
+ */
+#ifdef CONFIG_MMU
+#define ALLOC_OOM 0x08
+#else
+#define ALLOC_OOM ALLOC_NO_WATERMARKS
+#endif
+
+#define ALLOC_NON_BLOCK 0x10 /* Caller cannot block. Allow access
+ * to 25% of the min watermark or
+ * 62.5% if __GFP_HIGH is set.
+ */
+#define ALLOC_MIN_RESERVE 0x20 /* __GFP_HIGH set. Allow access to 50%
+ * of the min watermark.
+ */
+#define ALLOC_CPUSET 0x40 /* check for correct cpuset */
+#define ALLOC_CMA 0x80 /* allow allocations from CMA areas */
+#ifdef CONFIG_ZONE_DMA32
+#define ALLOC_NOFRAGMENT 0x100 /* avoid mixing pageblock types */
+#else
+#define ALLOC_NOFRAGMENT 0x0
+#endif
+#define ALLOC_HIGHATOMIC 0x200 /* Allows access to MIGRATE_HIGHATOMIC */
+#define ALLOC_NOLOCK 0x400 /* Only use spin_trylock in allocation path */
+#define ALLOC_KSWAPD 0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
+
+/* Flags that allow allocations below the min watermark. */
+#define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
+
+/*
+ * Structure for holding the mostly immutable allocation parameters passed
+ * between functions involved in allocations, including the alloc_pages*
+ * family of functions.
+ *
+ * nodemask, migratetype and highest_zoneidx are initialized only once in
+ * __alloc_pages() and then never change.
+ *
+ * zonelist, preferred_zone and highest_zoneidx are set first in
+ * __alloc_pages() for the fast path, and might be later changed
+ * in __alloc_pages_slowpath(). All other functions pass the whole structure
+ * by a const pointer.
+ */
+struct alloc_context {
+ struct zonelist *zonelist;
+ const nodemask_t *nodemask;
+ struct zoneref *preferred_zoneref;
+ int migratetype;
+
+ /*
+ * highest_zoneidx represents highest usable zone index of
+ * the allocation request. Due to the nature of the zone,
+ * memory on lower zone than the highest_zoneidx will be
+ * protected by lowmem_reserve[highest_zoneidx].
+ *
+ * highest_zoneidx is also used by reclaim/compaction to limit
+ * the target zone since higher zone than this index cannot be
+ * usable for this allocation request.
+ */
+ enum zone_type highest_zoneidx;
+ bool spread_dirty_pages;
+};
+
+/*
+ * This function returns the order of a free page in the buddy system. In
+ * general, page_zone(page)->lock must be held by the caller to prevent the
+ * page from being allocated in parallel and returning garbage as the order.
+ * If a caller does not hold page_zone(page)->lock, it must guarantee that the
+ * page cannot be allocated or merged in parallel. Alternatively, it must
+ * handle invalid values gracefully, and use buddy_order_unsafe() below.
+ */
+static inline unsigned int buddy_order(struct page *page)
+{
+ /* PageBuddy() must be checked by the caller */
+ return page_private(page);
+}
+
+/*
+ * Like buddy_order(), but for callers who cannot afford to hold the zone lock.
+ * PageBuddy() should be checked first by the caller to minimize race window,
+ * and invalid values must be handled gracefully.
+ *
+ * READ_ONCE is used so that if the caller assigns the result into a local
+ * variable and e.g. tests it for valid range before using, the compiler cannot
+ * decide to remove the variable and inline the page_private(page) multiple
+ * times, potentially observing different values in the tests and the actual
+ * use of the result.
+ */
+#define buddy_order_unsafe(page) READ_ONCE(page_private(page))
+
+/*
+ * This function checks whether a page is free && is the buddy
+ * we can coalesce a page and its buddy if
+ * (a) the buddy is not in a hole (check before calling!) &&
+ * (b) the buddy is in the buddy system &&
+ * (c) a page and its buddy have the same order &&
+ * (d) a page and its buddy are in the same zone.
+ *
+ * For recording whether a page is in the buddy system, we set PageBuddy.
+ * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
+ *
+ * For recording page's order, we use page_private(page).
+ */
+static inline bool page_is_buddy(struct page *page, struct page *buddy,
+ unsigned int order)
+{
+ if (!page_is_guard(buddy) && !PageBuddy(buddy))
+ return false;
+
+ if (buddy_order(buddy) != order)
+ return false;
+
+ /*
+ * zone check is done late to avoid uselessly calculating
+ * zone/node ids for pages that could never merge.
+ */
+ if (page_zone_id(page) != page_zone_id(buddy))
+ return false;
+
+ VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
+
+ return true;
+}
+
+/*
+ * Locate the struct page for both the matching buddy in our
+ * pair (buddy1) and the combined O(n+1) page they form (page).
+ *
+ * 1) Any buddy B1 will have an order O twin B2 which satisfies
+ * the following equation:
+ * B2 = B1 ^ (1 << O)
+ * For example, if the starting buddy (buddy2) is #8 its order
+ * 1 buddy is #10:
+ * B2 = 8 ^ (1 << 1) = 8 ^ 2 = 10
+ *
+ * 2) Any buddy B will have an order O+1 parent P which
+ * satisfies the following equation:
+ * P = B & ~(1 << O)
+ *
+ * Assumption: *_mem_map is contiguous at least up to MAX_PAGE_ORDER
+ */
+static inline unsigned long
+__find_buddy_pfn(unsigned long page_pfn, unsigned int order)
+{
+ return page_pfn ^ (1 << order);
+}
+
+/*
+ * Find the buddy of @page and validate it.
+ * @page: The input page
+ * @pfn: The pfn of the page, it saves a call to page_to_pfn() when the
+ * function is used in the performance-critical __free_one_page().
+ * @order: The order of the page
+ * @buddy_pfn: The output pointer to the buddy pfn, it also saves a call to
+ * page_to_pfn().
+ *
+ * The found buddy can be a non PageBuddy, out of @page's zone, or its order is
+ * not the same as @page. The validation is necessary before use it.
+ *
+ * Return: the found buddy page or NULL if not found.
+ */
+static inline struct page *find_buddy_page_pfn(struct page *page,
+ unsigned long pfn, unsigned int order, unsigned long *buddy_pfn)
+{
+ unsigned long __buddy_pfn = __find_buddy_pfn(pfn, order);
+ struct page *buddy;
+
+ buddy = page + (__buddy_pfn - pfn);
+ if (buddy_pfn)
+ *buddy_pfn = __buddy_pfn;
+
+ if (page_is_buddy(page, buddy, order))
+ return buddy;
+ return NULL;
+}
+
+extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
+ unsigned long end_pfn, struct zone *zone);
+
+static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
+ unsigned long end_pfn, struct zone *zone)
+{
+ if (zone->contiguous)
+ return pfn_to_page(start_pfn);
+
+ return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
+}
+
+extern void __free_pages_core(struct page *page, unsigned int order,
+ enum meminit_context context);
+
+void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags);
+extern bool free_pages_prepare(struct page *page, unsigned int order);
+
+extern int user_min_free_kbytes;
+
+struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order, int nid,
+ nodemask_t *nodemask);
+#define __alloc_frozen_pages(...) \
+ alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__))
+void free_frozen_pages(struct page *page, unsigned int order);
+void free_unref_folios(struct folio_batch *fbatch);
+
+#ifdef CONFIG_NUMA
+struct page *alloc_frozen_pages_noprof(gfp_t, unsigned int order);
+#else
+static inline struct page *alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order)
+{
+ return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL);
+}
+#endif
+
+#define alloc_frozen_pages(...) \
+ alloc_hooks(alloc_frozen_pages_noprof(__VA_ARGS__))
+
+struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned int order);
+#define alloc_frozen_pages_nolock(...) \
+ alloc_hooks(alloc_frozen_pages_nolock_noprof(__VA_ARGS__))
+void free_frozen_pages_nolock(struct page *page, unsigned int order);
+
+extern void zone_pcp_reset(struct zone *zone);
+extern void zone_pcp_disable(struct zone *zone);
+extern void zone_pcp_enable(struct zone *zone);
+extern void zone_pcp_init(struct zone *zone);
+
+enum fallback_result {
+ /* Found suitable migratetype, *mt_out is valid. */
+ FALLBACK_FOUND,
+ /* No fallback found in requested order. */
+ FALLBACK_EMPTY,
+ /* Passed @claimable, but claiming whole block is a bad idea. */
+ FALLBACK_NOCLAIM,
+};
+enum fallback_result
+find_suitable_fallback(struct free_area *area, unsigned int order,
+ int migratetype, bool claimable, int *mt_out);
+
+static inline bool free_area_empty(struct free_area *area, int migratetype)
+{
+ return list_empty(&area->free_list[migratetype]);
+}
+
+void page_alloc_sysctl_init(void);
+
+#endif /* __MM_PAGE_ALLOC_H */
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index d2423f30577e4..a1077cef3a791 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -18,7 +18,7 @@
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/page_frag_cache.h>
-#include "internal.h"
+#include "page_alloc.h"
static unsigned long encoded_page_create(struct page *page, unsigned int order,
bool pfmemalloc)
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 32ce8a7d9df35..e5dfc7bf49446 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -11,6 +11,7 @@
#include <linux/page_owner.h>
#include <linux/migrate.h>
#include "internal.h"
+#include "page_alloc.h"
#define CREATE_TRACE_POINTS
#include <trace/events/page_isolation.h>
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 26d6ab6530ce0..e399ebed27234 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -13,7 +13,7 @@
#include <linux/memcontrol.h>
#include <linux/sched/clock.h>
-#include "internal.h"
+#include "page_alloc.h"
/*
* TODO: teach PAGE_OWNER_STACK_DEPTH (__dump_page_owner and save_stack)
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 7418f2e500bb4..c7325704c3202 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -8,6 +8,7 @@
#include <linux/delay.h>
#include <linux/scatterlist.h>
+#include "page_alloc.h"
#include "page_reporting.h"
#include "internal.h"
diff --git a/mm/show_mem.c b/mm/show_mem.c
index 1b721a8ade67d..d1288b4c2b640 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -16,6 +16,7 @@
#include <linux/vmstat.h>
#include "internal.h"
+#include "page_alloc.h"
#include "swap.h"
atomic_long_t _totalram_pages __read_mostly;
diff --git a/mm/shuffle.c b/mm/shuffle.c
index fb1393b8b3a9d..82a2c7725a08a 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -7,6 +7,7 @@
#include <linux/random.h>
#include <linux/moduleparam.h>
#include "internal.h"
+#include "page_alloc.h"
#include "shuffle.h"
DEFINE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
diff --git a/mm/slub.c b/mm/slub.c
index 9ec774dc70096..877021e69cc41 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -53,6 +53,7 @@
#include <trace/events/kmem.h>
#include "internal.h"
+#include "page_alloc.h"
/*
* Lock order:
diff --git a/mm/swap.c b/mm/swap.c
index 58e4eff698cc4..d25131305c94c 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -39,6 +39,7 @@
#include <linux/buffer_head.h>
#include "internal.h"
+#include "page_alloc.h"
#define CREATE_TRACE_POINTS
#include <trace/events/pagemap.h>
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 56fe5393f30f8..1474a7234ea16 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -66,6 +66,7 @@
#include <linux/sched/sysctl.h>
#include "internal.h"
+#include "page_alloc.h"
#include "swap.h"
#define CREATE_TRACE_POINTS
--
2.54.0
* [PATCH v5 05/18] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
` (3 preceding siblings ...)
2026-07-03 12:31 ` [PATCH v5 04/18] mm: Split out internal page_alloc.h Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 14:42 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 06/18] mm/page_alloc: relax GFP WARN in nolock allocs Brendan Jackman
` (13 subsequent siblings)
18 siblings, 1 reply; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed
Currently the core allocator code is controlled by ALLOC_NOLOCK, but the
nolock entry point, alloc_frozen_pages_nolock_noprof(), is significantly
different from the normal __alloc_frozen_pages_noprof(), which makes the
code tiring to read.
Plumb the ALLOC_NOLOCK control one layer up in the call stack: add an
alloc_flags argument to __alloc_frozen_pages_noprof() (which is only
exposed to mm/), then turn the nolock variant into a thin wrapper that
just sets that flag (and handles NUMA_NO_NODE, similar to some of the
wrappers in gfp.h).
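A minimal user-space sketch of this wrapper shape, for illustration only. It is not kernel code: model_alloc_core(), model_alloc_nolock() and model_numa_node_id() are hypothetical stand-ins for __alloc_frozen_pages_noprof(), alloc_frozen_pages_nolock_noprof() and numa_node_id(); only the ALLOC_DEFAULT and ALLOC_NOLOCK values are taken from mm/page_alloc.h.

```c
#include <assert.h>

/* ALLOC_DEFAULT and ALLOC_NOLOCK use the values from mm/page_alloc.h. */
#define NUMA_NO_NODE  (-1)
#define ALLOC_DEFAULT 0x0u
#define ALLOC_NOLOCK  0x400u

/* Records what the core entry point was called with. */
struct model_page {
	int nid;
	unsigned int alloc_flags;
};

static struct model_page model_last_page;

/* Hypothetical numa_node_id() stand-in: pretend we run on node 0. */
static int model_numa_node_id(void)
{
	return 0;
}

/*
 * Hypothetical stand-in for __alloc_frozen_pages_noprof(): the single
 * core entry point. The model asserts that callers already resolved
 * NUMA_NO_NODE, which is what the nolock wrapper below does.
 */
static struct model_page *model_alloc_core(unsigned int order, int nid,
					   unsigned int alloc_flags)
{
	(void)order;
	assert(nid != NUMA_NO_NODE);
	model_last_page.nid = nid;
	model_last_page.alloc_flags = alloc_flags;
	return &model_last_page;
}

/* Hypothetical stand-in for alloc_frozen_pages_nolock_noprof(). */
static struct model_page *model_alloc_nolock(int nid, unsigned int order)
{
	if (nid == NUMA_NO_NODE)
		nid = model_numa_node_id();
	return model_alloc_core(order, nid, ALLOC_NOLOCK);
}
```

Callers that want normal behaviour pass ALLOC_DEFAULT explicitly, as the hugetlb, mempolicy and slub hunks in this patch do.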
For consistency, set ALLOC_WMARK_MIN explicitly in fastpath_alloc_flags
for the new ALLOC_NOLOCK path. The old alloc_frozen_pages_nolock_noprof()
already did this implicitly: ALLOC_WMARK_MIN is 0, so its alloc_flags of
plain ALLOC_NOLOCK already selected the min watermark.
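Because the watermark index is just alloc_flags & ALLOC_WMARK_MASK, OR-ing in a zero-valued ALLOC_WMARK_MIN cannot change which watermark is used. The user-space sketch below checks this, reproducing only the first three zone_watermarks entries and the ALLOC_* values from mm/page_alloc.h; wmark_index(), old_nolock_flags() and new_nolock_flags() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>

/* First three entries of the kernel's enum zone_watermarks. */
enum zone_watermarks { WMARK_MIN, WMARK_LOW, WMARK_HIGH };

/* Values as defined in mm/page_alloc.h. */
#define ALLOC_WMARK_MIN     WMARK_MIN
#define ALLOC_WMARK_LOW     WMARK_LOW
#define ALLOC_WMARK_HIGH    WMARK_HIGH
#define ALLOC_NO_WATERMARKS 0x04
#define ALLOC_WMARK_MASK    (ALLOC_NO_WATERMARKS - 1)
#define ALLOC_NOLOCK        0x400

_Static_assert(ALLOC_WMARK_MIN == 0, "OR-ing in ALLOC_WMARK_MIN is a no-op");

/* Hypothetical: the zone->watermark[] index a set of alloc_flags selects. */
static unsigned int wmark_index(unsigned int alloc_flags)
{
	return alloc_flags & ALLOC_WMARK_MASK;
}

/* Hypothetical: flags used by the old nolock entry point. */
static unsigned int old_nolock_flags(void)
{
	return ALLOC_NOLOCK;
}

/* Hypothetical: flags built by the unified path for ALLOC_NOLOCK. */
static unsigned int new_nolock_flags(void)
{
	unsigned int fastpath_alloc_flags = ALLOC_NOLOCK;

	fastpath_alloc_flags |= ALLOC_WMARK_MIN;
	return fastpath_alloc_flags;
}
```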
Why this doesn't change anything:
1. Simple bits: much of the nolock-specific handling is just moved into
   the new alloc_order_allowed(), alloc_nolock_allowed() and
   gfp_nolock.
2. __alloc_frozen_pages_noprof() has some extra logic that wasn't
   previously in the nolock variant:
   a. Application of gfp_allowed_mask: this only affects early boot,
      only flags that affect the slowpath get changed here, and the
      nolock allocation path isn't allowed to use the flags that
      GFP_BOOT_MASK clears.
   b. Application of current_gfp_context(): this also only affects the
      slowpath.
3. The slowpath itself: this is now explicitly skipped under
   ALLOC_NOLOCK.
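Points 2a and 3 can be illustrated with a small user-space model. It uses invented __GFP_* bit values (the real ones differ), assumes GFP_BOOT_MASK clears the reclaim, IO and FS bits, and ignores current_gfp_context(); model_alloc(), model_fastpath() and model_slowpath() are hypothetical stand-ins for the allocator entry point, get_page_from_freelist() and __alloc_pages_slowpath().

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int gfp_t;

/* Invented bit values, for illustration only. */
#define __GFP_IO             0x001u
#define __GFP_FS             0x002u
#define __GFP_DIRECT_RECLAIM 0x004u
#define __GFP_KSWAPD_RECLAIM 0x008u
#define __GFP_RECLAIM        (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
#define __GFP_NOWARN         0x010u
#define __GFP_ZERO           0x020u
#define __GFP_NOMEMALLOC     0x040u
#define __GFP_COMP           0x080u
#define __GFP_ACCOUNT        0x100u
#define __GFP_BITS_MASK      0x1ffu

/* Assumed early-boot gfp_allowed_mask: drops reclaim, IO and FS bits. */
#define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM | __GFP_IO | __GFP_FS))

#define ALLOC_NOLOCK 0x400u

static const gfp_t gfp_nolock = __GFP_NOWARN | __GFP_ZERO |
				__GFP_NOMEMALLOC | __GFP_COMP;

static int slowpath_calls;
static char model_page_storage;

/* Fast path stand-in: succeeds or fails on demand. */
static void *model_fastpath(int succeed)
{
	return succeed ? &model_page_storage : NULL;
}

/* Slow path stand-in: counts how often it is entered. */
static void *model_slowpath(void)
{
	slowpath_calls++;
	return NULL;
}

static void *model_alloc(gfp_t gfp, unsigned int alloc_flags,
			 gfp_t allowed_mask, int fastpath_succeeds,
			 gfp_t *gfp_used)
{
	void *page;

	if (alloc_flags & ALLOC_NOLOCK)
		gfp |= gfp_nolock;
	gfp &= allowed_mask;	/* models gfp &= gfp_allowed_mask */
	*gfp_used = gfp;

	page = model_fastpath(fastpath_succeeds);
	/* First (or, for nolock, only) attempt. */
	if (page || (alloc_flags & ALLOC_NOLOCK))
		return page;
	return model_slowpath();
}
```

In this model the boot-time mask leaves a nolock request's flags untouched, and a failed nolock attempt returns without reaching the slow path.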
Ulterior motive: adding an alloc_flags arg to the allocator's
mm-internal entrypoint can later be used to do more allocation
customisation without needing to create new GFP flags.
No functional change intended.
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
mm/hugetlb.c | 3 +-
mm/mempolicy.c | 10 +--
mm/page_alloc.c | 192 +++++++++++++++++++++++++++++---------------------------
mm/page_alloc.h | 6 +-
mm/slub.c | 6 +-
5 files changed, 117 insertions(+), 100 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0f51b36773f59..48471503984c1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1790,7 +1790,8 @@ static struct folio *alloc_buddy_frozen_folio(int order, gfp_t gfp_mask,
if (alloc_try_hard)
gfp_mask |= __GFP_RETRY_MAYFAIL;
- folio = (struct folio *)__alloc_frozen_pages(gfp_mask, order, nid, nmask);
+ folio = (struct folio *)__alloc_frozen_pages(gfp_mask, order, nid, nmask,
+ ALLOC_DEFAULT);
/*
* If we did not specify __GFP_RETRY_MAYFAIL, but still got a
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 948264407dee3..914f81863db5a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2417,9 +2417,11 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
*/
preferred_gfp = gfp | __GFP_NOWARN;
preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
- page = __alloc_frozen_pages_noprof(preferred_gfp, order, nid, nodemask);
+ page = __alloc_frozen_pages_noprof(preferred_gfp, order, nid, nodemask,
+ ALLOC_DEFAULT);
if (!page)
- page = __alloc_frozen_pages_noprof(gfp, order, nid, NULL);
+ page = __alloc_frozen_pages_noprof(gfp, order, nid, NULL,
+ ALLOC_DEFAULT);
return page;
}
@@ -2467,7 +2469,7 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
*/
page = __alloc_frozen_pages_noprof(
gfp | __GFP_THISNODE | __GFP_NORETRY, order,
- nid, NULL);
+ nid, NULL, ALLOC_DEFAULT);
if (page || !(gfp & __GFP_DIRECT_RECLAIM))
return page;
/*
@@ -2479,7 +2481,7 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
}
}
- page = __alloc_frozen_pages_noprof(gfp, order, nid, nodemask);
+ page = __alloc_frozen_pages_noprof(gfp, order, nid, nodemask, ALLOC_DEFAULT);
if (unlikely(pol->mode == MPOL_INTERLEAVE ||
pol->mode == MPOL_WEIGHTED_INTERLEAVE) && page) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 85cee8a0031f2..f47a848555077 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5222,7 +5222,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
}
nr_account++;
- prep_new_page(page, 0, gfp, 0);
+ prep_new_page(page, 0, gfp, ALLOC_DEFAULT);
set_page_refcounted(page);
page_array[nr_populated++] = page;
}
@@ -5271,24 +5271,99 @@ void free_pages_bulk(struct page **page_array, unsigned long nr_pages)
}
}
-/*
- * This is the 'heart' of the zoned buddy allocator.
- */
-struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
- int preferred_nid, nodemask_t *nodemask)
+static inline bool alloc_order_allowed(gfp_t gfp, unsigned int order,
+ unsigned int alloc_flags)
{
- struct page *page;
- unsigned int fastpath_alloc_flags = ALLOC_WMARK_LOW;
- gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
- struct alloc_context ac = { };
+ if (alloc_flags & ALLOC_NOLOCK)
+ return pcp_allowed_order(order);
/*
* There are several places where we assume that the order value is sane
* so bail out early if the request is out of bound.
*/
- if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
+ return !(WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp));
+}
+
+static inline bool alloc_nolock_allowed(void)
+{
+ /*
+ * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
+ * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
+ * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
+ * mark the task as the owner of another rt_spin_lock which will
+ * confuse PI logic, so return immediately if called from hard IRQ or
+ * NMI.
+ *
+ * Note, irqs_disabled() case is ok. This function can be called
+ * from raw_spin_lock_irqsave region.
+ */
+ if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
+ return false;
+
+ /* On UP, spin_trylock() always succeeds even when it is locked */
+ if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
+ return false;
+
+ /* Bailout, since _deferred_grow_zone() needs to take a lock */
+ if (deferred_pages_enabled())
+ return false;
+
+ return true;
+}
+
+/*
+ * GFP flags to set for ALLOC_NOLOCK i.e. alloc_pages_nolock().
+ *
+ * Do not specify __GFP_DIRECT_RECLAIM, since direct reclaim is not allowed.
+ * Do not specify __GFP_KSWAPD_RECLAIM either, since wake up of kswapd
+ * is not safe in arbitrary context.
+ *
+ * These two are the conditions for gfpflags_allow_spinning() being true.
+ *
+ * Specify __GFP_NOWARN since failing alloc_pages_nolock() is not a reason
+ * to warn. Also warn would trigger printk() which is unsafe from
+ * various contexts. We cannot use printk_deferred_enter() to mitigate,
+ * since the running context is unknown.
+ *
+ * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
+ * is safe in any context. Also zeroing the page is mandatory for
+ * BPF use cases.
+ *
+ * Though __GFP_NOMEMALLOC is not checked in the code path below,
+ * specify it here to highlight that alloc_pages_nolock()
+ * doesn't want to deplete reserves.
+ */
+static const gfp_t gfp_nolock = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC |
+ __GFP_COMP;
+
+/*
+ * This is the 'heart' of the zoned buddy allocator.
+ */
+struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
+ int preferred_nid, nodemask_t *nodemask, unsigned int alloc_flags)
+{
+ struct page *page;
+ gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
+ struct alloc_context ac = { };
+ unsigned int fastpath_alloc_flags = alloc_flags;
+
+ /* Other flags could be supported later if needed. */
+ if (WARN_ON(alloc_flags & ~ALLOC_NOLOCK))
return NULL;
+ if (!alloc_order_allowed(gfp, order, alloc_flags))
+ return NULL;
+
+ if (alloc_flags & ALLOC_NOLOCK) {
+ VM_WARN_ON_ONCE(gfp & ~__GFP_ACCOUNT);
+ if (!alloc_nolock_allowed())
+ return NULL;
+ gfp |= gfp_nolock;
+ fastpath_alloc_flags |= ALLOC_WMARK_MIN;
+ } else {
+ fastpath_alloc_flags |= ALLOC_WMARK_LOW;
+ }
+
gfp &= gfp_allowed_mask;
/*
* Apply scoped allocation constraints. This is mainly about GFP_NOFS
@@ -5303,16 +5378,19 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
&alloc_gfp, &fastpath_alloc_flags))
return NULL;
- /*
- * Forbid the first pass from falling back to types that fragment
- * memory until all local zones are considered.
- */
- fastpath_alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
+ if (!(alloc_flags & ALLOC_NOLOCK)) {
+ /*
+ * Forbid the first pass from falling back to types that
+ * fragment memory until all local zones are considered.
+ */
+ fastpath_alloc_flags |= alloc_flags_nofragment(
+ zonelist_zone(ac.preferred_zoneref), gfp);
+ }
fastpath_alloc_flags |= alloc_flags_nonblocking(gfp, order) & ALLOC_HIGHATOMIC;
- /* First allocation attempt */
+ /* First allocation attempt (or, for nolock, only attempt) */
page = get_page_from_freelist(alloc_gfp, order, fastpath_alloc_flags, &ac);
- if (likely(page))
+ if (likely(page) || (alloc_flags & ALLOC_NOLOCK))
goto out;
alloc_gfp = gfp;
@@ -5329,7 +5407,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
out:
if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
- free_frozen_pages(page, order);
+ __free_frozen_pages(page, order,
+ alloc_flags & ALLOC_NOLOCK ? FPI_TRYLOCK : 0);
page = NULL;
}
@@ -5345,7 +5424,8 @@ struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
{
struct page *page;
- page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask);
+ page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask,
+ ALLOC_DEFAULT);
if (page)
set_page_refcounted(page);
return page;
@@ -7875,80 +7955,10 @@ static bool __free_unaccepted(struct page *page)
struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned int order)
{
- /*
- * Do not specify __GFP_DIRECT_RECLAIM, since direct claim is not allowed.
- * Do not specify __GFP_KSWAPD_RECLAIM either, since wake up of kswapd
- * is not safe in arbitrary context.
- *
- * These two are the conditions for gfpflags_allow_spinning() being true.
- *
- * Specify __GFP_NOWARN since failing alloc_pages_nolock() is not a reason
- * to warn. Also warn would trigger printk() which is unsafe from
- * various contexts. We cannot use printk_deferred_enter() to mitigate,
- * since the running context is unknown.
- *
- * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
- * is safe in any context. Also zeroing the page is mandatory for
- * BPF use cases.
- *
- * Though __GFP_NOMEMALLOC is not checked in the code path below,
- * specify it here to highlight that alloc_pages_nolock()
- * doesn't want to deplete reserves.
- */
- gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC | __GFP_COMP
- | gfp_flags;
- unsigned int alloc_flags = ALLOC_NOLOCK;
- struct alloc_context ac = { };
- struct page *page;
-
- VM_WARN_ON_ONCE(gfp_flags & ~__GFP_ACCOUNT);
- /*
- * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
- * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
- * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
- * mark the task as the owner of another rt_spin_lock which will
- * confuse PI logic, so return immediately if called from hard IRQ or
- * NMI.
- *
- * Note, irqs_disabled() case is ok. This function can be called
- * from raw_spin_lock_irqsave region.
- */
- if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
- return NULL;
-
- /* On UP, spin_trylock() always succeeds even when it is locked */
- if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
- return NULL;
-
- if (!pcp_allowed_order(order))
- return NULL;
-
- /* Bailout, since _deferred_grow_zone() needs to take a lock */
- if (deferred_pages_enabled())
- return NULL;
-
if (nid == NUMA_NO_NODE)
nid = numa_node_id();
- prepare_alloc_pages(alloc_gfp, order, nid, NULL, &ac,
- &alloc_gfp, &alloc_flags);
-
- /*
- * Best effort allocation from percpu free list.
- * If it's empty attempt to spin_trylock zone->lock.
- */
- page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
-
- /* Unlike regular alloc_pages() there is no __alloc_pages_slowpath(). */
-
- if (memcg_kmem_online() && page && (gfp_flags & __GFP_ACCOUNT) &&
- unlikely(__memcg_kmem_charge_page(page, alloc_gfp, order) != 0)) {
- __free_frozen_pages(page, order, FPI_TRYLOCK);
- page = NULL;
- }
- trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
- kmsan_alloc_page(page, order, alloc_gfp);
- return page;
+ return __alloc_frozen_pages_noprof(gfp_flags, order, nid, NULL, ALLOC_NOLOCK);
}
/**
* alloc_pages_nolock - opportunistic reentrant allocation from any context
diff --git a/mm/page_alloc.h b/mm/page_alloc.h
index 3250d44f96457..a4f4b325381ad 100644
--- a/mm/page_alloc.h
+++ b/mm/page_alloc.h
@@ -11,6 +11,7 @@
#include <linux/nodemask.h>
#include <linux/types.h>
+#define ALLOC_DEFAULT 0
/* The ALLOC_WMARK bits are used as an index to zone->watermark */
#define ALLOC_WMARK_MIN WMARK_MIN
#define ALLOC_WMARK_LOW WMARK_LOW
@@ -219,7 +220,7 @@ extern bool free_pages_prepare(struct page *page, unsigned int order);
extern int user_min_free_kbytes;
struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order, int nid,
- nodemask_t *nodemask);
+ nodemask_t *nodemask, unsigned int alloc_flags);
#define __alloc_frozen_pages(...) \
alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__))
void free_frozen_pages(struct page *page, unsigned int order);
@@ -230,7 +231,8 @@ struct page *alloc_frozen_pages_noprof(gfp_t, unsigned int order);
#else
static inline struct page *alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order)
{
- return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL);
+ return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL,
+ ALLOC_DEFAULT);
}
#endif
diff --git a/mm/slub.c b/mm/slub.c
index 877021e69cc41..3989b4758ae0a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3292,7 +3292,8 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
else if (node == NUMA_NO_NODE)
page = alloc_frozen_pages(flags, order);
else
- page = __alloc_frozen_pages(flags, order, node, NULL);
+ page = __alloc_frozen_pages(flags, order, node, NULL,
+ ALLOC_DEFAULT);
if (!page)
return NULL;
@@ -5302,7 +5303,8 @@ static void *___kmalloc_large_node(size_t size, gfp_t flags, int node)
if (node == NUMA_NO_NODE)
page = alloc_frozen_pages_noprof(flags, order);
else
- page = __alloc_frozen_pages_noprof(flags, order, node, NULL);
+ page = __alloc_frozen_pages_noprof(flags, order, node, NULL,
+ ALLOC_DEFAULT);
if (page) {
ptr = page_address(page);
--
2.54.0
* [PATCH v5 06/18] mm/page_alloc: relax GFP WARN in nolock allocs
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
` (4 preceding siblings ...)
2026-07-03 12:31 ` [PATCH v5 05/18] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof() Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 12:43 ` sashiko-bot
2026-07-03 14:44 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 07/18] mm: move some stuff to mm/page_alloc.h Brendan Jackman
` (12 subsequent siblings)
18 siblings, 2 replies; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed
This WARN forbids setting any flags other than __GFP_ACCOUNT, but we
unconditionally set the ones in gfp_nolock, so they are certainly fine
for the caller to set.
There are other GFP flags that are almost certainly fine to set here;
Willy noted __GFP_HIGHMEM, __GFP_DMA, __GFP_MOVABLE and __GFP_HARDWALL.
But nolock allocation is rather special, so be conservative and make
sure we get a chance to think carefully before nontrivial new use cases
arise.
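A minimal userspace sketch of the old and new checks may make the change
concrete. The flag values and EX_GFP_NOLOCK below are hypothetical
stand-ins chosen for illustration; they are not the kernel's real
__GFP_* bits or the actual contents of gfp_nolock.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int ex_gfp_t;

/* Hypothetical flag values, for illustration only. */
#define EX_GFP_ACCOUNT 0x01u
#define EX_GFP_NOWARN  0x02u
#define EX_GFP_ZERO    0x04u
#define EX_GFP_DMA     0x08u

/* Assumed stand-in for the flags a nolock allocation always ORs in. */
#define EX_GFP_NOLOCK  (EX_GFP_NOWARN | EX_GFP_ZERO)

/* Old check: any flag other than the accounting flag warns. */
static bool ex_old_check_warns(ex_gfp_t gfp)
{
	return (gfp & ~EX_GFP_ACCOUNT) != 0;
}

/* New check: flags the allocator would set anyway are also accepted. */
static bool ex_new_check_warns(ex_gfp_t gfp)
{
	return (gfp & ~(EX_GFP_ACCOUNT | EX_GFP_NOLOCK)) != 0;
}
```

With these assumed values, a caller passing EX_GFP_NOWARN | EX_GFP_ACCOUNT
trips the old check but not the new one, while EX_GFP_DMA still warns
under both.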
Suggested-by: Matthew Wilcox <willy@infradead.org>
Link: https://lore.kernel.org/linux-mm/ajS96fWbG4dzP3u3@casper.infradead.org/
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Acked-by: Harry Yoo (Oracle) <harry@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
mm/page_alloc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f47a848555077..c2839959d7908 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5355,7 +5355,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
return NULL;
if (alloc_flags & ALLOC_NOLOCK) {
- VM_WARN_ON_ONCE(gfp & ~__GFP_ACCOUNT);
+ /* Certain other flags could be supported later if needed. */
+ VM_WARN_ON_ONCE(gfp & ~(__GFP_ACCOUNT | gfp_nolock));
if (!alloc_nolock_allowed())
return NULL;
gfp |= gfp_nolock;
--
2.54.0
* [PATCH v5 07/18] mm: move some stuff to mm/page_alloc.h
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
` (5 preceding siblings ...)
2026-07-03 12:31 ` [PATCH v5 06/18] mm/page_alloc: relax GFP WARN in nolock allocs Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 14:46 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 08/18] perf/x86/intel: Use higher-level allocator API Brendan Jackman
` (11 subsequent siblings)
18 siblings, 1 reply; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed
Some of the helpers and declarations in the public header are only used
inside mm/, so move them to mm/page_alloc.h to avoid silently growing
new users.
drain_local_pages() is still used from kernel/power/snapshot.c, so it
needs to stay in include/linux/gfp.h.
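For reference, the arithmetic in gfp_migratetype(), which this patch
moves, can be modelled in userspace as below. The bit positions are
derived from the BUILD_BUG_ON()s in the function (movable bit 3,
reclaimable bit 4); the EX_* names and the explicit
group_by_mobility_disabled parameter are hypothetical, standing in for
the kernel's macros and its global page_group_by_mobility_disabled.

```c
#include <assert.h>

/* Bit positions implied by the BUILD_BUG_ON()s in gfp_migratetype(). */
#define EX_GFP_MOVABLE       (1u << 3)
#define EX_GFP_RECLAIMABLE   (1u << 4)
#define EX_GFP_MOVABLE_MASK  (EX_GFP_RECLAIMABLE | EX_GFP_MOVABLE)
#define EX_GFP_MOVABLE_SHIFT 3

enum ex_migratetype {
	EX_MIGRATE_UNMOVABLE,   /* 0: neither mobility bit set */
	EX_MIGRATE_MOVABLE,     /* 1: movable bit only */
	EX_MIGRATE_RECLAIMABLE, /* 2: reclaimable bit only */
	EX_MIGRATE_HIGHATOMIC,  /* 3: both bits, which the VM_WARN_ON() flags */
};

/* The same relationships the kernel checks with BUILD_BUG_ON(). */
static_assert((EX_GFP_MOVABLE >> EX_GFP_MOVABLE_SHIFT) == EX_MIGRATE_MOVABLE,
	      "movable bit maps to MIGRATE_MOVABLE");
static_assert((EX_GFP_RECLAIMABLE >> EX_GFP_MOVABLE_SHIFT) == EX_MIGRATE_RECLAIMABLE,
	      "reclaimable bit maps to MIGRATE_RECLAIMABLE");
static_assert((EX_GFP_MOVABLE_MASK >> EX_GFP_MOVABLE_SHIFT) == EX_MIGRATE_HIGHATOMIC,
	      "both bits map to MIGRATE_HIGHATOMIC");

static int ex_gfp_migratetype(unsigned int gfp, int group_by_mobility_disabled)
{
	if (group_by_mobility_disabled)
		return EX_MIGRATE_UNMOVABLE;
	/* Keep only the two mobility bits and shift them down to 0..3. */
	return (int)((gfp & EX_GFP_MOVABLE_MASK) >> EX_GFP_MOVABLE_SHIFT);
}
```

Bits outside the mask are ignored, so any other GFP flags in the mask do
not affect the resulting migratetype.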
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
include/linux/gfp.h | 26 --------------------------
mm/page_alloc.h | 27 +++++++++++++++++++++++++++
mm/vmstat.c | 1 +
3 files changed, 28 insertions(+), 26 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index cdf95a9f0b87c..01d6d2591f49e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -17,28 +17,6 @@ struct mempolicy;
#define __default_gfp(a,b,...) b
#define default_gfp(...) __default_gfp(,##__VA_ARGS__,GFP_KERNEL)
-/* Convert GFP flags to their corresponding migrate type */
-#define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
-#define GFP_MOVABLE_SHIFT 3
-
-static inline int gfp_migratetype(const gfp_t gfp_flags)
-{
- VM_WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK);
- BUILD_BUG_ON((1UL << GFP_MOVABLE_SHIFT) != ___GFP_MOVABLE);
- BUILD_BUG_ON((___GFP_MOVABLE >> GFP_MOVABLE_SHIFT) != MIGRATE_MOVABLE);
- BUILD_BUG_ON((___GFP_RECLAIMABLE >> GFP_MOVABLE_SHIFT) != MIGRATE_RECLAIMABLE);
- BUILD_BUG_ON(((___GFP_MOVABLE | ___GFP_RECLAIMABLE) >>
- GFP_MOVABLE_SHIFT) != MIGRATE_HIGHATOMIC);
-
- if (unlikely(page_group_by_mobility_disabled))
- return MIGRATE_UNMOVABLE;
-
- /* Group based on mobility */
- return (__force unsigned long)(gfp_flags & GFP_MOVABLE_MASK) >> GFP_MOVABLE_SHIFT;
-}
-#undef GFP_MOVABLE_MASK
-#undef GFP_MOVABLE_SHIFT
-
static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
{
return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
@@ -395,10 +373,6 @@ extern void free_pages(unsigned long addr, unsigned int order);
#define __free_page(page) __free_pages((page), 0)
#define free_page(addr) free_pages((addr), 0)
-void page_alloc_init_cpuhp(void);
-bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp);
-void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
-void drain_all_pages(struct zone *zone);
void drain_local_pages(struct zone *zone);
void page_alloc_init_late(void);
diff --git a/mm/page_alloc.h b/mm/page_alloc.h
index a4f4b325381ad..2d60551b4453f 100644
--- a/mm/page_alloc.h
+++ b/mm/page_alloc.h
@@ -266,6 +266,33 @@ static inline bool free_area_empty(struct free_area *area, int migratetype)
return list_empty(&area->free_list[migratetype]);
}
+/* Convert GFP flags to their corresponding migrate type */
+#define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
+#define GFP_MOVABLE_SHIFT 3
+
+static inline int gfp_migratetype(const gfp_t gfp_flags)
+{
+ VM_WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK);
+ BUILD_BUG_ON((1UL << GFP_MOVABLE_SHIFT) != ___GFP_MOVABLE);
+ BUILD_BUG_ON((___GFP_MOVABLE >> GFP_MOVABLE_SHIFT) != MIGRATE_MOVABLE);
+ BUILD_BUG_ON((___GFP_RECLAIMABLE >> GFP_MOVABLE_SHIFT) != MIGRATE_RECLAIMABLE);
+ BUILD_BUG_ON(((___GFP_MOVABLE | ___GFP_RECLAIMABLE) >>
+ GFP_MOVABLE_SHIFT) != MIGRATE_HIGHATOMIC);
+
+ if (unlikely(page_group_by_mobility_disabled))
+ return MIGRATE_UNMOVABLE;
+
+ /* Group based on mobility */
+ return (__force unsigned long)(gfp_flags & GFP_MOVABLE_MASK) >> GFP_MOVABLE_SHIFT;
+}
+#undef GFP_MOVABLE_MASK
+#undef GFP_MOVABLE_SHIFT
+
+bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp);
+void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
+void drain_all_pages(struct zone *zone);
+
+void page_alloc_init_cpuhp(void);
void page_alloc_sysctl_init(void);
#endif /* __MM_PAGE_ALLOC_H */
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 7b93fbf9af092..3b5cb1031f720 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -30,6 +30,7 @@
#include <linux/sched/isolation.h>
#include "internal.h"
+#include "page_alloc.h"
#ifdef CONFIG_PROC_FS
#ifdef CONFIG_NUMA
--
2.54.0
* [PATCH v5 08/18] perf/x86/intel: Use higher-level allocator API
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
` (6 preceding siblings ...)
2026-07-03 12:31 ` [PATCH v5 07/18] mm: move some stuff to mm/page_alloc.h Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 14:49 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 09/18] KVM: VMX: " Brendan Jackman
` (10 subsequent siblings)
18 siblings, 1 reply; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
James Clark
The difference between __alloc_pages_node() and alloc_pages_node() is
that the latter allows you to pass NUMA_NO_NODE.
The former is going away and the latter works fine here so switch over.
No functional change intended.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: James Clark <james.clark@linaro.org>
Assisted-by: Gemini:unknown-version
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
arch/x86/events/intel/ds.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 91a093d8cf2e7..70be80211d823 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -832,7 +832,7 @@ static void *dsalloc_pages(size_t size, gfp_t flags, int cpu)
int node = cpu_to_node(cpu);
struct page *page;
- page = __alloc_pages_node(node, flags | __GFP_ZERO, order);
+ page = alloc_pages_node(node, flags | __GFP_ZERO, order);
return page ? page_address(page) : NULL;
}
@@ -1088,9 +1088,9 @@ void init_arch_pebs_on_cpu(int cpu)
/*
* 4KB-aligned pointer of the output buffer
- * (__alloc_pages_node() return page aligned address)
+ * (alloc_pages_node() returns page aligned address)
* Buffer Size = 4KB * 2^SIZE
- * contiguous physical buffer (__alloc_pages_node() with order)
+ * contiguous physical buffer (alloc_pages_node() with order)
*/
arch_pebs_base = virt_to_phys(cpuc->pebs_vaddr) | PEBS_BUFFER_SHIFT;
wrmsrq_on_cpu(cpu, MSR_IA32_PEBS_BASE, arch_pebs_base);
--
2.54.0
* [PATCH v5 09/18] KVM: VMX: Use higher-level allocator API
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
` (7 preceding siblings ...)
2026-07-03 12:31 ` [PATCH v5 08/18] perf/x86/intel: Use higher-level allocator API Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 14:49 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 10/18] x86/virt: " Brendan Jackman
` (9 subsequent siblings)
18 siblings, 1 reply; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed, Sean Christopherson, Paolo Bonzini, kvm
The difference between __alloc_pages_node() and alloc_pages_node() is
that the latter allows you to pass NUMA_NO_NODE.
The former is going away and the latter works fine here so switch over.
No functional change intended.
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Assisted-by: Gemini:unknown-version
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
arch/x86/kvm/vmx/vmx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2325be57d3d75..ad6a7fc6a54da 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3028,7 +3028,7 @@ struct vmcs *alloc_vmcs_cpu(bool shadow, int cpu, gfp_t flags)
struct page *pages;
struct vmcs *vmcs;
- pages = __alloc_pages_node(node, flags, 0);
+ pages = alloc_pages_node(node, flags, 0);
if (!pages)
return NULL;
vmcs = page_address(pages);
--
2.54.0
* [PATCH v5 10/18] x86/virt: Use higher-level allocator API
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
` (8 preceding siblings ...)
2026-07-03 12:31 ` [PATCH v5 09/18] KVM: VMX: " Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 14:50 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 11/18] sgi-xp: " Brendan Jackman
` (8 subsequent siblings)
18 siblings, 1 reply; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
The difference between __alloc_pages_node() and alloc_pages_node() is
that the latter allows you to pass NUMA_NO_NODE.
The former is going away and the latter works fine here so switch over.
No functional change intended.
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Assisted-by: Gemini:unknown-version
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
arch/x86/virt/hw.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c
index 7e9091c640be0..a236447ac7a26 100644
--- a/arch/x86/virt/hw.c
+++ b/arch/x86/virt/hw.c
@@ -196,7 +196,7 @@ static __init int __x86_vmx_init(void)
struct page *page;
struct vmcs *vmcs;
- page = __alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
+ page = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
if (WARN_ON_ONCE(!page)) {
x86_vmx_exit();
return -ENOMEM;
--
2.54.0
* [PATCH v5 11/18] sgi-xp: Use higher-level allocator API
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
` (9 preceding siblings ...)
2026-07-03 12:31 ` [PATCH v5 10/18] x86/virt: " Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 12:48 ` sashiko-bot
2026-07-03 14:51 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 12/18] net/funeth: Switch to " Brendan Jackman
` (7 subsequent siblings)
18 siblings, 2 replies; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed, Robin Holt, Steve Wahl, Arnd Bergmann,
Greg Kroah-Hartman
The difference between __alloc_pages_node() and alloc_pages_node() is
that the latter allows you to pass NUMA_NO_NODE.
The former is going away and the latter works fine here so switch over.
No functional change intended.
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Steve Wahl <steve.wahl@hpe.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Assisted-by: Gemini:unknown-model
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
drivers/misc/sgi-xp/xpc_uv.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/misc/sgi-xp/xpc_uv.c b/drivers/misc/sgi-xp/xpc_uv.c
index 772c787268932..79c2f00ed4d70 100644
--- a/drivers/misc/sgi-xp/xpc_uv.c
+++ b/drivers/misc/sgi-xp/xpc_uv.c
@@ -170,9 +170,8 @@ xpc_create_gru_mq_uv(unsigned int mq_size, int cpu, char *irq_name,
mq->mmr_blade = uv_cpu_to_blade_id(cpu);
nid = cpu_to_node(cpu);
- page = __alloc_pages_node(nid,
- GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
- pg_order);
+ page = alloc_pages_node(nid, GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
+ pg_order);
if (page == NULL) {
dev_err(xpc_part, "xpc_create_gru_mq_uv() failed to alloc %d "
"bytes of memory on nid=%d for GRU mq\n", mq_size, nid);
--
2.54.0
* [PATCH v5 12/18] net/funeth: Switch to higher-level allocator API
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
` (10 preceding siblings ...)
2026-07-03 12:31 ` [PATCH v5 11/18] sgi-xp: " Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 12:53 ` sashiko-bot
2026-07-03 14:52 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 13/18] mm: Remove __alloc_pages_node() Brendan Jackman
` (6 subsequent siblings)
18 siblings, 2 replies; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed, Dimitris Michailidis, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
The difference between __alloc_pages_node() and alloc_pages_node() is
that the latter allows you to pass NUMA_NO_NODE.
The former is going away and the latter works fine here so switch over.
No functional change intended.
Cc: Dimitris Michailidis <dmichail@fungible.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Assisted-by: Gemini:unknown-version
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
drivers/net/ethernet/fungible/funeth/funeth_rx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/fungible/funeth/funeth_rx.c b/drivers/net/ethernet/fungible/funeth/funeth_rx.c
index 7e2584895de39..d7000017ac2bd 100644
--- a/drivers/net/ethernet/fungible/funeth/funeth_rx.c
+++ b/drivers/net/ethernet/fungible/funeth/funeth_rx.c
@@ -103,7 +103,7 @@ static int funeth_alloc_page(struct funeth_rxq *q, struct funeth_rxbuf *rb,
if (cache_get(q, rb))
return 0;
- p = __alloc_pages_node(node, gfp | __GFP_NOWARN, 0);
+ p = alloc_pages_node(node, gfp | __GFP_NOWARN, 0);
if (unlikely(!p))
return -ENOMEM;
--
2.54.0
* [PATCH v5 13/18] mm: Remove __alloc_pages_node()
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
` (11 preceding siblings ...)
2026-07-03 12:31 ` [PATCH v5 12/18] net/funeth: Switch to " Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 12:54 ` sashiko-bot
2026-07-03 14:57 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 14/18] mm: Move __alloc_pages() to mm/page_alloc.h Brendan Jackman
` (5 subsequent siblings)
18 siblings, 2 replies; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed
There were only a few users, and the preceding patches have converted
them all to alloc_pages_node(). The only advantage of this API over
alloc_pages_node() is avoiding a single conditional branch. The
disadvantages are:
1. More API surface, more sources of confusion, more maintenance.
2. Worse impact of CPU hotplug bugs: most users of __alloc_pages_node()
   were using the result of cpu_to_node(); if the CPU gets hotplugged
   out, this will return NUMA_NO_NODE. If one of these paths fails to
   protect against a concurrent hotplug, then page_alloc.c will use
   NUMA_NO_NODE as an index into NODE_DATA() and likely cause memory
   corruption or some other serious failure. alloc_pages_node()
   replaces NUMA_NO_NODE with numa_mem_id(), so the same code might
   just work fine.
Ulterior motive: this frees up the __* variants of the allocator APIs to
serve specifically as mm-internal APIs.
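The hotplug argument can be sketched with a small userspace model.
Everything here is hypothetical: EX_MAX_NUMNODES, the local node
returned by ex_numa_mem_id(), and the ex_node_data array standing in for
NODE_DATA(). The strict variant models the removed __alloc_pages_node(),
whose VM_BUG_ON() only fires with CONFIG_DEBUG_VM; in this sketch an
invalid node is reported as -1 instead of being used as an index.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_NUMA_NO_NODE (-1)
#define EX_MAX_NUMNODES 4

/* Assumption: the calling CPU's nearest memory node is node 0. */
static int ex_numa_mem_id(void)
{
	return 0;
}

/* Stand-in for NODE_DATA(); indexing it with -1 would be out of bounds. */
static const int ex_node_data[EX_MAX_NUMNODES] = { 100, 101, 102, 103 };

static bool ex_nid_valid(int nid)
{
	return nid >= 0 && nid < EX_MAX_NUMNODES;
}

/* Model of the removed helper: no NUMA_NO_NODE fallback. */
static int ex_alloc_pages_node_strict(int nid)
{
	if (!ex_nid_valid(nid))
		return -1; /* the kernel would index NODE_DATA() out of bounds */
	return ex_node_data[nid];
}

/* Model of alloc_pages_node(): NUMA_NO_NODE falls back to the local node. */
static int ex_alloc_pages_node(int nid)
{
	if (nid == EX_NUMA_NO_NODE)
		nid = ex_numa_mem_id();
	return ex_alloc_pages_node_strict(nid);
}
```

A cpu_to_node() result of NUMA_NO_NODE is rejected by the strict model
but transparently redirected to the local node by the fallback model,
which is the behaviour the converted callers now get.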
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
include/linux/gfp.h | 20 ++++----------------
1 file changed, 4 insertions(+), 16 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 01d6d2591f49e..3bf55a5f9143e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -256,21 +256,6 @@ static inline void warn_if_node_offline(int this_node, gfp_t gfp_mask)
dump_stack();
}
-/*
- * Allocate pages, preferring the node given as nid. The node must be valid and
- * online. For more general interface, see alloc_pages_node().
- */
-static inline struct page *
-__alloc_pages_node_noprof(int nid, gfp_t gfp_mask, unsigned int order)
-{
- VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
- warn_if_node_offline(nid, gfp_mask);
-
- return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
-}
-
-#define __alloc_pages_node(...) alloc_hooks(__alloc_pages_node_noprof(__VA_ARGS__))
-
static inline
struct folio *__folio_alloc_node_noprof(gfp_t gfp, unsigned int order, int nid)
{
@@ -293,7 +278,10 @@ static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
if (nid == NUMA_NO_NODE)
nid = numa_mem_id();
- return __alloc_pages_node_noprof(nid, gfp_mask, order);
+ VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
+ warn_if_node_offline(nid, gfp_mask);
+
+ return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
}
#define alloc_pages_node(...) alloc_hooks(alloc_pages_node_noprof(__VA_ARGS__))
--
2.54.0
* [PATCH v5 14/18] mm: Move __alloc_pages() to mm/page_alloc.h
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
` (12 preceding siblings ...)
2026-07-03 12:31 ` [PATCH v5 13/18] mm: Remove __alloc_pages_node() Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 15:05 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 15/18] mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG Brendan Jackman
` (4 subsequent siblings)
18 siblings, 1 reply; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed
__alloc_pages() is no longer used outside of mm/.
Since __alloc_pages_noprof() is therefore no longer visible from gfp.h,
the definition of alloc_pages_node_noprof() also has to move into
mm/page_alloc.c, which now exports alloc_pages_node_noprof() in place
of __alloc_pages_noprof().
Also remove references to this API from the documentation tree:
referring to the specific function name was already questionable, and
now that the function is not even public it definitely seems wrong.
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
Documentation/admin-guide/cgroup-v1/cpusets.rst | 2 +-
Documentation/admin-guide/mm/transhuge.rst | 2 +-
include/linux/gfp.h | 16 +---------------
mm/page_alloc.c | 13 ++++++++++++-
mm/page_alloc.h | 4 ++++
5 files changed, 19 insertions(+), 18 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v1/cpusets.rst b/Documentation/admin-guide/cgroup-v1/cpusets.rst
index c7909e5ac1361..52a213aff04e5 100644
--- a/Documentation/admin-guide/cgroup-v1/cpusets.rst
+++ b/Documentation/admin-guide/cgroup-v1/cpusets.rst
@@ -284,7 +284,7 @@ take action.
==>
Unless this feature is enabled by writing "1" to the special file
/dev/cpuset/memory_pressure_enabled, the hook in the rebalance
- code of __alloc_pages() for this metric reduces to simply noticing
+ code of the page allocator for this metric reduces to simply noticing
that the cpuset_memory_pressure_enabled flag is zero. So only
systems that enable this feature will compute the metric.
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 23f8d13c2629d..16f37135ed80d 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -761,7 +761,7 @@ compact_fail
but failed.
It is possible to establish how long the stalls were using the function
-tracer to record how long was spent in __alloc_pages() and
+tracer to record how long was spent in the page allocator and
using the mm_page_alloc tracepoint to identify which allocations were
for huge pages.
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 3bf55a5f9143e..4d57e9c0bf204 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -204,10 +204,6 @@ static inline void arch_free_page(struct page *page, int order) { }
static inline void arch_alloc_page(struct page *page, int order) { }
#endif
-struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
- nodemask_t *nodemask);
-#define __alloc_pages(...) alloc_hooks(__alloc_pages_noprof(__VA_ARGS__))
-
struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
nodemask_t *nodemask);
#define __folio_alloc(...) alloc_hooks(__folio_alloc_noprof(__VA_ARGS__))
@@ -272,17 +268,7 @@ struct folio *__folio_alloc_node_noprof(gfp_t gfp, unsigned int order, int nid)
* prefer the current CPU's closest node. Otherwise node must be valid and
* online.
*/
-static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
- unsigned int order)
-{
- if (nid == NUMA_NO_NODE)
- nid = numa_mem_id();
-
- VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
- warn_if_node_offline(nid, gfp_mask);
-
- return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
-}
+struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask, unsigned int order);
#define alloc_pages_node(...) alloc_hooks(alloc_pages_node_noprof(__VA_ARGS__))
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c2839959d7908..f68b2b138a2e8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5431,7 +5431,18 @@ struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
set_page_refcounted(page);
return page;
}
-EXPORT_SYMBOL(__alloc_pages_noprof);
+
+struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask, unsigned int order)
+{
+ if (nid == NUMA_NO_NODE)
+ nid = numa_mem_id();
+
+ VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
+ warn_if_node_offline(nid, gfp_mask);
+
+ return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
+}
+EXPORT_SYMBOL(alloc_pages_node_noprof);
struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
nodemask_t *nodemask)
diff --git a/mm/page_alloc.h b/mm/page_alloc.h
index 2d60551b4453f..aa0c1481f7ca3 100644
--- a/mm/page_alloc.h
+++ b/mm/page_alloc.h
@@ -244,6 +244,10 @@ struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned
alloc_hooks(alloc_frozen_pages_nolock_noprof(__VA_ARGS__))
void free_frozen_pages_nolock(struct page *page, unsigned int order);
+struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
+ nodemask_t *nodemask);
+#define __alloc_pages(...) alloc_hooks(__alloc_pages_noprof(__VA_ARGS__))
+
extern void zone_pcp_reset(struct zone *zone);
extern void zone_pcp_disable(struct zone *zone);
extern void zone_pcp_enable(struct zone *zone);
--
2.54.0
* [PATCH v5 15/18] mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
` (13 preceding siblings ...)
2026-07-03 12:31 ` [PATCH v5 14/18] mm: Move __alloc_pages() to mm/page_alloc.h Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 12:31 ` [PATCH v5 16/18] mm: remove the __GFP_NO_OBJ_EXT flag Brendan Jackman
` (3 subsequent siblings)
18 siblings, 0 replies; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed
Now that alloc_pages has an entrypoint that allows passing alloc_flags,
we can take advantage of this to start removing GFP flags that are only
used for mm-internal purposes.
This also requires plumbing alloc_flags through more of the allocator
code: in particular, __alloc_pages[_noprof]() gains an alloc_flags
argument to match its callees, and the flags now need to be passed
deeper into the allocator so that they reach the alloc_tag code.
While moving the flag definition into page_alloc.h, also update its
comment per Hao's suggestion.
No functional change intended.
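A minimal user-space sketch of the idea, under simplifying assumptions: every sketch_* name is a hypothetical stand-in, only the ALLOC_KSWAPD (0x800) and ALLOC_NO_CODETAG (0x1000) values follow mm/page_alloc.h as shown in this series, and the watermark mask value is assumed. It shows a caller-supplied internal flag being stored once in the allocation context and OR'd into the flags that the slowpath recomputes, so it still reaches the tagging hook without borrowing a GFP bit.

```c
#include <assert.h>

/* KSWAPD and NO_CODETAG values follow mm/page_alloc.h in this series;
 * the watermark mask is an assumption for illustration. */
#define SKETCH_ALLOC_DEFAULT     0x0u
#define SKETCH_ALLOC_WMARK_MASK  0x3u     /* assumed: low bits encode an enum */
#define SKETCH_ALLOC_KSWAPD      0x800u
#define SKETCH_ALLOC_NO_CODETAG  0x1000u

/* Hypothetical stand-in for struct alloc_context. */
struct sketch_alloc_context {
	/* Only flags that are global to the whole allocation go here. */
	unsigned int alloc_flags;
};

static int sketch_tags_recorded;

/* Hypothetical stand-in for the alloc_tag hook. */
static void sketch_tag_add(unsigned int alloc_flags)
{
	/* Internal allocations opt out to avoid recursion. */
	if (alloc_flags & SKETCH_ALLOC_NO_CODETAG)
		return;
	sketch_tags_recorded++;
}

/* Per-attempt flags are recomputed, but the caller's global flags are kept. */
static unsigned int sketch_slowpath_flags(const struct sketch_alloc_context *ac,
					  unsigned int attempt_flags)
{
	/* OR-ing is only valid while the context holds no watermark bits. */
	assert(!(ac->alloc_flags & SKETCH_ALLOC_WMARK_MASK));
	return ac->alloc_flags | attempt_flags;
}

static void sketch_alloc(unsigned int alloc_flags)
{
	struct sketch_alloc_context ac = { .alloc_flags = alloc_flags };

	sketch_tag_add(sketch_slowpath_flags(&ac, SKETCH_ALLOC_KSWAPD));
}
```

Calling sketch_alloc(SKETCH_ALLOC_NO_CODETAG) leaves sketch_tags_recorded unchanged, while sketch_alloc(SKETCH_ALLOC_DEFAULT) increments it. Carrying the flag in the context, rather than in gfp_t, is what lets it survive the slowpath's recomputation of alloc_flags.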
Link: https://lore.kernel.org/all/b4916118-3537-4e19-8bc8-1d103dd0d225@linux.dev/
Tested-by: Hao Ge <hao.ge@linux.dev>
Acked-by: Hao Ge <hao.ge@linux.dev>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
include/linux/alloc_tag.h | 4 ++--
mm/alloc_tag.c | 23 +++++++--------------
mm/compaction.c | 4 ++--
mm/page_alloc.c | 52 +++++++++++++++++++++++++++--------------------
mm/page_alloc.h | 14 +++++++++++--
mm/page_frag_cache.c | 4 ++--
6 files changed, 55 insertions(+), 46 deletions(-)
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 068ba2e77c5d6..fcf90e6b24204 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -163,11 +163,11 @@ static inline void alloc_tag_sub_check(union codetag_ref *ref)
{
WARN_ONCE(ref && !ref->ct, "alloc_tag was not set\n");
}
-void alloc_tag_add_early_pfn(unsigned long pfn, gfp_t gfp_flags);
+void alloc_tag_add_early_pfn(unsigned long pfn, unsigned int alloc_flags);
#else
static inline void alloc_tag_add_check(union codetag_ref *ref, struct alloc_tag *tag) {}
static inline void alloc_tag_sub_check(union codetag_ref *ref) {}
-static inline void alloc_tag_add_early_pfn(unsigned long pfn, gfp_t gfp_flags) {}
+static inline void alloc_tag_add_early_pfn(unsigned long pfn, unsigned int alloc_flags) {}
#endif
/* Caller should verify both ref and tag to be valid */
diff --git a/mm/alloc_tag.c b/mm/alloc_tag.c
index d9be1cf5187d9..cf65e9992fda3 100644
--- a/mm/alloc_tag.c
+++ b/mm/alloc_tag.c
@@ -15,6 +15,9 @@
#include <linux/vmalloc.h>
#include <linux/kmemleak.h>
+#include "internal.h"
+#include "page_alloc.h"
+
#define ALLOCINFO_FILE_NAME "allocinfo"
#define MODULE_ALLOC_TAG_VMAP_SIZE (100000UL * sizeof(struct alloc_tag))
#define SECTION_START(NAME) (CODETAG_SECTION_START_PREFIX NAME)
@@ -783,19 +786,6 @@ struct pfn_pool {
#define PFN_POOL_SIZE ((PAGE_SIZE - offsetof(struct pfn_pool, pfns)) / \
sizeof(unsigned long))
-
-/*
- * Skip early PFN recording for a page allocation. Reuses the
- * %__GFP_NO_OBJ_EXT bit. Used by __alloc_tag_add_early_pfn() to avoid
- * recursion when allocating pages for the early PFN tracking list
- * itself.
- *
- * Codetags of the pages allocated with __GFP_NO_CODETAG should be
- * cleared (via clear_page_tag_ref()) before freeing the pages to prevent
- * alloc_tag_sub_check() from triggering a warning.
- */
-#define __GFP_NO_CODETAG __GFP_NO_OBJ_EXT
-
static struct pfn_pool *current_pfn_pool __initdata;
static void __init __alloc_tag_add_early_pfn(unsigned long pfn)
@@ -806,7 +796,8 @@ static void __init __alloc_tag_add_early_pfn(unsigned long pfn)
do {
pool = READ_ONCE(current_pfn_pool);
if (!pool || atomic_read(&pool->count) >= PFN_POOL_SIZE) {
- struct page *new_page = alloc_page(__GFP_HIGH | __GFP_NO_CODETAG);
+ struct page *new_page = __alloc_pages(__GFP_HIGH, 0, numa_mem_id(),
+ NULL, ALLOC_NO_CODETAG);
struct pfn_pool *new;
if (!new_page) {
@@ -837,7 +828,7 @@ typedef void alloc_tag_add_func(unsigned long pfn);
static alloc_tag_add_func __rcu *alloc_tag_add_early_pfn_ptr __refdata =
RCU_INITIALIZER(__alloc_tag_add_early_pfn);
-void alloc_tag_add_early_pfn(unsigned long pfn, gfp_t gfp_flags)
+void alloc_tag_add_early_pfn(unsigned long pfn, unsigned int alloc_flags)
{
alloc_tag_add_func *alloc_tag_add;
@@ -845,7 +836,7 @@ void alloc_tag_add_early_pfn(unsigned long pfn, gfp_t gfp_flags)
return;
/* Skip allocations for the tracking list itself to avoid recursion. */
- if (gfp_flags & __GFP_NO_CODETAG)
+ if (alloc_flags & ALLOC_NO_CODETAG)
return;
rcu_read_lock();
diff --git a/mm/compaction.c b/mm/compaction.c
index 7d80735502d9a..4b2318fad4eb5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -83,7 +83,7 @@ static inline bool is_via_compact_memory(int order) { return false; }
static struct page *mark_allocated_noprof(struct page *page, unsigned int order, gfp_t gfp_flags)
{
- post_alloc_hook(page, order, __GFP_MOVABLE);
+ post_alloc_hook(page, order, __GFP_MOVABLE, ALLOC_DEFAULT);
set_page_refcounted(page);
return page;
}
@@ -1851,7 +1851,7 @@ static struct folio *compaction_alloc_noprof(struct folio *src, unsigned long da
}
dst = (struct folio *)freepage;
- post_alloc_hook(&dst->page, order, __GFP_MOVABLE);
+ post_alloc_hook(&dst->page, order, __GFP_MOVABLE, ALLOC_DEFAULT);
set_page_refcounted(&dst->page);
if (order)
prep_compound_page(&dst->page, order);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f68b2b138a2e8..cfaf16244f56d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1249,7 +1249,7 @@ void __clear_page_tag_ref(struct page *page)
/* Should be called only if mem_alloc_profiling_enabled() */
static noinline
void __pgalloc_tag_add(struct page *page, struct task_struct *task,
- unsigned int nr, gfp_t gfp_flags)
+ unsigned int nr, unsigned int alloc_flags)
{
union pgtag_ref_handle handle;
union codetag_ref ref;
@@ -1263,17 +1263,17 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
* page_ext is not available yet, record the pfn so we can
* clear the tag ref later when page_ext is initialized.
*/
- alloc_tag_add_early_pfn(page_to_pfn(page), gfp_flags);
+ alloc_tag_add_early_pfn(page_to_pfn(page), alloc_flags);
if (task->alloc_tag)
alloc_tag_set_inaccurate(task->alloc_tag);
}
}
static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
- unsigned int nr, gfp_t gfp_flags)
+ unsigned int nr, unsigned int alloc_flags)
{
if (mem_alloc_profiling_enabled())
- __pgalloc_tag_add(page, task, nr, gfp_flags);
+ __pgalloc_tag_add(page, task, nr, alloc_flags);
}
/* Should be called only if mem_alloc_profiling_enabled() */
@@ -1306,7 +1306,7 @@ static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
#else /* CONFIG_MEM_ALLOC_PROFILING */
static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
- unsigned int nr, gfp_t gfp_flags) {}
+ unsigned int nr, unsigned int alloc_flags) {}
static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr) {}
@@ -1810,7 +1810,7 @@ static inline bool should_skip_init(gfp_t flags)
}
inline void post_alloc_hook(struct page *page, unsigned int order,
- gfp_t gfp_flags)
+ gfp_t gfp_flags, unsigned int alloc_flags)
{
const bool zero_tags = gfp_flags & __GFP_ZEROTAGS;
bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) &&
@@ -1861,13 +1861,13 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
set_page_owner(page, order, gfp_flags);
page_table_check_alloc(page, order);
- pgalloc_tag_add(page, current, 1 << order, gfp_flags);
+ pgalloc_tag_add(page, current, 1 << order, alloc_flags);
}
static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
unsigned int alloc_flags)
{
- post_alloc_hook(page, order, gfp_flags);
+ post_alloc_hook(page, order, gfp_flags, alloc_flags);
if (order && (gfp_flags & __GFP_COMP))
prep_compound_page(page, order);
@@ -4078,7 +4078,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
*/
page = get_page_from_freelist((gfp_mask | __GFP_HARDWALL) &
~__GFP_DIRECT_RECLAIM, order,
- ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
+ ac->alloc_flags|ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
if (page)
goto out;
@@ -4124,7 +4124,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
*/
if (gfp_mask & __GFP_NOFAIL)
page = __alloc_pages_cpuset_fallback(gfp_mask, order,
- ALLOC_NO_WATERMARKS, ac);
+ ac->alloc_flags|ALLOC_NO_WATERMARKS, ac);
}
out:
mutex_unlock(&oom_lock);
@@ -4791,8 +4791,12 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* The fast path uses conservative alloc_flags to succeed only until
* kswapd needs to be woken up, and to avoid the cost of setting up
* alloc_flags precisely. So we do that now.
+ *
+ * Can't just or alloc_flags if it contains WMARK bits, but those flags
+ * shouldn't be set in ac->alloc_flags.
*/
- alloc_flags = alloc_flags_slowpath(gfp_mask, order);
+ VM_WARN_ON(ac->alloc_flags & ALLOC_WMARK_MASK);
+ alloc_flags = ac->alloc_flags | alloc_flags_slowpath(gfp_mask, order);
/*
* We need to recalculate the starting point for the zonelist iterator
@@ -4834,7 +4838,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
if (reserve_flags)
alloc_flags = alloc_flags_cma(gfp_mask, reserve_flags) |
- (alloc_flags & ALLOC_KSWAPD);
+ ac->alloc_flags | (alloc_flags & ALLOC_KSWAPD);
/*
* Reset the nodemask and zonelist iterators if memory policies can be
@@ -5003,6 +5007,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* we always retry
*/
if (unlikely(nofail)) {
+ unsigned int alloc_flags = ac->alloc_flags | ALLOC_MIN_RESERVE;
+
/*
* Lacking direct_reclaim we can't do anything to reclaim memory,
* we disregard these unreasonable nofail requests and still
@@ -5018,7 +5024,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* could deplete whole memory reserves which would just make
* the situation worse.
*/
- page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE, ac);
+ page = __alloc_pages_cpuset_fallback(gfp_mask, order, alloc_flags, ac);
if (page)
goto got_pg;
@@ -5236,7 +5242,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
return nr_populated;
failed:
- page = __alloc_pages_noprof(gfp, 0, preferred_nid, nodemask);
+ page = __alloc_pages_noprof(gfp, 0, preferred_nid, nodemask, ALLOC_DEFAULT);
if (page)
page_array[nr_populated++] = page;
goto out;
@@ -5344,11 +5350,13 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
{
struct page *page;
gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
- struct alloc_context ac = { };
+ struct alloc_context ac = {
+ .alloc_flags = alloc_flags,
+ };
unsigned int fastpath_alloc_flags = alloc_flags;
/* Other flags could be supported later if needed. */
- if (WARN_ON(alloc_flags & ~ALLOC_NOLOCK))
+ if (WARN_ON(alloc_flags & ~(ALLOC_NOLOCK | ALLOC_NO_CODETAG)))
return NULL;
if (!alloc_order_allowed(gfp, order, alloc_flags))
@@ -5421,12 +5429,12 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
EXPORT_SYMBOL(__alloc_frozen_pages_noprof);
struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
- int preferred_nid, nodemask_t *nodemask)
+ int preferred_nid, nodemask_t *nodemask, unsigned int alloc_flags)
{
struct page *page;
page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask,
- ALLOC_DEFAULT);
+ alloc_flags);
if (page)
set_page_refcounted(page);
return page;
@@ -5440,7 +5448,7 @@ struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask, unsigned int order
VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
warn_if_node_offline(nid, gfp_mask);
- return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
+ return __alloc_pages_noprof(gfp_mask, order, nid, NULL, ALLOC_DEFAULT);
}
EXPORT_SYMBOL(alloc_pages_node_noprof);
@@ -5448,7 +5456,7 @@ struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_
nodemask_t *nodemask)
{
struct page *page = __alloc_pages_noprof(gfp | __GFP_COMP, order,
- preferred_nid, nodemask);
+ preferred_nid, nodemask, ALLOC_DEFAULT);
return page_rmappable_folio(page);
}
EXPORT_SYMBOL(__folio_alloc_noprof);
@@ -7130,7 +7138,7 @@ static void split_free_frozen_pages(struct list_head *list, gfp_t gfp_mask)
list_for_each_entry_safe(page, next, &list[order], lru) {
int i;
- post_alloc_hook(page, order, gfp_mask);
+ post_alloc_hook(page, order, gfp_mask, ALLOC_DEFAULT);
if (!order)
continue;
@@ -7335,7 +7343,7 @@ int alloc_contig_frozen_range_noprof(unsigned long start, unsigned long end,
struct page *head = pfn_to_page(start);
check_new_pages(head, order);
- prep_new_page(head, order, gfp_mask, 0);
+ prep_new_page(head, order, gfp_mask, ALLOC_DEFAULT);
} else {
ret = -EINVAL;
WARN(true, "PFN range: requested [%lu, %lu), allocated [%lu, %lu)\n",
diff --git a/mm/page_alloc.h b/mm/page_alloc.h
index aa0c1481f7ca3..b9259deddb59d 100644
--- a/mm/page_alloc.h
+++ b/mm/page_alloc.h
@@ -49,6 +49,13 @@
#define ALLOC_HIGHATOMIC 0x200 /* Allows access to MIGRATE_HIGHATOMIC */
#define ALLOC_NOLOCK 0x400 /* Only use spin_trylock in allocation path */
#define ALLOC_KSWAPD 0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
+/*
+ * Avoid alloc_tag recursion for internal allocations.
+ *
+ * Callers must clear_page_tag_ref() before freeing to avoid warnings from
+ * alloc_tag_sub_check().
+ */
+#define ALLOC_NO_CODETAG 0x1000
/* Flags that allow allocations below the min watermark. */
#define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
@@ -84,6 +91,8 @@ struct alloc_context {
*/
enum zone_type highest_zoneidx;
bool spread_dirty_pages;
+ /* Only flags that are global to the whole allocation go here. */
+ unsigned int alloc_flags;
};
/*
@@ -214,7 +223,8 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
extern void __free_pages_core(struct page *page, unsigned int order,
enum meminit_context context);
-void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags);
+void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags,
+ unsigned int alloc_flags);
extern bool free_pages_prepare(struct page *page, unsigned int order);
extern int user_min_free_kbytes;
@@ -245,7 +255,7 @@ struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned
void free_frozen_pages_nolock(struct page *page, unsigned int order);
struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
- nodemask_t *nodemask);
+ nodemask_t *nodemask, unsigned int alloc_flags);
#define __alloc_pages(...) alloc_hooks(__alloc_pages_noprof(__VA_ARGS__))
extern void zone_pcp_reset(struct zone *zone);
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index a1077cef3a791..e63efe78b7d4b 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -57,10 +57,10 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
__GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
page = __alloc_pages(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER,
- numa_mem_id(), NULL);
+ numa_mem_id(), NULL, ALLOC_DEFAULT);
#endif
if (unlikely(!page)) {
- page = __alloc_pages(gfp, 0, numa_mem_id(), NULL);
+ page = __alloc_pages(gfp, 0, numa_mem_id(), NULL, ALLOC_DEFAULT);
order = 0;
}
--
2.54.0
* [PATCH v5 16/18] mm: remove the __GFP_NO_OBJ_EXT flag
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
` (14 preceding siblings ...)
2026-07-03 12:31 ` [PATCH v5 15/18] mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 12:31 ` [PATCH v5 17/18] mm/page_alloc: drop alloc_flags arg from alloc_flags_cma() Brendan Jackman
` (2 subsequent siblings)
18 siblings, 0 replies; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed
From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
All users of the flag have been converted to SLAB_ALLOC_NO_RECURSE or to
ALLOC_NO_CODETAG (which replaced __GFP_NO_CODETAG, an alias for the
NO_OBJ_EXT bit). Free up the flag bit.
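For context, a trimmed-down, hypothetical sketch of the enum-based layout used in include/linux/gfp_types.h (the SKETCH_* names and the three-entry enum are illustrative, not the real flag list): each flag's bit position comes from an enum, so deleting an entry frees a bit and lowers the terminating *_LAST_BIT, and with it the mask of valid bits derived from it in this sketch.

```c
#include <assert.h>

/* Illustrative subset; a removed flag's *_BIT entry would simply vanish. */
enum {
	SKETCH_GFP_DMA_BIT,
	SKETCH_GFP_HIGHMEM_BIT,
	SKETCH_GFP_ACCOUNT_BIT,
	SKETCH_GFP_LAST_BIT	/* always last: number of bits in use */
};

#define SKETCH_BIT(nr)		(1u << (nr))
#define SKETCH_GFP_DMA		SKETCH_BIT(SKETCH_GFP_DMA_BIT)
#define SKETCH_GFP_HIGHMEM	SKETCH_BIT(SKETCH_GFP_HIGHMEM_BIT)
#define SKETCH_GFP_ACCOUNT	SKETCH_BIT(SKETCH_GFP_ACCOUNT_BIT)

/* Mask of every valid bit, derived from the enum terminator. */
#define SKETCH_GFP_BITS_MASK	(SKETCH_BIT(SKETCH_GFP_LAST_BIT) - 1)

/* Keep the shift above well-defined for a 32-bit unsigned int. */
static_assert(SKETCH_GFP_LAST_BIT < 32, "too many GFP bits for unsigned int");
```

Because positions are assigned by the enum, removing one entry renumbers everything after it automatically; the trace-event side (mmflags.h) likewise only needs the corresponding TRACE_GFP_EM() entry dropped.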
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
[Rebased onto __GFP_NO_CODETAG removal]
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
include/linux/gfp_types.h | 7 -------
include/trace/events/mmflags.h | 10 +---------
tools/include/linux/gfp_types.h | 7 -------
3 files changed, 1 insertion(+), 23 deletions(-)
diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 463b551d12d99..190191411009f 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -55,7 +55,6 @@ enum {
#ifdef CONFIG_LOCKDEP
___GFP_NOLOCKDEP_BIT,
#endif
- ___GFP_NO_OBJ_EXT_BIT,
___GFP_LAST_BIT
};
@@ -96,7 +95,6 @@ enum {
#else
#define ___GFP_NOLOCKDEP 0
#endif
-#define ___GFP_NO_OBJ_EXT BIT(___GFP_NO_OBJ_EXT_BIT)
/*
* Physical address zone modifiers (see linux/mmzone.h - low four bits)
@@ -138,17 +136,12 @@ enum {
*
* %__GFP_ACCOUNT causes the allocation to be accounted to the active
* cgroup context.
- *
- * %__GFP_NO_OBJ_EXT causes slab allocation to have no object extension.
- * mark_obj_codetag_empty() should be called upon freeing for objects allocated
- * with this flag to indicate that their NULL tags are expected and normal.
*/
#define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE)
#define __GFP_WRITE ((__force gfp_t)___GFP_WRITE)
#define __GFP_HARDWALL ((__force gfp_t)___GFP_HARDWALL)
#define __GFP_THISNODE ((__force gfp_t)___GFP_THISNODE)
#define __GFP_ACCOUNT ((__force gfp_t)___GFP_ACCOUNT)
-#define __GFP_NO_OBJ_EXT ((__force gfp_t)___GFP_NO_OBJ_EXT)
/**
* DOC: Watermark modifiers
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index a6e5a44c9b429..c1a05ff0feab0 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -54,18 +54,10 @@
# define TRACE_GFP_FLAGS_LOCKDEP
#endif
-#ifdef CONFIG_SLAB_OBJ_EXT
-# define TRACE_GFP_FLAGS_SLAB \
- TRACE_GFP_EM(NO_OBJ_EXT)
-#else
-# define TRACE_GFP_FLAGS_SLAB
-#endif
-
#define TRACE_GFP_FLAGS \
TRACE_GFP_FLAGS_GENERAL \
TRACE_GFP_FLAGS_KASAN \
- TRACE_GFP_FLAGS_LOCKDEP \
- TRACE_GFP_FLAGS_SLAB
+ TRACE_GFP_FLAGS_LOCKDEP
#undef TRACE_GFP_EM
#define TRACE_GFP_EM(a) TRACE_DEFINE_ENUM(___GFP_##a##_BIT);
diff --git a/tools/include/linux/gfp_types.h b/tools/include/linux/gfp_types.h
index 6c75df30a281d..a93b8bd200b76 100644
--- a/tools/include/linux/gfp_types.h
+++ b/tools/include/linux/gfp_types.h
@@ -55,7 +55,6 @@ enum {
#ifdef CONFIG_LOCKDEP
___GFP_NOLOCKDEP_BIT,
#endif
- ___GFP_NO_OBJ_EXT_BIT,
___GFP_LAST_BIT
};
@@ -96,7 +95,6 @@ enum {
#else
#define ___GFP_NOLOCKDEP 0
#endif
-#define ___GFP_NO_OBJ_EXT BIT(___GFP_NO_OBJ_EXT_BIT)
/*
* Physical address zone modifiers (see linux/mmzone.h - low four bits)
@@ -137,17 +135,12 @@ enum {
* node with no fallbacks or placement policy enforcements.
*
* %__GFP_ACCOUNT causes the allocation to be accounted to kmemcg.
- *
- * %__GFP_NO_OBJ_EXT causes slab allocation to have no object extension.
- * mark_obj_codetag_empty() should be called upon freeing for objects allocated
- * with this flag to indicate that their NULL tags are expected and normal.
*/
#define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE)
#define __GFP_WRITE ((__force gfp_t)___GFP_WRITE)
#define __GFP_HARDWALL ((__force gfp_t)___GFP_HARDWALL)
#define __GFP_THISNODE ((__force gfp_t)___GFP_THISNODE)
#define __GFP_ACCOUNT ((__force gfp_t)___GFP_ACCOUNT)
-#define __GFP_NO_OBJ_EXT ((__force gfp_t)___GFP_NO_OBJ_EXT)
/**
* DOC: Watermark modifiers
--
2.54.0
* [PATCH v5 17/18] mm/page_alloc: drop alloc_flags arg from alloc_flags_cma()
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
` (15 preceding siblings ...)
2026-07-03 12:31 ` [PATCH v5 16/18] mm: remove the __GFP_NO_OBJ_EXT flag Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 15:10 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 18/18] mm: factor out can_spin_trylock() Brendan Jackman
2026-07-03 12:47 ` [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Vlastimil Babka (SUSE)
18 siblings, 1 reply; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed
To align the style with the other alloc_flags_*() functions, drop this
additive argument and have the callers OR in the result themselves.
Note that you can't always freely OR alloc_flags together like these
callers do (because the WMARK bits encode an enum), but this is fine for
ALLOC_CMA, just as it's fine for e.g. ALLOC_NON_BLOCK returned by
alloc_flags_nonblocking() and OR'd in by its caller.
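A short sketch of why OR-ing is fine for ALLOC_CMA but not for watermark values, assuming an illustrative encoding: the SK_* names and values are hypothetical, and only the idea that some low bits hold an enum-like watermark index comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative encoding: low two bits are an index, higher bits are flags. */
#define SK_ALLOC_WMARK_MIN	0x0u
#define SK_ALLOC_WMARK_LOW	0x1u
#define SK_ALLOC_WMARK_HIGH	0x2u
#define SK_ALLOC_WMARK_MASK	0x3u
#define SK_ALLOC_DEFAULT	0x0u
#define SK_ALLOC_CMA		0x80u

/* New style: return only the flag this helper owns, like alloc_flags_cma(). */
static unsigned int sk_alloc_flags_cma(bool movable)
{
	return movable ? SK_ALLOC_CMA : SK_ALLOC_DEFAULT;
}

/* The caller ORs the result in; safe because it never touches WMARK bits. */
static unsigned int sk_add_cma(unsigned int alloc_flags, bool movable)
{
	unsigned int cma = sk_alloc_flags_cma(movable);

	assert(!(cma & SK_ALLOC_WMARK_MASK));
	return alloc_flags | cma;
}

/* Extract the watermark index from a flags word. */
static unsigned int sk_wmark(unsigned int alloc_flags)
{
	return alloc_flags & SK_ALLOC_WMARK_MASK;
}
```

OR-ing SK_ALLOC_CMA into any flags word preserves the watermark index, whereas SK_ALLOC_WMARK_LOW | SK_ALLOC_WMARK_HIGH yields index 3, which is neither of the requested watermarks. That is the hazard the note above refers to.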
Suggested-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Link: https://lore.kernel.org/all/5dcdd1ef-21ad-4ed0-9e8a-0e5cf96b4392@kernel.org/
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
mm/page_alloc.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cfaf16244f56d..c3b246e67ed14 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3775,14 +3775,13 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
}
/* Must be called after current_gfp_context() which can change gfp_mask */
-static inline unsigned int alloc_flags_cma(gfp_t gfp_mask,
- unsigned int alloc_flags)
+static inline unsigned int alloc_flags_cma(gfp_t gfp_mask)
{
#ifdef CONFIG_CMA
if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
- alloc_flags |= ALLOC_CMA;
+ return ALLOC_CMA;
#endif
- return alloc_flags;
+ return ALLOC_DEFAULT;
}
/*
@@ -4526,7 +4525,7 @@ alloc_flags_slowpath(gfp_t gfp_mask, unsigned int order)
} else if (unlikely(rt_or_dl_task(current)) && in_task())
alloc_flags |= ALLOC_MIN_RESERVE;
- alloc_flags = alloc_flags_cma(gfp_mask, alloc_flags);
+ alloc_flags |= alloc_flags_cma(gfp_mask);
if (defrag_mode)
alloc_flags |= ALLOC_NOFRAGMENT;
@@ -4837,7 +4836,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
if (reserve_flags)
- alloc_flags = alloc_flags_cma(gfp_mask, reserve_flags) |
+ alloc_flags = alloc_flags_cma(gfp_mask) | reserve_flags |
ac->alloc_flags | (alloc_flags & ALLOC_KSWAPD);
/*
@@ -5070,7 +5069,7 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
should_fail_alloc_page(gfp_mask, order))
return false;
- *alloc_flags = alloc_flags_cma(gfp_mask, *alloc_flags);
+ *alloc_flags |= alloc_flags_cma(gfp_mask);
/* Dirty zone balancing only done in the fast path */
ac->spread_dirty_pages = (gfp_mask & __GFP_WRITE);
--
2.54.0
* [PATCH v5 18/18] mm: factor out can_spin_trylock()
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
` (16 preceding siblings ...)
2026-07-03 12:31 ` [PATCH v5 17/18] mm/page_alloc: drop alloc_flags arg from alloc_flags_cma() Brendan Jackman
@ 2026-07-03 12:31 ` Brendan Jackman
2026-07-03 12:55 ` sashiko-bot
2026-07-03 15:12 ` Zi Yan
2026-07-03 12:47 ` [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Vlastimil Babka (SUSE)
18 siblings, 2 replies; 41+ messages in thread
From: Brendan Jackman @ 2026-07-03 12:31 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner,
Alexei Starovoitov, Matthew Wilcox, Hao Ge, linux-mm,
linux-kernel, linux-rt-devel, derkling, reijiw, Brendan Jackman,
Yosry Ahmed
Deduplicate checks for whether the current context is safe for
spin_trylock().
Does this function really belong in mm/internal.h, or is it generic? Not
sure. If someone ends up duplicating this logic elsewhere in the kernel,
that would be a shame. But if it went into a generic header, someone might
treat it as documentation of where spin_trylock() is guaranteed to be
safe; if it then emerged that there are other subtle preconditions that
didn't affect the mm use case, that would be worse.
So, just be conservative and keep it local.
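The decision logic can be modelled as a pure function of build configuration and execution context. This is a hypothetical user-space sketch: the struct sk_ctx fields stand in for IS_ENABLED(CONFIG_PREEMPT_RT), IS_ENABLED(CONFIG_SMP), in_nmi() and in_hardirq(), and the body mirrors the two checks that can_spin_trylock() performs in this patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the config options and context queries. */
struct sk_ctx {
	bool preempt_rt;	/* IS_ENABLED(CONFIG_PREEMPT_RT) */
	bool smp;		/* IS_ENABLED(CONFIG_SMP) */
	bool in_nmi;		/* in_nmi() */
	bool in_hardirq;	/* in_hardirq() */
};

static bool sk_can_spin_trylock(const struct sk_ctx *c)
{
	/* PREEMPT_RT: trylock may take a raw lock (unsafe in NMI) and can
	 * confuse PI ownership tracking when called from hard IRQ. */
	if (c->preempt_rt && (c->in_nmi || c->in_hardirq))
		return false;

	/* UP: spin_trylock() always succeeds even when the lock is held,
	 * so it offers no protection in NMI. */
	if (!c->smp && c->in_nmi)
		return false;

	/* irqs_disabled() alone is fine, as the original comment notes. */
	return true;
}
```

Enumerating the four inputs this way makes it easy to see that hard IRQ context is only rejected on PREEMPT_RT, and NMI context is rejected on PREEMPT_RT or UP.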
Suggested-by: Harry Yoo <harry@kernel.org>
Link: https://lore.kernel.org/all/397859cb-b127-4cc6-9c71-044afc99bf0c@kernel.org/
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
mm/internal.h | 23 +++++++++++++++++++++++
mm/page_alloc.c | 17 +----------------
mm/slub.c | 10 +---------
3 files changed, 25 insertions(+), 25 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 7e3b2386e274b..0ae6ad2265125 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1716,4 +1716,27 @@ static inline void mm_prepare_for_swap_entries(struct mm_struct *mm)
}
}
+static inline bool can_spin_trylock(void)
+{
+ /*
+ * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
+ * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
+ * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
+ * mark the task as the owner of another rt_spin_lock which will
+ * confuse PI logic, so return immediately if called from hard IRQ or
+ * NMI.
+ *
+ * Note, irqs_disabled() case is ok. spin_trylock() can be called
+ * from raw_spin_lock_irqsave region.
+ */
+ if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
+ return false;
+
+ /* On UP, spin_trylock() always succeeds even when it is locked */
+ if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
+ return false;
+
+ return true;
+}
+
#endif /* __MM_INTERNAL_H */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c3b246e67ed14..a63733dac659e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5291,22 +5291,7 @@ static inline bool alloc_order_allowed(gfp_t gfp, unsigned int order,
static inline bool alloc_nolock_allowed(void)
{
- /*
- * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
- * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
- * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
- * mark the task as the owner of another rt_spin_lock which will
- * confuse PI logic, so return immediately if called from hard IRQ or
- * NMI.
- *
- * Note, irqs_disabled() case is ok. This function can be called
- * from raw_spin_lock_irqsave region.
- */
- if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
- return false;
-
- /* On UP, spin_trylock() always succeeds even when it is locked */
- if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
+ if (!can_spin_trylock())
return false;
/* Bailout, since _deferred_grow_zone() needs to take a lock */
diff --git a/mm/slub.c b/mm/slub.c
index 3989b4758ae0a..b19dc46de73c5 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5408,15 +5408,7 @@ static void *__kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_f
if (unlikely(!size))
return ZERO_SIZE_PTR;
- /*
- * See the comment for the same check in
- * alloc_frozen_pages_nolock_noprof()
- */
- if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
- return NULL;
-
- /* On UP, spin_trylock() always succeeds even when it is locked */
- if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
+ if (!can_spin_trylock())
return NULL;
retry:
--
2.54.0
* Re: [PATCH v5 06/18] mm/page_alloc: relax GFP WARN in nolock allocs
2026-07-03 12:31 ` [PATCH v5 06/18] mm/page_alloc: relax GFP WARN in nolock allocs Brendan Jackman
@ 2026-07-03 12:43 ` sashiko-bot
2026-07-03 14:44 ` Zi Yan
1 sibling, 0 replies; 41+ messages in thread
From: sashiko-bot @ 2026-07-03 12:43 UTC (permalink / raw)
To: Brendan Jackman; +Cc: linux-rt-devel, linux-kernel
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] The kernel-doc comment for the public API `alloc_pages_nolock` was not updated to reflect the relaxed GFP flags.
--
commit 311d28c3e8fc444b6e80f8dca47557fd6cfebc1f
Author: Brendan Jackman <jackmanb@google.com>
mm/page_alloc: relax GFP WARN in nolock allocs
This commit relaxes the VM_WARN_ON_ONCE check in nolock allocations to
allow flags included in gfp_nolock. It updates the validation to permit
callers to provide a wider set of flags safely.
Link: https://lore.kernel.org/linux-mm/ajS96fWbG4dzP3u3@casper.infradead.org/
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f47a848555077..c2839959d7908 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5355,7 +5355,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
> return NULL;
>
> if (alloc_flags & ALLOC_NOLOCK) {
> - VM_WARN_ON_ONCE(gfp & ~__GFP_ACCOUNT);
> + /* Certain other flags could be supported later if needed. */
> + VM_WARN_ON_ONCE(gfp & ~(__GFP_ACCOUNT | gfp_nolock));
[Severity: Low]
Does the kernel-doc comment for alloc_pages_nolock() also need to be
updated to reflect this relaxation?
Currently in mm/page_alloc.c, the documentation still states:
* @gfp_flags: GFP flags. Only __GFP_ACCOUNT allowed.
This might mislead callers who refer to the API documentation into
believing they can strictly only pass __GFP_ACCOUNT, even though the
implementation now permits other flags included in gfp_nolock.
> if (!alloc_nolock_allowed())
> return NULL;
> gfp |= gfp_nolock;
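To illustrate the relaxed check, a hypothetical sketch of an allowed-mask test in the style of VM_WARN_ON_ONCE(gfp & ~(__GFP_ACCOUNT | gfp_nolock)). The bit values and the contents of SK_GFP_NOLOCK are assumptions; the real gfp_nolock definition is not shown in this thread.

```c
#include <stdbool.h>

typedef unsigned int sk_gfp_t;

/* Illustrative bit values, not the kernel's. */
#define SK_GFP_ACCOUNT	0x01u
#define SK_GFP_NOWARN	0x02u
#define SK_GFP_ZERO	0x04u
#define SK_GFP_RECLAIM	0x08u

/* Assumed stand-in for gfp_nolock: flags the nolock path ORs in itself. */
#define SK_GFP_NOLOCK	(SK_GFP_NOWARN | SK_GFP_ZERO)

/* Old check: only SK_GFP_ACCOUNT was tolerated. */
static bool sk_old_check_warns(sk_gfp_t gfp)
{
	return gfp & ~SK_GFP_ACCOUNT;
}

/* Relaxed check: flags that the nolock path adds anyway are tolerated too. */
static bool sk_new_check_warns(sk_gfp_t gfp)
{
	return gfp & ~(SK_GFP_ACCOUNT | SK_GFP_NOLOCK);
}
```

Under these assumptions, passing a flag that the nolock path would add regardless no longer warns, while anything outside the allowed set (SK_GFP_RECLAIM here) still does.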
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260703-alloc-trylock-v5-0-c87b714e19d3@google.com?part=6
* Re: [PATCH v5 00/18] mm: Some cleanups for page allocator APIs
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
` (17 preceding siblings ...)
2026-07-03 12:31 ` [PATCH v5 18/18] mm: factor out can_spin_trylock() Brendan Jackman
@ 2026-07-03 12:47 ` Vlastimil Babka (SUSE)
18 siblings, 0 replies; 41+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-07-03 12:47 UTC (permalink / raw)
To: Brendan Jackman, Andrew Morton, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
derkling, reijiw, Yosry Ahmed, JP Kobryn, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
James Clark, Sean Christopherson, Paolo Bonzini, kvm,
Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Robin Holt, Steve Wahl, Arnd Bergmann,
Greg Kroah-Hartman, Dimitris Michailidis, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
On 7/3/26 14:31, Brendan Jackman wrote:
> Based on mm-new.
Seems ready for mm-new to me. I suspect most further non-trivial page
allocator work will conflict with it, so it's better to have it in mm for
that work to be based upon.
I was worried about Johannes' defrag fixes [1], which are supposed to be
backported to stable and thus would be better added (in their next revision)
to mm-new first. But it seems that if I apply this series on mm-new and then
[1] on top, nothing conflicts, so the ordering shouldn't matter.
[1] https://lore.kernel.org/all/20260626182215.1107966-1-hannes@cmpxchg.org/
> This depends on moving alloc_tag to mm/:
> https://lore.kernel.org/all/aj5QBtJcphPElczI@lucifer/
>
> Some tweaks and cleanups for page allocator entrypoint and flags. This
> is motivated by preparation for __GFP_UNMAPPED [1] (which will probably
> become ALLOC_UNMAPPED in its next iteration), but all this is supposed
> to be an improvement to the codebase in its own right: unifying code
> paths, reducing API surface, and removing GFP flags.
>
> [1] https://lore.kernel.org/all/20260320-page_alloc-unmapped-v2-0-28bf1bd54f41@google.com/
>
> This started with unifying __alloc_frozen_pages[_nolock]_noprof() and
> expanded from there.
>
> Unifying the nolock allocator entrypoint with the normal allocator
> entrypoint means adding an alloc_flags argument to the latter (only
> exposed within mm/). This presents an opportunity to take advantage of
> that arg to remove some GFP flags, if we add that alloc_flags arg a bit
> more broadly to allocator entrypoints.
>
> To distinguish between mm-internal and "public" allocator entrypoints,
> it makes sense to use the __ prefix. There are already some public APIs
> with that prefix. For *alloc_pages*, just removing those variants seems
> like a nice cleanup anyway, so do that. For get_free_pages, the "__"
> variant is the _only_ variant and it's very widely used, so it doesn't
> seem worthwhile to modify that. Therefore, scope this "__" change
> specifically to the *alloc_pages* API, which means we leave the
> *folio_alloc* API untouched too, even though that could probably be
> cleaned up if so desired.
>
> Tested:
>
> - KVM, mm, and BPF selftests in a QEMU VM
>
> - kunit.py on x86_64
>
> - For the ALLOC_NO_CODETAG bits I just booted a VM and read
> /proc/allocinfo. I confirmed that if I remove ALLOC_NO_CODETAG, the
> kernel crashes in early boot, so I was at least booting code that
> depends on this logic.
>
> I used Google's internal version of Antigravity (AI coding harness) to
> do the repetitive bits; those commits are marked with Assisted-by, and the
> rest is manual.
>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> Changes in v5:
> - Just trivial non-functional fixes.
> - Link to v4: https://patch.msgid.link/20260702-alloc-trylock-v4-0-0af8ff387e80@google.com
>
> Changes in v4:
> - Fixed some (harmless) missing applications of ac->alloc_flags (local
> Sashiko)
> - Fixed various build issues.
> - Note that Sashiko pointed out a KMSAN build issue [2]. I have
> fixed it, but KMSAN builds are currently broken by objtool [3]. At least
> mm/kmsan/init.c compiles.
> [2] https://lore.kernel.org/all/20260629141642.628271F00A3D@smtp.kernel.org/
> [3] https://lore.kernel.org/all/20260630104434.GC751831@noisy.programming.kicks-ass.net/t/#u
> - Avoided setting ALLOC_NOFRAGMENT under ALLOC_NOLOCK (Sashiko, Harry)
> - Added patch to tweak alloc_flags_cma() interface (Vlastimil)
> - More commit message fixups (various)
> - Added patch to create can_spin_trylock() (Harry)
> - Link to v3: https://patch.msgid.link/20260629-alloc-trylock-v3-0-57bef0eadbc2@google.com
>
> Changes in v3:
> - Created mm/page_alloc.h
> - Fixed EXPORT_SYMBOL() issues
> - Reworded commit messages per Sashiko's pointers
> - Dropped rename of alloc_flags arg in prepare_alloc_pages() (Suren)
> - Renamed gfp_to_alloc_flags_nonblocking() too after rebasing onto:
> https://lore.kernel.org/all/20260623004600.113347-1-jp.kobryn@linux.dev/
> - Link to v2: https://patch.msgid.link/20260622-alloc-trylock-v2-0-31f31367d420@google.com
>
> Changes in v2:
> - Fixed up whitespace in nolock unification patch
> - Introduced ALLOC_DEFAULT to replace literal 0 for alloc_flags
> - All other patches are new
> - Link to v1: https://patch.msgid.link/20260617-alloc-trylock-v1-1-83fd7858832e@google.com
>
> ---
> Brendan Jackman (17):
> mm/page_alloc: rename ALLOC_TRYLOCK -> ALLOC_NOLOCK
> mm/page_alloc: some renames to clarify alloc_flags scopes
> mm: name some args in a function declaration
> mm: Split out internal page_alloc.h
> mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
> mm/page_alloc: relax GFP WARN in nolock allocs
> mm: move some stuff to mm/page_alloc.h
> perf/x86/intel: Use higher-level allocator API
> KVM: VMX: Use higher-level allocator API
> x86/virt: Use higher-level allocator API
> sgi-xp: Use higher-level allocator API
> net/funeth: Switch to higher-level allocator API
> mm: Remove __alloc_pages_node()
> mm: Move __alloc_pages() to mm/page_alloc.h
> mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG
> mm/page_alloc: drop alloc_flags arg from alloc_flags_cma()
> mm: factor out can_spin_trylock()
>
> Vlastimil Babka (SUSE) (1):
> mm: remove the __GFP_NO_OBJ_EXT flag
>
> Documentation/admin-guide/cgroup-v1/cpusets.rst | 2 +-
> Documentation/admin-guide/mm/transhuge.rst | 2 +-
> MAINTAINERS | 1 +
> arch/x86/events/intel/ds.c | 6 +-
> arch/x86/kvm/vmx/vmx.c | 2 +-
> arch/x86/virt/hw.c | 2 +-
> drivers/misc/sgi-xp/xpc_uv.c | 5 +-
> drivers/net/ethernet/fungible/funeth/funeth_rx.c | 2 +-
> include/linux/alloc_tag.h | 4 +-
> include/linux/gfp.h | 54 +---
> include/linux/gfp_types.h | 7 -
> include/linux/skbuff.h | 2 +-
> include/trace/events/mmflags.h | 10 +-
> mm/alloc_tag.c | 23 +-
> mm/compaction.c | 5 +-
> mm/hugetlb.c | 4 +-
> mm/internal.h | 275 ++------------------
> mm/khugepaged.c | 1 +
> mm/kmsan/init.c | 2 +-
> mm/memory-failure.c | 1 +
> mm/memory_hotplug.c | 1 +
> mm/mempolicy.c | 11 +-
> mm/migrate.c | 1 +
> mm/mm_init.c | 1 +
> mm/page_alloc.c | 269 ++++++++++---------
> mm/page_alloc.h | 312 +++++++++++++++++++++++
> mm/page_frag_cache.c | 6 +-
> mm/page_isolation.c | 1 +
> mm/page_owner.c | 2 +-
> mm/page_reporting.c | 1 +
> mm/show_mem.c | 1 +
> mm/shuffle.c | 1 +
> mm/slub.c | 17 +-
> mm/swap.c | 1 +
> mm/vmscan.c | 1 +
> mm/vmstat.c | 1 +
> tools/include/linux/gfp_types.h | 7 -
> 37 files changed, 536 insertions(+), 508 deletions(-)
> ---
> base-commit: 32af3ff0925368eff29b2fed62f154150eb5dc10
> change-id: 20260617-alloc-trylock-14ad37dab337
>
> Best regards,
> --
> Brendan Jackman <jackmanb@google.com>
>
* Re: [PATCH v5 11/18] sgi-xp: Use higher-level allocator API
2026-07-03 12:31 ` [PATCH v5 11/18] sgi-xp: " Brendan Jackman
@ 2026-07-03 12:48 ` sashiko-bot
2026-07-03 14:51 ` Zi Yan
1 sibling, 0 replies; 41+ messages in thread
From: sashiko-bot @ 2026-07-03 12:48 UTC (permalink / raw)
To: Brendan Jackman; +Cc: linux-rt-devel, linux-kernel
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [High] In `xpc_init_mq_node()`, the loop iterates over CPUs but mistakenly passes the `nid` (node ID) variable to `xpc_create_gru_mq_uv()` instead of the `cpu` iterator.
--
commit 60dc55f94c78f2b8722b9d132aa85a4603393bdd
Author: Brendan Jackman <jackmanb@google.com>
sgi-xp: Use higher-level allocator API
The difference between __alloc_pages_node() and alloc_pages_node() is
that the latter allows you to pass NUMA_NO_NODE.
The former is going away and the latter works fine here so switch over.
No functional change intended.
> diff --git a/drivers/misc/sgi-xp/xpc_uv.c b/drivers/misc/sgi-xp/xpc_uv.c
> index 772c787268932..79c2f00ed4d70 100644
> --- a/drivers/misc/sgi-xp/xpc_uv.c
> +++ b/drivers/misc/sgi-xp/xpc_uv.c
> @@ -170,9 +170,8 @@ xpc_create_gru_mq_uv(unsigned int mq_size, int cpu, char *irq_name,
> mq->mmr_blade = uv_cpu_to_blade_id(cpu);
>
> nid = cpu_to_node(cpu);
[Severity: High]
This isn't a bug introduced by this patch, but while looking at how the cpu
argument is handled here in xpc_create_gru_mq_uv(), I noticed an issue in
its caller.
In xpc_init_mq_node():
for_each_cpu(cpu, cpumask_of_node(nid)) {
xpc_activate_mq_uv =
xpc_create_gru_mq_uv(XPC_ACTIVATE_MQ_SIZE_UV, nid,
XPC_ACTIVATE_IRQ_NAME,
xpc_handle_activate_IRQ_uv);
Is it possible that the node ID (nid) is mistakenly passed into
xpc_create_gru_mq_uv() instead of the cpu iterator? This also happens a few
lines later when allocating xpc_notify_mq_uv.
Passing a node ID where a CPU index is expected will cause per-CPU lookups
like uv_cpu_to_blade_id() and cpu_to_node() to access data for the wrong CPU.
If the node ID exceeds nr_cpu_ids, this could also cause out-of-bounds memory
accesses.
> - page = __alloc_pages_node(nid,
> - GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
> - pg_order);
> + page = alloc_pages_node(nid, GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
> + pg_order);
> if (page == NULL) {
> dev_err(xpc_part, "xpc_create_gru_mq_uv() failed to alloc %d "
> "bytes of memory on nid=%d for GRU mq\n", mq_size, nid);
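The following is a userspace model of the argument mix-up, assuming a hypothetical topology with four CPUs on two nodes. The table and the model_* helpers are invented for illustration. The stand-in for xpc_create_gru_mq_uv() derives the node from whatever value it receives as a CPU, as the real function does with cpu_to_node(cpu).

```c
#include <assert.h>

#define NR_CPUS_MODEL 4

/* Hypothetical topology: CPUs 0-1 on node 0, CPUs 2-3 on node 1. */
static const int cpu_to_node_tbl[NR_CPUS_MODEL] = { 0, 0, 1, 1 };

static int model_cpu_to_node(int cpu)
{
	/* In the kernel an out-of-range index is an OOB read, not an assert. */
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	return cpu_to_node_tbl[cpu];
}

/* Stand-in for xpc_create_gru_mq_uv(): returns the node it would allocate on. */
static int model_create_mq(int cpu)
{
	return model_cpu_to_node(cpu);
}

/*
 * Models xpc_init_mq_node(): walk the CPUs of node nid and create the queue
 * on the first one. pass_cpu selects the suggested argument (cpu) or the
 * current one (nid).
 */
static int model_init_mq_node(int nid, int pass_cpu)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (cpu_to_node_tbl[cpu] != nid)
			continue;	/* for_each_cpu(cpu, cpumask_of_node(nid)) */
		return model_create_mq(pass_cpu ? cpu : nid);
	}
	return -1;	/* node has no CPUs */
}
```

In this topology the mix-up is invisible for node 0, because CPU 0 happens to sit on node 0. For node 1, however, the queue lands on node 0.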
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260703-alloc-trylock-v5-0-c87b714e19d3@google.com?part=11
* Re: [PATCH v5 12/18] net/funeth: Switch to higher-level allocator API
2026-07-03 12:31 ` [PATCH v5 12/18] net/funeth: Switch to " Brendan Jackman
@ 2026-07-03 12:53 ` sashiko-bot
2026-07-03 14:52 ` Zi Yan
1 sibling, 0 replies; 41+ messages in thread
From: sashiko-bot @ 2026-07-03 12:53 UTC (permalink / raw)
To: Brendan Jackman; +Cc: linux-rt-devel, linux-kernel
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [Critical] Untrusted packet length from hardware CQE leads to stack buffer overflow and OOB memory corruption.
--
commit fd0faaa7ff432d936af6fe6e5da6c09cbf2f16eb
Author: Brendan Jackman <jackmanb@google.com>
net/funeth: Switch to higher-level allocator API
This commit replaces the deprecated __alloc_pages_node() with
alloc_pages_node(). The new API supports passing NUMA_NO_NODE. No functional
changes are intended by this switch.
> diff --git a/drivers/net/ethernet/fungible/funeth/funeth_rx.c b/drivers/net/ethernet/fungible/funeth/funeth_rx.c
> index 7e2584895de39..d7000017ac2bd 100644
> --- a/drivers/net/ethernet/fungible/funeth/funeth_rx.c
> +++ b/drivers/net/ethernet/fungible/funeth/funeth_rx.c
> @@ -103,7 +103,7 @@ static int funeth_alloc_page(struct funeth_rxq *q, struct funeth_rxbuf *rb,
> if (cache_get(q, rb))
> return 0;
>
> - p = __alloc_pages_node(node, gfp | __GFP_NOWARN, 0);
> + p = alloc_pages_node(node, gfp | __GFP_NOWARN, 0);
[Severity: Critical]
This is a pre-existing issue, but while reviewing this file I noticed a
potential stack buffer overflow when handling untrusted packet lengths
from the hardware CQE.
In fun_handle_cqe_pkt(), the untrusted packet length and fragment count
are read from the CQE:
unsigned int i, tot_len, pkt_len = be32_to_cpu(rxreq->pkt_len);
...
tot_len = pkt_len;
...
ref_ok = fun_gather_pkt(q, tot_len, frags);
...
skb = napi_build_skb(va, ALIGN(tot_len, FUN_EPRQ_PKT_ALIGN));
...
si->nr_frags = rxreq->nsgl;
for (i = 0; i < si->nr_frags; i++)
si->frags[i] = frags[i];
Does the loop in fun_gather_pkt() iterate until tot_len is exhausted without
validating the size?
for (;;) {
...
skb_frag_fill_page_desc(frags++, buf->page, q->buf_offset, frag_len);
tot_len -= frag_len;
if (!tot_len)
break;
}
If rxreq->pkt_len is maliciously large, could this increment the frags array
pointer past the bounds of the stack-allocated frags array (size 4),
resulting in a stack buffer overflow?
Furthermore, could napi_build_skb() use this untrusted length to place
skb_shared_info at an out-of-bounds offset, corrupting unrelated memory?
> if (unlikely(!p))
> return -ENOMEM;
>
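The following sketches the kind of bound the review is asking about. It is a userspace model, not a patch to funeth. The length reported by the device is split into buffer-sized fragments, and the loop refuses to write past the last slot of the fragment array. MODEL_MAX_FRAGS = 4 follows the array size quoted in the review. buf_len stands in for however many bytes each receive buffer holds, which is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_FRAGS 4	/* on-stack frags[] size, per the review */

struct model_frag {
	unsigned int len;
};

/*
 * Split tot_len into fragments of at most buf_len bytes. Returns the number
 * of fragments, or -1 if the length claimed by the device would need more
 * slots than frags[] has (instead of walking past the end of the array).
 */
static int model_gather_pkt(unsigned int tot_len, unsigned int buf_len,
			    struct model_frag *frags, size_t max_frags)
{
	size_t n = 0;

	if (tot_len == 0 || buf_len == 0)
		return -1;

	while (tot_len) {
		unsigned int frag_len = tot_len < buf_len ? tot_len : buf_len;

		if (n == max_frags)
			return -1;	/* would overflow frags[] */
		frags[n++].len = frag_len;
		tot_len -= frag_len;
	}
	return (int)n;
}
```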
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260703-alloc-trylock-v5-0-c87b714e19d3@google.com?part=12
* Re: [PATCH v5 13/18] mm: Remove __alloc_pages_node()
2026-07-03 12:31 ` [PATCH v5 13/18] mm: Remove __alloc_pages_node() Brendan Jackman
@ 2026-07-03 12:54 ` sashiko-bot
2026-07-03 14:57 ` Zi Yan
1 sibling, 0 replies; 41+ messages in thread
From: sashiko-bot @ 2026-07-03 12:54 UTC (permalink / raw)
To: Brendan Jackman; +Cc: linux-rt-devel, linux-kernel
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [Critical] The patch removes `__alloc_pages_node()` to fix CPU hotplug bugs but completely misses the equivalent `__folio_alloc_node()` API, leaving it exposed with the same `NUMA_NO_NODE` vulnerability.
--
commit 4b93237c4920d8046053a32bcd0656e3236e97c9
Author: Brendan Jackman <jackmanb@google.com>
mm: Remove __alloc_pages_node()
There were only a few users, which have been removed. The only advantage
of this API over alloc_pages_node() is avoiding a single conditional
branch.
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 01d6d2591f49e..3bf55a5f9143e 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
[ ... ]
> @@ -256,21 +256,6 @@ static inline void warn_if_node_offline(int this_node, gfp_t gfp_mask)
> dump_stack();
> }
>
> -/*
> - * Allocate pages, preferring the node given as nid. The node must be valid and
> - * online. For more general interface, see alloc_pages_node().
> - */
> -static inline struct page *
> -__alloc_pages_node_noprof(int nid, gfp_t gfp_mask, unsigned int order)
> -{
> - VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
> - warn_if_node_offline(nid, gfp_mask);
> -
> - return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
> -}
> -
> -#define __alloc_pages_node(...) alloc_hooks(__alloc_pages_node_noprof(__VA_ARGS__))
> -
> static inline
> struct folio *__folio_alloc_node_noprof(gfp_t gfp, unsigned int order, int nid)
> {
[Severity: Critical]
This is a pre-existing issue, but does __folio_alloc_node_noprof() suffer
from the exact same NUMA_NO_NODE vulnerability that you're fixing here?
This function lacks the fallback safety check for NUMA_NO_NODE that
alloc_pages_node() provides.
For example, looking at svc_prepare_thread() in net/sunrpc/svc.c, it calls
__folio_alloc_node() using a node ID that can be derived from cpu_to_node():
net/sunrpc/svc.c:svc_prepare_thread() {
...
rqstp->rq_scratch_folio = __folio_alloc_node(GFP_KERNEL, 0, node);
if (!rqstp->rq_scratch_folio)
goto out_enomem;
...
}
If a CPU hotplug event causes the node to be NUMA_NO_NODE, couldn't this
trigger the same out-of-bounds array access into NODE_DATA() and result in
the memory corruption mentioned in the commit message?
Should __folio_alloc_node() also be removed or replaced with a safe
folio_alloc_node() wrapper?
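The "single conditional branch" that alloc_pages_node() adds over the removed helper is its NUMA_NO_NODE fallback. Below is a userspace model of that node resolution, with these assumptions:

- The model_* names and the node count are hypothetical.
- The local node is taken to be 0.
- NUMA_NO_NODE is -1, as in the kernel.
- The assert stands in for the VM_BUG_ON() in the removed __alloc_pages_node_noprof().

```c
#include <assert.h>

#define NUMA_NO_NODE		(-1)	/* same value as the kernel's */
#define MODEL_MAX_NUMNODES	4	/* hypothetical */

/* Hypothetical: the calling CPU's nearest memory node is node 0. */
static int model_numa_mem_id(void)
{
	return 0;
}

/* Returns the node whose NODE_DATA() the allocation would use. */
static int model_alloc_pages_node_nid(int nid)
{
	/* The fallback alloc_pages_node() has and __alloc_pages_node() lacked. */
	if (nid == NUMA_NO_NODE)
		nid = model_numa_mem_id();

	/* Stand-in for VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES). */
	assert(nid >= 0 && nid < MODEL_MAX_NUMNODES);
	return nid;
}
```

Without the first branch, a caller passing NUMA_NO_NODE would have -1 used as a node index. This is the situation the review describes for __folio_alloc_node().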
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260703-alloc-trylock-v5-0-c87b714e19d3@google.com?part=13
* Re: [PATCH v5 18/18] mm: factor out can_spin_trylock()
2026-07-03 12:31 ` [PATCH v5 18/18] mm: factor out can_spin_trylock() Brendan Jackman
@ 2026-07-03 12:55 ` sashiko-bot
2026-07-03 15:12 ` Zi Yan
1 sibling, 0 replies; 41+ messages in thread
From: sashiko-bot @ 2026-07-03 12:55 UTC (permalink / raw)
To: Brendan Jackman; +Cc: linux-rt-devel, linux-kernel
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [High] Unsafe trylock from NMI on UP systems due to missed deduplication in `free_unref_page_commit()`
--
commit b53c6ec11ab7f82add433847a77e54a590090942
Author: Brendan Jackman <jackmanb@google.com>
mm: factor out can_spin_trylock()
Deduplicate checks for whether the current context is safe for
spin_trylock().
Link: https://lore.kernel.org/all/397859cb-b127-4cc6-9c71-044afc99bf0c@kernel.org/
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c3b246e67ed14..a63733dac659e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
[ ... ]
> static inline bool alloc_nolock_allowed(void)
> {
[ ... ]
> - if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
> - return false;
> -
> - /* On UP, spin_trylock() always succeeds even when it is locked */
> - if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
> + if (!can_spin_trylock())
> return false;
[Severity: High]
This is a pre-existing issue, but could this deduplication also be applied to
free_unref_page_commit() in mm/page_alloc.c?
It looks like free_unref_page_commit() still uses an open-coded check:
if (unlikely((fpi_flags & FPI_TRYLOCK) && IS_ENABLED(CONFIG_PREEMPT_RT)
&& (in_nmi() || in_hardirq()))) {
add_page_to_zone_llist(zone, page, order);
return;
}
Since the open-coded check in free_unref_page_commit() doesn't check for UP
systems, would an asynchronous page free executed from an NMI context on a
uniprocessor system (e.g., via a BPF memory allocator) incorrectly proceed past
this check?
On UP systems, spin_trylock_irqsave() only disables interrupts and
unconditionally returns true, providing no mutual exclusion against the
interrupted context. Could this allow both the NMI and the interrupted process
to concurrently modify the zone list structures and corrupt memory?
Applying !can_spin_trylock() here seems like it would fulfill the intent of
this patch and resolve the UP + NMI case.
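The model below covers two things:

- the helper, reconstructed from the two conditions removed from alloc_nolock_allowed();
- the two versions of the free-side check that the review compares.

It runs in userspace, with booleans standing in for the config options and context checks. The model_* names are hypothetical, and the exact form of can_spin_trylock() in the patch is not shown in this mail.

```c
#include <assert.h>
#include <stdbool.h>

struct model_ctx {
	bool preempt_rt;	/* IS_ENABLED(CONFIG_PREEMPT_RT) */
	bool smp;		/* IS_ENABLED(CONFIG_SMP) */
	bool in_nmi;
	bool in_hardirq;
};

/* Same two conditions as the lines removed from alloc_nolock_allowed(). */
static bool model_can_spin_trylock(const struct model_ctx *c)
{
	if (c->preempt_rt && (c->in_nmi || c->in_hardirq))
		return false;
	/* On UP, spin_trylock() always succeeds even when it is locked. */
	if (!c->smp && c->in_nmi)
		return false;
	return true;
}

/* Open-coded check quoted from free_unref_page_commit(): RT case only. */
static bool model_free_defers_old(const struct model_ctx *c, bool fpi_trylock)
{
	return fpi_trylock && c->preempt_rt && (c->in_nmi || c->in_hardirq);
}

/* Variant suggested by the review: defer whenever trylock is unsafe. */
static bool model_free_defers_new(const struct model_ctx *c, bool fpi_trylock)
{
	return fpi_trylock && !model_can_spin_trylock(c);
}
```

For a UP kernel in NMI context, the old check does not defer to the zone llist, while the suggested one does. On PREEMPT_RT both checks agree.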
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260703-alloc-trylock-v5-0-c87b714e19d3@google.com?part=18
* Re: [PATCH v5 01/18] mm/page_alloc: rename ALLOC_TRYLOCK -> ALLOC_NOLOCK
2026-07-03 12:31 ` [PATCH v5 01/18] mm/page_alloc: rename ALLOC_TRYLOCK -> ALLOC_NOLOCK Brendan Jackman
@ 2026-07-03 13:59 ` Zi Yan
0 siblings, 0 replies; 41+ messages in thread
From: Zi Yan @ 2026-07-03 13:59 UTC (permalink / raw)
To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Muchun Song,
Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
derkling, reijiw, Yosry Ahmed
On Fri Jul 3, 2026 at 8:31 AM EDT, Brendan Jackman wrote:
> It's confusing that the function is called "nolock" but the flag is
> called "trylock", align them.
>
> The function's terminology is more visible and has more mindshare so use that.
>
> Suggested-by: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
> Link: https://lore.kernel.org/linux-mm/2399b3ad-4eac-4a14-94c3-27e9f07972a1@kernel.org/
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> mm/internal.h | 2 +-
> mm/page_alloc.c | 10 +++++-----
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index fa4fb69444ecd..a2b09a13735bf 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1480,7 +1480,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
> #define ALLOC_NOFRAGMENT 0x0
> #endif
> #define ALLOC_HIGHATOMIC 0x200 /* Allows access to MIGRATE_HIGHATOMIC */
> -#define ALLOC_TRYLOCK 0x400 /* Only use spin_trylock in allocation path */
> +#define ALLOC_NOLOCK 0x400 /* Only use spin_trylock in allocation path */
It is strange to me that _NOLOCK uses spin_trylock. Lock or no lock? :)
But it matches the _nolock function. Anyway,
Acked-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
* Re: [PATCH v5 02/18] mm/page_alloc: some renames to clarify alloc_flags scopes
2026-07-03 12:31 ` [PATCH v5 02/18] mm/page_alloc: some renames to clarify alloc_flags scopes Brendan Jackman
@ 2026-07-03 14:01 ` Zi Yan
0 siblings, 0 replies; 41+ messages in thread
From: Zi Yan @ 2026-07-03 14:01 UTC (permalink / raw)
To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Muchun Song,
Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
derkling, reijiw, Yosry Ahmed, JP Kobryn
On Fri Jul 3, 2026 at 8:31 AM EDT, Brendan Jackman wrote:
> It's pretty confusing that:
>
> - The slowpath and fastpath have a totally distinct set of alloc_flags.
>
> - gfp_to_alloc_flags() sounds generic but it only influences the
> slowpath.
>
> Rename some variables to highlight which alloc_flags are
> fastpath-specific. Rename gfp_to_alloc_flags() to highlight that it's
> slowpath-specific.
>
> gfp_to_alloc_flags_cma() and gfp_to_alloc_flags_nonblocking() currently
> have perfectly harmless names, but to keep the naming consistent also
> rename those to the alloc_flags_*() pattern (which already exists for
> alloc_flags_nofragment()).
>
> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Acked-by: JP Kobryn <jp.kobryn@linux.dev>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> include/linux/skbuff.h | 2 +-
> mm/page_alloc.c | 28 ++++++++++++++--------------
> 2 files changed, 15 insertions(+), 15 deletions(-)
>
LGTM.
Reviewed-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
* Re: [PATCH v5 03/18] mm: name some args in a function declaration
2026-07-03 12:31 ` [PATCH v5 03/18] mm: name some args in a function declaration Brendan Jackman
@ 2026-07-03 14:02 ` Zi Yan
0 siblings, 0 replies; 41+ messages in thread
From: Zi Yan @ 2026-07-03 14:02 UTC (permalink / raw)
To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Muchun Song,
Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
derkling, reijiw, Yosry Ahmed
On Fri Jul 3, 2026 at 8:31 AM EDT, Brendan Jackman wrote:
> Checkpatch complains about this. A later patch will move the code, so fix
> it now so that checkpatch doesn't complain about that patch. Do it in a
> separate patch so the "move the code" patch is trivial to review using
> Git's diff colouring.
>
> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> mm/internal.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
LGTM.
Reviewed-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
* Re: [PATCH v5 04/18] mm: Split out internal page_alloc.h
2026-07-03 12:31 ` [PATCH v5 04/18] mm: Split out internal page_alloc.h Brendan Jackman
@ 2026-07-03 14:07 ` Zi Yan
0 siblings, 0 replies; 41+ messages in thread
From: Zi Yan @ 2026-07-03 14:07 UTC (permalink / raw)
To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Muchun Song,
Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
derkling, reijiw, Yosry Ahmed
On Fri Jul 3, 2026 at 8:31 AM EDT, Brendan Jackman wrote:
> internal.h is a bit bloated, seems like time for a page_alloc.h.
>
> Where it wasn't obvious, the heuristic for deciding what goes into this
> new header was "does it support/correspond to a definition in
> mm/page_alloc.c?"
>
> Only need to include it from ~20 .c files out of ~150 so this does seem
> like a genuine reduction in scopes, which is nice. And there's no
> circular internal.h<->page_alloc.h dependency, so it seems worthwhile to
> split this up before that inevitably emerges!
>
> Suggested-by: "David Hildenbrand (Arm)" <david@kernel.org>
> Link: https://lore.kernel.org/all/41e92bab-6882-401a-8de9-154adbdcfb36@kernel.org/
> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> MAINTAINERS | 1 +
> mm/compaction.c | 1 +
> mm/hugetlb.c | 1 +
> mm/internal.h | 252 -----------------------------------------------
> mm/khugepaged.c | 1 +
> mm/kmsan/init.c | 2 +-
> mm/memory-failure.c | 1 +
> mm/memory_hotplug.c | 1 +
> mm/mempolicy.c | 1 +
> mm/migrate.c | 1 +
> mm/mm_init.c | 1 +
> mm/page_alloc.c | 1 +
> mm/page_alloc.h | 269 +++++++++++++++++++++++++++++++++++++++++++++++++++
> mm/page_frag_cache.c | 2 +-
> mm/page_isolation.c | 1 +
> mm/page_owner.c | 2 +-
> mm/page_reporting.c | 1 +
> mm/show_mem.c | 1 +
> mm/shuffle.c | 1 +
> mm/slub.c | 1 +
> mm/swap.c | 1 +
> mm/vmscan.c | 1 +
> 22 files changed, 289 insertions(+), 255 deletions(-)
>
Thank you for the cleanup.
Acked-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
* Re: [PATCH v5 05/18] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
2026-07-03 12:31 ` [PATCH v5 05/18] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof() Brendan Jackman
@ 2026-07-03 14:42 ` Zi Yan
0 siblings, 0 replies; 41+ messages in thread
From: Zi Yan @ 2026-07-03 14:42 UTC (permalink / raw)
To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Muchun Song,
Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
derkling, reijiw, Yosry Ahmed
On Fri Jul 3, 2026 at 8:31 AM EDT, Brendan Jackman wrote:
> Currently the core allocator code is controlled by ALLOC_NOLOCK, but the
> nolock entry point function is significantly different from the normal
> __alloc_frozen_pages_noprof(); this is tiring when reading the code.
>
> Plumb the ALLOC_NOLOCK control one layer up in the call stack: create
> an alloc_flags argument to __alloc_frozen_pages_noprof() (which is only
> exposed to mm/) and then turn the nolock variant into a thin wrapper
> that just sets that flag (as well as handling NUMA_NO_NODE, similar to
> how some of the wrappers in gfp.h do).
>
> For consistency, set ALLOC_WMARK_MIN explicitly in fastpath_alloc_flags
> for the new ALLOC_NOLOCK path. This was already "done" silently in
> __alloc_frozen_pages_nolock_noprof(): ALLOC_WMARK_MIN is 0.
>
> Rationale that this doesn't change anything:
>
> 1. Simple bits: A bunch of the nolock-specific handling is just moved to
> the new alloc_order_allowed(), alloc_nolock_allowed() and
> gfp_nolock.
>
> 2. __alloc_frozen_pages_noprof() has some extra logic that wasn't
> previously in the nolock variant:
>
> a. Application of gfp_allowed_mask; this only affects early boot,
> only flags that affect the slowpath get changed here, and the
> nolock allocation path isn't allowed to use the GFP_BOOT_MASK flags.
>
> b. Application of current_gfp_context() - also only affects the
> slowpath
>
> 3. The slowpath itself: this is now just explicitly skipped under
> !ALLOC_TRYLOCK.
s/TRYLOCK/NOLOCK
>
> Ulterior motive: adding an alloc_flags arg to the allocator's
> mm-internal entrypoint can later be used to do more allocation
> customisation without needing to create new GFP flags.
>
> No functional change intended.
>
> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> mm/hugetlb.c | 3 +-
> mm/mempolicy.c | 10 +--
> mm/page_alloc.c | 192 +++++++++++++++++++++++++++++---------------------------
> mm/page_alloc.h | 6 +-
> mm/slub.c | 6 +-
> 5 files changed, 117 insertions(+), 100 deletions(-)
>
<snip>
> +/*
> + * This is the 'heart' of the zoned buddy allocator.
> + */
> +struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
> + int preferred_nid, nodemask_t *nodemask, unsigned int alloc_flags)
> +{
> + struct page *page;
> + gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
> + struct alloc_context ac = { };
> + unsigned int fastpath_alloc_flags = alloc_flags;
> +
> + /* Other flags could be supported later if needed. */
> + if (WARN_ON(alloc_flags & ~ALLOC_NOLOCK))
> return NULL;
>
> + if (!alloc_order_allowed(gfp, order, alloc_flags))
> + return NULL;
> +
> + if (alloc_flags & ALLOC_NOLOCK) {
> + VM_WARN_ON_ONCE(gfp & ~__GFP_ACCOUNT);
> + if (!alloc_nolock_allowed())
> + return NULL;
At first look, I wonder why __alloc_frozen_pages_noprof() needs to care
about alloc_nolock_allowed(). But the patch's idea is to centralize all
allocation policies, so it makes sense.
Ideally, I would want alloc_frozen_pages_nolock_noprof() to filter as
much as possible, so that __alloc_frozen_pages_noprof() has minimal/no
awareness of ALLOC_NOLOCK. But ALLOC_NOLOCK has different preferences
compared to the default __alloc_frozen_pages_noprof() policy like
ALLOC_WMARK_MIN vs ALLOC_WMARK_LOW, skip slowpath, and more. Maybe we
could do something like:
__alloc_frozen_pages_noprof()
{
alloc_fastpath();
alloc_slowpath();
}
alloc_frozen_pages_nolock_noprof()
{
alloc_order_allowed();
alloc_nolock_allowed();
alloc_fastpath();
}
But it still cannot remove ALLOC_NOLOCK completely from
__alloc_frozen_pages_noprof(), like the nofragment skip. Anyway, this
patch is a reasonable cleanup. Thanks.
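Below is a simplified userspace model of the structure under discussion. There is one entrypoint whose alloc_flags select the policy, and the nolock variant is a thin wrapper that only sets ALLOC_NOLOCK. The model makes these assumptions:

- It omits alloc_order_allowed(), GFP handling, nodemasks and ALLOC_NOFRAGMENT.
- M_ALLOC_NOLOCK = 0x400 follows the quoted mm/internal.h hunk, and M_ALLOC_WMARK_MIN = 0 follows the commit message.
- M_ALLOC_WMARK_LOW is illustrative.
- nolock_allowed stands in for alloc_nolock_allowed().

```c
#include <assert.h>
#include <stdbool.h>

#define M_ALLOC_WMARK_MIN	0x0u	/* "ALLOC_WMARK_MIN is 0" */
#define M_ALLOC_WMARK_LOW	0x1u	/* illustrative value */
#define M_ALLOC_NOLOCK		0x400u	/* from the quoted hunk */

struct m_plan {
	bool rejected;			/* entrypoint returns NULL early */
	unsigned int fastpath_flags;	/* flags handed to the fastpath */
	bool may_enter_slowpath;
};

/* Unified entrypoint: the caller's alloc_flags select the policy. */
static struct m_plan m_alloc_frozen_pages(unsigned int alloc_flags,
					  bool nolock_allowed)
{
	struct m_plan p = { .rejected = false, .fastpath_flags = alloc_flags };

	/* Other flags could be supported later if needed. */
	if (alloc_flags & ~M_ALLOC_NOLOCK) {
		p.rejected = true;
		return p;
	}

	if (alloc_flags & M_ALLOC_NOLOCK) {
		if (!nolock_allowed) {
			p.rejected = true;
			return p;
		}
		p.fastpath_flags |= M_ALLOC_WMARK_MIN;
		p.may_enter_slowpath = false;	/* never reclaim or wait */
	} else {
		p.fastpath_flags |= M_ALLOC_WMARK_LOW;
		p.may_enter_slowpath = true;
	}
	return p;
}

/* The nolock variant as a thin wrapper that only sets the flag. */
static struct m_plan m_alloc_frozen_pages_nolock(bool nolock_allowed)
{
	return m_alloc_frozen_pages(M_ALLOC_NOLOCK, nolock_allowed);
}
```

The alternative suggested above would move the nolock branch out of m_alloc_frozen_pages() and into the wrapper. The cost is that the shared code then needs separately callable fastpath and slowpath pieces.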
Acked-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
* Re: [PATCH v5 06/18] mm/page_alloc: relax GFP WARN in nolock allocs
2026-07-03 12:31 ` [PATCH v5 06/18] mm/page_alloc: relax GFP WARN in nolock allocs Brendan Jackman
2026-07-03 12:43 ` sashiko-bot
@ 2026-07-03 14:44 ` Zi Yan
1 sibling, 0 replies; 41+ messages in thread
From: Zi Yan @ 2026-07-03 14:44 UTC (permalink / raw)
To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Muchun Song,
Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
derkling, reijiw, Yosry Ahmed
On Fri Jul 3, 2026 at 8:31 AM EDT, Brendan Jackman wrote:
> This WARN forbids setting other flags than __GFP_ACCOUNT but we
> unconditionally set the ones in gfp_nolock so they are certainly fine
> for the caller to set.
>
> There are other GFP flags that are almost certainly fine to set here;
> Willy noted __GFP_HIGHMEM, __GFP_DMA, __GFP_MOVABLE and __GFP_HARDWALL.
> But nolock allocation is rather special, so be conservative to try and
> ensure we have a chance to think carefully before nontrivial new
> use cases arise.
>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Link: https://lore.kernel.org/linux-mm/ajS96fWbG4dzP3u3@casper.infradead.org/
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Acked-by: Harry Yoo (Oracle) <harry@kernel.org>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> mm/page_alloc.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
Acked-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
* Re: [PATCH v5 07/18] mm: move some stuff to mm/page_alloc.h
2026-07-03 12:31 ` [PATCH v5 07/18] mm: move some stuff to mm/page_alloc.h Brendan Jackman
@ 2026-07-03 14:46 ` Zi Yan
0 siblings, 0 replies; 41+ messages in thread
From: Zi Yan @ 2026-07-03 14:46 UTC (permalink / raw)
To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Muchun Song,
Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
derkling, reijiw, Yosry Ahmed
On Fri Jul 3, 2026 at 8:31 AM EDT, Brendan Jackman wrote:
> Some of the declarations in the public header are only used internally,
> so shrink their scope to avoid silently gaining new users.
>
> drain_local_pages() is still used from kernel/power/snapshot.c, so it
> needs to stay in the public header.
>
> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> include/linux/gfp.h | 26 --------------------------
> mm/page_alloc.h | 27 +++++++++++++++++++++++++++
> mm/vmstat.c | 1 +
> 3 files changed, 28 insertions(+), 26 deletions(-)
>
Reviewed-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
* Re: [PATCH v5 08/18] perf/x86/intel: Use higher-level allocator API
2026-07-03 12:31 ` [PATCH v5 08/18] perf/x86/intel: Use higher-level allocator API Brendan Jackman
@ 2026-07-03 14:49 ` Zi Yan
0 siblings, 0 replies; 41+ messages in thread
From: Zi Yan @ 2026-07-03 14:49 UTC (permalink / raw)
To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Muchun Song,
Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
derkling, reijiw, Yosry Ahmed, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
James Clark
On Fri Jul 3, 2026 at 8:31 AM EDT, Brendan Jackman wrote:
> The difference between __alloc_pages_node() and alloc_pages_node() is
> that the latter allows you to pass NUMA_NO_NODE.
>
> The former is going away and the latter works fine here so switch over.
>
> No functional change intended.
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Ian Rogers <irogers@google.com>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: James Clark <james.clark@linaro.org>
> Assisted-by: Gemini:unknown-version
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> arch/x86/events/intel/ds.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
Reviewed-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
* Re: [PATCH v5 09/18] KVM: VMX: Use higher-level allocator API
2026-07-03 12:31 ` [PATCH v5 09/18] KVM: VMX: " Brendan Jackman
@ 2026-07-03 14:49 ` Zi Yan
0 siblings, 0 replies; 41+ messages in thread
From: Zi Yan @ 2026-07-03 14:49 UTC (permalink / raw)
To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Muchun Song,
Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
derkling, reijiw, Yosry Ahmed, Sean Christopherson, Paolo Bonzini,
kvm
On Fri Jul 3, 2026 at 8:31 AM EDT, Brendan Jackman wrote:
> The difference between __alloc_pages_node() and alloc_pages_node() is
> that the latter allows you to pass NUMA_NO_NODE.
>
> The former is going away and the latter works fine here so switch over.
>
> No functional change intended.
>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: kvm@vger.kernel.org
> Assisted-by: Gemini:unknown-version
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> arch/x86/kvm/vmx/vmx.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
Reviewed-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
* Re: [PATCH v5 10/18] x86/virt: Use higher-level allocator API
2026-07-03 12:31 ` [PATCH v5 10/18] x86/virt: " Brendan Jackman
@ 2026-07-03 14:50 ` Zi Yan
0 siblings, 0 replies; 41+ messages in thread
From: Zi Yan @ 2026-07-03 14:50 UTC (permalink / raw)
To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Muchun Song,
Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
derkling, reijiw, Yosry Ahmed, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin
On Fri Jul 3, 2026 at 8:31 AM EDT, Brendan Jackman wrote:
> The difference between __alloc_pages_node() and alloc_pages_node() is
> that the latter allows you to pass NUMA_NO_NODE.
>
> The former is going away and the latter works fine here so switch over.
>
> No functional change intended.
>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: x86@kernel.org
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Assisted-by: Gemini:unknown-version
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> arch/x86/virt/hw.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
Reviewed-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
* Re: [PATCH v5 11/18] sgi-xp: Use higher-level allocator API
2026-07-03 12:31 ` [PATCH v5 11/18] sgi-xp: " Brendan Jackman
2026-07-03 12:48 ` sashiko-bot
@ 2026-07-03 14:51 ` Zi Yan
1 sibling, 0 replies; 41+ messages in thread
From: Zi Yan @ 2026-07-03 14:51 UTC (permalink / raw)
To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Muchun Song,
Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
derkling, reijiw, Yosry Ahmed, Robin Holt, Steve Wahl,
Arnd Bergmann, Greg Kroah-Hartman
On Fri Jul 3, 2026 at 8:31 AM EDT, Brendan Jackman wrote:
> The difference between __alloc_pages_node() and alloc_pages_node() is
> that the latter allows you to pass NUMA_NO_NODE.
>
> The former is going away and the latter works fine here so switch over.
>
> No functional change intended.
>
> Cc: Robin Holt <robinmholt@gmail.com>
> Cc: Steve Wahl <steve.wahl@hpe.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Assisted-by: Gemini:unknown-model
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> Acked-by: Steve Wahl <steve.wahl@hpe.com>
> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> drivers/misc/sgi-xp/xpc_uv.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
Reviewed-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
* Re: [PATCH v5 12/18] net/funeth: Switch to higher-level allocator API
2026-07-03 12:31 ` [PATCH v5 12/18] net/funeth: Switch to " Brendan Jackman
2026-07-03 12:53 ` sashiko-bot
@ 2026-07-03 14:52 ` Zi Yan
1 sibling, 0 replies; 41+ messages in thread
From: Zi Yan @ 2026-07-03 14:52 UTC (permalink / raw)
To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Muchun Song,
Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
derkling, reijiw, Yosry Ahmed, Dimitris Michailidis, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
On Fri Jul 3, 2026 at 8:31 AM EDT, Brendan Jackman wrote:
> The difference between __alloc_pages_node() and alloc_pages_node() is
> that the latter allows you to pass NUMA_NO_NODE.
>
> The former is going away and the latter works fine here so switch over.
>
> No functional change intended.
>
> Cc: Dimitris Michailidis <dmichail@fungible.com>
> Cc: Andrew Lunn <andrew+netdev@lunn.ch>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Assisted-by: Gemini:unknown-version
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> drivers/net/ethernet/fungible/funeth/funeth_rx.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
Reviewed-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
* Re: [PATCH v5 13/18] mm: Remove __alloc_pages_node()
2026-07-03 12:31 ` [PATCH v5 13/18] mm: Remove __alloc_pages_node() Brendan Jackman
2026-07-03 12:54 ` sashiko-bot
@ 2026-07-03 14:57 ` Zi Yan
1 sibling, 0 replies; 41+ messages in thread
From: Zi Yan @ 2026-07-03 14:57 UTC (permalink / raw)
To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Muchun Song,
Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
derkling, reijiw, Yosry Ahmed
On Fri Jul 3, 2026 at 8:31 AM EDT, Brendan Jackman wrote:
> There were only a few users, which have been removed. The only advantage
> of this API over alloc_pages_node() is avoiding a single conditional
> branch. The disadvantages are:
>
> 1. More API surface, more sources of confusion, more maintenance.
>
> 2. Worse impact of CPU hotplug bugs: most users of __alloc_pages_node()
> were using the result of cpu_to_node(); if the CPU gets hotplugged
> out, this will return NUMA_NO_NODE. If one of these paths fails to
> protect against a concurrent hotplug, page_alloc.c will use
> NUMA_NO_NODE as an index into NODE_DATA() and cause memory corruption
> or some other horrible failure. With alloc_pages_node(), which
> replaces NUMA_NO_NODE with numa_mem_id(), the code might just work.
>
> Ulterior motive: this frees up the __* variants of the allocator APIs to
> serve specifically as mm-internal API.
>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> include/linux/gfp.h | 20 ++++----------------
> 1 file changed, 4 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 01d6d2591f49e..3bf55a5f9143e 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -256,21 +256,6 @@ static inline void warn_if_node_offline(int this_node, gfp_t gfp_mask)
>  	dump_stack();
>  }
>
> -/*
> - * Allocate pages, preferring the node given as nid. The node must be valid and
> - * online. For more general interface, see alloc_pages_node().
> - */
> -static inline struct page *
> -__alloc_pages_node_noprof(int nid, gfp_t gfp_mask, unsigned int order)
> -{
> -	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
> -	warn_if_node_offline(nid, gfp_mask);
> -
> -	return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
> -}
> -
> -#define __alloc_pages_node(...) alloc_hooks(__alloc_pages_node_noprof(__VA_ARGS__))
> -
>  static inline
>  struct folio *__folio_alloc_node_noprof(gfp_t gfp, unsigned int order, int nid)
>  {
> @@ -293,7 +278,10 @@ static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
>  	if (nid == NUMA_NO_NODE)
>  		nid = numa_mem_id();
>
> -	return __alloc_pages_node_noprof(nid, gfp_mask, order);
> +	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
Could this become a VM_WARN_ON?
Anyway,
Reviewed-by: Zi Yan <ziy@nvidia.com>
> +	warn_if_node_offline(nid, gfp_mask);
> +
> +	return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
>  }
>
>  #define alloc_pages_node(...) alloc_hooks(alloc_pages_node_noprof(__VA_ARGS__))
--
Best Regards,
Yan, Zi
* Re: [PATCH v5 14/18] mm: Move __alloc_pages() to mm/page_alloc.h
2026-07-03 12:31 ` [PATCH v5 14/18] mm: Move __alloc_pages() to mm/page_alloc.h Brendan Jackman
@ 2026-07-03 15:05 ` Zi Yan
0 siblings, 0 replies; 41+ messages in thread
From: Zi Yan @ 2026-07-03 15:05 UTC (permalink / raw)
To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Muchun Song,
Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
derkling, reijiw, Yosry Ahmed
On Fri Jul 3, 2026 at 8:31 AM EDT, Brendan Jackman wrote:
> It's no longer used outside of mm/.
>
> Since __alloc_pages_noprof() is then no longer visible from gfp.h, the
> definition of alloc_pages_node_noprof() also has to move into the .c
> file.
>
> Also remove references to this API from the documentation tree:
> referring to the specific function name was already questionable, and
> now that the function is not even public it definitely seems wrong.
>
> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> Documentation/admin-guide/cgroup-v1/cpusets.rst | 2 +-
> Documentation/admin-guide/mm/transhuge.rst | 2 +-
> include/linux/gfp.h | 16 +---------------
> mm/page_alloc.c | 13 ++++++++++++-
> mm/page_alloc.h | 4 ++++
> 5 files changed, 19 insertions(+), 18 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v1/cpusets.rst b/Documentation/admin-guide/cgroup-v1/cpusets.rst
> index c7909e5ac1361..52a213aff04e5 100644
> --- a/Documentation/admin-guide/cgroup-v1/cpusets.rst
> +++ b/Documentation/admin-guide/cgroup-v1/cpusets.rst
> @@ -284,7 +284,7 @@ take action.
> ==>
> Unless this feature is enabled by writing "1" to the special file
> /dev/cpuset/memory_pressure_enabled, the hook in the rebalance
> - code of __alloc_pages() for this metric reduces to simply noticing
> + code of the page allocator for this metric reduces to simply noticing
> that the cpuset_memory_pressure_enabled flag is zero. So only
> systems that enable this feature will compute the metric.
>
kernel/cgroup/cpuset.c still has 3 references to __alloc_pages(). They
can be converted as well. Otherwise,
Reviewed-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
* Re: [PATCH v5 17/18] mm/page_alloc: drop alloc_flags arg from alloc_flags_cma()
2026-07-03 12:31 ` [PATCH v5 17/18] mm/page_alloc: drop alloc_flags arg from alloc_flags_cma() Brendan Jackman
@ 2026-07-03 15:10 ` Zi Yan
0 siblings, 0 replies; 41+ messages in thread
From: Zi Yan @ 2026-07-03 15:10 UTC (permalink / raw)
To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Muchun Song,
Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
derkling, reijiw, Yosry Ahmed
On Fri Jul 3, 2026 at 8:31 AM EDT, Brendan Jackman wrote:
> To align the style with other alloc_flags_*() functions, drop this
> additive argument and have the callers OR the result in themselves.
>
> Note that you can't always freely OR bits into alloc_flags like these
> callers do (the WMARK bits encode an enum, not independent flags), but
> this is fine for ALLOC_CMA, just as it is for e.g. ALLOC_NON_BLOCK,
> which alloc_flags_nonblocking() returns and its caller ORs in.
>
> Suggested-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Link: https://lore.kernel.org/all/5dcdd1ef-21ad-4ed0-9e8a-0e5cf96b4392@kernel.org/
> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> mm/page_alloc.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
Nice cleanup.
Reviewed-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
* Re: [PATCH v5 18/18] mm: factor out can_spin_trylock()
2026-07-03 12:31 ` [PATCH v5 18/18] mm: factor out can_spin_trylock() Brendan Jackman
2026-07-03 12:55 ` sashiko-bot
@ 2026-07-03 15:12 ` Zi Yan
1 sibling, 0 replies; 41+ messages in thread
From: Zi Yan @ 2026-07-03 15:12 UTC (permalink / raw)
To: Brendan Jackman, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Muchun Song,
Oscar Salvador, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Matthew Brost, Joshua Hahn,
Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
Cc: Harry Yoo (Oracle), Gregory Price, Alexei Starovoitov,
Matthew Wilcox, Hao Ge, linux-mm, linux-kernel, linux-rt-devel,
derkling, reijiw, Yosry Ahmed
On Fri Jul 3, 2026 at 8:31 AM EDT, Brendan Jackman wrote:
> Deduplicate checks for whether the current context is safe for
> spin_trylock().
>
> Does this function really belong in mm/internal.h, or is it generic? Not
> sure. If someone ends up duplicating this logic elsewhere in the kernel,
> that would be a shame. But it would be worse if it went in some generic
> header, someone treated it as documentation of where spin_trylock() is
> guaranteed to be safe, and it then emerged that there were other subtle
> preconditions that didn't affect the mm use case. So just be
> conservative and keep it local.
>
> Suggested-by: Harry Yoo <harry@kernel.org>
> Link: https://lore.kernel.org/all/397859cb-b127-4cc6-9c71-044afc99bf0c@kernel.org/
> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> mm/internal.h | 23 +++++++++++++++++++++++
> mm/page_alloc.c | 17 +----------------
> mm/slub.c | 10 +---------
> 3 files changed, 25 insertions(+), 25 deletions(-)
>
Reviewed-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
Thread overview: 41+ messages
2026-07-03 12:31 [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Brendan Jackman
2026-07-03 12:31 ` [PATCH v5 01/18] mm/page_alloc: rename ALLOC_TRYLOCK -> ALLOC_NOLOCK Brendan Jackman
2026-07-03 13:59 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 02/18] mm/page_alloc: some renames to clarify alloc_flags scopes Brendan Jackman
2026-07-03 14:01 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 03/18] mm: name some args in a function declaration Brendan Jackman
2026-07-03 14:02 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 04/18] mm: Split out internal page_alloc.h Brendan Jackman
2026-07-03 14:07 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 05/18] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof() Brendan Jackman
2026-07-03 14:42 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 06/18] mm/page_alloc: relax GFP WARN in nolock allocs Brendan Jackman
2026-07-03 12:43 ` sashiko-bot
2026-07-03 14:44 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 07/18] mm: move some stuff to mm/page_alloc.h Brendan Jackman
2026-07-03 14:46 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 08/18] perf/x86/intel: Use higher-level allocator API Brendan Jackman
2026-07-03 14:49 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 09/18] KVM: VMX: " Brendan Jackman
2026-07-03 14:49 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 10/18] x86/virt: " Brendan Jackman
2026-07-03 14:50 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 11/18] sgi-xp: " Brendan Jackman
2026-07-03 12:48 ` sashiko-bot
2026-07-03 14:51 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 12/18] net/funeth: Switch to " Brendan Jackman
2026-07-03 12:53 ` sashiko-bot
2026-07-03 14:52 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 13/18] mm: Remove __alloc_pages_node() Brendan Jackman
2026-07-03 12:54 ` sashiko-bot
2026-07-03 14:57 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 14/18] mm: Move __alloc_pages() to mm/page_alloc.h Brendan Jackman
2026-07-03 15:05 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 15/18] mm: replace __GFP_NO_CODETAG with ALLOC_NO_CODETAG Brendan Jackman
2026-07-03 12:31 ` [PATCH v5 16/18] mm: remove the __GFP_NO_OBJ_EXT flag Brendan Jackman
2026-07-03 12:31 ` [PATCH v5 17/18] mm/page_alloc: drop alloc_flags arg from alloc_flags_cma() Brendan Jackman
2026-07-03 15:10 ` Zi Yan
2026-07-03 12:31 ` [PATCH v5 18/18] mm: factor out can_spin_trylock() Brendan Jackman
2026-07-03 12:55 ` sashiko-bot
2026-07-03 15:12 ` Zi Yan
2026-07-03 12:47 ` [PATCH v5 00/18] mm: Some cleanups for page allocator APIs Vlastimil Babka (SUSE)