[PATCH] mm/page_alloc: unify __alloc_frozen_pages[_nolock]

* [PATCH] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
@ 2026-06-17 15:29 Brendan Jackman
  2026-06-17 16:39 ` Vlastimil Babka (SUSE)
  0 siblings, 1 reply; 4+ messages in thread
From: Brendan Jackman @ 2026-06-17 15:29 UTC
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
  Cc: Harry Yoo (Oracle), Gregory Price, Johannes Weiner, linux-mm,
	linux-kernel, linux-rt-devel, Brendan Jackman

Currently the core allocator code is controlled by ALLOC_NOLOCK, but the
nolock entry point function is significantly different from the normal
__alloc_frozen_pages_noprof(). This makes the code tiring to read.

Plumb the ALLOC_NOLOCK control one layer up in the call stack: add
an alloc_flags argument to __alloc_frozen_pages_noprof() (which is only
exposed to mm/) and then turn the nolock variant into a thin wrapper
that just sets that flag (as well as handling NUMA_NO_NODE, similar to
how some of the wrappers in gfp.h do).
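
For readers who want the shape of the change without reading the diff, here
is a minimal userspace sketch of the "thin wrapper sets a flag" pattern. It
is an illustration only: every sk_* name, the flag value and the node-0
stand-in for numa_node_id() are assumptions made for this sketch, not kernel
definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define SK_ALLOC_TRYLOCK  0x1u
#define SK_NUMA_NO_NODE   (-1)

static_assert(SK_ALLOC_TRYLOCK != 0, "the trylock flag must be a real bit");

/* Records the arguments the core entry point last received. */
struct sk_call {
	unsigned int order;
	int nid;
	unsigned int alloc_flags;
};
static struct sk_call sk_last_call;

/* Stand-in for numa_node_id(); always reports node 0 in this model. */
static int sk_numa_node_id(void)
{
	return 0;
}

/* Single core entry point: every variant funnels through here. */
static void sk_alloc_core(unsigned int order, int nid, unsigned int alloc_flags)
{
	sk_last_call.order = order;
	sk_last_call.nid = nid;
	sk_last_call.alloc_flags = alloc_flags;
}

/* Thin nolock wrapper: resolve the node, set the flag, delegate. */
static void sk_alloc_nolock(int nid, unsigned int order)
{
	if (nid == SK_NUMA_NO_NODE)
		nid = sk_numa_node_id();
	sk_alloc_core(order, nid, SK_ALLOC_TRYLOCK);
}
```

Calling sk_alloc_nolock(SK_NUMA_NO_NODE, 2) in this model reaches the core
with the node resolved and only the trylock bit set, which is all the real
wrapper is meant to do after this patch.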

Rationale that this doesn't change anything:

1. Simple bits: A bunch of the nolock-specific handling is just moved to
   the new alloc_order_allowed(), alloc_trylock_allowed() and
   gfp_trylock.

2. __alloc_frozen_pages_noprof() has some extra logic that wasn't
   previously in the nolock variant:

   a. Application of gfp_allowed_mask; this only affects early boot, and
      only flags that affect the slowpath get changed here.

   b. Application of current_gfp_context(); this also only affects the
      slowpath.

3. The slowpath itself: this is now just explicitly skipped under
   ALLOC_TRYLOCK.
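
The control flow the diff ends up with (one freelist attempt, then the
slowpath only when the trylock flag is absent) can be modelled in userspace
roughly as follows. This is a sketch under stated assumptions: the sk_*
helpers, the boolean "page" and the always-succeeding slowpath are all
hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value for illustration; not the kernel's definition. */
#define SK_ALLOC_TRYLOCK 0x1u

static_assert(SK_ALLOC_TRYLOCK != 0, "the trylock flag must be a real bit");

/* Counts how often the modelled slowpath was entered. */
static int sk_slowpath_calls;

/* Models the freelist attempt: succeeds only if a page is available. */
static bool sk_fastpath(bool freelist_has_page)
{
	return freelist_has_page;
}

/* Models the slowpath: may block in reality, always succeeds here. */
static bool sk_slowpath(void)
{
	sk_slowpath_calls++;
	return true;
}

/* First attempt; for trylock callers it is also the only attempt. */
static bool sk_alloc(bool freelist_has_page, unsigned int alloc_flags)
{
	bool got_page = sk_fastpath(freelist_has_page);

	if (got_page || (alloc_flags & SK_ALLOC_TRYLOCK))
		return got_page;

	return sk_slowpath();
}
```

In this model a trylock caller that misses the freelist simply gets a
failure back, while a normal caller falls through to the slowpath.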

Ulterior motive: adding an alloc_flags arg to the allocator's
mm-internal entrypoint can later be used to do more allocation
customisation without needing to create new GFP flags.

No functional change intended.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 mm/hugetlb.c    |   2 +-
 mm/internal.h   |   4 +-
 mm/mempolicy.c  |   8 +--
 mm/page_alloc.c | 175 +++++++++++++++++++++++++++++---------------------------
 mm/slub.c       |   4 +-
 5 files changed, 99 insertions(+), 94 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 571212b80835e..619f6307dc98d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1806,7 +1806,7 @@ static struct folio *alloc_buddy_frozen_folio(int order, gfp_t gfp_mask,
 	if (alloc_try_hard)
 		gfp_mask |= __GFP_RETRY_MAYFAIL;
 
-	folio = (struct folio *)__alloc_frozen_pages(gfp_mask, order, nid, nmask);
+	folio = (struct folio *)__alloc_frozen_pages(gfp_mask, order, nid, nmask, 0);
 
 	/*
 	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a
diff --git a/mm/internal.h b/mm/internal.h
index 181e79f1d6a20..1043eb833836c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -913,7 +913,7 @@ extern bool free_pages_prepare(struct page *page, unsigned int order);
 extern int user_min_free_kbytes;
 
 struct page *__alloc_frozen_pages_noprof(gfp_t, unsigned int order, int nid,
-		nodemask_t *);
+		nodemask_t *, unsigned int alloc_flags);
 #define __alloc_frozen_pages(...) \
 	alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__))
 void free_frozen_pages(struct page *page, unsigned int order);
@@ -924,7 +924,7 @@ struct page *alloc_frozen_pages_noprof(gfp_t, unsigned int order);
 #else
 static inline struct page *alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order)
 {
-	return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL);
+	return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL, 0);
 }
 #endif
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 36699fabd3c22..dccff90682035 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2425,9 +2425,9 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
 	 */
 	preferred_gfp = gfp | __GFP_NOWARN;
 	preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
-	page = __alloc_frozen_pages_noprof(preferred_gfp, order, nid, nodemask);
+	page = __alloc_frozen_pages_noprof(preferred_gfp, order, nid, nodemask, 0);
 	if (!page)
-		page = __alloc_frozen_pages_noprof(gfp, order, nid, NULL);
+		page = __alloc_frozen_pages_noprof(gfp, order, nid, NULL, 0);
 
 	return page;
 }
@@ -2475,7 +2475,7 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
 			 */
 			page = __alloc_frozen_pages_noprof(
 				gfp | __GFP_THISNODE | __GFP_NORETRY, order,
-				nid, NULL);
+				nid, NULL, 0);
 			if (page || !(gfp & __GFP_DIRECT_RECLAIM))
 				return page;
 			/*
@@ -2487,7 +2487,7 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
 		}
 	}
 
-	page = __alloc_frozen_pages_noprof(gfp, order, nid, nodemask);
+	page = __alloc_frozen_pages_noprof(gfp, order, nid, nodemask, 0);
 
 	if (unlikely(pol->mode == MPOL_INTERLEAVE ||
 		     pol->mode == MPOL_WEIGHTED_INTERLEAVE) && page) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0111cdbdb5321..fc4d07bbf44b5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5253,24 +5253,98 @@ void free_pages_bulk(struct page **page_array, unsigned long nr_pages)
 	}
 }
 
-/*
- * This is the 'heart' of the zoned buddy allocator.
- */
-struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
-		int preferred_nid, nodemask_t *nodemask)
+static inline bool alloc_order_allowed(gfp_t gfp, unsigned int order,
+				       unsigned int alloc_flags)
 {
-	struct page *page;
-	unsigned int alloc_flags = ALLOC_WMARK_LOW;
-	gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
-	struct alloc_context ac = { };
+
+	if (alloc_flags & ALLOC_TRYLOCK)
+		return pcp_allowed_order(order);
 
 	/*
 	 * There are several places where we assume that the order value is sane
 	 * so bail out early if the request is out of bound.
 	 */
-	if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
+	return !(WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp));
+}
+
+static inline bool alloc_trylock_allowed(void)
+{
+	/*
+	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
+	 * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
+	 * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
+	 * mark the task as the owner of another rt_spin_lock which will
+	 * confuse PI logic, so return immediately if called from hard IRQ or
+	 * NMI.
+	 *
+	 * Note, irqs_disabled() case is ok. This function can be called
+	 * from raw_spin_lock_irqsave region.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
+		return false;
+
+	/* On UP, spin_trylock() always succeeds even when it is locked */
+	if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
+		return false;
+
+	/* Bailout, since _deferred_grow_zone() needs to take a lock */
+	if (deferred_pages_enabled())
+		return false;
+
+	return true;
+}
+
+/*
+ * GFP flags to set for ALLOC_TRYLOCK i.e. alloc_pages_nolock().
+ *
+ * Do not specify __GFP_DIRECT_RECLAIM, since direct reclaim is not allowed.
+ * Do not specify __GFP_KSWAPD_RECLAIM either, since wake up of kswapd
+ * is not safe in arbitrary context.
+ *
+ * These two are the conditions for gfpflags_allow_spinning() being true.
+ *
+ * Specify __GFP_NOWARN since failing alloc_pages_nolock() is not a reason
+ * to warn. Also warn would trigger printk() which is unsafe from
+ * various contexts. We cannot use printk_deferred_enter() to mitigate,
+ * since the running context is unknown.
+ *
+ * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
+ * is safe in any context. Also zeroing the page is mandatory for
+ * BPF use cases.
+ *
+ * Though __GFP_NOMEMALLOC is not checked in the code path below,
+ * specify it here to highlight that alloc_pages_nolock()
+ * doesn't want to deplete reserves.
+ */
+static const gfp_t gfp_trylock = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC |
+				__GFP_COMP;
+
+/*
+ * This is the 'heart' of the zoned buddy allocator.
+ */
+struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
+		int preferred_nid, nodemask_t *nodemask, unsigned int alloc_flags)
+{
+	struct page *page;
+	gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
+	struct alloc_context ac = { };
+
+	/* Other flags could be supported later if needed. */
+	if (WARN_ON(alloc_flags & ~ALLOC_TRYLOCK))
 		return NULL;
 
+	if (!alloc_order_allowed(gfp, order, alloc_flags))
+		return NULL;
+
+	if (alloc_flags & ALLOC_TRYLOCK) {
+		VM_WARN_ON_ONCE(gfp & ~__GFP_ACCOUNT);
+		if (!alloc_trylock_allowed())
+			return NULL;
+		gfp |= gfp_trylock;
+	} else {
+		alloc_flags |= ALLOC_WMARK_LOW;
+	}
+
 	gfp &= gfp_allowed_mask;
 	/*
 	 * Apply scoped allocation constraints. This is mainly about GFP_NOFS
@@ -5291,9 +5365,9 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
 	 */
 	alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
 
-	/* First allocation attempt */
+	/* First allocation attempt (or, for trylock, only attempt) */
 	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
-	if (likely(page))
+	if (likely(page) || (alloc_flags & ALLOC_TRYLOCK))
 		goto out;
 
 	alloc_gfp = gfp;
@@ -5310,7 +5384,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
 out:
 	if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
 	    unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
-		free_frozen_pages(page, order);
+		__free_frozen_pages(page, order,
+				    alloc_flags & ALLOC_TRYLOCK ? FPI_TRYLOCK : 0);
 		page = NULL;
 	}
 
@@ -5326,7 +5401,7 @@ struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
 {
 	struct page *page;
 
-	page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask);
+	page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask, 0);
 	if (page)
 		set_page_refcounted(page);
 	return page;
@@ -7856,80 +7931,10 @@ static bool __free_unaccepted(struct page *page)
 
 struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned int order)
 {
-	/*
-	 * Do not specify __GFP_DIRECT_RECLAIM, since direct claim is not allowed.
-	 * Do not specify __GFP_KSWAPD_RECLAIM either, since wake up of kswapd
-	 * is not safe in arbitrary context.
-	 *
-	 * These two are the conditions for gfpflags_allow_spinning() being true.
-	 *
-	 * Specify __GFP_NOWARN since failing alloc_pages_nolock() is not a reason
-	 * to warn. Also warn would trigger printk() which is unsafe from
-	 * various contexts. We cannot use printk_deferred_enter() to mitigate,
-	 * since the running context is unknown.
-	 *
-	 * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
-	 * is safe in any context. Also zeroing the page is mandatory for
-	 * BPF use cases.
-	 *
-	 * Though __GFP_NOMEMALLOC is not checked in the code path below,
-	 * specify it here to highlight that alloc_pages_nolock()
-	 * doesn't want to deplete reserves.
-	 */
-	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC | __GFP_COMP
-			| gfp_flags;
-	unsigned int alloc_flags = ALLOC_TRYLOCK;
-	struct alloc_context ac = { };
-	struct page *page;
-
-	VM_WARN_ON_ONCE(gfp_flags & ~__GFP_ACCOUNT);
-	/*
-	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
-	 * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
-	 * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
-	 * mark the task as the owner of another rt_spin_lock which will
-	 * confuse PI logic, so return immediately if called from hard IRQ or
-	 * NMI.
-	 *
-	 * Note, irqs_disabled() case is ok. This function can be called
-	 * from raw_spin_lock_irqsave region.
-	 */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
-		return NULL;
-
-	/* On UP, spin_trylock() always succeeds even when it is locked */
-	if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
-		return NULL;
-
-	if (!pcp_allowed_order(order))
-		return NULL;
-
-	/* Bailout, since _deferred_grow_zone() needs to take a lock */
-	if (deferred_pages_enabled())
-		return NULL;
-
 	if (nid == NUMA_NO_NODE)
 		nid = numa_node_id();
 
-	prepare_alloc_pages(alloc_gfp, order, nid, NULL, &ac,
-			    &alloc_gfp, &alloc_flags);
-
-	/*
-	 * Best effort allocation from percpu free list.
-	 * If it's empty attempt to spin_trylock zone->lock.
-	 */
-	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
-
-	/* Unlike regular alloc_pages() there is no __alloc_pages_slowpath(). */
-
-	if (memcg_kmem_online() && page && (gfp_flags & __GFP_ACCOUNT) &&
-	    unlikely(__memcg_kmem_charge_page(page, alloc_gfp, order) != 0)) {
-		__free_frozen_pages(page, order, FPI_TRYLOCK);
-		page = NULL;
-	}
-	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
-	kmsan_alloc_page(page, order, alloc_gfp);
-	return page;
+	return __alloc_frozen_pages_noprof(gfp_flags, order, nid, NULL, ALLOC_TRYLOCK);
 }
 /**
  * alloc_pages_nolock - opportunistic reentrant allocation from any context
diff --git a/mm/slub.c b/mm/slub.c
index a2bf3756ca7d0..b9fb66071bd07 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3275,7 +3275,7 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
 	else if (node == NUMA_NO_NODE)
 		page = alloc_frozen_pages(flags, order);
 	else
-		page = __alloc_frozen_pages(flags, order, node, NULL);
+		page = __alloc_frozen_pages(flags, order, node, NULL, 0);
 
 	if (!page)
 		return NULL;
@@ -5236,7 +5236,7 @@ static void *___kmalloc_large_node(size_t size, gfp_t flags, int node)
 	if (node == NUMA_NO_NODE)
 		page = alloc_frozen_pages_noprof(flags, order);
 	else
-		page = __alloc_frozen_pages_noprof(flags, order, node, NULL);
+		page = __alloc_frozen_pages_noprof(flags, order, node, NULL, 0);
 
 	if (page) {
 		ptr = page_address(page);

---
base-commit: 1111012ec6508a38a39f8d20c213c8c9cf3c96c0
change-id: 20260617-alloc-trylock-14ad37dab337

Best regards,
--  
Brendan Jackman <jackmanb@google.com>


* Re: [PATCH] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
  2026-06-17 15:29 [PATCH] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof() Brendan Jackman
@ 2026-06-17 16:39 ` Vlastimil Babka (SUSE)
  2026-06-17 16:49   ` Suren Baghdasaryan
  0 siblings, 1 reply; 4+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-17 16:39 UTC
  To: Brendan Jackman, Andrew Morton, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Zi Yan, Muchun Song, Oscar Salvador,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Mike Rapoport, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Alexei Starovoitov
  Cc: Harry Yoo (Oracle), Gregory Price, linux-mm, linux-kernel,
	linux-rt-devel

+Cc Alexei

On 6/17/26 17:29, Brendan Jackman wrote:
> Currently the core allocator code is controlled by ALLOC_NOLOCK, but the

It's not, it's ALLOC_TRYLOCK! Thanks for proving that we need to rename it
to ALLOC_NOLOCK:

https://lore.kernel.org/all/DJ9QPTO2WXNB.10E88ZHWRDHB0@gmail.com/

So you just won the job to do the rename :) I think it should be done before
this patch, so that the new usages and other _trylock names introduced here
can be done as _nolock outright.

> main entry point function is significantly different from the normal
> __alloc_frozen_pages_nolock(), this is tiring when reading the code.
> 
> Plumb the ALLOC_NOLOCK control one layer up in the call stack: create
> an alloc_flags argument to __alloc_frozen_pages_nolock() (which is only
> exposed to mm/) and then turn the nolock variant into a thin wrapper
> that just sets that flag (as well as handling NUMA_NO_NODE, similar to
> how some of the wrappers in gfp.h do).
> 
> Rationale that this doesn't change anything:
> 
> 1. Simple bits: A bunch of the nolock-specific handling is just moved to
>    the new alloc_order_allowed(), alloc_trylock_allowed() and
>    gfp_trylock.
> 
> 2. __alloc_frozen_pages_noprof() has some extra logic that wasn't
>    previously in the nolock variant:
> 
>    a. Application of gfp_allowed_mask; this only affects early boot, and
>       only flags that affect the slowpath get changed here.
> 
>    b. Application of current_gfp_context() - also only affects the
>       slowpath
> 
> 3. The slowpath itself: this is now just explicitly skipped under
>    !ALLOC_TRYLOCK.

I'll have to ponder it more closely.

> Ulterior motive: adding an alloc_flags arg to the allocator's
> mm-internal entrypoint can later be used to do more allocation
> customisation without needing to create new GFP flags.

Ack.

> No functional change intended.
> 
> Signed-off-by: Brendan Jackman <jackmanb@google.com>

Besides the need to ponder unintended effects, mostly LGTM. Just not a fan
of the hardcoded '0' passed at various places. In the slab variant of this
(the thread I've linked above) I went with SLAB_ALLOC_DEFAULT, so you can do
e.g. ALLOC_DEFAULT here?
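
For illustration only, a small userspace sketch of how a named default could
read at call sites. ALLOC_DEFAULT is just the name floated above; its value
of 0, the other flag values and the sk_* names are assumptions made for this
sketch, not existing kernel definitions.

```c
#include <assert.h>

/* Hypothetical values for illustration; not the kernel's definitions. */
#define SK_ALLOC_DEFAULT    0x0u
#define SK_ALLOC_TRYLOCK    0x1u
#define SK_ALLOC_WMARK_LOW  0x2u

static_assert(SK_ALLOC_DEFAULT == 0, "default must carry no flag bits");

/*
 * Models the flag setup in the patched entry point: callers that do not
 * ask for trylock get the low watermark applied.
 */
static unsigned int sk_effective_flags(unsigned int alloc_flags)
{
	if (!(alloc_flags & SK_ALLOC_TRYLOCK))
		alloc_flags |= SK_ALLOC_WMARK_LOW;
	return alloc_flags;
}
```

A call such as sk_effective_flags(SK_ALLOC_DEFAULT) behaves exactly like
passing a literal 0, so the named constant would change readability only.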


* Re: [PATCH] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
  2026-06-17 16:39 ` Vlastimil Babka (SUSE)
@ 2026-06-17 16:49   ` Suren Baghdasaryan
  2026-06-17 17:14     ` Brendan Jackman
  0 siblings, 1 reply; 4+ messages in thread
From: Suren Baghdasaryan @ 2026-06-17 16:49 UTC
  To: Vlastimil Babka (SUSE)
  Cc: Brendan Jackman, Andrew Morton, Michal Hocko, Johannes Weiner,
	Zi Yan, Muchun Song, Oscar Salvador, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport, Matthew Brost,
	Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Hao Li, Christoph Lameter, David Rientjes,
	Roman Gushchin, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt, Alexei Starovoitov, Harry Yoo (Oracle),
	Gregory Price, linux-mm, linux-kernel, linux-rt-devel, Hao Ge

On Wed, Jun 17, 2026 at 9:39 AM Vlastimil Babka (SUSE)
<vbabka@kernel.org> wrote:
>
> +Cc Alexei
>
> On 6/17/26 17:29, Brendan Jackman wrote:
> > Currently the core allocator code is controlled by ALLOC_NOLOCK, but the
>
> It's not, it's ALLOC_TRYLOCK! Thanks for proving that we need to rename it
> to ALLOC_NOLOCK:
>
> https://lore.kernel.org/all/DJ9QPTO2WXNB.10E88ZHWRDHB0@gmail.com/
>
> So you just won the job to do the rename :) I think it should be done before
> this patch, so that the new usages and other _trylock names introduced here
> can be done as _nolock outright.
>
> > main entry point function is significantly different from the normal
> > __alloc_frozen_pages_nolock(), this is tiring when reading the code.
> >
> > Plumb the ALLOC_NOLOCK control one layer up in the call stack: create
> > an alloc_flags argument to __alloc_frozen_pages_nolock() (which is only
> > exposed to mm/) and then turn the nolock variant into a thin wrapper
> > that just sets that flag (as well as handling NUMA_NO_NODE, similar to
> > how some of the wrappers in gfp.h do).
> >
> > Rationale that this doesn't change anything:
> >
> > 1. Simple bits: A bunch of the nolock-specific handling is just moved to
> >    the new alloc_order_allowed(), alloc_trylock_allowed() and
> >    gfp_trylock.
> >
> > 2. __alloc_frozen_pages_noprof() has some extra logic that wasn't
> >    previously in the nolock variant:
> >
> >    a. Application of gfp_allowed_mask; this only affects early boot, and
> >       only flags that affect the slowpath get changed here.
> >
> >    b. Application of current_gfp_context() - also only affects the
> >       slowpath
> >
> > 3. The slowpath itself: this is now just explicitly skipped under
> >    !ALLOC_TRYLOCK.
>
> I'll have to ponder it more closely.
>
> > Ulterior motive: adding an alloc_flags arg to the allocator's
> > mm-internal entrypoint can later be used to do more allocation
> > customisation without needing to create new GFP flags.
>
> Ack.

I think this change might also help us remove __GFP_NO_CODETAG, which
was introduced in [1] and will be the only user of __GFP_NO_OBJ_EXT
once Vlastimil's patchset removing the other __GFP_NO_OBJ_EXT users
lands. CC'ing Hao, as he is brainstorming ways to remove
__GFP_NO_CODETAG, and this might be the answer.

[1] https://lore.kernel.org/all/20260604024008.46592-1-hao.ge@linux.dev/
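
To illustrate the idea, here is a minimal userspace sketch of an
mm-internal alloc_flags bit standing in for a GFP flag. Everything in it
is hypothetical: ALLOC_NO_CODETAG, mock_alloc() and the counter are
invented for illustration, the flag values are arbitrary, and it assumes
the flag's job is to make the allocator skip tag accounting for one
allocation (the real semantics are whatever [1] defines).

```c
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Hypothetical flag values, not the kernel's. */
#define ALLOC_TRYLOCK		0x1u
#define ALLOC_NO_CODETAG	0x2u	/* invented name for this sketch */

/* Counts allocations that went through the (mock) tag accounting. */
static unsigned int mock_tagged_allocs;

/* Mock entry point: the opt-out travels in alloc_flags, not in gfp. */
static bool mock_alloc(gfp_t gfp, unsigned int alloc_flags)
{
	(void)gfp;	/* the caller's gfp mask is passed through untouched */

	/* Reject any flag the entry point does not know about. */
	if (alloc_flags & ~(ALLOC_TRYLOCK | ALLOC_NO_CODETAG))
		return false;

	if (!(alloc_flags & ALLOC_NO_CODETAG))
		mock_tagged_allocs++;

	return true;
}
```

The point of the sketch is only that the opt-out no longer consumes a
bit in gfp_t, so it cannot leak to callers outside mm/.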

>
> > No functional change intended.
> >
> > Signed-off-by: Brendan Jackman <jackmanb@google.com>
>
> Besides the need to ponder unintended effects, mostly LGTM. Just not a fan
> of the hardcoded '0' passed at various places. In the slab variant of this
> (the thread I've linked above) I went with SLAB_ALLOC_DEFAULT, so you can do
> e.g. ALLOC_DEFAULT here?
>
> > ---
> >  mm/hugetlb.c    |   2 +-
> >  mm/internal.h   |   4 +-
> >  mm/mempolicy.c  |   8 +--
> >  mm/page_alloc.c | 175 +++++++++++++++++++++++++++++---------------------------
> >  mm/slub.c       |   4 +-
> >  5 files changed, 99 insertions(+), 94 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 571212b80835e..619f6307dc98d 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1806,7 +1806,7 @@ static struct folio *alloc_buddy_frozen_folio(int order, gfp_t gfp_mask,
> >       if (alloc_try_hard)
> >               gfp_mask |= __GFP_RETRY_MAYFAIL;
> >
> > -     folio = (struct folio *)__alloc_frozen_pages(gfp_mask, order, nid, nmask);
> > +     folio = (struct folio *)__alloc_frozen_pages(gfp_mask, order, nid, nmask, 0);
> >
> >       /*
> >        * If we did not specify __GFP_RETRY_MAYFAIL, but still got a
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 181e79f1d6a20..1043eb833836c 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -913,7 +913,7 @@ extern bool free_pages_prepare(struct page *page, unsigned int order);
> >  extern int user_min_free_kbytes;
> >
> >  struct page *__alloc_frozen_pages_noprof(gfp_t, unsigned int order, int nid,
> > -             nodemask_t *);
> > +             nodemask_t *, unsigned int alloc_flags);
> >  #define __alloc_frozen_pages(...) \
> >       alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__))
> >  void free_frozen_pages(struct page *page, unsigned int order);
> > @@ -924,7 +924,7 @@ struct page *alloc_frozen_pages_noprof(gfp_t, unsigned int order);
> >  #else
> >  static inline struct page *alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order)
> >  {
> > -     return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL);
> > +     return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL, 0);
> >  }
> >  #endif
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 36699fabd3c22..dccff90682035 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -2425,9 +2425,9 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
> >        */
> >       preferred_gfp = gfp | __GFP_NOWARN;
> >       preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
> > -     page = __alloc_frozen_pages_noprof(preferred_gfp, order, nid, nodemask);
> > +     page = __alloc_frozen_pages_noprof(preferred_gfp, order, nid, nodemask, 0);
> >       if (!page)
> > -             page = __alloc_frozen_pages_noprof(gfp, order, nid, NULL);
> > +             page = __alloc_frozen_pages_noprof(gfp, order, nid, NULL, 0);
> >
> >       return page;
> >  }
> > @@ -2475,7 +2475,7 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
> >                        */
> >                       page = __alloc_frozen_pages_noprof(
> >                               gfp | __GFP_THISNODE | __GFP_NORETRY, order,
> > -                             nid, NULL);
> > +                             nid, NULL, 0);
> >                       if (page || !(gfp & __GFP_DIRECT_RECLAIM))
> >                               return page;
> >                       /*
> > @@ -2487,7 +2487,7 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
> >               }
> >       }
> >
> > -     page = __alloc_frozen_pages_noprof(gfp, order, nid, nodemask);
> > +     page = __alloc_frozen_pages_noprof(gfp, order, nid, nodemask, 0);
> >
> >       if (unlikely(pol->mode == MPOL_INTERLEAVE ||
> >                    pol->mode == MPOL_WEIGHTED_INTERLEAVE) && page) {
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 0111cdbdb5321..fc4d07bbf44b5 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5253,24 +5253,98 @@ void free_pages_bulk(struct page **page_array, unsigned long nr_pages)
> >       }
> >  }
> >
> > -/*
> > - * This is the 'heart' of the zoned buddy allocator.
> > - */
> > -struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
> > -             int preferred_nid, nodemask_t *nodemask)
> > +static inline bool alloc_order_allowed(gfp_t gfp, unsigned int order,
> > +                                    unsigned int alloc_flags)
> >  {
> > -     struct page *page;
> > -     unsigned int alloc_flags = ALLOC_WMARK_LOW;
> > -     gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
> > -     struct alloc_context ac = { };
> > +
> > +     if (alloc_flags & ALLOC_TRYLOCK)
> > +             return pcp_allowed_order(order);
> >
> >       /*
> >        * There are several places where we assume that the order value is sane
> >        * so bail out early if the request is out of bound.
> >        */
> > -     if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
> > +     return !(WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp));
> > +}
> > +
> > +static inline bool alloc_trylock_allowed(void)
> > +{
> > +     /*
> > +      * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
> > +      * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
> > +      * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
> > +      * mark the task as the owner of another rt_spin_lock which will
> > +      * confuse PI logic, so return immediately if called from hard IRQ or
> > +      * NMI.
> > +      *
> > +      * Note, irqs_disabled() case is ok. This function can be called
> > +      * from raw_spin_lock_irqsave region.
> > +      */
> > +     if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
> > +             return false;
> > +
> > +     /* On UP, spin_trylock() always succeeds even when it is locked */
> > +     if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
> > +             return false;
> > +
> > +     /* Bailout, since _deferred_grow_zone() needs to take a lock */
> > +     if (deferred_pages_enabled())
> > +             return false;
> > +
> > +     return true;
> > +}
> > +
> > +/*
> > + * GFP flags to set for ALLOC_TRYLOCK i.e. alloc_pages_nolock().
> > + *
> > + * Do not specify __GFP_DIRECT_RECLAIM, since direct reclaim is not allowed.
> > + * Do not specify __GFP_KSWAPD_RECLAIM either, since wake up of kswapd
> > + * is not safe in arbitrary context.
> > + *
> > + * These two are the conditions for gfpflags_allow_spinning() being true.
> > + *
> > + * Specify __GFP_NOWARN since failing alloc_pages_nolock() is not a reason
> > + * to warn. Also warn would trigger printk() which is unsafe from
> > + * various contexts. We cannot use printk_deferred_enter() to mitigate,
> > + * since the running context is unknown.
> > + *
> > + * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
> > + * is safe in any context. Also zeroing the page is mandatory for
> > + * BPF use cases.
> > + *
> > + * Though __GFP_NOMEMALLOC is not checked in the code path below,
> > + * specify it here to highlight that alloc_pages_nolock()
> > + * doesn't want to deplete reserves.
> > + */
> > +static const gfp_t gfp_trylock = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC |
> > +                             __GFP_COMP;
> > +
> > +/*
> > + * This is the 'heart' of the zoned buddy allocator.
> > + */
> > +struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
> > +             int preferred_nid, nodemask_t *nodemask, unsigned int alloc_flags)
> > +{
> > +     struct page *page;
> > +     gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
> > +     struct alloc_context ac = { };
> > +
> > +     /* Other flags could be supported later if needed. */
> > +     if (WARN_ON(alloc_flags & ~ALLOC_TRYLOCK))
> >               return NULL;
> >
> > +     if (!alloc_order_allowed(gfp, order, alloc_flags))
> > +             return NULL;
> > +
> > +     if (alloc_flags & ALLOC_TRYLOCK) {
> > +             VM_WARN_ON_ONCE(gfp & ~__GFP_ACCOUNT);
> > +             if (!alloc_trylock_allowed())
> > +                     return NULL;
> > +             gfp |= gfp_trylock;
> > +     } else {
> > +             alloc_flags |= ALLOC_WMARK_LOW;
> > +     }
> > +
> >       gfp &= gfp_allowed_mask;
> >       /*
> >        * Apply scoped allocation constraints. This is mainly about GFP_NOFS
> > @@ -5291,9 +5365,9 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
> >        */
> >       alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
> >
> > -     /* First allocation attempt */
> > +     /* First allocation attempt (or, for trylock, only attempt) */
> >       page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
> > -     if (likely(page))
> > +     if (likely(page) || (alloc_flags & ALLOC_TRYLOCK))
> >               goto out;
> >
> >       alloc_gfp = gfp;
> > @@ -5310,7 +5384,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
> >  out:
> >       if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
> >           unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
> > -             free_frozen_pages(page, order);
> > +             __free_frozen_pages(page, order,
> > +                                 alloc_flags & ALLOC_TRYLOCK ? FPI_TRYLOCK : 0);
> >               page = NULL;
> >       }
> >
> > @@ -5326,7 +5401,7 @@ struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
> >  {
> >       struct page *page;
> >
> > -     page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask);
> > +     page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask, 0);
> >       if (page)
> >               set_page_refcounted(page);
> >       return page;
> > @@ -7856,80 +7931,10 @@ static bool __free_unaccepted(struct page *page)
> >
> >  struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned int order)
> >  {
> > -     /*
> > -      * Do not specify __GFP_DIRECT_RECLAIM, since direct claim is not allowed.
> > -      * Do not specify __GFP_KSWAPD_RECLAIM either, since wake up of kswapd
> > -      * is not safe in arbitrary context.
> > -      *
> > -      * These two are the conditions for gfpflags_allow_spinning() being true.
> > -      *
> > -      * Specify __GFP_NOWARN since failing alloc_pages_nolock() is not a reason
> > -      * to warn. Also warn would trigger printk() which is unsafe from
> > -      * various contexts. We cannot use printk_deferred_enter() to mitigate,
> > -      * since the running context is unknown.
> > -      *
> > -      * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
> > -      * is safe in any context. Also zeroing the page is mandatory for
> > -      * BPF use cases.
> > -      *
> > -      * Though __GFP_NOMEMALLOC is not checked in the code path below,
> > -      * specify it here to highlight that alloc_pages_nolock()
> > -      * doesn't want to deplete reserves.
> > -      */
> > -     gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC | __GFP_COMP
> > -                     | gfp_flags;
> > -     unsigned int alloc_flags = ALLOC_TRYLOCK;
> > -     struct alloc_context ac = { };
> > -     struct page *page;
> > -
> > -     VM_WARN_ON_ONCE(gfp_flags & ~__GFP_ACCOUNT);
> > -     /*
> > -      * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
> > -      * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
> > -      * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
> > -      * mark the task as the owner of another rt_spin_lock which will
> > -      * confuse PI logic, so return immediately if called from hard IRQ or
> > -      * NMI.
> > -      *
> > -      * Note, irqs_disabled() case is ok. This function can be called
> > -      * from raw_spin_lock_irqsave region.
> > -      */
> > -     if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
> > -             return NULL;
> > -
> > -     /* On UP, spin_trylock() always succeeds even when it is locked */
> > -     if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
> > -             return NULL;
> > -
> > -     if (!pcp_allowed_order(order))
> > -             return NULL;
> > -
> > -     /* Bailout, since _deferred_grow_zone() needs to take a lock */
> > -     if (deferred_pages_enabled())
> > -             return NULL;
> > -
> >       if (nid == NUMA_NO_NODE)
> >               nid = numa_node_id();
> >
> > -     prepare_alloc_pages(alloc_gfp, order, nid, NULL, &ac,
> > -                         &alloc_gfp, &alloc_flags);
> > -
> > -     /*
> > -      * Best effort allocation from percpu free list.
> > -      * If it's empty attempt to spin_trylock zone->lock.
> > -      */
> > -     page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
> > -
> > -     /* Unlike regular alloc_pages() there is no __alloc_pages_slowpath(). */
> > -
> > -     if (memcg_kmem_online() && page && (gfp_flags & __GFP_ACCOUNT) &&
> > -         unlikely(__memcg_kmem_charge_page(page, alloc_gfp, order) != 0)) {
> > -             __free_frozen_pages(page, order, FPI_TRYLOCK);
> > -             page = NULL;
> > -     }
> > -     trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
> > -     kmsan_alloc_page(page, order, alloc_gfp);
> > -     return page;
> > +     return __alloc_frozen_pages_noprof(gfp_flags, order, nid, NULL, ALLOC_TRYLOCK);
> >  }
> >  /**
> >   * alloc_pages_nolock - opportunistic reentrant allocation from any context
> > diff --git a/mm/slub.c b/mm/slub.c
> > index a2bf3756ca7d0..b9fb66071bd07 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3275,7 +3275,7 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
> >       else if (node == NUMA_NO_NODE)
> >               page = alloc_frozen_pages(flags, order);
> >       else
> > -             page = __alloc_frozen_pages(flags, order, node, NULL);
> > +             page = __alloc_frozen_pages(flags, order, node, NULL, 0);
> >
> >       if (!page)
> >               return NULL;
> > @@ -5236,7 +5236,7 @@ static void *___kmalloc_large_node(size_t size, gfp_t flags, int node)
> >       if (node == NUMA_NO_NODE)
> >               page = alloc_frozen_pages_noprof(flags, order);
> >       else
> > -             page = __alloc_frozen_pages_noprof(flags, order, node, NULL);
> > +             page = __alloc_frozen_pages_noprof(flags, order, node, NULL, 0);
> >
> >       if (page) {
> >               ptr = page_address(page);
> >
> > ---
> > base-commit: 1111012ec6508a38a39f8d20c213c8c9cf3c96c0
> > change-id: 20260617-alloc-trylock-14ad37dab337
> >
> > Best regards,
> > --
> > Brendan Jackman <jackmanb@google.com>
> >
>



* Re: [PATCH] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
  2026-06-17 16:49   ` Suren Baghdasaryan
@ 2026-06-17 17:14     ` Brendan Jackman
  0 siblings, 0 replies; 4+ messages in thread
From: Brendan Jackman @ 2026-06-17 17:14 UTC (permalink / raw)
  To: Suren Baghdasaryan, Vlastimil Babka (SUSE)
  Cc: Brendan Jackman, Andrew Morton, Michal Hocko, Johannes Weiner,
	Zi Yan, Muchun Song, Oscar Salvador, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport, Matthew Brost,
	Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Hao Li, Christoph Lameter, David Rientjes,
	Roman Gushchin, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt, Alexei Starovoitov, Harry Yoo (Oracle),
	Gregory Price, linux-mm, linux-kernel, linux-rt-devel, Hao Ge

On Wed Jun 17, 2026 at 4:49 PM UTC, Suren Baghdasaryan wrote:
> On Wed, Jun 17, 2026 at 9:39 AM Vlastimil Babka (SUSE)
> <vbabka@kernel.org> wrote:
>>
>> +Cc Alexei
>>
>> On 6/17/26 17:29, Brendan Jackman wrote:
>> > Currently the core allocator code is controlled by ALLOC_NOLOCK, but the
>>
>> It's not, it's ALLOC_TRYLOCK! Thanks for proving that we need to rename it
>> to ALLOC_NOLOCK:
>>
>> https://lore.kernel.org/all/DJ9QPTO2WXNB.10E88ZHWRDHB0@gmail.com/
>>
>> So you just won the job to do the rename :) I think it should be done before
>> this patch, so that the new usages and other _trylock names introduced here
>> can be done as _nolock outright.

Ack. I'll aim to send that tomorrow once Sashiko has caught up.

>> > main entry point function is significantly different from the normal
>> > __alloc_frozen_pages_nolock(), this is tiring when reading the code.
>> >
>> > Plumb the ALLOC_NOLOCK control one layer up in the call stack: create
>> > an alloc_flags argument to __alloc_frozen_pages_nolock() (which is only
>> > exposed to mm/) and then turn the nolock variant into a thin wrapper
>> > that just sets that flag (as well as handling NUMA_NO_NODE, similar to
>> > how some of the wrappers in gfp.h do).
>> >
>> > Rationale that this doesn't change anything:
>> >
>> > 1. Simple bits: A bunch of the nolock-specific handling is just moved to
>> >    the new alloc_order_allowed(), alloc_trylock_allowed() and
>> >    gfp_trylock.
>> >
>> > 2. __alloc_frozen_pages_noprof() has some extra logic that wasn't
>> >    previously in the nolock variant:
>> >
>> >    a. Application of gfp_allowed_mask; this only affects early boot, and
>> >       only flags that affect the slowpath get changed here.
>> >
>> >    b. Application of current_gfp_context() - also only affects the
>> >       slowpath
>> >
>> > 3. The slowpath itself: this is now just explicitly skipped under
>> >    !ALLOC_TRYLOCK.
>>
>> I'll have to ponder it more closely.
>>
>> > Ulterior motive: adding an alloc_flags arg to the allocator's
>> > mm-internal entrypoint can later be used to do more allocation
>> > customisation without needing to create new GFP flags.
>>
>> Ack.
>
> I think this change might also help us in removing __GFP_NO_CODETAG

Nice, this actually looks trivial? I can probably just tack it onto the
v2 for this patch/series.

> introduced in [1] and being the only user of __GFP_NO_OBJ_EXT once
> Vlastimil's patchset removing other __GFP_NO_OBJ_EXT users lands.
> CC'ing Hao as he is brainstorming ways to remove __GFP_NO_CODETAG, and
> this might be the answer.
>>
>> Besides the need to ponder unintended effects, mostly LGTM. Just not a fan
>> of the hardcoded '0' passed at various places. In the slab variant of this
>> (the thread I've linked above) I went with SLAB_ALLOC_DEFAULT, so you can do
>> e.g. ALLOC_DEFAULT here?

Yup, ALLOC_DEFAULT sounds fine to me.
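
As a rough userspace sketch of how call sites might read with a named
default instead of a bare 0: the mock_* names, struct mock_page and the
fastpath_ok parameter are invented for illustration, the flag values are
arbitrary, and ALLOC_DEFAULT is only the name proposed in this thread,
not an existing definition.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; the values are illustrative only. */
#define ALLOC_DEFAULT	0x0u	/* assumed name for "no extra alloc flags" */
#define ALLOC_TRYLOCK	0x1u	/* ALLOC_NOLOCK after the planned rename */

struct mock_page {
	unsigned int order;
	bool from_slowpath;
};

static struct mock_page fast_page, slow_page;

/* Mock of the unified entry point: behaviour is selected by alloc_flags. */
static struct mock_page *mock_alloc_frozen_pages(unsigned int order,
						 bool fastpath_ok,
						 unsigned int alloc_flags)
{
	/* As in the patch, only ALLOC_TRYLOCK is accepted from callers. */
	if (alloc_flags & ~ALLOC_TRYLOCK)
		return NULL;

	if (fastpath_ok) {
		fast_page.order = order;
		fast_page.from_slowpath = false;
		return &fast_page;
	}

	/* For trylock callers the first attempt is the only attempt. */
	if (alloc_flags & ALLOC_TRYLOCK)
		return NULL;

	slow_page.order = order;
	slow_page.from_slowpath = true;
	return &slow_page;
}

/* The nolock variant becomes a thin wrapper that just sets the flag. */
static struct mock_page *mock_alloc_frozen_pages_nolock(unsigned int order,
							bool fastpath_ok)
{
	return mock_alloc_frozen_pages(order, fastpath_ok, ALLOC_TRYLOCK);
}
```

A regular caller would then pass ALLOC_DEFAULT explicitly, e.g.
mock_alloc_frozen_pages(order, true, ALLOC_DEFAULT), which reads more
clearly than a trailing 0.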

Thanks for the reviews as always.


