From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
To: Brendan Jackman <jackmanb@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
	Muchun Song <muchun.song@linux.dev>,
	Oscar Salvador <osalvador@suse.de>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <liam@infradead.org>,
	Mike Rapoport <rppt@kernel.org>,
	Matthew Brost <matthew.brost@intel.com>,
	Joshua Hahn <joshua.hahnjy@gmail.com>,
	Rakie Kim <rakie.kim@sk.com>, Byungchul Park <byungchul@sk.com>,
	Ying Huang <ying.huang@linux.alibaba.com>,
	Alistair Popple <apopple@nvidia.com>, Hao Li <hao.li@linux.dev>,
	Christoph Lameter <cl@gentwo.org>,
	David Rientjes <rientjes@google.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Clark Williams <clrkwllms@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>
Cc: "Harry Yoo (Oracle)" <harry@kernel.org>,
	Gregory Price <gourry@gourry.net>,
	Alexei Starovoitov <ast@kernel.org>,
	Matthew Wilcox <willy@infradead.org>, Hao Ge <hao.ge@linux.dev>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-rt-devel@lists.linux.dev
Subject: Re: [PATCH v3 05/16] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
Date: Tue, 30 Jun 2026 18:16:20 +0200
Message-ID: <611bd3dc-95d4-45e0-ae5a-158c6cf1472f@kernel.org>
In-Reply-To: <20260629-alloc-trylock-v3-5-57bef0eadbc2@google.com>

On 6/29/26 15:11, Brendan Jackman wrote:
> Currently the core allocator code is controlled by ALLOC_NOLOCK, but the
> main entry point function is significantly different from the normal

Let's mention it explicitly, alloc_frozen_pages_nolock_noprof().

> __alloc_frozen_pages_nolock(), this is tiring when reading the code.

You mean __alloc_frozen_pages_noprof()?

> 
> Plumb the ALLOC_NOLOCK control one layer up in the call stack: create
> an alloc_flags argument to __alloc_frozen_pages_nolock() (which is only

Again, __alloc_frozen_pages_noprof().

> exposed to mm/) and then turn the nolock variant into a thin wrapper
> that just sets that flag (as well as handling NUMA_NO_NODE, similar to
> how some of the wrappers in gfp.h do).
> 
> Rationale that this doesn't change anything:
> 
> 1. Simple bits: A bunch of the nolock-specific handling is just moved to
>    the new alloc_order_allowed(), alloc_trylock_allowed() and
>    gfp_trylock.

These should be alloc_nolock_allowed() and gfp_nolock.

> 2. __alloc_frozen_pages_noprof() has some extra logic that wasn't
>    previously in the nolock variant:
> 
>    a. Application of gfp_allowed_mask; this only affects early boot, and
>       only flags that affect the slowpath get changed here.

As discussed in reply to Harry, I'd mention that the flags excluded by
GFP_BOOT_MASK are not usable by _nolock() anyway.
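
As a rough illustration of that point, here is a standalone userspace
sketch. The bit values are made up (the real ones are in
include/linux/gfp_types.h), the GFP_BOOT_MASK definition is reproduced from
memory, and GFP_NOLOCK_ALLOWED and apply_boot_mask() are hypothetical names.
Only the gfp_nolock flag set is taken from the patch:

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Made-up bit positions, for illustration only. */
#define __GFP_IO              0x001u
#define __GFP_FS              0x002u
#define __GFP_DIRECT_RECLAIM  0x004u
#define __GFP_KSWAPD_RECLAIM  0x008u
#define __GFP_NOWARN          0x010u
#define __GFP_ZERO            0x020u
#define __GFP_NOMEMALLOC      0x040u
#define __GFP_COMP            0x080u
#define __GFP_ACCOUNT         0x100u
#define __GFP_BITS_MASK       0x1ffu

#define __GFP_RECLAIM  (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)

/* Assumed shape of the early-boot mask: drop reclaim, IO and FS. */
#define GFP_BOOT_MASK  (__GFP_BITS_MASK & ~(__GFP_RECLAIM | __GFP_IO | __GFP_FS))

/* Flag set forced on by the patch for ALLOC_NOLOCK. */
#define GFP_NOLOCK  (__GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC | __GFP_COMP)

/* Hypothetical: everything a nolock allocation can carry at this point. */
#define GFP_NOLOCK_ALLOWED  (GFP_NOLOCK | __GFP_ACCOUNT)

/* None of the nolock flags are among those the boot mask removes. */
static_assert((GFP_NOLOCK_ALLOWED & ~GFP_BOOT_MASK) == 0,
	      "boot mask must not strip any nolock flag");

/* Hypothetical stand-in for "gfp &= gfp_allowed_mask" during early boot. */
static gfp_t apply_boot_mask(gfp_t gfp)
{
	return gfp & GFP_BOOT_MASK;
}
```

Under these assumptions, applying the mask to any nolock request leaves it
unchanged, which is why the extra masking step does not alter behaviour for
the nolock path.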

>    b. Application of current_gfp_context() - also only affects the
>       slowpath
> 
> 3. The slowpath itself: this is now just explicitly skipped under
>    !ALLOC_TRYLOCK.

ALLOC_NOLOCK.

> 
> Ulterior motive: adding an alloc_flags arg to the allocator's
> mm-internal entrypoint can later be used to do more allocation
> customisation without needing to create new GFP flags.
> 
> While adding this flag to a bunch of places, create ALLOC_DEFAULT to
> avoid a mysterious literal 0 in most places.


> alloc_frozen_pages_noprof()
> is defined above the alloc flags so just leave that as a slightly messy
> exception instead of trying to fully reorder mm/internal.h for that one
> case.

This no longer applies in v3?

> No functional change intended.
> 
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
>  mm/hugetlb.c    |   3 +-
>  mm/mempolicy.c  |  10 ++--
>  mm/page_alloc.c | 178 +++++++++++++++++++++++++++++---------------------------
>  mm/page_alloc.h |   6 +-
>  mm/slub.c       |   6 +-
>  5 files changed, 108 insertions(+), 95 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f7925624c4d2e..dfcfcfa4715bf 100644

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a3ba63c7f9199..8d409d075e3e9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5222,7 +5222,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
>  		}
>  		nr_account++;
>  
> -		prep_new_page(page, 0, gfp, 0);
> +		prep_new_page(page, 0, gfp, ALLOC_DEFAULT);
>  		set_page_refcounted(page);
>  		page_array[nr_populated++] = page;
>  	}
> @@ -5271,24 +5271,98 @@ void free_pages_bulk(struct page **page_array, unsigned long nr_pages)
>  	}
>  }
>  
> -/*
> - * This is the 'heart' of the zoned buddy allocator.
> - */
> -struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
> -		int preferred_nid, nodemask_t *nodemask)
> +static inline bool alloc_order_allowed(gfp_t gfp, unsigned int order,
> +				       unsigned int alloc_flags)
>  {
> -	struct page *page;
> -	unsigned int fastpath_alloc_flags = ALLOC_WMARK_LOW;
> -	gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
> -	struct alloc_context ac = { };
> +	if (alloc_flags & ALLOC_NOLOCK)
> +		return pcp_allowed_order(order);
>  
>  	/*
>  	 * There are several places where we assume that the order value is sane
>  	 * so bail out early if the request is out of bound.
>  	 */
> -	if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
> +	return !(WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp));
> +}
> +
> +static inline bool alloc_trylock_allowed(void)

alloc_nolock_allowed()

> +{
> +	/*
> +	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
> +	 * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
> +	 * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
> +	 * mark the task as the owner of another rt_spin_lock which will
> +	 * confuse PI logic, so return immediately if called from hard IRQ or
> +	 * NMI.
> +	 *
> +	 * Note, irqs_disabled() case is ok. This function can be called
> +	 * from raw_spin_lock_irqsave region.
> +	 */
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
> +		return false;
> +
> +	/* On UP, spin_trylock() always succeeds even when it is locked */
> +	if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
> +		return false;
> +
> +	/* Bailout, since _deferred_grow_zone() needs to take a lock */
> +	if (deferred_pages_enabled())
> +		return false;
> +
> +	return true;
> +}
> +
> +/*
> + * GFP flags to set for ALLOC_NOLOCK i.e. alloc_pages_nolock().
> + *
> + * Do not specify __GFP_DIRECT_RECLAIM, since direct claim is not allowed.
> + * Do not specify __GFP_KSWAPD_RECLAIM either, since wake up of kswapd
> + * is not safe in arbitrary context.
> + *
> + * These two are the conditions for gfpflags_allow_spinning() being true.
> + *
> + * Specify __GFP_NOWARN since failing alloc_pages_nolock() is not a reason
> + * to warn. Also warn would trigger printk() which is unsafe from
> + * various contexts. We cannot use printk_deferred_enter() to mitigate,
> + * since the running context is unknown.
> + *
> + * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
> + * is safe in any context. Also zeroing the page is mandatory for
> + * BPF use cases.
> + *
> + * Though __GFP_NOMEMALLOC is not checked in the code path below,
> + * specify it here to highlight that alloc_pages_nolock()
> + * doesn't want to deplete reserves.
> + */
> +static const gfp_t gfp_nolock = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC |
> +				__GFP_COMP;
> +
> +/*
> + * This is the 'heart' of the zoned buddy allocator.
> + */
> +struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
> +		int preferred_nid, nodemask_t *nodemask, unsigned int alloc_flags)
> +{
> +	struct page *page;
> +	gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
> +	struct alloc_context ac = { };
> +	unsigned int fastpath_alloc_flags = alloc_flags;
> +
> +	/* Other flags could be supported later if needed. */
> +	if (WARN_ON(alloc_flags & ~ALLOC_NOLOCK))
>  		return NULL;
>  
> +	if (!alloc_order_allowed(gfp, order, alloc_flags))
> +		return NULL;
> +
> +	if (alloc_flags & ALLOC_NOLOCK) {
> +		VM_WARN_ON_ONCE(gfp & ~__GFP_ACCOUNT);
> +		if (!alloc_trylock_allowed())
> +			return NULL;
> +		gfp |= gfp_nolock;

I think we could do a
		fastpath_alloc_flags |= ALLOC_WMARK_MIN;

to make it explicit, even though it's a no-op (the value is 0) and
alloc_frozen_pages_nolock_noprof() didn't do it.
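
To illustrate why this is a no-op, here is a standalone userspace sketch.
The enum and macros are stand-ins for the kernel definitions, the
ALLOC_WMARK_MASK and ALLOC_NOLOCK values are assumptions for illustration,
and fastpath_flags() is a hypothetical helper that only mirrors the branch
above:

```c
#include <assert.h>

/* Stand-in for the kernel enum; WMARK_MIN is the first entry, so it is 0. */
enum zone_watermarks { WMARK_MIN, WMARK_LOW, WMARK_HIGH };

/* The ALLOC_WMARK bits are used as an index to zone->watermark. */
#define ALLOC_WMARK_MIN   WMARK_MIN
#define ALLOC_WMARK_LOW   WMARK_LOW
#define ALLOC_WMARK_MASK  0x3u    /* assumed width of the watermark index */
#define ALLOC_NOLOCK      0x400u  /* assumed bit value */

/* OR-ing in a zero-valued flag cannot change the flags word. */
static_assert(ALLOC_WMARK_MIN == 0, "ALLOC_WMARK_MIN is expected to be 0");

/* Hypothetical helper: picks the fastpath watermark like the hunk above. */
static unsigned int fastpath_flags(unsigned int alloc_flags)
{
	unsigned int fastpath_alloc_flags = alloc_flags;

	if (alloc_flags & ALLOC_NOLOCK)
		fastpath_alloc_flags |= ALLOC_WMARK_MIN; /* documents intent only */
	else
		fastpath_alloc_flags |= ALLOC_WMARK_LOW;

	return fastpath_alloc_flags;
}
```

With these stand-ins, fastpath_flags(ALLOC_NOLOCK) returns ALLOC_NOLOCK
unchanged, so the extra line would only document that the nolock path uses
the min watermark.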

> +	} else {
> +		fastpath_alloc_flags |= ALLOC_WMARK_LOW;
> +	}
> +
>  	gfp &= gfp_allowed_mask;
>  	/*
>  	 * Apply scoped allocation constraints. This is mainly about GFP_NOFS
> @@ -5310,9 +5384,9 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
>  	fastpath_alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
>  	fastpath_alloc_flags |= alloc_flags_nonblocking(gfp, order) & ALLOC_HIGHATOMIC;
>  
> -	/* First allocation attempt */
> +	/* First allocation attempt (or, for nolock, only attempt) */
>  	page = get_page_from_freelist(alloc_gfp, order, fastpath_alloc_flags, &ac);
> -	if (likely(page))
> +	if (likely(page) || (alloc_flags & ALLOC_NOLOCK))
>  		goto out;
>  
>  	alloc_gfp = gfp;
> @@ -5329,7 +5403,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
>  out:
>  	if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
>  	    unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
> -		free_frozen_pages(page, order);
> +		__free_frozen_pages(page, order,
> +				    alloc_flags & ALLOC_NOLOCK ? FPI_TRYLOCK : 0);
>  		page = NULL;
>  	}
>  
> @@ -5345,7 +5420,8 @@ struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
>  {
>  	struct page *page;
>  
> -	page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask);
> +	page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask,
> +					   ALLOC_DEFAULT);
>  	if (page)
>  		set_page_refcounted(page);
>  	return page;
> @@ -7875,80 +7951,10 @@ static bool __free_unaccepted(struct page *page)
>  
>  struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned int order)
>  {
> -	/*
> -	 * Do not specify __GFP_DIRECT_RECLAIM, since direct claim is not allowed.
> -	 * Do not specify __GFP_KSWAPD_RECLAIM either, since wake up of kswapd
> -	 * is not safe in arbitrary context.
> -	 *
> -	 * These two are the conditions for gfpflags_allow_spinning() being true.
> -	 *
> -	 * Specify __GFP_NOWARN since failing alloc_pages_nolock() is not a reason
> -	 * to warn. Also warn would trigger printk() which is unsafe from
> -	 * various contexts. We cannot use printk_deferred_enter() to mitigate,
> -	 * since the running context is unknown.
> -	 *
> -	 * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
> -	 * is safe in any context. Also zeroing the page is mandatory for
> -	 * BPF use cases.
> -	 *
> -	 * Though __GFP_NOMEMALLOC is not checked in the code path below,
> -	 * specify it here to highlight that alloc_pages_nolock()
> -	 * doesn't want to deplete reserves.
> -	 */
> -	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC | __GFP_COMP
> -			| gfp_flags;
> -	unsigned int alloc_flags = ALLOC_NOLOCK;
> -	struct alloc_context ac = { };
> -	struct page *page;
> -
> -	VM_WARN_ON_ONCE(gfp_flags & ~__GFP_ACCOUNT);
> -	/*
> -	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
> -	 * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
> -	 * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
> -	 * mark the task as the owner of another rt_spin_lock which will
> -	 * confuse PI logic, so return immediately if called from hard IRQ or
> -	 * NMI.
> -	 *
> -	 * Note, irqs_disabled() case is ok. This function can be called
> -	 * from raw_spin_lock_irqsave region.
> -	 */
> -	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
> -		return NULL;
> -
> -	/* On UP, spin_trylock() always succeeds even when it is locked */
> -	if (!IS_ENABLED(CONFIG_SMP) && in_nmi())
> -		return NULL;
> -
> -	if (!pcp_allowed_order(order))
> -		return NULL;
> -
> -	/* Bailout, since _deferred_grow_zone() needs to take a lock */
> -	if (deferred_pages_enabled())
> -		return NULL;
> -
>  	if (nid == NUMA_NO_NODE)
>  		nid = numa_node_id();
>  
> -	prepare_alloc_pages(alloc_gfp, order, nid, NULL, &ac,
> -			    &alloc_gfp, &alloc_flags);
> -
> -	/*
> -	 * Best effort allocation from percpu free list.
> -	 * If it's empty attempt to spin_trylock zone->lock.
> -	 */
> -	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
> -
> -	/* Unlike regular alloc_pages() there is no __alloc_pages_slowpath(). */
> -
> -	if (memcg_kmem_online() && page && (gfp_flags & __GFP_ACCOUNT) &&
> -	    unlikely(__memcg_kmem_charge_page(page, alloc_gfp, order) != 0)) {
> -		__free_frozen_pages(page, order, FPI_TRYLOCK);
> -		page = NULL;
> -	}
> -	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
> -	kmsan_alloc_page(page, order, alloc_gfp);
> -	return page;
> +	return __alloc_frozen_pages_noprof(gfp_flags, order, nid, NULL, ALLOC_NOLOCK);
>  }
>  /**
>   * alloc_pages_nolock - opportunistic reentrant allocation from any context
> diff --git a/mm/page_alloc.h b/mm/page_alloc.h
> index 3250d44f96457..e16f905f859a7 100644
> --- a/mm/page_alloc.h
> +++ b/mm/page_alloc.h
> @@ -11,6 +11,7 @@
>  #include <linux/nodemask.h>
>  #include <linux/types.h>
>  
> +#define ALLOC_DEFAULT		0
>  /* The ALLOC_WMARK bits are used as an index to zone->watermark */
>  #define ALLOC_WMARK_MIN		WMARK_MIN
>  #define ALLOC_WMARK_LOW		WMARK_LOW
> @@ -219,7 +220,7 @@ extern bool free_pages_prepare(struct page *page, unsigned int order);
>  extern int user_min_free_kbytes;
>  
>  struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order, int nid,
> -		nodemask_t *nodemask);
> +		nodemask_t *nodemask, unsigned int alloc_flags);
>  #define __alloc_frozen_pages(...) \
>  	alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__))
>  void free_frozen_pages(struct page *page, unsigned int order);
> @@ -230,7 +231,8 @@ struct page *alloc_frozen_pages_noprof(gfp_t, unsigned int order);
>  #else
>  static inline struct page *alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order)
>  {
> -	return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL);
> +	return __alloc_frozen_pages_noprof(gfp, order, numa_node_id(), NULL,
> +					   0 /* ALLOC_DEFAULT */);

Can use ALLOC_DEFAULT now.

>  }
>  #endif
>  

