Linux real-time development
From: "Brendan Jackman" <brendan.jackman@linux.dev>
To: "Matthew Wilcox" <willy@infradead.org>,
	"Brendan Jackman" <jackmanb@google.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Vlastimil Babka" <vbabka@kernel.org>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>, "Zi Yan" <ziy@nvidia.com>,
	"Muchun Song" <muchun.song@linux.dev>,
	"Oscar Salvador" <osalvador@suse.de>,
	"David Hildenbrand" <david@kernel.org>,
	"Lorenzo Stoakes" <ljs@kernel.org>,
	"Liam R. Howlett" <liam@infradead.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Joshua Hahn" <joshua.hahnjy@gmail.com>,
	"Rakie Kim" <rakie.kim@sk.com>,
	"Byungchul Park" <byungchul@sk.com>,
	"Ying Huang" <ying.huang@linux.alibaba.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Hao Li" <hao.li@linux.dev>, "Christoph Lameter" <cl@gentwo.org>,
	"David Rientjes" <rientjes@google.com>,
	"Roman Gushchin" <roman.gushchin@linux.dev>,
	"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
	"Clark Williams" <clrkwllms@kernel.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Harry Yoo (Oracle)" <harry@kernel.org>,
	"Gregory Price" <gourry@gourry.net>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, <linux-rt-devel@lists.linux.dev>
Subject: Re: [PATCH] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof()
Date: Fri, 19 Jun 2026 08:17:31 +0000
Message-ID: <DJCVLX0WK118.3V7CXOTLISFOQ@linux.dev>
In-Reply-To: <ajS96fWbG4dzP3u3@casper.infradead.org>

On Fri Jun 19, 2026 at 3:56 AM UTC, Matthew Wilcox wrote:
> On Wed, Jun 17, 2026 at 03:29:42PM +0000, Brendan Jackman wrote:
>> +++ b/mm/page_alloc.c
>> @@ -5253,24 +5253,98 @@ void free_pages_bulk(struct page **page_array, unsigned long nr_pages)
>>  	}
>>  }
>>  
>> -/*
>> - * This is the 'heart' of the zoned buddy allocator.
>> - */
>> -struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
>> -		int preferred_nid, nodemask_t *nodemask)
>> +static inline bool alloc_order_allowed(gfp_t gfp, unsigned int order,
>> +				       unsigned int alloc_flags)
>>  {
>> -	struct page *page;
>> -	unsigned int alloc_flags = ALLOC_WMARK_LOW;
>> -	gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
>> -	struct alloc_context ac = { };
>> +
>
> Spurious blank line?

Yep, thanks.

>> +	if (alloc_flags & ALLOC_TRYLOCK)
>> +		return pcp_allowed_order(order);
> [...]
>> +/*
>> + * GFP flags to set for ALLOC_TRYLOCK i.e. alloc_pages_nolock().
>> + *
>> + * Do not specify __GFP_DIRECT_RECLAIM, since direct reclaim is not allowed.
>> + * Do not specify __GFP_KSWAPD_RECLAIM either, since wake up of kswapd
>> + * is not safe in arbitrary context.
>> + *
>> + * These two are the conditions for gfpflags_allow_spinning() being true.
>> + *
>> + * Specify __GFP_NOWARN since failing alloc_pages_nolock() is not a reason
>> + * to warn. Also warn would trigger printk() which is unsafe from
>> + * various contexts. We cannot use printk_deferred_enter() to mitigate,
>> + * since the running context is unknown.
>> + *
>> + * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
>> + * is safe in any context. Also zeroing the page is mandatory for
>> + * BPF use cases.
>
> It may be mandatory for BPF, but it seems wasteful for other uses.

True, I don't see why we shouldn't push this out to the caller. I can do
it as part of this series.
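A minimal userspace sketch of what pushing __GFP_ZERO out to the caller could look like. It assumes a mock `gfp_t` and mock flag bits (the `MOCK_GFP_*` values are illustrative, not the kernel's real `__GFP_*` assignments), and the helpers `trylock_gfp()` and `bpf_style_trylock_gfp()` are hypothetical names used only for illustration.

```c
/*
 * Userspace mock: gfp_t and the flag bits below are illustrative
 * stand-ins, NOT the kernel's real __GFP_* values.
 */
typedef unsigned int gfp_t;

#define MOCK_GFP_NOWARN     (1u << 0)
#define MOCK_GFP_ZERO       (1u << 1)
#define MOCK_GFP_NOMEMALLOC (1u << 2)
#define MOCK_GFP_COMP       (1u << 3)
#define MOCK_GFP_ACCOUNT    (1u << 4)

/* Flags every trylock allocation still gets; zeroing is no longer forced. */
static const gfp_t mock_gfp_trylock =
	MOCK_GFP_NOWARN | MOCK_GFP_NOMEMALLOC | MOCK_GFP_COMP;

/* Hypothetical helper: merge the caller's flags with the trylock set. */
static gfp_t trylock_gfp(gfp_t caller_gfp)
{
	return caller_gfp | mock_gfp_trylock;
}

/*
 * Hypothetical BPF-style caller that relies on zeroed pages, so it
 * now has to request zeroing explicitly.
 */
static gfp_t bpf_style_trylock_gfp(void)
{
	return trylock_gfp(MOCK_GFP_ACCOUNT | MOCK_GFP_ZERO);
}
```

Callers that don't need zeroed pages no longer pay for it. Note that the patch comment also relies on __GFP_ZERO to keep kmsan_alloc_page() safe in any context, so a real change along these lines would need to handle that separately; this sketch only models the flag plumbing.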

>> + * Though __GFP_NOMEMALLOC is not checked in the code path below,
>> + * specify it here to highlight that alloc_pages_nolock()
>> + * doesn't want to deplete reserves.
>> + */
>> +static const gfp_t gfp_trylock = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC |
>> +				__GFP_COMP;

> I rather dislike this being turned into a file-scope variable, even a
> non-varying variable.  Can't it remain inside a function?

Um, we could put it into a function like `void add_gfp_trylock(gfp_t *gfp)`
but that doesn't really reduce the scope in any meaningful way, right?
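A minimal userspace sketch of the two options being weighed, assuming a mock `gfp_t` and mock flag bits (the `MOCK_GFP_*` values are illustrative, not the kernel's). The `add_gfp_trylock()` body is a hypothetical implementation of the helper named above. It shows why the scope barely shrinks: the flag set moves inside the helper, but the helper itself is still visible to the whole file.

```c
/* Userspace mock; bit values are illustrative, not the kernel's. */
typedef unsigned int gfp_t;

#define MOCK_GFP_NOWARN     (1u << 0)
#define MOCK_GFP_ZERO       (1u << 1)
#define MOCK_GFP_NOMEMALLOC (1u << 2)
#define MOCK_GFP_COMP       (1u << 3)
#define MOCK_GFP_ACCOUNT    (1u << 4)

/* Option 1, as in the patch: a file-scope constant. */
static const gfp_t gfp_trylock = MOCK_GFP_NOWARN | MOCK_GFP_ZERO |
				 MOCK_GFP_NOMEMALLOC | MOCK_GFP_COMP;

/*
 * Option 2 (hypothetical body): the flag set is local to the helper,
 * but the helper name is still file-scope, so any function in the
 * file can reach the same flags either way.
 */
static void add_gfp_trylock(gfp_t *gfp)
{
	*gfp |= MOCK_GFP_NOWARN | MOCK_GFP_ZERO |
		MOCK_GFP_NOMEMALLOC | MOCK_GFP_COMP;
}
```

Both forms produce the same flags for a given caller; the difference is only where the constant is spelled out.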

We could also squash it into what's currently called
`alloc_trylock_allowed`, but then it's a bit of a mush function. Should it
be called `do_trylock_stuff`?

Putting it directly into __alloc_frozen_pages_noprof() would make that
function too big IMO, and its real estate would be dominated by trylock
stuff.

It's definitely understandable to find the large variable scope yucky
but IMO the real fix for that would be to break up page_alloc.c, which
I don't really want to do in this series. 

>> +/*
>> + * This is the 'heart' of the zoned buddy allocator.
>> + */
>> +struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
>> +		int preferred_nid, nodemask_t *nodemask, unsigned int alloc_flags)
>> +{
>> +	struct page *page;
>> +	gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
>> +	struct alloc_context ac = { };
>> +
>> +	/* Other flags could be supported later if needed. */
>> +	if (WARN_ON(alloc_flags & ~ALLOC_TRYLOCK))
>>  		return NULL;
>>  
>> +	if (!alloc_order_allowed(gfp, order, alloc_flags))
>> +		return NULL;
>> +
>> +	if (alloc_flags & ALLOC_TRYLOCK) {
>> +		VM_WARN_ON_ONCE(gfp & ~__GFP_ACCOUNT);
>
> So the only GFP flag the user is allowed to specify is __GFP_ACCOUNT?
> That seems bogus; other flags would be reasonable including all the ones
> in gfp_trylock, as well as GFP_HIGHMEM, GFP_DMA, GFP_MOVABLE, GFP_HARDWALL.

Definitely makes sense for the ones in gfp_trylock.

For the others, I'm not sure. This "nolock" functionality is a bit
weird and sketchy; I suspect the reason for the WARN here is "let's make
sure we have a proper think before we allow it to grow use cases that are
meaningfully different from the other ones". I like that
conservatism here, so I would lean towards keeping it. Not a passionately
held opinion though.
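A minimal userspace sketch of the check under discussion and the middle ground agreed above (also accepting the flags in gfp_trylock), assuming a mock `gfp_t` and mock flag bits (the `MOCK_GFP_*` values, including `MOCK_GFP_HIGHMEM`, are illustrative, not the kernel's). The two predicate names are hypothetical; they model only the condition passed to VM_WARN_ON_ONCE(), not the warning itself.

```c
#include <stdbool.h>

/* Userspace mock; bit values are illustrative, not the kernel's. */
typedef unsigned int gfp_t;

#define MOCK_GFP_NOWARN     (1u << 0)
#define MOCK_GFP_ZERO       (1u << 1)
#define MOCK_GFP_NOMEMALLOC (1u << 2)
#define MOCK_GFP_COMP       (1u << 3)
#define MOCK_GFP_ACCOUNT    (1u << 4)
#define MOCK_GFP_HIGHMEM    (1u << 5)

static const gfp_t mock_gfp_trylock = MOCK_GFP_NOWARN | MOCK_GFP_ZERO |
				      MOCK_GFP_NOMEMALLOC | MOCK_GFP_COMP;

/* As in the patch: anything other than __GFP_ACCOUNT trips the WARN. */
static bool trylock_gfp_ok_current(gfp_t gfp)
{
	return !(gfp & ~MOCK_GFP_ACCOUNT);
}

/*
 * Widened allow-list: also accept the flags trylock would set anyway,
 * while still rejecting e.g. zone modifiers such as __GFP_HIGHMEM.
 */
static bool trylock_gfp_ok_widened(gfp_t gfp)
{
	return !(gfp & ~(MOCK_GFP_ACCOUNT | mock_gfp_trylock));
}
```

Keeping the mask explicit means any further flag has to be added to the allow-list deliberately, which matches the conservatism argued for above.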


Thread overview: 12+ messages
2026-06-17 15:29 [PATCH] mm/page_alloc: unify __alloc_frozen_pages[_nolock]_noprof() Brendan Jackman
2026-06-17 16:39 ` Vlastimil Babka (SUSE)
2026-06-17 16:49   ` Suren Baghdasaryan
2026-06-17 17:14     ` Brendan Jackman
2026-06-18  2:22       ` Hao Ge
2026-06-19 11:57         ` Brendan Jackman
2026-06-19 18:08           ` Suren Baghdasaryan
2026-06-18  6:56 ` Hao Ge
2026-06-19  8:03   ` Brendan Jackman
2026-06-19  3:56 ` Matthew Wilcox
2026-06-19  8:17   ` Brendan Jackman [this message]
2026-06-19  8:43     ` Brendan Jackman
