Linux cgroups development
* Re: [PATCH v2] cgroup/cpuset: rebind mm mempolicy to effective_mems, not mems_allowed
From: David Hildenbrand (Arm) @ 2026-06-15  8:08 UTC (permalink / raw)
  To: Farhad Alemi, Andrew Morton, Waiman Long
  Cc: Farhad Alemi, Gregory Price, Yury Norov, Joshua Hahn, Zi Yan,
	Matthew Brost, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Rasmus Villemoes, linux-mm, linux-kernel,
	cgroups, stable
In-Reply-To: <CA+0ovCgfHJHv5d1mzapWWvF-LhjppzDX8NPPLvCPZxPKg8RiYw@mail.gmail.com>

On 6/14/26 15:25, Farhad Alemi wrote:

Hi, thanks for your patch!

For the future, please don't submit new revisions as a reply to previous submissions.

> Creating a child cpuset where cpuset.mems is never set leads to a div/0
> when a VMA mempolicy with MPOL_F_RELATIVE_NODES rebinds in response to a
> CPU hotplug event.
> 
> Reproduction steps:
>  1) Create a cgroup w/ cpuset controls (do not set cpuset.mems)
>  2) Move the task into the child cpuset
>  3) Create a VMA mempolicy for that task with MPOL_F_RELATIVE_NODES
>  4) unplug and hotplug a cpu
>       echo 0 > /sys/devices/system/cpu/cpu1/online
>       echo 1 > /sys/devices/system/cpu/cpu1/online
>  5) mempolicy rebind does a div/0 in mpol_relative_nodemask on the
>     call to __nodes_fold()
> 
> The cpuset code passes (cs->mems_allowed) which is not guaranteed to have
> nodes to the rebind routine.  Use cs->effective_mems instead, which is
> guaranteed to have a non-empty nodemask.

Probably worth mentioning here that this makes the linked reproducer happy.
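
For readers following along, the div/0 described in the changelog can be
modelled in user space. This is a minimal sketch, not the kernel code: it
assumes the fold step maps each set bit to its position modulo the weight of
the cpuset nodemask, and it uses a 64-bit integer in place of nodemask_t. The
names mask_weight() and fold_mask() are hypothetical, and the sz == 0 guard is
added purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of set bits in a 64-bit node mask (stand-in for nodes_weight()). */
static unsigned int mask_weight(uint64_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Fold every set bit of 'orig' into the range [0, sz) by taking its
 * position modulo sz. Returns false instead of dividing when sz == 0.
 */
static bool fold_mask(uint64_t orig, unsigned int sz, uint64_t *dst)
{
	unsigned int bit;

	if (sz == 0)
		return false;	/* an unguarded "bit % sz" would trap here */

	*dst = 0;
	for (bit = 0; bit < 64; bit++)
		if (orig & (UINT64_C(1) << bit))
			*dst |= UINT64_C(1) << (bit % sz);
	return true;
}
```

With an empty nodemask the weight is 0, so fold_mask(orig, mask_weight(0), &out)
returns false at the point where an unguarded modulo would fault.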

> 
> Link: https://lore.kernel.org/linux-mm/CA+0ovCgxbZkXa+OU8w3s84R3KNPNxxRfmsNR-udh+afQBbGNmw@mail.gmail.com/

This should be a

Closes:
https://lore.kernel.org/linux-mm/CA+0ovCgxbZkXa+OU8w3s84R3KNPNxxRfmsNR-udh+afQBbGNmw@mail.gmail.com/

> Link: https://lore.kernel.org/all/CA+0ovCiEz6SP_sn3kN4Tb+_oC=eHMXy_Ffj=usV3wREdQrUtww@mail.gmail.com/
> Fixes: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}")
> Suggested-by: Gregory Price <gourry@gourry.net>
> Suggested-by: Waiman Long <longman@redhat.com>
> Signed-off-by: Farhad Alemi <farhad.alemi@berkeley.edu>
> Cc: stable@vger.kernel.org
> ---
> v2: rebind to cs->effective_mems instead of newmems (Waiman Long);
>     condense the changelog.
> 
>  kernel/cgroup/cpuset.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -2649,7 +2649,7 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)
> 
>  		migrate = is_memory_migrate(cs);
> 
> -		mpol_rebind_mm(mm, &cs->mems_allowed);
> +		mpol_rebind_mm(mm, &cs->effective_mems);

God this is confusing.

So, we obtain newmems from guarantee_online_mems(), which guarantees that
newmems is non-empty.

In cpuset_change_task_nodemask(), we set tsk->mems_allowed to newmems, and call
mpol_rebind_task(tsk, newmems).

So at least tsk->mems_allowed should be non-empty.

Then we call mpol_rebind_mm(mm, &cs->mems_allowed);


Naturally I wonder: Why are we not using "task->mems_allowed" (maybe cs vs. tsk
was the original bug?), which is effectively just newmems?

guarantee_online_mems() computes newmems as "cs->effective_mems &
node_states[N_MEMORY]", but walks up to the parent if it would be empty.
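
The walk-up behaviour described above can be sketched roughly as follows. This
is a user-space model under stated assumptions, not the kernel function:
struct cs_model and online_mems() are hypothetical names, a 64-bit integer
stands in for nodemask_t, and the final fallback for a root with no usable
nodes is an assumption of the model.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct cpuset. */
struct cs_model {
	uint64_t effective_mems;	/* one bit per NUMA node */
	const struct cs_model *parent;	/* NULL for the root */
};

/*
 * Return effective_mems restricted to nodes that have memory, walking up
 * to the parent whenever the intersection would be empty.
 */
static uint64_t online_mems(const struct cs_model *cs, uint64_t nodes_with_memory)
{
	while (cs) {
		uint64_t mems = cs->effective_mems & nodes_with_memory;

		if (mems)
			return mems;
		cs = cs->parent;
	}
	/* Assumption for this model: fall back to every node with memory. */
	return nodes_with_memory;
}
```

The point of the model is that the result is non-empty whenever at least one
node has memory, which is the property the rebind path needs.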

-- 
Cheers,

David

* Re: [PATCH v2 02/16] mm/slab: do not init any kfence objects on allocation
From: Vlastimil Babka (SUSE) @ 2026-06-15  8:52 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Harry Yoo, Hao Li, Christoph Lameter, David Rientjes,
	Roman Gushchin, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Andrey Konovalov, Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm,
	linux-kernel, cgroups
In-Reply-To: <CAJuCfpH8g9mNGV_ke-mhVZ=J9J05PZg-ozPTA=5WQrm_eViVpA@mail.gmail.com>

On 6/15/26 03:28, Suren Baghdasaryan wrote:
> On Thu, Jun 11, 2026 at 9:37 AM Vlastimil Babka (SUSE)
> <vbabka@kernel.org> wrote:
>>
>> On 6/11/26 17:11, Harry Yoo wrote:
>> >
>> >> From 3a1c4398ce9f361a4e6f4d9946eab6237eea89c2 Mon Sep 17 00:00:00 2001
>> >> From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
>> >> Date: Wed, 10 Jun 2026 17:40:04 +0200
>> >> Subject: [PATCH] mm/slab: do not init any kfence objects on allocation
>> >>
>> >> When init (zeroing) on allocation is requested, for kmalloc() we
>> >> generally have to zero the full object size even if a smaller size is
>> >> requested, in order to provide krealloc()'s __GFP_ZERO guarantees.
>> >>
>> >> When we end up allocating a kfence object, kfence perfoms the zeroing on
>> >
>> > nit: perfoms -> performs
>>
>> Fixed.
>>
>> >> its own because has its own redzone beyond the requested size. Thus
> 
> nit: s/because has/because it has

Fixed.

> Reviewed-by: Suren Baghdasaryan <surenb@google.com>

Thanks!

* Re: [PATCH v2 05/16] mm/slab: introduce alloc_flags and SLAB_ALLOC_TRYLOCK
From: Vlastimil Babka (SUSE) @ 2026-06-15  9:02 UTC (permalink / raw)
  To: Alexei Starovoitov, Suren Baghdasaryan
  Cc: Hao Li, Harry Yoo, Christoph Lameter, David Rientjes,
	Roman Gushchin, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, LKML,
	open list:CONTROL GROUP (CGROUP)
In-Reply-To: <CAADnVQJPETYAOd9R9Bg2JuuF1q7grg8VtEnvdvr0fDFhxb9O6A@mail.gmail.com>

On 6/15/26 04:16, Alexei Starovoitov wrote:
> On Sun, Jun 14, 2026 at 7:01 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>
>> On Thu, Jun 11, 2026 at 8:50 PM Hao Li <hao.li@linux.dev> wrote:
>> >
>> > On Wed, Jun 10, 2026 at 05:40:07PM +0200, Vlastimil Babka (SUSE) wrote:
>> > > Similarly to the page allocators, introduce slab-allocator specific
>> > > alloc flags that internally control allocation behavior in addition to
>> > > gfp_flags, without occupying the limited gfp flags space.
>> > >
>> > > Introduce the first flag SLAB_ALLOC_TRYLOCK that behaves similarly to
>> > > page allocator's ALLOC_TRYLOCK and will be used to reimplement
>> > > kmalloc_nolock()'s "!allow_spin" behavior. That currently relies on
>> > > gfpflags_allow_spinning() and thus the lack of both __GFP_RECLAIM flags,
>> > > importantly __GFP_KSWAPD_RECLAIM. This can give false-positive results
>> > > e.g. in early boot with a restricted gfp_allowed_mask.
>> > >
>> > > Also introduce alloc_flags_allow_spinning() to replace the usage of
>> > > gfpflags_allow_spinning().
>> > >
>> > > Start using alloc_flags and the new check first in alloc_from_pcs() and
>> > > __pcs_replace_empty_main(). This means some slab allocations that were
>> > > falsely treated as kmalloc_nolock() due to their gfp flags will now have
>> > > higher chances of succeed, and this will further increase with followup
>>
>> nit: I think it should be either "higher chances of success" or
>> "higher chances to succeed".

success it is

>>
>> > > changes.
>> > >
>> > > Remove a WARN_ON_ONCE() from refill_objects() as it's now legitimate to
>> > > reach it from a slab allocation that's not _nolock() and yet lacks
>> > > __GFP_KSWAPD_RECLAIM for other reasons.
>> > >
>> > > Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
>> > > ---
>> >
>> > Reviewed-by: Hao Li <hao.li@linux.dev>
>>
>> I would call SLAB_ALLOC_TRYLOCK something like SLAB_ALLOC_NOSPIN or
>> SLAB_ALLOC_NOLOCK but naming is hard and I don't claim myself to be
>> good at it. So, feel free to adopt my suggestion if you like it or
>> ignore it otherwise.
>>
>> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> 
> Just noticed "trylock" in the #define SLAB_ALLOC_TRYLOCK
> 
> Please call it SLAB_ALLOC_NOLOCK.
> 
> Initial api was using 'trylock' name and it was a mistake,
> since people assumed normal spin_trylock() like semantics.
> "trylock" implies that it fails under contention
> and retry is a normal next step. It's not the case.
> No one should be retrying. That's why the final api was kmalloc_nolock().
> So please keep this important distinction in the name.
> SLAB_ALLOC_NOLOCK should mean that spinning locks
> should not be taken. It should not mean "just go to trylock everywhere".

Eh, ok then, will change to SLAB_ALLOC_NOLOCK. Even though it's mostly internal.

So next thing we change page allocator's ALLOC_TRYLOCK to ALLOC_NOLOCK too?


* Re: [PATCH v2] cgroup/cpuset: rebind mm mempolicy to effective_mems, not mems_allowed
From: Gregory Price @ 2026-06-15  9:38 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Farhad Alemi, Andrew Morton, Waiman Long, Farhad Alemi,
	Yury Norov, Joshua Hahn, Zi Yan, Matthew Brost, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Rasmus Villemoes,
	linux-mm, linux-kernel, cgroups, stable
In-Reply-To: <8d3b4561-92cd-4ebc-8462-5fb0fd659e8a@kernel.org>

On Mon, Jun 15, 2026 at 10:08:51AM +0200, David Hildenbrand (Arm) wrote:
> On 6/14/26 15:25, Farhad Alemi wrote:
> > 
> > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> > --- a/kernel/cgroup/cpuset.c
> > +++ b/kernel/cgroup/cpuset.c
> > @@ -2649,7 +2649,7 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)
> > 
> >  		migrate = is_memory_migrate(cs);
> > 
> > -		mpol_rebind_mm(mm, &cs->mems_allowed);
> > +		mpol_rebind_mm(mm, &cs->effective_mems);
> 
> God this is confusing.
>

All interactions between mempolicy and cpuset are horrible and
confusing.  Much like Lorenzo's anon_vma work, I have to keep
notes on how this whole thing doesn't just spew SIGBUS constantly.

The short answer is: mempolicy is advisory and cpuset is strictly
followed - in a dispute cpuset wins... except for file backed memory,
then everyone loses and nothing is consistent.

> Naturally I wonder: Why are we not using "task->mems_allowed" (maybe cs vs. tsk
> was the original bug?), which is effectively just newmems?
>

Short answer: task->mems_allowed is protected by the task lock and we
don't hold the task lock for a foreign task (not-current) over mm
operations.

Long answer: Reasons and "Stop looking at the spaghetti, it's going to
break"

~Gregory

* Re: [PATCH v2 07/16] mm/slab: replace struct partial_context with slab_alloc_context
From: Vlastimil Babka (SUSE) @ 2026-06-15 10:01 UTC (permalink / raw)
  To: Suren Baghdasaryan, Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko,
	Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov,
	kasan-dev, linux-mm, linux-kernel, cgroups
In-Reply-To: <CAJuCfpE3XfxLmV-DzM5nLqYqGsFJThr-1i4bmEEqMpGZ28RLFQ@mail.gmail.com>

On 6/15/26 04:36, Suren Baghdasaryan wrote:
> On Wed, Jun 10, 2026 at 11:05 PM Harry Yoo <harry@kernel.org> wrote:
>>
>>
>>
>> On 6/11/26 12:40 AM, Vlastimil Babka (SUSE) wrote:
>> > Refactor get_from_partial_node(), get_from_any_partial(),
>> > get_from_partial() and ___slab_alloc().
>> >
>> > Remove struct partial_context, which used to be more substantial but
>> > shrank as part of the sheaves conversion. Instead pass gfp_flags and
>> > pointer to the new slab_alloc_context, which together is a superset of
>> > partial_context.
>> >
>> > This means alloc_flags are now available and we can use them to
>> > determine if spinning is allowed, further reducing false positive "not
>> > allowed" in the slow path due to gfp flags lacking __GFP_RECLAIM.
>> >
>> > Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
>> > ---
>>
>> Looks good to me,
>> Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
> 
> Ah, nice! The conversion I was anticipating in the previous patch...
> I would do this removal of partial_context as patch 6 and then convert
> ___slab_alloc() and get_from_any_partial*() altogether in patch 7. I
> think that would keep the behavior of the ___slab_alloc() more robust
> throughout the patchset. But I would say it's nice to have, not a
> must-have.

OK, so I switched the order of 6 7 and all the changes from
gfpflags_allow_spinning() to alloc_flags_allow_spinning() are now in the
newly-later patch; the "replace struct partial_context with
slab_alloc_context" part has no functional changes. Verified that the end
result is exactly the same, and only updated changelogs a bit.

> Reviewed-by: Suren Baghdasaryan <surenb@google.com>

Thanks!

>>
>> --
>> Cheers,
>> Harry / Hyeonggon


* Re: [PATCH v2 08/16] mm/slab: pass alloc_flags to new slab allocation
From: Vlastimil Babka (SUSE) @ 2026-06-15 10:14 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel,
	cgroups
In-Reply-To: <49ca905a-0303-4fad-8257-485b0ed47c8d@kernel.org>

On 6/11/26 09:52, Harry Yoo wrote:
> 
> 
> On 6/11/26 12:40 AM, Vlastimil Babka (SUSE) wrote:
>> Add the alloc_flags parameter to allocate_slab() and new_slab()
>> so it can be used to determine if spinning is allowed, independently
>> from gfp flags.
>> 
>> refill_objects() passes SLAB_ALLOC_DEFAULT because it can only be
>> reached from contexts that allow spinning.
>> 
>> Also change how trynode_flags are constructed in ___slab_alloc() to
>> achieve the same "do not upgrade to GFP_NOWAIT" by using masking instead
>> of a branch. It will now also not upgrade in cases where gfp is weaker
>> than GFP_NOWAIT (i.e. lacks __GFP_KSWAPD_RECLAIM) but doesn't come from
>> kmalloc_nolock() - which is more correct anyway.
> 
> Wait, debugobjects intentionally avoids __GFP_KSWAPD_RECLAIM,
> but we have been upgrading it to GFP_NOWAIT?

Actually, we have not been upgrading it until patch 6/16, which made the
upgrade trigger by starting to rely on alloc_flags? Because previously it
would be !allow_spin due to lack of __GFP_KSWAPD_RECLAIM.

So I will move that flags adjustment to 6/16 (now 7/16).
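
To make the upgrade question concrete, here is a small user-space comparison
of the branch-based and mask-based constructions of trynode_flags. It is a
sketch under assumptions: the X_* bit values are hypothetical (the kernel's
gfp bits differ), X_NOWAIT is assumed to be the kswapd-reclaim bit plus
NOWARN, and trynode_flags is assumed to start out equal to gfpflags.

```c
#include <stdint.h>

/* Hypothetical bit values; the kernel's real gfp bits differ. */
#define X_KSWAPD_RECLAIM	0x01u
#define X_DIRECT_RECLAIM	0x02u
#define X_NOMEMALLOC		0x04u
#define X_ACCOUNT		0x08u
#define X_NOWARN		0x10u
#define X_THISNODE		0x20u
#define X_NOWAIT		(X_KSWAPD_RECLAIM | X_NOWARN)

/* Branch-based scheme: only !allow_spin callers keep their own flags. */
static uint32_t trynode_branch(uint32_t gfp, int allow_spin)
{
	if (!allow_spin)
		return gfp | X_THISNODE;
	return X_NOWAIT | X_THISNODE;
}

/* Mask-based scheme: never adds reclaim bits the caller did not pass. */
static uint32_t trynode_mask(uint32_t gfp)
{
	uint32_t flags = gfp;	/* assumes trynode_flags starts as gfpflags */

	flags &= X_NOWAIT | X_NOMEMALLOC | X_ACCOUNT;
	flags |= X_NOWARN | X_THISNODE;
	return flags;
}
```

For a caller that passes no reclaim bits but is allowed to spin,
trynode_branch() adds X_KSWAPD_RECLAIM while trynode_mask() does not, which is
the debugobjects-style case raised in the review.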

>> During the masking keep also existing __GFP_NOMEMALLOC (pointed out by
>> Sashiko) and __GFP_ACCOUNT. Previously the hardcoded GFP_NOWAIT would
>> eliminate them, but it's not a big problem that would need a separate
>> fix.
> 
> Ack.
> 
>> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
>> ---
>>  mm/slub.c | 28 ++++++++++++++--------------
>>  1 file changed, 14 insertions(+), 14 deletions(-)
>> 
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 98b79e5e7679..8f6ca3d5fdfa 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -4467,25 +4470,22 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  	 * 1) try to get a partial slab from target node only by having
>>  	 *    __GFP_THISNODE in pc.flags for get_from_partial()
>>  	 * 2) if 1) failed, try to allocate a new slab from target node with
>> -	 *    GPF_NOWAIT | __GFP_THISNODE opportunistically
>> +	 *    (at most) GFP_NOWAIT | __GFP_THISNODE opportunistically
>>  	 * 3) if 2) failed, retry with original gfpflags which will allow
>>  	 *    get_from_partial() try partial lists of other nodes before
>>  	 *    potentially allocating new page from other nodes
>>  	 */
>>  	if (unlikely(node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE)
>>  		     && try_thisnode)) {
>> -		if (unlikely(!allow_spin))
>> -			/* Do not upgrade gfp to NOWAIT from more restrictive mode */
>> -			trynode_flags = gfpflags | __GFP_THISNODE;
>> -		else
>> -			trynode_flags = GFP_NOWAIT | __GFP_THISNODE;
>> +		trynode_flags &= GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_ACCOUNT;
>> +		trynode_flags |= __GFP_NOWARN | __GFP_THISNODE;
>>  	}
> 


* Re: [PATCH v2 12/16] mm/slab: pass slab_alloc_context to __do_kmalloc_node()
From: Vlastimil Babka (SUSE) @ 2026-06-15 11:08 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Harry Yoo, Hao Li, Christoph Lameter, David Rientjes,
	Roman Gushchin, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel,
	cgroups
In-Reply-To: <CAJuCfpHOXkZFq8UKiSqXzG-RBFNw5gO-JR0bBCn9uRc3Oc5ZbA@mail.gmail.com>

On 6/15/26 06:58, Suren Baghdasaryan wrote:
> On Wed, Jun 10, 2026 at 8:41 AM Vlastimil Babka (SUSE)
> <vbabka@kernel.org> wrote:
>>
>> With alloc_flags usage in slab, we can replace __GFP_NO_OBJ_EXT with an
>> alloc flag that prevents kmalloc recursion. For that we need a version
>> of kmalloc() that takes alloc_flags and use it in places that perform
>> these potentially recursive kmalloc allocations (of sheaves or obj_ext
>> arrays).
>>
>> As a preparatory step, make __do_kmalloc_node() take a pointer to
>> slab_alloc_context. This replaces the 'caller' parameter and includes
>> alloc_flags which we'll make use of.
> 
> I think you could also eliminate __do_kmalloc_node() function's "size"
> parameter as it's always the same as ac->orig_size.

OK, done.


* Re: [PATCH v2] cgroup/cpuset: rebind mm mempolicy to effective_mems, not mems_allowed
From: David Hildenbrand (Arm) @ 2026-06-15 11:08 UTC (permalink / raw)
  To: Gregory Price
  Cc: Farhad Alemi, Andrew Morton, Waiman Long, Farhad Alemi,
	Yury Norov, Joshua Hahn, Zi Yan, Matthew Brost, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Rasmus Villemoes,
	linux-mm, linux-kernel, cgroups, stable
In-Reply-To: <ai_IHvyptWPcTD0y@gourry-fedora-PF4VCD3F>

On 6/15/26 11:38, Gregory Price wrote:
> On Mon, Jun 15, 2026 at 10:08:51AM +0200, David Hildenbrand (Arm) wrote:
>> On 6/14/26 15:25, Farhad Alemi wrote:
>>>
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -2649,7 +2649,7 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)
>>>
>>>  		migrate = is_memory_migrate(cs);
>>>
>>> -		mpol_rebind_mm(mm, &cs->mems_allowed);
>>> +		mpol_rebind_mm(mm, &cs->effective_mems);
>>
>> God this is confusing.
>>
> 
> All interactions between mempolicy and cpuset are horrible and
> confusing.  Much like Lorenzo's anon_vma work, I have to keep
> notes on how this whole thing doesn't just spew SIGBUS constantly.
> 
> The short answer is: mempolicy is advisory and cpuset is strictly
> followed - in a dispute cpuset wins... except for file backed memory,
> then everyone loses and nothing is consistent.
> 
>> Naturally I wonder: Why are we not using "task->mems_allowed" (maybe cs vs. tsk
>> was the original bug?), which is effectively just newmems?
>>
> 
> Short answer: task->mems_allowed is protected by the task lock and we
> don't hold the task lock for a foreign task (not-current) over mm
> operations.

Well, we can just use newmems, which cannot change? Again, that is based on
cs->effective_mems but is guaranteed to return something non-empty.

AI was not able to convince me (neither was I able to convince AI) that there is
not some obscure cgroup v1 scenario where the current fix would also be wrong.

With newmems it's clear that it is guaranteed to not be empty.

-- 
Cheers,

David

* Re: [PATCH v2 15/16] mm/slab: remove __GFP_NO_OBJ_EXT usage from alloc_slab_obj_exts()
From: Vlastimil Babka (SUSE) @ 2026-06-15 11:11 UTC (permalink / raw)
  To: Suren Baghdasaryan, Hao Li
  Cc: Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin,
	Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko,
	Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov,
	kasan-dev, linux-mm, linux-kernel, cgroups, Hao Ge
In-Reply-To: <CAJuCfpFNftMYw0XoHyN1QAWfm7NYmeuY1T_NGbYy8boGO48MOg@mail.gmail.com>

On 6/15/26 07:38, Suren Baghdasaryan wrote:
> On Fri, Jun 12, 2026 at 4:30 AM Hao Li <hao.li@linux.dev> wrote:
>>
>> On Fri, Jun 12, 2026 at 12:17:45PM +0200, Vlastimil Babka (SUSE) wrote:
>> > On 6/12/26 08:54, Hao Li wrote:
>> > > On Wed, Jun 10, 2026 at 05:40:17PM +0200, Vlastimil Babka (SUSE) wrote:
>> > >> __GFP_NO_OBJ_EXT has limited scope within the slab allocator itself and
>> > >> gfp flags are a scarce resource, unlike slab's alloc_flags.
>> > >>
>> > >> Introduce SLAB_ALLOC_NO_RECURSE alloc flag that has the same intent as
>> > >> __GFP_NO_OBJ_EXT but a more generic name, meaning that a kmalloc()
>> > >> family function should not recurse into another kmalloc*() for the
>> > >> purposes of allocating auxiliary structures (obj_ext arrays or sheaves).
>> > >>
>> > >> First, replace the __GFP_NO_OBJ_EXT for allocating obj_ext arrays in
>> > >> alloc_slab_obj_exts(). Make use of the newly added kmalloc_flags()
>> > >> function, where we can pass alloc_flags with SLAB_ALLOC_NO_RECURSE
>> > >> added. This will also pass through SLAB_ALLOC_TRYLOCK so we don't need
>> > >> to special case kmalloc_nolock() anymore.
>> > >>
>> > >> Note that until now the kmalloc_nolock() ignored the incoming gfp flags
>> > >> and hardcoded __GFP_ZERO | __GFP_NO_OBJ_EXT. But it's correct to pass on
>> > >> the incoming gfp flags (only augmented with __GFP_ZERO), because if
>> > >> alloc_flags contain SLAB_ALLOC_TRYLOCK, the incoming gfp flags have to
>> > >> be also compatible with it.
>> > >>
>> > >> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
>> > >> ---
>> > >>  mm/slab.h |  1 +
>> > >>  mm/slub.c | 13 +++++--------
>> > >>  2 files changed, 6 insertions(+), 8 deletions(-)
>> > >>
>> > >> diff --git a/mm/slab.h b/mm/slab.h
>> > >> index 45bfcfb35a9c..509f330654b8 100644
>> > >> --- a/mm/slab.h
>> > >> +++ b/mm/slab.h
>> > >> @@ -21,6 +21,7 @@
>> > >>  #define SLAB_ALLOC_DEFAULT        0x00 /* no flags */
>> > >>  #define SLAB_ALLOC_TRYLOCK        0x01 /* a kmalloc_nolock() allocation */
>> > >>  #define SLAB_ALLOC_NEW_SLAB       0x02 /* a flag for alloc_slab_obj_exts() */
>> > >> +#define SLAB_ALLOC_NO_RECURSE     0x04 /* prevent kmalloc() recursion */
>> > >>
>> > >>  static inline bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
>> > >>  {
>> > >> diff --git a/mm/slub.c b/mm/slub.c
>> > >> index cbb38bd01e46..7dfbd0251aa2 100644
>> > >> --- a/mm/slub.c
>> > >> +++ b/mm/slub.c
>> > >> @@ -2167,15 +2167,12 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
>> > >>
>> > >>    gfp &= ~OBJCGS_CLEAR_MASK;
>> > >>    /* Prevent recursive extension vector allocation */
>> > >> -  gfp |= __GFP_NO_OBJ_EXT;
>> > >> +  alloc_flags |= SLAB_ALLOC_NO_RECURSE;
>> > >>
>> > >>    sz = obj_exts_alloc_size(s, slab, gfp);
>> > >>
>> > >
>> > > For the original calls to kmalloc_nolock and kmalloc_node, I notice a difference:
>> > >
>> > >> -  if (unlikely(!allow_spin))
>> > >> -          vec = kmalloc_nolock(sz, __GFP_ZERO | __GFP_NO_OBJ_EXT,
>> > >> -                               slab_nid(slab));
>> > >
>> > > kmalloc_nolock completely discarded `gfp` flags.
>> > >
>> > >> -  else
>> > >> -          vec = kmalloc_node(sz, gfp | __GFP_ZERO, slab_nid(slab));
>> > >
>> > > while kmalloc_node preserved and passed it along.
>> > >
>> > >> +  /* This will use kmalloc_nolock() if alloc_flags say so */
>> > >> +  vec = kmalloc_flags(sz, gfp | __GFP_ZERO, alloc_flags, slab_nid(slab));
>> > >
>> > > Now both paths are merged into kmalloc_flags, the gfp flags are
>> > > unconditionally carried through. It seems this might carry some unwanted flags.
>> > >
>> > > I traced the call path and found that ___slab_alloc sets the __GFP_THISNODE
>> > > for trynode_flags. If this flag propagates all the way into
>> > > kmalloc_flags->...->__kmalloc_nolock_noprof, it will trigger the
>> > > VM_WARN_ON_ONCE warning. Maybe we need to strip the original gfp if
>> > > `!allow_spin`.
>> >
>> > Thanks. This should do the job in a more generic way I hope?
>> >
>>
>> Yeah, this is more elegant.
>>
>> > diff --git a/mm/slub.c b/mm/slub.c
>> > index f9b8dc56bb57..0bf53f70c9be 100644
>> > --- a/mm/slub.c
>> > +++ b/mm/slub.c
>> > @@ -2047,12 +2047,15 @@ static inline void dec_slabs_node(struct kmem_cache *s, int node,
>> >  #endif /* CONFIG_SLUB_DEBUG */
>> >
>> >  /*
>> > - * The allocated objcg pointers array is not accounted directly.
>> > + * The allocated objcg pointers array or sheaf is not accounted directly.
>> >   * Moreover, it should not come from DMA buffer and is not readily
>> > - * reclaimable. So those GFP bits should be masked off.
>> > + * reclaimable. Node restriction for the parent allocation also should
>> > + * not apply to the slab's internal objects.
>> > + * So those GFP bits should be masked off.
>> >   */
>> >  #define OBJCGS_CLEAR_MASK      (__GFP_DMA | __GFP_RECLAIMABLE | \
>> > -                               __GFP_ACCOUNT | __GFP_NOFAIL)
>> > +                               __GFP_ACCOUNT | __GFP_NOFAIL | \
>> > +                               __GFP_THISNODE)
>>
>> Good idea! Both code and comments make sense to me.
> 
> Makes sense. I see
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-next
> already implementing this and also keeping __GFP_NO_OBJ_EXT and
> SLAB_ALLOC_NO_RECURSE both used. That version looks good to me, so
> I'll wait for v3.

OK.

> At the end of this series, we end up with no users of __GFP_NO_OBJ_EXT
> but we still keep it defined. I'm guessing you leave it because of the
> new patch [1] which aliases __GFP_NO_OBJ_EXT? I will have to make that

Yeah.

> mechanism work without a GFP flag, possibly using a similar approach.
> CC'ing Hao Ge to be in the loop of these changes. I'll work with him
> on eliminating that __GFP_NO_OBJ_EXT alias.

Good, then we can remove the flag completely.

> [1] https://lore.kernel.org/all/20260604024008.46592-1-hao.ge@linux.dev/
> 
>>
>> >
>> >  #ifdef CONFIG_SLAB_OBJ_EXT
>> >
>> >
>>
>> --
>> Thanks,
>> Hao


* Re: [PATCH v2] cgroup/cpuset: rebind mm mempolicy to effective_mems, not mems_allowed
From: Gregory Price @ 2026-06-15 11:19 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Farhad Alemi, Andrew Morton, Waiman Long, Farhad Alemi,
	Yury Norov, Joshua Hahn, Zi Yan, Matthew Brost, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Rasmus Villemoes,
	linux-mm, linux-kernel, cgroups, stable
In-Reply-To: <ec4b4b70-dc01-41fc-ad58-e1c877f6a7eb@kernel.org>

On Mon, Jun 15, 2026 at 01:08:16PM +0200, David Hildenbrand (Arm) wrote:
> With newmems it's clear that it is guaranteed to not be empty.

I hadn't noticed he switched the patch from newmems -> effective_mems.

This needs to be changed back to newmems, otherwise we're depending on
a derivative value set somewhere else in the code being correct instead
of using what we *know* is correct *at the moment we need to use it*.

So yes, go back to using newmems.

~Gregory

* Re: [PATCH v2 09/16] mm/slab: pass alloc_flags through slab_post_alloc_hook() chain
From: Vlastimil Babka (SUSE) @ 2026-06-15 11:33 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Harry Yoo, Hao Li, Christoph Lameter, David Rientjes,
	Roman Gushchin, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel,
	cgroups
In-Reply-To: <CAJuCfpF0mcV3TUCNi981YO=uT=5p_7OOY1S6zdgwm5PMMV3w8g@mail.gmail.com>

On 6/15/26 06:35, Suren Baghdasaryan wrote:
> On Wed, Jun 10, 2026 at 8:41 AM Vlastimil Babka (SUSE)
> <vbabka@kernel.org> wrote:
>> @@ -4568,9 +4577,8 @@ struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags)
>>  }
>>
>>  static __fastpath_inline
>> -bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
>> -                         gfp_t flags, size_t size, void **p,
>> -                         unsigned int orig_size)
>> +bool slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags, size_t size,
>> +                         void **p, struct slab_alloc_context *ac)
> 
> Would it be possible to make this last parameter a "const struct
> slab_alloc_context *" (here and in other functions accepting it)? I
> think these functions accept it as an input parameter only and are not
> supposed to change it, right? Makes it easy to verify that
> slab_alloc_context is not changed between consecutive calls reusing
> it, for example inside slab_alloc_node().

Uh, ok, did that. Also changed orig_size to size_t.

>>  {
>>         bool init = slab_want_init_on_alloc(flags, s);
>>         unsigned int zero_size = s->object_size;
>> @@ -4590,7 +4598,7 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
>>          * orig_size if we track it.
>>          */
>>         if (slub_debug_orig_size(s))
>> -               zero_size = orig_size;
>> +               zero_size = ac->orig_size;
>>
>>         /*
>>          * When slab_debug is enabled, avoid memory initialization integrated
>> @@ -4616,14 +4624,14 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
>>                                      !kasan_has_integrated_init())
>>                                  && !is_kfence_address(p[i]))
>>                         memset(p[i], 0, zero_size);
>> -               if (gfpflags_allow_spinning(flags))
>> +               if (alloc_flags_allow_spinning(ac->alloc_flags))
>>                         kmemleak_alloc_recursive(p[i], s->object_size, 1,
>>                                                  s->flags, init_flags);
>>                 kmsan_slab_alloc(s, p[i], init_flags);
>> -               alloc_tagging_slab_alloc_hook(s, p[i], flags);
>> +               alloc_tagging_slab_alloc_hook(s, p[i], flags, ac->alloc_flags);
>>         }
>>
>> -       return memcg_slab_post_alloc_hook(s, lru, flags, size, p);
>> +       return memcg_slab_post_alloc_hook(s, flags, size, p, ac);
>>  }
>>
>>  /*
>> @@ -4918,6 +4926,12 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
>>  {
>>         const unsigned int alloc_flags = SLAB_ALLOC_DEFAULT;
>>         void *object;
>> +       struct slab_alloc_context ac = {
>> +               .caller_addr = addr,
>> +               .orig_size = orig_size,
>> +               .alloc_flags = alloc_flags,
>> +               .lru = lru,
>> +       };
>>
>>         s = slab_pre_alloc_hook(s, gfpflags);
>>         if (unlikely(!s))
>> @@ -4929,14 +4943,8 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
>>
>>         object = alloc_from_pcs(s, gfpflags, alloc_flags, node);
>>
>> -       if (unlikely(!object)) {
>> -               struct slab_alloc_context ac = {
>> -                       .caller_addr = addr,
>> -                       .orig_size = orig_size,
>> -                       .alloc_flags = alloc_flags,
>> -               };
>> +       if (!object)
> 
> Any reason "unlikely" is removed?

No, fixed, thanks!

>>                 object = __slab_alloc_node(s, gfpflags, node, &ac);
>> -       }
>>
>>         maybe_wipe_obj_freeptr(s, object);
>>

* Re: [PATCH v2] cgroup/cpuset: rebind mm mempolicy to effective_mems, not mems_allowed
From: David Hildenbrand (Arm) @ 2026-06-15 11:39 UTC (permalink / raw)
  To: Gregory Price
  Cc: Farhad Alemi, Andrew Morton, Waiman Long, Farhad Alemi,
	Yury Norov, Joshua Hahn, Zi Yan, Matthew Brost, Rakie Kim,
	Byungchul Park, Ying Huang, Alistair Popple, Rasmus Villemoes,
	linux-mm, linux-kernel, cgroups, stable
In-Reply-To: <ai_fsVg51_GTtzT1@gourry-fedora-PF4VCD3F>

On 6/15/26 13:19, Gregory Price wrote:
> On Mon, Jun 15, 2026 at 01:08:16PM +0200, David Hildenbrand (Arm) wrote:
>> With newmems it's clear that it is guaranteed to not be empty.
> 
> I hadn't noticed he switched the patch from newmems -> effective_mems.
> 
> This needs to be changed back to newmems, otherwise we're depending on
> a derivative value set somewhere else in the code being correct instead
> of using what we *know* is correct *at the moment we need to use it*.
> 
> So yes, go back to using newmems.

Right, looking at this now, that's what v1 did. Waiman requested the change, but
I don't think we want that.

So for v1:

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David

* [PATCH v3 00/15] mm/slab: introduce alloc_flags and slab_alloc_context
From: Vlastimil Babka (SUSE) @ 2026-06-15 11:54 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel,
	cgroups, Vlastimil Babka (SUSE)

This series is now in slab/for-next, based on the slab-for-7.2 tag that
was sent as first PR to Linus. Posting new version due to many
accumulated changes, for final rounds of review. The plan is to send a
second slab PR with this early next week, if nothing explodes.

Git: https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/slab_alloc_flags

The slab implementation currently relies on gfp flags to convey
some context information internally:

- The absence of both __GFP_RECLAIM flags is interpreted as "cannot spin
  on locks", and intended to be used by kmalloc_nolock(). But false
  positives are possible e.g. during early boot where gfp_allowed_mask
  clears __GFP_RECLAIM from all allocations. This leads to unnecessary
  allocation failures and workarounds such as fd3634312a04 ("debugobject:
  Make it work with deferred page initialization - again").

- __GFP_NO_OBJ_EXT exists and takes up a valuable bit in the gfp flags
  space, only to prevent recursive kmalloc() allocations for obj_ext
  arrays and sheaves.

The page allocator uses its internal alloc_flags to convey various
context information, including ALLOC_TRYLOCK (meaning "cannot spin").
This series copies that concept for the slab allocator, with its own
slab-specific internal flags:

- SLAB_ALLOC_DEFAULT - no extra flags (the value is 0), but explicit
- SLAB_ALLOC_NOLOCK - do not spin on locks (used by kmalloc_nolock())
- SLAB_ALLOC_NEW_SLAB - replacing existing 'bool new_slab' parameter
			for allocating obj_ext arrays
- SLAB_ALLOC_NO_RECURSE - replacing usage of __GFP_NO_OBJ_EXT
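
A minimal user-space sketch of the false-positive problem and of how a
separate alloc_flags word avoids it. The TOY_* gfp bit values and
toy_gfpflags_allow_spinning() are made up for illustration and only
model the "no __GFP_RECLAIM bits means cannot spin" rule described
above; the two SLAB_ALLOC_* values and alloc_flags_allow_spinning()
follow patch 04/15. This is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy gfp bits for illustration only; the kernel's real values differ. */
#define TOY_GFP_DIRECT_RECLAIM	0x1u
#define TOY_GFP_KSWAPD_RECLAIM	0x2u
#define TOY_GFP_RECLAIM	(TOY_GFP_DIRECT_RECLAIM | TOY_GFP_KSWAPD_RECLAIM)

/* Values as introduced in patch 04/15 of this series. */
#define SLAB_ALLOC_DEFAULT	0x00 /* no flags */
#define SLAB_ALLOC_NOLOCK	0x01 /* a kmalloc_nolock() allocation */

/* Model of the gfp-based check: spinning allowed iff a reclaim bit is set. */
static inline bool toy_gfpflags_allow_spinning(unsigned int gfp)
{
	return (gfp & TOY_GFP_RECLAIM) != 0;
}

/* The alloc_flags-based check: only an explicit NOLOCK forbids spinning. */
static inline bool alloc_flags_allow_spinning(unsigned int alloc_flags)
{
	return !(alloc_flags & SLAB_ALLOC_NOLOCK);
}

/* Early boot modelled as an allowed-mask that clears the reclaim bits. */
static inline void toy_demo(void)
{
	unsigned int early_boot_allowed = ~TOY_GFP_RECLAIM;
	unsigned int gfp = TOY_GFP_RECLAIM & early_boot_allowed;

	/* False positive: a normal allocation looks like "cannot spin". */
	assert(!toy_gfpflags_allow_spinning(gfp));
	/* The explicit flag is unaffected by the gfp mask. */
	assert(alloc_flags_allow_spinning(SLAB_ALLOC_DEFAULT));
}
```

The point of the sketch is only that the "cannot spin" decision no
longer depends on bits that other code is allowed to mask out.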

To reduce the amount of parameters in various internal functions, we
additionally introduce slab_alloc_context (also inspired by page
allocator's alloc_context) for passing a number of existing arguments
and the new alloc_flags:

/* Structure holding extra parameters for slab allocations */
struct slab_alloc_context {
	unsigned long caller_addr;
	size_t orig_size;
	unsigned int alloc_flags;
	struct list_lru *lru;
};

This also replaces the existing struct partial_context.

The last necessary piece is kmalloc_flags() which can take the
alloc_flags in addition to gfp flags and is intended for the recursive
allocations of sheaves and obj_ext arrays, so that both
SLAB_ALLOC_NOLOCK and SLAB_ALLOC_NO_RECURSE can be communicated.
Internally it decides between kmalloc_nolock() and normal kmalloc()
depending on SLAB_ALLOC_NOLOCK.
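
A rough user-space sketch of the dispatch described above, under loud
assumptions: toy_kmalloc_flags(), toy_kmalloc(), toy_kmalloc_nolock()
and the toy_last_path bookkeeping are all hypothetical stand-ins built
on malloc(), and the parameter list is invented for illustration. The
real kmalloc_flags() signature is defined by the patch that introduces
it, not by this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SLAB_ALLOC_DEFAULT	0x00
#define SLAB_ALLOC_NOLOCK	0x01

enum toy_path { TOY_PATH_NONE, TOY_PATH_KMALLOC, TOY_PATH_KMALLOC_NOLOCK };

/* Records which path the last toy_kmalloc_flags() call took. */
static enum toy_path toy_last_path = TOY_PATH_NONE;

/* Hypothetical stand-in for a normal kmalloc(). */
static inline void *toy_kmalloc(size_t size, unsigned int gfp)
{
	(void)gfp;
	toy_last_path = TOY_PATH_KMALLOC;
	return malloc(size);
}

/* Hypothetical stand-in for kmalloc_nolock(). */
static inline void *toy_kmalloc_nolock(size_t size, unsigned int gfp)
{
	(void)gfp;
	toy_last_path = TOY_PATH_KMALLOC_NOLOCK;
	return malloc(size);
}

/* Pick the allocation path from alloc_flags rather than from gfp bits. */
static inline void *toy_kmalloc_flags(size_t size, unsigned int gfp,
				      unsigned int alloc_flags)
{
	if (alloc_flags & SLAB_ALLOC_NOLOCK)
		return toy_kmalloc_nolock(size, gfp);
	return toy_kmalloc(size, gfp);
}

static inline void toy_demo(void)
{
	void *p = toy_kmalloc_flags(16, 0, SLAB_ALLOC_NOLOCK);

	assert(toy_last_path == TOY_PATH_KMALLOC_NOLOCK);
	free(p);
}
```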

The rest of the series is gradually expanding the usage of both
alloc_flags and slab_alloc_context as necessary, with bits of
refactoring. Then, __GFP_NO_OBJ_EXT is removed completely.

Note that some usage of gfpflags_allow_spinning() relying on absence of
__GFP_RECLAIM remains outside of slab (and page allocator) in memcg,
page_owner and stackdepot code. These can thus yield false-positive
decisions that spinning is not allowed, but should not result in
important allocations failing anymore.

Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
Changes in v3:
- Applied R-b tags from Harry, Hao, Suren (thanks!)
- Former Patch 1 "mm/slab: do not limit zeroing to orig_size when only
  red zoning is enabled" fast tracked as a fix to slab-for-7.2 PR.
- Patch 1: refactor kasan_init handling (Harry).
- Constify struct slab_alloc_context usage everywhere (Suren)
- Rename SLAB_ALLOC_TRYLOCK to SLAB_ALLOC_NOLOCK (Suren, Alexei)
- Reorder patches 5 and 6 (formerly 6 7) (Suren)
- Move trynode_flags refactoring from 7 to 6 to avoid bisection
  hazard.
- In Patch 14, support temporarily both __GFP_NO_OBJ_EXT and
  SLAB_ALLOC_NO_RECURSE to prevent obj_ext -> sheaves -> obj_ext
  recursion (Sashiko)
- Expand OBJCGS_CLEAR_MASK to allow kmalloc_nolock() warnings
  (Hao Li, Shengming Hu).
- Link to v2: https://patch.msgid.link/20260610-slab_alloc_flags-v2-0-7190909db118@kernel.org

Changes in v2:
- Due to Sashiko review, drop the idea of zeroing orig_size
  unconditionally, as it can break krealloc(). Thanks to that found a
  pre-existing bug fixed by the new Patch 1. The kfence zeroing related
  cleanup is implemented differently in Patch 2.
- Prevent nested kmalloc_nolock warnings due to added gfp flags
  (Sashiko)
- Fix a pre-existing issue with opportunistic slab allocation from the
  target node only effectively dropping __GFP_NOMEMALLOC and __GFP_RECLAIM.
  (Sashiko)
- Move kmalloc_flags() definitions to mm/slab.h (per Harry).
- Link to v1: https://patch.msgid.link/20260609-slab_alloc_flags-v1-0-2bf4a4b9b526@kernel.org

---
Vlastimil Babka (SUSE) (15):
      mm/slab: do not init any kfence objects on allocation
      mm/slab: stop inlining __slab_alloc_node()
      mm/slab: introduce slab_alloc_context
      mm/slab: introduce alloc_flags and SLAB_ALLOC_NOLOCK
      mm/slab: replace struct partial_context with slab_alloc_context
      mm/slab: add alloc_flags to slab_alloc_context
      mm/slab: pass alloc_flags to new slab allocation
      mm/slab: pass alloc_flags through slab_post_alloc_hook() chain
      mm/slab: replace slab_alloc_node() parameters with slab_alloc_context
      mm/slab: allow kmem_cache_alloc_bulk() with any gfp flags
      mm/slab: pass slab_alloc_context to __do_kmalloc_node()
      mm/slab: allow __GFP_NOMEMALLOC and __GFP_NOWARN for kmalloc_nolock()
      mm/slab: introduce kmalloc_flags()
      mm/slab: remove __GFP_NO_OBJ_EXT usage from alloc_slab_obj_exts()
      mm/slab: replace __GFP_NO_OBJ_EXT with SLAB_ALLOC_NO_RECURSE for sheaves

 include/linux/slab.h |   5 +-
 mm/kfence/core.c     |   2 +-
 mm/memcontrol.c      |   5 +-
 mm/slab.h            |  29 ++-
 mm/slub.c            | 488 +++++++++++++++++++++++++++++++--------------------
 5 files changed, 329 insertions(+), 200 deletions(-)
---
base-commit: dfdfd58cce1c3f5df8733b64595448996c08e424
change-id: 20260601-slab_alloc_flags-25c782b0c57c

Best regards,
--  
Vlastimil Babka (SUSE) <vbabka@kernel.org>


* [PATCH v3 01/15] mm/slab: do not init any kfence objects on allocation
From: Vlastimil Babka (SUSE) @ 2026-06-15 11:54 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel,
	cgroups, Vlastimil Babka (SUSE)
In-Reply-To: <20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org>

When init (zeroing) on allocation is requested, for kmalloc() we
generally have to zero the full object size even if a smaller size is
requested, in order to provide krealloc()'s __GFP_ZERO guarantees.

When we end up allocating a kfence object, kfence performs the zeroing
on its own because it has its own redzone beyond the requested size.
Thus slab_post_alloc_hook() has an 'init' parameter which has to be
evaluated in all callers (via slab_want_init_on_alloc()) and should be
false for kfence allocations.

For kfence allocations in slab_alloc_node() this is achieved by subtly
skipping over the slab_want_init_on_alloc() call. Other callers (i.e.
kmem_cache_alloc_bulk_noprof()) however evaluate it unconditionally even
if they do end up with a kfence allocation. This is only subtly not a
problem, as those are not kmalloc allocations and thus the "requested
size" equals s->object_size and thus it cannot interfere with kfence's
redzone. There's just an unnecessary double zeroing (in both kfence and
slab_post_alloc_hook()), but it's all very fragile and contradicts the
comment in kfence_guarded_alloc().

Remove this subtlety and simplify the code by eliminating the init
parameter from slab_post_alloc_hook() and make it call
slab_want_init_on_alloc() itself. Instead add an is_kfence_address()
check before performing the memset, which will start doing the right
thing for all callers of slab_post_alloc_hook().
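
To illustrate why the memset has to be skipped for kfence objects, here
is a small user-space model. Everything in it is a toy assumption:
struct toy_object, the from_kfence field (standing in for
is_kfence_address()), the 32-byte object size and the 0xAA redzone byte
are invented for the example and do not reflect kfence's real layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_OBJECT_SIZE		32
#define TOY_REDZONE_BYTE	0xAA

struct toy_object {
	unsigned char data[TOY_OBJECT_SIZE];
	bool from_kfence;	/* stand-in for is_kfence_address() */
};

/* Toy kfence allocation: redzone starts at orig_size, kfence zeroes itself. */
static inline void toy_kfence_alloc(struct toy_object *obj, size_t orig_size,
				    bool init)
{
	obj->from_kfence = true;
	memset(obj->data, TOY_REDZONE_BYTE, TOY_OBJECT_SIZE);
	if (init)
		memset(obj->data, 0, orig_size);
}

/* Toy post-alloc hook: zero the full object unless kfence owns it. */
static inline void toy_post_alloc_hook(struct toy_object *obj, bool init)
{
	if (init && obj && !obj->from_kfence)
		memset(obj->data, 0, TOY_OBJECT_SIZE);
}

/* Check that every byte from orig_size onwards still holds the redzone. */
static inline bool toy_redzone_intact(const struct toy_object *obj,
				      size_t orig_size)
{
	for (size_t i = orig_size; i < TOY_OBJECT_SIZE; i++) {
		if (obj->data[i] != TOY_REDZONE_BYTE)
			return false;
	}
	return true;
}

static inline void toy_demo(void)
{
	struct toy_object obj;

	toy_kfence_alloc(&obj, 20, true);
	toy_post_alloc_hook(&obj, true);
	assert(toy_redzone_intact(&obj, 20));
}
```

Without the from_kfence test in toy_post_alloc_hook(), the full-size
memset would wipe the bytes from orig_size onwards, which is the
redzone overwrite the kfence comment warns about.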

This potentially adds overhead of the is_kfence_address() check to the
allocation hotpath, but that one is designed to be as small as possible,
and it's only evaluated if zeroing is about to happen. This means (aside
from init_on_alloc hardening) only for __GFP_ZERO allocations, and the
zeroing itself comes with an overhead likely larger than the added
check.

While at it, refactor the handling of evaluating when KASAN does the
init instead of SLUB, with no intended functional changes. A
non-functional change is that we don't pass kasan_init as true to
kasan_slab_alloc() if kasan has no integrated init, but then the value
is ignored anyway, so it's theoretically more correct.
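
The resulting decision can be written down as a small truth-table
helper. This is a user-space sketch only: struct toy_init_plan and
toy_plan_init() are hypothetical names, and the three boolean inputs
stand in for slab_want_init_on_alloc(), kasan_has_integrated_init() and
__slub_debug_enabled() respectively.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_init_plan {
	bool slab_init;		/* slab does the memset itself */
	bool kasan_init;	/* zeroing is delegated to KASAN */
};

/*
 * Decide who zeroes the object. KASAN takes over only when it has
 * integrated init and slab_debug is off; otherwise slab keeps the job.
 */
static inline struct toy_init_plan toy_plan_init(bool want_init,
						 bool kasan_integrated,
						 bool slab_debug)
{
	struct toy_init_plan plan = {
		.slab_init = want_init,
		.kasan_init = false,
	};

	if (kasan_integrated && !slab_debug) {
		plan.kasan_init = want_init;
		plan.slab_init = false;
	}
	return plan;
}

static inline void toy_demo(void)
{
	struct toy_init_plan plan = toy_plan_init(true, true, false);

	/* At most one side ever performs the zeroing. */
	assert(plan.kasan_init && !plan.slab_init);
}
```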

Thanks to Harry Yoo for the initial refactoring attempt, and for updated
comments that are used here.

Link: https://patch.msgid.link/20260610-slab_alloc_flags-v2-2-7190909db118@kernel.org
Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
 mm/kfence/core.c |  2 +-
 mm/slub.c        | 60 ++++++++++++++++++++++++++------------------------------
 2 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 655dc5ce3240..5e0b406924e9 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -500,7 +500,7 @@ static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t g
 
 	/*
 	 * We check slab_want_init_on_alloc() ourselves, rather than letting
-	 * SL*B do the initialization, as otherwise we might overwrite KFENCE's
+	 * slab do the initialization, as otherwise it might overwrite KFENCE's
 	 * redzone.
 	 */
 	if (unlikely(slab_want_init_on_alloc(gfp, cache)))
diff --git a/mm/slub.c b/mm/slub.c
index e2ee8f1aaccf..d762cbe5d040 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4565,13 +4565,13 @@ struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags)
 
 static __fastpath_inline
 bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
-			  gfp_t flags, size_t size, void **p, bool init,
+			  gfp_t flags, size_t size, void **p,
 			  unsigned int orig_size)
 {
+	bool init = slab_want_init_on_alloc(flags, s);
 	unsigned int zero_size = s->object_size;
-	bool kasan_init = init;
-	size_t i;
 	gfp_t init_flags = flags & gfp_allowed_mask;
+	bool kasan_init = false;
 
 	/*
 	 * For kmalloc object, the allocated size (object_size) can be larger
@@ -4588,28 +4588,33 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 		zero_size = orig_size;
 
 	/*
-	 * When slab_debug is enabled, avoid memory initialization integrated
-	 * into KASAN and instead zero out the memory via the memset below with
-	 * the proper size. Otherwise, KASAN might overwrite SLUB redzones and
-	 * cause false-positive reports. This does not lead to a performance
+	 * ARM64 can set memory tags and zero the memory using a single
+	 * instruction. Since HW_TAGS KASAN uses that while tagging the object,
+	 * separate zeroing is unnecessary.
+	 *
+	 * However, KASAN never zeroes memory when slab_debug is enabled to
+	 * avoid overwriting SLUB redzones. This does not lead to a performance
 	 * penalty on production builds, as slab_debug is not intended to be
 	 * enabled there.
 	 */
-	if (__slub_debug_enabled())
-		kasan_init = false;
+	if (kasan_has_integrated_init() && !__slub_debug_enabled()) {
+		kasan_init = init;
+		init = false;
+	}
 
-	/*
-	 * As memory initialization might be integrated into KASAN,
-	 * kasan_slab_alloc and initialization memset must be
-	 * kept together to avoid discrepancies in behavior.
-	 *
-	 * As p[i] might get tagged, memset and kmemleak hook come after KASAN.
-	 */
-	for (i = 0; i < size; i++) {
+	for (size_t i = 0; i < size; i++) {
 		p[i] = kasan_slab_alloc(s, p[i], init_flags, kasan_init);
-		if (p[i] && init && (!kasan_init ||
-				     !kasan_has_integrated_init()))
+
+		/*
+		 * memset and hooks come after KASAN as p[i] might get tagged
+		 *
+		 * kfence zeroes the object instead of SLUB to avoid overwriting
+		 * its own redzone starting at orig_size, which could happen
+		 * with SLUB zeroing full s->object_size
+		 */
+		if (init && p[i] && !is_kfence_address(p[i]))
 			memset(p[i], 0, zero_size);
+
 		if (gfpflags_allow_spinning(flags))
 			kmemleak_alloc_recursive(p[i], s->object_size, 1,
 						 s->flags, init_flags);
@@ -4910,7 +4915,6 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
 		gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
 {
 	void *object;
-	bool init = false;
 
 	s = slab_pre_alloc_hook(s, gfpflags);
 	if (unlikely(!s))
@@ -4926,16 +4930,13 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
 		object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);
 
 	maybe_wipe_obj_freeptr(s, object);
-	init = slab_want_init_on_alloc(gfpflags, s);
 
 out:
 	/*
-	 * When init equals 'true', like for kzalloc() family, only
-	 * @orig_size bytes might be zeroed instead of s->object_size
 	 * In case this fails due to memcg_slab_post_alloc_hook(),
 	 * object is set to NULL
 	 */
-	slab_post_alloc_hook(s, lru, gfpflags, 1, &object, init, orig_size);
+	slab_post_alloc_hook(s, lru, gfpflags, 1, &object, orig_size);
 
 	return object;
 }
@@ -5230,7 +5231,6 @@ kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
 				   struct slab_sheaf *sheaf)
 {
 	void *ret = NULL;
-	bool init;
 
 	if (sheaf->size == 0)
 		goto out;
@@ -5240,10 +5240,8 @@ kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
 	if (likely(!ret))
 		ret = sheaf->objects[--sheaf->size];
 
-	init = slab_want_init_on_alloc(gfp, s);
-
 	/* add __GFP_NOFAIL to force successful memcg charging */
-	slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, init, s->object_size);
+	slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, s->object_size);
 out:
 	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfp, NUMA_NO_NODE);
 
@@ -5423,8 +5421,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
 
 success:
 	maybe_wipe_obj_freeptr(s, ret);
-	slab_post_alloc_hook(s, NULL, alloc_gfp, 1, &ret,
-			     slab_want_init_on_alloc(alloc_gfp, s), orig_size);
+	slab_post_alloc_hook(s, NULL, alloc_gfp, 1, &ret, orig_size);
 
 	ret = kasan_kmalloc(s, ret, orig_size, alloc_gfp);
 	return ret;
@@ -7339,8 +7336,7 @@ bool kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags,
 
 out:
 	/* memcg and kmem_cache debug support and memory initialization */
-	return likely(slab_post_alloc_hook(s, NULL, flags, size, p,
-			slab_want_init_on_alloc(flags, s), s->object_size));
+	return likely(slab_post_alloc_hook(s, NULL, flags, size, p, s->object_size));
 }
 EXPORT_SYMBOL(kmem_cache_alloc_bulk_noprof);
 

-- 
2.54.0


* [PATCH v3 02/15] mm/slab: stop inlining __slab_alloc_node()
From: Vlastimil Babka (SUSE) @ 2026-06-15 11:54 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel,
	cgroups, Vlastimil Babka (SUSE)
In-Reply-To: <20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org>

With sheaves, this is no longer part of the allocation fastpath.  For
the same reason, also mark the call to it from slab_alloc_node() as
unlikely().

Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Link: https://patch.msgid.link/20260610-slab_alloc_flags-v2-3-7190909db118@kernel.org
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
 mm/slub.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index d762cbe5d040..8845e15cb152 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4519,8 +4519,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	return object;
 }
 
-static __always_inline void *__slab_alloc_node(struct kmem_cache *s,
-		gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
+static void *__slab_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node,
+			       unsigned long addr, size_t orig_size)
 {
 	void *object;
 
@@ -4926,7 +4926,7 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
 
 	object = alloc_from_pcs(s, gfpflags, node);
 
-	if (!object)
+	if (unlikely(!object))
 		object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);
 
 	maybe_wipe_obj_freeptr(s, object);

-- 
2.54.0


* [PATCH v3 03/15] mm/slab: introduce slab_alloc_context
From: Vlastimil Babka (SUSE) @ 2026-06-15 11:54 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel,
	cgroups, Vlastimil Babka (SUSE)
In-Reply-To: <20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org>

Similarly to page allocator's struct alloc_context, introduce a helper
struct to hold a part of the allocation arguments. This will allow
reducing the number of parameters in many functions of the
implementation, and extend them easily if needed.

For now, make it hold the caller address and the originally requested
allocation size.

Convert alloc_single_from_new_slab(), __slab_alloc_node() and
___slab_alloc(). No functional change intended.

Link: https://patch.msgid.link/20260610-slab_alloc_flags-v2-4-7190909db118@kernel.org
Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
 mm/slub.c | 45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8845e15cb152..8a0c5553876e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -213,6 +213,12 @@ DEFINE_STATIC_KEY_FALSE(slub_debug_enabled);
 static DEFINE_STATIC_KEY_FALSE(strict_numa);
 #endif
 
+/* Structure holding extra parameters for slab allocations */
+struct slab_alloc_context {
+	unsigned long caller_addr;
+	size_t orig_size;
+};
+
 /* Structure holding parameters for get_from_partial() call chain */
 struct partial_context {
 	gfp_t flags;
@@ -3687,7 +3693,8 @@ static inline void init_slab_obj_iter(struct kmem_cache *s, struct slab *slab,
  * and put the slab to the partial (or full) list.
  */
 static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
-					int orig_size, bool allow_spin)
+					const struct slab_alloc_context *ac,
+					bool allow_spin)
 {
 	struct kmem_cache_node *n;
 	struct slab_obj_iter iter;
@@ -3705,7 +3712,7 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
 	/* alloc_debug_processing() always expects a valid freepointer */
 	set_freepointer(s, object, slab->freelist);
 
-	if (!alloc_debug_processing(s, slab, object, orig_size)) {
+	if (!alloc_debug_processing(s, slab, object, ac->orig_size)) {
 		/*
 		 * It's not really expected that this would fail on a
 		 * freshly allocated slab, but a concurrent memory
@@ -4443,7 +4450,7 @@ static unsigned int alloc_from_new_slab(struct kmem_cache *s, struct slab *slab,
  * slab.
  */
 static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
-			   unsigned long addr, unsigned int orig_size)
+			   const struct slab_alloc_context *ac)
 {
 	bool allow_spin = gfpflags_allow_spinning(gfpflags);
 	void *object;
@@ -4476,7 +4483,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			pc.flags = GFP_NOWAIT | __GFP_THISNODE;
 	}
 
-	pc.orig_size = orig_size;
+	pc.orig_size = ac->orig_size;
 	object = get_from_partial(s, node, &pc);
 	if (object)
 		goto success;
@@ -4496,7 +4503,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	stat(s, ALLOC_SLAB);
 
 	if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
-		object = alloc_single_from_new_slab(s, slab, orig_size, allow_spin);
+		object = alloc_single_from_new_slab(s, slab, ac, allow_spin);
 
 		if (likely(object))
 			goto success;
@@ -4514,13 +4521,13 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 
 success:
 	if (kmem_cache_debug_flags(s, SLAB_STORE_USER))
-		set_track(s, object, TRACK_ALLOC, addr, gfpflags);
+		set_track(s, object, TRACK_ALLOC, ac->caller_addr, gfpflags);
 
 	return object;
 }
 
 static void *__slab_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node,
-			       unsigned long addr, size_t orig_size)
+			       const struct slab_alloc_context *ac)
 {
 	void *object;
 
@@ -4545,7 +4552,7 @@ static void *__slab_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node,
 	}
 #endif
 
-	object = ___slab_alloc(s, gfpflags, node, addr, orig_size);
+	object = ___slab_alloc(s, gfpflags, node, ac);
 
 	return object;
 }
@@ -4926,8 +4933,13 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
 
 	object = alloc_from_pcs(s, gfpflags, node);
 
-	if (unlikely(!object))
-		object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);
+	if (unlikely(!object)) {
+		const struct slab_alloc_context ac = {
+			.caller_addr = addr,
+			.orig_size = orig_size,
+		};
+		object = __slab_alloc_node(s, gfpflags, node, &ac);
+	}
 
 	maybe_wipe_obj_freeptr(s, object);
 
@@ -5353,6 +5365,10 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
 	struct kmem_cache *s;
 	bool can_retry = true;
 	void *ret;
+	const struct slab_alloc_context ac = {
+		.caller_addr = _RET_IP_,
+		.orig_size = orig_size,
+	};
 
 	VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO |
 				      __GFP_NO_OBJ_EXT));
@@ -5398,7 +5414,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
 	 * kfence_alloc. Hence call __slab_alloc_node() (at most twice)
 	 * and slab_post_alloc_hook() directly.
 	 */
-	ret = __slab_alloc_node(s, alloc_gfp, node, _RET_IP_, orig_size);
+	ret = __slab_alloc_node(s, alloc_gfp, node, &ac);
 
 	/*
 	 * It's possible we failed due to trylock as we preempted someone with
@@ -7240,10 +7256,13 @@ static bool __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
 	int i;
 
 	if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
+		const struct slab_alloc_context ac = {
+			.caller_addr = _RET_IP_,
+			.orig_size = s->object_size,
+		};
 		for (i = 0; i < size; i++) {
 
-			p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE, _RET_IP_,
-					     s->object_size);
+			p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE, &ac);
 			if (unlikely(!p[i]))
 				goto error;
 

-- 
2.54.0


* [PATCH v3 04/15] mm/slab: introduce alloc_flags and SLAB_ALLOC_NOLOCK
From: Vlastimil Babka (SUSE) @ 2026-06-15 11:54 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel,
	cgroups, Vlastimil Babka (SUSE)
In-Reply-To: <20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org>

Similarly to the page allocator, introduce slab-allocator specific
alloc flags that internally control allocation behavior in addition to
gfp_flags, without occupying the limited gfp flags space.

Introduce the first flag SLAB_ALLOC_NOLOCK that behaves similarly to
page allocator's ALLOC_TRYLOCK and will be used to reimplement
kmalloc_nolock()'s "!allow_spin" behavior. That currently relies on
gfpflags_allow_spinning() and thus the lack of both __GFP_RECLAIM flags,
importantly __GFP_KSWAPD_RECLAIM. This can give false-positive results
e.g. in early boot with a restricted gfp_allowed_mask.

Also introduce alloc_flags_allow_spinning() to replace the usage of
gfpflags_allow_spinning().

Start using alloc_flags and the new check first in alloc_from_pcs() and
__pcs_replace_empty_main(). This means some slab allocations that were
falsely treated as kmalloc_nolock() due to their gfp flags will now have
higher chances of success, and this will further increase with followup
changes.

Remove a WARN_ON_ONCE() from refill_objects() as it's now legitimate to
reach it from a slab allocation that's not _nolock() and yet lacks
__GFP_KSWAPD_RECLAIM for other reasons.

Link: https://patch.msgid.link/20260610-slab_alloc_flags-v2-5-7190909db118@kernel.org
Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
 mm/slab.h |  9 +++++++++
 mm/slub.c | 17 ++++++++---------
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 1bf9c3021ae3..f1246f0c9f74 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -16,6 +16,15 @@
  * Internal slab definitions
  */
 
+/* slab's alloc_flags definitions */
+#define SLAB_ALLOC_DEFAULT	0x00 /* no flags */
+#define SLAB_ALLOC_NOLOCK	0x01 /* a kmalloc_nolock() allocation */
+
+static inline bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
+{
+	return !(alloc_flags & SLAB_ALLOC_NOLOCK);
+}
+
 #ifdef CONFIG_64BIT
 # ifdef system_has_cmpxchg128
 # define system_has_freelist_aba()	system_has_cmpxchg128()
diff --git a/mm/slub.c b/mm/slub.c
index 8a0c5553876e..c53592baa027 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4641,7 +4641,8 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
  * unlocked.
  */
 static struct slub_percpu_sheaves *
-__pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs, gfp_t gfp)
+__pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
+			 gfp_t gfp, unsigned int alloc_flags)
 {
 	struct slab_sheaf *empty = NULL;
 	struct slab_sheaf *full;
@@ -4667,7 +4668,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
 		return NULL;
 	}
 
-	allow_spin = gfpflags_allow_spinning(gfp);
+	allow_spin = alloc_flags_allow_spinning(alloc_flags);
 
 	full = barn_replace_empty_sheaf(barn, pcs->main, allow_spin);
 
@@ -4753,7 +4754,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
 }
 
 static __fastpath_inline
-void *alloc_from_pcs(struct kmem_cache *s, gfp_t gfp, int node)
+void *alloc_from_pcs(struct kmem_cache *s, gfp_t gfp, unsigned int alloc_flags, int node)
 {
 	struct slub_percpu_sheaves *pcs;
 	bool node_requested;
@@ -4798,7 +4799,7 @@ void *alloc_from_pcs(struct kmem_cache *s, gfp_t gfp, int node)
 	pcs = this_cpu_ptr(s->cpu_sheaves);
 
 	if (unlikely(pcs->main->size == 0)) {
-		pcs = __pcs_replace_empty_main(s, pcs, gfp);
+		pcs = __pcs_replace_empty_main(s, pcs, gfp, alloc_flags);
 		if (unlikely(!pcs))
 			return NULL;
 	}
@@ -4931,7 +4932,7 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
 	if (unlikely(object))
 		goto out;
 
-	object = alloc_from_pcs(s, gfpflags, node);
+	object = alloc_from_pcs(s, gfpflags, SLAB_ALLOC_DEFAULT, node);
 
 	if (unlikely(!object)) {
 		const struct slab_alloc_context ac = {
@@ -5362,6 +5363,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
 {
 	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_NOMEMALLOC | gfp_flags;
 	size_t orig_size = size;
+	unsigned int alloc_flags = SLAB_ALLOC_NOLOCK;
 	struct kmem_cache *s;
 	bool can_retry = true;
 	void *ret;
@@ -5404,7 +5406,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
 		 */
 		return NULL;
 
-	ret = alloc_from_pcs(s, alloc_gfp, node);
+	ret = alloc_from_pcs(s, alloc_gfp, alloc_flags, node);
 	if (ret)
 		goto success;
 
@@ -7218,9 +7220,6 @@ refill_objects(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int min,
 	unsigned int refilled;
 	struct slab *slab;
 
-	if (WARN_ON_ONCE(!gfpflags_allow_spinning(gfp)))
-		return 0;
-
 	refilled = __refill_objects_node(s, p, gfp, min, max,
 					 get_node(s, local_node),
 					 /* allow_spin = */ true);

-- 
2.54.0


* [PATCH v3 05/15] mm/slab: replace struct partial_context with slab_alloc_context
From: Vlastimil Babka (SUSE) @ 2026-06-15 11:54 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel,
	cgroups, Vlastimil Babka (SUSE)
In-Reply-To: <20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org>

Refactor get_from_partial_node(), get_from_any_partial(),
get_from_partial() and ___slab_alloc().

Remove struct partial_context, which used to be more substantial but
shrank as part of the sheaves conversion. Instead pass gfp_flags and
pointer to the new slab_alloc_context, which together is a superset of
partial_context, and alloc_flags are about to be added to
slab_alloc_context as well.

No functional change intended.

Link: https://patch.msgid.link/20260610-slab_alloc_flags-v2-7-7190909db118@kernel.org
Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
 mm/slub.c | 54 +++++++++++++++++++++++++-----------------------------
 1 file changed, 25 insertions(+), 29 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index c53592baa027..6f6c15d796e1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -219,12 +219,6 @@ struct slab_alloc_context {
 	size_t orig_size;
 };
 
-/* Structure holding parameters for get_from_partial() call chain */
-struct partial_context {
-	gfp_t flags;
-	unsigned int orig_size;
-};
-
 /* Structure holding parameters for get_partial_node_bulk() */
 struct partial_bulk_context {
 	gfp_t flags;
@@ -3825,7 +3819,8 @@ static bool get_partial_node_bulk(struct kmem_cache *s,
  */
 static void *get_from_partial_node(struct kmem_cache *s,
 				   struct kmem_cache_node *n,
-				   struct partial_context *pc)
+				   gfp_t gfp_flags,
+				   const struct slab_alloc_context *ac)
 {
 	struct slab *slab, *slab2;
 	unsigned long flags;
@@ -3840,7 +3835,7 @@ static void *get_from_partial_node(struct kmem_cache *s,
 	if (!n || !n->nr_partial)
 		return NULL;
 
-	if (gfpflags_allow_spinning(pc->flags))
+	if (gfpflags_allow_spinning(gfp_flags))
 		spin_lock_irqsave(&n->list_lock, flags);
 	else if (!spin_trylock_irqsave(&n->list_lock, flags))
 		return NULL;
@@ -3848,12 +3843,12 @@ static void *get_from_partial_node(struct kmem_cache *s,
 
 		struct freelist_counters old, new;
 
-		if (!pfmemalloc_match(slab, pc->flags))
+		if (!pfmemalloc_match(slab, gfp_flags))
 			continue;
 
 		if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
 			object = alloc_single_from_partial(s, n, slab,
-							pc->orig_size);
+							ac->orig_size);
 			if (object)
 				break;
 			continue;
@@ -3887,15 +3882,16 @@ static void *get_from_partial_node(struct kmem_cache *s,
 /*
  * Get an object from somewhere. Search in increasing NUMA distances.
  */
-static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *pc)
+static void *get_from_any_partial(struct kmem_cache *s, gfp_t gfp_flags,
+				  const struct slab_alloc_context *ac)
 {
 #ifdef CONFIG_NUMA
 	struct zonelist *zonelist;
 	struct zoneref *z;
 	struct zone *zone;
-	enum zone_type highest_zoneidx = gfp_zone(pc->flags);
+	enum zone_type highest_zoneidx = gfp_zone(gfp_flags);
 	unsigned int cpuset_mems_cookie;
-	bool allow_spin = gfpflags_allow_spinning(pc->flags);
+	bool allow_spin = gfpflags_allow_spinning(gfp_flags);
 
 	/*
 	 * The defrag ratio allows a configuration of the tradeoffs between
@@ -3929,16 +3925,17 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
 		if (allow_spin)
 			cpuset_mems_cookie = read_mems_allowed_begin();
 
-		zonelist = node_zonelist(mempolicy_slab_node(), pc->flags);
+		zonelist = node_zonelist(mempolicy_slab_node(), gfp_flags);
 		for_each_zone_zonelist(zone, z, zonelist, highest_zoneidx) {
 			struct kmem_cache_node *n;
 
 			n = get_node(s, zone_to_nid(zone));
 
-			if (n && cpuset_zone_allowed(zone, pc->flags) &&
+			if (n && cpuset_zone_allowed(zone, gfp_flags) &&
 					n->nr_partial > s->min_partial) {
 
-				void *object = get_from_partial_node(s, n, pc);
+				void *object = get_from_partial_node(s, n,
+								gfp_flags, ac);
 
 				if (object) {
 					/*
@@ -3960,8 +3957,8 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
 /*
  * Get an object from a partial slab
  */
-static void *get_from_partial(struct kmem_cache *s, int node,
-			      struct partial_context *pc)
+static void *get_from_partial(struct kmem_cache *s, int node, gfp_t flags,
+			      const struct slab_alloc_context *ac)
 {
 	int searchnode = node;
 	void *object;
@@ -3969,11 +3966,11 @@ static void *get_from_partial(struct kmem_cache *s, int node,
 	if (node == NUMA_NO_NODE)
 		searchnode = numa_mem_id();
 
-	object = get_from_partial_node(s, get_node(s, searchnode), pc);
-	if (object || (node != NUMA_NO_NODE && (pc->flags & __GFP_THISNODE)))
+	object = get_from_partial_node(s, get_node(s, searchnode), flags, ac);
+	if (object || (node != NUMA_NO_NODE && (flags & __GFP_THISNODE)))
 		return object;
 
-	return get_from_any_partial(s, pc);
+	return get_from_any_partial(s, flags, ac);
 }
 
 static bool has_pcs_used(int cpu, struct kmem_cache *s)
@@ -4453,21 +4450,21 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			   const struct slab_alloc_context *ac)
 {
 	bool allow_spin = gfpflags_allow_spinning(gfpflags);
+	gfp_t trynode_flags;
 	void *object;
 	struct slab *slab;
-	struct partial_context pc;
 	bool try_thisnode = true;
 
 	stat(s, ALLOC_SLOWPATH);
 
 new_objects:
 
-	pc.flags = gfpflags;
+	trynode_flags = gfpflags;
 	/*
 	 * When a preferred node is indicated but no __GFP_THISNODE
 	 *
 	 * 1) try to get a partial slab from target node only by having
-	 *    __GFP_THISNODE in pc.flags for get_from_partial()
+	 *    __GFP_THISNODE in trynode_flags for get_from_partial()
 	 * 2) if 1) failed, try to allocate a new slab from target node with
 	 *    GPF_NOWAIT | __GFP_THISNODE opportunistically
 	 * 3) if 2) failed, retry with original gfpflags which will allow
@@ -4478,17 +4475,16 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		     && try_thisnode)) {
 		if (unlikely(!allow_spin))
 			/* Do not upgrade gfp to NOWAIT from more restrictive mode */
-			pc.flags = gfpflags | __GFP_THISNODE;
+			trynode_flags = gfpflags | __GFP_THISNODE;
 		else
-			pc.flags = GFP_NOWAIT | __GFP_THISNODE;
+			trynode_flags = GFP_NOWAIT | __GFP_THISNODE;
 	}
 
-	pc.orig_size = ac->orig_size;
-	object = get_from_partial(s, node, &pc);
+	object = get_from_partial(s, node, trynode_flags, ac);
 	if (object)
 		goto success;
 
-	slab = new_slab(s, pc.flags, node);
+	slab = new_slab(s, trynode_flags, node);
 
 	if (unlikely(!slab)) {
 		if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE)

-- 
2.54.0



* [PATCH v3 06/15] mm/slab: add alloc_flags to slab_alloc_context
From: Vlastimil Babka (SUSE) @ 2026-06-15 11:54 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel,
	cgroups, Vlastimil Babka (SUSE)
In-Reply-To: <20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org>

Add alloc_flags as a new field to the slab_alloc_context helper struct,
so we can pass it to more functions in the slab implementation without
adding another function parameter.

Start checking alloc_flags via alloc_flags_allow_spinning() in
alloc_single_from_new_slab() (where we can drop the allow_spin
parameter), ___slab_alloc(), get_from_partial_node() and
get_from_any_partial(). This further reduces false-positive
spinning-not-allowed decisions for allocations that do not come from
kmalloc_nolock() but lack __GFP_RECLAIM flags.

_kmalloc_nolock_noprof() initializes ac.alloc_flags from its own
alloc_flags, which are SLAB_ALLOC_NOLOCK. slab_alloc_node() and
__kmem_cache_alloc_bulk() are not reachable from kmalloc_nolock() and
all their callers expect spinning to be allowed, so they can use
SLAB_ALLOC_DEFAULT. This is temporary as the scope of slab_alloc_context
will further move to the callers, making the alloc_flags usage more
obvious.
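
As a rough userspace illustration of the two ways to decide whether
spinning is allowed (not part of the patch): the gfp bit values below are
made up, and both helper bodies are assumptions. The patch only shows the
signature of alloc_flags_allow_spinning(); the SLAB_ALLOC_* values are
taken from the mm/slab.h hunk later in this series.

```c
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Mock gfp bits: the values are illustrative, not the kernel's. */
#define __GFP_DIRECT_RECLAIM	0x01u
#define __GFP_KSWAPD_RECLAIM	0x02u
#define __GFP_RECLAIM		(__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)

/* Values as defined in the mm/slab.h hunk of this series. */
#define SLAB_ALLOC_DEFAULT	0x00
#define SLAB_ALLOC_NOLOCK	0x01

/* Assumed gfp-based check: no reclaim bits is read as "must not spin". */
static bool gfpflags_allow_spinning(gfp_t gfp)
{
	return !!(gfp & __GFP_RECLAIM);
}

/* Assumed body: only kmalloc_nolock() allocations forbid spinning. */
static bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
{
	return !(alloc_flags & SLAB_ALLOC_NOLOCK);
}
```

Under these assumptions, an ordinary allocation that passes no
__GFP_RECLAIM bits is rejected by the gfp-based check but accepted with
SLAB_ALLOC_DEFAULT, which is the false positive described above.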

Also change how trynode_flags are constructed in ___slab_alloc() to
achieve the same "do not upgrade to GFP_NOWAIT" behavior by masking
instead of checking allow_spin. We need to do that because we now
determine allow_spin from alloc_flags, and would otherwise start to
upgrade e.g. kmalloc() allocations without __GFP_KSWAPD_RECLAIM (which
nevertheless allow spinning) to GFP_NOWAIT, thus including
__GFP_KSWAPD_RECLAIM.

While masking, also keep any existing __GFP_NOMEMALLOC (pointed out by
Sashiko) and __GFP_ACCOUNT. Previously the hardcoded GFP_NOWAIT would
eliminate them, but that was not a big enough problem to need a separate
fix.
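
A small userspace sketch of the masking, for illustration only. The bit
values are invented, and GFP_NOWAIT is assumed to be
__GFP_KSWAPD_RECLAIM | __GFP_NOWARN; the two helper functions are
hypothetical names that wrap the old and new expressions from the
___slab_alloc() hunk.

```c
typedef unsigned int gfp_t;

/* Mock gfp bits: the values are illustrative, not the kernel's. */
#define __GFP_DIRECT_RECLAIM	0x01u
#define __GFP_KSWAPD_RECLAIM	0x02u
#define __GFP_NOWARN		0x04u
#define __GFP_THISNODE		0x08u
#define __GFP_NOMEMALLOC	0x10u
#define __GFP_ACCOUNT		0x20u
/* Assumed composition of GFP_NOWAIT. */
#define GFP_NOWAIT		(__GFP_KSWAPD_RECLAIM | __GFP_NOWARN)

/* New construction: can only drop bits from gfpflags, never add reclaim. */
static gfp_t trynode_flags_masked(gfp_t gfpflags)
{
	gfp_t trynode_flags = gfpflags;

	trynode_flags &= GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_ACCOUNT;
	trynode_flags |= __GFP_NOWARN | __GFP_THISNODE;
	return trynode_flags;
}

/* Old allow_spin branch: ignores gfpflags and hardcodes GFP_NOWAIT. */
static gfp_t trynode_flags_hardcoded(void)
{
	return GFP_NOWAIT | __GFP_THISNODE;
}
```

With these mock values, a request carrying both reclaim bits ends up with
the same flags either way, while a request without
__GFP_KSWAPD_RECLAIM keeps it cleared only in the masked variant, and
__GFP_ACCOUNT survives only there as well.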

Link: https://patch.msgid.link/20260610-slab_alloc_flags-v2-6-7190909db118@kernel.org
Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
Reviewed-by: Hao Li <hao.li@linux.dev>
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
 mm/slub.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 6f6c15d796e1..3a34907b881b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -217,6 +217,7 @@ static DEFINE_STATIC_KEY_FALSE(strict_numa);
 struct slab_alloc_context {
 	unsigned long caller_addr;
 	size_t orig_size;
+	unsigned int alloc_flags;
 };
 
 /* Structure holding parameters for get_partial_node_bulk() */
@@ -3687,9 +3688,9 @@ static inline void init_slab_obj_iter(struct kmem_cache *s, struct slab *slab,
  * and put the slab to the partial (or full) list.
  */
 static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
-					const struct slab_alloc_context *ac,
-					bool allow_spin)
+					const struct slab_alloc_context *ac)
 {
+	bool allow_spin = alloc_flags_allow_spinning(ac->alloc_flags);
 	struct kmem_cache_node *n;
 	struct slab_obj_iter iter;
 	bool needs_add_partial;
@@ -3835,7 +3836,7 @@ static void *get_from_partial_node(struct kmem_cache *s,
 	if (!n || !n->nr_partial)
 		return NULL;
 
-	if (gfpflags_allow_spinning(gfp_flags))
+	if (alloc_flags_allow_spinning(ac->alloc_flags))
 		spin_lock_irqsave(&n->list_lock, flags);
 	else if (!spin_trylock_irqsave(&n->list_lock, flags))
 		return NULL;
@@ -3891,7 +3892,7 @@ static void *get_from_any_partial(struct kmem_cache *s, gfp_t gfp_flags,
 	struct zone *zone;
 	enum zone_type highest_zoneidx = gfp_zone(gfp_flags);
 	unsigned int cpuset_mems_cookie;
-	bool allow_spin = gfpflags_allow_spinning(gfp_flags);
+	bool allow_spin = alloc_flags_allow_spinning(ac->alloc_flags);
 
 	/*
 	 * The defrag ratio allows a configuration of the tradeoffs between
@@ -4449,7 +4450,7 @@ static unsigned int alloc_from_new_slab(struct kmem_cache *s, struct slab *slab,
 static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			   const struct slab_alloc_context *ac)
 {
-	bool allow_spin = gfpflags_allow_spinning(gfpflags);
+	bool allow_spin = alloc_flags_allow_spinning(ac->alloc_flags);
 	gfp_t trynode_flags;
 	void *object;
 	struct slab *slab;
@@ -4466,18 +4467,15 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	 * 1) try to get a partial slab from target node only by having
 	 *    __GFP_THISNODE in trynode_flags for get_from_partial()
 	 * 2) if 1) failed, try to allocate a new slab from target node with
-	 *    GPF_NOWAIT | __GFP_THISNODE opportunistically
+	 *    (at most) GFP_NOWAIT | __GFP_THISNODE opportunistically
 	 * 3) if 2) failed, retry with original gfpflags which will allow
 	 *    get_from_partial() try partial lists of other nodes before
 	 *    potentially allocating new page from other nodes
 	 */
 	if (unlikely(node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE)
 		     && try_thisnode)) {
-		if (unlikely(!allow_spin))
-			/* Do not upgrade gfp to NOWAIT from more restrictive mode */
-			trynode_flags = gfpflags | __GFP_THISNODE;
-		else
-			trynode_flags = GFP_NOWAIT | __GFP_THISNODE;
+		trynode_flags &= GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_ACCOUNT;
+		trynode_flags |= __GFP_NOWARN | __GFP_THISNODE;
 	}
 
 	object = get_from_partial(s, node, trynode_flags, ac);
@@ -4499,7 +4497,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	stat(s, ALLOC_SLAB);
 
 	if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
-		object = alloc_single_from_new_slab(s, slab, ac, allow_spin);
+		object = alloc_single_from_new_slab(s, slab, ac);
 
 		if (likely(object))
 			goto success;
@@ -4918,6 +4916,7 @@ unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, gfp_t gfp, size_t size,
 static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru,
 		gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
 {
+	const unsigned int alloc_flags = SLAB_ALLOC_DEFAULT;
 	void *object;
 
 	s = slab_pre_alloc_hook(s, gfpflags);
@@ -4928,12 +4927,13 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
 	if (unlikely(object))
 		goto out;
 
-	object = alloc_from_pcs(s, gfpflags, SLAB_ALLOC_DEFAULT, node);
+	object = alloc_from_pcs(s, gfpflags, alloc_flags, node);
 
 	if (unlikely(!object)) {
 		const struct slab_alloc_context ac = {
 			.caller_addr = addr,
 			.orig_size = orig_size,
+			.alloc_flags = alloc_flags,
 		};
 		object = __slab_alloc_node(s, gfpflags, node, &ac);
 	}
@@ -5366,6 +5366,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
 	const struct slab_alloc_context ac = {
 		.caller_addr = _RET_IP_,
 		.orig_size = orig_size,
+		.alloc_flags = alloc_flags,
 	};
 
 	VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO |
@@ -7254,6 +7255,7 @@ static bool __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
 		const struct slab_alloc_context ac = {
 			.caller_addr = _RET_IP_,
 			.orig_size = s->object_size,
+			.alloc_flags = SLAB_ALLOC_DEFAULT,
 		};
 		for (i = 0; i < size; i++) {
 

-- 
2.54.0



* [PATCH v3 07/15] mm/slab: pass alloc_flags to new slab allocation
From: Vlastimil Babka (SUSE) @ 2026-06-15 11:54 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel,
	cgroups, Vlastimil Babka (SUSE)
In-Reply-To: <20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org>

Add the alloc_flags parameter to allocate_slab() and new_slab()
so it can be used to determine if spinning is allowed, independently
from gfp flags.

refill_objects() passes SLAB_ALLOC_DEFAULT because it can only be
reached from contexts that allow spinning.
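
To illustrate why allocate_slab() should not derive allow_spin from the
gfp value it ends up passing down, here is a userspace sketch with
made-up bit values. first_attempt_gfp() is a hypothetical helper that
mirrors the existing adjustment in allocate_slab(), with higher_order
standing in for oo_order(oo) > oo_order(s->min); both *_allow_spinning()
bodies are assumptions.

```c
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Mock gfp bits: the values are illustrative, not the kernel's. */
#define __GFP_DIRECT_RECLAIM	0x01u
#define __GFP_KSWAPD_RECLAIM	0x02u
#define __GFP_NOMEMALLOC	0x10u
#define __GFP_RECLAIM		(__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)

#define SLAB_ALLOC_DEFAULT	0x00
#define SLAB_ALLOC_NOLOCK	0x01

/* Assumed gfp-based check. */
static bool gfpflags_allow_spinning(gfp_t gfp)
{
	return !!(gfp & __GFP_RECLAIM);
}

/* Assumed alloc_flags-based check. */
static bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
{
	return !(alloc_flags & SLAB_ALLOC_NOLOCK);
}

/* The first, higher-order attempt clears __GFP_RECLAIM entirely. */
static gfp_t first_attempt_gfp(gfp_t alloc_gfp, bool higher_order)
{
	if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && higher_order)
		alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~__GFP_RECLAIM;
	return alloc_gfp;
}
```

In this sketch the adjusted gfp no longer carries any reclaim bit, so a
gfp-based check on it would wrongly say "no spinning", whereas
alloc_flags is unaffected by the adjustment.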

Link: https://patch.msgid.link/20260610-slab_alloc_flags-v2-8-7190909db118@kernel.org
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
 mm/slub.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 3a34907b881b..a975a2e727c8 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3378,9 +3378,10 @@ static __always_inline void unaccount_slab(struct slab *slab, int order,
 }
 
 /* Allocate and initialize a slab without building its freelist. */
-static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
+static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags,
+				  unsigned int alloc_flags, int node)
 {
-	bool allow_spin = gfpflags_allow_spinning(flags);
+	bool allow_spin = alloc_flags_allow_spinning(alloc_flags);
 	struct slab *slab;
 	struct kmem_cache_order_objects oo = s->oo;
 	gfp_t alloc_gfp;
@@ -3398,10 +3399,6 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && oo_order(oo) > oo_order(s->min))
 		alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~__GFP_RECLAIM;
 
-	/*
-	 * __GFP_RECLAIM could be cleared on the first allocation attempt,
-	 * so pass allow_spin flag directly.
-	 */
 	slab = alloc_slab_page(alloc_gfp, node, oo, allow_spin);
 	if (unlikely(!slab)) {
 		oo = s->min;
@@ -3438,15 +3435,17 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	return slab;
 }
 
-static struct slab *new_slab(struct kmem_cache *s, gfp_t flags, int node)
+static struct slab *new_slab(struct kmem_cache *s, gfp_t flags,
+			     unsigned int alloc_flags, int node)
 {
 	if (unlikely(flags & GFP_SLAB_BUG_MASK))
 		flags = kmalloc_fix_flags(flags);
 
 	WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));
 
-	return allocate_slab(s,
-		flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
+	flags &= GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK;
+
+	return allocate_slab(s, flags, alloc_flags, node);
 }
 
 static void __free_slab(struct kmem_cache *s, struct slab *slab, bool allow_spin)
@@ -4482,7 +4481,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	if (object)
 		goto success;
 
-	slab = new_slab(s, trynode_flags, node);
+	slab = new_slab(s, trynode_flags, ac->alloc_flags, node);
 
 	if (unlikely(!slab)) {
 		if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE)
@@ -7230,7 +7229,7 @@ refill_objects(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int min,
 
 new_slab:
 
-	slab = new_slab(s, gfp, local_node);
+	slab = new_slab(s, gfp, SLAB_ALLOC_DEFAULT, local_node);
 	if (!slab)
 		goto out;
 
@@ -7578,7 +7577,7 @@ static void early_kmem_cache_node_alloc(int node)
 
 	BUG_ON(kmem_cache_node->size < sizeof(struct kmem_cache_node));
 
-	slab = new_slab(kmem_cache_node, GFP_NOWAIT, node);
+	slab = new_slab(kmem_cache_node, GFP_NOWAIT, SLAB_ALLOC_DEFAULT, node);
 
 	BUG_ON(!slab);
 	if (slab_nid(slab) != node) {

-- 
2.54.0



* [PATCH v3 08/15] mm/slab: pass alloc_flags through slab_post_alloc_hook() chain
From: Vlastimil Babka (SUSE) @ 2026-06-15 11:54 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel,
	cgroups, Vlastimil Babka (SUSE)
In-Reply-To: <20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org>

Convert the whole following call stack to pass either slab_alloc_context
(thus including alloc_flags) or just alloc_flags as necessary:

slab_post_alloc_hook()
  alloc_tagging_slab_alloc_hook()
    __alloc_tagging_slab_alloc_hook()
      prepare_slab_obj_exts_hook()
        alloc_slab_obj_exts()
  memcg_slab_post_alloc_hook()
    __memcg_slab_post_alloc_hook()
      alloc_slab_obj_exts()

Converting all these at once avoids unnecessary churn and is mostly
mechanical.

This ultimately allows deciding whether spinning is allowed using
alloc_flags in alloc_slab_obj_exts() as well as slab_post_alloc_hook().
Aside from alloc_from_pcs_bulk() (to be handled next), nothing else in
slab itself relies on gfpflags_allow_spinning(), which can be false even
when not called from kmalloc_nolock().

A followup change will also use the alloc_flags availability in the call
stack above to remove the __GFP_NO_OBJ_EXT flag.

For alloc_slab_obj_exts(), also replace the suboptimal "bool new_slab"
parameter with a SLAB_ALLOC_NEW_SLAB flag with identical functionality.

To further reduce the number of parameters of slab_post_alloc_hook(),
also make 'struct list_lru *lru' (which is NULL for most callers) a new
field of slab_alloc_context.
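
For illustration, the bool-to-flag conversion can be sketched in
userspace as below. The SLAB_ALLOC_* values are the ones from the
mm/slab.h hunk; is_new_slab() and account_slab_flags() are hypothetical
helpers that stand in for the test in alloc_slab_obj_exts() and the
OR in account_slab().

```c
#include <stdbool.h>

/* Values as defined in the mm/slab.h hunk of this patch. */
#define SLAB_ALLOC_DEFAULT	0x00
#define SLAB_ALLOC_NOLOCK	0x01
#define SLAB_ALLOC_NEW_SLAB	0x02

/* Stand-in for the former "if (new_slab)" test. */
static bool is_new_slab(unsigned int alloc_flags)
{
	return (alloc_flags & SLAB_ALLOC_NEW_SLAB) != 0;
}

/* Stand-in for account_slab(): add the flag, keep the caller's bits. */
static unsigned int account_slab_flags(unsigned int alloc_flags)
{
	return alloc_flags | SLAB_ALLOC_NEW_SLAB;
}
```

The point of the sketch is that the new-slab information now travels in
the same word as SLAB_ALLOC_NOLOCK, so one parameter carries both.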

Link: https://patch.msgid.link/20260610-slab_alloc_flags-v2-9-7190909db118@kernel.org
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
 mm/memcontrol.c |  5 +--
 mm/slab.h       |  6 ++--
 mm/slub.c       | 94 +++++++++++++++++++++++++++++++++------------------------
 3 files changed, 62 insertions(+), 43 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c03d4787d466..29390ba13baa 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3424,7 +3424,8 @@ static inline size_t obj_full_size(struct kmem_cache *s)
 }
 
 bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
-				  gfp_t flags, size_t size, void **p)
+				  gfp_t flags, unsigned int slab_alloc_flags,
+				  size_t size, void **p)
 {
 	size_t obj_size = obj_full_size(s);
 	struct obj_cgroup *objcg;
@@ -3472,7 +3473,7 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 		slab = virt_to_slab(p[i]);
 
 		if (!slab_obj_exts(slab) &&
-		    alloc_slab_obj_exts(slab, s, flags, false)) {
+		    alloc_slab_obj_exts(slab, s, flags, slab_alloc_flags)) {
 			continue;
 		}
 
diff --git a/mm/slab.h b/mm/slab.h
index f1246f0c9f74..d86203131f58 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -19,6 +19,7 @@
 /* slab's alloc_flags definitions */
 #define SLAB_ALLOC_DEFAULT	0x00 /* no flags */
 #define SLAB_ALLOC_NOLOCK	0x01 /* a kmalloc_nolock() allocation */
+#define SLAB_ALLOC_NEW_SLAB	0x02 /* a flag for alloc_slab_obj_exts() */
 
 static inline bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
 {
@@ -612,7 +613,7 @@ static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
 }
 
 int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
-                        gfp_t gfp, bool new_slab);
+			gfp_t gfp, unsigned int alloc_flags);
 
 #else /* CONFIG_SLAB_OBJ_EXT */
 
@@ -642,7 +643,8 @@ static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
 
 #ifdef CONFIG_MEMCG
 bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
-				  gfp_t flags, size_t size, void **p);
+				  gfp_t flags, unsigned int slab_alloc_flags,
+				  size_t size, void **p);
 void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
 			    void **p, int objects, unsigned long obj_exts);
 #endif
diff --git a/mm/slub.c b/mm/slub.c
index a975a2e727c8..465eb4db5770 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -218,6 +218,7 @@ struct slab_alloc_context {
 	unsigned long caller_addr;
 	size_t orig_size;
 	unsigned int alloc_flags;
+	struct list_lru *lru;
 };
 
 /* Structure holding parameters for get_partial_node_bulk() */
@@ -2155,9 +2156,9 @@ static inline size_t obj_exts_alloc_size(struct kmem_cache *s,
 }
 
 int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
-		        gfp_t gfp, bool new_slab)
+			gfp_t gfp, unsigned int alloc_flags)
 {
-	bool allow_spin = gfpflags_allow_spinning(gfp);
+	const bool allow_spin = alloc_flags_allow_spinning(alloc_flags);
 	unsigned int objects = objs_per_slab(s, slab);
 	unsigned long new_exts;
 	unsigned long old_exts;
@@ -2206,7 +2207,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 	old_exts = READ_ONCE(slab->obj_exts);
 	handle_failed_objexts_alloc(old_exts, vec, objects);
 
-	if (new_slab) {
+	if (alloc_flags & SLAB_ALLOC_NEW_SLAB) {
 		/*
 		 * If the slab is brand new and nobody can yet access its
 		 * obj_exts, no synchronization is required and obj_exts can
@@ -2331,7 +2332,7 @@ static inline void init_slab_obj_exts(struct slab *slab)
 }
 
 static int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
-			       gfp_t gfp, bool new_slab)
+			       gfp_t gfp, unsigned int alloc_flags)
 {
 	return 0;
 }
@@ -2351,10 +2352,10 @@ static inline void alloc_slab_obj_exts_early(struct kmem_cache *s,
 
 static inline unsigned long
 prepare_slab_obj_exts_hook(struct kmem_cache *s, struct slab *slab,
-			   gfp_t flags, void *p)
+			   gfp_t flags, unsigned int alloc_flags, void *p)
 {
 	if (!slab_obj_exts(slab) &&
-	    alloc_slab_obj_exts(slab, s, flags, false)) {
+	    alloc_slab_obj_exts(slab, s, flags, alloc_flags)) {
 		pr_warn_once("%s, %s: Failed to create slab extension vector!\n",
 			     __func__, s->name);
 		return 0;
@@ -2366,7 +2367,8 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, struct slab *slab,
 
 /* Should be called only if mem_alloc_profiling_enabled() */
 static noinline void
-__alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
+__alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags,
+				unsigned int alloc_flags)
 {
 	unsigned long obj_exts;
 	struct slabobj_ext *obj_ext;
@@ -2382,7 +2384,7 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
 		return;
 
 	slab = virt_to_slab(object);
-	obj_exts = prepare_slab_obj_exts_hook(s, slab, flags, object);
+	obj_exts = prepare_slab_obj_exts_hook(s, slab, flags, alloc_flags, object);
 	/*
 	 * Currently obj_exts is used only for allocation profiling.
 	 * If other users appear then mem_alloc_profiling_enabled()
@@ -2401,10 +2403,11 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
 }
 
 static inline void
-alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
+alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags,
+			      unsigned int alloc_flags)
 {
 	if (mem_alloc_profiling_enabled())
-		__alloc_tagging_slab_alloc_hook(s, object, flags);
+		__alloc_tagging_slab_alloc_hook(s, object, flags, alloc_flags);
 }
 
 /* Should be called only if mem_alloc_profiling_enabled() */
@@ -2443,7 +2446,8 @@ alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
 #else /* CONFIG_MEM_ALLOC_PROFILING */
 
 static inline void
-alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
+alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags,
+			      unsigned int alloc_flags)
 {
 }
 
@@ -2461,8 +2465,9 @@ alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
 static void memcg_alloc_abort_single(struct kmem_cache *s, void *object);
 
 static __fastpath_inline
-bool memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
-				gfp_t flags, size_t size, void **p)
+bool memcg_slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags,
+				size_t size, void **p,
+				const struct slab_alloc_context *ac)
 {
 	if (likely(!memcg_kmem_online()))
 		return true;
@@ -2470,7 +2475,8 @@ bool memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 	if (likely(!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT)))
 		return true;
 
-	if (likely(__memcg_slab_post_alloc_hook(s, lru, flags, size, p)))
+	if (likely(__memcg_slab_post_alloc_hook(s, ac->lru, flags,
+						ac->alloc_flags, size, p)))
 		return true;
 
 	if (likely(size == 1)) {
@@ -2558,14 +2564,15 @@ bool memcg_slab_post_charge(void *p, gfp_t flags)
 		put_slab_obj_exts(obj_exts);
 	}
 
-	return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
+	return __memcg_slab_post_alloc_hook(s, NULL, flags, SLAB_ALLOC_DEFAULT,
+					    1, &p);
 }
 
 #else /* CONFIG_MEMCG */
 static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s,
-					      struct list_lru *lru,
-					      gfp_t flags, size_t size,
-					      void **p)
+					      gfp_t flags,
+					      size_t size, void **p,
+					      const struct slab_alloc_context *ac)
 {
 	return true;
 }
@@ -3352,12 +3359,14 @@ static inline void init_freelist_randomization(void) { }
 #endif /* CONFIG_SLAB_FREELIST_RANDOM */
 
 static __always_inline void account_slab(struct slab *slab, int order,
-					 struct kmem_cache *s, gfp_t gfp)
+					 struct kmem_cache *s, gfp_t gfp,
+					 unsigned int alloc_flags)
 {
 	if (memcg_kmem_online() &&
 			(s->flags & SLAB_ACCOUNT) &&
 			!slab_obj_exts(slab))
-		alloc_slab_obj_exts(slab, s, gfp, true);
+		alloc_slab_obj_exts(slab, s, gfp,
+				    alloc_flags | SLAB_ALLOC_NEW_SLAB);
 
 	mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
 			    PAGE_SIZE << order);
@@ -3430,7 +3439,7 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags,
 	 * to prevent the array from being overwritten.
 	 */
 	alloc_slab_obj_exts_early(s, slab);
-	account_slab(slab, oo_order(oo), s, flags);
+	account_slab(slab, oo_order(oo), s, flags, alloc_flags);
 
 	return slab;
 }
@@ -4564,9 +4573,8 @@ struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags)
 }
 
 static __fastpath_inline
-bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
-			  gfp_t flags, size_t size, void **p,
-			  unsigned int orig_size)
+bool slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags, size_t size,
+			  void **p, const struct slab_alloc_context *ac)
 {
 	bool init = slab_want_init_on_alloc(flags, s);
 	unsigned int zero_size = s->object_size;
@@ -4585,7 +4593,7 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 	 * orig_size if we track it.
 	 */
 	if (slub_debug_orig_size(s))
-		zero_size = orig_size;
+		zero_size = ac->orig_size;
 
 	/*
 	 * ARM64 can set memory tags and zero the memory using a single
@@ -4615,14 +4623,14 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 		if (init && p[i] && !is_kfence_address(p[i]))
 			memset(p[i], 0, zero_size);
 
-		if (gfpflags_allow_spinning(flags))
+		if (alloc_flags_allow_spinning(ac->alloc_flags))
 			kmemleak_alloc_recursive(p[i], s->object_size, 1,
 						 s->flags, init_flags);
 		kmsan_slab_alloc(s, p[i], init_flags);
-		alloc_tagging_slab_alloc_hook(s, p[i], flags);
+		alloc_tagging_slab_alloc_hook(s, p[i], flags, ac->alloc_flags);
 	}
 
-	return memcg_slab_post_alloc_hook(s, lru, flags, size, p);
+	return memcg_slab_post_alloc_hook(s, flags, size, p, ac);
 }
 
 /*
@@ -4917,6 +4925,12 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
 {
 	const unsigned int alloc_flags = SLAB_ALLOC_DEFAULT;
 	void *object;
+	const struct slab_alloc_context ac = {
+		.caller_addr = addr,
+		.orig_size = orig_size,
+		.alloc_flags = alloc_flags,
+		.lru = lru,
+	};
 
 	s = slab_pre_alloc_hook(s, gfpflags);
 	if (unlikely(!s))
@@ -4928,14 +4942,8 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
 
 	object = alloc_from_pcs(s, gfpflags, alloc_flags, node);
 
-	if (unlikely(!object)) {
-		const struct slab_alloc_context ac = {
-			.caller_addr = addr,
-			.orig_size = orig_size,
-			.alloc_flags = alloc_flags,
-		};
+	if (unlikely(!object))
 		object = __slab_alloc_node(s, gfpflags, node, &ac);
-	}
 
 	maybe_wipe_obj_freeptr(s, object);
 
@@ -4944,7 +4952,7 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
 	 * In case this fails due to memcg_slab_post_alloc_hook(),
 	 * object is set to NULL
 	 */
-	slab_post_alloc_hook(s, lru, gfpflags, 1, &object, orig_size);
+	slab_post_alloc_hook(s, gfpflags, 1, &object, &ac);
 
 	return object;
 }
@@ -5239,6 +5247,10 @@ kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
 				   struct slab_sheaf *sheaf)
 {
 	void *ret = NULL;
+	const struct slab_alloc_context ac = {
+		.orig_size = s->object_size,
+		.alloc_flags = SLAB_ALLOC_DEFAULT,
+	};
 
 	if (sheaf->size == 0)
 		goto out;
@@ -5249,7 +5261,7 @@ kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
 		ret = sheaf->objects[--sheaf->size];
 
 	/* add __GFP_NOFAIL to force successful memcg charging */
-	slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, s->object_size);
+	slab_post_alloc_hook(s, gfp | __GFP_NOFAIL, 1, &ret, &ac);
 out:
 	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfp, NUMA_NO_NODE);
 
@@ -5435,7 +5447,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
 
 success:
 	maybe_wipe_obj_freeptr(s, ret);
-	slab_post_alloc_hook(s, NULL, alloc_gfp, 1, &ret, orig_size);
+	slab_post_alloc_hook(s, alloc_gfp, 1, &ret, &ac);
 
 	ret = kasan_kmalloc(s, ret, orig_size, alloc_gfp);
 	return ret;
@@ -7301,6 +7313,10 @@ bool kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags,
 {
 	unsigned int i = 0;
 	void *kfence_obj;
+	const struct slab_alloc_context ac = {
+		.orig_size = s->object_size,
+		.alloc_flags = SLAB_ALLOC_DEFAULT,
+	};
 
 	if (!size)
 		return false;
@@ -7351,7 +7367,7 @@ bool kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags,
 
 out:
 	/* memcg and kmem_cache debug support and memory initialization */
-	return likely(slab_post_alloc_hook(s, NULL, flags, size, p, s->object_size));
+	return likely(slab_post_alloc_hook(s, flags, size, p, &ac));
 }
 EXPORT_SYMBOL(kmem_cache_alloc_bulk_noprof);
 

-- 
2.54.0



* [PATCH v3 09/15] mm/slab: replace slab_alloc_node() parameters with slab_alloc_context
From: Vlastimil Babka (SUSE) @ 2026-06-15 11:54 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel,
	cgroups, Vlastimil Babka (SUSE)
In-Reply-To: <20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org>

The function takes all the parameters that exist as fields in
slab_alloc_context, except alloc_flags. Replace them with a single
pointer.

This moves slab_alloc_context initialization to a number of callers,
which is more verbose, but arguably also clearer than a long list of
parameters, and most do not use the 'lru' field.

This will also allow kmalloc_nolock() to call slab_alloc_node() and
reduce the special open-coding it currently has.
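
A userspace sketch of the caller-side pattern, for illustration only.
The struct layout follows the fields visible in this series;
make_plain_context() is a hypothetical helper shaped like the
initializer in kmem_cache_alloc_noprof(). It relies on C designated
initializers zero-initializing any member that is not named, so callers
that do not care about 'lru' get NULL without writing it out.

```c
#include <stddef.h>

struct list_lru;	/* opaque in this sketch */

/* Fields as they appear in the series at this point. */
struct slab_alloc_context {
	unsigned long caller_addr;
	size_t orig_size;
	unsigned int alloc_flags;
	struct list_lru *lru;
};

#define SLAB_ALLOC_DEFAULT	0x00

/* Hypothetical helper: a context for a caller that has no lru. */
static struct slab_alloc_context
make_plain_context(unsigned long caller_addr, size_t object_size)
{
	const struct slab_alloc_context ac = {
		.caller_addr = caller_addr,
		.orig_size = object_size,
		.alloc_flags = SLAB_ALLOC_DEFAULT,
		/* .lru is not named, so it is initialized to NULL */
	};

	return ac;
}
```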

Link: https://patch.msgid.link/20260610-slab_alloc_flags-v2-10-7190909db118@kernel.org
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
 mm/slub.c | 75 ++++++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 53 insertions(+), 22 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 465eb4db5770..562495b80d74 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4920,30 +4920,23 @@ unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, gfp_t gfp, size_t size,
  *
  * Otherwise we can simply pick the next object from the lockless free list.
  */
-static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru,
-		gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
+static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s,
+		gfp_t gfpflags, int node, const struct slab_alloc_context *ac)
 {
-	const unsigned int alloc_flags = SLAB_ALLOC_DEFAULT;
 	void *object;
-	const struct slab_alloc_context ac = {
-		.caller_addr = addr,
-		.orig_size = orig_size,
-		.alloc_flags = alloc_flags,
-		.lru = lru,
-	};
 
 	s = slab_pre_alloc_hook(s, gfpflags);
 	if (unlikely(!s))
 		return NULL;
 
-	object = kfence_alloc(s, orig_size, gfpflags);
+	object = kfence_alloc(s, ac->orig_size, gfpflags);
 	if (unlikely(object))
 		goto out;
 
-	object = alloc_from_pcs(s, gfpflags, alloc_flags, node);
+	object = alloc_from_pcs(s, gfpflags, ac->alloc_flags, node);
 
 	if (unlikely(!object))
-		object = __slab_alloc_node(s, gfpflags, node, &ac);
+		object = __slab_alloc_node(s, gfpflags, node, ac);
 
 	maybe_wipe_obj_freeptr(s, object);
 
@@ -4952,15 +4945,21 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
 	 * In case this fails due to memcg_slab_post_alloc_hook(),
 	 * object is set to NULL
 	 */
-	slab_post_alloc_hook(s, gfpflags, 1, &object, &ac);
+	slab_post_alloc_hook(s, gfpflags, 1, &object, ac);
 
 	return object;
 }
 
 void *kmem_cache_alloc_noprof(struct kmem_cache *s, gfp_t gfpflags)
 {
-	void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE, _RET_IP_,
-				    s->object_size);
+	void *ret;
+	const struct slab_alloc_context ac = {
+		.caller_addr = _RET_IP_,
+		.orig_size = s->object_size,
+		.alloc_flags = SLAB_ALLOC_DEFAULT,
+	};
+
+	ret = slab_alloc_node(s, gfpflags, NUMA_NO_NODE, &ac);
 
 	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
 
@@ -4971,8 +4970,15 @@ EXPORT_SYMBOL(kmem_cache_alloc_noprof);
 void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
 			   gfp_t gfpflags)
 {
-	void *ret = slab_alloc_node(s, lru, gfpflags, NUMA_NO_NODE, _RET_IP_,
-				    s->object_size);
+	void *ret;
+	const struct slab_alloc_context ac = {
+		.caller_addr = _RET_IP_,
+		.orig_size = s->object_size,
+		.alloc_flags = SLAB_ALLOC_DEFAULT,
+		.lru = lru,
+	};
+
+	ret = slab_alloc_node(s, gfpflags, NUMA_NO_NODE, &ac);
 
 	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
 
@@ -5004,7 +5010,14 @@ EXPORT_SYMBOL(kmem_cache_charge);
  */
 void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t gfpflags, int node)
 {
-	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
+	void *ret;
+	const struct slab_alloc_context ac = {
+		.caller_addr = _RET_IP_,
+		.orig_size = s->object_size,
+		.alloc_flags = SLAB_ALLOC_DEFAULT,
+	};
+
+	ret = slab_alloc_node(s, gfpflags, node, &ac);
 
 	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, node);
 
@@ -5334,6 +5347,11 @@ void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
 {
 	struct kmem_cache *s;
 	void *ret;
+	const struct slab_alloc_context ac = {
+		.caller_addr = caller,
+		.orig_size = size,
+		.alloc_flags = SLAB_ALLOC_DEFAULT,
+	};
 
 	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
 		ret = __kmalloc_large_node_noprof(size, flags, node);
@@ -5347,7 +5365,7 @@ void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
 
 	s = kmalloc_slab(size, b, flags, token);
 
-	ret = slab_alloc_node(s, NULL, flags, node, caller, size);
+	ret = slab_alloc_node(s, flags, node, &ac);
 	ret = kasan_kmalloc(s, ret, size, flags);
 	trace_kmalloc(caller, ret, size, s->size, flags, node);
 	return ret;
@@ -5465,8 +5483,14 @@ EXPORT_SYMBOL(__kmalloc_node_track_caller_noprof);
 
 void *__kmalloc_cache_noprof(struct kmem_cache *s, gfp_t gfpflags, size_t size)
 {
-	void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE,
-					    _RET_IP_, size);
+	void *ret;
+	const struct slab_alloc_context ac = {
+		.caller_addr = _RET_IP_,
+		.orig_size = size,
+		.alloc_flags = SLAB_ALLOC_DEFAULT,
+	};
+
+	ret = slab_alloc_node(s, gfpflags, NUMA_NO_NODE, &ac);
 
 	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
 
@@ -5478,7 +5502,14 @@ EXPORT_SYMBOL(__kmalloc_cache_noprof);
 void *__kmalloc_cache_node_noprof(struct kmem_cache *s, gfp_t gfpflags,
 				  int node, size_t size)
 {
-	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
+	void *ret;
+	const struct slab_alloc_context ac = {
+		.caller_addr = _RET_IP_,
+		.orig_size = size,
+		.alloc_flags = SLAB_ALLOC_DEFAULT,
+	};
+
+	ret = slab_alloc_node(s, gfpflags, node, &ac);
 
 	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);
 

-- 
2.54.0
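
The pattern used by the patch above (bundling per-call parameters into a
const context struct that each wrapper fills in with designated
initializers) can be sketched in user-space C. This is only an
illustration under stated assumptions, not kernel code: struct
alloc_context, ALLOC_DEFAULT, core_alloc_size(), alloc_plain() and
lru_is_unset() are hypothetical names, and the "allocation" merely rounds
a size up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct slab_alloc_context. */
struct alloc_context {
	unsigned long caller_addr;	/* address of the requesting caller */
	size_t orig_size;		/* size originally requested */
	unsigned int alloc_flags;	/* internal allocator flags */
	void *lru;			/* optional; NULL when not named */
};

#define ALLOC_DEFAULT 0u		/* hypothetical default flag value */

/* Core routine: one const pointer replaces a long parameter list. */
static size_t core_alloc_size(const struct alloc_context *ac)
{
	assert(ac != NULL);
	/* Toy "allocation": round the requested size up to a multiple of 8. */
	return (ac->orig_size + 7) & ~(size_t)7;
}

/* Wrapper: builds the context itself, then hands it to the core routine. */
static size_t alloc_plain(size_t size, unsigned long caller)
{
	const struct alloc_context ac = {
		.caller_addr = caller,
		.orig_size = size,
		.alloc_flags = ALLOC_DEFAULT,
		/* .lru is not named, so C zero-initializes it to NULL */
	};

	return core_alloc_size(&ac);
}

/* Members omitted from a designated initializer are zero-initialized. */
static int lru_is_unset(void)
{
	const struct alloc_context ac = { .orig_size = 1 };

	return ac.lru == NULL && ac.caller_addr == 0;
}
```

The wrappers become longer, but a caller that does not need a field such
as 'lru' simply leaves it out of the initializer.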


* [PATCH v3 10/15] mm/slab: allow kmem_cache_alloc_bulk() with any gfp flags
From: Vlastimil Babka (SUSE) @ 2026-06-15 11:54 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel,
	cgroups, Vlastimil Babka (SUSE)
In-Reply-To: <20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org>

The last user of gfpflags_allow_spinning() in slab is
alloc_from_pcs_bulk(), which is only called from
kmem_cache_alloc_bulk().

It turns out that gfpflags_allow_spinning() is not necessary, because
kmem_cache_alloc_bulk() is only expected to be called from a context
that allows spinning, so simply replace it with 'true'. This means we
can also drop the gfp parameter from alloc_from_pcs_bulk().

With that, we can remove the "@flags must allow spinning" part of the
kernel doc, as there is no more connection to the gfp flags in the slab
implementation.

Also remove a comment in alloc_slab_obj_exts() because there should be
no more false positives possible due to gfp_allowed_mask during early
boot.

Link: https://patch.msgid.link/20260610-slab_alloc_flags-v2-11-7190909db118@kernel.org
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
 mm/slub.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 562495b80d74..81938774098b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2171,12 +2171,6 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 
 	sz = obj_exts_alloc_size(s, slab, gfp);
 
-	/*
-	 * Note that allow_spin may be false during early boot and its
-	 * restricted GFP_BOOT_MASK. Due to kmalloc_nolock() only supporting
-	 * architectures with cmpxchg16b, early obj_exts will be missing for
-	 * very early allocations on those.
-	 */
 	if (unlikely(!allow_spin))
 		vec = kmalloc_nolock(sz, __GFP_ZERO | __GFP_NO_OBJ_EXT,
 				     slab_nid(slab));
@@ -4830,8 +4824,7 @@ void *alloc_from_pcs(struct kmem_cache *s, gfp_t gfp, unsigned int alloc_flags,
 }
 
 static __fastpath_inline
-unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, gfp_t gfp, size_t size,
-				 void **p)
+unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
 {
 	struct slub_percpu_sheaves *pcs;
 	struct slab_sheaf *main;
@@ -4866,7 +4859,7 @@ unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, gfp_t gfp, size_t size,
 		}
 
 		full = barn_replace_empty_sheaf(barn, pcs->main,
-						gfpflags_allow_spinning(gfp));
+						/* allow_spin = */ true);
 
 		if (full) {
 			stat(s, BARN_GET);
@@ -7331,8 +7324,7 @@ static bool __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
  * Allocate @size objects from @s and places them into @p.  @size must be larger
  * than 0.
  *
- * Interrupts must be enabled when calling this function and @flags must allow
- * spinning.
+ * Interrupts must be enabled when calling this function.
  *
  * Unlike alloc_pages_bulk(), this function does not check for already allocated
  * objects in @p, and thus the caller does not need to zero it.
@@ -7370,7 +7362,7 @@ bool kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags,
 		size--;
 	}
 
-	i = alloc_from_pcs_bulk(s, flags, size, p);
+	i = alloc_from_pcs_bulk(s, size, p);
 	if (i < size) {
 		/*
 		 * If we ran out of memory, don't bother with freeing back to

-- 
2.54.0


* [PATCH v3 11/15] mm/slab: pass slab_alloc_context to __do_kmalloc_node()
From: Vlastimil Babka (SUSE) @ 2026-06-15 11:54 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel,
	cgroups, Vlastimil Babka (SUSE)
In-Reply-To: <20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org>

With alloc_flags now used in slab, we can replace __GFP_NO_OBJ_EXT with
an alloc flag that prevents kmalloc recursion. For that we need a
version of kmalloc() that takes alloc_flags, to be used in places that
perform these potentially recursive kmalloc allocations (of sheaves or
obj_ext arrays).

As a preparatory step, make __do_kmalloc_node() take a pointer to
slab_alloc_context. This replaces the 'size' and 'caller' parameters and
includes alloc_flags which we'll make use of.

Link: https://patch.msgid.link/20260610-slab_alloc_flags-v2-12-7190909db118@kernel.org
Reviewed-by: Hao Li <hao.li@linux.dev>
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
 mm/slub.c | 54 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 36 insertions(+), 18 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 81938774098b..537ea68f417b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5335,20 +5335,16 @@ void *__kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
 EXPORT_SYMBOL(__kmalloc_large_node_noprof);
 
 static __always_inline
-void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
-			unsigned long caller, kmalloc_token_t token)
+void *__do_kmalloc_node(kmem_buckets *b, gfp_t flags, int node,
+			kmalloc_token_t token, const struct slab_alloc_context *ac)
 {
+	const size_t size = ac->orig_size;
 	struct kmem_cache *s;
 	void *ret;
-	const struct slab_alloc_context ac = {
-		.caller_addr = caller,
-		.orig_size = size,
-		.alloc_flags = SLAB_ALLOC_DEFAULT,
-	};
 
 	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
 		ret = __kmalloc_large_node_noprof(size, flags, node);
-		trace_kmalloc(caller, ret, size,
+		trace_kmalloc(ac->caller_addr, ret, size,
 			      PAGE_SIZE << get_order(size), flags, node);
 		return ret;
 	}
@@ -5358,22 +5354,34 @@ void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
 
 	s = kmalloc_slab(size, b, flags, token);
 
-	ret = slab_alloc_node(s, flags, node, &ac);
+	ret = slab_alloc_node(s, flags, node, ac);
 	ret = kasan_kmalloc(s, ret, size, flags);
-	trace_kmalloc(caller, ret, size, s->size, flags, node);
+	trace_kmalloc(ac->caller_addr, ret, size, s->size, flags, node);
 	return ret;
 }
 void *__kmalloc_node_noprof(DECL_KMALLOC_PARAMS(size, b, token), gfp_t flags, int node)
 {
-	return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node,
-				 _RET_IP_, PASS_TOKEN_PARAM(token));
+	const struct slab_alloc_context ac = {
+		.caller_addr = _RET_IP_,
+		.orig_size = size,
+		.alloc_flags = SLAB_ALLOC_DEFAULT,
+	};
+
+	return __do_kmalloc_node(PASS_BUCKET_PARAM(b), flags, node,
+				 PASS_TOKEN_PARAM(token), &ac);
 }
 EXPORT_SYMBOL(__kmalloc_node_noprof);
 
 void *__kmalloc_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t flags)
 {
-	return __do_kmalloc_node(size, NULL, flags,  NUMA_NO_NODE, _RET_IP_,
-				 PASS_TOKEN_PARAM(token));
+	const struct slab_alloc_context ac = {
+		.caller_addr = _RET_IP_,
+		.orig_size = size,
+		.alloc_flags = SLAB_ALLOC_DEFAULT,
+	};
+
+	return __do_kmalloc_node(NULL, flags,  NUMA_NO_NODE,
+				 PASS_TOKEN_PARAM(token), &ac);
 }
 EXPORT_SYMBOL(__kmalloc_noprof);
 
@@ -5468,9 +5476,14 @@ EXPORT_SYMBOL_GPL(_kmalloc_nolock_noprof);
 void *__kmalloc_node_track_caller_noprof(DECL_KMALLOC_PARAMS(size, b, token), gfp_t flags,
 					 int node, unsigned long caller)
 {
-	return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node,
-				 caller, PASS_TOKEN_PARAM(token));
+	const struct slab_alloc_context ac = {
+		.caller_addr = caller,
+		.orig_size = size,
+		.alloc_flags = SLAB_ALLOC_DEFAULT,
+	};
 
+	return __do_kmalloc_node(PASS_BUCKET_PARAM(b), flags, node,
+				 PASS_TOKEN_PARAM(token), &ac);
 }
 EXPORT_SYMBOL(__kmalloc_node_track_caller_noprof);
 
@@ -6871,14 +6884,19 @@ void *__kvmalloc_node_noprof(DECL_KMALLOC_PARAMS(size, b, token), unsigned long
 {
 	bool allow_block;
 	void *ret;
+	const struct slab_alloc_context ac = {
+		.caller_addr = _RET_IP_,
+		.orig_size = size,
+		.alloc_flags = SLAB_ALLOC_DEFAULT,
+	};
 
 	/*
 	 * It doesn't really make sense to fallback to vmalloc for sub page
 	 * requests
 	 */
-	ret = __do_kmalloc_node(size, PASS_BUCKET_PARAM(b),
+	ret = __do_kmalloc_node(PASS_BUCKET_PARAM(b),
 				kmalloc_gfp_adjust(flags, size),
-				node, _RET_IP_, PASS_TOKEN_PARAM(token));
+				node, PASS_TOKEN_PARAM(token), &ac);
 	if (ret || size <= PAGE_SIZE)
 		return ret;
 

-- 
2.54.0


* [PATCH v3 12/15] mm/slab: allow __GFP_NOMEMALLOC and __GFP_NOWARN for kmalloc_nolock()
From: Vlastimil Babka (SUSE) @ 2026-06-15 11:54 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel,
	cgroups, Vlastimil Babka (SUSE)
In-Reply-To: <20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org>

The two flags are added internally, so there's no point in warning if
the caller passes them as well. Allow them, which will in turn allow
simplifying obj_ext allocation under kmalloc_nolock().

Also, it's not necessary to have the extra alloc_gfp variable for adding
the two flags. The original gfp_flags parameter is not used anywhere
except for the warning. So remove alloc_gfp and directly modify and use
gfp_flags everywhere.

Link: https://patch.msgid.link/20260610-slab_alloc_flags-v2-13-7190909db118@kernel.org
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
 include/linux/slab.h |  3 ++-
 mm/slub.c            | 19 ++++++++++---------
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index ce1c867dc0ba..b955f3cbb732 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -1040,7 +1040,8 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
  * kmalloc_nolock - Allocate an object of given size from any context.
  * @size: size to allocate
  * @gfp_flags: GFP flags. Only __GFP_ACCOUNT, __GFP_ZERO, __GFP_NO_OBJ_EXT
- * allowed.
+ * allowed. Also __GFP_NOWARN and __GFP_NOMEMALLOC are allowed but added
+ * internally thus not necessary.
  * @node: node number of the target node.
  *
  * Return: pointer to the new object or NULL in case of error.
diff --git a/mm/slub.c b/mm/slub.c
index 537ea68f417b..8769083bec81 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5387,7 +5387,6 @@ EXPORT_SYMBOL(__kmalloc_noprof);
 
 void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, int node)
 {
-	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_NOMEMALLOC | gfp_flags;
 	size_t orig_size = size;
 	unsigned int alloc_flags = SLAB_ALLOC_NOLOCK;
 	struct kmem_cache *s;
@@ -5400,7 +5399,9 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
 	};
 
 	VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO |
-				      __GFP_NO_OBJ_EXT));
+			__GFP_NO_OBJ_EXT | __GFP_NOWARN | __GFP_NOMEMALLOC));
+
+	gfp_flags |= __GFP_NOWARN | __GFP_NOMEMALLOC;
 
 	if (unlikely(!size))
 		return ZERO_SIZE_PTR;
@@ -5419,7 +5420,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
 retry:
 	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
 		return NULL;
-	s = kmalloc_slab(size, NULL, alloc_gfp, PASS_TOKEN_PARAM(token));
+	s = kmalloc_slab(size, NULL, gfp_flags, PASS_TOKEN_PARAM(token));
 
 	if (!(s->flags & __CMPXCHG_DOUBLE) && !kmem_cache_debug(s))
 		/*
@@ -5433,7 +5434,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
 		 */
 		return NULL;
 
-	ret = alloc_from_pcs(s, alloc_gfp, alloc_flags, node);
+	ret = alloc_from_pcs(s, gfp_flags, alloc_flags, node);
 	if (ret)
 		goto success;
 
@@ -5443,7 +5444,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
 	 * kfence_alloc. Hence call __slab_alloc_node() (at most twice)
 	 * and slab_post_alloc_hook() directly.
 	 */
-	ret = __slab_alloc_node(s, alloc_gfp, node, &ac);
+	ret = __slab_alloc_node(s, gfp_flags, node, &ac);
 
 	/*
 	 * It's possible we failed due to trylock as we preempted someone with
@@ -5456,8 +5457,8 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
 		size = s->object_size + 1;
 		/*
 		 * Another alternative is to
-		 * if (memcg) alloc_gfp &= ~__GFP_ACCOUNT;
-		 * else if (!memcg) alloc_gfp |= __GFP_ACCOUNT;
+		 * if (memcg) gfp_flags &= ~__GFP_ACCOUNT;
+		 * else if (!memcg) gfp_flags |= __GFP_ACCOUNT;
 		 * to retry from bucket of the same size.
 		 */
 		can_retry = false;
@@ -5466,9 +5467,9 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
 
 success:
 	maybe_wipe_obj_freeptr(s, ret);
-	slab_post_alloc_hook(s, alloc_gfp, 1, &ret, &ac);
+	slab_post_alloc_hook(s, gfp_flags, 1, &ret, &ac);
 
-	ret = kasan_kmalloc(s, ret, orig_size, alloc_gfp);
+	ret = kasan_kmalloc(s, ret, orig_size, gfp_flags);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(_kmalloc_nolock_noprof);

-- 
2.54.0
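
The flag handling described in the patch above (widen the mask of flags
a caller may pass, then OR the internally required flags into the same
variable) can be sketched in user-space C. This is a minimal sketch
under stated assumptions: the F_* bits, flags_are_allowed() and
effective_flags() are hypothetical names with made-up values, and
assert() stands in for VM_WARN_ON_ONCE(), which only warns in the kernel
rather than aborting.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int flags_t;

/* Hypothetical flag bits; the values do not match the kernel's GFP bits. */
#define F_ACCOUNT	(1u << 0)
#define F_ZERO		(1u << 1)
#define F_NO_OBJ_EXT	(1u << 2)
#define F_NOWARN	(1u << 3)
#define F_NOMEMALLOC	(1u << 4)
#define F_OTHER		(1u << 5)	/* example of a flag that is not allowed */

/* Flags a caller may pass, including the two that are added anyway. */
#define F_CALLER_ALLOWED \
	(F_ACCOUNT | F_ZERO | F_NO_OBJ_EXT | F_NOWARN | F_NOMEMALLOC)

/* Flags that are always set internally. */
#define F_ALWAYS_SET	(F_NOWARN | F_NOMEMALLOC)

/* True when no bit outside the allowed mask is set. */
static bool flags_are_allowed(flags_t flags)
{
	return (flags & ~F_CALLER_ALLOWED) == 0;
}

/* Validate first, then modify the caller's flags directly. */
static flags_t effective_flags(flags_t flags)
{
	assert(flags_are_allowed(flags));	/* the kernel would only warn */
	flags |= F_ALWAYS_SET;
	return flags;
}
```

Because the OR is idempotent, passing F_NOWARN or F_NOMEMALLOC explicitly
yields the same effective flags as omitting them.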


