From: Harry Yoo <harry@kernel.org>
To: Suren Baghdasaryan <surenb@google.com>
Cc: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Hao Li <hao.li@linux.dev>, Christoph Lameter <cl@gentwo.org>,
David Rientjes <rientjes@google.com>,
Usama Arif <usama.arif@linux.dev>,
Meta kernel team <kernel-team@meta.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Danielle Costantino <dcostantino@meta.com>,
Kees Cook <kees@kernel.org>
Subject: Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
Date: Mon, 29 Jun 2026 12:57:47 +0900
Message-ID: <eac2ed38-432a-4d95-8968-2663540d73d7@kernel.org>
In-Reply-To: <CAJuCfpFanDcpqq5B7eNZbw2HnWOXD-xzcnrdbz-e2gnSM2EYHA@mail.gmail.com>

[ Adding Kees Cook for the SLAB_BUCKETS conversation ]

The thread:
https://lore.kernel.org/linux-mm/20260625230029.703750-1-shakeel.butt@linux.dev/

On 6/29/26 8:37 AM, Suren Baghdasaryan wrote:
> On Sun, Jun 28, 2026 at 2:22 AM Harry Yoo <harry@kernel.org> wrote:
>> On 6/28/26 4:47 PM, Vlastimil Babka (SUSE) wrote:
>>> On 6/28/26 5:23 AM, Shakeel Butt wrote:
>>>> On Sat, Jun 27, 2026 at 07:58:12PM -0700, Shakeel Butt wrote:
>>>>> On Fri, Jun 26, 2026 at 07:11:33PM +0200, Vlastimil Babka (SUSE) wrote:
>>>>> [...]
>>>>>>>>> Fix it structurally by removing cycles of every shape: serve the array
>>>>>>>>> from a cache strictly larger than the one it describes whenever it would
>>>>>>>>> otherwise come from the same or a smaller cache. Every reference edge
>>>>>>>>> then points from a smaller to a larger cache (here kmalloc-1k's array
>>>>>>>>> moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.
>>>>>>>>
>>>>>>>> This will fix the problem.
>>>>>>>>
>>>>>>>> But this will waste memory as we need smaller obj_exts array
>>>>>>>> as the size gets larger.
>>>>>>>>
>>>>>>>> We should probably create a new kmalloc type to avoid cycles instead?
>>>>>>>> (needed only when memory profiling is enabled, though)
>>>>>>>>
>>>>>>>> That would also prevent recursion even further.
>>>>>>>
>>>>>>> Yes but I assume that would add kmem caches even for users not using memory
>>>>>>> profiling. Anyways, I think that is a separate discussion. Am I understanding
>>>>>>> correctly that you don't have any concerns with this approach?
>>>>>>
>>>>>> Umm, the memory waste is a concern?
>>>>>>
>>>>>> Minimally I'd now want to only do that size bumping when allocation
>>>>>> profiling is enabled. Ideally that means both configured in and not booted
>>>>>> with "never".
>>>>>>
>>>>>> We probably should have done that already in 280ea9c3154b2. Because AFAIU
>>>>>> memcg-only obj_exts array don't have this issue (or maybe they do have the
>>>>>> [1] issue? Harry?). But if memcg-only should keep avoiding the same size
>>>>>> bucket, it can keep what it was doing and only memalloc profiling would do
>>>>>> the strictly larger thing.
>>>>>
>>>>> memcg should not have this issue as normal kmalloc caches do not serve memcg
>>>>> charged objects.
>>>>
>>>> I am wrong here as I went back and see d8df600b67d7.
>>
>> I was confused too :)
>>
>>> (8dafa9f5900c upstream)
>>>
>>>>>
>>>>> So here we can do dedicated caches as Harry suggested or make this size bumping
>>>>> very specialized as Vlastimil suggested. What do we want long term? Orthogonally
>>>
>>> Maybe long term we make kmem_buckets unconditional and use that.
>>>
>>>>> we do want this fix to be backported easily to older stable kernels. I will see
>>>>> how does this narrowed down size bumping looks like.
>>>>>
>>>>
>>>> BTW I think we need something like the following, right?
>>>>
>>>>     if (mem_alloc_profiling_enabled()) {
>>>>             if (obj_exts_cache->object_size <= s->object_size)
>>>>                     return s->object_size + 1;
>>>>     } else {
>>>>             if (obj_exts_cache->object_size == s->object_size)
>>>>                     return s->object_size + 1;
>>>>     }
>>
>> We should not add mem_alloc_profiling_enabled() check because,
>> then we're not fixing this issue on SLUB_TINY, when the caller specifies
>> __GFP_RECLAIMABLE|__GFP_ACCOUNT without memory allocation profiling.
>>
>> `if (!is_kmalloc_normal(s))` check already bails out when it doesn't
>> need to bump the size.
>>
>> So Shakeel's original code will work fine.
>>
>> We're only pessimizing memory allocation profiling and
>> SLUB_TINY && MEMCG users, but (as Vlastimil suggests off-list)
>> it wouldn't make much sense to enable MEMCG on memory restricted systems
>> anyway. (IIRC even raspberry pis don't enable the memory controller by
>> default...)
>>
>> I think it's okay to fix the bug first, but we need to address
>> the memory wastage issue sooner or later if companies (Meta and
>> Google I guess?) are deploying kernels with memory allocation profiling
>> on in production systems.
>
> Sorry for the delay folks. I just got a chance to read through this thread.
Hi Suren, no worries!
> I think adding a new KMALLOC_TYPE would be the cleanest way to fix
> this recursion problem once and for all. This size bumping and the
> special case of SLUB_TINY are quite confusing.
As mentioned by Vlastimil, in the long term, using the SLAB_BUCKETS
infrastructure would be more straightforward than a new KMALLOC_TYPE
because (I think) the kmalloc type is decided purely based on GFP
flags and we would need to somehow work around that. SLAB_BUCKETS
provides a nice abstraction to do this.

Luckily, SLAB_BUCKETS was introduced in v6.11.
Unfortunately, SLAB_BUCKETS is optional.
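
To make the GFP dependency concrete, here is a rough user-space model of
how the kmalloc type falls out of the allocation flags alone. The flag
values and all MODEL_/model_ names are made up for illustration, and the
priority order (DMA, then reclaimable, then accounted) is my reading of
kmalloc_type(), so please treat it as a sketch rather than the actual
kernel code:

```c
/* Hypothetical flag bits; the real __GFP_* values differ. */
#define MODEL_GFP_DMA		0x1u
#define MODEL_GFP_RECLAIMABLE	0x2u
#define MODEL_GFP_ACCOUNT	0x4u

enum model_kmalloc_type {
	MODEL_KMALLOC_NORMAL,
	MODEL_KMALLOC_DMA,
	MODEL_KMALLOC_RECLAIM,
	MODEL_KMALLOC_CGROUP,
};

/*
 * The type is a pure function of the flags. Nothing here lets a caller
 * say "this is an obj_exts array, keep it away from ordinary objects".
 */
enum model_kmalloc_type model_kmalloc_type(unsigned int gfp)
{
	/* No special bit set: plain KMALLOC_NORMAL. */
	if (!(gfp & (MODEL_GFP_DMA | MODEL_GFP_RECLAIMABLE | MODEL_GFP_ACCOUNT)))
		return MODEL_KMALLOC_NORMAL;
	/* Assumed priority order: DMA, then RECLAIMABLE, then ACCOUNT. */
	if (gfp & MODEL_GFP_DMA)
		return MODEL_KMALLOC_DMA;
	if (gfp & MODEL_GFP_RECLAIMABLE)
		return MODEL_KMALLOC_RECLAIM;
	return MODEL_KMALLOC_CGROUP;
}
```

With a scheme like this, a new kmalloc type for obj_exts arrays would
need either a new flag or a side channel, which is the workaround I was
referring to.
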
> We could define that
> new KMALLOC_TYPE only if memory allocation profiling or SLUB_TINY are
> enabled to avoid new caches when not needed. Does not seem too complex
> but maybe I'm missing something? WDYT?
I think we need some enhancements to achieve that with SLAB_BUCKETS:

1. Rename SLAB_BUCKETS to SLAB_BUCKETS_HARDENING
   (w/ SLAB_BUCKETS being a transitional config for _HARDENING)

2. Make the SLAB_BUCKETS infrastructure unconditional,
   but the decision is made at runtime:
   1) actually creating a kmem_buckets vs.
   2) falling back to kmalloc.

3. kmem_buckets_create() creates kmem_buckets only when
   SLAB_BUCKETS_HARDENING is enabled.

4. SLUB decides (not) to create kmem_buckets for internal use
   during the boot process. Use the kmem_buckets for obj_exts
   array allocation.

Side note: this would unconditionally add the kmem_buckets parameter to
the kmalloc slowpath. Probably it'd be worth introducing a dedicated
entrypoint for kmem_buckets instead.
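
A toy user-space model of items 2-4, just to show the shape I have in
mind. Every name below is hypothetical (none of these are existing
kernel functions), and what exactly feeds the "need_dedicated" decision
at boot is still open:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a set of per-size kmalloc caches. */
struct model_buckets {
	const char *name;
};

/* The regular kmalloc caches, always present. */
struct model_buckets model_kmalloc_normal = { "kmalloc" };

/* Item 3: the public constructor hands out a dedicated set only with hardening. */
struct model_buckets *model_buckets_create(struct model_buckets *storage,
					   bool hardening_enabled)
{
	return hardening_enabled ? storage : NULL;
}

/* Item 4: SLUB makes its own boot-time decision, independent of hardening. */
struct model_buckets *model_slub_obj_exts_buckets(struct model_buckets *storage,
						  bool need_dedicated)
{
	return need_dedicated ? storage : NULL;
}

/* Item 2: a NULL handle means "fall back to the regular kmalloc caches". */
struct model_buckets *model_alloc_from(struct model_buckets *b)
{
	return b ? b : &model_kmalloc_normal;
}
```

The point is that obj_exts arrays served from their own buckets can never
share a cache with the objects they describe, while users that don't need
the separation pay nothing beyond a NULL check.
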
> If it is more complex than I imagine then I'm fine with Shakeel's
> approach as a temporary fix.
Since the above requires quite a few changes, I'd say let's proceed with
the fix (since it's a one-line code change that fixes a bug),
and then see how we can make the SLAB_BUCKETS changes as minimal
as possible for backporting?
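
For anyone skimming the thread, the size bump can be modeled in user
space roughly as below. This is only a sketch: the size classes are
simplified to powers of two, the model_ names are made up, the `<=`
comparison is taken from the snippet quoted earlier in this mail, and the
is_kmalloc_normal() bail-out and other early returns are left out:

```c
#include <stddef.h>

/*
 * Simplified size classes: powers of two from 8 bytes up. The real
 * kmalloc caches also have 96- and 192-byte classes, ignored here.
 */
size_t model_size_class(size_t sz)
{
	size_t cls = 8;

	while (cls < sz)
		cls <<= 1;
	return cls;
}

/*
 * obj_size: object size of the kmalloc cache the array describes.
 * array_sz: bytes needed for that cache's obj_exts array.
 * Returns the size to request so that the array is served from a class
 * strictly larger than obj_size.
 */
size_t model_obj_exts_alloc_size(size_t obj_size, size_t array_sz)
{
	/* Same or smaller class: bump past obj_size to reach the next class. */
	if (model_size_class(array_sz) <= obj_size)
		return obj_size + 1;
	return array_sz;
}
```

In this model, model_obj_exts_alloc_size(1024, 256) returns 1025, so a
256-byte array for kmalloc-1k ends up in the 2k class. That is the memory
wastage we'll want to address later.
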
--
Cheers,
Harry / Hyeonggon