The Linux Kernel Mailing List
 help / color / mirror / Atom feed
From: Harry Yoo <harry@kernel.org>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hao Li <hao.li@linux.dev>, Christoph Lameter <cl@gentwo.org>,
	David Rientjes <rientjes@google.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Suren Baghdasaryan <surenb@google.com>, Hao Ge <hao.ge@linux.dev>,
	Kees Cook <kees@kernel.org>, Pedro Falcato <pfalcato@suse.de>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Danielle Constantino <dcostantino@meta.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC hotfixes 2/2] mm/slab: prevent unbounded recursion in free path with new kmalloc type
Date: Thu, 2 Jul 2026 22:20:34 +0900	[thread overview]
Message-ID: <b3f15af2-2c01-48bd-af87-e27df32388ea@kernel.org> (raw)
In-Reply-To: <de2e0d60-6954-4b54-bbff-ae78dbf9f789@kernel.org>


[-- Attachment #1.1: Type: text/plain, Size: 4623 bytes --]



On 7/2/26 9:57 PM, Vlastimil Babka (SUSE) wrote:
> On 7/2/26 06:09, Harry Yoo (Oracle) wrote:
>> Commit 280ea9c3154b ("mm/slab: avoid allocating slabobj_ext array from
>> its own slab") avoided recursive allocation of obj_exts from kmalloc
>> caches of the same size, by bumping the obj_exts array's allocation
>> size whenever the array size equals the size of the object being
>> allocated.
>>
>> However, as reported by Danielle Costantino and Shakeel Butt,
>> even slabs from kmalloc caches of different sizes can form a cycle
>> by allocating obj_exts arrays from each other [1]:
>>
>>   What happened: a KMALLOC_NORMAL slab's obj_exts array (used by
>>   allocation profiling / memcg accounting) is itself kmalloc()'d from a
>>   KMALLOC_NORMAL cache, so the "slab holds another slab's obj_exts array"
>>   relation can form cycles. With sizeof(struct slabobj_ext) == 16 and
>>   the host's geometry:
>>
>>   - kmalloc-512 has 64 objects/slab -> array is 64*16 == 1024 bytes,
>>     served from kmalloc-1k;
>>   - kmalloc-1k  has 32 objects/slab -> array is 32*16 ==  512 bytes,
>>     served from kmalloc-512.
>>
>>   A kmalloc-512 slab and a kmalloc-1k slab therefore hold each other's
>>   obj_exts array.  Discarding one frees the other's array, which empties
>>   and discards that slab, which frees the first's array, and so on:
>>   __free_slab() -> free_slab_obj_exts() -> kfree() -> discard_slab() ->
>>   __free_slab() recurses along the cycle until the stack is exhausted.
>>
>> With memory allocation profiling, this allows unbounded recursion
>> in the free path and led to a stack overflow on a production host in
>> the Meta fleet [1]:
>>
>>   BUG: TASK stack guard page was hit
>>   Oops: stack guard page
>>   RIP: 0010:kfree+0x8/0x5d0
>>   Call Trace:
>>    __free_slab+0x66/0xc0
>>    kfree+0x3f0/0x5d0
>>    ... ( ~125x __free_slab <-> kfree ) ...
>>    <kernel driver freeing a resource>
>>    do_syscall_64
>>
>> It is proposed [1] to resolve this issue by always serving the obj_exts
>> array allocation from kmalloc caches (or large kmalloc) of sizes larger
>> than the object size. However, as pointed out by Vlastimil Babka [2],
>> this can waste an excessive amount of memory as slabs from large
>> kmalloc sizes (e.g. kmalloc-8k) generally need obj_exts arrays much
>> smaller than the object size.
>>
>> Therefore, rather than bumping the size, let us take a different
>> approach; disallow formation of cycles between kmalloc types when
>> allocating obj_exts arrays. Currently, all obj_exts arrays are served
>> from normal kmalloc caches. Cycles cannot be created if obj_exts arrays
>> of normal kmalloc caches are served from a special kmalloc type that can
>> never have obj_exts arrays.
>>
>> To achieve this, create a new kmalloc type called KMALLOC_NO_OBJ_EXT.
>> KMALLOC_NO_OBJ_EXT caches are created when CONFIG_SLAB_OBJ_EXT is
>> enabled, and they have SLAB_NO_OBJ_EXT flag to prevent allocation
>> of obj_exts arrays. They remain unused until allocation of obj_exts
>> arrays for normal kmalloc caches happens.
> 
> I wonder if we should just use them always (not just for kmalloc_normal) if
> we already have them. Would there be any downside?

Good point!

That's more intuitive and sounds like it's good to separate them because
likely obj_exts will have longer lifetime than slab objects.

Not sure about impact on memory usage, need to check.
I'd say it's fine as long as it doesn't clearly increase memory usage.

But I guess that should not be part of bugfix as it's a functional
change that is not required to fix the bug.

>> @@ -426,6 +434,11 @@ static inline bool is_kmalloc_normal(struct kmem_cache *s)
>>  {
>>  	if (!is_kmalloc_cache(s))
>>  		return false;
>> +
>> +	/* KMALLOC_NO_OBJ_EXT is not normal kmalloc */
>> +	if (s->flags & SLAB_NO_OBJ_EXT)
>> +		return false;
> 
> Could it just go the the test below?

Yes!

>> +
>>  	return !(s->flags & (SLAB_CACHE_DMA|SLAB_ACCOUNT|SLAB_RECLAIM_ACCOUNT));

>> @@ -7957,10 +7940,10 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
>>  		s->allocflags |= __GFP_RECLAIMABLE;
>>  
>>  	/*
>> -	 * For KMALLOC_NORMAL caches we enable sheaves later by
>> -	 * bootstrap_kmalloc_sheaves() to avoid recursion
>> +	 * For kmalloc caches we enable sheaves later by
>> +	 * bootstrap_kmalloc_sheaves() to avoid recursion.
>>  	 */
>> -	if (!is_kmalloc_normal(s))
>> +	if (!(s->flags & SLAB_KMALLOC))
> 
> is_kmalloc_cache()?

Will do, thanks!

-- 
Cheers,
Harry / Hyeonggon


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

      reply	other threads:[~2026-07-02 13:20 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-07-02  4:09 [PATCH RFC hotfixes 0/2] mm/slab: fix unbounded recursion in free path with memalloc profiling Harry Yoo (Oracle)
2026-07-02  4:09 ` [PATCH RFC hotfixes 1/2] mm/slab: decouple SLAB_NO_SHEAVES from SLAB_NO_OBJ_EXT Harry Yoo (Oracle)
2026-07-02 12:49   ` Vlastimil Babka (SUSE)
2026-07-02  4:09 ` [PATCH RFC hotfixes 2/2] mm/slab: prevent unbounded recursion in free path with new kmalloc type Harry Yoo (Oracle)
2026-07-02 12:57   ` Vlastimil Babka (SUSE)
2026-07-02 13:20     ` Harry Yoo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b3f15af2-2c01-48bd-af87-e27df32388ea@kernel.org \
    --to=harry@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=cl@gentwo.org \
    --cc=dcostantino@meta.com \
    --cc=hao.ge@linux.dev \
    --cc=hao.li@linux.dev \
    --cc=kees@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pfalcato@suse.de \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeel.butt@linux.dev \
    --cc=surenb@google.com \
    --cc=vbabka@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox