From: Harry Yoo <harry@kernel.org>
To: Shakeel Butt <shakeel.butt@linux.dev>,
	Vlastimil Babka <vbabka@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>,
	Hao Li <hao.li@linux.dev>, Christoph Lameter <cl@gentwo.org>,
	David Rientjes <rientjes@google.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Usama Arif <usama.arif@linux.dev>,
	Meta kernel team <kernel-team@meta.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Danielle Costantino <dcostantino@meta.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
Date: Tue, 30 Jun 2026 14:11:45 +0900
Message-ID: <f07168be-23c6-4ff2-bb74-60509454ee01@kernel.org>
In-Reply-To: <20260630024357.3591304-1-shakeel.butt@linux.dev>

On 6/30/26 11:43 AM, Shakeel Butt wrote:
> A production host in the Meta fleet (6.16 kernel, memory allocation
> profiling enabled) panicked with a kernel stack overflow while a kernel
> driver was freeing a resource:
> 
>   BUG: TASK stack guard page was hit
>   Oops: stack guard page
>   RIP: 0010:kfree+0x8/0x5d0
>   Call Trace:
>    __free_slab+0x66/0xc0
>    kfree+0x3f0/0x5d0
>    ... ( ~125x __free_slab <-> kfree ) ...
>    <kernel driver freeing a resource>
>    do_syscall_64
> 
> The crash dump shows a 125-deep __free_slab<->kfree recursion that
> overflowed the 16 KiB kernel stack.
> 
> What happened: a KMALLOC_NORMAL slab's obj_exts array (used by allocation
> profiling / memcg accounting) is itself kmalloc()'d from a KMALLOC_NORMAL
> cache, so the "slab holds another slab's obj_exts array" relation can form
> cycles.  With sizeof(struct slabobj_ext) == 16 and the host's geometry:
> 
>   - kmalloc-512 has 64 objects/slab -> array is 64*16 == 1024 bytes,
>     served from kmalloc-1k;
>   - kmalloc-1k  has 32 objects/slab -> array is 32*16 ==  512 bytes,
>     served from kmalloc-512.
> 
> A kmalloc-512 slab and a kmalloc-1k slab therefore hold each other's
> obj_exts array.  Discarding one frees the other's array, which empties and
> discards that slab, which frees the first's array, and so on:
> __free_slab() -> free_slab_obj_exts() -> kfree() -> discard_slab() ->
> __free_slab() recurses along the cycle until the stack is exhausted.  The
> dump confirms it: the recursion's slabs strictly alternate kmalloc-512
> (obj_exts in kmalloc-1k) and kmalloc-1k (obj_exts in kmalloc-512), and
> mem_alloc_profiling_key was enabled.
> 
> Commit 280ea9c3154b ("mm/slab: avoid allocating slabobj_ext array from
> its own slab") is not sufficient: it bumps the allocation size only when
> the array would come from the *same* cache (object_size ==).  At the
> geometry above neither cache is self-referential (512 != 1024 and
> 1024 != 512), so the bump never triggers and the kmalloc-512 <-> kmalloc-1k
> cross cycle remains.
> 
> Fix it structurally by removing cycles of every shape: serve the array
> from a cache strictly larger than the one it describes whenever it would
> otherwise come from the same or a smaller cache.  Every reference edge
> then points from a smaller to a larger cache (here kmalloc-1k's array
> moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.
> No slab can be self- or cross-pinned, the tear-down recursion is bounded
> by the number of kmalloc size classes (it terminates at the large-kmalloc
> path, which carries no obj_exts), and profiling/accounting coverage is
> unchanged - the array is still allocated, only relocated.
> 
> Reproduced on next-20260623 at the same geometry: churning
> kmalloc-512/kmalloc-1k under vm.mem_profiling and then shrinking leaves
> kmalloc-512 with thousands of unreclaimable objects without this patch
> (8056) and at baseline with it (847).
> 
> Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
> Reported-by: Danielle Costantino <dcostantino@meta.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
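
For anyone following along, the geometry in the changelog can be checked
with a small user-space sketch. This is only a model, not the kernel
code: the helper names (kmalloc_class, exts_cache_before,
exts_cache_fixed) are hypothetical, the size classes are assumed to be
plain powers of two (the real kmalloc caches also include 96 and 192),
and the "fixed" rule is my reading of the changelog, not the actual diff.

```c
#include <assert.h>
#include <stddef.h>

/* sizeof(struct slabobj_ext) as stated in the changelog. */
#define SLABOBJ_EXT_SIZE 16u

/* Assumption: size classes are powers of two starting at 8 bytes. */
static size_t kmalloc_class(size_t size)
{
	size_t c = 8;

	assert(size > 0);
	while (c < size)
		c <<= 1;
	return c;
}

/*
 * Model of the behaviour after commit 280ea9c3154b: the array size is
 * bumped only when it would land in the very same cache.
 */
static size_t exts_cache_before(size_t object_size, size_t objs_per_slab)
{
	size_t c = kmalloc_class(objs_per_slab * SLABOBJ_EXT_SIZE);

	if (c == object_size)
		c <<= 1;
	return c;
}

/*
 * Model of the rule described in this patch: whenever the array would
 * come from the same or a smaller cache, serve it from a strictly
 * larger one, so every edge points from a smaller to a larger cache.
 */
static size_t exts_cache_fixed(size_t object_size, size_t objs_per_slab)
{
	size_t c = kmalloc_class(objs_per_slab * SLABOBJ_EXT_SIZE);

	if (c <= object_size)
		c = object_size << 1;
	return c;
}
```

Under these assumptions, exts_cache_before(512, 64) is 1024 and
exts_cache_before(1024, 32) is 512, which is the cross cycle from the
report. exts_cache_fixed(1024, 32) becomes 2048, matching the
"kmalloc-1k's array moves to kmalloc-2k" statement above, while
exts_cache_fixed(512, 64) stays at 1024.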

Looks good to me, so:
Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>

and it also passed my test suite, so:
Tested-by: Harry Yoo (Oracle) <harry@kernel.org>

Interestingly, Sashiko pointed out one issue [1] that doesn't sound
completely wrong. It is a pre-existing issue, though, and although
Sashiko (presumably) thinks this patch makes it easier to trigger,
I think the scenario is unreachable.

[1]
https://sashiko.dev/#/patchset/20260630024357.3591304-1-shakeel.butt%40linux.dev

Here's why I don't think anybody would be hitting it:

It says that if s->object_size == KMALLOC_MAX_CACHE_SIZE,
alloc_slab_obj_exts() will always fail with SLAB_ALLOC_NOLOCK because
kmalloc_nolock() does not support large kmalloc.

Then a later allocation of slab objects allocates the obj_exts array
(with large kmalloc), and freeing the slab in an unknown context tries
to free the obj_exts array with kfree_nolock(), which doesn't support
large kmalloc, so the obj_exts array is leaked.

However, a slab is freed in an unknown context only when trylock fails
right after allocating a new slab. At that point no later allocation
could have attached an obj_exts array to it, so the scenario is
unreachable.

-- 
Cheers,
Harry / Hyeonggon

