From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
To: "Harry Yoo (Oracle)" <harry@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Hao Li <hao.li@linux.dev>, Christoph Lameter <cl@gentwo.org>,
David Rientjes <rientjes@google.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Suren Baghdasaryan <surenb@google.com>, Hao Ge <hao.ge@linux.dev>,
Kees Cook <kees@kernel.org>, Pedro Falcato <pfalcato@suse.de>,
Shakeel Butt <shakeel.butt@linux.dev>,
Danielle Constantino <dcostantino@meta.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC hotfixes 2/2] mm/slab: prevent unbounded recursion in free path with new kmalloc type
Date: Thu, 2 Jul 2026 14:57:34 +0200 [thread overview]
Message-ID: <de2e0d60-6954-4b54-bbff-ae78dbf9f789@kernel.org> (raw)
In-Reply-To: <20260702-kmalloc-no-objext-v1-2-167175008538@kernel.org>
On 7/2/26 06:09, Harry Yoo (Oracle) wrote:
> Commit 280ea9c3154b ("mm/slab: avoid allocating slabobj_ext array from
> its own slab") avoided recursive allocation of obj_exts from kmalloc
> caches of the same size, by bumping the obj_exts array's allocation
> size whenever the array size equals the size of the object being
> allocated.
>
> However, as reported by Danielle Costantino and Shakeel Butt,
> even slabs from kmalloc caches of different sizes can form a cycle
> by allocating obj_exts arrays from each other [1]:
>
> What happened: a KMALLOC_NORMAL slab's obj_exts array (used by
> allocation profiling / memcg accounting) is itself kmalloc()'d from a
> KMALLOC_NORMAL cache, so the "slab holds another slab's obj_exts array"
> relation can form cycles. With sizeof(struct slabobj_ext) == 16 and
> the host's geometry:
>
> - kmalloc-512 has 64 objects/slab -> array is 64*16 == 1024 bytes,
> served from kmalloc-1k;
> - kmalloc-1k has 32 objects/slab -> array is 32*16 == 512 bytes,
> served from kmalloc-512.
>
> A kmalloc-512 slab and a kmalloc-1k slab therefore hold each other's
> obj_exts array. Discarding one frees the other's array, which empties
> and discards that slab, which frees the first's array, and so on:
> __free_slab() -> free_slab_obj_exts() -> kfree() -> discard_slab() ->
> __free_slab() recurses along the cycle until the stack is exhausted.
>
> With memory allocation profiling, this allows unbounded recursion
> in the free path and led to a stack overflow on a production host in
> the Meta fleet [1]:
>
> BUG: TASK stack guard page was hit
> Oops: stack guard page
> RIP: 0010:kfree+0x8/0x5d0
> Call Trace:
> __free_slab+0x66/0xc0
> kfree+0x3f0/0x5d0
> ... ( ~125x __free_slab <-> kfree ) ...
> <kernel driver freeing a resource>
> do_syscall_64
>
> It is proposed [1] to resolve this issue by always serving the obj_exts
> array allocation from kmalloc caches (or large kmalloc) of sizes larger
> than the object size. However, as pointed out by Vlastimil Babka [2],
> this can waste an excessive amount of memory as slabs from large
> kmalloc sizes (e.g. kmalloc-8k) generally need obj_exts arrays much
> smaller than the object size.
>
> Therefore, rather than bumping the size, let us take a different
> approach; disallow formation of cycles between kmalloc types when
> allocating obj_exts arrays. Currently, all obj_exts arrays are served
> from normal kmalloc caches. Cycles cannot be created if obj_exts arrays
> of normal kmalloc caches are served from a special kmalloc type that can
> never have obj_exts arrays.
>
> To achieve this, create a new kmalloc type called KMALLOC_NO_OBJ_EXT.
> KMALLOC_NO_OBJ_EXT caches are created when CONFIG_SLAB_OBJ_EXT is
> enabled, and they have SLAB_NO_OBJ_EXT flag to prevent allocation
> of obj_exts arrays. They remain unused until allocation of obj_exts
> arrays for normal kmalloc caches happens.
I wonder if we should just use them always (not just for kmalloc_normal) if
we already have them. Would there be any downside?
> Sheaf boostrapping for KMALLOC_NO_OBJ_EXT caches now must be deferred
> because allocation of a barn can trigger obj_exts array allocation of
> normal kmalloc caches when the KMALLOC_NO_OBJ_EXT cache for that size
> is not ready yet. For simplicity, perform bootstrapping of sheaves for
> all kmalloc caches later.
>
> Introduce a new slab alloc flag, SLAB_ALLOC_NO_OBJ_EXT, to prevent
> allocation of obj_exts arrays, and let kmalloc_slab() override the type
> to KMALLOC_NO_OBJ_EXT when specified. Note that kmalloc_type() remains
> unchanged because kmalloc_flags() bypasses the kmalloc fastpath.
>
> Do not pass SLAB_ALLOC_NO_RECURSE to kmalloc_flags() in
> alloc_slab_obj_exts() and instead use SLAB_ALLOC_NO_OBJ_EXT only when
> the objects are allocated from normal kmalloc caches. While this
> prevents unbounded recursive allocation of obj_exts, it allows
> KMALLOC_NO_OBJ_EXT caches to have sheaves.
>
> Since sheaf allocations specify SLAB_ALLOC_NO_RECURSE that prevents
> allocation of both sheaves and obj_exts arrays, the recursion depth
> is bounded.
>
> Reported-by: Danielle Costantino <dcostantino@meta.com>
> Reported-by: Shakeel Butt <shakeel.butt@linux.dev>
> Closes: https://lore.kernel.org/linux-mm/20260625230029.703750-1-shakeel.butt@linux.dev [1]
> Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
> Cc: stable@vger.kernel.org
> Link: https://lore.kernel.org/linux-mm/c5c4208d-a6f0-413e-bad9-49be12f12d55@kernel.org [2]
> Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
> ---
> include/linux/slab.h | 3 ++
> mm/slab.h | 17 +++++++++--
> mm/slab_common.c | 18 +++++++++++-
> mm/slub.c | 83 +++++++++++++++++++++-------------------------------
> 4 files changed, 68 insertions(+), 53 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 08d7b6c9c4d6..0c1d13773523 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -721,6 +721,9 @@ enum kmalloc_cache_type {
> #endif
> #ifdef CONFIG_MEMCG
> KMALLOC_CGROUP,
> +#endif
> +#ifdef CONFIG_SLAB_OBJ_EXT
> + KMALLOC_NO_OBJ_EXT,
> #endif
> NR_KMALLOC_TYPES
> };
> diff --git a/mm/slab.h b/mm/slab.h
> index 281a65233795..0428cd495191 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -22,6 +22,7 @@
> #define SLAB_ALLOC_NOLOCK 0x01 /* a kmalloc_nolock() allocation */
> #define SLAB_ALLOC_NEW_SLAB 0x02 /* a flag for alloc_slab_obj_exts() */
> #define SLAB_ALLOC_NO_RECURSE 0x04 /* prevent kmalloc() recursion */
> +#define SLAB_ALLOC_NO_OBJ_EXT 0x08 /* prevent obj_exts array allocation */
>
> static inline bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
> {
> @@ -386,12 +387,19 @@ static inline unsigned int size_index_elem(unsigned int bytes)
> * KMALLOC_MAX_CACHE_SIZE and the caller must check that.
> */
> static inline struct kmem_cache *
> -kmalloc_slab(size_t size, kmem_buckets *b, gfp_t flags, kmalloc_token_t token)
> +kmalloc_slab(size_t size, kmem_buckets *b, gfp_t flags, kmalloc_token_t token,
> + unsigned int alloc_flags)
> {
> unsigned int index;
> + enum kmalloc_cache_type type = kmalloc_type(flags, token);
> +
> +#ifdef CONFIG_SLAB_OBJ_EXT
> + if (alloc_flags & SLAB_ALLOC_NO_OBJ_EXT)
> + type = KMALLOC_NO_OBJ_EXT;
> +#endif
>
> if (!b)
> - b = &kmalloc_caches[kmalloc_type(flags, token)];
> + b = &kmalloc_caches[type];
> if (size <= 192)
> index = kmalloc_size_index[size_index_elem(size)];
> else
> @@ -426,6 +434,11 @@ static inline bool is_kmalloc_normal(struct kmem_cache *s)
> {
> if (!is_kmalloc_cache(s))
> return false;
> +
> + /* KMALLOC_NO_OBJ_EXT is not normal kmalloc */
> + if (s->flags & SLAB_NO_OBJ_EXT)
> + return false;
Could it just go the the test below?
> +
> return !(s->flags & (SLAB_CACHE_DMA|SLAB_ACCOUNT|SLAB_RECLAIM_ACCOUNT));
> }
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index b6426d7ceec9..7f262134d0f2 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -783,11 +783,15 @@ u8 kmalloc_size_index[24] __ro_after_init = {
> size_t kmalloc_size_roundup(size_t size)
> {
> if (size && size <= KMALLOC_MAX_CACHE_SIZE) {
> + struct kmem_cache *s;
> +
> /*
> * The flags don't matter since size_index is common to all.
> * Neither does the caller for just getting ->object_size.
> */
> - return kmalloc_slab(size, NULL, GFP_KERNEL, __kmalloc_token(0))->object_size;
> + s = kmalloc_slab(size, NULL, GFP_KERNEL, __kmalloc_token(0),
> + SLAB_ALLOC_DEFAULT);
> + return s->object_size;
> }
>
> /* Above the smaller buckets, size is a multiple of page size. */
> @@ -843,6 +847,12 @@ EXPORT_SYMBOL(kmalloc_size_roundup);
> #define KMALLOC_PARTITION_NAME(N, sz)
> #endif
>
> +#ifdef CONFIG_SLAB_OBJ_EXT
> +#define KMALLOC_NO_OBJ_EXT_NAME(sz) .name[KMALLOC_NO_OBJ_EXT] = "kmalloc-no-objext-" #sz,
> +#else
> +#define KMALLOC_NO_OBJ_EXT_NAME(sz)
> +#endif
> +
> #define INIT_KMALLOC_INFO(__size, __short_size) \
> { \
> .name[KMALLOC_NORMAL] = "kmalloc-" #__short_size, \
> @@ -850,6 +860,7 @@ EXPORT_SYMBOL(kmalloc_size_roundup);
> KMALLOC_CGROUP_NAME(__short_size) \
> KMALLOC_DMA_NAME(__short_size) \
> KMALLOC_PARTITION_NAME(KMALLOC_PARTITION_CACHES_NR, __short_size) \
> + KMALLOC_NO_OBJ_EXT_NAME(__short_size) \
> .size = __size, \
> }
>
> @@ -966,6 +977,11 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type)
> flags |= SLAB_NO_MERGE;
> #endif
>
> +#ifdef CONFIG_SLAB_OBJ_EXT
> + if (type == KMALLOC_NO_OBJ_EXT)
> + flags |= SLAB_NO_OBJ_EXT | SLAB_NO_MERGE;
> +#endif
> +
> /*
> * If CONFIG_MEMCG is enabled, disable cache merging for
> * KMALLOC_NORMAL caches.
> diff --git a/mm/slub.c b/mm/slub.c
> index efc85053ae84..8428b8308856 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2123,42 +2123,6 @@ static inline void init_slab_obj_exts(struct slab *slab)
> slab->obj_exts = 0;
> }
>
> -/*
> - * Calculate the allocation size for slabobj_ext array.
> - *
> - * When memory allocation profiling is enabled, the obj_exts array
> - * could be allocated from the same slab cache it's being allocated for.
> - * This would prevent the slab from ever being freed because it would
> - * always contain at least one allocated object (its own obj_exts array).
> - *
> - * To avoid this, increase the allocation size when we detect the array
> - * may come from the same cache, forcing it to use a different cache.
> - */
> -static inline size_t obj_exts_alloc_size(struct kmem_cache *s,
> - struct slab *slab, gfp_t gfp)
> -{
> - size_t sz = sizeof(struct slabobj_ext) * slab->objects;
> - struct kmem_cache *obj_exts_cache;
> -
> - if (sz > KMALLOC_MAX_CACHE_SIZE)
> - return sz;
> -
> - if (!is_kmalloc_normal(s))
> - return sz;
> -
> - obj_exts_cache = kmalloc_slab(sz, NULL, gfp, __kmalloc_token(0));
> - /*
> - * We can't simply compare s with obj_exts_cache, because partitioned kmalloc
> - * caches have multiple caches per size, selected by caller address or type.
> - * Since caller address or type may differ between kmalloc_slab() and actual
> - * allocation, bump size when sizes are equal.
> - */
> - if (s->object_size == obj_exts_cache->object_size)
> - return obj_exts_cache->object_size + 1;
> -
> - return sz;
> -}
> -
> int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> gfp_t gfp, unsigned int alloc_flags)
> {
> @@ -2168,14 +2132,18 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> unsigned long new_exts;
> unsigned long old_exts;
> struct slabobj_ext *vec;
> - size_t sz;
> + size_t sz = sizeof(struct slabobj_ext) * slab->objects;
>
> gfp &= ~OBJCGS_CLEAR_MASK;
> - /* Prevent recursive extension vector allocation */
> - alloc_flags |= SLAB_ALLOC_NO_RECURSE;
> - alloc_flags &= ~SLAB_ALLOC_NEW_SLAB;
> + /*
> + * In most cases, obj_exts arrays are allocated from normal kmalloc.
> + * However, normal kmalloc caches must allocate them from
> + * KMALLOC_NO_OBJ_EXT to caches to prevent recursion.
> + */
> + if (is_kmalloc_normal(s))
> + alloc_flags |= SLAB_ALLOC_NO_OBJ_EXT;
>
> - sz = obj_exts_alloc_size(s, slab, gfp);
> + alloc_flags &= ~SLAB_ALLOC_NEW_SLAB;
>
> /* This will use kmalloc_nolock() if alloc_flags say so */
> vec = kmalloc_flags(sz, gfp | __GFP_ZERO, alloc_flags, slab_nid(slab));
> @@ -2193,8 +2161,21 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> return -ENOMEM;
> }
>
> - VM_WARN_ON_ONCE(virt_to_slab(vec) != NULL &&
> - virt_to_slab(vec)->slab_cache == s);
> + if (IS_ENABLED(CONFIG_DEBUG_VM)) {
> + struct kmem_cache *exts_cache;
> + struct slab *exts_slab;
> +
> + exts_slab = virt_to_slab(vec);
> + if (exts_slab) {
> + /*
> + * The vector must be allocated from either normal or
> + * KMALLOC_NO_OBJ_EXT kmalloc caches to avoid cycles.
> + */
> + exts_cache = virt_to_slab(vec)->slab_cache;
> + WARN_ON_ONCE(!is_kmalloc_normal(exts_cache) &&
> + !(exts_cache->flags & SLAB_NO_OBJ_EXT));
> + }
> + }
>
> new_exts = (unsigned long)vec;
> #ifdef CONFIG_MEMCG
> @@ -2254,7 +2235,7 @@ static inline void free_slab_obj_exts(struct slab *slab, bool allow_spin)
> }
>
> /*
> - * obj_exts was created with SLAB_ALLOC_NO_RECURSE flag, therefore its
> + * obj_exts was created with SLAB_ALLOC_NO_OBJ_EXT flag, therefore its
> * corresponding extension will be NULL. alloc_tag_sub() will throw a
> * warning if slab has extensions but the extension of an object is
> * NULL, therefore replace NULL with CODETAG_EMPTY to indicate that
> @@ -5330,7 +5311,7 @@ void *__do_kmalloc_node(kmem_buckets *b, gfp_t flags, int node,
> if (unlikely(!size))
> return ZERO_SIZE_PTR;
>
> - s = kmalloc_slab(size, b, flags, token);
> + s = kmalloc_slab(size, b, flags, token, ac->alloc_flags);
>
> ret = slab_alloc_node(s, flags, node, ac);
> ret = kasan_kmalloc(s, ret, size, flags);
> @@ -5395,7 +5376,9 @@ static void *__kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_f
> retry:
> if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
> return NULL;
> - s = kmalloc_slab(size, NULL, gfp_flags, PASS_TOKEN_PARAM(token));
> +
> + s = kmalloc_slab(size, NULL, gfp_flags, PASS_TOKEN_PARAM(token),
> + ac->alloc_flags);
>
> if (!(s->flags & __CMPXCHG_DOUBLE) && !kmem_cache_debug(s))
> /*
> @@ -7957,10 +7940,10 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
> s->allocflags |= __GFP_RECLAIMABLE;
>
> /*
> - * For KMALLOC_NORMAL caches we enable sheaves later by
> - * bootstrap_kmalloc_sheaves() to avoid recursion
> + * For kmalloc caches we enable sheaves later by
> + * bootstrap_kmalloc_sheaves() to avoid recursion.
> */
> - if (!is_kmalloc_normal(s))
> + if (!(s->flags & SLAB_KMALLOC))
is_kmalloc_cache()?
> s->sheaf_capacity = calculate_sheaf_capacity(s, args);
>
> /*
> @@ -8524,7 +8507,7 @@ static void __init bootstrap_kmalloc_sheaves(void)
> {
> enum kmalloc_cache_type type;
>
> - for (type = KMALLOC_NORMAL; type <= KMALLOC_PARTITION_END; type++) {
> + for (type = KMALLOC_NORMAL; type < NR_KMALLOC_TYPES; type++) {
> for (int idx = 0; idx < KMALLOC_SHIFT_HIGH + 1; idx++) {
> if (kmalloc_caches[type][idx])
> bootstrap_cache_sheaves(kmalloc_caches[type][idx]);
>
next prev parent reply other threads:[~2026-07-02 12:57 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-07-02 4:09 [PATCH RFC hotfixes 0/2] mm/slab: fix unbounded recursion in free path with memalloc profiling Harry Yoo (Oracle)
2026-07-02 4:09 ` [PATCH RFC hotfixes 1/2] mm/slab: decouple SLAB_NO_SHEAVES from SLAB_NO_OBJ_EXT Harry Yoo (Oracle)
2026-07-02 12:49 ` Vlastimil Babka (SUSE)
2026-07-02 4:09 ` [PATCH RFC hotfixes 2/2] mm/slab: prevent unbounded recursion in free path with new kmalloc type Harry Yoo (Oracle)
2026-07-02 12:57 ` Vlastimil Babka (SUSE) [this message]
2026-07-02 13:20 ` Harry Yoo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=de2e0d60-6954-4b54-bbff-ae78dbf9f789@kernel.org \
--to=vbabka@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=cl@gentwo.org \
--cc=dcostantino@meta.com \
--cc=hao.ge@linux.dev \
--cc=hao.li@linux.dev \
--cc=harry@kernel.org \
--cc=kees@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=pfalcato@suse.de \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=surenb@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox