The Linux Kernel Mailing List
* [PATCH RFC hotfixes 0/2] mm/slab: fix unbounded recursion in free path with memalloc profiling
@ 2026-07-02  4:09 Harry Yoo (Oracle)
  2026-07-02  4:09 ` [PATCH RFC hotfixes 1/2] mm/slab: decouple SLAB_NO_SHEAVES from SLAB_NO_OBJ_EXT Harry Yoo (Oracle)
  2026-07-02  4:09 ` [PATCH RFC hotfixes 2/2] mm/slab: prevent unbounded recursion in free path with new kmalloc type Harry Yoo (Oracle)
  0 siblings, 2 replies; 6+ messages in thread
From: Harry Yoo (Oracle) @ 2026-07-02  4:09 UTC
  To: Vlastimil Babka, Andrew Morton, Hao Li, Christoph Lameter,
	David Rientjes, Roman Gushchin, Suren Baghdasaryan, Hao Ge,
	Kees Cook, Pedro Falcato, Shakeel Butt, Danielle Constantino
  Cc: linux-mm, linux-kernel, Harry Yoo (Oracle)

This is a follow-up fix after the recent discussion [1].
Based on slab/for-next (b0b6ec46e025f) and available at
git.kernel.org [2].

Instead of preventing cycles by bumping up the allocation size of
obj_exts arrays, this series introduces a new kmalloc type called
KMALLOC_NO_OBJ_EXT and disallows formation of cycles between kmalloc
types when allocating obj_exts arrays. obj_exts arrays of normal kmalloc
caches are served from KMALLOC_NO_OBJ_EXT caches (that don't have
obj_exts), and all other obj_exts arrays are served from normal kmalloc
caches.

I tried to reuse SLAB_ALLOC_NO_RECURSE to make kmalloc_slab() select
KMALLOC_NO_OBJ_EXT, but it was not great because it does not allow
sheaves for those caches. So I introduced a new slab alloc flag
SLAB_ALLOC_NO_OBJ_EXT.

To avoid huge confusion, I had to decouple "disallowing sheaves"
semantics from SLAB_NO_OBJ_EXT and introduced SLAB_NO_SHEAVES.

While this cannot be directly backported to v6.18 and v6.12 due to lack
of SLAB_ALLOC_* flags and kmalloc_flags(), I don't think it will be
particularly challenging to backport. Instead of a new slab alloc
flag, we can use __GFP_NO_OBJ_EXT to select KMALLOC_NO_OBJ_EXT as
kmalloc caches don't have sheaves in v6.18 anyway.

[1] https://lore.kernel.org/linux-mm/9a139365-28e6-4f1e-b35b-7f6091e9aa14@kernel.org

[2] https://git.kernel.org/pub/scm/linux/kernel/git/harry/linux.git/log/?h=kmalloc-no-objext-rfc-v1r3

Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
---
Harry Yoo (Oracle) (2):
      mm/slab: decouple SLAB_NO_SHEAVES from SLAB_NO_OBJ_EXT
      mm/slab: prevent unbounded recursion in free path with new kmalloc type

 include/linux/slab.h | 16 +++++++--
 mm/slab.h            | 17 ++++++++--
 mm/slab_common.c     | 18 +++++++++-
 mm/slub.c            | 93 ++++++++++++++++++++++------------------------------
 4 files changed, 85 insertions(+), 59 deletions(-)
---
base-commit: b0b6ec46e025fd46c344915a42bc535d9b15a1fb
change-id: 20260702-kmalloc-no-objext-2619c1e06083

Best regards,
-- 
Harry Yoo (Oracle) <harry@kernel.org>



* [PATCH RFC hotfixes 1/2] mm/slab: decouple SLAB_NO_SHEAVES from SLAB_NO_OBJ_EXT
  2026-07-02  4:09 [PATCH RFC hotfixes 0/2] mm/slab: fix unbounded recursion in free path with memalloc profiling Harry Yoo (Oracle)
@ 2026-07-02  4:09 ` Harry Yoo (Oracle)
  2026-07-02 12:49   ` Vlastimil Babka (SUSE)
  2026-07-02  4:09 ` [PATCH RFC hotfixes 2/2] mm/slab: prevent unbounded recursion in free path with new kmalloc type Harry Yoo (Oracle)
  1 sibling, 1 reply; 6+ messages in thread
From: Harry Yoo (Oracle) @ 2026-07-02  4:09 UTC
  To: Vlastimil Babka, Andrew Morton, Hao Li, Christoph Lameter,
	David Rientjes, Roman Gushchin, Suren Baghdasaryan, Hao Ge,
	Kees Cook, Pedro Falcato, Shakeel Butt, Danielle Constantino
  Cc: linux-mm, linux-kernel, Harry Yoo (Oracle)

Bootstrap caches are created with SLAB_NO_OBJ_EXT to disallow sheaves
and obj_exts.

To allow disabling obj_exts while allowing sheaves, decouple
SLAB_NO_SHEAVES from SLAB_NO_OBJ_EXT. Bootstrap caches now have both
SLAB_NO_SHEAVES and SLAB_NO_OBJ_EXT.

No functional change intended.
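
As an illustration only (not part of the patch), the decoupling can be
modeled in userspace. The F_* bits and the sheaves_allowed_*() helpers
below are hypothetical stand-ins for the SLAB_* flags and for the flag
test in calculate_sheaf_capacity(); the real function has further checks
that this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the SLAB_* cache flags; the values are made up. */
#define F_NO_OBJ_EXT	(1u << 0)
#define F_NO_SHEAVES	(1u << 1)
#define F_NOLEAKTRACE	(1u << 2)

/* Flags that bootstrap caches carry before and after this patch. */
#define BOOT_FLAGS_OLD	(F_NO_OBJ_EXT)
#define BOOT_FLAGS_NEW	(F_NO_SHEAVES | F_NO_OBJ_EXT)

/* Before: the no-obj_exts flag doubled as "no sheaves". */
static bool sheaves_allowed_old(unsigned int flags)
{
	return !(flags & (F_NO_OBJ_EXT | F_NOLEAKTRACE));
}

/* After: only the dedicated flag (or the no-leak-trace flag) disables sheaves. */
static bool sheaves_allowed_new(unsigned int flags)
{
	return !(flags & (F_NO_SHEAVES | F_NOLEAKTRACE));
}
```

Bootstrap caches get the same answer under both rules, while a cache
with only the no-obj_exts bit set may now have sheaves.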

Cc: stable@vger.kernel.org
Fixes: e47c897a2949 ("slab: add sheaves to most caches")
Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
---
 include/linux/slab.h | 13 +++++++++++--
 mm/slub.c            | 10 ++++++----
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 51f03f18c9a7..08d7b6c9c4d6 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -58,10 +58,13 @@ enum _slab_flag_bits {
 #endif
 	_SLAB_OBJECT_POISON,
 	_SLAB_CMPXCHG_DOUBLE,
+#ifdef CONFIG_SLAB_OBJ_EXT
 	_SLAB_NO_OBJ_EXT,
-#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
+#ifdef CONFIG_64BIT
 	_SLAB_OBJ_EXT_IN_OBJ,
 #endif
+#endif
+	_SLAB_NO_SHEAVES,
 	_SLAB_FLAGS_LAST_BIT
 };
 
@@ -239,8 +242,14 @@ enum _slab_flag_bits {
 #endif
 #define SLAB_TEMPORARY		SLAB_RECLAIM_ACCOUNT	/* Objects are short-lived */
 
-/* Slab created using create_boot_cache */
+/* Slab caches without obj_exts array */
+#ifdef CONFIG_SLAB_OBJ_EXT
 #define SLAB_NO_OBJ_EXT		__SLAB_FLAG_BIT(_SLAB_NO_OBJ_EXT)
+#else
+#define SLAB_NO_OBJ_EXT		__SLAB_FLAG_UNUSED
+#endif
+
+#define SLAB_NO_SHEAVES		__SLAB_FLAG_BIT(_SLAB_NO_SHEAVES)
 
 #if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
 #define SLAB_OBJ_EXT_IN_OBJ	__SLAB_FLAG_BIT(_SLAB_OBJ_EXT_IN_OBJ)
diff --git a/mm/slub.c b/mm/slub.c
index 9f754cf1c187..efc85053ae84 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -7777,12 +7777,12 @@ static unsigned int calculate_sheaf_capacity(struct kmem_cache *s,
 		return 0;
 
 	/*
-	 * Bootstrap caches can't have sheaves for now (SLAB_NO_OBJ_EXT).
+	 * Bootstrap caches can't have sheaves for now (SLAB_NO_SHEAVES).
 	 * SLAB_NOLEAKTRACE caches (e.g., kmemleak's object_cache) must not
 	 * have sheaves to avoid recursion when sheaf allocation triggers
 	 * kmemleak tracking.
 	 */
-	if (s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE))
+	if (s->flags & (SLAB_NO_SHEAVES | SLAB_NOLEAKTRACE))
 		return 0;
 
 	/*
@@ -8559,7 +8559,8 @@ void __init kmem_cache_init(void)
 
 	create_boot_cache(kmem_cache_node, "kmem_cache_node",
 			sizeof(struct kmem_cache_node),
-			SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
+			SLAB_HWCACHE_ALIGN | SLAB_NO_SHEAVES | SLAB_NO_OBJ_EXT,
+			0, 0);
 
 	hotplug_node_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
 
@@ -8569,7 +8570,8 @@ void __init kmem_cache_init(void)
 	create_boot_cache(kmem_cache, "kmem_cache",
 			offsetof(struct kmem_cache, per_node) +
 				nr_node_ids * sizeof(struct kmem_cache_per_node_ptrs),
-			SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
+			SLAB_HWCACHE_ALIGN | SLAB_NO_SHEAVES | SLAB_NO_OBJ_EXT,
+			0, 0);
 
 	kmem_cache = bootstrap(&boot_kmem_cache);
 	kmem_cache_node = bootstrap(&boot_kmem_cache_node);

-- 
2.53.0



* [PATCH RFC hotfixes 2/2] mm/slab: prevent unbounded recursion in free path with new kmalloc type
  2026-07-02  4:09 [PATCH RFC hotfixes 0/2] mm/slab: fix unbounded recursion in free path with memalloc profiling Harry Yoo (Oracle)
  2026-07-02  4:09 ` [PATCH RFC hotfixes 1/2] mm/slab: decouple SLAB_NO_SHEAVES from SLAB_NO_OBJ_EXT Harry Yoo (Oracle)
@ 2026-07-02  4:09 ` Harry Yoo (Oracle)
  2026-07-02 12:57   ` Vlastimil Babka (SUSE)
  1 sibling, 1 reply; 6+ messages in thread
From: Harry Yoo (Oracle) @ 2026-07-02  4:09 UTC
  To: Vlastimil Babka, Andrew Morton, Hao Li, Christoph Lameter,
	David Rientjes, Roman Gushchin, Suren Baghdasaryan, Hao Ge,
	Kees Cook, Pedro Falcato, Shakeel Butt, Danielle Constantino
  Cc: linux-mm, linux-kernel, Harry Yoo (Oracle)

Commit 280ea9c3154b ("mm/slab: avoid allocating slabobj_ext array from
its own slab") avoided recursive allocation of obj_exts from kmalloc
caches of the same size, by bumping the obj_exts array's allocation
size whenever the array size equals the size of the object being
allocated.

However, as reported by Danielle Costantino and Shakeel Butt,
even slabs from kmalloc caches of different sizes can form a cycle
by allocating obj_exts arrays from each other [1]:

  What happened: a KMALLOC_NORMAL slab's obj_exts array (used by
  allocation profiling / memcg accounting) is itself kmalloc()'d from a
  KMALLOC_NORMAL cache, so the "slab holds another slab's obj_exts array"
  relation can form cycles. With sizeof(struct slabobj_ext) == 16 and
  the host's geometry:

  - kmalloc-512 has 64 objects/slab -> array is 64*16 == 1024 bytes,
    served from kmalloc-1k;
  - kmalloc-1k  has 32 objects/slab -> array is 32*16 ==  512 bytes,
    served from kmalloc-512.

  A kmalloc-512 slab and a kmalloc-1k slab therefore hold each other's
  obj_exts array.  Discarding one frees the other's array, which empties
  and discards that slab, which frees the first's array, and so on:
  __free_slab() -> free_slab_obj_exts() -> kfree() -> discard_slab() ->
  __free_slab() recurses along the cycle until the stack is exhausted.
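
The arithmetic in the report can be replayed in userspace. This is only
a sketch: bucket_sizes[], exts_bucket() and forms_cycle() are
hypothetical names, the bucket list omits the 96 and 192 byte caches,
and the objects-per-slab numbers must be supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption taken from the report: sizeof(struct slabobj_ext) == 16. */
#define OBJ_EXT_SIZE	16

/* Simplified power-of-two kmalloc buckets (96 and 192 are left out). */
static const size_t bucket_sizes[] = {
	8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192
};
#define NR_BUCKETS	(sizeof(bucket_sizes) / sizeof(bucket_sizes[0]))

/* Index of the smallest bucket that fits @size, or -1 if no bucket does. */
static int bucket_for(size_t size)
{
	for (size_t i = 0; i < NR_BUCKETS; i++)
		if (size <= bucket_sizes[i])
			return (int)i;
	return -1;
}

/* Bucket serving the obj_exts array of a slab that holds @objects objects. */
static int exts_bucket(unsigned int objects)
{
	return bucket_for((size_t)objects * OBJ_EXT_SIZE);
}

/*
 * Follow the "my obj_exts array lives in bucket X" relation from @start.
 * @objs_per_slab[i] is the caller-supplied objects-per-slab of bucket i,
 * with 0 meaning "unknown". Returns true if the walk returns to @start.
 */
static bool forms_cycle(int start, const unsigned int *objs_per_slab)
{
	int cur = start;

	for (size_t step = 0; step < NR_BUCKETS; step++) {
		if (objs_per_slab[cur] == 0)
			return false;	/* geometry not given */
		cur = exts_bucket(objs_per_slab[cur]);
		if (cur < 0)
			return false;	/* too big for any bucket */
		if (cur == start)
			return true;
	}
	return false;
}
```

With objs_per_slab[] set to 64 for the 512-byte bucket and 32 for the
1024-byte bucket, forms_cycle() reports a cycle from either bucket,
matching the kmalloc-512 <-> kmalloc-1k case above.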

With memory allocation profiling, this allows unbounded recursion
in the free path and led to a stack overflow on a production host in
the Meta fleet [1]:

  BUG: TASK stack guard page was hit
  Oops: stack guard page
  RIP: 0010:kfree+0x8/0x5d0
  Call Trace:
   __free_slab+0x66/0xc0
   kfree+0x3f0/0x5d0
   ... ( ~125x __free_slab <-> kfree ) ...
   <kernel driver freeing a resource>
   do_syscall_64

It was proposed [1] to resolve this issue by always serving the obj_exts
array allocation from kmalloc caches (or large kmalloc) of sizes larger
than the object size. However, as pointed out by Vlastimil Babka [2],
this can waste an excessive amount of memory as slabs from large
kmalloc sizes (e.g. kmalloc-8k) generally need obj_exts arrays much
smaller than the object size.
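
A rough worked example of that waste, as a userspace sketch.
bumped_alloc_bytes() is a hypothetical model of the proposal in [1]: it
assumes the array is placed in the next power-of-two size strictly above
the object size, and the objects-per-slab values used with it are made
up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: sizeof(struct slabobj_ext) == 16, as in the report above. */
#define OBJ_EXT_SIZE	16

/* Smallest power of two that is >= @n (for n >= 1). */
static size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Bytes the obj_exts array of a slab with @objects objects really needs. */
static size_t exts_bytes(unsigned int objects)
{
	return (size_t)objects * OBJ_EXT_SIZE;
}

/*
 * Hypothetical model of the size-bumping proposal: the array must come
 * from an allocation size strictly larger than @object_size.
 */
static size_t bumped_alloc_bytes(size_t object_size, unsigned int objects)
{
	size_t need = exts_bytes(objects);
	size_t min = object_size + 1;

	return roundup_pow2(need > min ? need : min);
}
```

For a kmalloc-8k slab with, say, 4 objects, the array needs 64 bytes but
this model hands out 16384 bytes, 256 times more.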

Therefore, rather than bumping the size, let us take a different
approach: disallow formation of cycles between kmalloc types when
allocating obj_exts arrays. Currently, all obj_exts arrays are served
from normal kmalloc caches. Cycles cannot be created if obj_exts arrays
of normal kmalloc caches are served from a special kmalloc type that can
never have obj_exts arrays.

To achieve this, create a new kmalloc type called KMALLOC_NO_OBJ_EXT.
KMALLOC_NO_OBJ_EXT caches are created when CONFIG_SLAB_OBJ_EXT is
enabled, and they have the SLAB_NO_OBJ_EXT flag to prevent allocation
of obj_exts arrays. They remain unused until an obj_exts array is
allocated for a normal kmalloc cache.

Sheaf bootstrapping for KMALLOC_NO_OBJ_EXT caches now must be deferred
because allocation of a barn can trigger obj_exts array allocation of
normal kmalloc caches when the KMALLOC_NO_OBJ_EXT cache for that size
is not ready yet. For simplicity, perform bootstrapping of sheaves for
all kmalloc caches later.

Introduce a new slab alloc flag, SLAB_ALLOC_NO_OBJ_EXT, to prevent
allocation of obj_exts arrays, and let kmalloc_slab() override the type
to KMALLOC_NO_OBJ_EXT when specified. Note that kmalloc_type() remains
unchanged because kmalloc_flags() bypasses the kmalloc fastpath.

Do not pass SLAB_ALLOC_NO_RECURSE to kmalloc_flags() in
alloc_slab_obj_exts() and instead use SLAB_ALLOC_NO_OBJ_EXT only when
the objects are allocated from normal kmalloc caches. While this
prevents unbounded recursive allocation of obj_exts, it allows
KMALLOC_NO_OBJ_EXT caches to have sheaves.

Since sheaf allocations specify SLAB_ALLOC_NO_RECURSE that prevents
allocation of both sheaves and obj_exts arrays, the recursion depth
is bounded.
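
The resulting routing rule for obj_exts arrays can be summarized with a
small userspace model. The enum and helpers are hypothetical and only
mirror the description above; CLASS_OTHER stands for any cache that is
neither normal kmalloc nor KMALLOC_NO_OBJ_EXT, and sheaves and large
kmalloc are not modeled.

```c
#include <assert.h>

/* Hypothetical cache classes, not the kernel's enum kmalloc_cache_type. */
enum cache_class {
	CLASS_NORMAL,		/* normal kmalloc caches */
	CLASS_NO_OBJ_EXT,	/* KMALLOC_NO_OBJ_EXT caches, never have obj_exts */
	CLASS_OTHER,		/* every other cache */
	CLASS_NONE		/* no obj_exts array is allocated at all */
};

/* Which class serves the obj_exts arrays for slabs of class @c. */
static enum cache_class exts_served_from(enum cache_class c)
{
	switch (c) {
	case CLASS_NORMAL:
		return CLASS_NO_OBJ_EXT;
	case CLASS_OTHER:
		return CLASS_NORMAL;
	default:
		return CLASS_NONE;
	}
}

/* Number of nested obj_exts allocations a slab of class @c can trigger. */
static int exts_chain_depth(enum cache_class c)
{
	int depth = 0;

	while ((c = exts_served_from(c)) != CLASS_NONE)
		depth++;
	return depth;
}
```

In this model the chain never exceeds two steps (other -> normal ->
KMALLOC_NO_OBJ_EXT), so no class can end up holding its own arrays.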

Reported-by: Danielle Costantino <dcostantino@meta.com>
Reported-by: Shakeel Butt <shakeel.butt@linux.dev>
Closes: https://lore.kernel.org/linux-mm/20260625230029.703750-1-shakeel.butt@linux.dev [1]
Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-mm/c5c4208d-a6f0-413e-bad9-49be12f12d55@kernel.org [2]
Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
---
 include/linux/slab.h |  3 ++
 mm/slab.h            | 17 +++++++++--
 mm/slab_common.c     | 18 +++++++++++-
 mm/slub.c            | 83 +++++++++++++++++++++-------------------------------
 4 files changed, 68 insertions(+), 53 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 08d7b6c9c4d6..0c1d13773523 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -721,6 +721,9 @@ enum kmalloc_cache_type {
 #endif
 #ifdef CONFIG_MEMCG
 	KMALLOC_CGROUP,
+#endif
+#ifdef CONFIG_SLAB_OBJ_EXT
+	KMALLOC_NO_OBJ_EXT,
 #endif
 	NR_KMALLOC_TYPES
 };
diff --git a/mm/slab.h b/mm/slab.h
index 281a65233795..0428cd495191 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -22,6 +22,7 @@
 #define SLAB_ALLOC_NOLOCK	0x01 /* a kmalloc_nolock() allocation */
 #define SLAB_ALLOC_NEW_SLAB	0x02 /* a flag for alloc_slab_obj_exts() */
 #define SLAB_ALLOC_NO_RECURSE	0x04 /* prevent kmalloc() recursion */
+#define SLAB_ALLOC_NO_OBJ_EXT	0x08 /* prevent obj_exts array allocation */
 
 static inline bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
 {
@@ -386,12 +387,19 @@ static inline unsigned int size_index_elem(unsigned int bytes)
  * KMALLOC_MAX_CACHE_SIZE and the caller must check that.
  */
 static inline struct kmem_cache *
-kmalloc_slab(size_t size, kmem_buckets *b, gfp_t flags, kmalloc_token_t token)
+kmalloc_slab(size_t size, kmem_buckets *b, gfp_t flags, kmalloc_token_t token,
+	     unsigned int alloc_flags)
 {
 	unsigned int index;
+	enum kmalloc_cache_type type = kmalloc_type(flags, token);
+
+#ifdef CONFIG_SLAB_OBJ_EXT
+	if (alloc_flags & SLAB_ALLOC_NO_OBJ_EXT)
+		type = KMALLOC_NO_OBJ_EXT;
+#endif
 
 	if (!b)
-		b = &kmalloc_caches[kmalloc_type(flags, token)];
+		b = &kmalloc_caches[type];
 	if (size <= 192)
 		index = kmalloc_size_index[size_index_elem(size)];
 	else
@@ -426,6 +434,11 @@ static inline bool is_kmalloc_normal(struct kmem_cache *s)
 {
 	if (!is_kmalloc_cache(s))
 		return false;
+
+	/* KMALLOC_NO_OBJ_EXT is not normal kmalloc */
+	if (s->flags & SLAB_NO_OBJ_EXT)
+		return false;
+
 	return !(s->flags & (SLAB_CACHE_DMA|SLAB_ACCOUNT|SLAB_RECLAIM_ACCOUNT));
 }
 
diff --git a/mm/slab_common.c b/mm/slab_common.c
index b6426d7ceec9..7f262134d0f2 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -783,11 +783,15 @@ u8 kmalloc_size_index[24] __ro_after_init = {
 size_t kmalloc_size_roundup(size_t size)
 {
 	if (size && size <= KMALLOC_MAX_CACHE_SIZE) {
+		struct kmem_cache *s;
+
 		/*
 		 * The flags don't matter since size_index is common to all.
 		 * Neither does the caller for just getting ->object_size.
 		 */
-		return kmalloc_slab(size, NULL, GFP_KERNEL, __kmalloc_token(0))->object_size;
+		s = kmalloc_slab(size, NULL, GFP_KERNEL, __kmalloc_token(0),
+				 SLAB_ALLOC_DEFAULT);
+		return s->object_size;
 	}
 
 	/* Above the smaller buckets, size is a multiple of page size. */
@@ -843,6 +847,12 @@ EXPORT_SYMBOL(kmalloc_size_roundup);
 #define KMALLOC_PARTITION_NAME(N, sz)
 #endif
 
+#ifdef CONFIG_SLAB_OBJ_EXT
+#define KMALLOC_NO_OBJ_EXT_NAME(sz) .name[KMALLOC_NO_OBJ_EXT] = "kmalloc-no-objext-" #sz,
+#else
+#define KMALLOC_NO_OBJ_EXT_NAME(sz)
+#endif
+
 #define INIT_KMALLOC_INFO(__size, __short_size)			\
 {								\
 	.name[KMALLOC_NORMAL]  = "kmalloc-" #__short_size,	\
@@ -850,6 +860,7 @@ EXPORT_SYMBOL(kmalloc_size_roundup);
 	KMALLOC_CGROUP_NAME(__short_size)			\
 	KMALLOC_DMA_NAME(__short_size)				\
 	KMALLOC_PARTITION_NAME(KMALLOC_PARTITION_CACHES_NR, __short_size)	\
+	KMALLOC_NO_OBJ_EXT_NAME(__short_size)			\
 	.size = __size,						\
 }
 
@@ -966,6 +977,11 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type)
 		flags |= SLAB_NO_MERGE;
 #endif
 
+#ifdef CONFIG_SLAB_OBJ_EXT
+	if (type == KMALLOC_NO_OBJ_EXT)
+		flags |= SLAB_NO_OBJ_EXT | SLAB_NO_MERGE;
+#endif
+
 	/*
 	 * If CONFIG_MEMCG is enabled, disable cache merging for
 	 * KMALLOC_NORMAL caches.
diff --git a/mm/slub.c b/mm/slub.c
index efc85053ae84..8428b8308856 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2123,42 +2123,6 @@ static inline void init_slab_obj_exts(struct slab *slab)
 	slab->obj_exts = 0;
 }
 
-/*
- * Calculate the allocation size for slabobj_ext array.
- *
- * When memory allocation profiling is enabled, the obj_exts array
- * could be allocated from the same slab cache it's being allocated for.
- * This would prevent the slab from ever being freed because it would
- * always contain at least one allocated object (its own obj_exts array).
- *
- * To avoid this, increase the allocation size when we detect the array
- * may come from the same cache, forcing it to use a different cache.
- */
-static inline size_t obj_exts_alloc_size(struct kmem_cache *s,
-					 struct slab *slab, gfp_t gfp)
-{
-	size_t sz = sizeof(struct slabobj_ext) * slab->objects;
-	struct kmem_cache *obj_exts_cache;
-
-	if (sz > KMALLOC_MAX_CACHE_SIZE)
-		return sz;
-
-	if (!is_kmalloc_normal(s))
-		return sz;
-
-	obj_exts_cache = kmalloc_slab(sz, NULL, gfp, __kmalloc_token(0));
-	/*
-	 * We can't simply compare s with obj_exts_cache, because partitioned kmalloc
-	 * caches have multiple caches per size, selected by caller address or type.
-	 * Since caller address or type may differ between kmalloc_slab() and actual
-	 * allocation, bump size when sizes are equal.
-	 */
-	if (s->object_size == obj_exts_cache->object_size)
-		return obj_exts_cache->object_size + 1;
-
-	return sz;
-}
-
 int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 			gfp_t gfp, unsigned int alloc_flags)
 {
@@ -2168,14 +2132,18 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 	unsigned long new_exts;
 	unsigned long old_exts;
 	struct slabobj_ext *vec;
-	size_t sz;
+	size_t sz = sizeof(struct slabobj_ext) * slab->objects;
 
 	gfp &= ~OBJCGS_CLEAR_MASK;
-	/* Prevent recursive extension vector allocation */
-	alloc_flags |= SLAB_ALLOC_NO_RECURSE;
-	alloc_flags &= ~SLAB_ALLOC_NEW_SLAB;
+	/*
+	 * In most cases, obj_exts arrays are allocated from normal kmalloc.
+	 * However, normal kmalloc caches must allocate them from
+	 * KMALLOC_NO_OBJ_EXT caches to prevent recursion.
+	 */
+	if (is_kmalloc_normal(s))
+		alloc_flags |= SLAB_ALLOC_NO_OBJ_EXT;
 
-	sz = obj_exts_alloc_size(s, slab, gfp);
+	alloc_flags &= ~SLAB_ALLOC_NEW_SLAB;
 
 	/* This will use kmalloc_nolock() if alloc_flags say so */
 	vec = kmalloc_flags(sz, gfp | __GFP_ZERO, alloc_flags, slab_nid(slab));
@@ -2193,8 +2161,21 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 		return -ENOMEM;
 	}
 
-	VM_WARN_ON_ONCE(virt_to_slab(vec) != NULL &&
-			virt_to_slab(vec)->slab_cache == s);
+	if (IS_ENABLED(CONFIG_DEBUG_VM)) {
+		struct kmem_cache *exts_cache;
+		struct slab *exts_slab;
+
+		exts_slab = virt_to_slab(vec);
+		if (exts_slab) {
+			/*
+			 * The vector must be allocated from either normal or
+			 * KMALLOC_NO_OBJ_EXT kmalloc caches to avoid cycles.
+			 */
+			exts_cache = virt_to_slab(vec)->slab_cache;
+			WARN_ON_ONCE(!is_kmalloc_normal(exts_cache) &&
+					!(exts_cache->flags & SLAB_NO_OBJ_EXT));
+		}
+	}
 
 	new_exts = (unsigned long)vec;
 #ifdef CONFIG_MEMCG
@@ -2254,7 +2235,7 @@ static inline void free_slab_obj_exts(struct slab *slab, bool allow_spin)
 	}
 
 	/*
-	 * obj_exts was created with SLAB_ALLOC_NO_RECURSE flag, therefore its
+	 * obj_exts was created with SLAB_ALLOC_NO_OBJ_EXT flag, therefore its
 	 * corresponding extension will be NULL. alloc_tag_sub() will throw a
 	 * warning if slab has extensions but the extension of an object is
 	 * NULL, therefore replace NULL with CODETAG_EMPTY to indicate that
@@ -5330,7 +5311,7 @@ void *__do_kmalloc_node(kmem_buckets *b, gfp_t flags, int node,
 	if (unlikely(!size))
 		return ZERO_SIZE_PTR;
 
-	s = kmalloc_slab(size, b, flags, token);
+	s = kmalloc_slab(size, b, flags, token, ac->alloc_flags);
 
 	ret = slab_alloc_node(s, flags, node, ac);
 	ret = kasan_kmalloc(s, ret, size, flags);
@@ -5395,7 +5376,9 @@ static void *__kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_f
 retry:
 	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
 		return NULL;
-	s = kmalloc_slab(size, NULL, gfp_flags, PASS_TOKEN_PARAM(token));
+
+	s = kmalloc_slab(size, NULL, gfp_flags, PASS_TOKEN_PARAM(token),
+			 ac->alloc_flags);
 
 	if (!(s->flags & __CMPXCHG_DOUBLE) && !kmem_cache_debug(s))
 		/*
@@ -7957,10 +7940,10 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
 		s->allocflags |= __GFP_RECLAIMABLE;
 
 	/*
-	 * For KMALLOC_NORMAL caches we enable sheaves later by
-	 * bootstrap_kmalloc_sheaves() to avoid recursion
+	 * For kmalloc caches we enable sheaves later by
+	 * bootstrap_kmalloc_sheaves() to avoid recursion.
 	 */
-	if (!is_kmalloc_normal(s))
+	if (!(s->flags & SLAB_KMALLOC))
 		s->sheaf_capacity = calculate_sheaf_capacity(s, args);
 
 	/*
@@ -8524,7 +8507,7 @@ static void __init bootstrap_kmalloc_sheaves(void)
 {
 	enum kmalloc_cache_type type;
 
-	for (type = KMALLOC_NORMAL; type <= KMALLOC_PARTITION_END; type++) {
+	for (type = KMALLOC_NORMAL; type < NR_KMALLOC_TYPES; type++) {
 		for (int idx = 0; idx < KMALLOC_SHIFT_HIGH + 1; idx++) {
 			if (kmalloc_caches[type][idx])
 				bootstrap_cache_sheaves(kmalloc_caches[type][idx]);

-- 
2.53.0



* Re: [PATCH RFC hotfixes 1/2] mm/slab: decouple SLAB_NO_SHEAVES from SLAB_NO_OBJ_EXT
  2026-07-02  4:09 ` [PATCH RFC hotfixes 1/2] mm/slab: decouple SLAB_NO_SHEAVES from SLAB_NO_OBJ_EXT Harry Yoo (Oracle)
@ 2026-07-02 12:49   ` Vlastimil Babka (SUSE)
  0 siblings, 0 replies; 6+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-07-02 12:49 UTC
  To: Harry Yoo (Oracle), Andrew Morton, Hao Li, Christoph Lameter,
	David Rientjes, Roman Gushchin, Suren Baghdasaryan, Hao Ge,
	Kees Cook, Pedro Falcato, Shakeel Butt, Danielle Constantino
  Cc: linux-mm, linux-kernel

On 7/2/26 06:09, Harry Yoo (Oracle) wrote:
> Bootstrap caches are created with SLAB_NO_OBJ_EXT to disallow sheaves
> and obj_exts.
> 
> To allow disabling obj_exts while allowing sheaves, decouple
> SLAB_NO_SHEAVES from SLAB_NO_OBJ_EXT. Bootstrap caches now have both
> SLAB_NO_SHEAVES and SLAB_NO_OBJ_EXT.
> 
> No functional change intended.
> 
> Cc: stable@vger.kernel.org
> Fixes: e47c897a2949 ("slab: add sheaves to most caches")
> Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>

Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>

> ---
>  include/linux/slab.h | 13 +++++++++++--
>  mm/slub.c            | 10 ++++++----
>  2 files changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 51f03f18c9a7..08d7b6c9c4d6 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -58,10 +58,13 @@ enum _slab_flag_bits {
>  #endif
>  	_SLAB_OBJECT_POISON,
>  	_SLAB_CMPXCHG_DOUBLE,
> +#ifdef CONFIG_SLAB_OBJ_EXT
>  	_SLAB_NO_OBJ_EXT,
> -#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> +#ifdef CONFIG_64BIT
>  	_SLAB_OBJ_EXT_IN_OBJ,
>  #endif
> +#endif
> +	_SLAB_NO_SHEAVES,
>  	_SLAB_FLAGS_LAST_BIT
>  };
>  
> @@ -239,8 +242,14 @@ enum _slab_flag_bits {
>  #endif
>  #define SLAB_TEMPORARY		SLAB_RECLAIM_ACCOUNT	/* Objects are short-lived */
>  
> -/* Slab created using create_boot_cache */
> +/* Slab caches without obj_exts array */
> +#ifdef CONFIG_SLAB_OBJ_EXT
>  #define SLAB_NO_OBJ_EXT		__SLAB_FLAG_BIT(_SLAB_NO_OBJ_EXT)
> +#else
> +#define SLAB_NO_OBJ_EXT		__SLAB_FLAG_UNUSED
> +#endif
> +
> +#define SLAB_NO_SHEAVES		__SLAB_FLAG_BIT(_SLAB_NO_SHEAVES)
>  
>  #if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
>  #define SLAB_OBJ_EXT_IN_OBJ	__SLAB_FLAG_BIT(_SLAB_OBJ_EXT_IN_OBJ)
> diff --git a/mm/slub.c b/mm/slub.c
> index 9f754cf1c187..efc85053ae84 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -7777,12 +7777,12 @@ static unsigned int calculate_sheaf_capacity(struct kmem_cache *s,
>  		return 0;
>  
>  	/*
> -	 * Bootstrap caches can't have sheaves for now (SLAB_NO_OBJ_EXT).
> +	 * Bootstrap caches can't have sheaves for now (SLAB_NO_SHEAVES).
>  	 * SLAB_NOLEAKTRACE caches (e.g., kmemleak's object_cache) must not
>  	 * have sheaves to avoid recursion when sheaf allocation triggers
>  	 * kmemleak tracking.
>  	 */
> -	if (s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE))
> +	if (s->flags & (SLAB_NO_SHEAVES | SLAB_NOLEAKTRACE))
>  		return 0;
>  
>  	/*
> @@ -8559,7 +8559,8 @@ void __init kmem_cache_init(void)
>  
>  	create_boot_cache(kmem_cache_node, "kmem_cache_node",
>  			sizeof(struct kmem_cache_node),
> -			SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
> +			SLAB_HWCACHE_ALIGN | SLAB_NO_SHEAVES | SLAB_NO_OBJ_EXT,
> +			0, 0);
>  
>  	hotplug_node_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
>  
> @@ -8569,7 +8570,8 @@ void __init kmem_cache_init(void)
>  	create_boot_cache(kmem_cache, "kmem_cache",
>  			offsetof(struct kmem_cache, per_node) +
>  				nr_node_ids * sizeof(struct kmem_cache_per_node_ptrs),
> -			SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
> +			SLAB_HWCACHE_ALIGN | SLAB_NO_SHEAVES | SLAB_NO_OBJ_EXT,
> +			0, 0);
>  
>  	kmem_cache = bootstrap(&boot_kmem_cache);
>  	kmem_cache_node = bootstrap(&boot_kmem_cache_node);
> 



* Re: [PATCH RFC hotfixes 2/2] mm/slab: prevent unbounded recursion in free path with new kmalloc type
  2026-07-02  4:09 ` [PATCH RFC hotfixes 2/2] mm/slab: prevent unbounded recursion in free path with new kmalloc type Harry Yoo (Oracle)
@ 2026-07-02 12:57   ` Vlastimil Babka (SUSE)
  2026-07-02 13:20     ` Harry Yoo
  0 siblings, 1 reply; 6+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-07-02 12:57 UTC
  To: Harry Yoo (Oracle), Andrew Morton, Hao Li, Christoph Lameter,
	David Rientjes, Roman Gushchin, Suren Baghdasaryan, Hao Ge,
	Kees Cook, Pedro Falcato, Shakeel Butt, Danielle Constantino
  Cc: linux-mm, linux-kernel

On 7/2/26 06:09, Harry Yoo (Oracle) wrote:
> Commit 280ea9c3154b ("mm/slab: avoid allocating slabobj_ext array from
> its own slab") avoided recursive allocation of obj_exts from kmalloc
> caches of the same size, by bumping the obj_exts array's allocation
> size whenever the array size equals the size of the object being
> allocated.
> 
> However, as reported by Danielle Costantino and Shakeel Butt,
> even slabs from kmalloc caches of different sizes can form a cycle
> by allocating obj_exts arrays from each other [1]:
> 
>   What happened: a KMALLOC_NORMAL slab's obj_exts array (used by
>   allocation profiling / memcg accounting) is itself kmalloc()'d from a
>   KMALLOC_NORMAL cache, so the "slab holds another slab's obj_exts array"
>   relation can form cycles. With sizeof(struct slabobj_ext) == 16 and
>   the host's geometry:
> 
>   - kmalloc-512 has 64 objects/slab -> array is 64*16 == 1024 bytes,
>     served from kmalloc-1k;
>   - kmalloc-1k  has 32 objects/slab -> array is 32*16 ==  512 bytes,
>     served from kmalloc-512.
> 
>   A kmalloc-512 slab and a kmalloc-1k slab therefore hold each other's
>   obj_exts array.  Discarding one frees the other's array, which empties
>   and discards that slab, which frees the first's array, and so on:
>   __free_slab() -> free_slab_obj_exts() -> kfree() -> discard_slab() ->
>   __free_slab() recurses along the cycle until the stack is exhausted.
> 
> With memory allocation profiling, this allows unbounded recursion
> in the free path and led to a stack overflow on a production host in
> the Meta fleet [1]:
> 
>   BUG: TASK stack guard page was hit
>   Oops: stack guard page
>   RIP: 0010:kfree+0x8/0x5d0
>   Call Trace:
>    __free_slab+0x66/0xc0
>    kfree+0x3f0/0x5d0
>    ... ( ~125x __free_slab <-> kfree ) ...
>    <kernel driver freeing a resource>
>    do_syscall_64
> 
> It is proposed [1] to resolve this issue by always serving the obj_exts
> array allocation from kmalloc caches (or large kmalloc) of sizes larger
> than the object size. However, as pointed out by Vlastimil Babka [2],
> this can waste an excessive amount of memory as slabs from large
> kmalloc sizes (e.g. kmalloc-8k) generally need obj_exts arrays much
> smaller than the object size.
> 
> Therefore, rather than bumping the size, let us take a different
> approach; disallow formation of cycles between kmalloc types when
> allocating obj_exts arrays. Currently, all obj_exts arrays are served
> from normal kmalloc caches. Cycles cannot be created if obj_exts arrays
> of normal kmalloc caches are served from a special kmalloc type that can
> never have obj_exts arrays.
> 
> To achieve this, create a new kmalloc type called KMALLOC_NO_OBJ_EXT.
> KMALLOC_NO_OBJ_EXT caches are created when CONFIG_SLAB_OBJ_EXT is
> enabled, and they have SLAB_NO_OBJ_EXT flag to prevent allocation
> of obj_exts arrays. They remain unused until allocation of obj_exts
> arrays for normal kmalloc caches happens.

I wonder if we should just use them always (not just for kmalloc_normal) if
we already have them. Would there be any downside?

> Sheaf bootstrapping for KMALLOC_NO_OBJ_EXT caches now must be deferred
> because allocation of a barn can trigger obj_exts array allocation of
> normal kmalloc caches when the KMALLOC_NO_OBJ_EXT cache for that size
> is not ready yet. For simplicity, perform bootstrapping of sheaves for
> all kmalloc caches later.
> 
> Introduce a new slab alloc flag, SLAB_ALLOC_NO_OBJ_EXT, to prevent
> allocation of obj_exts arrays, and let kmalloc_slab() override the type
> to KMALLOC_NO_OBJ_EXT when specified. Note that kmalloc_type() remains
> unchanged because kmalloc_flags() bypasses the kmalloc fastpath.
> 
> Do not pass SLAB_ALLOC_NO_RECURSE to kmalloc_flags() in
> alloc_slab_obj_exts() and instead use SLAB_ALLOC_NO_OBJ_EXT only when
> the objects are allocated from normal kmalloc caches. While this
> prevents unbounded recursive allocation of obj_exts, it allows
> KMALLOC_NO_OBJ_EXT caches to have sheaves.
> 
> Since sheaf allocations specify SLAB_ALLOC_NO_RECURSE that prevents
> allocation of both sheaves and obj_exts arrays, the recursion depth
> is bounded.
> 
> Reported-by: Danielle Costantino <dcostantino@meta.com>
> Reported-by: Shakeel Butt <shakeel.butt@linux.dev>
> Closes: https://lore.kernel.org/linux-mm/20260625230029.703750-1-shakeel.butt@linux.dev [1]
> Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
> Cc: stable@vger.kernel.org
> Link: https://lore.kernel.org/linux-mm/c5c4208d-a6f0-413e-bad9-49be12f12d55@kernel.org [2]
> Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
> ---
>  include/linux/slab.h |  3 ++
>  mm/slab.h            | 17 +++++++++--
>  mm/slab_common.c     | 18 +++++++++++-
>  mm/slub.c            | 83 +++++++++++++++++++++-------------------------------
>  4 files changed, 68 insertions(+), 53 deletions(-)
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 08d7b6c9c4d6..0c1d13773523 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -721,6 +721,9 @@ enum kmalloc_cache_type {
>  #endif
>  #ifdef CONFIG_MEMCG
>  	KMALLOC_CGROUP,
> +#endif
> +#ifdef CONFIG_SLAB_OBJ_EXT
> +	KMALLOC_NO_OBJ_EXT,
>  #endif
>  	NR_KMALLOC_TYPES
>  };
> diff --git a/mm/slab.h b/mm/slab.h
> index 281a65233795..0428cd495191 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -22,6 +22,7 @@
>  #define SLAB_ALLOC_NOLOCK	0x01 /* a kmalloc_nolock() allocation */
>  #define SLAB_ALLOC_NEW_SLAB	0x02 /* a flag for alloc_slab_obj_exts() */
>  #define SLAB_ALLOC_NO_RECURSE	0x04 /* prevent kmalloc() recursion */
> +#define SLAB_ALLOC_NO_OBJ_EXT	0x08 /* prevent obj_exts array allocation */
>  
>  static inline bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
>  {
> @@ -386,12 +387,19 @@ static inline unsigned int size_index_elem(unsigned int bytes)
>   * KMALLOC_MAX_CACHE_SIZE and the caller must check that.
>   */
>  static inline struct kmem_cache *
> -kmalloc_slab(size_t size, kmem_buckets *b, gfp_t flags, kmalloc_token_t token)
> +kmalloc_slab(size_t size, kmem_buckets *b, gfp_t flags, kmalloc_token_t token,
> +	     unsigned int alloc_flags)
>  {
>  	unsigned int index;
> +	enum kmalloc_cache_type type = kmalloc_type(flags, token);
> +
> +#ifdef CONFIG_SLAB_OBJ_EXT
> +	if (alloc_flags & SLAB_ALLOC_NO_OBJ_EXT)
> +		type = KMALLOC_NO_OBJ_EXT;
> +#endif
>  
>  	if (!b)
> -		b = &kmalloc_caches[kmalloc_type(flags, token)];
> +		b = &kmalloc_caches[type];
>  	if (size <= 192)
>  		index = kmalloc_size_index[size_index_elem(size)];
>  	else
> @@ -426,6 +434,11 @@ static inline bool is_kmalloc_normal(struct kmem_cache *s)
>  {
>  	if (!is_kmalloc_cache(s))
>  		return false;
> +
> +	/* KMALLOC_NO_OBJ_EXT is not normal kmalloc */
> +	if (s->flags & SLAB_NO_OBJ_EXT)
> +		return false;

Could it just go into the test below?

> +
>  	return !(s->flags & (SLAB_CACHE_DMA|SLAB_ACCOUNT|SLAB_RECLAIM_ACCOUNT));
>  }
>  
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index b6426d7ceec9..7f262134d0f2 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -783,11 +783,15 @@ u8 kmalloc_size_index[24] __ro_after_init = {
>  size_t kmalloc_size_roundup(size_t size)
>  {
>  	if (size && size <= KMALLOC_MAX_CACHE_SIZE) {
> +		struct kmem_cache *s;
> +
>  		/*
>  		 * The flags don't matter since size_index is common to all.
>  		 * Neither does the caller for just getting ->object_size.
>  		 */
> -		return kmalloc_slab(size, NULL, GFP_KERNEL, __kmalloc_token(0))->object_size;
> +		s = kmalloc_slab(size, NULL, GFP_KERNEL, __kmalloc_token(0),
> +				 SLAB_ALLOC_DEFAULT);
> +		return s->object_size;
>  	}
>  
>  	/* Above the smaller buckets, size is a multiple of page size. */
> @@ -843,6 +847,12 @@ EXPORT_SYMBOL(kmalloc_size_roundup);
>  #define KMALLOC_PARTITION_NAME(N, sz)
>  #endif
>  
> +#ifdef CONFIG_SLAB_OBJ_EXT
> +#define KMALLOC_NO_OBJ_EXT_NAME(sz) .name[KMALLOC_NO_OBJ_EXT] = "kmalloc-no-objext-" #sz,
> +#else
> +#define KMALLOC_NO_OBJ_EXT_NAME(sz)
> +#endif
> +
>  #define INIT_KMALLOC_INFO(__size, __short_size)			\
>  {								\
>  	.name[KMALLOC_NORMAL]  = "kmalloc-" #__short_size,	\
> @@ -850,6 +860,7 @@ EXPORT_SYMBOL(kmalloc_size_roundup);
>  	KMALLOC_CGROUP_NAME(__short_size)			\
>  	KMALLOC_DMA_NAME(__short_size)				\
>  	KMALLOC_PARTITION_NAME(KMALLOC_PARTITION_CACHES_NR, __short_size)	\
> +	KMALLOC_NO_OBJ_EXT_NAME(__short_size)			\
>  	.size = __size,						\
>  }
>  
> @@ -966,6 +977,11 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type)
>  		flags |= SLAB_NO_MERGE;
>  #endif
>  
> +#ifdef CONFIG_SLAB_OBJ_EXT
> +	if (type == KMALLOC_NO_OBJ_EXT)
> +		flags |= SLAB_NO_OBJ_EXT | SLAB_NO_MERGE;
> +#endif
> +
>  	/*
>  	 * If CONFIG_MEMCG is enabled, disable cache merging for
>  	 * KMALLOC_NORMAL caches.
> diff --git a/mm/slub.c b/mm/slub.c
> index efc85053ae84..8428b8308856 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2123,42 +2123,6 @@ static inline void init_slab_obj_exts(struct slab *slab)
>  	slab->obj_exts = 0;
>  }
>  
> -/*
> - * Calculate the allocation size for slabobj_ext array.
> - *
> - * When memory allocation profiling is enabled, the obj_exts array
> - * could be allocated from the same slab cache it's being allocated for.
> - * This would prevent the slab from ever being freed because it would
> - * always contain at least one allocated object (its own obj_exts array).
> - *
> - * To avoid this, increase the allocation size when we detect the array
> - * may come from the same cache, forcing it to use a different cache.
> - */
> -static inline size_t obj_exts_alloc_size(struct kmem_cache *s,
> -					 struct slab *slab, gfp_t gfp)
> -{
> -	size_t sz = sizeof(struct slabobj_ext) * slab->objects;
> -	struct kmem_cache *obj_exts_cache;
> -
> -	if (sz > KMALLOC_MAX_CACHE_SIZE)
> -		return sz;
> -
> -	if (!is_kmalloc_normal(s))
> -		return sz;
> -
> -	obj_exts_cache = kmalloc_slab(sz, NULL, gfp, __kmalloc_token(0));
> -	/*
> -	 * We can't simply compare s with obj_exts_cache, because partitioned kmalloc
> -	 * caches have multiple caches per size, selected by caller address or type.
> -	 * Since caller address or type may differ between kmalloc_slab() and actual
> -	 * allocation, bump size when sizes are equal.
> -	 */
> -	if (s->object_size == obj_exts_cache->object_size)
> -		return obj_exts_cache->object_size + 1;
> -
> -	return sz;
> -}
> -
>  int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
>  			gfp_t gfp, unsigned int alloc_flags)
>  {
> @@ -2168,14 +2132,18 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
>  	unsigned long new_exts;
>  	unsigned long old_exts;
>  	struct slabobj_ext *vec;
> -	size_t sz;
> +	size_t sz = sizeof(struct slabobj_ext) * slab->objects;
>  
>  	gfp &= ~OBJCGS_CLEAR_MASK;
> -	/* Prevent recursive extension vector allocation */
> -	alloc_flags |= SLAB_ALLOC_NO_RECURSE;
> -	alloc_flags &= ~SLAB_ALLOC_NEW_SLAB;
> +	/*
> +	 * In most cases, obj_exts arrays are allocated from normal kmalloc.
> +	 * However, normal kmalloc caches must allocate them from
> +	 * KMALLOC_NO_OBJ_EXT caches to prevent recursion.
> +	 */
> +	if (is_kmalloc_normal(s))
> +		alloc_flags |= SLAB_ALLOC_NO_OBJ_EXT;
>  
> -	sz = obj_exts_alloc_size(s, slab, gfp);
> +	alloc_flags &= ~SLAB_ALLOC_NEW_SLAB;
>  
>  	/* This will use kmalloc_nolock() if alloc_flags say so */
>  	vec = kmalloc_flags(sz, gfp | __GFP_ZERO, alloc_flags, slab_nid(slab));
> @@ -2193,8 +2161,21 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
>  		return -ENOMEM;
>  	}
>  
> -	VM_WARN_ON_ONCE(virt_to_slab(vec) != NULL &&
> -			virt_to_slab(vec)->slab_cache == s);
> +	if (IS_ENABLED(CONFIG_DEBUG_VM)) {
> +		struct kmem_cache *exts_cache;
> +		struct slab *exts_slab;
> +
> +		exts_slab = virt_to_slab(vec);
> +		if (exts_slab) {
> +			/*
> +			 * The vector must be allocated from either normal or
> +			 * KMALLOC_NO_OBJ_EXT kmalloc caches to avoid cycles.
> +			 */
> +			exts_cache = virt_to_slab(vec)->slab_cache;
> +			WARN_ON_ONCE(!is_kmalloc_normal(exts_cache) &&
> +					!(exts_cache->flags & SLAB_NO_OBJ_EXT));
> +		}
> +	}
>  
>  	new_exts = (unsigned long)vec;
>  #ifdef CONFIG_MEMCG
> @@ -2254,7 +2235,7 @@ static inline void free_slab_obj_exts(struct slab *slab, bool allow_spin)
>  	}
>  
>  	/*
> -	 * obj_exts was created with SLAB_ALLOC_NO_RECURSE flag, therefore its
> +	 * obj_exts was created with SLAB_ALLOC_NO_OBJ_EXT flag, therefore its
>  	 * corresponding extension will be NULL. alloc_tag_sub() will throw a
>  	 * warning if slab has extensions but the extension of an object is
>  	 * NULL, therefore replace NULL with CODETAG_EMPTY to indicate that
> @@ -5330,7 +5311,7 @@ void *__do_kmalloc_node(kmem_buckets *b, gfp_t flags, int node,
>  	if (unlikely(!size))
>  		return ZERO_SIZE_PTR;
>  
> -	s = kmalloc_slab(size, b, flags, token);
> +	s = kmalloc_slab(size, b, flags, token, ac->alloc_flags);
>  
>  	ret = slab_alloc_node(s, flags, node, ac);
>  	ret = kasan_kmalloc(s, ret, size, flags);
> @@ -5395,7 +5376,9 @@ static void *__kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_f
>  retry:
>  	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
>  		return NULL;
> -	s = kmalloc_slab(size, NULL, gfp_flags, PASS_TOKEN_PARAM(token));
> +
> +	s = kmalloc_slab(size, NULL, gfp_flags, PASS_TOKEN_PARAM(token),
> +			 ac->alloc_flags);
>  
>  	if (!(s->flags & __CMPXCHG_DOUBLE) && !kmem_cache_debug(s))
>  		/*
> @@ -7957,10 +7940,10 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
>  		s->allocflags |= __GFP_RECLAIMABLE;
>  
>  	/*
> -	 * For KMALLOC_NORMAL caches we enable sheaves later by
> -	 * bootstrap_kmalloc_sheaves() to avoid recursion
> +	 * For kmalloc caches we enable sheaves later by
> +	 * bootstrap_kmalloc_sheaves() to avoid recursion.
>  	 */
> -	if (!is_kmalloc_normal(s))
> +	if (!(s->flags & SLAB_KMALLOC))

is_kmalloc_cache()?

>  		s->sheaf_capacity = calculate_sheaf_capacity(s, args);
>  
>  	/*
> @@ -8524,7 +8507,7 @@ static void __init bootstrap_kmalloc_sheaves(void)
>  {
>  	enum kmalloc_cache_type type;
>  
> -	for (type = KMALLOC_NORMAL; type <= KMALLOC_PARTITION_END; type++) {
> +	for (type = KMALLOC_NORMAL; type < NR_KMALLOC_TYPES; type++) {
>  		for (int idx = 0; idx < KMALLOC_SHIFT_HIGH + 1; idx++) {
>  			if (kmalloc_caches[type][idx])
>  				bootstrap_cache_sheaves(kmalloc_caches[type][idx]);
> 



* Re: [PATCH RFC hotfixes 2/2] mm/slab: prevent unbounded recursion in free path with new kmalloc type
  2026-07-02 12:57   ` Vlastimil Babka (SUSE)
@ 2026-07-02 13:20     ` Harry Yoo
  0 siblings, 0 replies; 6+ messages in thread
From: Harry Yoo @ 2026-07-02 13:20 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE), Andrew Morton, Hao Li, Christoph Lameter,
	David Rientjes, Roman Gushchin, Suren Baghdasaryan, Hao Ge,
	Kees Cook, Pedro Falcato, Shakeel Butt, Danielle Constantino
  Cc: linux-mm, linux-kernel



On 7/2/26 9:57 PM, Vlastimil Babka (SUSE) wrote:
> On 7/2/26 06:09, Harry Yoo (Oracle) wrote:
>> Commit 280ea9c3154b ("mm/slab: avoid allocating slabobj_ext array from
>> its own slab") avoided recursive allocation of obj_exts from kmalloc
>> caches of the same size, by bumping the obj_exts array's allocation
>> size whenever the array size equals the size of the object being
>> allocated.
>>
>> However, as reported by Danielle Costantino and Shakeel Butt,
>> even slabs from kmalloc caches of different sizes can form a cycle
>> by allocating obj_exts arrays from each other [1]:
>>
>>   What happened: a KMALLOC_NORMAL slab's obj_exts array (used by
>>   allocation profiling / memcg accounting) is itself kmalloc()'d from a
>>   KMALLOC_NORMAL cache, so the "slab holds another slab's obj_exts array"
>>   relation can form cycles. With sizeof(struct slabobj_ext) == 16 and
>>   the host's geometry:
>>
>>   - kmalloc-512 has 64 objects/slab -> array is 64*16 == 1024 bytes,
>>     served from kmalloc-1k;
>>   - kmalloc-1k  has 32 objects/slab -> array is 32*16 ==  512 bytes,
>>     served from kmalloc-512.
>>
>>   A kmalloc-512 slab and a kmalloc-1k slab therefore hold each other's
>>   obj_exts array.  Discarding one frees the other's array, which empties
>>   and discards that slab, which frees the first's array, and so on:
>>   __free_slab() -> free_slab_obj_exts() -> kfree() -> discard_slab() ->
>>   __free_slab() recurses along the cycle until the stack is exhausted.
>>
>> With memory allocation profiling, this allows unbounded recursion
>> in the free path and led to a stack overflow on a production host in
>> the Meta fleet [1]:
>>
>>   BUG: TASK stack guard page was hit
>>   Oops: stack guard page
>>   RIP: 0010:kfree+0x8/0x5d0
>>   Call Trace:
>>    __free_slab+0x66/0xc0
>>    kfree+0x3f0/0x5d0
>>    ... ( ~125x __free_slab <-> kfree ) ...
>>    <kernel driver freeing a resource>
>>    do_syscall_64
>>
>> It was proposed [1] to resolve this issue by always serving the obj_exts
>> array allocation from kmalloc caches (or large kmalloc) of sizes larger
>> than the object size. However, as pointed out by Vlastimil Babka [2],
>> this can waste an excessive amount of memory as slabs from large
>> kmalloc sizes (e.g. kmalloc-8k) generally need obj_exts arrays much
>> smaller than the object size.
>>
>> Therefore, rather than bumping the size, let us take a different
>> approach: disallow formation of cycles between kmalloc types when
>> allocating obj_exts arrays. Currently, all obj_exts arrays are served
>> from normal kmalloc caches. Cycles cannot be created if obj_exts arrays
>> of normal kmalloc caches are served from a special kmalloc type that can
>> never have obj_exts arrays.
>>
>> To achieve this, create a new kmalloc type called KMALLOC_NO_OBJ_EXT.
>> KMALLOC_NO_OBJ_EXT caches are created when CONFIG_SLAB_OBJ_EXT is
>> enabled, and they have SLAB_NO_OBJ_EXT flag to prevent allocation
>> of obj_exts arrays. They remain unused until allocation of obj_exts
>> arrays for normal kmalloc caches happens.
> 
> I wonder if we should just use them always (not just for kmalloc_normal) if
> we already have them. Would there be any downside?

Good point!

That's more intuitive, and separating them sounds good because obj_exts
arrays will likely have longer lifetimes than slab objects.

I'm not sure about the impact on memory usage; I need to check.
I'd say it's fine as long as it doesn't clearly increase memory usage.

But I guess that should not be part of the bugfix, as it's a functional
change that is not required to fix the bug.
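
For reference, the cycle from the commit message and the way a separate
kmalloc type breaks it can be modeled in user space. This is only a rough
sketch: the toy_* names are hypothetical, the two-entry table copies the
kmalloc-512/kmalloc-1k geometry from the report, and sizeof(struct
slabobj_ext) == 16 is the value quoted there. It is not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption taken from the report: sizeof(struct slabobj_ext) == 16. */
#define TOY_EXT_SIZE 16

/* Hypothetical toy model of one kmalloc size class. */
struct toy_cache {
	size_t object_size;	/* e.g. 512 for kmalloc-512 */
	size_t objs_per_slab;	/* geometry quoted in the report */
};

static const struct toy_cache toy_caches[] = {
	{ 512, 64 },	/* kmalloc-512: 64 objects/slab */
	{ 1024, 32 },	/* kmalloc-1k:  32 objects/slab */
};
#define NR_TOY_CACHES (sizeof(toy_caches) / sizeof(toy_caches[0]))

/* Index of the smallest toy cache that fits @size, or -1 if none does. */
static int toy_cache_for(size_t size)
{
	for (size_t i = 0; i < NR_TOY_CACHES; i++)
		if (size <= toy_caches[i].object_size)
			return (int)i;
	return -1;
}

/*
 * Follow "a slab of cache X keeps its obj_exts array in cache Y" edges,
 * starting at @start. When @use_no_obj_ext is true, the array is assumed
 * to come from a separate type that never has obj_exts arrays, so the
 * walk ends after the first hop. Returns true if the walk gets back to
 * @start, i.e. the free path could recurse along a cycle.
 */
static bool toy_has_cycle(int start, bool use_no_obj_ext)
{
	int cur = start;

	for (size_t hops = 0; hops < NR_TOY_CACHES; hops++) {
		size_t sz = TOY_EXT_SIZE * toy_caches[cur].objs_per_slab;
		int next = toy_cache_for(sz);

		if (next < 0 || use_no_obj_ext)
			return false;
		if (next == start)
			return true;
		cur = next;
	}
	return false;
}
```

In this model toy_has_cycle(0, false) follows kmalloc-512 -> kmalloc-1k ->
kmalloc-512 and reports a cycle, while toy_has_cycle(0, true) stops at the
first hop because the serving cache has no obj_exts array of its own.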

>> @@ -426,6 +434,11 @@ static inline bool is_kmalloc_normal(struct kmem_cache *s)
>>  {
>>  	if (!is_kmalloc_cache(s))
>>  		return false;
>> +
>> +	/* KMALLOC_NO_OBJ_EXT is not normal kmalloc */
>> +	if (s->flags & SLAB_NO_OBJ_EXT)
>> +		return false;
> 
> Could it just go into the test below?

Yes!
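
Roughly, folding the check into the existing mask test could look like the
sketch below. It is a user-space mock, not the actual change: the TOY_*
flag values and the toy_* helpers are made up for illustration and only
mirror the shape of is_kmalloc_cache() and is_kmalloc_normal().

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration; not the kernel's real values. */
#define TOY_SLAB_KMALLOC		0x01u
#define TOY_SLAB_CACHE_DMA		0x02u
#define TOY_SLAB_ACCOUNT		0x04u
#define TOY_SLAB_RECLAIM_ACCOUNT	0x08u
#define TOY_SLAB_NO_OBJ_EXT		0x10u

/* Hypothetical stand-in for struct kmem_cache; only the flags matter here. */
struct toy_kmem_cache {
	unsigned int flags;
};

static bool toy_is_kmalloc_cache(const struct toy_kmem_cache *s)
{
	return (s->flags & TOY_SLAB_KMALLOC) != 0;
}

static bool toy_is_kmalloc_normal(const struct toy_kmem_cache *s)
{
	if (!toy_is_kmalloc_cache(s))
		return false;

	/* NO_OBJ_EXT joins the mask instead of getting its own early return. */
	return !(s->flags & (TOY_SLAB_CACHE_DMA | TOY_SLAB_ACCOUNT |
			     TOY_SLAB_RECLAIM_ACCOUNT | TOY_SLAB_NO_OBJ_EXT));
}
```

The result is the same as with the separate early return; the single mask
just keeps all "not normal" conditions in one place.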

>> +
>>  	return !(s->flags & (SLAB_CACHE_DMA|SLAB_ACCOUNT|SLAB_RECLAIM_ACCOUNT));

>> @@ -7957,10 +7940,10 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
>>  		s->allocflags |= __GFP_RECLAIMABLE;
>>  
>>  	/*
>> -	 * For KMALLOC_NORMAL caches we enable sheaves later by
>> -	 * bootstrap_kmalloc_sheaves() to avoid recursion
>> +	 * For kmalloc caches we enable sheaves later by
>> +	 * bootstrap_kmalloc_sheaves() to avoid recursion.
>>  	 */
>> -	if (!is_kmalloc_normal(s))
>> +	if (!(s->flags & SLAB_KMALLOC))
> 
> is_kmalloc_cache()?

Will do, thanks!

-- 
Cheers,
Harry / Hyeonggon



end of thread, other threads:[~2026-07-02 13:20 UTC | newest]

Thread overview: 6+ messages
2026-07-02  4:09 [PATCH RFC hotfixes 0/2] mm/slab: fix unbounded recursion in free path with memalloc profiling Harry Yoo (Oracle)
2026-07-02  4:09 ` [PATCH RFC hotfixes 1/2] mm/slab: decouple SLAB_NO_SHEAVES from SLAB_NO_OBJ_EXT Harry Yoo (Oracle)
2026-07-02 12:49   ` Vlastimil Babka (SUSE)
2026-07-02  4:09 ` [PATCH RFC hotfixes 2/2] mm/slab: prevent unbounded recursion in free path with new kmalloc type Harry Yoo (Oracle)
2026-07-02 12:57   ` Vlastimil Babka (SUSE)
2026-07-02 13:20     ` Harry Yoo
