[PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache

* [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
@ 2026-06-25 23:00 Shakeel Butt
  2026-06-26  4:22 ` Harry Yoo
  0 siblings, 1 reply; 25+ messages in thread
From: Shakeel Butt @ 2026-06-25 23:00 UTC
  To: Vlastimil Babka, Andrew Morton
  Cc: Harry Yoo, Roman Gushchin, Hao Li, Christoph Lameter,
	David Rientjes, Suren Baghdasaryan, Usama Arif, Meta kernel team,
	linux-mm, linux-kernel, Danielle Costantino

A production host in the Meta fleet (6.16 kernel, memory allocation
profiling enabled) panicked with a kernel stack overflow while a kernel
driver was freeing a resource:

  BUG: TASK stack guard page was hit
  Oops: stack guard page
  RIP: 0010:kfree+0x8/0x5d0
  Call Trace:
   __free_slab+0x66/0xc0
   kfree+0x3f0/0x5d0
   ... ( ~125x __free_slab <-> kfree ) ...
   <kernel driver freeing a resource>
   do_syscall_64

The crash dump shows a 125-deep __free_slab<->kfree recursion that
overflowed the 16 KiB kernel stack.

What happened: a KMALLOC_NORMAL slab's obj_exts array (used by allocation
profiling / memcg accounting) is itself kmalloc()'d from a KMALLOC_NORMAL
cache, so the "slab holds another slab's obj_exts array" relation can form
cycles.  With sizeof(struct slabobj_ext) == 16 and the host's geometry:

  - kmalloc-512 has 64 objects/slab -> array is 64*16 == 1024 bytes,
    served from kmalloc-1k;
  - kmalloc-1k  has 32 objects/slab -> array is 32*16 ==  512 bytes,
    served from kmalloc-512.

A kmalloc-512 slab and a kmalloc-1k slab therefore hold each other's
obj_exts array.  Discarding one frees the other's array, which empties and
discards that slab, which frees the first's array, and so on:
__free_slab() -> free_slab_obj_exts() -> kfree() -> discard_slab() ->
__free_slab() recurses along the cycle until the stack is exhausted.  The
dump confirms it: the recursion's slabs strictly alternate kmalloc-512
(obj_exts in kmalloc-1k) and kmalloc-1k (obj_exts in kmalloc-512), and
mem_alloc_profiling_key was enabled.
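
The geometry above can be checked with a small userspace sketch. It is not
kernel code: size_class_for() and obj_exts_class() are hypothetical helpers,
only power-of-two kmalloc classes are modeled (the 96- and 192-byte classes
are ignored), and the 16-byte struct slabobj_ext size is taken from the text
above.

```c
#include <stddef.h>

/* Taken from the message: sizeof(struct slabobj_ext) == 16 on this host. */
#define SLABOBJ_EXT_SIZE 16

/*
 * Hypothetical helper: smallest power-of-two size class that fits @size.
 * Simplification: the real kmalloc classes also include 96 and 192 bytes.
 */
static size_t size_class_for(size_t size)
{
	size_t class_size = 8;

	while (class_size < size)
		class_size <<= 1;
	return class_size;
}

/* Size class serving the obj_exts array of a slab with @objects_per_slab. */
static size_t obj_exts_class(size_t objects_per_slab)
{
	return size_class_for(objects_per_slab * SLABOBJ_EXT_SIZE);
}
```

With 64 objects per slab the array lands in the 1024-byte class, and with 32
objects per slab it lands in the 512-byte class, which is the mutual
reference described above.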

Commit 280ea9c3154b ("mm/slab: avoid allocating slabobj_ext array from
its own slab") is not sufficient: it bumps the allocation size only when
the array would come from the *same* cache (object_size ==).  At the
geometry above neither cache is self-referential (512 != 1024 and
1024 != 512), so the bump never triggers and the kmalloc-512 <-> kmalloc-1k
cross cycle remains.

Fix it structurally by removing cycles of every shape: serve the array
from a cache strictly larger than the one it describes whenever it would
otherwise come from the same or a smaller cache.  Every reference edge
then points from a smaller to a larger cache (here kmalloc-1k's array
moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.
No slab can be self- or cross-pinned, the tear-down recursion is bounded
by the number of kmalloc size classes (it terminates at the large-kmalloc
path, which carries no obj_exts), and profiling/accounting coverage is
unchanged - the array is still allocated, only relocated.
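
The difference between the two policies can be sketched in userspace C. This
is an illustration under stated assumptions, not the kernel implementation:
every slab is assumed to be 32 KiB (consistent with 64 * 512 and 32 * 1024
above, although real slab orders vary per cache), only power-of-two classes
are modeled, the largest kmalloc cache is assumed to be 8 KiB, and the
helpers round_up_class(), array_class() and chain_has_cycle() are
hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define EXT_SIZE	16	/* sizeof(struct slabobj_ext), per the message */
#define SLAB_BYTES	32768	/* assumption: every slab is 32 KiB */
#define MAX_CACHE_BYTES	8192	/* assumption: larger requests go to large kmalloc */

/* Smallest power-of-two class that fits @size (96/192 classes ignored). */
static size_t round_up_class(size_t size)
{
	size_t class_size = 8;

	while (class_size < size)
		class_size <<= 1;
	return class_size;
}

/*
 * Class that serves the obj_exts array of a cache with @object_size.
 * Returns 0 when the request falls through to the large-kmalloc path.
 * @strictly_larger selects the new policy (bump on <=) over the old (bump on ==).
 */
static size_t array_class(size_t object_size, bool strictly_larger)
{
	size_t sz = (SLAB_BYTES / object_size) * EXT_SIZE;
	size_t cls = round_up_class(sz);

	if (strictly_larger ? cls <= object_size : cls == object_size)
		sz = object_size + 1;

	cls = round_up_class(sz);
	return cls > MAX_CACHE_BYTES ? 0 : cls;
}

/* Follow "array of X lives in Y" edges from @start and report a cycle. */
static bool chain_has_cycle(size_t start, bool strictly_larger)
{
	size_t cls = start;

	for (int hops = 0; hops < 64; hops++) {
		cls = array_class(cls, strictly_larger);
		if (cls == 0)
			return false;	/* reached large kmalloc: chain ends */
		if (cls == start)
			return true;	/* came back to where we started */
	}
	return true;	/* never terminated: stuck in some other cycle */
}
```

Under these assumptions the old policy walks 512 -> 1024 -> 512, while the
new policy walks 512 -> 1024 -> 2048 -> 4096 -> 8192 and then leaves the
kmalloc caches, so the chain length is bounded by the number of size classes.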

Reproduced on next-20260623 at the same geometry: churning
kmalloc-512/kmalloc-1k under vm.mem_profiling and then shrinking leaves
kmalloc-512 with thousands of unreclaimable objects without this patch
(8056) and at baseline with it (847).

Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reported-by: Danielle Costantino <dcostantino@meta.com>
---
 mm/slub.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 9ec774dc7009..48e54d340865 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2124,15 +2124,14 @@ static inline void init_slab_obj_exts(struct slab *slab)
 }

 /*
- * Calculate the allocation size for slabobj_ext array.
+ * Size of the slabobj_ext array for @slab.
  *
- * When memory allocation profiling is enabled, the obj_exts array
- * could be allocated from the same slab cache it's being allocated for.
- * This would prevent the slab from ever being freed because it would
- * always contain at least one allocated object (its own obj_exts array).
- *
- * To avoid this, increase the allocation size when we detect the array
- * may come from the same cache, forcing it to use a different cache.
+ * The array is itself kmalloc()'d. If it came from the same or a smaller
+ * kmalloc cache than @s, the "slab holds another slab's array" relation could
+ * form a cycle (self, or e.g. kmalloc-512 <-> kmalloc-1k) that pins the slabs
+ * forever and recurses via free_slab_obj_exts() -> kfree() -> discard_slab()
+ * at teardown. Force it into a strictly larger cache to keep that relation a
+ * DAG (acyclic).
  */
 static inline size_t obj_exts_alloc_size(struct kmem_cache *s,
 					 struct slab *slab, gfp_t gfp)
@@ -2147,14 +2146,9 @@ static inline size_t obj_exts_alloc_size(struct kmem_cache *s,
 		return sz;

 	obj_exts_cache = kmalloc_slab(sz, NULL, gfp, __kmalloc_token(0));
-	/*
-	 * We can't simply compare s with obj_exts_cache, because partitioned kmalloc
-	 * caches have multiple caches per size, selected by caller address or type.
-	 * Since caller address or type may differ between kmalloc_slab() and actual
-	 * allocation, bump size when sizes are equal.
-	 */
-	if (s->object_size == obj_exts_cache->object_size)
-		return obj_exts_cache->object_size + 1;
+	/* compare object_size, not the cache pointer (partitioned kmalloc caches) */
+	if (obj_exts_cache->object_size <= s->object_size)
+		return s->object_size + 1;

 	return sz;
 }
-- 
2.53.0-Meta


* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-25 23:00 [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache Shakeel Butt
@ 2026-06-26  4:22 ` Harry Yoo
  2026-06-26 16:49   ` Shakeel Butt
  0 siblings, 1 reply; 25+ messages in thread
From: Harry Yoo @ 2026-06-26  4:22 UTC
  To: Shakeel Butt, Vlastimil Babka, Andrew Morton
  Cc: Roman Gushchin, Hao Li, Christoph Lameter, David Rientjes,
	Suren Baghdasaryan, Usama Arif, Meta kernel team, linux-mm,
	linux-kernel, Danielle Costantino


Hi Shakeel,

On 6/26/26 8:00 AM, Shakeel Butt wrote:
> A production host in the Meta fleet (6.16 kernel, memory allocation
> profiling enabled) panicked with a kernel stack overflow while a kernel
> driver was freeing a resource:
> 
>   BUG: TASK stack guard page was hit
>   Oops: stack guard page
>   RIP: 0010:kfree+0x8/0x5d0
>   Call Trace:
>    __free_slab+0x66/0xc0
>    kfree+0x3f0/0x5d0
>    ... ( ~125x __free_slab <-> kfree ) ...
>    <kernel driver freeing a resource>
>    do_syscall_64
> 
> The crash dump shows a 125-deep __free_slab<->kfree recursion that
> overflowed the 16 KiB kernel stack.

Ouch!

> What happened: a KMALLOC_NORMAL slab's obj_exts array (used by allocation
> profiling / memcg accounting) is itself kmalloc()'d from a KMALLOC_NORMAL
> cache,

Usually KMALLOC_NORMAL caches don't need an obj_exts array, but yes,
this could happen if memory allocation profiling is enabled.

> so the "slab holds another slab's obj_exts array" relation can form
> cycles.  With sizeof(struct slabobj_ext) == 16 and the host's geometry:
>
>   - kmalloc-512 has 64 objects/slab -> array is 64*16 == 1024 bytes,
>     served from kmalloc-1k;
>   - kmalloc-1k  has 32 objects/slab -> array is 32*16 ==  512 bytes,
>     served from kmalloc-512.

Right.

> A kmalloc-512 slab and a kmalloc-1k slab therefore hold each other's
> obj_exts array.  Discarding one frees the other's array, which empties and
> discards that slab, which frees the first's array, and so on:
> __free_slab() -> free_slab_obj_exts() -> kfree() -> discard_slab() ->
> __free_slab() recurses along the cycle until the stack is exhausted.

Right.

> The
> dump confirms it: the recursion's slabs strictly alternate kmalloc-512
> (obj_exts in kmalloc-1k) and kmalloc-1k (obj_exts in kmalloc-512), and
> mem_alloc_profiling_key was enabled.
> 
> Commit 280ea9c3154b ("mm/slab: avoid allocating slabobj_ext array from
> its own slab") is not sufficient: it bumps the allocation size only when
> the array would come from the *same* cache (object_size ==).  At the
> geometry above neither cache is self-referential (512 != 1024 and
> 1024 != 512), so the bump never triggers and the kmalloc-512 <-> kmalloc-1k
> cross cycle remains.

Right.

> Fix it structurally by removing cycles of every shape: serve the array
> from a cache strictly larger than the one it describes whenever it would
> otherwise come from the same or a smaller cache.  Every reference edge
> then points from a smaller to a larger cache (here kmalloc-1k's array
> moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.

This will fix the problem.

But this will waste memory, because the obj_exts array gets smaller
as the object size gets larger.

We should probably create a new kmalloc type to avoid cycles instead?
(needed only when memory profiling is enabled, though)

That would also prevent recursion even further.
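
To put rough numbers on the waste, here is a userspace sketch rather than
kernel code. It assumes 32 KiB slabs for every cache (matching the geometry
quoted above, although real slab orders differ per cache), a 16-byte
struct slabobj_ext, and power-of-two size classes only. The helpers
waste_round_up() and bumped_array_waste() are hypothetical names.

```c
#include <stddef.h>

#define WASTE_EXT_SIZE		16	/* sizeof(struct slabobj_ext), per the patch */
#define WASTE_SLAB_BYTES	32768	/* assumption: every slab is 32 KiB */

/* Smallest power-of-two class that fits @size (96/192 classes ignored). */
static size_t waste_round_up(size_t size)
{
	size_t class_size = 8;

	while (class_size < size)
		class_size <<= 1;
	return class_size;
}

/*
 * Bytes allocated for one slab's obj_exts array beyond what the array
 * needs, when the array is forced into a strictly larger class.
 */
static size_t bumped_array_waste(size_t object_size)
{
	size_t needed = (WASTE_SLAB_BYTES / object_size) * WASTE_EXT_SIZE;
	size_t natural = waste_round_up(needed);
	size_t used = natural;

	/* Same condition as the patch: same or smaller class gets bumped. */
	if (natural <= object_size)
		used = waste_round_up(object_size + 1);

	return used - needed;
}
```

Under these assumptions kmalloc-512 is unaffected, kmalloc-1k needs 512
bytes but occupies a 2048-byte object (1536 bytes over), and kmalloc-2k
needs 256 bytes but occupies a 4096-byte object (3840 bytes over), per slab.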

> No slab can be self- or cross-pinned, the tear-down recursion is bounded
> by the number of kmalloc size classes (it terminates at the large-kmalloc
> path, which carries no obj_exts), and profiling/accounting coverage is
> unchanged - the array is still allocated, only relocated.
> 
> Reproduced on next-20260623 at the same geometry: churning
> kmalloc-512/kmalloc-1k under vm.mem_profiling and then shrinking leaves
> kmalloc-512 with thousands of unreclaimable objects without this patch
> (8056) and at baseline with it (847).
> 
> Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")

Perhaps Cc: stable? v6.12 and v6.18 are affected.

> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> Reported-by: Danielle Costantino <dcostantino@meta.com>
> ---
>  mm/slub.c | 26 ++++++++++----------------
>  1 file changed, 10 insertions(+), 16 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 9ec774dc7009..48e54d340865 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2124,15 +2124,14 @@ static inline void init_slab_obj_exts(struct slab *slab)
>  }
>  
>  /*
> - * Calculate the allocation size for slabobj_ext array.
> + * Size of the slabobj_ext array for @slab.
>   *
> - * When memory allocation profiling is enabled, the obj_exts array
> - * could be allocated from the same slab cache it's being allocated for.
> - * This would prevent the slab from ever being freed because it would
> - * always contain at least one allocated object (its own obj_exts array).
> - *
> - * To avoid this, increase the allocation size when we detect the array
> - * may come from the same cache, forcing it to use a different cache.
> + * The array is itself kmalloc()'d. If it came from the same or a smaller
> + * kmalloc cache than @s, the "slab holds another slab's array" relation could
> + * form a cycle (self, or e.g. kmalloc-512 <-> kmalloc-1k) that pins the slabs
> + * forever and recurses via free_slab_obj_exts() -> kfree() -> discard_slab()
> + * at teardown. Force it into a strictly larger cache to keep that relation a
> + * DAG (acyclic).
>   */
>  static inline size_t obj_exts_alloc_size(struct kmem_cache *s,
>  					 struct slab *slab, gfp_t gfp)
> @@ -2147,14 +2146,9 @@ static inline size_t obj_exts_alloc_size(struct kmem_cache *s,
>  		return sz;
>  
>  	obj_exts_cache = kmalloc_slab(sz, NULL, gfp, __kmalloc_token(0));
> -	/*
> -	 * We can't simply compare s with obj_exts_cache, because partitioned kmalloc
> -	 * caches have multiple caches per size, selected by caller address or type.
> -	 * Since caller address or type may differ between kmalloc_slab() and actual
> -	 * allocation, bump size when sizes are equal.
> -	 */
> -	if (s->object_size == obj_exts_cache->object_size)
> -		return obj_exts_cache->object_size + 1;
> +	/* compare object_size, not the cache pointer (partitioned kmalloc caches) */

This comment is no longer relevant, by the way.

"compare object_size instead of cache pointers because there can be
 multiple caches of the same size" doesn't apply anymore.

> +	if (obj_exts_cache->object_size <= s->object_size)
> +		return s->object_size + 1;
>  
>  	return sz;
>  }

-- 
Cheers,
Harry / Hyeonggon


* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-26  4:22 ` Harry Yoo
@ 2026-06-26 16:49   ` Shakeel Butt
  2026-06-26 17:11     ` Vlastimil Babka (SUSE)
  0 siblings, 1 reply; 25+ messages in thread
From: Shakeel Butt @ 2026-06-26 16:49 UTC
  To: Harry Yoo
  Cc: Vlastimil Babka, Andrew Morton, Roman Gushchin, Hao Li,
	Christoph Lameter, David Rientjes, Suren Baghdasaryan, Usama Arif,
	Meta kernel team, linux-mm, linux-kernel, Danielle Costantino

On Fri, Jun 26, 2026 at 01:22:09PM +0900, Harry Yoo wrote:
> 
> Hi Shakeel,
> 

[...]

> > What happened: a KMALLOC_NORMAL slab's obj_exts array (used by allocation
> > profiling / memcg accounting) is itself kmalloc()'d from a KMALLOC_NORMAL
> > cache,
> 
> Usually KMALLOC_NORMAL caches don't need obj_exts array, but yes,
> this could happen if memory allocation profiling is enabled.

Yes, we have enabled memory allocation profiling fleet wide.

[...]

> 
> > Fix it structurally by removing cycles of every shape: serve the array
> > from a cache strictly larger than the one it describes whenever it would
> > otherwise come from the same or a smaller cache.  Every reference edge
> > then points from a smaller to a larger cache (here kmalloc-1k's array
> > moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.
> 
> This will fix the problem.
> 
> But this will waste memory as we need smaller obj_exts array
> as the size gets larger.
> 
> We should probably create a new kmalloc type to avoid cycles instead?
> (needed only when memory profiling is enabled, though)
> 
> That would also prevent recursion even further.

Yes but I assume that would add kmem caches even for users not using memory
profiling. Anyways, I think that is a separate discussion. Am I understanding
correctly that you don't have any concerns with this approach?

> 
> > No slab can be self- or cross-pinned, the tear-down recursion is bounded
> > by the number of kmalloc size classes (it terminates at the large-kmalloc
> > path, which carries no obj_exts), and profiling/accounting coverage is
> > unchanged - the array is still allocated, only relocated.
> > 
> > Reproduced on next-20260623 at the same geometry: churning
> > kmalloc-512/kmalloc-1k under vm.mem_profiling and then shrinking leaves
> > kmalloc-512 with thousands of unreclaimable objects without this patch
> > (8056) and at baseline with it (847).
> > 
> > Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
> 
> Perhaps Cc: stable? v6.12 and v6.18 are affected.

Ack.

[...]

> > -	if (s->object_size == obj_exts_cache->object_size)
> > -		return obj_exts_cache->object_size + 1;
> > +	/* compare object_size, not the cache pointer (partitioned kmalloc caches) */
> 
> This comment is no longer relevant, by the way.
> 
> "compare object_size instead of cache pointers because there can be
>  multiple caches of the same size" doesn't apply anymore.
> 

I will remove the comment in next version.

Thanks for the review.

> > +	if (obj_exts_cache->object_size <= s->object_size)
> > +		return s->object_size + 1;
> >  
> >  	return sz;
> >  }
> 
> -- 
> Cheers,
> Harry / Hyeonggon






* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-26 16:49   ` Shakeel Butt
@ 2026-06-26 17:11     ` Vlastimil Babka (SUSE)
  2026-06-28  2:58       ` Shakeel Butt
  2026-06-28  8:10       ` Harry Yoo
  0 siblings, 2 replies; 25+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-26 17:11 UTC
  To: Shakeel Butt, Harry Yoo
  Cc: Andrew Morton, Roman Gushchin, Hao Li, Christoph Lameter,
	David Rientjes, Suren Baghdasaryan, Usama Arif, Meta kernel team,
	linux-mm, linux-kernel, Danielle Costantino

On 6/26/26 18:49, Shakeel Butt wrote:
> On Fri, Jun 26, 2026 at 01:22:09PM +0900, Harry Yoo wrote:
>> 
>> Hi Shakeel,
>> 
> 
> [...]
> 
>> > What happened: a KMALLOC_NORMAL slab's obj_exts array (used by allocation
>> > profiling / memcg accounting) is itself kmalloc()'d from a KMALLOC_NORMAL
>> > cache,
>> 
>> Usually KMALLOC_NORMAL caches don't need obj_exts array, but yes,
>> this could happen if memory allocation profiling is enabled.
> 
> Yes, we have enabled memory allocation profiling fleet wide.
> 
> [...]
> 
>> 
>> > Fix it structurally by removing cycles of every shape: serve the array
>> > from a cache strictly larger than the one it describes whenever it would
>> > otherwise come from the same or a smaller cache.  Every reference edge
>> > then points from a smaller to a larger cache (here kmalloc-1k's array
>> > moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.
>> 
>> This will fix the problem.
>> 
>> But this will waste memory as we need smaller obj_exts array
>> as the size gets larger.
>> 
>> We should probably create a new kmalloc type to avoid cycles instead?
>> (needed only when memory profiling is enabled, though)
>> 
>> That would also prevent recursion even further.
> 
> Yes but I assume that would add kmem caches even for users not using memory
> profiling. Anyways, I think that is a separate discussion. Am I understanding
> correctly that you don't have any concerns with this approach?

Umm, the memory waste is a concern?

Minimally I'd now want to only do that size bumping when allocation
profiling is enabled. Ideally that means both configured in and not booted
with "never".

We probably should have done that already in 280ea9c3154b2, because AFAIU
memcg-only obj_exts arrays don't have this issue (or maybe they do have the
[1] issue? Harry?). But if memcg-only should keep avoiding the same size
bucket, it can keep doing what it was doing, and only memalloc profiling would
do the strictly larger thing.

Suren's input would also be nice to have.

Thanks!

[1] https://lore.kernel.org/oe-lkp/202601231457.f7b31e09-lkp@intel.com

>> 
>> > No slab can be self- or cross-pinned, the tear-down recursion is bounded
>> > by the number of kmalloc size classes (it terminates at the large-kmalloc
>> > path, which carries no obj_exts), and profiling/accounting coverage is
>> > unchanged - the array is still allocated, only relocated.
>> > 
>> > Reproduced on next-20260623 at the same geometry: churning
>> > kmalloc-512/kmalloc-1k under vm.mem_profiling and then shrinking leaves
>> > kmalloc-512 with thousands of unreclaimable objects without this patch
>> > (8056) and at baseline with it (847).
>> > 
>> > Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
>> 
>> Perhaps Cc: stable? v6.12 and v6.18 are affected.
> 
> Ack.
> 
> [...]
> 
>> > -	if (s->object_size == obj_exts_cache->object_size)
>> > -		return obj_exts_cache->object_size + 1;
>> > +	/* compare object_size, not the cache pointer (partitioned kmalloc caches) */
>> 
>> This comment is no longer relevant, by the way.
>> 
>> "compare object_size instead of cache pointers because there can be
>>  multiple caches of the same size" doesn't apply anymore.
>> 
> 
> I will remove the comment in next version.
> 
> Thanks for the review.
> 
>> > +	if (obj_exts_cache->object_size <= s->object_size)
>> > +		return s->object_size + 1;
>> >  
>> >  	return sz;
>> >  }
>> 
>> -- 
>> Cheers,
>> Harry / Hyeonggon
> 
> 
> 




* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-26 17:11     ` Vlastimil Babka (SUSE)
@ 2026-06-28  2:58       ` Shakeel Butt
  2026-06-28  3:23         ` Shakeel Butt
  2026-06-28  8:10       ` Harry Yoo
  1 sibling, 1 reply; 25+ messages in thread
From: Shakeel Butt @ 2026-06-28  2:58 UTC
  To: Vlastimil Babka (SUSE)
  Cc: Harry Yoo, Andrew Morton, Roman Gushchin, Hao Li,
	Christoph Lameter, David Rientjes, Suren Baghdasaryan, Usama Arif,
	Meta kernel team, linux-mm, linux-kernel, Danielle Costantino

On Fri, Jun 26, 2026 at 07:11:33PM +0200, Vlastimil Babka (SUSE) wrote:
[...]
> >> > Fix it structurally by removing cycles of every shape: serve the array
> >> > from a cache strictly larger than the one it describes whenever it would
> >> > otherwise come from the same or a smaller cache.  Every reference edge
> >> > then points from a smaller to a larger cache (here kmalloc-1k's array
> >> > moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.
> >> 
> >> This will fix the problem.
> >> 
> >> But this will waste memory as we need smaller obj_exts array
> >> as the size gets larger.
> >> 
> >> We should probably create a new kmalloc type to avoid cycles instead?
> >> (needed only when memory profiling is enabled, though)
> >> 
> >> That would also prevent recursion even further.
> > 
> > Yes but I assume that would add kmem caches even for users not using memory
> > profiling. Anyways, I think that is a separate discussion. Am I understanding
> > correctly that you don't have any concerns with this approach?
> 
> Umm, the memory waste is a concern?
> 
> Minimally I'd now want to only do that size bumping when allocation
> profiling is enabled. Ideally that means both configured in and not booted
> with "never".
> 
> We probably should have done that already in 280ea9c3154b2. Because AFAIU
> memcg-only obj_exts array don't have this issue (or maybe they do have the
> [1] issue? Harry?). But if memcg-only should keep avoiding the same size
> bucket, it can keep what it was doing and only memalloc profiling would do
> the strictly larger thing.

memcg should not have this issue, as normal kmalloc caches do not serve
memcg-charged objects.

So here we can do dedicated caches as Harry suggested, or make this size bumping
very specialized as Vlastimil suggested. What do we want long term? Orthogonally,
we do want this fix to be easy to backport to older stable kernels. I will see
what this narrowed-down size bumping looks like.

> 
> Suren's input would be also nice to have.
> 
> Thanks!
> 
> [1] https://lore.kernel.org/oe-lkp/202601231457.f7b31e09-lkp@intel.com
> 



* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-28  2:58       ` Shakeel Butt
@ 2026-06-28  3:23         ` Shakeel Butt
  2026-06-28  7:47           ` Vlastimil Babka (SUSE)
  0 siblings, 1 reply; 25+ messages in thread
From: Shakeel Butt @ 2026-06-28  3:23 UTC
  To: Vlastimil Babka (SUSE)
  Cc: Harry Yoo, Andrew Morton, Roman Gushchin, Hao Li,
	Christoph Lameter, David Rientjes, Suren Baghdasaryan, Usama Arif,
	Meta kernel team, linux-mm, linux-kernel, Danielle Costantino

On Sat, Jun 27, 2026 at 07:58:12PM -0700, Shakeel Butt wrote:
> On Fri, Jun 26, 2026 at 07:11:33PM +0200, Vlastimil Babka (SUSE) wrote:
> [...]
> > >> > Fix it structurally by removing cycles of every shape: serve the array
> > >> > from a cache strictly larger than the one it describes whenever it would
> > >> > otherwise come from the same or a smaller cache.  Every reference edge
> > >> > then points from a smaller to a larger cache (here kmalloc-1k's array
> > >> > moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.
> > >> 
> > >> This will fix the problem.
> > >> 
> > >> But this will waste memory as we need smaller obj_exts array
> > >> as the size gets larger.
> > >> 
> > >> We should probably create a new kmalloc type to avoid cycles instead?
> > >> (needed only when memory profiling is enabled, though)
> > >> 
> > >> That would also prevent recursion even further.
> > > 
> > > Yes but I assume that would add kmem caches even for users not using memory
> > > profiling. Anyways, I think that is a separate discussion. Am I understanding
> > > correctly that you don't have any concerns with this approach?
> > 
> > Umm, the memory waste is a concern?
> > 
> > Minimally I'd now want to only do that size bumping when allocation
> > profiling is enabled. Ideally that means both configured in and not booted
> > with "never".
> > 
> > We probably should have done that already in 280ea9c3154b2. Because AFAIU
> > memcg-only obj_exts array don't have this issue (or maybe they do have the
> > [1] issue? Harry?). But if memcg-only should keep avoiding the same size
> > bucket, it can keep what it was doing and only memalloc profiling would do
> > the strictly larger thing.
> 
> memcg should not have this issue as normal kmalloc caches do not serve memcg
> charged objects. 

I was wrong here: I went back and looked at d8df600b67d7.

> 
> So here we can do dedicated caches as Harry suggested or make this size bumping
> very specialized as Vlastimil suggested. What do we want long term? Orthogonally
> we do want this fix to be backported easily to older stable kernels. I will see
> how does this narrowed down size bumping looks like.
> 

BTW I think we need something like the following, right?

	if (mem_alloc_profiling_enabled()) {
		if (obj_exts_cache->object_size <= s->object_size)
			return s->object_size + 1;
	} else {
		if (obj_exts_cache->object_size == s->object_size)
			return s->object_size + 1;
	}

> > 
> > Suren's input would be also nice to have.
> > 
> > Thanks!
> > 
> > [1] https://lore.kernel.org/oe-lkp/202601231457.f7b31e09-lkp@intel.com
> > 



* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-28  3:23         ` Shakeel Butt
@ 2026-06-28  7:47           ` Vlastimil Babka (SUSE)
  2026-06-28  9:22             ` Harry Yoo
  0 siblings, 1 reply; 25+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-28  7:47 UTC
  To: Shakeel Butt
  Cc: Harry Yoo, Andrew Morton, Roman Gushchin, Hao Li,
	Christoph Lameter, David Rientjes, Suren Baghdasaryan, Usama Arif,
	Meta kernel team, linux-mm, linux-kernel, Danielle Costantino

On 6/28/26 5:23 AM, Shakeel Butt wrote:
> On Sat, Jun 27, 2026 at 07:58:12PM -0700, Shakeel Butt wrote:
>> On Fri, Jun 26, 2026 at 07:11:33PM +0200, Vlastimil Babka (SUSE) wrote:
>> [...]
>>>>>> Fix it structurally by removing cycles of every shape: serve the array
>>>>>> from a cache strictly larger than the one it describes whenever it would
>>>>>> otherwise come from the same or a smaller cache.  Every reference edge
>>>>>> then points from a smaller to a larger cache (here kmalloc-1k's array
>>>>>> moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.
>>>>>
>>>>> This will fix the problem.
>>>>>
>>>>> But this will waste memory as we need smaller obj_exts array
>>>>> as the size gets larger.
>>>>>
>>>>> We should probably create a new kmalloc type to avoid cycles instead?
>>>>> (needed only when memory profiling is enabled, though)
>>>>>
>>>>> That would also prevent recursion even further.
>>>>
>>>> Yes but I assume that would add kmem caches even for users not using memory
>>>> profiling. Anyways, I think that is a separate discussion. Am I understanding
>>>> correctly that you don't have any concerns with this approach?
>>>
>>> Umm, the memory waste is a concern?
>>>
>>> Minimally I'd now want to only do that size bumping when allocation
>>> profiling is enabled. Ideally that means both configured in and not booted
>>> with "never".
>>>
>>> We probably should have done that already in 280ea9c3154b2. Because AFAIU
>>> memcg-only obj_exts array don't have this issue (or maybe they do have the
>>> [1] issue? Harry?). But if memcg-only should keep avoiding the same size
>>> bucket, it can keep what it was doing and only memalloc profiling would do
>>> the strictly larger thing.
>>
>> memcg should not have this issue as normal kmalloc caches do not serve memcg
>> charged objects. 
> 
> I am wrong here as I went back and see d8df600b67d7.

(8dafa9f5900c upstream)

>>
>> So here we can do dedicated caches as Harry suggested or make this size bumping
>> very specialized as Vlastimil suggested. What do we want long term? Orthogonally

Maybe long term we make kmem_buckets unconditional and use that.

>> we do want this fix to be backported easily to older stable kernels. I will see
>> how does this narrowed down size bumping looks like.
>>
> 
> BTW I think we need something like the following, right?
> 
> 	if (mem_alloc_profiling_enabled()) {
> 		if (obj_exts_cache->object_size <= s->object_size)
> 			return s->object_size + 1;
> 	} else {
> 		if (obj_exts_cache->object_size == s->object_size)
> 			return s->object_size + 1;
> 	}

Yeah.

>>>
>>> Suren's input would be also nice to have.
>>>
>>> Thanks!
>>>
>>> [1] https://lore.kernel.org/oe-lkp/202601231457.f7b31e09-lkp@intel.com
>>>




* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-28  7:47           ` Vlastimil Babka (SUSE)
@ 2026-06-28  9:22             ` Harry Yoo
  2026-06-28 23:37               ` Suren Baghdasaryan
  0 siblings, 1 reply; 25+ messages in thread
From: Harry Yoo @ 2026-06-28  9:22 UTC
  To: Vlastimil Babka (SUSE), Shakeel Butt
  Cc: Andrew Morton, Roman Gushchin, Hao Li, Christoph Lameter,
	David Rientjes, Suren Baghdasaryan, Usama Arif, Meta kernel team,
	linux-mm, linux-kernel, Danielle Costantino


On 6/28/26 4:47 PM, Vlastimil Babka (SUSE) wrote:
> On 6/28/26 5:23 AM, Shakeel Butt wrote:
>> On Sat, Jun 27, 2026 at 07:58:12PM -0700, Shakeel Butt wrote:
>>> On Fri, Jun 26, 2026 at 07:11:33PM +0200, Vlastimil Babka (SUSE) wrote:
>>> [...]
>>>>>>> Fix it structurally by removing cycles of every shape: serve the array
>>>>>>> from a cache strictly larger than the one it describes whenever it would
>>>>>>> otherwise come from the same or a smaller cache.  Every reference edge
>>>>>>> then points from a smaller to a larger cache (here kmalloc-1k's array
>>>>>>> moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.
>>>>>>
>>>>>> This will fix the problem.
>>>>>>
>>>>>> But this will waste memory as we need smaller obj_exts array
>>>>>> as the size gets larger.
>>>>>>
>>>>>> We should probably create a new kmalloc type to avoid cycles instead?
>>>>>> (needed only when memory profiling is enabled, though)
>>>>>>
>>>>>> That would also prevent recursion even further.
>>>>>
>>>>> Yes but I assume that would add kmem caches even for users not using memory
>>>>> profiling. Anyways, I think that is a separate discussion. Am I understanding
>>>>> correctly that you don't have any concerns with this approach?
>>>>
>>>> Umm, the memory waste is a concern?
>>>>
>>>> Minimally I'd now want to only do that size bumping when allocation
>>>> profiling is enabled. Ideally that means both configured in and not booted
>>>> with "never".
>>>>
>>>> We probably should have done that already in 280ea9c3154b2. Because AFAIU
>>>> memcg-only obj_exts array don't have this issue (or maybe they do have the
>>>> [1] issue? Harry?). But if memcg-only should keep avoiding the same size
>>>> bucket, it can keep what it was doing and only memalloc profiling would do
>>>> the strictly larger thing.
>>>
>>> memcg should not have this issue as normal kmalloc caches do not serve memcg
>>> charged objects. 
>>
>> I am wrong here as I went back and see d8df600b67d7.

I was confused too :)

> (8dafa9f5900c upstream)
> 
>>>
>>> So here we can do dedicated caches as Harry suggested or make this size bumping
>>> very specialized as Vlastimil suggested. What do we want long term? Orthogonally
> 
> Maybe long term we make kmem_buckets unconditional and use that.
> 
>>> we do want this fix to be backported easily to older stable kernels. I will see
>>> how does this narrowed down size bumping looks like.
>>>
>>
>> BTW I think we need something like the following, right?
>>
>> 	if (mem_alloc_profiling_enabled()) {
>> 		if (obj_exts_cache->object_size <= s->object_size)
>> 			return s->object_size + 1;
>> 	} else {
>> 		if (obj_exts_cache->object_size == s->object_size)
>> 			return s->object_size + 1;
>> 	}

We should not add a mem_alloc_profiling_enabled() check, because then
we would not fix this issue on SLUB_TINY when the caller specifies
__GFP_RECLAIMABLE|__GFP_ACCOUNT without memory allocation profiling.

The `if (!is_kmalloc_normal(s))` check already bails out when the size
does not need to be bumped.

So Shakeel's original code will work fine.

We're only pessimizing memory allocation profiling users and
SLUB_TINY && MEMCG users, but (as Vlastimil suggests off-list)
it wouldn't make much sense to enable MEMCG on memory-restricted systems
anyway. (IIRC even Raspberry Pis don't enable the memory controller by
default...)

I think it's okay to fix the bug first, but we need to address
the memory wastage issue sooner or later if companies (Meta and
Google I guess?) are deploying kernels with memory allocation profiling
on in production systems.

Perhaps it's worth adding a comment like this, though:

/*
 * Only bump the size when the object (not the obj_exts array) is
 * allocated from KMALLOC_NORMAL, either by memory allocation profiling
 * or memcg on SLUB_TINY with __GFP_RECLAIMABLE|__GFP_ACCOUNT.
 * Otherwise, obj_exts allocations cannot form a cycle between
 * kmalloc caches.
 */
if (!is_kmalloc_normal(s))
        return sz;
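
For anyone following along, the cycle from the report and the effect of
the bump can be reproduced with a small user-space model. This is only a
sketch under stated assumptions, not the kernel code: bucket_for() and
obj_exts_bucket() are hypothetical helpers, the buckets are assumed to be
plain powers of two (kmalloc-96 and kmalloc-192 are ignored), and EXT_SIZE
is the 16 bytes quoted in the report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
#define EXT_SIZE 16u	/* sizeof(struct slabobj_ext) on the reported host */

/* Round a request up to the power-of-two bucket assumed to serve it. */
static size_t bucket_for(size_t bytes)
{
	size_t b = 8;

	while (b < bytes)
		b <<= 1;
	return b;
}

/*
 * Bucket that serves the obj_exts array of a slab whose cache has object
 * size obj_size and objs_per_slab objects. With bump set, an array that
 * would land in the same or a smaller bucket is requested as obj_size + 1
 * bytes, which pushes it into the next larger bucket.
 */
static size_t obj_exts_bucket(size_t obj_size, size_t objs_per_slab, int bump)
{
	size_t sz;

	assert(objs_per_slab > 0);
	sz = objs_per_slab * EXT_SIZE;

	if (bump && bucket_for(sz) <= obj_size)
		sz = obj_size + 1;
	return bucket_for(sz);
}
```

Without the bump, obj_exts_bucket(512, 64, 0) is 1024 and
obj_exts_bucket(1024, 32, 0) is 512, which is the two-cache cycle from the
crash dump. With the bump, the second call returns 2048, so every edge
points from a smaller cache to a larger one.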

Thanks!

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-28  9:22             ` Harry Yoo
@ 2026-06-28 23:37               ` Suren Baghdasaryan
  2026-06-29  3:57                 ` Harry Yoo
  0 siblings, 1 reply; 25+ messages in thread
From: Suren Baghdasaryan @ 2026-06-28 23:37 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Vlastimil Babka (SUSE), Shakeel Butt, Andrew Morton,
	Roman Gushchin, Hao Li, Christoph Lameter, David Rientjes,
	Usama Arif, Meta kernel team, linux-mm, linux-kernel,
	Danielle Costantino

On Sun, Jun 28, 2026 at 2:22 AM Harry Yoo <harry@kernel.org> wrote:
>
>
>
> On 6/28/26 4:47 PM, Vlastimil Babka (SUSE) wrote:
> > On 6/28/26 5:23 AM, Shakeel Butt wrote:
> >> On Sat, Jun 27, 2026 at 07:58:12PM -0700, Shakeel Butt wrote:
> >>> On Fri, Jun 26, 2026 at 07:11:33PM +0200, Vlastimil Babka (SUSE) wrote:
> >>> [...]
> >>>>>>> Fix it structurally by removing cycles of every shape: serve the array
> >>>>>>> from a cache strictly larger than the one it describes whenever it would
> >>>>>>> otherwise come from the same or a smaller cache.  Every reference edge
> >>>>>>> then points from a smaller to a larger cache (here kmalloc-1k's array
> >>>>>>> moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.
> >>>>>>
> >>>>>> This will fix the problem.
> >>>>>>
> >>>>>> But this will waste memory as we need smaller obj_exts array
> >>>>>> as the size gets larger.
> >>>>>>
> >>>>>> We should probably create a new kmalloc type to avoid cycles instead?
> >>>>>> (needed only when memory profiling is enabled, though)
> >>>>>>
> >>>>>> That would also prevent recursion even further.
> >>>>>
> >>>>> Yes but I assume that would add kmem caches even for users not using memory
> >>>>> profiling. Anyways, I think that is a separate discussion. Am I understanding
> >>>>> correctly that you don't have any concerns with this approach?
> >>>>
> >>>> Umm, the memory waste is a concern?
> >>>>
> >>>> Minimally I'd now want to only do that size bumping when allocation
> >>>> profiling is enabled. Ideally that means both configured in and not booted
> >>>> with "never".
> >>>>
> >>>> We probably should have done that already in 280ea9c3154b2. Because AFAIU
> >>>> memcg-only obj_exts array don't have this issue (or maybe they do have the
> >>>> [1] issue? Harry?). But if memcg-only should keep avoiding the same size
> >>>> bucket, it can keep what it was doing and only memalloc profiling would do
> >>>> the strictly larger thing.
> >>>
> >>> memcg should not have this issue as normal kmalloc caches do not serve memcg
> >>> charged objects.
> >>
> >> I am wrong here as I went back and see d8df600b67d7.
>
> I was confused too :)
>
> > (8dafa9f5900c upstream)
> >
> >>>
> >>> So here we can do dedicated caches as Harry suggested or make this size bumping
> >>> very specialized as Vlastimil suggested. What do we want long term? Orthogonally
> >
> > Maybe long term we make kmem_buckets unconditional and use that.
> >
> >>> we do want this fix to be backported easily to older stable kernels. I will see
> >>> how does this narrowed down size bumping looks like.
> >>>
> >>
> >> BTW I think we need something like the following, right?
> >>
> >>      if (mem_alloc_profiling_enabled()) {
> >>              if (obj_exts_cache->object_size <= s->object_size)
> >>                      return s->object_size + 1;
> >>      } else {
> >>              if (obj_exts_cache->object_size == s->object_size)
> >>                      return s->object_size + 1;
> >>      }
>
> We should not add mem_alloc_profiling_enabled() check because,
> then we're not fixing this issue on SLUB_TINY, when the caller specifies
> __GFP_RECLAIMABLE|__GFP_ACCOUNT without memory allocation profiling.
>
> `if (!is_kmalloc_normal(s))` check already bails out when it doesn't
> need to bump the size.
>
> So Shakeel's original code will work fine.
>
> We're only pessimizing memory allocation profiling and
> SLUB_TINY && MEMCG users, but (as Vlastimil suggests off-list)
> it wouldn't make much sense to enable MEMCG on memory restricted systems
> anyway. (IIRC even raspberry pis don't enable the memory controller by
> default...)
>
> I think it's okay to fix the bug first, but we need to address
> the memory wastage issue sooner or later if companies (Meta and
> Google I guess?) are deploying kernels with memory allocation profiling
> on in production systems.

Sorry for the delay folks. I just got a chance to read through this thread.

I think adding a new KMALLOC_TYPE would be the cleanest way to fix
this recursion problem once and for all. This size bumping and the
special case of SLUB_TINY are quite confusing. We could define that
new KMALLOC_TYPE only if memory allocation profiling or SLUB_TINY are
enabled to avoid new caches when not needed. Does not seem too complex
but maybe I'm missing something? WDYT?

If it is more complex than I'm imagining then I'm fine with Shakeel's
approach as a temporary fix.

>
> Perhaps it's worth adding a comment like this, though:
>
> /*
>  * Only bump the size when the object (not the obj_exts array) is
>  * allocated from KMALLOC_NORMAL, either by memory allocation profiling
>  * or memcg on SLUB_TINY with __GFP_RECLAIMABLE|__GFP_ACCOUNT.
>  * Otherwise, obj_exts allocations cannot form a cycle between
>  * kmalloc caches.
>  */
> if (!is_kmalloc_normal(s))
>         return sz;
>
> Thanks!
>
> --
> Cheers,
> Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-28 23:37               ` Suren Baghdasaryan
@ 2026-06-29  3:57                 ` Harry Yoo
  2026-06-29  4:28                   ` Suren Baghdasaryan
  0 siblings, 1 reply; 25+ messages in thread
From: Harry Yoo @ 2026-06-29  3:57 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka (SUSE), Shakeel Butt, Andrew Morton,
	Roman Gushchin, Hao Li, Christoph Lameter, David Rientjes,
	Usama Arif, Meta kernel team, linux-mm, linux-kernel,
	Danielle Costantino, Kees Cook


[ Adding Kees Cook for SLAB_BUCKETS conversation ]

The thread:
https://lore.kernel.org/linux-mm/20260625230029.703750-1-shakeel.butt@linux.dev/

On 6/29/26 8:37 AM, Suren Baghdasaryan wrote:
> On Sun, Jun 28, 2026 at 2:22 AM Harry Yoo <harry@kernel.org> wrote:
>> On 6/28/26 4:47 PM, Vlastimil Babka (SUSE) wrote:
>>> On 6/28/26 5:23 AM, Shakeel Butt wrote:
>>>> On Sat, Jun 27, 2026 at 07:58:12PM -0700, Shakeel Butt wrote:
>>>>> On Fri, Jun 26, 2026 at 07:11:33PM +0200, Vlastimil Babka (SUSE) wrote:
>>>>> [...]
>>>>>>>>> Fix it structurally by removing cycles of every shape: serve the array
>>>>>>>>> from a cache strictly larger than the one it describes whenever it would
>>>>>>>>> otherwise come from the same or a smaller cache.  Every reference edge
>>>>>>>>> then points from a smaller to a larger cache (here kmalloc-1k's array
>>>>>>>>> moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.
>>>>>>>>
>>>>>>>> This will fix the problem.
>>>>>>>>
>>>>>>>> But this will waste memory as we need smaller obj_exts array
>>>>>>>> as the size gets larger.
>>>>>>>>
>>>>>>>> We should probably create a new kmalloc type to avoid cycles instead?
>>>>>>>> (needed only when memory profiling is enabled, though)
>>>>>>>>
>>>>>>>> That would also prevent recursion even further.
>>>>>>>
>>>>>>> Yes but I assume that would add kmem caches even for users not using memory
>>>>>>> profiling. Anyways, I think that is a separate discussion. Am I understanding
>>>>>>> correctly that you don't have any concerns with this approach?
>>>>>>
>>>>>> Umm, the memory waste is a concern?
>>>>>>
>>>>>> Minimally I'd now want to only do that size bumping when allocation
>>>>>> profiling is enabled. Ideally that means both configured in and not booted
>>>>>> with "never".
>>>>>>
>>>>>> We probably should have done that already in 280ea9c3154b2. Because AFAIU
>>>>>> memcg-only obj_exts array don't have this issue (or maybe they do have the
>>>>>> [1] issue? Harry?). But if memcg-only should keep avoiding the same size
>>>>>> bucket, it can keep what it was doing and only memalloc profiling would do
>>>>>> the strictly larger thing.
>>>>>
>>>>> memcg should not have this issue as normal kmalloc caches do not serve memcg
>>>>> charged objects.
>>>>
>>>> I am wrong here as I went back and see d8df600b67d7.
>>
>> I was confused too :)
>>
>>> (8dafa9f5900c upstream)
>>>
>>>>>
>>>>> So here we can do dedicated caches as Harry suggested or make this size bumping
>>>>> very specialized as Vlastimil suggested. What do we want long term? Orthogonally
>>>
>>> Maybe long term we make kmem_buckets unconditional and use that.
>>>
>>>>> we do want this fix to be backported easily to older stable kernels. I will see
>>>>> how does this narrowed down size bumping looks like.
>>>>>
>>>>
>>>> BTW I think we need something like the following, right?
>>>>
>>>>      if (mem_alloc_profiling_enabled()) {
>>>>              if (obj_exts_cache->object_size <= s->object_size)
>>>>                      return s->object_size + 1;
>>>>      } else {
>>>>              if (obj_exts_cache->object_size == s->object_size)
>>>>                      return s->object_size + 1;
>>>>      }
>>
>> We should not add mem_alloc_profiling_enabled() check because,
>> then we're not fixing this issue on SLUB_TINY, when the caller specifies
>> __GFP_RECLAIMABLE|__GFP_ACCOUNT without memory allocation profiling.
>>
>> `if (!is_kmalloc_normal(s))` check already bails out when it doesn't
>> need to bump the size.
>>
>> So Shakeel's original code will work fine.
>>
>> We're only pessimizing memory allocation profiling and
>> SLUB_TINY && MEMCG users, but (as Vlastimil suggests off-list)
>> it wouldn't make much sense to enable MEMCG on memory restricted systems
>> anyway. (IIRC even raspberry pis don't enable the memory controller by
>> default...)
>>
>> I think it's okay to fix the bug first, but we need to address
>> the memory wastage issue sooner or later if companies (Meta and
>> Google I guess?) are deploying kernels with memory allocation profiling
>> on in production systems.
> 
> Sorry for the delay folks. I just got a chance to read through this thread.

Hi Suren, no worries!

> I think adding a new KMALLOC_TYPE would be the cleanest way to fix
> this recursion problem once and for all. This size bumping and the
> special case of SLUB_TINY are quite confusing.

As mentioned by Vlastimil, in the long term, using the SLAB_BUCKETS
infrastructure would be more straightforward than a new KMALLOC_TYPE
because (I think) the kmalloc type is decided purely based on GFP
flags and we need to somehow work around that. SLAB_BUCKETS provides
a nice abstraction to do this.

Luckily, SLAB_BUCKETS was introduced in v6.11.
Unfortunately, SLAB_BUCKETS is optional.

> We could define that
> new KMALLOC_TYPE only if memory allocation profiling or SLUB_TINY are
> enabled to avoid new caches when not needed. Does not seem too complex
> but maybe I'm missing something? WDYT?

I think we need some enhancements to achieve that with SLAB_BUCKETS:

1. Rename SLAB_BUCKETS to SLAB_BUCKETS_HARDENING
   (w/ SLAB_BUCKETS being a transitional config for _HARDENING)

2. Make the SLAB_BUCKETS infrastructure unconditional,
   but the decision is made at runtime:

   1) actually creating a kmem_buckets vs.
   2) falling back to kmalloc.

3. kmem_buckets_create() creates kmem_buckets only when
   SLAB_BUCKETS_HARDENING is enabled.

4. SLUB decides (not) to create kmem_buckets for internal use
   during the boot process. Use the kmem_buckets for obj_exts
   array allocation.

Side note: this would unconditionally add the kmem_buckets parameter to
the kmalloc slowpath. Probably it'd be worth introducing a dedicated
entrypoint for kmem_buckets instead.
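
To illustrate steps 2 and 4, here is a rough user-space model of the
"decide once at boot, then either use dedicated buckets or fall back to
kmalloc" flow. Everything in it is hypothetical: struct buckets_model
merely stands in for kmem_buckets, slub_boot_init() and
obj_exts_alloc_source() are made-up names, and the boot-time condition
(profiling or SLUB_TINY with memcg) is an assumption, not something
settled in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Where an obj_exts array would be allocated from in this model. */
enum alloc_source {
	FROM_KMALLOC_NORMAL,		/* fallback: plain kmalloc */
	FROM_DEDICATED_BUCKETS,		/* internal kmem_buckets stand-in */
};

/* Hypothetical stand-in for a kmem_buckets instance owned by SLUB. */
struct buckets_model {
	bool created;
};

static struct buckets_model obj_exts_buckets;
static bool boot_done;

/* Step 4: decide during boot whether to create the internal buckets. */
static void slub_boot_init(bool profiling_enabled, bool slub_tiny_memcg)
{
	/* Assumed condition: only pay for extra caches when cycles can form. */
	obj_exts_buckets.created = profiling_enabled || slub_tiny_memcg;
	boot_done = true;
}

/* Step 2: at runtime, use the buckets if they exist, else fall back. */
static enum alloc_source obj_exts_alloc_source(void)
{
	/* Asking before the boot-time decision would be a bug in this model. */
	assert(boot_done);
	return obj_exts_buckets.created ? FROM_DEDICATED_BUCKETS
					: FROM_KMALLOC_NORMAL;
}
```

With dedicated buckets, an obj_exts array never lives in a KMALLOC_NORMAL
slab, so the "slab holds another slab's array" relation cannot loop back
regardless of the sizes involved.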

> If it is more complex than I imaging then I'm fine with Shakeel's
> approach as a temporary fix.

Since the above requires quite a few changes, I'd say let's proceed with
the fix (since it's a one-line code change that fixes a bug),
and then see how we can make the SLAB_BUCKETS changes as minimal
as possible for backporting?

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-29  3:57                 ` Harry Yoo
@ 2026-06-29  4:28                   ` Suren Baghdasaryan
  2026-06-29 19:52                     ` Shakeel Butt
  2026-06-30  2:30                     ` Harry Yoo
  0 siblings, 2 replies; 25+ messages in thread
From: Suren Baghdasaryan @ 2026-06-29  4:28 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Vlastimil Babka (SUSE), Shakeel Butt, Andrew Morton,
	Roman Gushchin, Hao Li, Christoph Lameter, David Rientjes,
	Usama Arif, Meta kernel team, linux-mm, linux-kernel,
	Danielle Costantino, Kees Cook

On Sun, Jun 28, 2026 at 8:57 PM Harry Yoo <harry@kernel.org> wrote:
>
>
> [ Adding Kees Cook for SLAB_BUCKETS conversation ]
>
> The thread:
> https://lore.kernel.org/linux-mm/20260625230029.703750-1-shakeel.butt@linux.dev/
>
> On 6/29/26 8:37 AM, Suren Baghdasaryan wrote:
> > On Sun, Jun 28, 2026 at 2:22 AM Harry Yoo <harry@kernel.org> wrote:
> >> On 6/28/26 4:47 PM, Vlastimil Babka (SUSE) wrote:
> >>> On 6/28/26 5:23 AM, Shakeel Butt wrote:
> >>>> On Sat, Jun 27, 2026 at 07:58:12PM -0700, Shakeel Butt wrote:
> >>>>> On Fri, Jun 26, 2026 at 07:11:33PM +0200, Vlastimil Babka (SUSE) wrote:
> >>>>> [...]
> >>>>>>>>> Fix it structurally by removing cycles of every shape: serve the array
> >>>>>>>>> from a cache strictly larger than the one it describes whenever it would
> >>>>>>>>> otherwise come from the same or a smaller cache.  Every reference edge
> >>>>>>>>> then points from a smaller to a larger cache (here kmalloc-1k's array
> >>>>>>>>> moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.
> >>>>>>>>
> >>>>>>>> This will fix the problem.
> >>>>>>>>
> >>>>>>>> But this will waste memory as we need smaller obj_exts array
> >>>>>>>> as the size gets larger.
> >>>>>>>>
> >>>>>>>> We should probably create a new kmalloc type to avoid cycles instead?
> >>>>>>>> (needed only when memory profiling is enabled, though)
> >>>>>>>>
> >>>>>>>> That would also prevent recursion even further.
> >>>>>>>
> >>>>>>> Yes but I assume that would add kmem caches even for users not using memory
> >>>>>>> profiling. Anyways, I think that is a separate discussion. Am I understanding
> >>>>>>> correctly that you don't have any concerns with this approach?
> >>>>>>
> >>>>>> Umm, the memory waste is a concern?
> >>>>>>
> >>>>>> Minimally I'd now want to only do that size bumping when allocation
> >>>>>> profiling is enabled. Ideally that means both configured in and not booted
> >>>>>> with "never".
> >>>>>>
> >>>>>> We probably should have done that already in 280ea9c3154b2. Because AFAIU
> >>>>>> memcg-only obj_exts array don't have this issue (or maybe they do have the
> >>>>>> [1] issue? Harry?). But if memcg-only should keep avoiding the same size
> >>>>>> bucket, it can keep what it was doing and only memalloc profiling would do
> >>>>>> the strictly larger thing.
> >>>>>
> >>>>> memcg should not have this issue as normal kmalloc caches do not serve memcg
> >>>>> charged objects.
> >>>>
> >>>> I am wrong here as I went back and see d8df600b67d7.
> >>
> >> I was confused too :)
> >>
> >>> (8dafa9f5900c upstream)
> >>>
> >>>>>
> >>>>> So here we can do dedicated caches as Harry suggested or make this size bumping
> >>>>> very specialized as Vlastimil suggested. What do we want long term? Orthogonally
> >>>
> >>> Maybe long term we make kmem_buckets unconditional and use that.
> >>>
> >>>>> we do want this fix to be backported easily to older stable kernels. I will see
> >>>>> how does this narrowed down size bumping looks like.
> >>>>>
> >>>>
> >>>> BTW I think we need something like the following, right?
> >>>>
> >>>>      if (mem_alloc_profiling_enabled()) {
> >>>>              if (obj_exts_cache->object_size <= s->object_size)
> >>>>                      return s->object_size + 1;
> >>>>      } else {
> >>>>              if (obj_exts_cache->object_size == s->object_size)
> >>>>                      return s->object_size + 1;
> >>>>      }
> >>
> >> We should not add mem_alloc_profiling_enabled() check because,
> >> then we're not fixing this issue on SLUB_TINY, when the caller specifies
> >> __GFP_RECLAIMABLE|__GFP_ACCOUNT without memory allocation profiling.
> >>
> >> `if (!is_kmalloc_normal(s))` check already bails out when it doesn't
> >> need to bump the size.
> >>
> >> So Shakeel's original code will work fine.
> >>
> >> We're only pessimizing memory allocation profiling and
> >> SLUB_TINY && MEMCG users, but (as Vlastimil suggests off-list)
> >> it wouldn't make much sense to enable MEMCG on memory restricted systems
> >> anyway. (IIRC even raspberry pis don't enable the memory controller by
> >> default...)
> >>
> >> I think it's okay to fix the bug first, but we need to address
> >> the memory wastage issue sooner or later if companies (Meta and
> >> Google I guess?) are deploying kernels with memory allocation profiling
> >> on in production systems.
> >
> > Sorry for the delay folks. I just got a chance to read through this thread.
>
> Hi Suren, no worries!
>
> > I think adding a new KMALLOC_TYPE would be the cleanest way to fix
> > this recursion problem once and for all. This size bumping and the
> > special case of SLUB_TINY are quite confusing.
>
> As mentioned by Vlsatimil, in the long term, using SLAB_BUCKETS
> infrastructure would be more straightforward than new KMALLOC_TYPE
> because (I think) the kmalloc type is decided purely based on GFP
> flags and we need to somehow work around that. SLAB_BUCKETS provides
> a nice abstraction to do this.
>
> Luckily, SLAB_BUCKETS is introduced in v6.11.
> Unfortunately, SLAB_BUCKETS is optional.
>
> > We could define that
> > new KMALLOC_TYPE only if memory allocation profiling or SLUB_TINY are
> > enabled to avoid new caches when not needed. Does not seem too complex
> > but maybe I'm missing something? WDYT?
>
> I think we need some enhancements to achieve that with SLAB_BUCKETS
>
> 1. Rename SLAB_BUCKETS to SLAB_BUCKETS_HARDENING
>    (w/ SLAB_BUCKETS being a transitional config for _HARDENING)
>
> 2. Make the SLAB_BUCKETS infrastructure unconditional,
>    but the decision is made at runtime:
>
>    1) actually creating a kmem_buckets vs.
>    2) falling back to kmalloc.
>
> 3. kmem_buckets_create() creates kmem_buckets only when
>    SLAB_BUCKETS_HARDENING is enabled.
>
> 4. SLUB decides (not) to create kmem_buckets for internal use
>    during the boot process. Use the kmem_buckets for obj_exts
>    array allocation.
>
> Side note: this would unconditionally add the kmem_buckets parameter to
> the kmalloc slowpath. Probably it'd be worth introducing a dedicated
> entrypoint for kmem_buckets instead.

Yeah, this sounds quite complex. Maybe we could use the new
kmalloc_flags() introduced by Vlastimil in [1] to avoid using GFP
flags to indicate that we want to use this new KMALLOC_TYPE? That
seems simpler, though it's not backportable because kmalloc_flags() is
brand new.

[1] https://lore.kernel.org/all/20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org/

>
> > If it is more complex than I imaging then I'm fine with Shakeel's
> > approach as a temporary fix.
>
> Since above requires quite some changes, I'd say let's proeed with
> the fix (since it's one line of code change that fixes a bug),
> and then see how we can make SLAB_BUCKETS changes as minimal
> as possible for backporting?

I was thinking Shakeel's approach for backports and
kmalloc_flags()+KMALLOC_TYPE going forward. Just throwing this out as an
option. I haven't looked closely into SLAB_BUCKETS yet, so that might
indeed be a better direction.

>
> --
> Cheers,
> Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-29  4:28                   ` Suren Baghdasaryan
@ 2026-06-29 19:52                     ` Shakeel Butt
  2026-06-30  2:03                       ` Harry Yoo
  2026-06-30  2:30                     ` Harry Yoo
  1 sibling, 1 reply; 25+ messages in thread
From: Shakeel Butt @ 2026-06-29 19:52 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Harry Yoo, Vlastimil Babka (SUSE), Andrew Morton, Roman Gushchin,
	Hao Li, Christoph Lameter, David Rientjes, Usama Arif,
	Meta kernel team, linux-mm, linux-kernel, Danielle Costantino,
	Kees Cook

On Sun, Jun 28, 2026 at 09:28:51PM -0700, Suren Baghdasaryan wrote:
> On Sun, Jun 28, 2026 at 8:57 PM Harry Yoo <harry@kernel.org> wrote:
> >
> >

[...]

Thanks all for the great discussion. Let me summarize the conclusion, and
please correct me if I missed something.

Let's keep the original one-line fix (serve the obj_exts array from a
strictly larger cache, making the relation a DAG). We will NOT gate it on
mem_alloc_profiling_enabled() as I floated earlier -- per Harry,
is_kmalloc_normal(s) is already the right condition, and gating on
profiling would miss the SLUB_TINY + __GFP_RECLAIMABLE|__GFP_ACCOUNT memcg
case. So the bump stays unconditional for is_kmalloc_normal() caches.

This over-allocates the array for larger caches, but only for profiling and
SLUB_TINY+MEMCG users (the latter unrealistic). Acceptable for a small,
backportable fix.
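
To put a number on the over-allocation, here is a back-of-the-envelope
calculation as a small C sketch. It is a simplified model, not kernel
code: bucket_for() and bump_overhead() are hypothetical helpers, buckets
are assumed to be plain powers of two, and the geometry (32 objects per
kmalloc-1k slab, 16-byte slabobj_ext) is taken from the original report.

```c
#include <assert.h>
#include <stddef.h>

/* Round a request up to the power-of-two bucket assumed to serve it. */
static size_t bucket_for(size_t bytes)
{
	size_t b = 8;

	while (b < bytes)
		b <<= 1;
	return b;
}

/*
 * Extra bytes per slab spent on the obj_exts array because of the bump,
 * compared with the bucket the array would naturally fit in.
 */
static size_t bump_overhead(size_t obj_size, size_t objs_per_slab,
			    size_t ext_size)
{
	size_t natural, bumped;

	assert(objs_per_slab > 0 && ext_size > 0);
	natural = bucket_for(objs_per_slab * ext_size);

	/* Same or smaller bucket: move to the next bucket above obj_size. */
	bumped = natural <= obj_size ? bucket_for(obj_size + 1) : natural;
	return bumped - natural;
}
```

For the reported host, bump_overhead(1024, 32, 16) is 2048 - 512 = 1536
bytes per kmalloc-1k slab, while bump_overhead(512, 64, 16) is 0 because
that array already lands in a larger cache.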

thanks,
Shakeel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-29 19:52                     ` Shakeel Butt
@ 2026-06-30  2:03                       ` Harry Yoo
  0 siblings, 0 replies; 25+ messages in thread
From: Harry Yoo @ 2026-06-30  2:03 UTC (permalink / raw)
  To: Shakeel Butt, Suren Baghdasaryan
  Cc: Vlastimil Babka (SUSE), Andrew Morton, Roman Gushchin, Hao Li,
	Christoph Lameter, David Rientjes, Usama Arif, Meta kernel team,
	linux-mm, linux-kernel, Danielle Costantino, Kees Cook



On 6/30/26 4:52 AM, Shakeel Butt wrote:
> On Sun, Jun 28, 2026 at 09:28:51PM -0700, Suren Baghdasaryan wrote:
>> On Sun, Jun 28, 2026 at 8:57 PM Harry Yoo <harry@kernel.org> wrote:
>>>
>>>
> 
> [...]
> 
> Thanks all for great discussion. Let me summarize the conclusion and please
> correct me if I missed something.

I think you didn't miss anything.

> Let's keep the original one-line fix (serve the obj_exts array from a
> strictly larger cache, making the relation a DAG).

Right.

> We will NOT gate it on mem_alloc_profiling_enabled() as I floated
> earlier

Right.

> -- per Harry, is_kmalloc_normal(s) is already the right condition,

Right.

> and gating on profiling would miss the SLUB_TINY +
> __GFP_RECLAIMABLE|__GFP_ACCOUNT memcg case.

Right.

> So the bump stays unconditional for is_kmalloc_normal() caches.

Right.

> This over-allocates the array for larger caches, but only for profiling and
> SLUB_TINY+MEMCG users (the latter unrealistic). Acceptable for a small,
> backportable fix.

Yes.

Thanks!

-- 
Cheers,
Harry / Hyeonggon



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-29  4:28                   ` Suren Baghdasaryan
  2026-06-29 19:52                     ` Shakeel Butt
@ 2026-06-30  2:30                     ` Harry Yoo
  2026-06-30  4:38                       ` Suren Baghdasaryan
  1 sibling, 1 reply; 25+ messages in thread
From: Harry Yoo @ 2026-06-30  2:30 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka (SUSE), Shakeel Butt, Andrew Morton,
	Roman Gushchin, Hao Li, Christoph Lameter, David Rientjes,
	Usama Arif, Meta kernel team, linux-mm, linux-kernel,
	Danielle Costantino, Kees Cook



On 6/29/26 1:28 PM, Suren Baghdasaryan wrote:
> On Sun, Jun 28, 2026 at 8:57 PM Harry Yoo <harry@kernel.org> wrote:
>>> I think adding a new KMALLOC_TYPE would be the cleanest way to fix
>>> this recursion problem once and for all. This size bumping and the
>>> special case of SLUB_TINY are quite confusing.
>>
>> As mentioned by Vlsatimil, in the long term, using SLAB_BUCKETS
>> infrastructure would be more straightforward than new KMALLOC_TYPE
>> because (I think) the kmalloc type is decided purely based on GFP
>> flags and we need to somehow work around that. SLAB_BUCKETS provides
>> a nice abstraction to do this.
>>
>> Luckily, SLAB_BUCKETS is introduced in v6.11.
>> Unfortunately, SLAB_BUCKETS is optional.
>>
>>> We could define that
>>> new KMALLOC_TYPE only if memory allocation profiling or SLUB_TINY are
>>> enabled to avoid new caches when not needed. Does not seem too complex
>>> but maybe I'm missing something? WDYT?
>>
>> I think we need some enhancements to achieve that with SLAB_BUCKETS
>>
>> 1. Rename SLAB_BUCKETS to SLAB_BUCKETS_HARDENING
>>     (w/ SLAB_BUCKETS being a transitional config for _HARDENING)
>>
>> 2. Make the SLAB_BUCKETS infrastructure unconditional,
>>     but the decision is made at runtime:
>>
>>     1) actually creating a kmem_buckets vs.
>>     2) falling back to kmalloc.
>>
>> 3. kmem_buckets_create() creates kmem_buckets only when
>>     SLAB_BUCKETS_HARDENING is enabled.
>>
>> 4. SLUB decides (not) to create kmem_buckets for internal use
>>     during the boot process. Use the kmem_buckets for obj_exts
>>     array allocation.
>>
>> Side note: this would unconditionally add the kmem_buckets parameter to
>> the kmalloc slowpath. Probably it'd be worth introducing a dedicated
>> entrypoint for kmem_buckets instead.
> 
> Yeah, this sounds quite complex.

I think it's not that complex, but quite some churn, yeah :)

> Maybe we could use the new kmalloc_flags() introduced by Vlastimil
> in [1] to avoid using GFP
> flags to indicate that we want to use this new KMALLOC_TYPE? That
> seems simpler,

That indeed would be smaller changes.

> though it's not backportable because kmalloc_flags() is brand new.

Right, I didn't seriously consider that option as I was (mistakenly) 
assuming you or Shakeel would want to backport it.

> [1] https://lore.kernel.org/all/20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org/
> 
>>
>>> If it is more complex than I imaging then I'm fine with Shakeel's
>>> approach as a temporary fix.
>>
>> Since above requires quite some changes, I'd say let's proeed with
>> the fix (since it's one line of code change that fixes a bug),
>> and then see how we can make SLAB_BUCKETS changes as minimal
>> as possible for backporting?
> 
> I was thinking Shakeel's approach for backports and
> kmalloc_flags()+KMALLOC_TYPE going forward.

Oh, I misread it then.
I was assuming it's critical enough to bother backporting.

> Just throwing this as an option. I haven't looked closely into
> SLAB_BUCKETS yet, so that might
> be indeed a better direction.
-- 
Cheers,
Harry / Hyeonggon



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-30  2:30                     ` Harry Yoo
@ 2026-06-30  4:38                       ` Suren Baghdasaryan
  2026-06-30  4:39                         ` Suren Baghdasaryan
  0 siblings, 1 reply; 25+ messages in thread
From: Suren Baghdasaryan @ 2026-06-30  4:38 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Vlastimil Babka (SUSE), Shakeel Butt, Andrew Morton,
	Roman Gushchin, Hao Li, Christoph Lameter, David Rientjes,
	Usama Arif, Meta kernel team, linux-mm, linux-kernel,
	Danielle Costantino, Kees Cook

On Mon, Jun 29, 2026 at 7:31 PM Harry Yoo <harry@kernel.org> wrote:
>
>
>
> On 6/29/26 1:28 PM, Suren Baghdasaryan wrote:
> > On Sun, Jun 28, 2026 at 8:57 PM Harry Yoo <harry@kernel.org> wrote:
> >>> I think adding a new KMALLOC_TYPE would be the cleanest way to fix
> >>> this recursion problem once and for all. This size bumping and the
> >>> special case of SLUB_TINY are quite confusing.
> >>
> >> As mentioned by Vlsatimil, in the long term, using SLAB_BUCKETS
> >> infrastructure would be more straightforward than new KMALLOC_TYPE
> >> because (I think) the kmalloc type is decided purely based on GFP
> >> flags and we need to somehow work around that. SLAB_BUCKETS provides
> >> a nice abstraction to do this.
> >>
> >> Luckily, SLAB_BUCKETS is introduced in v6.11.
> >> Unfortunately, SLAB_BUCKETS is optional.
> >>
> >>> We could define that
> >>> new KMALLOC_TYPE only if memory allocation profiling or SLUB_TINY are
> >>> enabled to avoid new caches when not needed. Does not seem too complex
> >>> but maybe I'm missing something? WDYT?
> >>
> >> I think we need some enhancements to achieve that with SLAB_BUCKETS
> >>
> >> 1. Rename SLAB_BUCKETS to SLAB_BUCKETS_HARDENING
> >>     (w/ SLAB_BUCKETS being a transitional config for _HARDENING)
> >>
> >> 2. Make the SLAB_BUCKETS infrastructure unconditional,
> >>     but the decision is made at runtime:
> >>
> >>     1) actually creating a kmem_buckets vs.
> >>     2) falling back to kmalloc.
> >>
> >> 3. kmem_buckets_create() creates kmem_buckets only when
> >>     SLAB_BUCKETS_HARDENING is enabled.
> >>
> >> 4. SLUB decides (not) to create kmem_buckets for internal use
> >>     during the boot process. Use the kmem_buckets for obj_exts
> >>     array allocation.
> >>
> >> Side note: this would unconditionally add the kmem_buckets parameter to
> >> the kmalloc slowpath. Probably it'd be worth introducing a dedicated
> >> entrypoint for kmem_buckets instead.
> >
> > Yeah, this sounds quite complex.
>
> I think it's not that complex, but quite some churn, yeah :)
>
> > Maybe we could use the new kmalloc_flags() introduced by Vlastimil
> > in [1] to avoid using GFP
> > flags to indicate that we want to use this new KMALLOC_TYPE? That
> > seems simpler,
>
> That indeed would be smaller changes.
>
> > though it's not backportable because kmalloc_flags() is brand new.
>
> Right, I didn't seriously consider that option as I was (mistakenly)
> assuming you or Shakeel would want to backport it.
>
> > [1] https://lore.kernel.org/all/20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org/
> >
> >>
> >>> If it is more complex than I imagine then I'm fine with Shakeel's
> >>> approach as a temporary fix.
> >>
> >> Since the above requires quite some changes, I'd say let's proceed with
> >> the fix (since it's one line of code change that fixes a bug),
> >> and then see how we can make SLAB_BUCKETS changes as minimal
> >> as possible for backporting?
> >
> > I was thinking Shakeel's approach for backports and
> > kmalloc_flags()+KMALLOC_TYPE going forward.
>
> Oh, I misread it then.
> I was assuming it's critical enough to bother backporting.

Yes, it's worth backporting, so we can merge Shakeel's change as is
and then once Vlastimil's patch is merged we can implement the new
KMALLOC_TYPE as a replacement.

>
> > Just throwing this out as an option. I haven't looked closely into
> > SLAB_BUCKETS yet, so that might indeed be a better direction.
> --
> Cheers,
> Harry / Hyeonggon
>



* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-30  4:38                       ` Suren Baghdasaryan
@ 2026-06-30  4:39                         ` Suren Baghdasaryan
  2026-06-30  4:42                           ` Harry Yoo
  0 siblings, 1 reply; 25+ messages in thread
From: Suren Baghdasaryan @ 2026-06-30  4:39 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Vlastimil Babka (SUSE), Shakeel Butt, Andrew Morton,
	Roman Gushchin, Hao Li, Christoph Lameter, David Rientjes,
	Usama Arif, Meta kernel team, linux-mm, linux-kernel,
	Danielle Costantino, Kees Cook

On Mon, Jun 29, 2026 at 9:38 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Mon, Jun 29, 2026 at 7:31 PM Harry Yoo <harry@kernel.org> wrote:
> >
> >
> >
> > On 6/29/26 1:28 PM, Suren Baghdasaryan wrote:
> > > On Sun, Jun 28, 2026 at 8:57 PM Harry Yoo <harry@kernel.org> wrote:
> > >>> I think adding a new KMALLOC_TYPE would be the cleanest way to fix
> > >>> this recursion problem once and for all. This size bumping and the
> > >>> special case of SLUB_TINY are quite confusing.
> > >>
> > >> As mentioned by Vlastimil, in the long term, using SLAB_BUCKETS
> > >> infrastructure would be more straightforward than new KMALLOC_TYPE
> > >> because (I think) the kmalloc type is decided purely based on GFP
> > >> flags and we need to somehow work around that. SLAB_BUCKETS provides
> > >> a nice abstraction to do this.
> > >>
> > >> Luckily, SLAB_BUCKETS is introduced in v6.11.
> > >> Unfortunately, SLAB_BUCKETS is optional.
> > >>
> > >>> We could define that new KMALLOC_TYPE only if memory allocation
> > >>> profiling or SLUB_TINY are enabled to avoid new caches when not needed.
> > >>> Does not seem too complex but maybe I'm missing something? WDYT?
> > >>
> > >> I think we need some enhancements to achieve that with SLAB_BUCKETS
> > >>
> > >> 1. Rename SLAB_BUCKETS to SLAB_BUCKETS_HARDENING
> > >>     (w/ SLAB_BUCKETS being a transitional config for _HARDENING)
> > >>
> > >> 2. Make the SLAB_BUCKETS infrastructure unconditional,
> > >>     but the decision is made at runtime:
> > >>
> > >>     1) actually creating a kmem_buckets vs.
> > >>     2) falling back to kmalloc.
> > >>
> > >> 3. kmem_buckets_create() creates kmem_buckets only when
> > >>     SLAB_BUCKETS_HARDENING is enabled.
> > >>
> > >> 4. SLUB decides (not) to create kmem_buckets for internal use
> > >>     during the boot process. Use the kmem_buckets for obj_exts
> > >>     array allocation.
> > >>
> > >> Side note: this would unconditionally add the kmem_buckets parameter to
> > >> the kmalloc slowpath. Probably it'd be worth introducing a dedicated
> > >> entrypoint for kmem_buckets instead.
> > >
> > > Yeah, this sounds quite complex.
> >
> > I think it's not that complex, but quite some churn, yeah :)
> >
> > > Maybe we could use the new kmalloc_flags() introduced by Vlastimil
> > > in [1] to avoid using GFP flags to indicate that we want to use this
> > > new KMALLOC_TYPE? That seems simpler,
> >
> > That indeed would be smaller changes.
> >
> > > though it's not backportable because kmalloc_flags() is brand new.
> >
> > Right, I didn't seriously consider that option as I was (mistakenly)
> > assuming you or Shakeel would want to backport it.
> >
> > > [1] https://lore.kernel.org/all/20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org/
> > >
> > >>
> > >>> If it is more complex than I imagine then I'm fine with Shakeel's
> > >>> approach as a temporary fix.
> > >>
> > >> Since the above requires quite some changes, I'd say let's proceed with
> > >> the fix (since it's one line of code change that fixes a bug),
> > >> and then see how we can make SLAB_BUCKETS changes as minimal
> > >> as possible for backporting?
> > >
> > > I was thinking Shakeel's approach for backports and
> > > kmalloc_flags()+KMALLOC_TYPE going forward.
> >
> > Oh, I misread it then.
> > I was assuming it's critical enough to bother backporting.
>
> Yes, it's worth backporting, so we can merge Shakeel's change as is
> and then once Vlastimil's patch is merged we can implement the new
> KMALLOC_TYPE as a replacement.

And Shakeel's patch is easily backportable.

>
> >
> > > Just throwing this out as an option. I haven't looked closely into
> > > SLAB_BUCKETS yet, so that might indeed be a better direction.
> > --
> > Cheers,
> > Harry / Hyeonggon
> >



* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-30  4:39                         ` Suren Baghdasaryan
@ 2026-06-30  4:42                           ` Harry Yoo
  2026-06-30  5:29                             ` Suren Baghdasaryan
  0 siblings, 1 reply; 25+ messages in thread
From: Harry Yoo @ 2026-06-30  4:42 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka (SUSE), Shakeel Butt, Andrew Morton,
	Roman Gushchin, Hao Li, Christoph Lameter, David Rientjes,
	Usama Arif, Meta kernel team, linux-mm, linux-kernel,
	Danielle Costantino, Kees Cook



On 6/30/26 1:39 PM, Suren Baghdasaryan wrote:
> On Mon, Jun 29, 2026 at 9:38 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>
>> On Mon, Jun 29, 2026 at 7:31 PM Harry Yoo <harry@kernel.org> wrote:
>>>
>>>
>>>
>>> On 6/29/26 1:28 PM, Suren Baghdasaryan wrote:
>>>> On Sun, Jun 28, 2026 at 8:57 PM Harry Yoo <harry@kernel.org> wrote:
>>>>>> I think adding a new KMALLOC_TYPE would be the cleanest way to fix
>>>>>> this recursion problem once and for all. This size bumping and the
>>>>>> special case of SLUB_TINY are quite confusing.
>>>>>
>>>>> As mentioned by Vlastimil, in the long term, using SLAB_BUCKETS
>>>>> infrastructure would be more straightforward than new KMALLOC_TYPE
>>>>> because (I think) the kmalloc type is decided purely based on GFP
>>>>> flags and we need to somehow work around that. SLAB_BUCKETS provides
>>>>> a nice abstraction to do this.
>>>>>
>>>>> Luckily, SLAB_BUCKETS is introduced in v6.11.
>>>>> Unfortunately, SLAB_BUCKETS is optional.
>>>>>
>>>>>> We could define that new KMALLOC_TYPE only if memory allocation
>>>>>> profiling or SLUB_TINY are enabled to avoid new caches when not needed.
>>>>>> Does not seem too complex but maybe I'm missing something? WDYT?
>>>>>
>>>>> I think we need some enhancements to achieve that with SLAB_BUCKETS
>>>>>
>>>>> 1. Rename SLAB_BUCKETS to SLAB_BUCKETS_HARDENING
>>>>>     (w/ SLAB_BUCKETS being a transitional config for _HARDENING)
>>>>>
>>>>> 2. Make the SLAB_BUCKETS infrastructure unconditional,
>>>>>     but the decision is made at runtime:
>>>>>
>>>>>     1) actually creating a kmem_buckets vs.
>>>>>     2) falling back to kmalloc.
>>>>>
>>>>> 3. kmem_buckets_create() creates kmem_buckets only when
>>>>>     SLAB_BUCKETS_HARDENING is enabled.
>>>>>
>>>>> 4. SLUB decides (not) to create kmem_buckets for internal use
>>>>>     during the boot process. Use the kmem_buckets for obj_exts
>>>>>     array allocation.
>>>>>
>>>>> Side note: this would unconditionally add the kmem_buckets parameter to
>>>>> the kmalloc slowpath. Probably it'd be worth introducing a dedicated
>>>>> entrypoint for kmem_buckets instead.
>>>>
>>>> Yeah, this sounds quite complex.
>>>
>>> I think it's not that complex, but quite some churn, yeah :)
>>>
>>>> Maybe we could use the new kmalloc_flags() introduced by Vlastimil
>>>> in [1] to avoid using GFP flags to indicate that we want to use this
>>>> new KMALLOC_TYPE? That seems simpler,
>>>
>>> That indeed would be smaller changes.
>>>
>>>> though it's not backportable because kmalloc_flags() is brand new.
>>>
>>> Right, I didn't seriously consider that option as I was (mistakenly)
>>> assuming you or Shakeel would want to backport it.
>>>
>>>> [1] https://lore.kernel.org/all/20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org/
>>>>
>>>>>
>>>>>> If it is more complex than I imagine then I'm fine with Shakeel's
>>>>>> approach as a temporary fix.
>>>>>
>>>>> Since the above requires quite some changes, I'd say let's proceed with
>>>>> the fix (since it's one line of code change that fixes a bug),
>>>>> and then see how we can make SLAB_BUCKETS changes as minimal
>>>>> as possible for backporting?
>>>>
>>>> I was thinking Shakeel's approach for backports and
>>>> kmalloc_flags()+KMALLOC_TYPE going forward.
>>>
>>> Oh, I misread it then.
>>> I was assuming it's critical enough to bother backporting.

Ah, here I meant backporting either the kmalloc_flags()+KMALLOC_TYPE or
SLAB_BUCKETS approach.

>> Yes, it's worth backporting, so we can merge Shakeel's change as is

Right.

>> and then once Vlastimil's patch is merged we can implement the new

Vlastimil's patch has already landed mainline, by the way :)

>> KMALLOC_TYPE as a replacement.
> 
> And Shakeel's patch is easily backportable.

Yes, of course!

-- 
Cheers,
Harry / Hyeonggon




* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-30  4:42                           ` Harry Yoo
@ 2026-06-30  5:29                             ` Suren Baghdasaryan
  2026-06-30  6:12                               ` Vlastimil Babka (SUSE)
  0 siblings, 1 reply; 25+ messages in thread
From: Suren Baghdasaryan @ 2026-06-30  5:29 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Vlastimil Babka (SUSE), Shakeel Butt, Andrew Morton,
	Roman Gushchin, Hao Li, Christoph Lameter, David Rientjes,
	Usama Arif, Meta kernel team, linux-mm, linux-kernel,
	Danielle Costantino, Kees Cook

On Mon, Jun 29, 2026 at 9:42 PM Harry Yoo <harry@kernel.org> wrote:
>
>
>
> On 6/30/26 1:39 PM, Suren Baghdasaryan wrote:
> > On Mon, Jun 29, 2026 at 9:38 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >>
> >> On Mon, Jun 29, 2026 at 7:31 PM Harry Yoo <harry@kernel.org> wrote:
> >>>
> >>>
> >>>
> >>> On 6/29/26 1:28 PM, Suren Baghdasaryan wrote:
> >>>> On Sun, Jun 28, 2026 at 8:57 PM Harry Yoo <harry@kernel.org> wrote:
> >>>>>> I think adding a new KMALLOC_TYPE would be the cleanest way to fix
> >>>>>> this recursion problem once and for all. This size bumping and the
> >>>>>> special case of SLUB_TINY are quite confusing.
> >>>>>
> >>>>> As mentioned by Vlastimil, in the long term, using SLAB_BUCKETS
> >>>>> infrastructure would be more straightforward than new KMALLOC_TYPE
> >>>>> because (I think) the kmalloc type is decided purely based on GFP
> >>>>> flags and we need to somehow work around that. SLAB_BUCKETS provides
> >>>>> a nice abstraction to do this.
> >>>>>
> >>>>> Luckily, SLAB_BUCKETS is introduced in v6.11.
> >>>>> Unfortunately, SLAB_BUCKETS is optional.
> >>>>>
> >>>>>> We could define that new KMALLOC_TYPE only if memory allocation
> >>>>>> profiling or SLUB_TINY are enabled to avoid new caches when not needed.
> >>>>>> Does not seem too complex but maybe I'm missing something? WDYT?
> >>>>>
> >>>>> I think we need some enhancements to achieve that with SLAB_BUCKETS
> >>>>>
> >>>>> 1. Rename SLAB_BUCKETS to SLAB_BUCKETS_HARDENING
> >>>>>     (w/ SLAB_BUCKETS being a transitional config for _HARDENING)
> >>>>>
> >>>>> 2. Make the SLAB_BUCKETS infrastructure unconditional,
> >>>>>     but the decision is made at runtime:
> >>>>>
> >>>>>     1) actually creating a kmem_buckets vs.
> >>>>>     2) falling back to kmalloc.
> >>>>>
> >>>>> 3. kmem_buckets_create() creates kmem_buckets only when
> >>>>>     SLAB_BUCKETS_HARDENING is enabled.
> >>>>>
> >>>>> 4. SLUB decides (not) to create kmem_buckets for internal use
> >>>>>     during the boot process. Use the kmem_buckets for obj_exts
> >>>>>     array allocation.
> >>>>>
> >>>>> Side note: this would unconditionally add the kmem_buckets parameter to
> >>>>> the kmalloc slowpath. Probably it'd be worth introducing a dedicated
> >>>>> entrypoint for kmem_buckets instead.
> >>>>
> >>>> Yeah, this sounds quite complex.
> >>>
> >>> I think it's not that complex, but quite some churn, yeah :)
> >>>
> >>>> Maybe we could use the new kmalloc_flags() introduced by Vlastimil
> >>>> in [1] to avoid using GFP flags to indicate that we want to use this
> >>>> new KMALLOC_TYPE? That seems simpler,
> >>>
> >>> That indeed would be smaller changes.
> >>>
> >>>> though it's not backportable because kmalloc_flags() is brand new.
> >>>
> >>> Right, I didn't seriously consider that option as I was (mistakenly)
> >>> assuming you or Shakeel would want to backport it.
> >>>
> >>>> [1] https://lore.kernel.org/all/20260615-slab_alloc_flags-v3-0-ce1146d140fb@kernel.org/
> >>>>
> >>>>>
> >>>>>> If it is more complex than I imagine then I'm fine with Shakeel's
> >>>>>> approach as a temporary fix.
> >>>>>
> >>>>> Since the above requires quite some changes, I'd say let's proceed with
> >>>>> the fix (since it's one line of code change that fixes a bug),
> >>>>> and then see how we can make SLAB_BUCKETS changes as minimal
> >>>>> as possible for backporting?
> >>>>
> >>>> I was thinking Shakeel's approach for backports and
> >>>> kmalloc_flags()+KMALLOC_TYPE going forward.
> >>>
> >>> Oh, I misread it then.
> >>> I was assuming it's critical enough to bother backporting.
>
> Ah, here I meant backporting either the kmalloc_flags()+KMALLOC_TYPE or
> SLAB_BUCKETS approach.
>
> >> Yes, it's worth backporting, so we can merge Shakeel's change as is
>
> Right.
>
> >> and then once Vlastimil's patch is merged we can implement the new
>
> Vlastimil's patch has already landed mainline, by the way :)

Nice! I suggest posting Shakeel's patch CC'ing stable for backports
and then following up with the fix using KMALLOC_TYPE. Vlastimil,
WDYT?

>
> >> KMALLOC_TYPE as a replacement.
> >
> > And Shakeel's patch is easily backportable.
>
> Yes, of course!
>
> --
> Cheers,
> Harry / Hyeonggon
>



* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-30  5:29                             ` Suren Baghdasaryan
@ 2026-06-30  6:12                               ` Vlastimil Babka (SUSE)
  2026-06-30  7:03                                 ` Harry Yoo
  0 siblings, 1 reply; 25+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-30  6:12 UTC (permalink / raw)
  To: Suren Baghdasaryan, Harry Yoo
  Cc: Shakeel Butt, Andrew Morton, Roman Gushchin, Hao Li,
	Christoph Lameter, David Rientjes, Usama Arif, Meta kernel team,
	linux-mm, linux-kernel, Danielle Costantino, Kees Cook

On 6/30/26 07:29, Suren Baghdasaryan wrote:
> On Mon, Jun 29, 2026 at 9:42 PM Harry Yoo <harry@kernel.org> wrote:
>>
>>
>>
>> On 6/30/26 1:39 PM, Suren Baghdasaryan wrote:
>> > On Mon, Jun 29, 2026 at 9:38 PM Suren Baghdasaryan <surenb@google.com> wrote:
>> >>
>>
>> Ah, here I meant backporting either the kmalloc_flags()+KMALLOC_TYPE or
>> SLAB_BUCKETS approach.
>>
>> >> Yes, it's worth backporting, so we can merge Shakeel's change as is
>>
>> Right.
>>
>> >> and then once Vlastimil's patch is merged we can implement the new
>>
>> Vlastimil's patch has already landed mainline, by the way :)
> 
> Nice! I suggest posting Shakeel's patch CC'ing stable for backports
> and then following up with the fix using KMALLOC_TYPE. Vlastimil,
> WDYT?

Sounded like a plan, but then I realized I misunderstood the amount of the
wastage. E.g. on my system kmalloc-8k with 4 objects per slab would have
obj_ext size of 64, but now it's 16k? That's ridiculous. I think it will
even self-amplify to some extent? kmalloc-8 would have 512 objects per slab,
so its obj_ext is 8k. It will not recursively create an obj_ext for the
obj_ext, but other 8k allocations in the same kmalloc-8k slab could then
trigger it, right?
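
For reference, the arithmetic above can be sketched in userspace C. This is
a minimal model, not the kernel implementation. It assumes
sizeof(struct slabobj_ext) == 16 (as stated in the patch description),
power-of-two kmalloc buckets only, and a hypothetical reading of the patch in
which the array is served from the smallest bucket that both fits it and is
strictly larger than the slab's own object size. The helper names
(bucket_for, obj_exts_bytes, obj_exts_bucket_bumped) are made up for
illustration.

```c
#include <stddef.h>

/* Assumption: sizeof(struct slabobj_ext) == 16, per the patch description. */
#define SLABOBJ_EXT_SIZE 16u

/*
 * Smallest power-of-two bucket that fits `size`. Real kmalloc also has
 * 96- and 192-byte caches; they are ignored in this model.
 */
static size_t bucket_for(size_t size)
{
	size_t bucket = 8; /* assumed smallest bucket: kmalloc-8 */

	while (bucket < size)
		bucket <<= 1;
	return bucket;
}

/* Bytes actually needed for one slab's obj_exts array. */
static size_t obj_exts_bytes(size_t objs_per_slab)
{
	return objs_per_slab * SLABOBJ_EXT_SIZE;
}

/*
 * Hypothetical model of the "strictly larger" policy: pick the bucket
 * that fits the array, then keep doubling until it is larger than the
 * object size of the slab the array describes.
 */
static size_t obj_exts_bucket_bumped(size_t obj_size, size_t objs_per_slab)
{
	size_t bucket = bucket_for(obj_exts_bytes(objs_per_slab));

	while (bucket <= obj_size)
		bucket <<= 1;
	return bucket;
}
```

Under these assumptions, obj_exts_bucket_bumped(8192, 4) gives 16384 for an
array that needs only obj_exts_bytes(4) == 64 bytes, while
obj_exts_bucket_bumped(8, 512) stays at 8192, matching the kmalloc-8k and
kmalloc-8 figures above.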

We could say it's for a debugging feature, but also it's running in
production fleets (and Android?), so probably not that easy to dismiss.
Sudden memory increase in an LTS due to this backport doesn't sound nice to me.

>>
>> >> KMALLOC_TYPE as a replacement.
>> >
>> > And Shakeel's patch is easily backportable.
>>
>> Yes, of course!
>>
>> --
>> Cheers,
>> Harry / Hyeonggon
>>




* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-30  6:12                               ` Vlastimil Babka (SUSE)
@ 2026-06-30  7:03                                 ` Harry Yoo
  2026-06-30 14:35                                   ` Shakeel Butt
  0 siblings, 1 reply; 25+ messages in thread
From: Harry Yoo @ 2026-06-30  7:03 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE), Suren Baghdasaryan
  Cc: Shakeel Butt, Andrew Morton, Roman Gushchin, Hao Li,
	Christoph Lameter, David Rientjes, Usama Arif, Meta kernel team,
	linux-mm, linux-kernel, Danielle Costantino, Kees Cook



On 6/30/26 3:12 PM, Vlastimil Babka (SUSE) wrote:
> On 6/30/26 07:29, Suren Baghdasaryan wrote:
>> On Mon, Jun 29, 2026 at 9:42 PM Harry Yoo <harry@kernel.org> wrote:
>>>
>>>
>>>
>>> On 6/30/26 1:39 PM, Suren Baghdasaryan wrote:
>>>> On Mon, Jun 29, 2026 at 9:38 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>>>>
>>>
>>> Ah, here I meant backporting either the kmalloc_flags()+KMALLOC_TYPE or
>>> SLAB_BUCKETS approach.
>>>
>>>>> Yes, it's worth backporting, so we can merge Shakeel's change as is
>>>
>>> Right.
>>>
>>>>> and then once Vlastimil's patch is merged we can implement the new
>>>
>>> Vlastimil's patch has already landed mainline, by the way :)
>>
>> Nice! I suggest posting Shakeel's patch CC'ing stable for backports
>> and then following up with the fix using KMALLOC_TYPE. Vlastimil,
>> WDYT?
> 
> Sounded like a plan, but then I realized I misunderstood the amount of the
> wastage. E.g. on my system kmalloc-8k with 4 objects per slab would have
> obj_ext size of 64, but now it's 16k? That's ridiculous.

Right.

...which is why I was assuming either the KMALLOC_TYPE or SLAB_BUCKETS
approach would be backported as a follow-up. Err, should have
communicated clearly, apologies.

> I think it will even self-amplify to some extent? kmalloc-8 would have
> 512 objects per slab, so its obj_ext is 8k. It will not recursively
> create an obj_ext for the obj_ext, but other 8k allocations in the same
> kmalloc-8k slab could then trigger it, right?

True, assuming that by 'self-amplifying' you meant this patch creates
more kmalloc-8k objects, and also now kmalloc-8k wastes memory.

> We could say it's for a debugging feature, but also it's running in
> production fleets (and Android?), so probably not that easy to dismiss.

I think a key factor is when it's enabled in production.

The kconfig says Android selects MEM_ALLOC_PROFILING, but not
MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT.

I assumed that turning it on by default in the entire fleet
would be a bit hard to justify... (please correct me
if that's not the case)

> Sudden memory increase in an LTS due to this backport doesn't sound nice to me.

-- 
Cheers,
Harry / Hyeonggon




* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-30  7:03                                 ` Harry Yoo
@ 2026-06-30 14:35                                   ` Shakeel Butt
  2026-06-30 14:52                                     ` Suren Baghdasaryan
  0 siblings, 1 reply; 25+ messages in thread
From: Shakeel Butt @ 2026-06-30 14:35 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Vlastimil Babka (SUSE), Suren Baghdasaryan, Andrew Morton,
	Roman Gushchin, Hao Li, Christoph Lameter, David Rientjes,
	Usama Arif, Meta kernel team, linux-mm, linux-kernel,
	Danielle Costantino, Kees Cook

On Tue, Jun 30, 2026 at 04:03:30PM +0900, Harry Yoo wrote:
> 
> 
> On 6/30/26 3:12 PM, Vlastimil Babka (SUSE) wrote:
> > On 6/30/26 07:29, Suren Baghdasaryan wrote:
> >> On Mon, Jun 29, 2026 at 9:42 PM Harry Yoo <harry@kernel.org> wrote:
> >>>
> >>>
> >>>
> >>> On 6/30/26 1:39 PM, Suren Baghdasaryan wrote:
> >>>> On Mon, Jun 29, 2026 at 9:38 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >>>>>
> >>>
> >>> Ah, here I meant backporting either the kmalloc_flags()+KMALLOC_TYPE or
> >>> SLAB_BUCKETS approach.
> >>>
> >>>>> Yes, it's worth backporting, so we can merge Shakeel's change as is
> >>>
> >>> Right.
> >>>
> >>>>> and then once Vlastimil's patch is merged we can implement the new
> >>>
> >>> Vlastimil's patch has already landed mainline, by the way :)
> >>
> >> Nice! I suggest posting Shakeel's patch CC'ing stable for backports
> >> and then following up with the fix using KMALLOC_TYPE. Vlastimil,
> >> WDYT?
> > 
> > Sounded like a plan, but then I realized I misunderstood the amount of the
> > wastage. E.g. on my system kmalloc-8k with 4 objects per slab would have
> > obj_ext size of 64, but now it's 16k? That's ridiculous.
> 
> Right.

Yeah I should have given more thought on wastage.

> 
> ...which is why I was assuming either the KMALLOC_TYPE or SLAB_BUCKETS
> approach would be backported as a follow-up. Err, should have
> communicated clearly, apologies.

Harry, do you want to take a stab at prototyping these? If these look simple
enough, we can request backports of this.

> 
> > I think it will even self-amplify to some extent? kmalloc-8 would have
> > 512 objects per slab, so its obj_ext is 8k. It will not recursively
> > create an obj_ext for the obj_ext, but other 8k allocations in the same
> > kmalloc-8k slab could then trigger it, right?
> 
> True, assuming that by 'self-amplifying' you meant this patch creates
> more kmalloc-8k objects, and also now kmalloc-8k wastes memory.
> 

I am not sure I understand what self-amplifying means here. Shouldn't 8k
allocations served by the same kmalloc-8k slab share the obj_exts array?

> > We could say it's for a debugging feature, but also it's running in
> > production fleets (and Android?), so probably not that easy to dismiss.
> 
> I think a key factor is when it's enabled in production.
> 
> The kconfig says Android selects MEM_ALLOC_PROFILING, but not
> MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT.
> 
> I assumed that turning it on by default in the entire fleet
> would be a bit hard to justify... (please correct me
> if that's not the case)

Actually we have memory profiling enabled by default across the Meta fleet.
So, the issue is very real. At the moment, we are seeing this issue on a
specific type of machine and we have disabled memory profiling for those
machines.

Internally we did discuss simply disabling memory allocation profiling for
kmalloc-normal caches, but to me that was a big hammer, and thus I suggested
the current approach.




* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-30 14:35                                   ` Shakeel Butt
@ 2026-06-30 14:52                                     ` Suren Baghdasaryan
  2026-06-30 15:27                                       ` Harry Yoo
  0 siblings, 1 reply; 25+ messages in thread
From: Suren Baghdasaryan @ 2026-06-30 14:52 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Harry Yoo, Vlastimil Babka (SUSE), Andrew Morton, Roman Gushchin,
	Hao Li, Christoph Lameter, David Rientjes, Usama Arif,
	Meta kernel team, linux-mm, linux-kernel, Danielle Costantino,
	Kees Cook

On Tue, Jun 30, 2026 at 7:36 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Tue, Jun 30, 2026 at 04:03:30PM +0900, Harry Yoo wrote:
> >
> >
> > On 6/30/26 3:12 PM, Vlastimil Babka (SUSE) wrote:
> > > On 6/30/26 07:29, Suren Baghdasaryan wrote:
> > >> On Mon, Jun 29, 2026 at 9:42 PM Harry Yoo <harry@kernel.org> wrote:
> > >>>
> > >>>
> > >>>
> > >>> On 6/30/26 1:39 PM, Suren Baghdasaryan wrote:
> > >>>> On Mon, Jun 29, 2026 at 9:38 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > >>>>>
> > >>>
> > >>> Ah, here I meant backporting either the kmalloc_flags()+KMALLOC_TYPE or
> > >>> SLAB_BUCKETS approach.
> > >>>
> > >>>>> Yes, it's worth backporting, so we can merge Shakeel's change as is
> > >>>
> > >>> Right.
> > >>>
> > >>>>> and then once Vlastimil's patch is merged we can implement the new
> > >>>
> > >>> Vlastimil's patch has already landed mainline, by the way :)
> > >>
> > >> Nice! I suggest posting Shakeel's patch CC'ing stable for backports
> > >> and then following up with the fix using KMALLOC_TYPE. Vlastimil,
> > >> WDYT?
> > >
> > > Sounded like a plan, but then I realized I misunderstood the amount of the
> > > wastage. E.g. on my system kmalloc-8k with 4 objects per slab would have
> > > obj_ext size of 64, but now it's 16k? That's ridiculous.
> >
> > Right.
>
> Yeah I should have given more thought on wastage.

Ugh! I didn't realize the wastage was that high.

>
> >
> > ...which is why I was assuming either the KMALLOC_TYPE or SLAB_BUCKETS
> > approach would be backported as a follow-up. Err, should have
> > communicated clearly, apologies.
>
> Harry, do you want to take a stab at prototyping these? If these look simple
> enough, we can request backports of this.

I'll also give it some thought to see if there is maybe a different
way to fix this that would be easy to backport.

>
> >
> > > I think it will even self-amplify to some extent? kmalloc-8 would have
> > > 512 objects per slab, so its obj_ext is 8k. It will not recursively
> > > create an obj_ext for the obj_ext, but other 8k allocations in the same
> > > kmalloc-8k slab could then trigger it, right?
> >
> > True, assuming that by 'self-amplifying' you meant this patch creates
> > more kmalloc-8k objects, and also now kmalloc-8k wastes memory.
> >
>
> I am not sure I understand what self-amplifying means here. Shouldn't 8k
> allocations served by the same kmalloc-8k slab share the obj_exts array?
>
> > > We could say it's for a debugging feature, but also it's running in
> > > production fleets (and Android?), so probably not that easy to dismiss.
> >
> > I think a key factor is when it's enabled in production.
> >
> > The kconfig says Android selects MEM_ALLOC_PROFILING, but not
> > MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT.
> >
> > I assumed that turning it on by default in the entire fleet
> > would be a bit hard to justify... (please correct me
> > if that's not the case)
>
> Actually we have memory profiling enabled by default across the Meta fleet.
> So, the issue is very real. At the moment, we are seeing this issue on a
> specific type of machine and we have disabled memory profiling for those
> machines.
>
> Internally we did discuss simply disabling memory allocation profiling for
> kmalloc-normal caches, but to me that was a big hammer, and thus I suggested
> the current approach.
>



* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-30 14:52                                     ` Suren Baghdasaryan
@ 2026-06-30 15:27                                       ` Harry Yoo
  0 siblings, 0 replies; 25+ messages in thread
From: Harry Yoo @ 2026-06-30 15:27 UTC (permalink / raw)
  To: Suren Baghdasaryan, Shakeel Butt
  Cc: Vlastimil Babka (SUSE), Andrew Morton, Roman Gushchin, Hao Li,
	Christoph Lameter, David Rientjes, Usama Arif, Meta kernel team,
	linux-mm, linux-kernel, Danielle Costantino, Kees Cook





On 6/30/26 11:52 PM, Suren Baghdasaryan wrote:
> On Tue, Jun 30, 2026 at 7:36 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>>
>> On Tue, Jun 30, 2026 at 04:03:30PM +0900, Harry Yoo wrote:
>>>
>>>
>>> On 6/30/26 3:12 PM, Vlastimil Babka (SUSE) wrote:
>>>> On 6/30/26 07:29, Suren Baghdasaryan wrote:
>>>>> On Mon, Jun 29, 2026 at 9:42 PM Harry Yoo <harry@kernel.org> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 6/30/26 1:39 PM, Suren Baghdasaryan wrote:
>>>>>>> On Mon, Jun 29, 2026 at 9:38 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>>>>>>>
>>>>>>
>>>>>> Ah, here I meant backporting either the kmalloc_flags()+KMALLOC_TYPE or
>>>>>> SLAB_BUCKETS approach.
>>>>>>
>>>>>>>> Yes, it's worth backporting, so we can merge Shakeel's change as is
>>>>>>
>>>>>> Right.
>>>>>>
>>>>>>>> and then once Vlastimil's patch is merged we can implement the new
>>>>>>
>>>>>> Vlastimil's patch has already landed mainline, by the way :)
>>>>>
>>>>> Nice! I suggest posting Shakeel's patch CC'ing stable for backports
>>>>> and then following up with the fix using KMALLOC_TYPE. Vlastimil,
>>>>> WDYT?
>>>>
>>>> Sounded like a plan, but then I realized I misunderstood the amount of the
>>>> wastage. E.g. on my system kmalloc-8k with 4 objects per slab would have
>>>> obj_ext size of 64, but now it's 16k? That's ridiculous.
>>>
>>> Right.
>>
>> Yeah I should have given more thought on wastage.
> 
> Ugh! I didn't realize the wastage was that high.

Ouch!

>>> ...which is why I was assuming either the KMALLOC_TYPE or SLAB_BUCKETS
>>> approach would be backported as a follow-up. Err, should have
>>> communicated clearly, apologies.
>>
>> Harry, do you want to take a stab at prototyping these? If these look simple
>> enough, we can request backports of this.

Ack, let's see what would be the minimal changes to resolve this.

> I'll also give it some thought to see if there is maybe a different
> way to fix this that would be easy to backport.

Thanks, let's discuss!

>>>> We could say it's for a debugging feature, but also it's running in
>>>> production fleets (and Android?), so probably not that easy to dismiss.
>>>
>>> I think a key factor is when it's enabled in production.
>>>
>>> The kconfig says Android selects MEM_ALLOC_PROFILING, but not
>>> MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT.
>>>
>>> I assumed that turning it on by default in the entire fleet
>>> would be a bit hard to justify... (please correct me
>>> if that's not the case)
>>
>> Actually we have memory profiling enabled by default across the Meta fleet.
>> So, the issue is very real.

I see, thanks for clarifying.

>> At the moment, we are seeing this issue on a specific
>> type of machine and we have disabled memory profiling for those machines.
>> Internally we did discuss simply disabling memory allocation profiling for
>> kmalloc-normal caches, but to me that was a big hammer, and thus I
>> suggested the current approach.

-- 
Cheers,
Harry / Hyeonggon


* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-26 17:11     ` Vlastimil Babka (SUSE)
  2026-06-28  2:58       ` Shakeel Butt
@ 2026-06-28  8:10       ` Harry Yoo
  2026-06-28  8:36         ` Harry Yoo
  1 sibling, 1 reply; 25+ messages in thread
From: Harry Yoo @ 2026-06-28  8:10 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE), Shakeel Butt
  Cc: Andrew Morton, Roman Gushchin, Hao Li, Christoph Lameter,
	David Rientjes, Suren Baghdasaryan, Usama Arif, Meta kernel team,
	linux-mm, linux-kernel, Danielle Costantino





On 6/27/26 2:11 AM, Vlastimil Babka (SUSE) wrote:
> Minimally I'd now want to only do that size bumping when allocation
> profiling is enabled. Ideally that means both configured in and not booted
> with "never".
>
> We probably should have done that already in 280ea9c3154b2.

I think we did already.

obj_exts allocation triggered by memcg has __GFP_ACCOUNT, kmalloc type
is KMALLOC_CGROUP, and so is_kmalloc_normal() should return false?

Perhaps a comment above !is_kmalloc_normal() check would be nice.

> Because AFAIU
> memcg-only obj_exts arrays don't have this issue
> (or maybe they do have the [1] issue? Harry?).

A memcg-only obj_exts array has neither this issue nor [1],
because obj_exts arrays are not accounted and so we can't allocate them
from KMALLOC_CGROUP.

-- 
Cheers,
Harry / Hyeonggon


* Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
  2026-06-28  8:10       ` Harry Yoo
@ 2026-06-28  8:36         ` Harry Yoo
  0 siblings, 0 replies; 25+ messages in thread
From: Harry Yoo @ 2026-06-28  8:36 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE), Shakeel Butt
  Cc: Andrew Morton, Roman Gushchin, Hao Li, Christoph Lameter,
	David Rientjes, Suren Baghdasaryan, Usama Arif, Meta kernel team,
	linux-mm, linux-kernel, Danielle Costantino





On 6/28/26 5:10 PM, Harry Yoo wrote:
> 
> 
> On 6/27/26 2:11 AM, Vlastimil Babka (SUSE) wrote:
>> Minimally I'd now want to only do that size bumping when allocation
>> profiling is enabled. Ideally that means both configured in and not booted
>> with "never".
>>
>> We probably should have done that already in 280ea9c3154b2.
> 
> I think we did already.
> 
> obj_exts allocation triggered by memcg has __GFP_ACCOUNT, kmalloc type
> is KMALLOC_CGROUP, and so is_kmalloc_normal() should return false?
> 
> Perhaps a comment above !is_kmalloc_normal() check would be nice.
> 
>> Because AFAIU
>> memcg-only obj_exts arrays don't have this issue
>> (or maybe they do have the [1] issue? Harry?).
> 
> A memcg-only obj_exts array has neither this issue nor [1],
> because obj_exts arrays are not accounted and so we can't allocate them
> from KMALLOC_CGROUP.

Oh god, I am confused again! Commit 8dafa9f5900c says it might have this
issue. However, is_kmalloc_normal(s) == true detects that it might have
the problem when either 1) memory allocation profiling is enabled, or 2)
SLUB_TINY w/ __GFP_RECLAIMABLE|__GFP_ACCOUNT allocates accounted slab
objects from KMALLOC_NORMAL.
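
To make the two cases concrete, here is a rough userspace model of how
gfp flags might map to a kmalloc cache type. It is only a sketch based on
my reading of kmalloc_type(); the m_* names, the flag values and the
assumption that KMALLOC_RECLAIM aliases KMALLOC_NORMAL under SLUB_TINY are
all illustrative and should be checked against the tree:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model only; not the kernel's gfp values. */
#define M_GFP_DMA         0x1u
#define M_GFP_RECLAIMABLE 0x2u
#define M_GFP_ACCOUNT     0x4u

enum m_kmalloc_type { M_NORMAL, M_DMA, M_RECLAIM, M_CGROUP };

struct m_config {
	bool memcg;     /* stands in for CONFIG_MEMCG */
	bool slub_tiny; /* stands in for CONFIG_SLUB_TINY */
};

/* Model of the cache-type choice; assumed, not copied from the kernel. */
static enum m_kmalloc_type m_kmalloc_type(struct m_config cfg,
					  unsigned int flags)
{
	unsigned int not_normal = M_GFP_DMA | M_GFP_RECLAIMABLE;

	/* __GFP_ACCOUNT only matters when memcg is configured in. */
	if (cfg.memcg)
		not_normal |= M_GFP_ACCOUNT;

	if (!(flags & not_normal))
		return M_NORMAL;
	if (flags & M_GFP_DMA)
		return M_DMA;
	/* Assumption: with SLUB_TINY the reclaim type aliases normal. */
	if (!cfg.memcg || (flags & M_GFP_RECLAIMABLE))
		return cfg.slub_tiny ? M_NORMAL : M_RECLAIM;
	return M_CGROUP;
}

/* Does this allocation land in a cache an is_kmalloc_normal()-style
 * check would flag? */
static bool m_lands_in_normal(struct m_config cfg, unsigned int flags)
{
	return m_kmalloc_type(cfg, flags) == M_NORMAL;
}
```

In this model a plain allocation (no flags) lands in the normal caches,
a __GFP_ACCOUNT allocation with memcg lands in the cgroup caches, and
__GFP_RECLAIMABLE|__GFP_ACCOUNT lands in the normal caches only when
slub_tiny is set.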

-- 
Cheers,
Harry / Hyeonggon



end of thread, other threads:[~2026-06-30 15:27 UTC | newest]

Thread overview: 25+ messages
2026-06-25 23:00 [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache Shakeel Butt
2026-06-26  4:22 ` Harry Yoo
2026-06-26 16:49   ` Shakeel Butt
2026-06-26 17:11     ` Vlastimil Babka (SUSE)
2026-06-28  2:58       ` Shakeel Butt
2026-06-28  3:23         ` Shakeel Butt
2026-06-28  7:47           ` Vlastimil Babka (SUSE)
2026-06-28  9:22             ` Harry Yoo
2026-06-28 23:37               ` Suren Baghdasaryan
2026-06-29  3:57                 ` Harry Yoo
2026-06-29  4:28                   ` Suren Baghdasaryan
2026-06-29 19:52                     ` Shakeel Butt
2026-06-30  2:03                       ` Harry Yoo
2026-06-30  2:30                     ` Harry Yoo
2026-06-30  4:38                       ` Suren Baghdasaryan
2026-06-30  4:39                         ` Suren Baghdasaryan
2026-06-30  4:42                           ` Harry Yoo
2026-06-30  5:29                             ` Suren Baghdasaryan
2026-06-30  6:12                               ` Vlastimil Babka (SUSE)
2026-06-30  7:03                                 ` Harry Yoo
2026-06-30 14:35                                   ` Shakeel Butt
2026-06-30 14:52                                     ` Suren Baghdasaryan
2026-06-30 15:27                                       ` Harry Yoo
2026-06-28  8:10       ` Harry Yoo
2026-06-28  8:36         ` Harry Yoo
