public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC v2 00/20] slab: replace cpu (partial) slabs with sheaves
@ 2026-01-12 15:16 Vlastimil Babka
  2026-01-12 15:16 ` [PATCH RFC v2 01/20] mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache() Vlastimil Babka
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Vlastimil Babka @ 2026-01-12 15:16 UTC (permalink / raw)
  To: Harry Yoo, Petr Tesarik, Christoph Lameter, David Rientjes,
	Roman Gushchin
  Cc: Hao Li, Andrew Morton, Uladzislau Rezki, Liam R. Howlett,
	Suren Baghdasaryan, Sebastian Andrzej Siewior, Alexei Starovoitov,
	linux-mm, linux-kernel, linux-rt-devel, bpf, kasan-dev,
	Vlastimil Babka, kernel test robot, stable

Percpu sheaves caching was introduced as opt-in but the goal was to
eventually move all caches to them. This is the next step, enabling
sheaves for all caches (except the two bootstrap ones) and then removing
the per cpu (partial) slabs and lots of associated code.

Besides (hopefully) improved performance, this removes the rather
complicated code related to the lockless fastpaths (using
this_cpu_try_cmpxchg128/64) and its complications with PREEMPT_RT or
kmalloc_nolock().

The lockless slab freelist+counters update operation using
try_cmpxchg128/64 remains and is crucial for freeing remote NUMA objects
without repeating the "alien" array flushing of SLUB, and to allow
flushing objects from sheaves to slabs mostly without the node
list_lock.

This v2 is the first non-RFC. I would consider exposing the series to
linux-next at this point.

Git branch for the v2:
  https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=sheaves-for-all-v2

Based on:
  https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-7.0/sheaves
  - includes a sheaves optimization that seemed minor but there was an
    lkp test robot result showing significant improvements:
    https://lore.kernel.org/all/202512291555.56ce2e53-lkp@intel.com/
    (could be an uncommon corner case workload though)

Significant (but not critical) remaining TODOs:
- Integration of rcu sheaves handling with kfree_rcu batching.
  - Currently the kfree_rcu batching is almost completely bypassed. I'm
    thinking it could be adjusted to handle rcu sheaves in addition to
    individual objects, to get the best of both.
- Performance evaluation. Petr Tesarik has been doing that on the RFC
  with some promising results (thanks!) and also found a memory leak.

Note that, as with many things, this caching scheme change is a tradeoff, as
summarized by Christoph:

  https://lore.kernel.org/all/f7c33974-e520-387e-9e2f-1e523bfe1545@gentwo.org/

- Objects allocated from sheaves should have better temporal locality
  (likely recently freed, thus cache hot) but worse spatial locality
  (likely from many different slabs, increasing memory usage and
  possibly TLB pressure on kernel's direct map).

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
Changes in v2:
- Rebased to v6.19-rc1+slab.git slab/for-7.0/sheaves
  - Some of the preliminary patches from the RFC went in there.
- Incorporate feedback/reports from many people (thanks!), including:
  - Make caches with sheaves mergeable.
  - Fix a major memory leak.
- Cleanup of stat items.
- Link to v1: https://patch.msgid.link/20251023-sheaves-for-all-v1-0-6ffa2c9941c0@suse.cz

---
Vlastimil Babka (20):
      mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache()
      mm/slab: move and refactor __kmem_cache_alias()
      mm/slab: make caches with sheaves mergeable
      slab: add sheaves to most caches
      slab: introduce percpu sheaves bootstrap
      slab: make percpu sheaves compatible with kmalloc_nolock()/kfree_nolock()
      slab: handle kmalloc sheaves bootstrap
      slab: add optimized sheaf refill from partial list
      slab: remove cpu (partial) slabs usage from allocation paths
      slab: remove SLUB_CPU_PARTIAL
      slab: remove the do_slab_free() fastpath
      slab: remove defer_deactivate_slab()
      slab: simplify kmalloc_nolock()
      slab: remove struct kmem_cache_cpu
      slab: remove unused PREEMPT_RT specific macros
      slab: refill sheaves from all nodes
      slab: update overview comments
      slab: remove frozen slab checks from __slab_free()
      mm/slub: remove DEACTIVATE_TO_* stat items
      mm/slub: cleanup and repurpose some stat items

 include/linux/slab.h |    6 -
 mm/Kconfig           |   11 -
 mm/internal.h        |    1 +
 mm/page_alloc.c      |    5 +
 mm/slab.h            |   53 +-
 mm/slab_common.c     |   56 +-
 mm/slub.c            | 2591 +++++++++++++++++---------------------------------
 7 files changed, 950 insertions(+), 1773 deletions(-)
---
base-commit: aff9fb2fffa1175bd5ae3b4630f3d4ae53af450b
change-id: 20251002-sheaves-for-all-86ac13dc47a5

Best regards,
-- 
Vlastimil Babka <vbabka@suse.cz>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH RFC v2 01/20] mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache()
  2026-01-12 15:16 [PATCH RFC v2 00/20] slab: replace cpu (partial) slabs with sheaves Vlastimil Babka
@ 2026-01-12 15:16 ` Vlastimil Babka
  2026-01-13  2:08   ` Harry Yoo
  2026-01-14  4:56   ` Harry Yoo
  2026-01-12 15:20 ` [PATCH v2 00/20] slab: replace cpu (partial) slabs with sheaves Vlastimil Babka
  2026-01-15 15:12 ` [PATCH RFC " Vlastimil Babka
  2 siblings, 2 replies; 12+ messages in thread
From: Vlastimil Babka @ 2026-01-12 15:16 UTC (permalink / raw)
  To: Harry Yoo, Petr Tesarik, Christoph Lameter, David Rientjes,
	Roman Gushchin
  Cc: Hao Li, Andrew Morton, Uladzislau Rezki, Liam R. Howlett,
	Suren Baghdasaryan, Sebastian Andrzej Siewior, Alexei Starovoitov,
	linux-mm, linux-kernel, linux-rt-devel, bpf, kasan-dev,
	Vlastimil Babka, kernel test robot, stable

After we submit the rcu_free sheaves to call_rcu() we need to make sure
the rcu callbacks complete. kvfree_rcu_barrier() does that via
flush_all_rcu_sheaves() but kvfree_rcu_barrier_on_cache() doesn't. Fix
that.

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202601121442.c530bed3-lkp@intel.com
Fixes: 0f35040de593 ("mm/slab: introduce kvfree_rcu_barrier_on_cache() for cache destruction")
Cc: stable@vger.kernel.org
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab_common.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index eed7ea556cb1..ee994ec7f251 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -2133,8 +2133,11 @@ EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
  */
 void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
 {
-	if (s->cpu_sheaves)
+	if (s->cpu_sheaves) {
 		flush_rcu_sheaves_on_cache(s);
+		rcu_barrier();
+	}
+
 	/*
 	 * TODO: Introduce a version of __kvfree_rcu_barrier() that works
 	 * on a specific slab cache.

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 00/20] slab: replace cpu (partial) slabs with sheaves
  2026-01-12 15:16 [PATCH RFC v2 00/20] slab: replace cpu (partial) slabs with sheaves Vlastimil Babka
  2026-01-12 15:16 ` [PATCH RFC v2 01/20] mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache() Vlastimil Babka
@ 2026-01-12 15:20 ` Vlastimil Babka
  2026-01-15 15:12 ` [PATCH RFC " Vlastimil Babka
  2 siblings, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2026-01-12 15:20 UTC (permalink / raw)
  To: Harry Yoo, Petr Tesarik, Christoph Lameter, David Rientjes,
	Roman Gushchin
  Cc: Hao Li, Andrew Morton, Uladzislau Rezki, Liam R. Howlett,
	Suren Baghdasaryan, Sebastian Andrzej Siewior, Alexei Starovoitov,
	linux-mm, linux-kernel, linux-rt-devel, bpf, kasan-dev,
	kernel test robot, stable

On 1/12/26 16:16, Vlastimil Babka wrote:
> Percpu sheaves caching was introduced as opt-in but the goal was to
> eventually move all caches to them. This is the next step, enabling
> sheaves for all caches (except the two bootstrap ones) and then removing
> the per cpu (partial) slabs and lots of associated code.
> 
> Besides (hopefully) improved performance, this removes the rather
> complicated code related to the lockless fastpaths (using
> this_cpu_try_cmpxchg128/64) and its complications with PREEMPT_RT or
> kmalloc_nolock().
> 
> The lockless slab freelist+counters update operation using
> try_cmpxchg128/64 remains and is crucial for freeing remote NUMA objects
> without repeating the "alien" array flushing of SLUB, and to allow
> flushing objects from sheaves to slabs mostly without the node
> list_lock.
> 
> This v2 is the first non-RFC. I would consider exposing the series to
> linux-next at this point.

Well, if only I hadn't forgotten to remove the RFC prefix before sending...

> Git branch for the v2:
>   https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=sheaves-for-all-v2
> 
> Based on:
>   https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-7.0/sheaves
>   - includes a sheaves optimization that seemed minor but there was an
>     lkp test robot result showing significant improvements:
>     https://lore.kernel.org/all/202512291555.56ce2e53-lkp@intel.com/
>     (could be an uncommon corner case workload though)
> 
> Significant (but not critical) remaining TODOs:
> - Integration of rcu sheaves handling with kfree_rcu batching.
>   - Currently the kfree_rcu batching is almost completely bypassed. I'm
>     thinking it could be adjusted to handle rcu sheaves in addition to
>     individual objects, to get the best of both.
> - Performance evaluation. Petr Tesarik has been doing that on the RFC
>   with some promising results (thanks!) and also found a memory leak.
> 
> Note that, as with many things, this caching scheme change is a tradeoff, as
> summarized by Christoph:
> 
>   https://lore.kernel.org/all/f7c33974-e520-387e-9e2f-1e523bfe1545@gentwo.org/
> 
> - Objects allocated from sheaves should have better temporal locality
>   (likely recently freed, thus cache hot) but worse spatial locality
>   (likely from many different slabs, increasing memory usage and
>   possibly TLB pressure on kernel's direct map).
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> Changes in v2:
> - Rebased to v6.19-rc1+slab.git slab/for-7.0/sheaves
>   - Some of the preliminary patches from the RFC went in there.
> - Incorporate feedback/reports from many people (thanks!), including:
>   - Make caches with sheaves mergeable.
>   - Fix a major memory leak.
> - Cleanup of stat items.
> - Link to v1: https://patch.msgid.link/20251023-sheaves-for-all-v1-0-6ffa2c9941c0@suse.cz
> 
> ---
> Vlastimil Babka (20):
>       mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache()
>       mm/slab: move and refactor __kmem_cache_alias()
>       mm/slab: make caches with sheaves mergeable
>       slab: add sheaves to most caches
>       slab: introduce percpu sheaves bootstrap
>       slab: make percpu sheaves compatible with kmalloc_nolock()/kfree_nolock()
>       slab: handle kmalloc sheaves bootstrap
>       slab: add optimized sheaf refill from partial list
>       slab: remove cpu (partial) slabs usage from allocation paths
>       slab: remove SLUB_CPU_PARTIAL
>       slab: remove the do_slab_free() fastpath
>       slab: remove defer_deactivate_slab()
>       slab: simplify kmalloc_nolock()
>       slab: remove struct kmem_cache_cpu
>       slab: remove unused PREEMPT_RT specific macros
>       slab: refill sheaves from all nodes
>       slab: update overview comments
>       slab: remove frozen slab checks from __slab_free()
>       mm/slub: remove DEACTIVATE_TO_* stat items
>       mm/slub: cleanup and repurpose some stat items
> 
>  include/linux/slab.h |    6 -
>  mm/Kconfig           |   11 -
>  mm/internal.h        |    1 +
>  mm/page_alloc.c      |    5 +
>  mm/slab.h            |   53 +-
>  mm/slab_common.c     |   56 +-
>  mm/slub.c            | 2591 +++++++++++++++++---------------------------------
>  7 files changed, 950 insertions(+), 1773 deletions(-)
> ---
> base-commit: aff9fb2fffa1175bd5ae3b4630f3d4ae53af450b
> change-id: 20251002-sheaves-for-all-86ac13dc47a5
> 
> Best regards,


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC v2 01/20] mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache()
  2026-01-12 15:16 ` [PATCH RFC v2 01/20] mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache() Vlastimil Babka
@ 2026-01-13  2:08   ` Harry Yoo
  2026-01-13  9:32     ` Vlastimil Babka
  2026-01-14  4:56   ` Harry Yoo
  1 sibling, 1 reply; 12+ messages in thread
From: Harry Yoo @ 2026-01-13  2:08 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Petr Tesarik, Christoph Lameter, David Rientjes, Roman Gushchin,
	Hao Li, Andrew Morton, Uladzislau Rezki, Liam R. Howlett,
	Suren Baghdasaryan, Sebastian Andrzej Siewior, Alexei Starovoitov,
	linux-mm, linux-kernel, linux-rt-devel, bpf, kasan-dev,
	kernel test robot, stable

On Mon, Jan 12, 2026 at 04:16:55PM +0100, Vlastimil Babka wrote:
> After we submit the rcu_free sheaves to call_rcu() we need to make sure
> the rcu callbacks complete. kvfree_rcu_barrier() does that via
> flush_all_rcu_sheaves() but kvfree_rcu_barrier_on_cache() doesn't. Fix
> that.

Oops, my bad.

> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202601121442.c530bed3-lkp@intel.com
> Fixes: 0f35040de593 ("mm/slab: introduce kvfree_rcu_barrier_on_cache() for cache destruction")
> Cc: stable@vger.kernel.org
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---

The fix looks good to me, but I wonder why
`if (s->sheaf_capacity) rcu_barrier();` in __kmem_cache_shutdown()
didn't prevent the bug from happening?

>  mm/slab_common.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index eed7ea556cb1..ee994ec7f251 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -2133,8 +2133,11 @@ EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
>   */
>  void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
>  {
> -	if (s->cpu_sheaves)
> +	if (s->cpu_sheaves) {
>  		flush_rcu_sheaves_on_cache(s);
> +		rcu_barrier();
> +	}
> +
>  	/*
>  	 * TODO: Introduce a version of __kvfree_rcu_barrier() that works
>  	 * on a specific slab cache.
> 
> -- 
> 2.52.0
> 

-- 
Cheers,
Harry / Hyeonggon

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC v2 01/20] mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache()
  2026-01-13  2:08   ` Harry Yoo
@ 2026-01-13  9:32     ` Vlastimil Babka
  2026-01-13 12:31       ` Harry Yoo
  0 siblings, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2026-01-13  9:32 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Petr Tesarik, Christoph Lameter, David Rientjes, Roman Gushchin,
	Hao Li, Andrew Morton, Uladzislau Rezki, Liam R. Howlett,
	Suren Baghdasaryan, Sebastian Andrzej Siewior, Alexei Starovoitov,
	linux-mm, linux-kernel, linux-rt-devel, bpf, kasan-dev,
	kernel test robot, stable

On 1/13/26 3:08 AM, Harry Yoo wrote:
> On Mon, Jan 12, 2026 at 04:16:55PM +0100, Vlastimil Babka wrote:
>> After we submit the rcu_free sheaves to call_rcu() we need to make sure
>> the rcu callbacks complete. kvfree_rcu_barrier() does that via
>> flush_all_rcu_sheaves() but kvfree_rcu_barrier_on_cache() doesn't. Fix
>> that.
> 
> Oops, my bad.
> 
>> Reported-by: kernel test robot <oliver.sang@intel.com>
>> Closes: https://lore.kernel.org/oe-lkp/202601121442.c530bed3-lkp@intel.com
>> Fixes: 0f35040de593 ("mm/slab: introduce kvfree_rcu_barrier_on_cache() for cache destruction")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
> 
> The fix looks good to me, but I wonder why
> `if (s->sheaf_capacity) rcu_barrier();` in __kmem_cache_shutdown()
> didn't prevent the bug from happening?

Hmm good point, didn't notice it's there.

I think it doesn't help because it happens only after
flush_all_cpus_locked(). And the callback from rcu_free_sheaf_nobarn()
will do sheaf_flush_unused() and end up installing the cpu slab again.

Because the bot flagged commit "slab: add sheaves to most caches" where
cpu slabs still exist. It's thus possible that with the full series, the
bug is gone. But we should prevent it upfront anyway. The rcu_barrier()
in __kmem_cache_shutdown() however is probably unnecessary then and we
can remove it, right?

>>  mm/slab_common.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index eed7ea556cb1..ee994ec7f251 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -2133,8 +2133,11 @@ EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
>>   */
>>  void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
>>  {
>> -	if (s->cpu_sheaves)
>> +	if (s->cpu_sheaves) {
>>  		flush_rcu_sheaves_on_cache(s);
>> +		rcu_barrier();
>> +	}
>> +
>>  	/*
>>  	 * TODO: Introduce a version of __kvfree_rcu_barrier() that works
>>  	 * on a specific slab cache.
>>
>> -- 
>> 2.52.0
>>
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC v2 01/20] mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache()
  2026-01-13  9:32     ` Vlastimil Babka
@ 2026-01-13 12:31       ` Harry Yoo
  2026-01-13 13:09         ` Vlastimil Babka
  0 siblings, 1 reply; 12+ messages in thread
From: Harry Yoo @ 2026-01-13 12:31 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Petr Tesarik, Christoph Lameter, David Rientjes, Roman Gushchin,
	Hao Li, Andrew Morton, Uladzislau Rezki, Liam R. Howlett,
	Suren Baghdasaryan, Sebastian Andrzej Siewior, Alexei Starovoitov,
	linux-mm, linux-kernel, linux-rt-devel, bpf, kasan-dev,
	kernel test robot, stable

On Tue, Jan 13, 2026 at 10:32:33AM +0100, Vlastimil Babka wrote:
> On 1/13/26 3:08 AM, Harry Yoo wrote:
> > On Mon, Jan 12, 2026 at 04:16:55PM +0100, Vlastimil Babka wrote:
> >> After we submit the rcu_free sheaves to call_rcu() we need to make sure
> >> the rcu callbacks complete. kvfree_rcu_barrier() does that via
> >> flush_all_rcu_sheaves() but kvfree_rcu_barrier_on_cache() doesn't. Fix
> >> that.
> > 
> > Oops, my bad.
> > 
> >> Reported-by: kernel test robot <oliver.sang@intel.com>
> >> Closes: https://lore.kernel.org/oe-lkp/202601121442.c530bed3-lkp@intel.com
> >> Fixes: 0f35040de593 ("mm/slab: introduce kvfree_rcu_barrier_on_cache() for cache destruction")
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >> ---
> > 
> > The fix looks good to me, but I wonder why
> > `if (s->sheaf_capacity) rcu_barrier();` in __kmem_cache_shutdown()
> > didn't prevent the bug from happening?
> 
> Hmm good point, didn't notice it's there.
> 
> I think it doesn't help because it happens only after
> flush_all_cpus_locked(). And the callback from rcu_free_sheaf_nobarn()
> will do sheaf_flush_unused() and end up installing the cpu slab again.

I thought about it a little bit more...

It's not because a cpu slab was installed again (for list_slab_objects()
to be called on a slab, it must be on n->partial list), but because
flush_slab() cannot handle concurrent frees to the cpu slab.

CPU X                                CPU Y

- flush_slab() reads
  c->freelist
                                     rcu_free_sheaf_nobarn()
				     ->sheaf_flush_unused()
				     ->__kmem_cache_free_bulk()
				     ->do_slab_free()
				       -> sees slab == c->slab
				       -> frees to c->freelist
- c->slab = NULL,
  c->freelist = NULL
- call deactivate_slab()
  ^ the object freed by sheaf_flush_unused() is leaked,
    thus slab->inuse != 0

That said, flush_slab() works fine only when it is guaranteed that
there will be no concurrent frees to the cpu slab (acquiring the
local_lock in flush_slab() doesn't help, because the free fastpath
doesn't take it).

Calling rcu_barrier() before flush_all_cpus_locked() ensures
there will be no concurrent frees.

A side question; I'm not sure how __kmem_cache_shrink(),
validate_slab_cache(), cpu_partial_store() are supposed to work
correctly? They call flush_all() without guaranteeing there will be
no concurrent frees to the cpu slab.

...probably doesn't matter after sheaves-for-all :)

> Because the bot flagged commit "slab: add sheaves to most caches" where
> cpu slabs still exist. It's thus possible that with the full series, the
> bug is gone. But we should prevent it upfront anyway.

> The rcu_barrier() in __kmem_cache_shutdown() however is probably
> unnecessary then and we can remove it, right?

Agreed. As it's called (after flushing rcu sheaves) in
kvfree_rcu_barrier_on_cache(), it's not necessary in
__kmem_cache_shutdown().

> >>  mm/slab_common.c | 5 ++++-
> >>  1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/mm/slab_common.c b/mm/slab_common.c
> >> index eed7ea556cb1..ee994ec7f251 100644
> >> --- a/mm/slab_common.c
> >> +++ b/mm/slab_common.c
> >> @@ -2133,8 +2133,11 @@ EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
> >>   */
> >>  void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
> >>  {
> >> -	if (s->cpu_sheaves)
> >> +	if (s->cpu_sheaves) {
> >>  		flush_rcu_sheaves_on_cache(s);
> >> +		rcu_barrier();
> >> +	}
> >> +
> >>  	/*
> >>  	 * TODO: Introduce a version of __kvfree_rcu_barrier() that works
> >>  	 * on a specific slab cache.

-- 
Cheers,
Harry / Hyeonggon

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC v2 01/20] mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache()
  2026-01-13 12:31       ` Harry Yoo
@ 2026-01-13 13:09         ` Vlastimil Babka
  2026-01-14 11:14           ` Harry Yoo
  0 siblings, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2026-01-13 13:09 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Petr Tesarik, Christoph Lameter, David Rientjes, Roman Gushchin,
	Hao Li, Andrew Morton, Uladzislau Rezki, Liam R. Howlett,
	Suren Baghdasaryan, Sebastian Andrzej Siewior, Alexei Starovoitov,
	linux-mm, linux-kernel, linux-rt-devel, bpf, kasan-dev,
	kernel test robot, stable

On 1/13/26 1:31 PM, Harry Yoo wrote:
> On Tue, Jan 13, 2026 at 10:32:33AM +0100, Vlastimil Babka wrote:
>> On 1/13/26 3:08 AM, Harry Yoo wrote:
>>> On Mon, Jan 12, 2026 at 04:16:55PM +0100, Vlastimil Babka wrote:
>>>> After we submit the rcu_free sheaves to call_rcu() we need to make sure
>>>> the rcu callbacks complete. kvfree_rcu_barrier() does that via
>>>> flush_all_rcu_sheaves() but kvfree_rcu_barrier_on_cache() doesn't. Fix
>>>> that.
>>>
>>> Oops, my bad.
>>>
>>>> Reported-by: kernel test robot <oliver.sang@intel.com>
>>>> Closes: https://lore.kernel.org/oe-lkp/202601121442.c530bed3-lkp@intel.com
>>>> Fixes: 0f35040de593 ("mm/slab: introduce kvfree_rcu_barrier_on_cache() for cache destruction")
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>>>> ---
>>>
>>> The fix looks good to me, but I wonder why
>>> `if (s->sheaf_capacity) rcu_barrier();` in __kmem_cache_shutdown()
>>> didn't prevent the bug from happening?
>>
>> Hmm good point, didn't notice it's there.
>>
>> I think it doesn't help because it happens only after
>> flush_all_cpus_locked(). And the callback from rcu_free_sheaf_nobarn()
>> will do sheaf_flush_unused() and end up installing the cpu slab again.
> 
> I thought about it a little bit more...
> 
> It's not because a cpu slab was installed again (for list_slab_objects()
> to be called on a slab, it must be on n->partial list), but because

Hmm that's true.

> flush_slab() cannot handle concurrent frees to the cpu slab.
> 
> CPU X                                CPU Y
> 
> - flush_slab() reads
>   c->freelist
>                                      rcu_free_sheaf_nobarn()
> 				     ->sheaf_flush_unused()
> 				     ->__kmem_cache_free_bulk()
> 				     ->do_slab_free()
> 				       -> sees slab == c->slab
> 				       -> frees to c->freelist
> - c->slab = NULL,
>   c->freelist = NULL
> - call deactivate_slab()
>   ^ the object freed by sheaf_flush_unused() is leaked,
>     thus slab->inuse != 0

But for this to be the same "c" it has to be the same cpu, not different
X and Y, no?
And that case is protected, I think: the action by X with
local_lock_irqsave() prevents an irq handler from executing Y. Action Y
uses __update_cpu_freelist_fast() to find out it was interrupted by X
messing with the c-> fields.


> That said, flush_slab() works fine only when it is guaranteed that
> there will be no concurrent frees to the cpu slab (acquiring local_lock
> in flush_slab() doesn't help because free fastpath doesn't take it)
> 
> calling rcu_barrier() before flush_all_cpus_locked() ensures
> there will be no concurrent frees.
> 
> A side question; I'm not sure how __kmem_cache_shrink(),
> validate_slab_cache(), cpu_partial_store() are supposed to work
> correctly? They call flush_all() without guaranteeing there will be
> no concurrent frees to the cpu slab.
> 
> ...probably doesn't matter after sheaves-for-all :)
> 
>> Because the bot flagged commit "slab: add sheaves to most caches" where
>> cpu slabs still exist. It's thus possible that with the full series, the
>> bug is gone. But we should prevent it upfront anyway.
> 
>> The rcu_barrier() in __kmem_cache_shutdown() however is probably
>> unnecessary then and we can remove it, right?
> 
> Agreed. As it's called (after flushing rcu sheaves) in
> kvfree_rcu_barrier_on_cache(), it's not necessary in
> __kmem_cache_shutdown().
> 
>>>>  mm/slab_common.c | 5 ++++-
>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>>>> index eed7ea556cb1..ee994ec7f251 100644
>>>> --- a/mm/slab_common.c
>>>> +++ b/mm/slab_common.c
>>>> @@ -2133,8 +2133,11 @@ EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
>>>>   */
>>>>  void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
>>>>  {
>>>> -	if (s->cpu_sheaves)
>>>> +	if (s->cpu_sheaves) {
>>>>  		flush_rcu_sheaves_on_cache(s);
>>>> +		rcu_barrier();
>>>> +	}
>>>> +
>>>>  	/*
>>>>  	 * TODO: Introduce a version of __kvfree_rcu_barrier() that works
>>>>  	 * on a specific slab cache.
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC v2 01/20] mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache()
  2026-01-12 15:16 ` [PATCH RFC v2 01/20] mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache() Vlastimil Babka
  2026-01-13  2:08   ` Harry Yoo
@ 2026-01-14  4:56   ` Harry Yoo
  1 sibling, 0 replies; 12+ messages in thread
From: Harry Yoo @ 2026-01-14  4:56 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Petr Tesarik, Christoph Lameter, David Rientjes, Roman Gushchin,
	Hao Li, Andrew Morton, Uladzislau Rezki, Liam R. Howlett,
	Suren Baghdasaryan, Sebastian Andrzej Siewior, Alexei Starovoitov,
	linux-mm, linux-kernel, linux-rt-devel, bpf, kasan-dev,
	kernel test robot, stable

On Mon, Jan 12, 2026 at 04:16:55PM +0100, Vlastimil Babka wrote:
> After we submit the rcu_free sheaves to call_rcu() we need to make sure
> the rcu callbacks complete. kvfree_rcu_barrier() does that via
> flush_all_rcu_sheaves() but kvfree_rcu_barrier_on_cache() doesn't. Fix
> that.
> 
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202601121442.c530bed3-lkp@intel.com
> Fixes: 0f35040de593 ("mm/slab: introduce kvfree_rcu_barrier_on_cache() for cache destruction")
> Cc: stable@vger.kernel.org
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---

LGTM,
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

and I reproduced it locally and this resolves the issue, so:
Tested-by: Harry Yoo <harry.yoo@oracle.com>

-- 
Cheers,
Harry / Hyeonggon

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC v2 01/20] mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache()
  2026-01-13 13:09         ` Vlastimil Babka
@ 2026-01-14 11:14           ` Harry Yoo
  2026-01-14 13:02             ` Vlastimil Babka
  0 siblings, 1 reply; 12+ messages in thread
From: Harry Yoo @ 2026-01-14 11:14 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Petr Tesarik, Christoph Lameter, David Rientjes, Roman Gushchin,
	Hao Li, Andrew Morton, Uladzislau Rezki, Liam R. Howlett,
	Suren Baghdasaryan, Sebastian Andrzej Siewior, Alexei Starovoitov,
	linux-mm, linux-kernel, linux-rt-devel, bpf, kasan-dev,
	kernel test robot, stable

On Tue, Jan 13, 2026 at 02:09:33PM +0100, Vlastimil Babka wrote:
> On 1/13/26 1:31 PM, Harry Yoo wrote:
> > On Tue, Jan 13, 2026 at 10:32:33AM +0100, Vlastimil Babka wrote:
> >> On 1/13/26 3:08 AM, Harry Yoo wrote:
> >>> On Mon, Jan 12, 2026 at 04:16:55PM +0100, Vlastimil Babka wrote:
> >>>> After we submit the rcu_free sheaves to call_rcu() we need to make sure
> >>>> the rcu callbacks complete. kvfree_rcu_barrier() does that via
> >>>> flush_all_rcu_sheaves() but kvfree_rcu_barrier_on_cache() doesn't. Fix
> >>>> that.
> >>>
> >>> Oops, my bad.
> >>>
> >>>> Reported-by: kernel test robot <oliver.sang@intel.com>
> >>>> Closes: https://lore.kernel.org/oe-lkp/202601121442.c530bed3-lkp@intel.com
> >>>> Fixes: 0f35040de593 ("mm/slab: introduce kvfree_rcu_barrier_on_cache() for cache destruction")
> >>>> Cc: stable@vger.kernel.org
> >>>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >>>> ---
> >>>
> >>> The fix looks good to me, but I wonder why
> >>> `if (s->sheaf_capacity) rcu_barrier();` in __kmem_cache_shutdown()
> >>> didn't prevent the bug from happening?
> >>
> >> Hmm good point, didn't notice it's there.
> >>
> >> I think it doesn't help because it happens only after
> >> flush_all_cpus_locked(). And the callback from rcu_free_sheaf_nobarn()
> >> will do sheaf_flush_unused() and end up installing the cpu slab again.
> > 
> > I thought about it a little bit more...
> > 
> > It's not because a cpu slab was installed again (for list_slab_objects()
> > to be called on a slab, it must be on n->partial list), but because
> 
> Hmm that's true.
> 
> > flush_slab() cannot handle concurrent frees to the cpu slab.
> > 
> > CPU X                                CPU Y
> > 
> > - flush_slab() reads
> >   c->freelist
> >                                      rcu_free_sheaf_nobarn()
> > 				     ->sheaf_flush_unused()
> > 				     ->__kmem_cache_free_bulk()
> > 				     ->do_slab_free()
> > 				       -> sees slab == c->slab
> > 				       -> frees to c->freelist
> > - c->slab = NULL,
> >   c->freelist = NULL
> > - call deactivate_slab()
> >   ^ the object freed by sheaf_flush_unused() is leaked,
> >     thus slab->inuse != 0
> 
> But for this to be the same "c" it has to be the same cpu, not different
> X and Y, no?

You're absolutely right! It just slipped my mind.

> And that case is protected, I think: the action by X with
> local_lock_irqsave() prevents an irq handler from executing Y.
> Action Y
> uses __update_cpu_freelist_fast() to find out it was interrupted by X
> messing with the c-> fields.

Right.

Also, the test module is just freeing one object (with slab merging
disabled), so there is no concurrent freeing in the test.

For the record, an accurate analysis of the problem (as discussed
off-list):

It turns out the object freed by sheaf_flush_unused() was in the KASAN
percpu quarantine list (confirmed by dumping the list) by the time
__kmem_cache_shutdown() returns an error.

Quarantined objects are supposed to be flushed by kasan_cache_shutdown(),
but things go wrong if the rcu callback (rcu_free_sheaf_nobarn()) is
processed after kasan_cache_shutdown() finishes.

That's why rcu_barrier() in __kmem_cache_shutdown() didn't help,
because it's called after kasan_cache_shutdown().

Calling rcu_barrier() in kvfree_rcu_barrier_on_cache() guarantees
that the object will be added to the quarantine list before
kasan_cache_shutdown() is called. So it's a valid fix!

-- 
Cheers,
Harry / Hyeonggon

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC v2 01/20] mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache()
  2026-01-14 11:14           ` Harry Yoo
@ 2026-01-14 13:02             ` Vlastimil Babka
  2026-01-15 23:52               ` Suren Baghdasaryan
  0 siblings, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2026-01-14 13:02 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Petr Tesarik, Christoph Lameter, David Rientjes, Roman Gushchin,
	Hao Li, Andrew Morton, Uladzislau Rezki, Liam R. Howlett,
	Suren Baghdasaryan, Sebastian Andrzej Siewior, Alexei Starovoitov,
	linux-mm, linux-kernel, linux-rt-devel, bpf, kasan-dev,
	kernel test robot, stable

On 1/14/26 12:14, Harry Yoo wrote:
> For the record, an accurate analysis of the problem (as discussed
> off-list):
> 
> It turns out the object freed by sheaf_flush_unused() was in KASAN
> percpu quarantine list (confirmed by dumping the list) by the time
> __kmem_cache_shutdown() returns an error.
> 
> Quarantined objects are supposed to be flushed by kasan_cache_shutdown(),
> but things go wrong if the rcu callback (rcu_free_sheaf_nobarn()) is
> processed after kasan_cache_shutdown() finishes.
> 
> That's why rcu_barrier() in __kmem_cache_shutdown() didn't help,
> because it's called after kasan_cache_shutdown().
> 
> Calling rcu_barrier() in kvfree_rcu_barrier_on_cache() guarantees
> that the object will be added to the quarantine list before
> kasan_cache_shutdown() is called. So it's a valid fix!

Thanks a lot! Will incorporate to commit log.
That this is KASAN-only further reduces the urgency.


* Re: [PATCH RFC v2 00/20] slab: replace cpu (partial) slabs with sheaves
  2026-01-12 15:16 [PATCH RFC v2 00/20] slab: replace cpu (partial) slabs with sheaves Vlastimil Babka
  2026-01-12 15:16 ` [PATCH RFC v2 01/20] mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache() Vlastimil Babka
  2026-01-12 15:20 ` [PATCH v2 00/20] slab: replace cpu (partial) slabs with sheaves Vlastimil Babka
@ 2026-01-15 15:12 ` Vlastimil Babka
  2 siblings, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2026-01-15 15:12 UTC (permalink / raw)
  To: Harry Yoo, Petr Tesarik, Christoph Lameter, David Rientjes,
	Roman Gushchin
  Cc: Hao Li, Andrew Morton, Uladzislau Rezki, Liam R. Howlett,
	Suren Baghdasaryan, Sebastian Andrzej Siewior, Alexei Starovoitov,
	linux-mm, linux-kernel, linux-rt-devel, bpf, kasan-dev,
	kernel test robot, stable

On 1/12/26 16:16, Vlastimil Babka wrote:
> Percpu sheaves caching was introduced as opt-in but the goal was to
> eventually move all caches to them. This is the next step, enabling
> sheaves for all caches (except the two bootstrap ones) and then removing
> the per cpu (partial) slabs and lots of associated code.
> 
> Besides (hopefully) improved performance, this removes the rather
> complicated code related to the lockless fastpaths (using
> this_cpu_try_cmpxchg128/64) and its complications with PREEMPT_RT or
> kmalloc_nolock().
> 
> The lockless slab freelist+counters update operation using
> try_cmpxchg128/64 remains and is crucial for freeing remote NUMA objects
> without repeating the "alien" array flushing of SLUB, and to allow
> flushing objects from sheaves to slabs mostly without the node
> list_lock.
> 
> This v2 is the first non-RFC. I would consider exposing the series to
> linux-next at this point.
> 
> Git branch for the v2:
>   https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=sheaves-for-all-v2

The current state with collected fixes:

https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/sheaves-for-all




* Re: [PATCH RFC v2 01/20] mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache()
  2026-01-14 13:02             ` Vlastimil Babka
@ 2026-01-15 23:52               ` Suren Baghdasaryan
  0 siblings, 0 replies; 12+ messages in thread
From: Suren Baghdasaryan @ 2026-01-15 23:52 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Harry Yoo, Petr Tesarik, Christoph Lameter, David Rientjes,
	Roman Gushchin, Hao Li, Andrew Morton, Uladzislau Rezki,
	Liam R. Howlett, Sebastian Andrzej Siewior, Alexei Starovoitov,
	linux-mm, linux-kernel, linux-rt-devel, bpf, kasan-dev,
	kernel test robot, stable

On Wed, Jan 14, 2026 at 1:02 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 1/14/26 12:14, Harry Yoo wrote:
> > For the record, an accurate analysis of the problem (as discussed
> > off-list):
> >
> > It turns out the object freed by sheaf_flush_unused() was in KASAN
> > percpu quarantine list (confirmed by dumping the list) by the time
> > __kmem_cache_shutdown() returns an error.
> >
> > Quarantined objects are supposed to be flushed by kasan_cache_shutdown(),
> > but things go wrong if the rcu callback (rcu_free_sheaf_nobarn()) is
> > processed after kasan_cache_shutdown() finishes.
> >
> > That's why rcu_barrier() in __kmem_cache_shutdown() didn't help,
> > because it's called after kasan_cache_shutdown().
> >
> > Calling rcu_barrier() in kvfree_rcu_barrier_on_cache() guarantees
> > that the object will be added to the quarantine list before
> > kasan_cache_shutdown() is called. So it's a valid fix!
>
> Thanks a lot! Will incorporate to commit log.
> That this is KASAN-only further reduces the urgency.

Thanks for the detailed explanation!

Reviewed-by: Suren Baghdasaryan <surenb@google.com>


end of thread, other threads:[~2026-01-15 23:52 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-12 15:16 [PATCH RFC v2 00/20] slab: replace cpu (partial) slabs with sheaves Vlastimil Babka
2026-01-12 15:16 ` [PATCH RFC v2 01/20] mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache() Vlastimil Babka
2026-01-13  2:08   ` Harry Yoo
2026-01-13  9:32     ` Vlastimil Babka
2026-01-13 12:31       ` Harry Yoo
2026-01-13 13:09         ` Vlastimil Babka
2026-01-14 11:14           ` Harry Yoo
2026-01-14 13:02             ` Vlastimil Babka
2026-01-15 23:52               ` Suren Baghdasaryan
2026-01-14  4:56   ` Harry Yoo
2026-01-12 15:20 ` [PATCH v2 00/20] slab: replace cpu (partial) slabs with sheaves Vlastimil Babka
2026-01-15 15:12 ` [PATCH RFC " Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox