Re: [REGRESSION] slab: replace cpu (partial) slabs with sheaves

public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed

From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
To: Aishwarya Rambhadran <aishwarya.rambhadran@arm.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Harry Yoo <harry.yoo@oracle.com>,
	Petr Tesarik <ptesarik@suse.com>,
	Christoph Lameter <cl@gentwo.org>,
	David Rientjes <rientjes@google.com>,
	Roman Gushchin <roman.gushchin@linux.dev>
Cc: Hao Li <hao.li@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Alexei Starovoitov <ast@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-rt-devel@lists.linux.dev, bpf@vger.kernel.org,
	kasan-dev@googlegroups.com,
	kernel test robot <oliver.sang@intel.com>,
	stable@vger.kernel.org, "Paul E. McKenney" <paulmck@kernel.org>,
	ryan.roberts@arm.com
Subject: Re: [REGRESSION] slab: replace cpu (partial) slabs with sheaves
Date: Thu, 26 Mar 2026 15:42:02 +0100	[thread overview]
Message-ID: <ed58493b-0369-4729-bcf7-bc89f72a7913@kernel.org> (raw)
In-Reply-To: <afe9ba0a-1924-42a8-a9c5-34eec709f883@arm.com>

On 3/26/26 13:43, Aishwarya Rambhadran wrote:
> Hi Vlastimil, Harry,

Hi!

> We have observed few kernel performance benchmark regressions,
> mainly in perf & vmalloc workloads, when comparing v6.19 mainline
> kernel results against later releases in the v7.0 cycle.
> Independent bisections on different machines consistently point
> to commits within the slab percpu sheaves series. However, towards
> the end of the bisection, the signal becomes less clear, so it's
> not yet certain which specific commit within the series is the
> root cause.
> 
> The workloads were triggered on AWS Graviton3 (arm64) & AWS Intel
> Sapphire Rapids (x86_64) systems in which the regressions are
> reproducible across different kernel release candidates.
> (R)/(I) mean statistically significant regression/improvement,
> where "statistically significant" means the 95% confidence
> intervals do not overlap”.
> 
> Below given are the performance benchmark results generated by
> Fastpath Tool, for different kernel -rc versions relative to the
> base version v6.19, executed on the mentioned SUTs. The perf/
> syscall benchmarks (execve/fork) regress consistently by ~6–11% on
> both arm64 and x86_64 across v7.0-rc1 to rc5, while vmalloc
> workloads show smaller but stable regressions (~2–10%), particularly
> in kvfree_rcu paths.
> 
> Regressions on AWS Intel Sapphire Rapids (x86_64) :

The table formatting is broken for me, can you resend it please? Maybe a
.txt attachment would work better.

> +-----------------+----------------------------------------------------------+-----------------+-------------+-------------+---------------------------+-------------+-------------+-------------+
> | Benchmark       | Result Class            |   6-19-0 (base) |  
>   7-0-0-rc1 |   7-0-0-rc2 |  7-0-0-rc2-gaf4e9ef3d784 |   7-0-0-rc3 |  
>   7-0-0-rc4 |   7-0-0-rc5 |
> +=================+==========================================================+=================+=============+=============+===========================+=============+=============+=============+
> | micromm/vmalloc | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 
> (usec) |       262605.17 |      -4.94% |      -7.48% |             (R) 
> -8.11% |      -4.51% |      -6.23% |      -3.47% |
> |                 | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 
> (usec) |       253198.67 |      -7.56% | (R) -10.57% |            (R) 
> -10.13% |  (R) -7.07% |      -6.37% |      -6.55% |
> |                 | pcpu_alloc_test: p:1, h:0, l:500000 (usec)           
>   |       197904.67 |      -2.07% |      -3.38% |             -2.07% |  
>      -2.97% |  (R) -4.30% |      -3.39% |
> |                 | random_size_align_alloc_test: p:1, h:0, l:500000 
> (usec)  |      1707089.83 |      -2.63% |  (R) -3.69% |               
> (R) -3.25% |  (R) -2.87% |      -2.22% |  (R) -3.63% |
> +-----------------+----------------------------------------------------------+-----------------+-------------+-------------+---------------------------+-------------+-------------+-------------+
> | perf/syscall    | execve (ops/sec)            |         1202.92 |  (R) 
> -7.15% |  (R) -7.05% |         (R) -7.03% |  (R) -7.93% |  (R) -6.51% |  
> (R) -7.36% |
> |                 | fork (ops/sec)            |          996.00 |  (R) 
> -9.00% | (R) -10.27% |         (R) -9.92% | (R) -11.19% | (R) -10.69% | 
> (R) -10.28% |
> +-----------------+----------------------------------------------------------+-----------------+-------------+-------------+---------------------------+-------------+-------------+-------------+
> 
> Regressions on AWS Graviton3 (arm64) :
> +-----------------+----------------------------------------------------------+-----------------+-------------+-------------+---------------------------+-------------+-------------+-------------+
> | Benchmark       | Result Class            |   6-19-0 (base) |  
>   7-0-0-rc1 |   7-0-0-rc2 |  7-0-0-rc2-gaf4e9ef3d784 |   7-0-0-rc3 |  
>   7-0-0-rc4 |   7-0-0-rc5 |
> +=================+==========================================================+=================+=============+=============+===========================+=============+=============+=============+
> | micromm/vmalloc | fix_size_alloc_test: p:1, h:0, l:500000 (usec)      
>       |       320101.50 |  (R) -4.72% |  (R) -3.81% |               (R) 
> -5.05% |      -3.06% |      -3.16% |  (R) -3.91% |
> |                 | fix_size_alloc_test: p:4, h:0, l:500000 (usec)      
>       |       522072.83 |  (R) -2.15% |      -1.25% |               (R) 
> -2.16% |  (R) -2.13% |      -2.10% |      -1.82% |
> |                 | fix_size_alloc_test: p:16, h:0, l:500000 (usec)      
>      |      1041640.33 |      -0.50% |  (R) -2.04% |                 
> -1.43% |      -0.69% |      -1.78% |  (R) -2.03% |
> |                 | fix_size_alloc_test: p:256, h:1, l:100000 (usec)    
>       |      2255794.00 |      -1.51% |  (R) -2.24% |             (R) 
> -2.33% |      -1.14% |      -0.94% |      -1.60% |
> |                 | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 
> (usec) |       343543.83 |  (R) -4.50% |  (R) -3.54% |             (R) 
> -5.00% |  (R) -4.88% |  (R) -4.01% |  (R) -5.54% |
> |                 | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 
> (usec) |       342290.33 |  (R) -5.15% |  (R) -3.24% |             (R) 
> -3.76% |  (R) -5.37% |  (R) -3.74% |  (R) -5.51% |
> |                 | random_size_align_alloc_test: p:1, h:0, l:500000 
> (usec)  |      1209666.83 |      -2.43% |      -2.09% |                 
>    -1.19% |  (R) -4.39% |      -1.81% |      -3.15% |
> +-----------------+----------------------------------------------------------+-----------------+-------------+-------------+---------------------------+-------------+-------------+-------------+
> | perf/syscall    | execve (ops/sec)            |         1219.58 |      
>         |  (R) -8.12% |         (R) -7.37% |  (R) -7.60% |  (R) -7.86% 
> |  (R) -7.71% |
> |                 | fork (ops/sec)            |          863.67 |        
>       |  (R) -7.24% |         (R) -7.07% |  (R) -6.42% |  (R) -6.93% |  
> (R) -6.55% |
> +-----------------+----------------------------------------------------------+-----------------+-------------+-------------+---------------------------+-------------+-------------+-------------+
> 
> 
> The details of latest bisections that were carried out for the above
> listed regressions, are given below :
> -Graviton3 (arm64)
>   good: v6.19 (05f7e89ab973)
>   bad:  v7.0-rc2 (11439c4635ed)
>   workload: perf/syscall (execve)
>   bisected to: f1427a1d6415 (“slab: make percpu sheaves compatible with
>   kmalloc_nolock()/kfree_nolock()”)
> 
> -Sapphire Rapids (x86_64)
>   good: v6.19 (05f7e89ab973)
>   bad:  v7.0-rc3 (1f318b96cc84)
>   workload: perf/syscall (fork)
>   bisected to: f1427a1d6415 (“slab: make percpu sheaves compatible with
>   kmalloc_nolock()/kfree_nolock()”)
> 
> -Graviton3 (arm64)
>   good: v6.19 (05f7e89ab973)
>   bad:  v7.0-rc3 (1f318b96cc84)
>   workload: perf/syscall (execve)
>   bisected to: f3421f8d154c (“slab: introduce percpu sheaves bootstrap”)

Yeah none of these are likely to introduce the regression.
We've seen other reports from e.g. lkp pointing to later commits that remove
the cpu (partial) slabs. The theory is that on benchmarks that stress vma
and maple node caches (fork and execve are likely those), the introduction
of sheaves in 6.18 (for those caches only) resulted in ~doubled percpu
caching capacity (and likely associated performance increase) - by sheaves
backed by cpu (partial) slabs,. Removing the latter then looks like a
regression in isolation in the 7.0 series.

A regression of vmalloc related to kvfree_rcu might be new. Although if it's
kvfree_rcu() of vmalloc'd objects, it would be weird. More likely they are
kvmalloc'd but small enough to be actually kmalloc'd? What are the details
of that test?

> I'm aware that some fixes for the sheaves series have already been
> merged around v7.0-rc3; however, these do not appear to resolve the
> regressions described above completely. Are there additional fixes or
> follow-ups in progress that I should evaluate? I can investigate
> further and provide additional data, if that would be useful.

We have some followups planned for 7.1 that would make a difference for
systems with memoryless nodes. That would mean "numactl -H" shows nodes that
have cpus but no memory, or that memory is all ZONE_MOVABLE and not ZONE_NORMAL.

Thanks,
Vlastimil

> Thank you.
> Aishwarya Rambhadran
> 
> 
> On 23/01/26 12:22 PM, Vlastimil Babka wrote:

next prev parent reply	other threads:[~2026-03-26 14:42 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-23  6:52 [PATCH v4 00/22] slab: replace cpu (partial) slabs with sheaves Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 01/22] mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache() Vlastimil Babka
2026-01-27 16:08   ` Liam R. Howlett
2026-01-23  6:52 ` [PATCH v4 02/22] mm/slab: fix false lockdep warning in __kfree_rcu_sheaf() Vlastimil Babka
2026-01-23 12:03   ` Sebastian Andrzej Siewior
2026-01-24 10:58     ` Harry Yoo
2026-01-23  6:52 ` [PATCH v4 03/22] slab: add SLAB_CONSISTENCY_CHECKS to SLAB_NEVER_MERGE Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 04/22] mm/slab: move and refactor __kmem_cache_alias() Vlastimil Babka
2026-01-27 16:17   ` Liam R. Howlett
2026-01-27 16:59     ` Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 05/22] mm/slab: make caches with sheaves mergeable Vlastimil Babka
2026-01-27 16:23   ` Liam R. Howlett
2026-01-23  6:52 ` [PATCH v4 06/22] slab: add sheaves to most caches Vlastimil Babka
2026-01-26  6:36   ` Hao Li
2026-01-26  8:39     ` Vlastimil Babka
2026-01-26 13:59   ` Breno Leitao
2026-01-27 16:34   ` Liam R. Howlett
2026-01-27 17:01     ` Vlastimil Babka
2026-01-29  7:24   ` Zhao Liu
2026-01-29  8:21     ` Vlastimil Babka
2026-01-30  7:15       ` Zhao Liu
2026-02-04 18:01         ` Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 07/22] slab: introduce percpu sheaves bootstrap Vlastimil Babka
2026-01-26  6:13   ` Hao Li
2026-01-26  8:42     ` Vlastimil Babka
2026-01-27 17:31   ` Liam R. Howlett
2026-01-23  6:52 ` [PATCH v4 08/22] slab: make percpu sheaves compatible with kmalloc_nolock()/kfree_nolock() Vlastimil Babka
2026-01-23 18:05   ` Alexei Starovoitov
2026-01-27 17:36   ` Liam R. Howlett
2026-01-29  8:25     ` Vlastimil Babka
2026-03-02 11:56   ` D, Suneeth
2026-03-02 12:16     ` Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 09/22] slab: handle kmalloc sheaves bootstrap Vlastimil Babka
2026-01-27 18:30   ` Liam R. Howlett
2026-01-23  6:52 ` [PATCH v4 10/22] slab: add optimized sheaf refill from partial list Vlastimil Babka
2026-01-26  7:12   ` Hao Li
2026-01-29  7:43     ` Harry Yoo
2026-01-29  8:29       ` Vlastimil Babka
2026-01-27 20:05   ` Liam R. Howlett
2026-01-29  8:01   ` Harry Yoo
2026-01-23  6:52 ` [PATCH v4 11/22] slab: remove cpu (partial) slabs usage from allocation paths Vlastimil Babka
2026-01-23 18:17   ` Alexei Starovoitov
2026-01-23  6:52 ` [PATCH v4 12/22] slab: remove SLUB_CPU_PARTIAL Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 13/22] slab: remove the do_slab_free() fastpath Vlastimil Babka
2026-01-23 18:15   ` Alexei Starovoitov
2026-01-23  6:52 ` [PATCH v4 14/22] slab: remove defer_deactivate_slab() Vlastimil Babka
2026-01-23 17:31   ` Alexei Starovoitov
2026-01-23  6:52 ` [PATCH v4 15/22] slab: simplify kmalloc_nolock() Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 16/22] slab: remove struct kmem_cache_cpu Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 17/22] slab: remove unused PREEMPT_RT specific macros Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 18/22] slab: refill sheaves from all nodes Vlastimil Babka
2026-01-27 14:28   ` Mateusz Guzik
2026-01-27 22:04     ` Vlastimil Babka
2026-01-29  9:16   ` Harry Yoo
2026-01-23  6:52 ` [PATCH v4 19/22] slab: update overview comments Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 20/22] slab: remove frozen slab checks from __slab_free() Vlastimil Babka
2026-01-29  7:16   ` Harry Yoo
2026-01-23  6:52 ` [PATCH v4 21/22] mm/slub: remove DEACTIVATE_TO_* stat items Vlastimil Babka
2026-01-29  7:21   ` Harry Yoo
2026-01-23  6:53 ` [PATCH v4 22/22] mm/slub: cleanup and repurpose some " Vlastimil Babka
2026-01-29  7:40   ` Harry Yoo
2026-01-29 15:18 ` [PATCH v4 00/22] slab: replace cpu (partial) slabs with sheaves Hao Li
2026-01-29 15:28   ` Vlastimil Babka
2026-01-29 16:06     ` Hao Li
2026-01-29 16:44       ` Liam R. Howlett
2026-01-30  4:38         ` Hao Li
2026-01-30  4:50     ` Hao Li
2026-01-30  6:17       ` Hao Li
2026-02-04 18:02       ` Vlastimil Babka
2026-02-04 18:24         ` Christoph Lameter (Ampere)
2026-02-06 16:44           ` Vlastimil Babka
2026-03-26 12:43 ` [REGRESSION] " Aishwarya Rambhadran
2026-03-26 14:42   ` Vlastimil Babka (SUSE) [this message]
2026-03-26 18:16     ` Uladzislau Rezki
2026-03-26 18:24       ` Vlastimil Babka (SUSE)
2026-03-26 18:50         ` Ryan Roberts

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ed58493b-0369-4729-bcf7-bc89f72a7913@kernel.org \
    --to=vbabka@kernel.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=aishwarya.rambhadran@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=ast@kernel.org \
    --cc=bigeasy@linutronix.de \
    --cc=bpf@vger.kernel.org \
    --cc=cl@gentwo.org \
    --cc=hao.li@linux.dev \
    --cc=harry.yoo@oracle.com \
    --cc=kasan-dev@googlegroups.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rt-devel@lists.linux.dev \
    --cc=oliver.sang@intel.com \
    --cc=paulmck@kernel.org \
    --cc=ptesarik@suse.com \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=ryan.roberts@arm.com \
    --cc=stable@vger.kernel.org \
    --cc=surenb@google.com \
    --cc=urezki@gmail.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox