From: Uladzislau Rezki <urezki@gmail.com>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
Aishwarya Rambhadran <aishwarya.rambhadran@arm.com>
Cc: Aishwarya Rambhadran <aishwarya.rambhadran@arm.com>,
Vlastimil Babka <vbabka@suse.cz>,
Harry Yoo <harry.yoo@oracle.com>,
Petr Tesarik <ptesarik@suse.com>,
Christoph Lameter <cl@gentwo.org>,
David Rientjes <rientjes@google.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Hao Li <hao.li@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Uladzislau Rezki <urezki@gmail.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Suren Baghdasaryan <surenb@google.com>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Alexei Starovoitov <ast@kernel.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-rt-devel@lists.linux.dev, bpf@vger.kernel.org,
kasan-dev@googlegroups.com,
kernel test robot <oliver.sang@intel.com>,
stable@vger.kernel.org, "Paul E. McKenney" <paulmck@kernel.org>,
ryan.roberts@arm.com
Subject: Re: [REGRESSION] slab: replace cpu (partial) slabs with sheaves
Date: Thu, 26 Mar 2026 19:16:10 +0100
Message-ID: <acV36oPNFMgL4puz@milan>
In-Reply-To: <ed58493b-0369-4729-bcf7-bc89f72a7913@kernel.org>

On Thu, Mar 26, 2026 at 03:42:02PM +0100, Vlastimil Babka (SUSE) wrote:
> On 3/26/26 13:43, Aishwarya Rambhadran wrote:
> > Hi Vlastimil, Harry,
>
> Hi!
>
> > We have observed a few kernel performance benchmark regressions,
> > mainly in perf & vmalloc workloads, when comparing v6.19 mainline
> > kernel results against later releases in the v7.0 cycle.
> > Independent bisections on different machines consistently point
> > to commits within the slab percpu sheaves series. However, towards
> > the end of the bisection, the signal becomes less clear, so it's
> > not yet certain which specific commit within the series is the
> > root cause.
> >
> > The workloads were triggered on AWS Graviton3 (arm64) & AWS Intel
> > Sapphire Rapids (x86_64) systems in which the regressions are
> > reproducible across different kernel release candidates.
> > (R)/(I) mean statistically significant regression/improvement,
> > where "statistically significant" means the 95% confidence
> > intervals do not overlap.
> >
> > Below are the performance benchmark results generated by the
> > Fastpath Tool, for different kernel -rc versions relative to the
> > base version v6.19, executed on the mentioned SUTs. The perf/
> > syscall benchmarks (execve/fork) regress consistently by ~6–11% on
> > both arm64 and x86_64 across v7.0-rc1 to rc5, while vmalloc
> > workloads show smaller but stable regressions (~2–10%), particularly
> > in kvfree_rcu paths.
> >
> > Regressions on AWS Intel Sapphire Rapids (x86_64) :
>
> The table formatting is broken for me, can you resend it please? Maybe a
> .txt attachment would work better.
>
> > +-----------------+----------------------------------------------------------+-----------------+-------------+-------------+---------------------------+-------------+-------------+-------------+
> > | Benchmark       | Result Class                                             | 6-19-0 (base)   | 7-0-0-rc1   | 7-0-0-rc2   | 7-0-0-rc2-gaf4e9ef3d784   | 7-0-0-rc3   | 7-0-0-rc4   | 7-0-0-rc5   |
> > +=================+==========================================================+=================+=============+=============+===========================+=============+=============+=============+
> > | micromm/vmalloc | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) | 262605.17       | -4.94%      | -7.48%      | (R) -8.11%                | -4.51%      | -6.23%      | -3.47%      |
> > |                 | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) | 253198.67       | -7.56%      | (R) -10.57% | (R) -10.13%               | (R) -7.07%  | -6.37%      | -6.55%      |
> > |                 | pcpu_alloc_test: p:1, h:0, l:500000 (usec)               | 197904.67       | -2.07%      | -3.38%      | -2.07%                    | -2.97%      | (R) -4.30%  | -3.39%      |
> > |                 | random_size_align_alloc_test: p:1, h:0, l:500000 (usec)  | 1707089.83      | -2.63%      | (R) -3.69%  | (R) -3.25%                | (R) -2.87%  | -2.22%      | (R) -3.63%  |
> > +-----------------+----------------------------------------------------------+-----------------+-------------+-------------+---------------------------+-------------+-------------+-------------+
> > | perf/syscall    | execve (ops/sec)                                         | 1202.92         | (R) -7.15%  | (R) -7.05%  | (R) -7.03%                | (R) -7.93%  | (R) -6.51%  | (R) -7.36%  |
> > |                 | fork (ops/sec)                                           | 996.00          | (R) -9.00%  | (R) -10.27% | (R) -9.92%                | (R) -11.19% | (R) -10.69% | (R) -10.28% |
> > +-----------------+----------------------------------------------------------+-----------------+-------------+-------------+---------------------------+-------------+-------------+-------------+
> >
> > Regressions on AWS Graviton3 (arm64) :
> > +-----------------+----------------------------------------------------------+-----------------+-------------+-------------+---------------------------+-------------+-------------+-------------+
> > | Benchmark       | Result Class                                             | 6-19-0 (base)   | 7-0-0-rc1   | 7-0-0-rc2   | 7-0-0-rc2-gaf4e9ef3d784   | 7-0-0-rc3   | 7-0-0-rc4   | 7-0-0-rc5   |
> > +=================+==========================================================+=================+=============+=============+===========================+=============+=============+=============+
> > | micromm/vmalloc | fix_size_alloc_test: p:1, h:0, l:500000 (usec)           | 320101.50       | (R) -4.72%  | (R) -3.81%  | (R) -5.05%                | -3.06%      | -3.16%      | (R) -3.91%  |
> > |                 | fix_size_alloc_test: p:4, h:0, l:500000 (usec)           | 522072.83       | (R) -2.15%  | -1.25%      | (R) -2.16%                | (R) -2.13%  | -2.10%      | -1.82%      |
> > |                 | fix_size_alloc_test: p:16, h:0, l:500000 (usec)          | 1041640.33      | -0.50%      | (R) -2.04%  | -1.43%                    | -0.69%      | -1.78%      | (R) -2.03%  |
> > |                 | fix_size_alloc_test: p:256, h:1, l:100000 (usec)         | 2255794.00      | -1.51%      | (R) -2.24%  | (R) -2.33%                | -1.14%      | -0.94%      | -1.60%      |
> > |                 | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) | 343543.83       | (R) -4.50%  | (R) -3.54%  | (R) -5.00%                | (R) -4.88%  | (R) -4.01%  | (R) -5.54%  |
> > |                 | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) | 342290.33       | (R) -5.15%  | (R) -3.24%  | (R) -3.76%                | (R) -5.37%  | (R) -3.74%  | (R) -5.51%  |
> > |                 | random_size_align_alloc_test: p:1, h:0, l:500000 (usec)  | 1209666.83      | -2.43%      | -2.09%      | -1.19%                    | (R) -4.39%  | -1.81%      | -3.15%      |
> > +-----------------+----------------------------------------------------------+-----------------+-------------+-------------+---------------------------+-------------+-------------+-------------+
> > | perf/syscall    | execve (ops/sec)                                         | 1219.58         |             | (R) -8.12%  | (R) -7.37%                | (R) -7.60%  | (R) -7.86%  | (R) -7.71%  |
> > |                 | fork (ops/sec)                                           | 863.67          |             | (R) -7.24%  | (R) -7.07%                | (R) -6.42%  | (R) -6.93%  | (R) -6.55%  |
> > +-----------------+----------------------------------------------------------+-----------------+-------------+-------------+---------------------------+-------------+-------------+-------------+
> >
> >
> > The details of the latest bisections carried out for the
> > regressions listed above are given below:
> > -Graviton3 (arm64)
> > good: v6.19 (05f7e89ab973)
> > bad: v7.0-rc2 (11439c4635ed)
> > workload: perf/syscall (execve)
> > bisected to: f1427a1d6415 (“slab: make percpu sheaves compatible with
> > kmalloc_nolock()/kfree_nolock()”)
> >
> > -Sapphire Rapids (x86_64)
> > good: v6.19 (05f7e89ab973)
> > bad: v7.0-rc3 (1f318b96cc84)
> > workload: perf/syscall (fork)
> > bisected to: f1427a1d6415 (“slab: make percpu sheaves compatible with
> > kmalloc_nolock()/kfree_nolock()”)
> >
> > -Graviton3 (arm64)
> > good: v6.19 (05f7e89ab973)
> > bad: v7.0-rc3 (1f318b96cc84)
> > workload: perf/syscall (execve)
> > bisected to: f3421f8d154c (“slab: introduce percpu sheaves bootstrap”)
>
> Yeah none of these are likely to introduce the regression.
> We've seen other reports from e.g. lkp pointing to later commits that remove
> the cpu (partial) slabs. The theory is that on benchmarks that stress vma
> and maple node caches (fork and execve are likely those), the introduction
> of sheaves in 6.18 (for those caches only) resulted in ~doubled percpu
> caching capacity (and likely associated performance increase) - by sheaves
> backed by cpu (partial) slabs. Removing the latter then looks like a
> regression in isolation in the 7.0 series.
>
> A regression of vmalloc related to kvfree_rcu might be new. Although if it's
> kvfree_rcu() of vmalloc'd objects, it would be weird. More likely they are
> kvmalloc'd but small enough to be actually kmalloc'd? What are the details
> of that test?
>

static int
kvfree_rcu_2_arg_vmalloc_test(void)
{
	struct test_kvfree_rcu *p;
	int i;

	for (i = 0; i < test_loop_count; i++) {
		p = vmalloc(1 * PAGE_SIZE);
		if (!p)
			return -1;

		p->array[0] = 'a';
		kvfree_rcu(p, rcu);
	}

	return 0;
}

static bool kfree_rcu_sheaf(void *obj)
{
	struct kmem_cache *s;
	struct slab *slab;

	if (is_vmalloc_addr(obj))
		return false;

	slab = virt_to_slab(obj);
	if (unlikely(!slab))
		return false;

	s = slab->slab_cache;
	if (likely(!IS_ENABLED(CONFIG_NUMA) || slab_nid(slab) == numa_mem_id()))
		return __kfree_rcu_sheaf(s, obj);

	return false;
}

It does not go via the sheaf, since it is a vmalloc address.
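
To make the control flow concrete, here is a minimal userspace sketch of that
early return. It is not kernel code: the address window, the enum and all
model_* names are made up for illustration, and the slab and NUMA checks are
simply assumed to pass. It only shows that an address inside the vmalloc
range is refused by the sheaf helper and therefore takes whatever fallback
path the caller has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical window standing in for the kernel's vmalloc address range. */
#define MODEL_VMALLOC_START 0x1000000UL
#define MODEL_VMALLOC_END   0x2000000UL	/* exclusive */

/* Which path a freed pointer takes in this model. */
enum model_free_path { PATH_SHEAF, PATH_FALLBACK };

/* Stand-in for is_vmalloc_addr(): a plain range check. */
static bool model_is_vmalloc_addr(uintptr_t addr)
{
	return addr >= MODEL_VMALLOC_START && addr < MODEL_VMALLOC_END;
}

/* Mirrors only the first check of kfree_rcu_sheaf() quoted above. */
static bool model_kfree_rcu_sheaf(uintptr_t addr)
{
	if (model_is_vmalloc_addr(addr))
		return false;

	/* Assumption: slab lookup and NUMA node check succeed. */
	return true;
}

/* Caller side: use the sheaf if it accepts the object, else fall back. */
static enum model_free_path model_kvfree_rcu(uintptr_t addr)
{
	return model_kfree_rcu_sheaf(addr) ? PATH_SHEAF : PATH_FALLBACK;
}

/* Small self-check covering one address on each side of the window. */
static void model_selfcheck(void)
{
	assert(model_kvfree_rcu(0x1800000UL) == PATH_FALLBACK);
	assert(model_kvfree_rcu(0x0400000UL) == PATH_SHEAF);
}
```

Calling model_selfcheck() from a main() exercises both cases. In terms of the
test above, every pointer returned by vmalloc() lands in the PATH_FALLBACK
branch of this model.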
--
Uladzislau Rezki