From: Harry Yoo <harry.yoo@oracle.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-block@vger.kernel.org, Hao Li <hao.li@linux.dev>,
Christoph Hellwig <hch@infradead.org>
Subject: Re: [Regression] mm:slab/sheaves: severe performance regression in cross-CPU slab allocation
Date: Wed, 11 Mar 2026 10:10:13 +0900
Message-ID: <abDA9UrJBT1wXh22@hyeyoo>
In-Reply-To: <aaqq7YmUcOht3GWH@fedora>
On Fri, Mar 06, 2026 at 06:22:37PM +0800, Ming Lei wrote:
> On Fri, Mar 06, 2026 at 09:47:27AM +0100, Vlastimil Babka (SUSE) wrote:
> > On 3/6/26 05:55, Harry Yoo wrote:
> > > On Thu, Feb 26, 2026 at 07:02:11PM +0100, Vlastimil Babka (SUSE) wrote:
> > >> On 2/25/26 10:31, Ming Lei wrote:
> > >> > Hi Vlastimil,
> > >> >
> > >> > On Wed, Feb 25, 2026 at 09:45:03AM +0100, Vlastimil Babka (SUSE) wrote:
> > >> >> On 2/24/26 21:27, Vlastimil Babka wrote:
> > >> >> >
> > >> >> > It made sense to me not to refill sheaves when we can't reclaim, but I
> > >> >> > didn't anticipate this interaction with mempools. We could change them
> > >> >> > but there might be others using a similar pattern. Maybe it would be for
> > >> >> > the best to just drop that heuristic from __pcs_replace_empty_main()
> > >> >> > (but carefully as some deadlock avoidance depends on it, we might need
> > >> >> > to e.g. replace it with gfpflags_allow_spinning()). I'll send a patch
> > >> >> > tomorrow to test this theory, unless someone beats me to it (feel free to).
> > >> >> Could you try this then, please? Thanks!
> > >> >
> > >> > Thanks for working on this issue!
> > >> >
> > >> > Unfortunately the patch doesn't make a difference in IOPS in the perf test;
> > >> > here is the collected perf profile on the linus tree (basically 7.0-rc1 with your patch):
> > >>
> > >> what about this patch in addition to the previous one? Thanks.
> > >>
> > >> ----8<----
> > >> From d3e8118c078996d1372a9f89285179d93971fdb2 Mon Sep 17 00:00:00 2001
> > >> From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
> > >> Date: Thu, 26 Feb 2026 18:59:56 +0100
> > >> Subject: [PATCH] mm/slab: put barn on every online node
> > >>
> > >> Including memoryless nodes.
> > >>
> > >> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> > >> ---
> > >
> > > Just taking a quick glance...
> > >
> > >> @@ -6121,7 +6122,8 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
> > >> if (unlikely(!slab_free_hook(s, object, slab_want_init_on_free(s), false)))
> > >> return;
> > >>
> > >> - if (likely(!IS_ENABLED(CONFIG_NUMA) || slab_nid(slab) == numa_mem_id())
> > >> + if (likely(!IS_ENABLED(CONFIG_NUMA) || (slab_nid(slab) == numa_mem_id())
> > >> + || !node_isset(slab_nid(slab), slab_nodes))
> > >
> > > I think you intended !node_isset(numa_mem_id(), slab_nodes)?
> > >
> > > "Skip freeing to pcs if it's remote free, but memoryless nodes is
> > > an exception".
> >
> > Indeed, thanks! Ming, could you retry with that fixed up please?
>
> After applying the following change, IOPS is ~25M:
>
> - delta change on the two patches
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 085fe49eec68..56fe8bd956c0 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -6142,7 +6142,7 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
> return;
>
> if (likely(!IS_ENABLED(CONFIG_NUMA) || (slab_nid(slab) == numa_mem_id())
> - || !node_isset(slab_nid(slab), slab_nodes))
> + || !node_isset(numa_mem_id(), slab_nodes))
> && likely(!slab_test_pfmemalloc(slab))) {
> if (likely(free_to_pcs(s, object, true)))
> return;
>
Hi Ming, thanks a lot for helping with testing!
The stats look quite fine to me, but we're still seeing suboptimal IOPS.
> - slab stat on patched `815c8e35511d Merge branch 'slab/for-7.0/sheaves' into slab/for-next`
Does that not include Vlastimil's fix (fb1091febd66 "mm/slab: allow sheaf
refill if blocking is not allowed")?
Next time, could you please test on top of 7.0-rc3 w/ the memoryless node
patch (w/ the delta above) applied?
Also, let us check a few things...
1) Does bumping up sheaf capacity change the slab stats & IOPS?
   (See the capacity rounding sketch below, after question 3.)
diff --git a/mm/slub.c b/mm/slub.c
index 0c906fefc31b..5207279417e2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -7611,13 +7611,13 @@ static unsigned int calculate_sheaf_capacity(struct kmem_cache *s,
* should result in similar lock contention (barn or list_lock)
*/
if (s->size >= PAGE_SIZE)
- capacity = 4;
+ capacity = 6;
else if (s->size >= 1024)
- capacity = 12;
+ capacity = 24;
else if (s->size >= 256)
- capacity = 26;
+ capacity = 52;
else
- capacity = 60;
+ capacity = 120;
/* Increment capacity to make sheaf exactly a kmalloc size bucket */
size = struct_size_t(struct slab_sheaf, objects, capacity);
2) Is there any change in NUMA locality between v6.19 and v7.0-rc3 (patched)?
(e.g., measured via
perf stat -e node-loads,node-load-misses,node-stores,node-store-misses)
3) It's quite strange that blk_mq_sched_bio_merge() completely
disappeared from the v7.0-rc2 profile [1]. Is there any change
in the read/write I/O merge rate (/proc/diskstats) between v6.19 and
v7.0-rc3? (A small helper for reading those counters is sketched below.)
[1] https://lore.kernel.org/linux-mm/aamluV66pLIdo66g@fedora
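Regarding 1): here is a tiny userspace sketch (not from this thread, and not
the actual kernel code) of how I understand the "make sheaf exactly a kmalloc
size bucket" step: round the sheaf struct size up to the next kmalloc bucket,
then refit the capacity. The 32-byte header is a hypothetical stand-in for
the struct slab_sheaf header; with that assumption, the current base capacity
of 26 lands on the 256-byte bucket and becomes 28 (matching the
sheaf_capacity:28 in your bio-256 stats), and a doubled base of 52 would land
on the 512-byte bucket and become 60.

/*
 * Illustrative userspace sketch only: round the sheaf struct size up to the
 * next kmalloc bucket, then refit the capacity. The 32-byte header and the
 * standard x86-64 bucket list are assumptions for illustration.
 */
#include <stdio.h>

static const unsigned int kmalloc_buckets[] = {
	8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
};

static unsigned int roundup_to_bucket(unsigned int size)
{
	for (unsigned int i = 0; i < sizeof(kmalloc_buckets) / sizeof(kmalloc_buckets[0]); i++)
		if (kmalloc_buckets[i] >= size)
			return kmalloc_buckets[i];
	return size;
}

int main(void)
{
	const unsigned int header = 32;		/* hypothetical header size */
	const unsigned int base[] = { 26, 52 };	/* current vs. doubled base capacity for >= 256 byte objects */

	for (int i = 0; i < 2; i++) {
		unsigned int size = header + base[i] * sizeof(void *);
		unsigned int bucket = roundup_to_bucket(size);
		unsigned int capacity = (bucket - header) / sizeof(void *);

		printf("base %u -> struct size %u -> bucket %u -> final capacity %u\n",
		       base[i], size, bucket, capacity);
	}
	return 0;
}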
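Regarding 3): in case it helps, here is a quick hypothetical userspace helper
(not from this thread) that dumps the completed and merged read/write counters
for one device from /proc/diskstats, following the field order documented in
Documentation/admin-guide/iostats.rst. Comparing the per-run deltas on v6.19
vs. v7.0-rc3 would show whether the merge rate changed; the "nvme0n1" default
is only an example.

/*
 * Hypothetical helper: print read/write completion and merge counters for one
 * device from /proc/diskstats (field order per
 * Documentation/admin-guide/iostats.rst).
 */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "nvme0n1";	/* example device name */
	char line[512], name[64];
	unsigned int major, minor;
	unsigned long long r_ios, r_merges, r_sectors, r_ms, w_ios, w_merges;
	FILE *f = fopen("/proc/diskstats", "r");

	if (!f)
		return 1;

	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "%u %u %63s %llu %llu %llu %llu %llu %llu",
			   &major, &minor, name, &r_ios, &r_merges, &r_sectors,
			   &r_ms, &w_ios, &w_merges) != 9)
			continue;
		if (!strcmp(name, dev))
			printf("%s: reads=%llu reads_merged=%llu writes=%llu writes_merged=%llu\n",
			       name, r_ios, r_merges, w_ios, w_merges);
	}
	fclose(f);
	return 0;
}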
> # (cd /sys/kernel/slab/bio-256/ && find . -type f -exec grep -aH . {} \;)
> ./remote_node_defrag_ratio:100
> ./total_objects:7395 N1=3876 N5=3519
> ./alloc_fastpath:507619662 C0=70 C1=27608632 C3=28990301 C5=35098386 C6=9 C7=35782152 C8=115 C9=31757274 C10=32 C11=30087065 C12=34 C13=31615065 C14=7 C15=31798233 C17=30695955 C18=128 C19=32204853 C20=64 C21=36842392 C23=36212376 C25=30013640 C27=29055001 C29=29990232 C30=48 C31=29867595 C36=2 C50=1
> ./cpu_slabs:0
> ./objects:7232 N1=3816 N5=3416
> ./sheaf_return_slow:0
> ./objects_partial:500 N1=195 N5=305
> ./sheaf_return_fast:0
> ./cpu_partial:0
> ./free_slowpath:20 C4=20
> ./barn_get_fail:260 C1=6 C3=26 C5=26 C7=7 C9=5 C10=2 C11=26 C12=2 C13=10 C14=1 C15=19 C17=8 C18=5 C19=19 C20=1 C21=9 C23=22 C25=11 C27=21 C29=26 C31=6 C36=1 C50=1
> ./sheaf_prefill_oversize:0
> ./skip_kfence:0
> ./min_partial:5
> ./order_fallback:0
> ./sheaf_capacity:28
> ./sheaf_flush:28 C24=28
> ./free_rcu_sheaf:0
> ./sheaf_alloc:178 C0=4 C2=9 C3=1 C4=9 C5=65 C6=4 C8=5 C10=8 C11=1 C12=4 C13=1 C14=8 C15=1 C16=5 C18=8 C19=1 C20=3 C22=10 C23=1 C24=5 C25=1 C26=7 C27=1 C28=10 C29=1 C30=2 C31=1 C36=1 C50=1
> ./sheaf_free:0
> ./sheaf_prefill_slow:0
> ./sheaf_prefill_fast:0
> ./poison:0
> ./red_zone:0
> ./free_slab:0
> ./slabs:145 N1=76 N5=69
> ./barn_get:18129029 C0=3 C1=986017 C3=1035342 C5=1253488 C6=1 C7=1277927 C8=5 C9=1134184 C11=1074513 C13=1129100 C15=1135633 C17=1096277 C19=1150155 C20=2 C21=1315791 C23=1293278 C25=1071905 C27=1037658 C29=1071054 C30=2 C31=1066694
> ./alloc_slowpath:0
> ./destroy_by_rcu:1
> ./free_rcu_sheaf_fail:0
> ./barn_put:18129105 C0=986015 C2=1035357 C4=1253502 C6=1277924 C8=1134182 C10=1074529 C12=1129101 C14=1135641 C16=1096273 C18=1150168 C20=1315792 C22=1293288 C24=1071905 C26=1037668 C28=1071069 C30=1066691
> ./usersize:0
> ./sanity_checks:0
> ./barn_put_fail:1 C24=1
> ./align:64
> ./alloc_node_mismatch:0
> ./alloc_slab:145 C1=3 C3=19 C5=6 C7=3 C9=3 C10=2 C11=18 C12=2 C13=6 C14=1 C15=12 C17=8 C18=3 C19=12 C21=2 C23=5 C25=7 C27=12 C29=15 C31=4 C36=1 C50=1
> ./free_remove_partial:0
> ./aliases:0
> ./store_user:0
> ./trace:0
> ./reclaim_account:0
> ./order:2
> ./sheaf_refill:7280 C1=168 C3=728 C5=728 C7=196 C9=140 C10=56 C11=728 C12=56 C13=280 C14=28 C15=532 C17=224 C18=140 C19=532 C20=28 C21=252 C23=616 C25=308 C27=588 C29=728 C31=168 C36=28 C50=28
> ./object_size:256
> ./free_fastpath:507615526 C0=27608438 C2=28990052 C4=35098103 C6=35781903 C8=31757101 C10=30086841 C12=31614841 C14=31797983 C16=30695700 C18=32204722 C19=1 C20=36842201 C22=36212117 C24=30013416 C26=29054742 C28=29989974 C30=29867383 C31=4 C39=2 C47=2
> ./hwcache_align:1
> ./cmpxchg_double_fail:0
> ./objs_per_slab:51
> ./partial:13 N1=5 N5=8
> ./slabs_cpu_partial:0(0)
> ./free_add_partial:117 C1=3 C3=7 C5=19 C7=4 C9=2 C11=8 C13=4 C15=7 C18=2 C19=7 C20=1 C21=7 C23=17 C24=3 C25=4 C27=9 C29=11 C31=2
> ./slab_size:320
> ./cache_dma:0
>
>
> Thanks,
> Ming
>
--
Cheers,
Harry / Hyeonggon