From: Harry Yoo <harry.yoo@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
Hao Li <hao.li@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@linux.com>,
David Rientjes <rientjes@google.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Ming Lei <ming.lei@redhat.com>
Subject: Re: [PATCH slab/for-next-fixes] mm/slab: allow sheaf refill if blocking is not allowed
Date: Wed, 4 Mar 2026 19:03:20 +0900
Message-ID: <aagDaOUvgMSipjXa@hyeyoo>
In-Reply-To: <a7494308-cec6-43c7-aa17-a438747b50c3@suse.cz>

On Wed, Mar 04, 2026 at 10:58:58AM +0100, Vlastimil Babka wrote:
> On 3/4/26 4:05 AM, Harry Yoo wrote:
> > On Mon, Mar 02, 2026 at 10:55:37AM +0100, Vlastimil Babka (SUSE) wrote:
> >> Ming Lei reported [1] a regression in the ublk null target benchmark due
> >> to sheaves. The profile shows that the alloc_from_pcs() fastpath fails
> >> and allocations fall back to ___slab_alloc(). It also shows the
> >> allocations happen through mempool_alloc().
> >>
> >> The strategy of mempool_alloc() is to call the underlying allocator
> >> (here slab) without __GFP_DIRECT_RECLAIM first. This does not play well
> >> with __pcs_replace_empty_main() checking for gfpflags_allow_blocking()
> >> to decide if it should refill an empty sheaf or fall back to the
> >> slowpath, so we end up falling back.
> >>
> >> We could change the mempool strategy but there might be other paths
> >> doing the same thing. So instead allow sheaf refill when blocking is not
> >> allowed, changing the condition to gfpflags_allow_spinning(). The
> >> original condition was unnecessarily restrictive.
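
(For readers following along: a rough userspace sketch of why the two checks differ for a mempool-style first attempt. The flag values and the sk_* names are made up for illustration, and the helper bodies reflect my understanding of gfpflags_allow_blocking() and gfpflags_allow_spinning(), not a copy of the kernel headers. The real mempool_alloc() also adjusts other flags that are left out here.)

```c
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Illustrative bit values only, NOT the kernel's real definitions. */
#define SK_GFP_DIRECT_RECLAIM	0x1u
#define SK_GFP_KSWAPD_RECLAIM	0x2u
#define SK_GFP_RECLAIM		(SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM)

/* Assumed model of gfpflags_allow_blocking(): needs direct reclaim. */
static bool sk_allow_blocking(gfp_t gfp)
{
	return (gfp & SK_GFP_DIRECT_RECLAIM) != 0;
}

/* Assumed model of gfpflags_allow_spinning(): any reclaim flag will do. */
static bool sk_allow_spinning(gfp_t gfp)
{
	return (gfp & SK_GFP_RECLAIM) != 0;
}

/* Models the first mempool attempt: same flags minus direct reclaim. */
static gfp_t sk_mempool_first_try(gfp_t gfp)
{
	return gfp & ~SK_GFP_DIRECT_RECLAIM;
}
```

With both reclaim bits set on entry, sk_mempool_first_try() yields flags for which sk_allow_blocking() is false but sk_allow_spinning() is still true, which is the case the patch wants to stop sending to the slowpath.
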
> >>
> >> Note this doesn't fully resolve the regression [1] as another component
> >> of that are memoryless nodes, which is to be addressed separately.
> >>
> >> Reported-by: Ming Lei <ming.lei@redhat.com>
> >> Fixes: e47c897a2949 ("slab: add sheaves to most caches")
> >> Link: https://lore.kernel.org/all/aZ0SbIqaIkwoW2mB@fedora/
> >> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> >> ---
> >> mm/slub.c | 21 +++++++++------------
> >> 1 file changed, 9 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index b1e9f16ba435..17b200695e9b 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -4632,11 +4631,8 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
> >>  	if (!full)
> >>  		return NULL;
> >>
> >> -	/*
> >> -	 * we can reach here only when gfpflags_allow_blocking
> >> -	 * so this must not be an irq
> >> -	 */
> >> -	local_lock(&s->cpu_sheaves->lock);
> >> +	if (!local_trylock(&s->cpu_sheaves->lock))
> >> +		goto barn_put;
> >
> > My AI buddy says (don't worry, I filtered it):
> > | When local_trylock() fails above, the function jumps to barn_put and returns
> > | pcs without holding the lock. This appears to violate the function's contract
> > | documented in the comment at the beginning of __pcs_replace_empty_main():
> > |
> > | "If not successful, returns NULL and the local lock unlocked."
> > |
> > | The caller in alloc_from_pcs() checks for NULL to detect failure:
> > |
> > |	if (unlikely(pcs->main->size == 0)) {
> > |		pcs = __pcs_replace_empty_main(s, pcs, gfp);
> > |		if (unlikely(!pcs))
> > |			return NULL;
> > |	}
> > |
> > | If the trylock fails and pcs (non-NULL) is returned, the caller proceeds
> > | without realizing the lock was never re-acquired. This leads to accessing
> > | pcs->main without the lock and later trying to unlock a lock that isn't held.
> >
> > And the analysis sounds correct to me.
> >
> > perhaps it should be:
> >
> > 	if (!local_trylock(&s->cpu_sheaves->lock)) {
> > 		pcs = NULL;
> > 		goto barn_put;
> > 	}
>
> Thanks a lot Harry. In fact I realized this mistake after initially
> sending the patch to Ming in a reply, and fixed it locally (same as you
> suggest).
> Or so I thought, because the fix got apparently lost.

That happens sometimes, yeah :)

> So I'll do that now in slab/for-next-fixes

Thanks.

> Or actually I think a more robust way is to set pcs = NULL after the
> unlock, unconditionally, so I'll do that.

Oh, that sounds better!
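
(A minimal userspace sketch of that idea, just to illustrate the contract "returns NULL and the local lock unlocked" on failure. Everything named sk_* is hypothetical and only models the locking; it is not the actual mm/slub.c code, and the "busy" flag is simply a way to force the trylock to fail.)

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a local lock; "busy" forces trylock to fail. */
struct sk_lock {
	bool held;
	bool busy;
};

static bool sk_trylock(struct sk_lock *l)
{
	if (l->busy || l->held)
		return false;
	l->held = true;
	return true;
}

static void sk_unlock(struct sk_lock *l)
{
	l->held = false;
}

/* Hypothetical stand-in for the percpu sheaves structure. */
struct sk_pcs {
	struct sk_lock lock;
	unsigned int main_size;
};

/*
 * Called with pcs->lock held. On success returns pcs with the lock held;
 * on any failure returns NULL with the lock released.
 */
static struct sk_pcs *sk_replace_empty_main(struct sk_pcs *pcs, bool have_full)
{
	struct sk_pcs *this_cpu = pcs;	/* models this_cpu_ptr(s->cpu_sheaves) */

	sk_unlock(&pcs->lock);
	pcs = NULL;	/* cleared unconditionally right after the unlock */

	if (!have_full)
		return NULL;

	if (!sk_trylock(&this_cpu->lock))
		goto barn_put;	/* pcs is already NULL, nothing to remember */

	pcs = this_cpu;
	pcs->main_size = 1;	/* pretend the full sheaf was installed */
	return pcs;

barn_put:
	/* the real code would hand the full sheaf back to the barn here */
	return pcs;
}
```

Because pcs is cleared once, every later failure path returns NULL without each goto having to set it, so a caller that checks for NULL never touches the structure unlocked.
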
> >>  	pcs = this_cpu_ptr(s->cpu_sheaves);
> >>
> >>  	/*
> >> @@ -4667,6 +4663,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
> >>  		return pcs;
> >>  	}
> >>
> >> +barn_put:
> >>  	barn_put_full_sheaf(barn, full);
> >>  	stat(s, BARN_PUT);
--
Cheers,
Harry / Hyeonggon