From: Harry Yoo <harry.yoo@oracle.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, linux-mm <linux-mm@kvack.org>,
Vlastimil Babka <vbabka@suse.cz>,
Shakeel Butt <shakeel.butt@linux.dev>,
Michal Hocko <mhocko@suse.com>,
Sebastian Sewior <bigeasy@linutronix.de>,
Andrii Nakryiko <andrii@kernel.org>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Matthew Wilcox <willy@infradead.org>
Subject: Re: SLAB_NO_CMPXCHG was:: [PATCH 6/6] slab: Introduce kmalloc_nolock() and kfree_nolock().
Date: Wed, 25 Jun 2025 20:38:23 +0900 [thread overview]
Message-ID: <aFvfr1KiNrLofavW@hyeyoo> (raw)
In-Reply-To: <CAADnVQKQ-kqpO_vkyDcaUdvrvntWjwUfDy00SOawuRMrx0rfzw@mail.gmail.com>
On Tue, Jun 24, 2025 at 10:13:49AM -0700, Alexei Starovoitov wrote:
> On Thu, May 8, 2025 at 6:03 PM Harry Yoo <harry.yoo@oracle.com> wrote:
> > > + s = kmalloc_slab(size, NULL, alloc_gfp, _RET_IP_);
> > > +
> > > + if (!(s->flags & __CMPXCHG_DOUBLE))
> > > + /*
> > > + * kmalloc_nolock() is not supported on architectures that
> > > + * don't implement cmpxchg16b.
> > > + */
> > > + return NULL;
> >
> > Hmm when someone uses slab debugging flags (e.g., passing boot
> > parameter slab_debug=FPZ as a hardening option on production [1], or
> > just for debugging), __CMPXCHG_DOUBLE is not set even when the arch
> > supports it.
> >
> > Is it okay to fail all kmalloc_nolock() calls in such cases?
>
> I studied the code and the git history.
> Looks like slub doesn't have to disable cmpxchg mode when slab_debug is on.
A slight correction; Debug caches do not use cmpxchg mode at all by
design. If a future change enables cmpxchg mode for them, it will cause
the same consistency issue.
> The commit 41bec7c33f37 ("mm/slub: remove slab_lock() usage for debug
> operations")
> removed slab_lock from debug validation checks.
An excellent point!
Yes, SLUB does not maintain cpu slab and percpu partial slabs on
debug caches. Alloc/free is done under n->list_lock, so no need for
cmpxchg double and slab_lock() at all :)
> So right now slab_lock() only serializes slab->freelist/counter update.
> It's still necessary on arch-s that don't have cmpxchg, but that's it.
> Only __update_freelist_slow() is using it.
Yes.
> The comment next to SLAB_NO_CMPXCHG is obsolete as well.
> It's been there since days that slab_lock() was taken during
> consistency checks.
Yes.
> I think the following diff is appropriate:
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 044e43ee3373..9d615cfd1b6f 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -286,14 +286,6 @@ static inline bool
> kmem_cache_has_cpu_partial(struct kmem_cache *s)
> #define DEBUG_DEFAULT_FLAGS (SLAB_CONSISTENCY_CHECKS | SLAB_RED_ZONE | \
> SLAB_POISON | SLAB_STORE_USER)
>
> -/*
> - * These debug flags cannot use CMPXCHG because there might be consistency
> - * issues when checking or reading debug information
> - */
> -#define SLAB_NO_CMPXCHG (SLAB_CONSISTENCY_CHECKS | SLAB_STORE_USER | \
> - SLAB_TRACE)
> -
> -
>
> /*
> * Debugging flags that require metadata to be stored in the slab. These get
> * disabled when slab_debug=O is used and a cache's min order increases with
> @@ -6654,7 +6646,7 @@ int do_kmem_cache_create(struct kmem_cache *s,
> const char *name,
> }
>
> #ifdef system_has_freelist_aba
> - if (system_has_freelist_aba() && !(s->flags & SLAB_NO_CMPXCHG)) {
> + if (system_has_freelist_aba()) {
> /* Enable fast mode */
> s->flags |= __CMPXCHG_DOUBLE;
> }
>
> It survived my stress tests.
> Thoughts?
Perhaps it's better to change the condition in
kmalloc_nolock_noprof() from;
if (!(s->flags & __CMPXCHG_DOUBLE))
return NULL;
to;
if (!(s->flags & __CMPXCHG_DOUBLE) && !kmem_cache_debug(s))
return NULL;
Because debug caches do not use cmpxchg double (and that's why
it survived your test), it is not accurate to set __CMPXCHG_DOUBLE.
And it'll get into trouble anyway if debug caches use cmpxchg double.
--
Cheers,
Harry / Hyeonggon
next prev parent reply other threads:[~2025-06-25 11:39 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-01 3:27 [PATCH 0/6] mm: Reentrant kmalloc Alexei Starovoitov
2025-05-01 3:27 ` [PATCH 1/6] mm: Rename try_alloc_pages() to alloc_pages_nolock() Alexei Starovoitov
2025-05-06 8:26 ` Vlastimil Babka
2025-05-07 1:24 ` Alexei Starovoitov
2025-05-01 3:27 ` [PATCH 2/6] locking/local_lock: Expose dep_map in local_trylock_t Alexei Starovoitov
2025-05-06 12:56 ` Vlastimil Babka
2025-05-06 14:55 ` Vlastimil Babka
2025-05-07 1:25 ` Alexei Starovoitov
2025-05-12 13:26 ` Sebastian Andrzej Siewior
2025-05-12 16:46 ` Alexei Starovoitov
2025-05-01 3:27 ` [PATCH 3/6] locking/local_lock: Introduce local_lock_is_locked() Alexei Starovoitov
2025-05-06 12:59 ` Vlastimil Babka
2025-05-07 1:28 ` Alexei Starovoitov
2025-05-12 14:56 ` Sebastian Andrzej Siewior
2025-05-12 15:01 ` Vlastimil Babka
2025-05-12 15:23 ` Sebastian Andrzej Siewior
2025-05-01 3:27 ` [PATCH 4/6] locking/local_lock: Introduce local_lock_irqsave_check() Alexei Starovoitov
2025-05-07 13:02 ` Vlastimil Babka
2025-05-12 14:03 ` Sebastian Andrzej Siewior
2025-05-12 17:16 ` Alexei Starovoitov
2025-05-13 6:58 ` Vlastimil Babka
2025-05-13 21:55 ` Alexei Starovoitov
2025-05-01 3:27 ` [PATCH 5/6] mm: Allow GFP_ACCOUNT and GFP_COMP to be used in alloc_pages_nolock() Alexei Starovoitov
2025-05-06 8:55 ` Vlastimil Babka
2025-05-07 1:33 ` Alexei Starovoitov
2025-05-01 3:27 ` [PATCH 6/6] slab: Introduce kmalloc_nolock() and kfree_nolock() Alexei Starovoitov
2025-05-05 18:46 ` Shakeel Butt
2025-05-06 0:49 ` Alexei Starovoitov
2025-05-06 1:24 ` Shakeel Butt
2025-05-06 1:51 ` Alexei Starovoitov
2025-05-06 18:05 ` Shakeel Butt
2025-05-06 12:01 ` Vlastimil Babka
2025-05-07 0:31 ` Harry Yoo
2025-05-07 2:23 ` Alexei Starovoitov
2025-05-07 8:38 ` Vlastimil Babka
2025-05-07 2:20 ` Alexei Starovoitov
2025-05-07 10:44 ` Vlastimil Babka
2025-05-09 1:03 ` Harry Yoo
2025-06-24 17:13 ` SLAB_NO_CMPXCHG was:: " Alexei Starovoitov
2025-06-25 11:38 ` Harry Yoo [this message]
2025-06-26 20:03 ` Alexei Starovoitov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aFvfr1KiNrLofavW@hyeyoo \
--to=harry.yoo@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=bigeasy@linutronix.de \
--cc=bpf@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=linux-mm@kvack.org \
--cc=memxor@gmail.com \
--cc=mhocko@suse.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=shakeel.butt@linux.dev \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.