From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>, bpf <bpf@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>, Harry Yoo <harry.yoo@oracle.com>,
Shakeel Butt <shakeel.butt@linux.dev>,
Michal Hocko <mhocko@suse.com>,
Andrii Nakryiko <andrii@kernel.org>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v2 3/6] locking/local_lock: Introduce local_lock_lockdep_start/end()
Date: Mon, 14 Jul 2025 10:52:10 -0700
Message-ID: <CAADnVQLORq64ezK+gaU=Q2F2KyCYOBZiVE0aaJuqK=xfUwMFiw@mail.gmail.com>
In-Reply-To: <20250714110639.uOaKJEfL@linutronix.de>
On Mon, Jul 14, 2025 at 4:06 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2025-07-11 19:19:26 [-0700], Alexei Starovoitov wrote:
> > > If there is no parent check then we could do "normal lock" on both
> > > sides.
> >
> > How would ___slab_alloc() know whether there was a parent check or not?
> >
> > imo keeping local_lock_irqsave() as-is is cleaner,
> > since if there is no parent check lockdep will rightfully complain.
>
> what about this:
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 7e2ffe1d46c6c..3520d1c25c205 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3693,6 +3693,34 @@ static inline void *freeze_slab(struct kmem_cache *s, struct slab *slab)
> return freelist;
> }
>
> +static void local_lock_cpu_slab(struct kmem_cache *s, const gfp_t gfp_flags,
> + unsigned long *flags)
> +{
> + bool allow_spin = gfpflags_allow_spinning(gfp_flags);
> +
> + /*
> + * ___slab_alloc()'s caller is supposed to check whether
> + * kmem_cache::kmem_cache_cpu::lock can be acquired without a
> + * deadlock before invoking the function.
> + *
> + * On PREEMPT_RT an invocation is not possible from IRQ-off or
> + * preempt-disabled context. The lock is always acquired and, if
> + * needed, the task blocks and sleeps until the lock becomes
> + * available.
> + *
> + * On !PREEMPT_RT allocations from any context but NMI are safe. The
> + * lock is always acquired with interrupts disabled, meaning it is
> + * always possible to acquire it.
> + * In NMI context it is necessary to check whether the lock is
> + * already held. If it is not, it is safe to acquire it. The trylock
> + * semantic is used to tell lockdep that we don't spin; the BUG_ON()
> + * will not trigger if it is safe to acquire the lock.
> + */
> + if (!IS_ENABLED(CONFIG_PREEMPT_RT) && !allow_spin)
> + BUG_ON(!local_trylock_irqsave(&s->cpu_slab->lock, *flags));
> + else
> + local_lock_irqsave(&s->cpu_slab->lock, *flags);
> +}
the patch misses these two:
diff --git a/mm/slub.c b/mm/slub.c
index 36779519b02c..2f30b85fbf68 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3260,7 +3260,7 @@ static void put_cpu_partial(struct kmem_cache *s, struct slab *slab, int drain)
unsigned long flags;
int slabs = 0;
- local_lock_irqsave(&s->cpu_slab->lock, flags);
+ local_lock_cpu_slab(s, 0, &flags);
oldslab = this_cpu_read(s->cpu_slab->partial);
@@ -4889,8 +4889,9 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
goto redo;
}
} else {
+ unsigned long flags;
/* Update the free list under the local lock */
- local_lock(&s->cpu_slab->lock);
+ local_lock_cpu_slab(s, 0, &flags);
c = this_cpu_ptr(s->cpu_slab);
if (unlikely(slab != c->slab)) {
local_unlock(&s->cpu_slab->lock);
I realized that the latter one was missing local_lock_lockdep_start/end()
in my patch as well, but that's secondary.
So with the above it works on !RT,
but on RT lockdep complains, as I explained earlier.
With your patch and the above hunks applied, here is the full lockdep splat:
[ 39.819636] ============================================
[ 39.819638] WARNING: possible recursive locking detected
[ 39.819641] 6.16.0-rc5-00342-gc8aca7837440-dirty #54 Tainted: G O
[ 39.819645] --------------------------------------------
[ 39.819646] page_alloc_kthr/2306 is trying to acquire lock:
[ 39.819650] ff110001f5cbea88 ((&c->lock)){+.+.}-{3:3}, at: ___slab_alloc+0xb7/0xec0
[ 39.819667]
[ 39.819667] but task is already holding lock:
[ 39.819668] ff110001f5cbfe88 ((&c->lock)){+.+.}-{3:3}, at: ___slab_alloc+0xb7/0xec0
[ 39.819677]
[ 39.819677] other info that might help us debug this:
[ 39.819678] Possible unsafe locking scenario:
[ 39.819678]
[ 39.819679] CPU0
[ 39.819680] ----
[ 39.819681] lock((&c->lock));
[ 39.819684] lock((&c->lock));
[ 39.819687]
[ 39.819687] *** DEADLOCK ***
[ 39.819687]
[ 39.819687] May be due to missing lock nesting notation
[ 39.819687]
[ 39.819689] 2 locks held by page_alloc_kthr/2306:
[ 39.819691] #0: ff110001f5cbfe88 ((&c->lock)){+.+.}-{3:3}, at: ___slab_alloc+0xb7/0xec0
[ 39.819700] #1: ffffffff8588f3a0 (rcu_read_lock){....}-{1:3}, at: rt_spin_lock+0x197/0x250
[ 39.819710]
[ 39.819710] stack backtrace:
[ 39.819714] CPU: 1 UID: 0 PID: 2306 Comm: page_alloc_kthr Tainted: G O 6.16.0-rc5-00342-gc8aca7837440-dirty #54 PREEMPT_RT
[ 39.819721] Tainted: [O]=OOT_MODULE
[ 39.819726] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 39.819726] Call Trace:
[ 39.819729] <TASK>
[ 39.819734] dump_stack_lvl+0x5b/0x80
[ 39.819740] print_deadlock_bug.cold+0xbd/0xca
[ 39.819747] __lock_acquire+0x12ad/0x2590
[ 39.819753] ? __lock_acquire+0x42b/0x2590
[ 39.819758] lock_acquire+0x133/0x2d0
[ 39.819763] ? ___slab_alloc+0xb7/0xec0
[ 39.819769] ? try_to_take_rt_mutex+0x624/0xfc0
[ 39.819773] ? __lock_acquire+0x42b/0x2590
[ 39.819778] rt_spin_lock+0x6f/0x250
[ 39.819783] ? ___slab_alloc+0xb7/0xec0
[ 39.819788] ? rtlock_slowlock_locked+0x5c60/0x5c60
[ 39.819792] ? rtlock_slowlock_locked+0xc3/0x5c60
[ 39.819798] ___slab_alloc+0xb7/0xec0
[ 39.819803] ? __lock_acquire+0x42b/0x2590
[ 39.819809] ? my_debug_callback+0x20e/0x390 [bpf_testmod]
[ 39.819826] ? __lock_acquire+0x42b/0x2590
[ 39.819830] ? rt_read_unlock+0x2f0/0x2f0
[ 39.819835] ? my_debug_callback+0x20e/0x390 [bpf_testmod]
[ 39.819844] ? kmalloc_nolock_noprof+0x15a/0x430
[ 39.819849] kmalloc_nolock_noprof+0x15a/0x430
[ 39.819857] my_debug_callback+0x20e/0x390 [bpf_testmod]
[ 39.819867] ? page_alloc_kthread+0x320/0x320 [bpf_testmod]
[ 39.819875] ? lock_is_held_type+0x85/0xe0
[ 39.819881] ___slab_alloc+0x256/0xec0
[ 39.819898] ? lock_acquire+0x133/0x2d0
[ 39.819927] ? __kmalloc_cache_noprof+0xd6/0x3b0
[ 39.819932] __kmalloc_cache_noprof+0xd6/0x3b0
As I said earlier, lockdep _has_ to be tricked.
We cannot unconditionally call local_lock_irqsave() on RT.
lockdep doesn't understand per-cpu local_lock, and it doesn't
understand this "if !locked_by_current_task -> go and lock" concept.
lockdep has to be taught about the safe lock region (call it tricking
lockdep, but it has to be an external signal to lockdep).
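For illustration, here is a minimal sketch of what that external signal
could look like on the kmalloc_nolock() path, assuming the
local_lock_is_locked() helper from patch 2 and the
local_lock_lockdep_start/end() helpers from patch 3 take the lock as
their argument (names and exact placement are illustrative, not the
final patch):

	/*
	 * Sketch only: the is_locked check proves the current task does
	 * not hold this CPU's cpu_slab lock, so the nested acquisition
	 * cannot deadlock even though lockdep sees the same lock class
	 * taken twice.
	 */
	if (local_lock_is_locked(&s->cpu_slab->lock))
		return NULL;	/* re-entrant path: fail instead of deadlocking */

	local_lock_lockdep_start(&s->cpu_slab->lock);	/* external "safe region" signal */
	local_lock_irqsave(&s->cpu_slab->lock, flags);	/* no recursion splat on RT */
	local_lock_lockdep_end(&s->cpu_slab->lock);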