From: "Zqiang" <qiang.zhang@linux.dev>
To: paulmck@kernel.org
Cc: "kernel test robot" <oliver.sang@intel.com>,
oe-lkp@lists.linux.dev, lkp@intel.com,
"Andrii Nakryiko" <andrii@kernel.org>,
"Alexei Starovoitov" <ast@kernel.org>,
"Peter Zijlstra" <peterz@infradead.org>,
rcu@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [paulmckrcu:dev.2025.08.21a] [rcu] 8bd9383727: WARNING:possible_circular_locking_dependency_detected
Date: Sun, 31 Aug 2025 02:22:56 +0000 [thread overview]
Message-ID: <aefc893f1b7c17049c2e6eb2256a97739a5e328d@linux.dev> (raw)
In-Reply-To: <f58f7c75-46be-4ddd-be70-ee4f6a3370a9@paulmck-laptop>
>
> On Sat, Aug 30, 2025 at 02:38:35AM +0000, Zqiang wrote:
>
> >
> > On Tue, Aug 26, 2025 at 04:47:22PM +0800, kernel test robot wrote:
> >
> > >
> > > hi, Paul,
> > >
> > > the similar issue still exists on this dev.2025.08.21a branch.
> > > again, if the issue is already fixed on later branches, please just ignore.
> > > thanks
> > >
> > >
> > > Hello,
> > >
> > > kernel test robot noticed "WARNING:possible_circular_locking_dependency_detected" on:
> > >
> > > commit: 8bd9383727068a5a18acfecefbdfa44a7d6bd838 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
> > > https://github.com/paulmckrcu/linux dev.2025.08.21a
> > >
> > > in testcase: rcutorture
> > > version:
> > > with following parameters:
> > >
> > > runtime: 300s
> > > test: default
> > > torture_type: tasks-tracing
> > >
> > >
> > >
> > > config: x86_64-randconfig-003-20250824
> > > compiler: clang-20
> > > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> > >
> > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > >
> > Again, apologies for being slow, and thank you for your testing efforts.
> >
> > Idiot here forgot about Tiny SRCU, so please see the end of this email
> > for an alleged fix. Does it do the trick for you?
> >
> > Thanx, Paul
> >
> > >
> > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > the same patch/commit), kindly add following tags
> > > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > > | Closes: https://lore.kernel.org/oe-lkp/202508261642.b15eefbb-lkp@intel.com
> > >
> > >
> > > [ 42.365933][ T393] WARNING: possible circular locking dependency detected
> > > [ 42.366428][ T393] 6.17.0-rc1-00035-g8bd938372706 #1 Tainted: G T
> > > [ 42.366985][ T393] ------------------------------------------------------
> > > [ 42.367490][ T393] rcu_torture_rea/393 is trying to acquire lock:
> > > [ 42.367952][ T393] ffffffffad41dc88 (rcu_tasks_trace_srcu_struct.srcu_wq.lock){....}-{2:2}, at: swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
> > > [ 42.368775][ T393]
> > > [ 42.368775][ T393] but task is already holding lock:
> > > [ 42.369278][ T393] ffff88813d1ff2e8 (&p->pi_lock){-.-.}-{2:2}, at: rcutorture_one_extend (kernel/rcu/rcutorture.c:?) rcutorture
> > > [ 42.370043][ T393]
> > > [ 42.370043][ T393] which lock already depends on the new lock.
> > > [ 42.370043][ T393]
> > > [ 42.370755][ T393]
> > > [ 42.370755][ T393] the existing dependency chain (in reverse order) is:
> > > [ 42.371388][ T393]
> > > [ 42.371388][ T393] -> #1 (&p->pi_lock){-.-.}-{2:2}:
> > > [ 42.371903][ T393] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:110 kernel/locking/spinlock.c:162)
> > > [ 42.372309][ T393] try_to_wake_up (include/linux/spinlock.h:557 (discriminator 1) kernel/sched/core.c:4216 (discriminator 1))
> > > [ 42.372669][ T393] swake_up_locked (include/linux/list.h:111)
> > > [ 42.373029][ T393] swake_up_one (kernel/sched/swait.c:54 (discriminator 1))
> > > [ 42.373380][ T393] tasks_tracing_torture_read_unlock (include/linux/srcu.h:408 (discriminator 1) include/linux/rcupdate_trace.h:81 (discriminator 1) kernel/rcu/rcutorture.c:1112 (discriminator 1)) rcutorture
> > > [ 42.373952][ T393] rcutorture_one_extend (kernel/rcu/rcutorture.c:2141) rcutorture
> > > [ 42.374452][ T393] rcu_torture_one_read_end (kernel/rcu/rcutorture.c:2357) rcutorture
> > > [ 42.374976][ T393] rcu_torture_one_read (kernel/rcu/rcutorture.c:?) rcutorture
> > > [ 42.375460][ T393] rcu_torture_reader (kernel/rcu/rcutorture.c:2443) rcutorture
> > > [ 42.375920][ T393] kthread (kernel/kthread.c:465)
> > > [ 42.376241][ T393] ret_from_fork (arch/x86/kernel/process.c:154)
> > > [ 42.376603][ T393] ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
> > > [ 42.376973][ T393]
> > > [ 42.376973][ T393] -> #0 (rcu_tasks_trace_srcu_struct.srcu_wq.lock){....}-{2:2}:
> > > [ 42.377657][ T393] __lock_acquire (kernel/locking/lockdep.c:3166)
> > > [ 42.378031][ T393] lock_acquire (kernel/locking/lockdep.c:5868)
> > > [ 42.378378][ T393] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:110 kernel/locking/spinlock.c:162)
> > > [ 42.378794][ T393] swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
> > > [ 42.379152][ T393] tasks_tracing_torture_read_unlock (include/linux/srcu.h:408 (discriminator 1) include/linux/rcupdate_trace.h:81 (discriminator 1) kernel/rcu/rcutorture.c:1112 (discriminator 1)) rcutorture
> > > [ 42.379714][ T393] rcutorture_one_extend (kernel/rcu/rcutorture.c:2141) rcutorture
> > > [ 42.380217][ T393] rcu_torture_one_read_end (kernel/rcu/rcutorture.c:2357) rcutorture
> > > [ 42.380731][ T393] rcu_torture_one_read (kernel/rcu/rcutorture.c:?) rcutorture
> > > [ 42.381220][ T393] rcu_torture_reader (kernel/rcu/rcutorture.c:2443) rcutorture
> > > [ 42.381714][ T393] kthread (kernel/kthread.c:465)
> > > [ 42.382060][ T393] ret_from_fork (arch/x86/kernel/process.c:154)
> > > [ 42.382420][ T393] ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
> > > [ 42.382796][ T393]
> > > [ 42.382796][ T393] other info that might help us debug this:
> > > [ 42.382796][ T393]
> > > [ 42.383515][ T393] Possible unsafe locking scenario:
> > > [ 42.383515][ T393]
> > > [ 42.384052][ T393] CPU0 CPU1
> > > [ 42.384428][ T393] ---- ----
> > > [ 42.384799][ T393] lock(&p->pi_lock);
> > > [ 42.385083][ T393] lock(rcu_tasks_trace_srcu_struct.srcu_wq.lock);
> > > [ 42.385707][ T393] lock(&p->pi_lock);
> > > [ 42.386180][ T393] lock(rcu_tasks_trace_srcu_struct.srcu_wq.lock);
> > > [ 42.386663][ T393]
> > > [ 42.386663][ T393] *** DEADLOCK ***
> > > [ 42.386663][ T393]
> > > [ 42.387236][ T393] 1 lock held by rcu_torture_rea/393:
> > > [ 42.387626][ T393] #0: ffff88813d1ff2e8 (&p->pi_lock){-.-.}-{2:2}, at: rcutorture_one_extend (kernel/rcu/rcutorture.c:?) rcutorture
> > > [ 42.388419][ T393]
> > > [ 42.388419][ T393] stack backtrace:
> > > [ 42.388852][ T393] CPU: 0 UID: 0 PID: 393 Comm: rcu_torture_rea Tainted: G T 6.17.0-rc1-00035-g8bd938372706 #1 PREEMPT(full)
> > > [ 42.389758][ T393] Tainted: [T]=RANDSTRUCT
> > > [ 42.390057][ T393] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > [ 42.390786][ T393] Call Trace:
> > > [ 42.391020][ T393] <TASK>
> > > [ 42.391225][ T393] dump_stack_lvl (lib/dump_stack.c:123 (discriminator 2))
> > > [ 42.391544][ T393] print_circular_bug (kernel/locking/lockdep.c:2045)
> > > [ 42.391898][ T393] check_noncircular (kernel/locking/lockdep.c:?)
> > > [ 42.392242][ T393] __lock_acquire (kernel/locking/lockdep.c:3166)
> > > [ 42.392594][ T393] ? __schedule (kernel/sched/sched.h:1531 (discriminator 1) kernel/sched/core.c:6969 (discriminator 1))
> > > [ 42.392930][ T393] ? lock_release (kernel/locking/lockdep.c:470 (discriminator 3))
> > > [ 42.393272][ T393] ? swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
> > > [ 42.393610][ T393] lock_acquire (kernel/locking/lockdep.c:5868)
> > > [ 42.393930][ T393] ? swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
> > > [ 42.394264][ T393] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:110 kernel/locking/spinlock.c:162)
> > > [ 42.394640][ T393] ? swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
> > > [ 42.394969][ T393] swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
> > > [ 42.395281][ T393] tasks_tracing_torture_read_unlock (include/linux/srcu.h:408 (discriminator 1) include/linux/rcupdate_trace.h:81 (discriminator 1) kernel/rcu/rcutorture.c:1112 (discriminator 1)) rcutorture
> > > [ 42.395814][ T393] rcutorture_one_extend (kernel/rcu/rcutorture.c:2141) rcutorture
> > > [ 42.396276][ T393] rcu_torture_one_read_end (kernel/rcu/rcutorture.c:2357) rcutorture
> > > [ 42.396756][ T393] rcu_torture_one_read (kernel/rcu/rcutorture.c:?) rcutorture
> > > [ 42.397219][ T393] ? __cfi_rcu_torture_reader (kernel/rcu/rcutorture.c:2426) rcutorture
> > > [ 42.397690][ T393] rcu_torture_reader (kernel/rcu/rcutorture.c:2443) rcutorture
> > > [ 42.398126][ T393] ? __cfi_rcu_torture_timer (kernel/rcu/rcutorture.c:2405) rcutorture
> > > [ 42.398565][ T393] kthread (kernel/kthread.c:465)
> > > [ 42.398857][ T393] ? __cfi_kthread (kernel/kthread.c:412)
> > > [ 42.399169][ T393] ret_from_fork (arch/x86/kernel/process.c:154)
> > > [ 42.399491][ T393] ? __cfi_kthread (kernel/kthread.c:412)
> > > [ 42.399815][ T393] ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
> > > [ 42.400151][ T393] </TASK>
> > >
> > >
> > > The kernel config and materials to reproduce are available at:
> > > https://download.01.org/0day-ci/archive/20250826/202508261642.b15eefbb-lkp@intel.com
> > >
> > >
> > >
> > > --
> > > 0-DAY CI Kernel Test Service
> > > https://github.com/intel/lkp-tests/wiki
> > >
> > ------------------------------------------------------------------------
> >
> > diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> > index 6e9fe2ce1075d5..db63378f062051 100644
> > --- a/kernel/rcu/srcutiny.c
> > +++ b/kernel/rcu/srcutiny.c
> > @@ -106,7 +106,7 @@ void __srcu_read_unlock(struct srcu_struct *ssp, int idx)
> > newval = READ_ONCE(ssp->srcu_lock_nesting[idx]) - 1;
> > WRITE_ONCE(ssp->srcu_lock_nesting[idx], newval);
> > preempt_enable();
> > - if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task())
> > + if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task() && !irqs_disabled())
> >
> >
> > The fllowing case may exist:
> >
> >
> > CPU0
> >
> > task1:
> > __srcu_read_lock()
> >
> For mainline kernels, here we must have blocked, correct?
>
> In -rcu, there is of course:
>
> 740cda2fe1a9 ("EXP srcu: Enable Tiny SRCU On all CONFIG_SMP=n kernels")
>
> And this means that in -rcu kernels built with CONFIG_PREEMPT_NONE=y,
> we could be preempted.
>
> And maybe this is a reason to drop this commit. Or...
>
For tiny srcu, even if the preempt schedule not happend in
srcu read ctrical section, we can still do voluntary
scheduling in srcu_read ctrical section, this case is
also still happend.
> >
> > ....
> >
> >
> > task2 preempt run:
> >
> > srcu_drive_gp()
> > ->swait_event_exclusive()
> >
> >
> > ....
> > task1 continue run:
> > ....
> > raw_spin_lock_irqsave
> > __srcu_read_unlock()
> > ->find all previours condition are met
> > but the irqs_disable() return true,
> > not invoke swake_up_one().
> >
> > task2 maybe always hung.
> >
> The bug that kernel test robot reported existed for a long time.
> The offending commit simply introduced the use case that exercised
> this bug. So we do need a fix.
>
> One approach would be to impose a rule like we used to have for RCU,
> namely that if interrupts were disabled across srcu_read_unlock(),
> then they must have been disabled since the matching srcu_read_lock().
> Another would be to make the current swait_event_exclusive() in
> srcu_drive_gp() instead be a loop around wait_event_timeout_exclusive()
> that checks ssp->srcu_lock_nesting[].
>
> But is there a better way?
I think the second approach is enough :)
Thanks
Zqiang
>
> Either way, good catch! And thank you!!!
>
> Thanx, Paul
>
next prev parent reply other threads:[~2025-08-31 2:23 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-26 8:47 [paulmckrcu:dev.2025.08.21a] [rcu] 8bd9383727: WARNING:possible_circular_locking_dependency_detected kernel test robot
2025-08-29 17:23 ` Paul E. McKenney
2025-08-30 2:38 ` Zqiang
2025-08-30 12:59 ` Paul E. McKenney
2025-08-31 2:22 ` Zqiang [this message]
2025-08-31 16:40 ` Paul E. McKenney
2025-08-31 23:52 ` Zqiang
2025-09-01 1:06 ` Paul E. McKenney
2025-09-01 13:22 ` Zqiang
2025-09-03 2:03 ` Oliver Sang
2025-09-03 10:42 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aefc893f1b7c17049c2e6eb2256a97739a5e328d@linux.dev \
--to=qiang.zhang@linux.dev \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lkp@intel.com \
--cc=oe-lkp@lists.linux.dev \
--cc=oliver.sang@intel.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=rcu@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.