From: "Zqiang" <qiang.zhang@linux.dev>
To: paulmck@kernel.org
Cc: "kernel test robot" <oliver.sang@intel.com>,
oe-lkp@lists.linux.dev, lkp@intel.com,
"Andrii Nakryiko" <andrii@kernel.org>,
"Alexei Starovoitov" <ast@kernel.org>,
"Peter Zijlstra" <peterz@infradead.org>,
rcu@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [paulmckrcu:dev.2025.08.21a] [rcu] 8bd9383727: WARNING:possible_circular_locking_dependency_detected
Date: Mon, 01 Sep 2025 13:22:30 +0000 [thread overview]
Message-ID: <aa7893d53dd90beb54e869b3c9cd45be7155db00@linux.dev> (raw)
In-Reply-To: <711fe3bf-0ac2-4ae9-9dda-97ba047eb64f@paulmck-laptop>
>
> On Sun, Aug 31, 2025 at 11:52:40PM +0000, Zqiang wrote:
>
> >
> > On Sun, Aug 31, 2025 at 02:22:56AM +0000, Zqiang wrote:
> >
> > >
> > > On Sat, Aug 30, 2025 at 02:38:35AM +0000, Zqiang wrote:
> > >
> > > >
> > > > On Tue, Aug 26, 2025 at 04:47:22PM +0800, kernel test robot wrote:
> > > >
> > > > >
> > > > > hi, Paul,
> > > > >
> > > > > the similar issue still exists on this dev.2025.08.21a branch.
> > > > > again, if the issue is already fixed on later branches, please just ignore.
> > > > > thanks
> > > > >
> > > > >
> > > > > Hello,
> > > > >
> > > > > kernel test robot noticed "WARNING:possible_circular_locking_dependency_detected" on:
> > > > >
> > > > > commit: 8bd9383727068a5a18acfecefbdfa44a7d6bd838 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
> > > > > https://github.com/paulmckrcu/linux dev.2025.08.21a
> > > > >
> > > > > in testcase: rcutorture
> > > > > version:
> > > > > with following parameters:
> > > > >
> > > > > runtime: 300s
> > > > > test: default
> > > > > torture_type: tasks-tracing
> > > > >
> > > > >
> > > > >
> > > > > config: x86_64-randconfig-003-20250824
> > > > > compiler: clang-20
> > > > > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> > > > >
> > > > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > > > >
> > > > Again, apologies for being slow, and thank you for your testing efforts.
> > > >
> > > > Idiot here forgot about Tiny SRCU, so please see the end of this email
> > > > for an alleged fix. Does it do the trick for you?
> > > >
> > > > Thanx, Paul
> > > >
> > > > >
> > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > > the same patch/commit), kindly add following tags
> > > > > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > > > > | Closes: https://lore.kernel.org/oe-lkp/202508261642.b15eefbb-lkp@intel.com
> > > > >
> > > > >
> > > > > [ 42.365933][ T393] WARNING: possible circular locking dependency detected
> > > > > [ 42.366428][ T393] 6.17.0-rc1-00035-g8bd938372706 #1 Tainted: G T
> > > > > [ 42.366985][ T393] ------------------------------------------------------
> > > > > [ 42.367490][ T393] rcu_torture_rea/393 is trying to acquire lock:
> > > > > [ 42.367952][ T393] ffffffffad41dc88 (rcu_tasks_trace_srcu_struct.srcu_wq.lock){....}-{2:2}, at: swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
> > > > > [ 42.368775][ T393]
> > > > > [ 42.368775][ T393] but task is already holding lock:
> > > > > [ 42.369278][ T393] ffff88813d1ff2e8 (&p->pi_lock){-.-.}-{2:2}, at: rcutorture_one_extend (kernel/rcu/rcutorture.c:?) rcutorture
> > > > > [ 42.370043][ T393]
> > > > > [ 42.370043][ T393] which lock already depends on the new lock.
> > > > > [ 42.370043][ T393]
> > > > > [ 42.370755][ T393]
> > > > > [ 42.370755][ T393] the existing dependency chain (in reverse order) is:
> > > > > [ 42.371388][ T393]
> > > > > [ 42.371388][ T393] -> #1 (&p->pi_lock){-.-.}-{2:2}:
> > > > > [ 42.371903][ T393] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:110 kernel/locking/spinlock.c:162)
> > > > > [ 42.372309][ T393] try_to_wake_up (include/linux/spinlock.h:557 (discriminator 1) kernel/sched/core.c:4216 (discriminator 1))
> > > > > [ 42.372669][ T393] swake_up_locked (include/linux/list.h:111)
> > > > > [ 42.373029][ T393] swake_up_one (kernel/sched/swait.c:54 (discriminator 1))
> > > > > [ 42.373380][ T393] tasks_tracing_torture_read_unlock (include/linux/srcu.h:408 (discriminator 1) include/linux/rcupdate_trace.h:81 (discriminator 1) kernel/rcu/rcutorture.c:1112 (discriminator 1)) rcutorture
> > > > > [ 42.373952][ T393] rcutorture_one_extend (kernel/rcu/rcutorture.c:2141) rcutorture
> > > > > [ 42.374452][ T393] rcu_torture_one_read_end (kernel/rcu/rcutorture.c:2357) rcutorture
> > > > > [ 42.374976][ T393] rcu_torture_one_read (kernel/rcu/rcutorture.c:?) rcutorture
> > > > > [ 42.375460][ T393] rcu_torture_reader (kernel/rcu/rcutorture.c:2443) rcutorture
> > > > > [ 42.375920][ T393] kthread (kernel/kthread.c:465)
> > > > > [ 42.376241][ T393] ret_from_fork (arch/x86/kernel/process.c:154)
> > > > > [ 42.376603][ T393] ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
> > > > > [ 42.376973][ T393]
> > > > > [ 42.376973][ T393] -> #0 (rcu_tasks_trace_srcu_struct.srcu_wq.lock){....}-{2:2}:
> > > > > [ 42.377657][ T393] __lock_acquire (kernel/locking/lockdep.c:3166)
> > > > > [ 42.378031][ T393] lock_acquire (kernel/locking/lockdep.c:5868)
> > > > > [ 42.378378][ T393] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:110 kernel/locking/spinlock.c:162)
> > > > > [ 42.378794][ T393] swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
> > > > > [ 42.379152][ T393] tasks_tracing_torture_read_unlock (include/linux/srcu.h:408 (discriminator 1) include/linux/rcupdate_trace.h:81 (discriminator 1) kernel/rcu/rcutorture.c:1112 (discriminator 1)) rcutorture
> > > > > [ 42.379714][ T393] rcutorture_one_extend (kernel/rcu/rcutorture.c:2141) rcutorture
> > > > > [ 42.380217][ T393] rcu_torture_one_read_end (kernel/rcu/rcutorture.c:2357) rcutorture
> > > > > [ 42.380731][ T393] rcu_torture_one_read (kernel/rcu/rcutorture.c:?) rcutorture
> > > > > [ 42.381220][ T393] rcu_torture_reader (kernel/rcu/rcutorture.c:2443) rcutorture
> > > > > [ 42.381714][ T393] kthread (kernel/kthread.c:465)
> > > > > [ 42.382060][ T393] ret_from_fork (arch/x86/kernel/process.c:154)
> > > > > [ 42.382420][ T393] ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
> > > > > [ 42.382796][ T393]
> > > > > [ 42.382796][ T393] other info that might help us debug this:
> > > > > [ 42.382796][ T393]
> > > > > [ 42.383515][ T393] Possible unsafe locking scenario:
> > > > > [ 42.383515][ T393]
> > > > > [ 42.384052][ T393] CPU0 CPU1
> > > > > [ 42.384428][ T393] ---- ----
> > > > > [ 42.384799][ T393] lock(&p->pi_lock);
> > > > > [ 42.385083][ T393] lock(rcu_tasks_trace_srcu_struct.srcu_wq.lock);
> > > > > [ 42.385707][ T393] lock(&p->pi_lock);
> > > > > [ 42.386180][ T393] lock(rcu_tasks_trace_srcu_struct.srcu_wq.lock);
> > > > > [ 42.386663][ T393]
> > > > > [ 42.386663][ T393] *** DEADLOCK ***
> > > > > [ 42.386663][ T393]
> > > > > [ 42.387236][ T393] 1 lock held by rcu_torture_rea/393:
> > > > > [ 42.387626][ T393] #0: ffff88813d1ff2e8 (&p->pi_lock){-.-.}-{2:2}, at: rcutorture_one_extend (kernel/rcu/rcutorture.c:?) rcutorture
> > > > > [ 42.388419][ T393]
> > > > > [ 42.388419][ T393] stack backtrace:
> > > > > [ 42.388852][ T393] CPU: 0 UID: 0 PID: 393 Comm: rcu_torture_rea Tainted: G T 6.17.0-rc1-00035-g8bd938372706 #1 PREEMPT(full)
> > > > > [ 42.389758][ T393] Tainted: [T]=RANDSTRUCT
> > > > > [ 42.390057][ T393] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > > > [ 42.390786][ T393] Call Trace:
> > > > > [ 42.391020][ T393] <TASK>
> > > > > [ 42.391225][ T393] dump_stack_lvl (lib/dump_stack.c:123 (discriminator 2))
> > > > > [ 42.391544][ T393] print_circular_bug (kernel/locking/lockdep.c:2045)
> > > > > [ 42.391898][ T393] check_noncircular (kernel/locking/lockdep.c:?)
> > > > > [ 42.392242][ T393] __lock_acquire (kernel/locking/lockdep.c:3166)
> > > > > [ 42.392594][ T393] ? __schedule (kernel/sched/sched.h:1531 (discriminator 1) kernel/sched/core.c:6969 (discriminator 1))
> > > > > [ 42.392930][ T393] ? lock_release (kernel/locking/lockdep.c:470 (discriminator 3))
> > > > > [ 42.393272][ T393] ? swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
> > > > > [ 42.393610][ T393] lock_acquire (kernel/locking/lockdep.c:5868)
> > > > > [ 42.393930][ T393] ? swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
> > > > > [ 42.394264][ T393] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:110 kernel/locking/spinlock.c:162)
> > > > > [ 42.394640][ T393] ? swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
> > > > > [ 42.394969][ T393] swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
> > > > > [ 42.395281][ T393] tasks_tracing_torture_read_unlock (include/linux/srcu.h:408 (discriminator 1) include/linux/rcupdate_trace.h:81 (discriminator 1) kernel/rcu/rcutorture.c:1112 (discriminator 1)) rcutorture
> > > > > [ 42.395814][ T393] rcutorture_one_extend (kernel/rcu/rcutorture.c:2141) rcutorture
> > > > > [ 42.396276][ T393] rcu_torture_one_read_end (kernel/rcu/rcutorture.c:2357) rcutorture
> > > > > [ 42.396756][ T393] rcu_torture_one_read (kernel/rcu/rcutorture.c:?) rcutorture
> > > > > [ 42.397219][ T393] ? __cfi_rcu_torture_reader (kernel/rcu/rcutorture.c:2426) rcutorture
> > > > > [ 42.397690][ T393] rcu_torture_reader (kernel/rcu/rcutorture.c:2443) rcutorture
> > > > > [ 42.398126][ T393] ? __cfi_rcu_torture_timer (kernel/rcu/rcutorture.c:2405) rcutorture
> > > > > [ 42.398565][ T393] kthread (kernel/kthread.c:465)
> > > > > [ 42.398857][ T393] ? __cfi_kthread (kernel/kthread.c:412)
> > > > > [ 42.399169][ T393] ret_from_fork (arch/x86/kernel/process.c:154)
> > > > > [ 42.399491][ T393] ? __cfi_kthread (kernel/kthread.c:412)
> > > > > [ 42.399815][ T393] ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
> > > > > [ 42.400151][ T393] </TASK>
> > > > >
> > > > >
> > > > > The kernel config and materials to reproduce are available at:
> > > > > https://download.01.org/0day-ci/archive/20250826/202508261642.b15eefbb-lkp@intel.com
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > 0-DAY CI Kernel Test Service
> > > > > https://github.com/intel/lkp-tests/wiki
> > > > >
> > > > ------------------------------------------------------------------------
> > > >
> > > > diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> > > > index 6e9fe2ce1075d5..db63378f062051 100644
> > > > --- a/kernel/rcu/srcutiny.c
> > > > +++ b/kernel/rcu/srcutiny.c
> > > > @@ -106,7 +106,7 @@ void __srcu_read_unlock(struct srcu_struct *ssp, int idx)
> > > > newval = READ_ONCE(ssp->srcu_lock_nesting[idx]) - 1;
> > > > WRITE_ONCE(ssp->srcu_lock_nesting[idx], newval);
> > > > preempt_enable();
> > > > - if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task())
> > > > + if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task() && !irqs_disabled())
> > > >
> > > >
> > > > The fllowing case may exist:
> > > >
> > > >
> > > > CPU0
> > > >
> > > > task1:
> > > > __srcu_read_lock()
> > > >
> > > For mainline kernels, here we must have blocked, correct?
> > >
> > > In -rcu, there is of course:
> > >
> > > 740cda2fe1a9 ("EXP srcu: Enable Tiny SRCU On all CONFIG_SMP=n kernels")
> > >
> > > And this means that in -rcu kernels built with CONFIG_PREEMPT_NONE=y,
> > > we could be preempted.
> > >
> > > And maybe this is a reason to drop this commit. Or...
> > >
> > >
> > > For tiny srcu, even if the preempt schedule not happend in
> > > srcu read ctrical section, we can still do voluntary
> > > scheduling in srcu_read ctrical section, this case is
> > > also still happend.
> > >
> > > >
> > > > ....
> > > >
> > > >
> > > > task2 preempt run:
> > > >
> > > > srcu_drive_gp()
> > > > ->swait_event_exclusive()
> > > >
> > > >
> > > > ....
> > > > task1 continue run:
> > > > ....
> > > > raw_spin_lock_irqsave
> > > > __srcu_read_unlock()
> > > > ->find all previours condition are met
> > > > but the irqs_disable() return true,
> > > > not invoke swake_up_one().
> > > >
> > > > task2 maybe always hung.
> > > >
> > > The bug that kernel test robot reported existed for a long time.
> > > The offending commit simply introduced the use case that exercised
> > > this bug. So we do need a fix.
> > >
> > > One approach would be to impose a rule like we used to have for RCU,
> > > namely that if interrupts were disabled across srcu_read_unlock(),
> > > then they must have been disabled since the matching srcu_read_lock().
> > > Another would be to make the current swait_event_exclusive() in
> > > srcu_drive_gp() instead be a loop around wait_event_timeout_exclusive()
> > > that checks ssp->srcu_lock_nesting[].
> > >
> > > But is there a better way?
> > >
> > > I think the second approach is enough :)
> > >
> > Hmmm... OK, how about the incremental patch below?
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit a543d73eeaa491021040a02bdf0e8a9148b5c186
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date: Sun Aug 31 09:38:44 2025 -0700
> >
> > squash! rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast
> >
> > [ paulmck: Apply Zqiang feedback. ]
> >
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >
> > Should the previous fix that added irqs_disabled() also be
> > added to this patch? or can we use preemptible() instead of
> > in_tasks() && irqs_disabled()?
> >
> Yes, both this fix and the addition of the !irqw_disabled() check
> are to be squashed into the original commit:
>
> f56bf5dd7ffc ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
>
> But I do not immediately see how checking preemptible() would help,
> especially in (say) CONFIG_PREEMPT_NONE=y kernels in which this function
> always returns zero. What am I missing here?
You are right, I miss CONFIG_PREEMPT_NONE=y kernels is always return zero.
Thanks
Zqiang
>
> Thanx, Paul
>
> >
> > Thanks
> > Zqiang
> >
> >
> >
> > diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> > index db63378f062051..b52ec45698e85b 100644
> > --- a/kernel/rcu/srcutiny.c
> > +++ b/kernel/rcu/srcutiny.c
> > @@ -113,8 +113,8 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock);
> >
> > /*
> > * Workqueue handler to drive one grace period and invoke any callbacks
> > - * that become ready as a result. Single-CPU and !PREEMPTION operation
> > - * means that we get away with murder on synchronization. ;-)
> > + * that become ready as a result. Single-CPU operation and preemption
> > + * disabling mean that we get away with murder on synchronization. ;-)
> > */
> > void srcu_drive_gp(struct work_struct *wp)
> > {
> > @@ -141,7 +141,12 @@ void srcu_drive_gp(struct work_struct *wp)
> > WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
> > WRITE_ONCE(ssp->srcu_gp_waiting, true); /* srcu_read_unlock() wakes! */
> > preempt_enable();
> > - swait_event_exclusive(ssp->srcu_wq, !READ_ONCE(ssp->srcu_lock_nesting[idx]));
> > + do {
> > + // Deadlock issues prevent __srcu_read_unlock() from
> > + // doing an unconditional wakeup, so polling is required.
> > + swait_event_timeout_exclusive(ssp->srcu_wq,
> > + !READ_ONCE(ssp->srcu_lock_nesting[idx]), HZ / 10);
> > + } while (READ_ONCE(ssp->srcu_lock_nesting[idx]));
> > preempt_disable(); // Needed for PREEMPT_LAZY
> > WRITE_ONCE(ssp->srcu_gp_waiting, false); /* srcu_read_unlock() cheap. */
> > WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
> >
>
next prev parent reply other threads:[~2025-09-01 13:22 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-26 8:47 [paulmckrcu:dev.2025.08.21a] [rcu] 8bd9383727: WARNING:possible_circular_locking_dependency_detected kernel test robot
2025-08-29 17:23 ` Paul E. McKenney
2025-08-30 2:38 ` Zqiang
2025-08-30 12:59 ` Paul E. McKenney
2025-08-31 2:22 ` Zqiang
2025-08-31 16:40 ` Paul E. McKenney
2025-08-31 23:52 ` Zqiang
2025-09-01 1:06 ` Paul E. McKenney
2025-09-01 13:22 ` Zqiang [this message]
2025-09-03 2:03 ` Oliver Sang
2025-09-03 10:42 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aa7893d53dd90beb54e869b3c9cd45be7155db00@linux.dev \
--to=qiang.zhang@linux.dev \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lkp@intel.com \
--cc=oe-lkp@lists.linux.dev \
--cc=oliver.sang@intel.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=rcu@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.