* [paulmckrcu:dev.2025.08.21a] [rcu] 8bd9383727: WARNING:possible_circular_locking_dependency_detected
@ 2025-08-26 8:47 kernel test robot
2025-08-29 17:23 ` Paul E. McKenney
0 siblings, 1 reply; 11+ messages in thread
From: kernel test robot @ 2025-08-26 8:47 UTC (permalink / raw)
To: Paul E. McKenney
Cc: oe-lkp, lkp, Andrii Nakryiko, Alexei Starovoitov, Peter Zijlstra,
rcu, linux-kernel, oliver.sang
Hi, Paul,
A similar issue still exists on this dev.2025.08.21a branch.
Again, if the issue is already fixed on later branches, please just ignore it.
Thanks
Hello,
kernel test robot noticed "WARNING:possible_circular_locking_dependency_detected" on:
commit: 8bd9383727068a5a18acfecefbdfa44a7d6bd838 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
https://github.com/paulmckrcu/linux dev.2025.08.21a
in testcase: rcutorture
version:
with the following parameters:
runtime: 300s
test: default
torture_type: tasks-tracing
config: x86_64-randconfig-003-20250824
compiler: clang-20
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202508261642.b15eefbb-lkp@intel.com
[ 42.365933][ T393] WARNING: possible circular locking dependency detected
[ 42.366428][ T393] 6.17.0-rc1-00035-g8bd938372706 #1 Tainted: G T
[ 42.366985][ T393] ------------------------------------------------------
[ 42.367490][ T393] rcu_torture_rea/393 is trying to acquire lock:
[ 42.367952][ T393] ffffffffad41dc88 (rcu_tasks_trace_srcu_struct.srcu_wq.lock){....}-{2:2}, at: swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
[ 42.368775][ T393]
[ 42.368775][ T393] but task is already holding lock:
[ 42.369278][ T393] ffff88813d1ff2e8 (&p->pi_lock){-.-.}-{2:2}, at: rcutorture_one_extend (kernel/rcu/rcutorture.c:?) rcutorture
[ 42.370043][ T393]
[ 42.370043][ T393] which lock already depends on the new lock.
[ 42.370043][ T393]
[ 42.370755][ T393]
[ 42.370755][ T393] the existing dependency chain (in reverse order) is:
[ 42.371388][ T393]
[ 42.371388][ T393] -> #1 (&p->pi_lock){-.-.}-{2:2}:
[ 42.371903][ T393] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:110 kernel/locking/spinlock.c:162)
[ 42.372309][ T393] try_to_wake_up (include/linux/spinlock.h:557 (discriminator 1) kernel/sched/core.c:4216 (discriminator 1))
[ 42.372669][ T393] swake_up_locked (include/linux/list.h:111)
[ 42.373029][ T393] swake_up_one (kernel/sched/swait.c:54 (discriminator 1))
[ 42.373380][ T393] tasks_tracing_torture_read_unlock (include/linux/srcu.h:408 (discriminator 1) include/linux/rcupdate_trace.h:81 (discriminator 1) kernel/rcu/rcutorture.c:1112 (discriminator 1)) rcutorture
[ 42.373952][ T393] rcutorture_one_extend (kernel/rcu/rcutorture.c:2141) rcutorture
[ 42.374452][ T393] rcu_torture_one_read_end (kernel/rcu/rcutorture.c:2357) rcutorture
[ 42.374976][ T393] rcu_torture_one_read (kernel/rcu/rcutorture.c:?) rcutorture
[ 42.375460][ T393] rcu_torture_reader (kernel/rcu/rcutorture.c:2443) rcutorture
[ 42.375920][ T393] kthread (kernel/kthread.c:465)
[ 42.376241][ T393] ret_from_fork (arch/x86/kernel/process.c:154)
[ 42.376603][ T393] ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
[ 42.376973][ T393]
[ 42.376973][ T393] -> #0 (rcu_tasks_trace_srcu_struct.srcu_wq.lock){....}-{2:2}:
[ 42.377657][ T393] __lock_acquire (kernel/locking/lockdep.c:3166)
[ 42.378031][ T393] lock_acquire (kernel/locking/lockdep.c:5868)
[ 42.378378][ T393] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:110 kernel/locking/spinlock.c:162)
[ 42.378794][ T393] swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
[ 42.379152][ T393] tasks_tracing_torture_read_unlock (include/linux/srcu.h:408 (discriminator 1) include/linux/rcupdate_trace.h:81 (discriminator 1) kernel/rcu/rcutorture.c:1112 (discriminator 1)) rcutorture
[ 42.379714][ T393] rcutorture_one_extend (kernel/rcu/rcutorture.c:2141) rcutorture
[ 42.380217][ T393] rcu_torture_one_read_end (kernel/rcu/rcutorture.c:2357) rcutorture
[ 42.380731][ T393] rcu_torture_one_read (kernel/rcu/rcutorture.c:?) rcutorture
[ 42.381220][ T393] rcu_torture_reader (kernel/rcu/rcutorture.c:2443) rcutorture
[ 42.381714][ T393] kthread (kernel/kthread.c:465)
[ 42.382060][ T393] ret_from_fork (arch/x86/kernel/process.c:154)
[ 42.382420][ T393] ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
[ 42.382796][ T393]
[ 42.382796][ T393] other info that might help us debug this:
[ 42.382796][ T393]
[ 42.383515][ T393] Possible unsafe locking scenario:
[ 42.383515][ T393]
[ 42.384052][ T393] CPU0 CPU1
[ 42.384428][ T393] ---- ----
[ 42.384799][ T393] lock(&p->pi_lock);
[ 42.385083][ T393] lock(rcu_tasks_trace_srcu_struct.srcu_wq.lock);
[ 42.385707][ T393] lock(&p->pi_lock);
[ 42.386180][ T393] lock(rcu_tasks_trace_srcu_struct.srcu_wq.lock);
[ 42.386663][ T393]
[ 42.386663][ T393] *** DEADLOCK ***
[ 42.386663][ T393]
[ 42.387236][ T393] 1 lock held by rcu_torture_rea/393:
[ 42.387626][ T393] #0: ffff88813d1ff2e8 (&p->pi_lock){-.-.}-{2:2}, at: rcutorture_one_extend (kernel/rcu/rcutorture.c:?) rcutorture
[ 42.388419][ T393]
[ 42.388419][ T393] stack backtrace:
[ 42.388852][ T393] CPU: 0 UID: 0 PID: 393 Comm: rcu_torture_rea Tainted: G T 6.17.0-rc1-00035-g8bd938372706 #1 PREEMPT(full)
[ 42.389758][ T393] Tainted: [T]=RANDSTRUCT
[ 42.390057][ T393] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 42.390786][ T393] Call Trace:
[ 42.391020][ T393] <TASK>
[ 42.391225][ T393] dump_stack_lvl (lib/dump_stack.c:123 (discriminator 2))
[ 42.391544][ T393] print_circular_bug (kernel/locking/lockdep.c:2045)
[ 42.391898][ T393] check_noncircular (kernel/locking/lockdep.c:?)
[ 42.392242][ T393] __lock_acquire (kernel/locking/lockdep.c:3166)
[ 42.392594][ T393] ? __schedule (kernel/sched/sched.h:1531 (discriminator 1) kernel/sched/core.c:6969 (discriminator 1))
[ 42.392930][ T393] ? lock_release (kernel/locking/lockdep.c:470 (discriminator 3))
[ 42.393272][ T393] ? swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
[ 42.393610][ T393] lock_acquire (kernel/locking/lockdep.c:5868)
[ 42.393930][ T393] ? swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
[ 42.394264][ T393] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:110 kernel/locking/spinlock.c:162)
[ 42.394640][ T393] ? swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
[ 42.394969][ T393] swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
[ 42.395281][ T393] tasks_tracing_torture_read_unlock (include/linux/srcu.h:408 (discriminator 1) include/linux/rcupdate_trace.h:81 (discriminator 1) kernel/rcu/rcutorture.c:1112 (discriminator 1)) rcutorture
[ 42.395814][ T393] rcutorture_one_extend (kernel/rcu/rcutorture.c:2141) rcutorture
[ 42.396276][ T393] rcu_torture_one_read_end (kernel/rcu/rcutorture.c:2357) rcutorture
[ 42.396756][ T393] rcu_torture_one_read (kernel/rcu/rcutorture.c:?) rcutorture
[ 42.397219][ T393] ? __cfi_rcu_torture_reader (kernel/rcu/rcutorture.c:2426) rcutorture
[ 42.397690][ T393] rcu_torture_reader (kernel/rcu/rcutorture.c:2443) rcutorture
[ 42.398126][ T393] ? __cfi_rcu_torture_timer (kernel/rcu/rcutorture.c:2405) rcutorture
[ 42.398565][ T393] kthread (kernel/kthread.c:465)
[ 42.398857][ T393] ? __cfi_kthread (kernel/kthread.c:412)
[ 42.399169][ T393] ret_from_fork (arch/x86/kernel/process.c:154)
[ 42.399491][ T393] ? __cfi_kthread (kernel/kthread.c:412)
[ 42.399815][ T393] ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
[ 42.400151][ T393] </TASK>
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250826/202508261642.b15eefbb-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [paulmckrcu:dev.2025.08.21a] [rcu] 8bd9383727: WARNING:possible_circular_locking_dependency_detected
2025-08-26 8:47 [paulmckrcu:dev.2025.08.21a] [rcu] 8bd9383727: WARNING:possible_circular_locking_dependency_detected kernel test robot
@ 2025-08-29 17:23 ` Paul E. McKenney
2025-08-30 2:38 ` Zqiang
2025-09-03 2:03 ` Oliver Sang
0 siblings, 2 replies; 11+ messages in thread
From: Paul E. McKenney @ 2025-08-29 17:23 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, Andrii Nakryiko, Alexei Starovoitov, Peter Zijlstra,
rcu, linux-kernel
On Tue, Aug 26, 2025 at 04:47:22PM +0800, kernel test robot wrote:
>
> [ . . . ]
>
Again, apologies for being slow, and thank you for your testing efforts.
Idiot here forgot about Tiny SRCU, so please see the end of this email
for an alleged fix. Does it do the trick for you?
Thanx, Paul
> [ . . . ]
------------------------------------------------------------------------
diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 6e9fe2ce1075d5..db63378f062051 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -106,7 +106,7 @@ void __srcu_read_unlock(struct srcu_struct *ssp, int idx)
newval = READ_ONCE(ssp->srcu_lock_nesting[idx]) - 1;
WRITE_ONCE(ssp->srcu_lock_nesting[idx], newval);
preempt_enable();
- if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task())
+ if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task() && !irqs_disabled())
swake_up_one(&ssp->srcu_wq);
}
EXPORT_SYMBOL_GPL(__srcu_read_unlock);
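For readers skimming the hunk: with this change applied, the Tiny SRCU
unlock path would read roughly as follows. This is a sketch reconstructed
from the diff context above; the lines outside the hunk (the function
opening and the preempt_disable() pairing with the shown preempt_enable())
are assumptions rather than quoted code.

/* Sketch of __srcu_read_unlock() with the proposed check applied. */
void __srcu_read_unlock(struct srcu_struct *ssp, int idx)
{
	int newval;

	preempt_disable();	/* Assumed: pairs with preempt_enable() below. */
	newval = READ_ONCE(ssp->srcu_lock_nesting[idx]) - 1;
	WRITE_ONCE(ssp->srcu_lock_nesting[idx], newval);
	preempt_enable();

	/*
	 * Wake the grace-period waiter only when doing so cannot nest
	 * srcu_wq.lock inside an interrupts-disabled region (for example,
	 * one already holding pi_lock, as in the lockdep splat above).
	 */
	if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task() && !irqs_disabled())
		swake_up_one(&ssp->srcu_wq);
}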
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [paulmckrcu:dev.2025.08.21a] [rcu] 8bd9383727: WARNING:possible_circular_locking_dependency_detected
2025-08-29 17:23 ` Paul E. McKenney
@ 2025-08-30 2:38 ` Zqiang
2025-08-30 12:59 ` Paul E. McKenney
2025-09-03 2:03 ` Oliver Sang
1 sibling, 1 reply; 11+ messages in thread
From: Zqiang @ 2025-08-30 2:38 UTC (permalink / raw)
To: paulmck, kernel test robot
Cc: oe-lkp, lkp, Andrii Nakryiko, Alexei Starovoitov, Peter Zijlstra,
rcu, linux-kernel
>
> On Tue, Aug 26, 2025 at 04:47:22PM +0800, kernel test robot wrote:
>
> > [ . . . ]
>
> Again, apologies for being slow, and thank you for your testing efforts.
>
> Idiot here forgot about Tiny SRCU, so please see the end of this email
> for an alleged fix. Does it do the trick for you?
>
> Thanx, Paul
>
> > [ . . . ]
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> index 6e9fe2ce1075d5..db63378f062051 100644
> --- a/kernel/rcu/srcutiny.c
> +++ b/kernel/rcu/srcutiny.c
> @@ -106,7 +106,7 @@ void __srcu_read_unlock(struct srcu_struct *ssp, int idx)
> newval = READ_ONCE(ssp->srcu_lock_nesting[idx]) - 1;
> WRITE_ONCE(ssp->srcu_lock_nesting[idx], newval);
> preempt_enable();
> - if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task())
> + if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task() && !irqs_disabled())
The following case may exist:
CPU0
task1:
__srcu_read_lock()
....
task2 preempts and runs:
srcu_drive_gp()
->swait_event_exclusive()
....
task1 continues running:
....
raw_spin_lock_irqsave()
__srcu_read_unlock()
->all the previous conditions are met,
but irqs_disabled() returns true,
so swake_up_one() is not invoked.
task2 may hang forever.
Thanks
Zqiang
> swake_up_one(&ssp->srcu_wq);
> }
> EXPORT_SYMBOL_GPL(__srcu_read_unlock);
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [paulmckrcu:dev.2025.08.21a] [rcu] 8bd9383727: WARNING:possible_circular_locking_dependency_detected
2025-08-30 2:38 ` Zqiang
@ 2025-08-30 12:59 ` Paul E. McKenney
2025-08-31 2:22 ` Zqiang
0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2025-08-30 12:59 UTC (permalink / raw)
To: Zqiang
Cc: kernel test robot, oe-lkp, lkp, Andrii Nakryiko,
Alexei Starovoitov, Peter Zijlstra, rcu, linux-kernel
On Sat, Aug 30, 2025 at 02:38:35AM +0000, Zqiang wrote:
> > [ . . . ]
> > ------------------------------------------------------------------------
> >
> > diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> > index 6e9fe2ce1075d5..db63378f062051 100644
> > --- a/kernel/rcu/srcutiny.c
> > +++ b/kernel/rcu/srcutiny.c
> > @@ -106,7 +106,7 @@ void __srcu_read_unlock(struct srcu_struct *ssp, int idx)
> > newval = READ_ONCE(ssp->srcu_lock_nesting[idx]) - 1;
> > WRITE_ONCE(ssp->srcu_lock_nesting[idx], newval);
> > preempt_enable();
> > - if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task())
> > + if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task() && !irqs_disabled())
>
>
> The following case may exist:
>
>
> CPU0
>
> task1:
> __srcu_read_lock()
For mainline kernels, here we must have blocked, correct?
In -rcu, there is of course:
740cda2fe1a9 ("EXP srcu: Enable Tiny SRCU On all CONFIG_SMP=n kernels")
And this means that in -rcu kernels built with CONFIG_PREEMPT_NONE=y,
we could be preempted.
And maybe this is a reason to drop this commit. Or...
> ....
>
>
> task2 preempts and runs:
>
> srcu_drive_gp()
> ->swait_event_exclusive()
>
>
> ....
> task1 continues running:
> ....
> raw_spin_lock_irqsave()
> __srcu_read_unlock()
> ->all the previous conditions are met,
> but irqs_disabled() returns true,
> so swake_up_one() is not invoked.
>
> task2 may hang forever.
The bug that the kernel test robot reported has existed for a long time.
The offending commit simply introduced the use case that exercises
this bug, so we do need a fix.
One approach would be to impose a rule like we used to have for RCU,
namely that if interrupts were disabled across srcu_read_unlock(),
then they must have been disabled since the matching srcu_read_lock().
Another would be to make the current swait_event_exclusive() in
srcu_drive_gp() instead be a loop around swait_event_timeout_exclusive()
that checks ssp->srcu_lock_nesting[].
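For concreteness, a minimal sketch of that second approach, assuming
Tiny SRCU's existing wait in srcu_drive_gp() and a purely illustrative
HZ / 10 poll interval:

	/*
	 * Sketch only: replace srcu_drive_gp()'s one-shot wait ...
	 *
	 *	swait_event_exclusive(ssp->srcu_wq,
	 *			      !READ_ONCE(ssp->srcu_lock_nesting[idx]));
	 *
	 * ... with a polling loop that tolerates a wakeup missed because a
	 * reader ran __srcu_read_unlock() with interrupts disabled.
	 */
	while (!swait_event_timeout_exclusive(ssp->srcu_wq,
					      !READ_ONCE(ssp->srcu_lock_nesting[idx]),
					      HZ / 10))
		continue;	/* Timed out with a reader still present; recheck. */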
But is there a better way?
Either way, good catch! And thank you!!!
Thanx, Paul
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [paulmckrcu:dev.2025.08.21a] [rcu] 8bd9383727: WARNING:possible_circular_locking_dependency_detected
2025-08-30 12:59 ` Paul E. McKenney
@ 2025-08-31 2:22 ` Zqiang
2025-08-31 16:40 ` Paul E. McKenney
0 siblings, 1 reply; 11+ messages in thread
From: Zqiang @ 2025-08-31 2:22 UTC (permalink / raw)
To: paulmck
Cc: kernel test robot, oe-lkp, lkp, Andrii Nakryiko,
Alexei Starovoitov, Peter Zijlstra, rcu, linux-kernel
>
> On Sat, Aug 30, 2025 at 02:38:35AM +0000, Zqiang wrote:
> > [ . . . ]
> >
> > The following case may exist:
> >
> >
> > CPU0
> >
> > task1:
> > __srcu_read_lock()
> >
> For mainline kernels, here we must have blocked, correct?
>
> In -rcu, there is of course:
>
> 740cda2fe1a9 ("EXP srcu: Enable Tiny SRCU On all CONFIG_SMP=n kernels")
>
> And this means that in -rcu kernels built with CONFIG_PREEMPT_NONE=y,
> we could be preempted.
>
> And maybe this is a reason to drop this commit. Or...
>
For Tiny SRCU, even if a preemptive schedule does not happen in the
SRCU read-side critical section, we can still schedule voluntarily
within that critical section, so this case can still happen.
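In other words, a reader like the following sketch would defeat the
proposed !irqs_disabled() check even on a kernel with no involuntary
preemption (my_srcu, reader(), and the interrupts-off region are
illustrative assumptions, not code from this thread):

DEFINE_STATIC_SRCU(my_srcu);	/* Illustrative srcu_struct. */

static void reader(void)
{
	unsigned long flags;
	int idx;

	idx = srcu_read_lock(&my_srcu);
	cond_resched();		/* Voluntary schedule inside the read-side critical
				 * section: srcu_drive_gp() can now be waiting in
				 * swait_event_exclusive(). */
	local_irq_save(flags);
	srcu_read_unlock(&my_srcu, idx);	/* !irqs_disabled() is false here,
						 * so the waiter is never woken. */
	local_irq_restore(flags);
}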
> >
> > ....
> >
> >
> > task2 preempts and runs:
> >
> > srcu_drive_gp()
> > ->swait_event_exclusive()
> >
> >
> > ....
> > task1 continues running:
> > ....
> > raw_spin_lock_irqsave()
> > __srcu_read_unlock()
> > ->all the previous conditions are met,
> > but irqs_disabled() returns true,
> > so swake_up_one() is not invoked.
> >
> > task2 may hang forever.
> >
> The bug that the kernel test robot reported has existed for a long time.
> The offending commit simply introduced the use case that exercises
> this bug, so we do need a fix.
>
> One approach would be to impose a rule like we used to have for RCU,
> namely that if interrupts were disabled across srcu_read_unlock(),
> then they must have been disabled since the matching srcu_read_lock().
> Another would be to make the current swait_event_exclusive() in
> srcu_drive_gp() instead be a loop around swait_event_timeout_exclusive()
> that checks ssp->srcu_lock_nesting[].
>
> But is there a better way?
I think the second approach is enough :)
Thanks
Zqiang
>
> Either way, good catch! And thank you!!!
>
> Thanx, Paul
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [paulmckrcu:dev.2025.08.21a] [rcu] 8bd9383727: WARNING:possible_circular_locking_dependency_detected
2025-08-31 2:22 ` Zqiang
@ 2025-08-31 16:40 ` Paul E. McKenney
2025-08-31 23:52 ` Zqiang
0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2025-08-31 16:40 UTC (permalink / raw)
To: Zqiang
Cc: kernel test robot, oe-lkp, lkp, Andrii Nakryiko,
Alexei Starovoitov, Peter Zijlstra, rcu, linux-kernel
On Sun, Aug 31, 2025 at 02:22:56AM +0000, Zqiang wrote:
> >
> > On Sat, Aug 30, 2025 at 02:38:35AM +0000, Zqiang wrote:
> >
> > >
> > > On Tue, Aug 26, 2025 at 04:47:22PM +0800, kernel test robot wrote:
> > >
> > > >
> > > > hi, Paul,
> > > >
> > > > the similar issue still exists on this dev.2025.08.21a branch.
> > > > again, if the issue is already fixed on later branches, please just ignore.
> > > > thanks
> > > >
> > > >
> > > > Hello,
> > > >
> > > > kernel test robot noticed "WARNING:possible_circular_locking_dependency_detected" on:
> > > >
> > > > commit: 8bd9383727068a5a18acfecefbdfa44a7d6bd838 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
> > > > https://github.com/paulmckrcu/linux dev.2025.08.21a
> > > >
> > > > in testcase: rcutorture
> > > > version:
> > > > with following parameters:
> > > >
> > > > runtime: 300s
> > > > test: default
> > > > torture_type: tasks-tracing
> > > >
> > > >
> > > >
> > > > config: x86_64-randconfig-003-20250824
> > > > compiler: clang-20
> > > > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> > > >
> > > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > > >
> > > Again, apologies for being slow, and thank you for your testing efforts.
> > >
> > > Idiot here forgot about Tiny SRCU, so please see the end of this email
> > > for an alleged fix. Does it do the trick for you?
> > >
> > > Thanx, Paul
> > >
> > > >
> > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > the same patch/commit), kindly add following tags
> > > > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > > > | Closes: https://lore.kernel.org/oe-lkp/202508261642.b15eefbb-lkp@intel.com
> > > >
> > > >
> > > > [ 42.365933][ T393] WARNING: possible circular locking dependency detected
> > > > [ 42.366428][ T393] 6.17.0-rc1-00035-g8bd938372706 #1 Tainted: G T
> > > > [ 42.366985][ T393] ------------------------------------------------------
> > > > [ 42.367490][ T393] rcu_torture_rea/393 is trying to acquire lock:
> > > > [ 42.367952][ T393] ffffffffad41dc88 (rcu_tasks_trace_srcu_struct.srcu_wq.lock){....}-{2:2}, at: swake_up_one (kernel/sched/swait.c:52 (discriminator 1))
> > > > [ 42.368775][ T393]
> > > > [ 42.368775][ T393] but task is already holding lock:
> > > > [ 42.369278][ T393] ffff88813d1ff2e8 (&p->pi_lock){-.-.}-{2:2}, at: rcutorture_one_extend (kernel/rcu/rcutorture.c:?) rcutorture
> > > > [ 42.370043][ T393]
> > > > [ 42.370043][ T393] which lock already depends on the new lock.
> > > > [ 42.370043][ T393]
> > > > [ 42.370755][ T393]
> > > > [ 42.370755][ T393] the existing dependency chain (in reverse order) is:
> > > > [ 42.371388][ T393]
> > > > [ 42.371388][ T393] -> #1 (&p->pi_lock){-.-.}-{2:2}:
> > > > [ 42.371903][ T393] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:110 kernel/locking/spinlock.c:162)
> > > > [ 42.372309][ T393] try_to_wake_up (include/linux/spinlock.h:557 (discriminator 1) kernel/sched/core.c:4216 (discriminator 1))
> > > > [...]
> > > ------------------------------------------------------------------------
> > >
> > > diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> > > index 6e9fe2ce1075d5..db63378f062051 100644
> > > --- a/kernel/rcu/srcutiny.c
> > > +++ b/kernel/rcu/srcutiny.c
> > > @@ -106,7 +106,7 @@ void __srcu_read_unlock(struct srcu_struct *ssp, int idx)
> > > newval = READ_ONCE(ssp->srcu_lock_nesting[idx]) - 1;
> > > WRITE_ONCE(ssp->srcu_lock_nesting[idx], newval);
> > > preempt_enable();
> > > - if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task())
> > > + if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task() && !irqs_disabled())
> > >
> > >
> > > The following case may exist:
> > >
> > >
> > > CPU0
> > >
> > > task1:
> > > __srcu_read_lock()
> > >
> > For mainline kernels, here we must have blocked, correct?
> >
> > In -rcu, there is of course:
> >
> > 740cda2fe1a9 ("EXP srcu: Enable Tiny SRCU On all CONFIG_SMP=n kernels")
> >
> > And this means that in -rcu kernels built with CONFIG_PREEMPT_NONE=y,
> > we could be preempted.
> >
> > And maybe this is a reason to drop this commit. Or...
> >
>
> For Tiny SRCU, even if preemption does not happen within the
> SRCU read-side critical section, we can still schedule
> voluntarily within that critical section, so this case can
> still happen.
>
> > >
> > > ....
> > >
> > >
> > > task2 preempt run:
> > >
> > > srcu_drive_gp()
> > > ->swait_event_exclusive()
> > >
> > >
> > > ....
> > > task1 continue run:
> > > ....
> > > raw_spin_lock_irqsave()
> > > __srcu_read_unlock()
> > > ->finds that all the previous conditions are met,
> > > but irqs_disabled() returns true, so it does
> > > not invoke swake_up_one().
> > >
> > > task2 may hang forever.
> > >
> > The bug that the kernel test robot reported has existed for a long time.
> > The offending commit simply introduced the use case that exercised
> > this bug. So we do need a fix.
> >
> > One approach would be to impose a rule like we used to have for RCU,
> > namely that if interrupts were disabled across srcu_read_unlock(),
> > then they must have been disabled since the matching srcu_read_lock().
> > Another would be to make the current swait_event_exclusive() in
> > srcu_drive_gp() instead be a loop around wait_event_timeout_exclusive()
> > that checks ssp->srcu_lock_nesting[].
> >
> > But is there a better way?
>
> I think the second approach is enough :)
Hmmm... OK, how about the incremental patch below?
Thanx, Paul
------------------------------------------------------------------------
commit a543d73eeaa491021040a02bdf0e8a9148b5c186
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Sun Aug 31 09:38:44 2025 -0700
squash! rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast
[ paulmck: Apply Zqiang feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index db63378f062051..b52ec45698e85b 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -113,8 +113,8 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock);
/*
* Workqueue handler to drive one grace period and invoke any callbacks
- * that become ready as a result. Single-CPU and !PREEMPTION operation
- * means that we get away with murder on synchronization. ;-)
+ * that become ready as a result. Single-CPU operation and preemption
+ * disabling mean that we get away with murder on synchronization. ;-)
*/
void srcu_drive_gp(struct work_struct *wp)
{
@@ -141,7 +141,12 @@ void srcu_drive_gp(struct work_struct *wp)
WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
WRITE_ONCE(ssp->srcu_gp_waiting, true); /* srcu_read_unlock() wakes! */
preempt_enable();
- swait_event_exclusive(ssp->srcu_wq, !READ_ONCE(ssp->srcu_lock_nesting[idx]));
+ do {
+ // Deadlock issues prevent __srcu_read_unlock() from
+ // doing an unconditional wakeup, so polling is required.
+ swait_event_timeout_exclusive(ssp->srcu_wq,
+ !READ_ONCE(ssp->srcu_lock_nesting[idx]), HZ / 10);
+ } while (READ_ONCE(ssp->srcu_lock_nesting[idx]));
preempt_disable(); // Needed for PREEMPT_LAZY
WRITE_ONCE(ssp->srcu_gp_waiting, false); /* srcu_read_unlock() cheap. */
WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
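
As an aside, the reader-side pattern that ties the splat and both fixes
together can be sketched as follows. This is illustrative only, not
code from any patch in this thread: the srcu_struct "my_srcu" and the
raw spinlock "my_lock" are hypothetical stand-ins for the rcutorture
path, which ends its Tasks Trace reader while holding p->pi_lock.

#include <linux/spinlock.h>
#include <linux/srcu.h>

static DEFINE_STATIC_SRCU(my_srcu);	/* hypothetical */
static DEFINE_RAW_SPINLOCK(my_lock);	/* hypothetical */

static void reader(void)
{
	unsigned long flags;
	int idx;

	idx = srcu_read_lock(&my_srcu);
	/* Under Tiny SRCU, the reader may block or be preempted here,
	 * allowing srcu_drive_gp() to begin waiting on srcu_wq. */
	raw_spin_lock_irqsave(&my_lock, flags);
	/* With the irqs_disabled() check, __srcu_read_unlock() skips
	 * swake_up_one() here, avoiding the pi_lock vs. srcu_wq.lock
	 * inversion from the lockdep report, but then nothing wakes
	 * srcu_drive_gp(), which is what the polling loop above cures. */
	srcu_read_unlock(&my_srcu, idx);
	raw_spin_unlock_irqrestore(&my_lock, flags);
}
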
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [paulmckrcu:dev.2025.08.21a] [rcu] 8bd9383727: WARNING:possible_circular_locking_dependency_detected
2025-08-31 16:40 ` Paul E. McKenney
@ 2025-08-31 23:52 ` Zqiang
2025-09-01 1:06 ` Paul E. McKenney
0 siblings, 1 reply; 11+ messages in thread
From: Zqiang @ 2025-08-31 23:52 UTC (permalink / raw)
To: paulmck
Cc: kernel test robot, oe-lkp, lkp, Andrii Nakryiko,
Alexei Starovoitov, Peter Zijlstra, rcu, linux-kernel
>
> On Sun, Aug 31, 2025 at 02:22:56AM +0000, Zqiang wrote:
>
> >
> > On Sat, Aug 30, 2025 at 02:38:35AM +0000, Zqiang wrote:
> >
> > >
> > > On Tue, Aug 26, 2025 at 04:47:22PM +0800, kernel test robot wrote:
> > >
> > > > [...]
> > > Again, apologies for being slow, and thank you for your testing efforts.
> > >
> > > Idiot here forgot about Tiny SRCU, so please see the end of this email
> > > for an alleged fix. Does it do the trick for you?
> > >
> > > Thanx, Paul
> > >
> > > > [...]
> > > ------------------------------------------------------------------------
> > >
> > > diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> > > index 6e9fe2ce1075d5..db63378f062051 100644
> > > --- a/kernel/rcu/srcutiny.c
> > > +++ b/kernel/rcu/srcutiny.c
> > > @@ -106,7 +106,7 @@ void __srcu_read_unlock(struct srcu_struct *ssp, int idx)
> > > newval = READ_ONCE(ssp->srcu_lock_nesting[idx]) - 1;
> > > WRITE_ONCE(ssp->srcu_lock_nesting[idx], newval);
> > > preempt_enable();
> > > - if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task())
> > > + if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task() && !irqs_disabled())
> > >
> > >
> > > The following case may exist:
> > >
> > >
> > > CPU0
> > >
> > > task1:
> > > __srcu_read_lock()
> > >
> > For mainline kernels, here we must have blocked, correct?
> >
> > In -rcu, there is of course:
> >
> > 740cda2fe1a9 ("EXP srcu: Enable Tiny SRCU On all CONFIG_SMP=n kernels")
> >
> > And this means that in -rcu kernels built with CONFIG_PREEMPT_NONE=y,
> > we could be preempted.
> >
> > And maybe this is a reason to drop this commit. Or...
> >
> >
> > For Tiny SRCU, even if preemption does not happen within the
> > SRCU read-side critical section, we can still schedule
> > voluntarily within that critical section, so this case can
> > still happen.
> >
> > >
> > > ....
> > >
> > >
> > > task2 preempt run:
> > >
> > > srcu_drive_gp()
> > > ->swait_event_exclusive()
> > >
> > >
> > > ....
> > > task1 continue run:
> > > ....
> > > raw_spin_lock_irqsave()
> > > __srcu_read_unlock()
> > > ->finds that all the previous conditions are met,
> > > but irqs_disabled() returns true, so it does
> > > not invoke swake_up_one().
> > >
> > > task2 may hang forever.
> > >
> > The bug that the kernel test robot reported has existed for a long time.
> > The offending commit simply introduced the use case that exercised
> > this bug. So we do need a fix.
> >
> > One approach would be to impose a rule like we used to have for RCU,
> > namely that if interrupts were disabled across srcu_read_unlock(),
> > then they must have been disabled since the matching srcu_read_lock().
> > Another would be to make the current swait_event_exclusive() in
> > srcu_drive_gp() instead be a loop around wait_event_timeout_exclusive()
> > that checks ssp->srcu_lock_nesting[].
> >
> > But is there a better way?
> >
> > I think the second approach is enough :)
> >
> Hmmm... OK, how about the incremental patch below?
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit a543d73eeaa491021040a02bdf0e8a9148b5c186
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date: Sun Aug 31 09:38:44 2025 -0700
>
> squash! rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast
>
> [ paulmck: Apply Zqiang feedback. ]
>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Should the previous fix that added irqs_disabled() also be
added to this patch? Or can we use preemptible() instead of
in_task() && !irqs_disabled()?
Thanks
Zqiang
>
> diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> index db63378f062051..b52ec45698e85b 100644
> --- a/kernel/rcu/srcutiny.c
> +++ b/kernel/rcu/srcutiny.c
> @@ -113,8 +113,8 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock);
>
> /*
> * Workqueue handler to drive one grace period and invoke any callbacks
> - * that become ready as a result. Single-CPU and !PREEMPTION operation
> - * means that we get away with murder on synchronization. ;-)
> + * that become ready as a result. Single-CPU operation and preemption
> + * disabling mean that we get away with murder on synchronization. ;-)
> */
> void srcu_drive_gp(struct work_struct *wp)
> {
> @@ -141,7 +141,12 @@ void srcu_drive_gp(struct work_struct *wp)
> WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
> WRITE_ONCE(ssp->srcu_gp_waiting, true); /* srcu_read_unlock() wakes! */
> preempt_enable();
> - swait_event_exclusive(ssp->srcu_wq, !READ_ONCE(ssp->srcu_lock_nesting[idx]));
> + do {
> + // Deadlock issues prevent __srcu_read_unlock() from
> + // doing an unconditional wakeup, so polling is required.
> + swait_event_timeout_exclusive(ssp->srcu_wq,
> + !READ_ONCE(ssp->srcu_lock_nesting[idx]), HZ / 10);
> + } while (READ_ONCE(ssp->srcu_lock_nesting[idx]));
> preempt_disable(); // Needed for PREEMPT_LAZY
> WRITE_ONCE(ssp->srcu_gp_waiting, false); /* srcu_read_unlock() cheap. */
> WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
>
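
Restated in isolation, the idiom in this patch looks as below. This is
a sketch only, using a hypothetical swait queue "demo_wq" and flag
"demo_cond" rather than code from the thread: the wakeup from
__srcu_read_unlock() becomes a mere accelerator, correctness comes from
re-checking the condition in the loop, and the timeout bounds the added
delay whenever a reader had to skip the wakeup.

#include <linux/jiffies.h>
#include <linux/swait.h>

static DECLARE_SWAIT_QUEUE_HEAD(demo_wq);	/* hypothetical */
static int demo_cond;				/* hypothetical */

static void wait_for_cond(void)
{
	do {
		/* Sleep until a wakeup arrives, demo_cond becomes
		 * nonzero, or about 100 milliseconds elapse. */
		swait_event_timeout_exclusive(demo_wq,
					      READ_ONCE(demo_cond), HZ / 10);
		/* Re-check: a missed wakeup costs at most one timeout. */
	} while (!READ_ONCE(demo_cond));
}
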
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [paulmckrcu:dev.2025.08.21a] [rcu] 8bd9383727: WARNING:possible_circular_locking_dependency_detected
2025-08-31 23:52 ` Zqiang
@ 2025-09-01 1:06 ` Paul E. McKenney
2025-09-01 13:22 ` Zqiang
0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2025-09-01 1:06 UTC (permalink / raw)
To: Zqiang
Cc: kernel test robot, oe-lkp, lkp, Andrii Nakryiko,
Alexei Starovoitov, Peter Zijlstra, rcu, linux-kernel
On Sun, Aug 31, 2025 at 11:52:40PM +0000, Zqiang wrote:
> >
> > On Sun, Aug 31, 2025 at 02:22:56AM +0000, Zqiang wrote:
> >
> > >
> > > On Sat, Aug 30, 2025 at 02:38:35AM +0000, Zqiang wrote:
> > >
> > > >
> > > > On Tue, Aug 26, 2025 at 04:47:22PM +0800, kernel test robot wrote:
> > > >
> > > > > [...]
> > > > Again, apologies for being slow, and thank you for your testing efforts.
> > > >
> > > > Idiot here forgot about Tiny SRCU, so please see the end of this email
> > > > for an alleged fix. Does it do the trick for you?
> > > >
> > > > Thanx, Paul
> > > >
> > > > > [...]
> > > > [...]
> > > I think the second approach is enough :)
> > >
> > Hmmm... OK, how about the incremental patch below?
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit a543d73eeaa491021040a02bdf0e8a9148b5c186
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date: Sun Aug 31 09:38:44 2025 -0700
> >
> > squash! rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast
> >
> > [ paulmck: Apply Zqiang feedback. ]
> >
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>
> Should the previous fix that added irqs_disabled() also be
> added to this patch? Or can we use preemptible() instead of
> in_task() && !irqs_disabled()?
Yes, both this fix and the addition of the !irqs_disabled() check
are to be squashed into the original commit:
f56bf5dd7ffc ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
But I do not immediately see how checking preemptible() would help,
especially in (say) CONFIG_PREEMPT_NONE=y kernels in which this function
always returns zero. What am I missing here?
Thanx, Paul
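
For reference, a sketch of why not: paraphrasing the usual definitions
in include/linux/preempt.h (an assumption worth re-checking against the
tree in question), preemptible() collapses to constant zero whenever
CONFIG_PREEMPTION is not set:

#ifdef CONFIG_PREEMPTION
#define preemptible()	(preempt_count() == 0 && !irqs_disabled())
#else
#define preemptible()	0	/* e.g., CONFIG_PREEMPT_NONE=y */
#endif

So a hypothetical check such as

	if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && preemptible())
		swake_up_one(&ssp->srcu_wq);

would never issue the wakeup at all on CONFIG_PREEMPT_NONE=y kernels,
even where the current in_task() && !irqs_disabled() test would.
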
> Thanks
> Zqiang
>
>
> > [...]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [paulmckrcu:dev.2025.08.21a] [rcu] 8bd9383727: WARNING:possible_circular_locking_dependency_detected
2025-09-01 1:06 ` Paul E. McKenney
@ 2025-09-01 13:22 ` Zqiang
0 siblings, 0 replies; 11+ messages in thread
From: Zqiang @ 2025-09-01 13:22 UTC (permalink / raw)
To: paulmck
Cc: kernel test robot, oe-lkp, lkp, Andrii Nakryiko,
Alexei Starovoitov, Peter Zijlstra, rcu, linux-kernel
>
> On Sun, Aug 31, 2025 at 11:52:40PM +0000, Zqiang wrote:
>
> >
> > On Sun, Aug 31, 2025 at 02:22:56AM +0000, Zqiang wrote:
> >
> > >
> > > On Sat, Aug 30, 2025 at 02:38:35AM +0000, Zqiang wrote:
> > >
> > > >
> > > > On Tue, Aug 26, 2025 at 04:47:22PM +0800, kernel test robot wrote:
> > > >
> > > > > [...]
> > > > Again, apologies for being slow, and thank you for your testing efforts.
> > > >
> > > > Idiot here forgot about Tiny SRCU, so please see the end of this email
> > > > for an alleged fix. Does it do the trick for you?
> > > >
> > > > Thanx, Paul
> > > >
> > > > > [...]
> > > > [...]
> > > I think the second approach is enough :)
> > >
> > Hmmm... OK, how about the incremental patch below?
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit a543d73eeaa491021040a02bdf0e8a9148b5c186
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date: Sun Aug 31 09:38:44 2025 -0700
> >
> > squash! rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast
> >
> > [ paulmck: Apply Zqiang feedback. ]
> >
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >
> > Should the previous fix that added irqs_disabled() also be
> > added to this patch? Or can we use preemptible() instead of
> > in_task() && !irqs_disabled()?
> >
> Yes, both this fix and the addition of the !irqs_disabled() check
> are to be squashed into the original commit:
>
> f56bf5dd7ffc ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
>
> But I do not immediately see how checking preemptible() would help,
> especially in (say) CONFIG_PREEMPT_NONE=y kernels in which this function
> always returns zero. What am I missing here?
You are right, I missed that in CONFIG_PREEMPT_NONE=y kernels it always returns zero.
Thanks
Zqiang
>
> Thanx, Paul
>
> >
> > Thanks
> > Zqiang
> >
> >
> >
> > [...]
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [paulmckrcu:dev.2025.08.21a] [rcu] 8bd9383727: WARNING:possible_circular_locking_dependency_detected
2025-08-29 17:23 ` Paul E. McKenney
2025-08-30 2:38 ` Zqiang
@ 2025-09-03 2:03 ` Oliver Sang
2025-09-03 10:42 ` Paul E. McKenney
1 sibling, 1 reply; 11+ messages in thread
From: Oliver Sang @ 2025-09-03 2:03 UTC (permalink / raw)
To: Paul E. McKenney
Cc: oe-lkp, lkp, Andrii Nakryiko, Alexei Starovoitov, Peter Zijlstra,
rcu, linux-kernel, oliver.sang
hi, Paul,
On Fri, Aug 29, 2025 at 10:23:17AM -0700, Paul E. McKenney wrote:
[...]
> Again, apologies for being slow, and thank you for your testing efforts.
>
> Idiot here forgot about Tiny SRCU, so please see the end of this email
> for an alleged fix. Does it do the trick for you?
besides the patch [1] at the end of that mail, we noticed you also have a
patch in [2]. I don't have enough knowledge to follow the discussion between
you and Zqiang well; it just seems to me that both patches are fixes which
should be squashed into the original patch?
I applied them as below:
* e5ab29c09c470e squash! rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast <--- patch in [2]
* f717bca99dfb15 1st fix patch for 8bd9383727 from Paul <--- patch in [1]
* 8bd9383727068a rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast
in testing, the issue "WARNING:possible_circular_locking_dependency_detected"
can no longer be reproduced on either f717bca99dfb15 or e5ab29c09c470e.
sorry if I missed or misunderstood something; otherwise,
Tested-by: kernel test robot <oliver.sang@intel.com>
>
> Thanx, Paul
>
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202508261642.b15eefbb-lkp@intel.com
[...]
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20250826/202508261642.b15eefbb-lkp@intel.com
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
>
[1]
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> index 6e9fe2ce1075d5..db63378f062051 100644
> --- a/kernel/rcu/srcutiny.c
> +++ b/kernel/rcu/srcutiny.c
> @@ -106,7 +106,7 @@ void __srcu_read_unlock(struct srcu_struct *ssp, int idx)
> newval = READ_ONCE(ssp->srcu_lock_nesting[idx]) - 1;
> WRITE_ONCE(ssp->srcu_lock_nesting[idx], newval);
> preempt_enable();
> - if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task())
> + if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task() && !irqs_disabled())
> swake_up_one(&ssp->srcu_wq);
> }
> EXPORT_SYMBOL_GPL(__srcu_read_unlock);
>
[2] https://lore.kernel.org/all/8f43f958-e3e6-44d5-9600-9e096c3a06b7@paulmck-laptop/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [paulmckrcu:dev.2025.08.21a] [rcu] 8bd9383727: WARNING:possible_circular_locking_dependency_detected
2025-09-03 2:03 ` Oliver Sang
@ 2025-09-03 10:42 ` Paul E. McKenney
0 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2025-09-03 10:42 UTC (permalink / raw)
To: Oliver Sang
Cc: oe-lkp, lkp, Andrii Nakryiko, Alexei Starovoitov, Peter Zijlstra,
rcu, linux-kernel
On Wed, Sep 03, 2025 at 10:03:48AM +0800, Oliver Sang wrote:
> hi, Paul,
> On Fri, Aug 29, 2025 at 10:23:17AM -0700, Paul E. McKenney wrote:
>
> [...]
>
> > Again, apologies for being slow, and thank you for your testing efforts.
> >
> > Idiot here forgot about Tiny SRCU, so please see the end of this email
> > for an alleged fix. Does it do the trick for you?
>
> besides the patch [1] at the end of that mail, we noticed you also have a
> patch in [2]. I don't have enough knowledge to follow the discussion between
> you and Zqiang well; it just seems to me that both patches are fixes which
> should be squashed into the original patch?
>
> I applied them as below:
>
> * e5ab29c09c470e squash! rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast <--- patch in [2]
> * f717bca99dfb15 1st fix patch for 8bd9383727 from Paul <--- patch in [1]
> * 8bd9383727068a rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast
>
> in testing, the issue "WARNING:possible_circular_locking_dependency_detected"
> can no longer be reproduced on either f717bca99dfb15 or e5ab29c09c470e.
>
> sorry if I missed or misunderstood something; otherwise,
You got it exactly! The updated commit is this one:
a9e63382cbfa ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
> Tested-by: kernel test robot <oliver.sang@intel.com>
I will apply this on my next rebase, and thank you for your testing efforts!
Thanx, Paul
> > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > the same patch/commit), kindly add following tags
> > > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > > | Closes: https://lore.kernel.org/oe-lkp/202508261642.b15eefbb-lkp@intel.com
>
> [...]
>
> > > The kernel config and materials to reproduce are available at:
> > > https://download.01.org/0day-ci/archive/20250826/202508261642.b15eefbb-lkp@intel.com
> > > --
> > > 0-DAY CI Kernel Test Service
> > > https://github.com/intel/lkp-tests/wiki
> >
>
> [1]
>
> > [...]
>
> [2] https://lore.kernel.org/all/8f43f958-e3e6-44d5-9600-9e096c3a06b7@paulmck-laptop/
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-09-03 10:42 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-08-26 8:47 [paulmckrcu:dev.2025.08.21a] [rcu] 8bd9383727: WARNING:possible_circular_locking_dependency_detected kernel test robot
2025-08-29 17:23 ` Paul E. McKenney
2025-08-30 2:38 ` Zqiang
2025-08-30 12:59 ` Paul E. McKenney
2025-08-31 2:22 ` Zqiang
2025-08-31 16:40 ` Paul E. McKenney
2025-08-31 23:52 ` Zqiang
2025-09-01 1:06 ` Paul E. McKenney
2025-09-01 13:22 ` Zqiang
2025-09-03 2:03 ` Oliver Sang
2025-09-03 10:42 ` Paul E. McKenney