* [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat
@ 2024-03-07 16:09 Stefan Wiehler
2024-03-07 17:45 ` Paul E. McKenney
` (3 more replies)
0 siblings, 4 replies; 20+ messages in thread
From: Stefan Wiehler @ 2024-03-07 16:09 UTC (permalink / raw)
To: Russell King, Paul E. McKenney, Joel Fernandes, Josh Triplett,
Boqun Feng
Cc: Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
linux-arm-kernel, rcu, linux-kernel, Stefan Wiehler
With CONFIG_PROVE_RCU_LIST=y and by executing
$ echo 0 > /sys/devices/system/cpu/cpu1/online
one can trigger the following Lockdep-RCU splat on ARM:
=============================
WARNING: suspicious RCU usage
6.8.0-rc7-00001-g0db1d0ed8958 #10 Not tainted
-----------------------------
kernel/locking/lockdep.c:3762 RCU-list traversed in non-reader section!!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 2, debug_locks = 1
no locks held by swapper/1/0.
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-00001-g0db1d0ed8958 #10
Hardware name: Allwinner sun8i Family
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x60/0x90
dump_stack_lvl from lockdep_rcu_suspicious+0x150/0x1a0
lockdep_rcu_suspicious from __lock_acquire+0x11fc/0x29f8
__lock_acquire from lock_acquire+0x10c/0x348
lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
_raw_spin_lock_irqsave from check_and_switch_context+0x7c/0x4a8
check_and_switch_context from arch_cpu_idle_dead+0x10/0x7c
arch_cpu_idle_dead from do_idle+0xbc/0x138
do_idle from cpu_startup_entry+0x28/0x2c
cpu_startup_entry from secondary_start_kernel+0x11c/0x124
secondary_start_kernel from 0x401018a0
The CPU is already reported as offline from an RCU perspective in
cpuhp_report_idle_dead() before arch_cpu_idle_dead() is invoked. The above
RCU-Lockdep splat is then triggered by check_and_switch_context() acquiring
the ASID spinlock.
Avoid the false-positive Lockdep-RCU splat by briefly reporting the CPU as
online again while the spinlock is held.
Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>
---
arch/arm/kernel/smp.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 3431c0553f45..6875e2c5dd50 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -319,7 +319,14 @@ void __noreturn arch_cpu_idle_dead(void)
{
unsigned int cpu = smp_processor_id();
+ /*
+ * Briefly report CPU as online again to avoid false positive
+ * Lockdep-RCU splat when check_and_switch_context() acquires ASID
+ * spinlock.
+ */
+ rcutree_report_cpu_starting(cpu);
idle_task_exit();
+ rcutree_report_cpu_dead();
local_irq_disable();
--
2.42.0
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-03-07 16:09 [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat Stefan Wiehler @ 2024-03-07 17:45 ` Paul E. McKenney 2024-03-07 17:49 ` Paul E. McKenney 2024-03-07 18:54 ` Russell King (Oracle) 2024-03-07 18:49 ` Boqun Feng ` (2 subsequent siblings) 3 siblings, 2 replies; 20+ messages in thread From: Paul E. McKenney @ 2024-03-07 17:45 UTC (permalink / raw) To: Stefan Wiehler Cc: Russell King, Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel On Thu, Mar 07, 2024 at 05:09:51PM +0100, Stefan Wiehler wrote: > With CONFIG_PROVE_RCU_LIST=y and by executing > > $ echo 0 > /sys/devices/system/cpu/cpu1/online > > one can trigger the following Lockdep-RCU splat on ARM: > > ============================= > WARNING: suspicious RCU usage > 6.8.0-rc7-00001-g0db1d0ed8958 #10 Not tainted > ----------------------------- > kernel/locking/lockdep.c:3762 RCU-list traversed in non-reader section!! > > other info that might help us debug this: > > RCU used illegally from offline CPU! > rcu_scheduler_active = 2, debug_locks = 1 > no locks held by swapper/1/0. 
> > stack backtrace: > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-00001-g0db1d0ed8958 #10 > Hardware name: Allwinner sun8i Family > unwind_backtrace from show_stack+0x10/0x14 > show_stack from dump_stack_lvl+0x60/0x90 > dump_stack_lvl from lockdep_rcu_suspicious+0x150/0x1a0 > lockdep_rcu_suspicious from __lock_acquire+0x11fc/0x29f8 > __lock_acquire from lock_acquire+0x10c/0x348 > lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c > _raw_spin_lock_irqsave from check_and_switch_context+0x7c/0x4a8 > check_and_switch_context from arch_cpu_idle_dead+0x10/0x7c > arch_cpu_idle_dead from do_idle+0xbc/0x138 > do_idle from cpu_startup_entry+0x28/0x2c > cpu_startup_entry from secondary_start_kernel+0x11c/0x124 > secondary_start_kernel from 0x401018a0 > > The CPU is already reported as offline from RCU perspective in > cpuhp_report_idle_dead() before arch_cpu_idle_dead() is invoked. Above > RCU-Lockdep splat is then triggered by check_and_switch_context() acquiring the > ASID spinlock. > > Avoid the false-positive Lockdep-RCU splat by briefly reporting the CPU as > online again while the spinlock is held. > > Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com> From an RCU perspective, this looks plausible. One question below. Thanx, Paul > --- > arch/arm/kernel/smp.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c > index 3431c0553f45..6875e2c5dd50 100644 > --- a/arch/arm/kernel/smp.c > +++ b/arch/arm/kernel/smp.c > @@ -319,7 +319,14 @@ void __noreturn arch_cpu_idle_dead(void) > { > unsigned int cpu = smp_processor_id(); > > + /* > + * Briefly report CPU as online again to avoid false positive > + * Lockdep-RCU splat when check_and_switch_context() acquires ASID > + * spinlock. 
> + */ > + rcutree_report_cpu_starting(cpu); > idle_task_exit(); > + rcutree_report_cpu_dead(); > > local_irq_disable(); Both rcutree_report_cpu_starting() and rcutree_report_cpu_dead() complain bitterly via lockdep if interrupts are enabled. And the call sites have interrupts disabled. So I don't understand what this local_irq_disable() is needed for. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 20+ messages in thread
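[Editor's note: the bracketing pattern in the patch can be sketched as a toy single-CPU model. Everything below (the toy_* names, the splat counter) is illustrative only; it merely mimics the idea that lockdep complains when a lock is acquired while RCU considers the CPU offline, and that briefly reporting the CPU online around the lock acquisition avoids the splat. It is not kernel code.]

```c
#include <assert.h>
#include <stdbool.h>

static bool toy_rcu_watching = true;  /* RCU's view of this CPU */
static int toy_splats;                /* count of "suspicious RCU usage" reports */

static void toy_lock_acquire(void)
{
    if (!toy_rcu_watching)
        toy_splats++;                 /* "RCU used illegally from offline CPU!" */
}

static void toy_report_cpu_dead(void)     { toy_rcu_watching = false; }
static void toy_report_cpu_starting(void) { toy_rcu_watching = true; }

/* Unpatched flow: the ASID lock is taken after the CPU was reported dead. */
static int toy_idle_dead_unpatched(void)
{
    toy_report_cpu_dead();      /* cpuhp_report_idle_dead() */
    toy_lock_acquire();         /* check_and_switch_context() takes the ASID lock */
    return toy_splats;
}

/* Patched flow: briefly report the CPU online around the lock acquisition. */
static int toy_idle_dead_patched(void)
{
    toy_report_cpu_dead();
    toy_report_cpu_starting();  /* rcutree_report_cpu_starting(cpu) */
    toy_lock_acquire();         /* idle_task_exit() -> switch_mm() -> ASID lock */
    toy_report_cpu_dead();      /* rcutree_report_cpu_dead() */
    return toy_splats;
}
```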
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-03-07 17:45 ` Paul E. McKenney @ 2024-03-07 17:49 ` Paul E. McKenney 2024-03-07 18:54 ` Russell King (Oracle) 1 sibling, 0 replies; 20+ messages in thread From: Paul E. McKenney @ 2024-03-07 17:49 UTC (permalink / raw) To: Stefan Wiehler Cc: Russell King, Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel On Thu, Mar 07, 2024 at 09:45:36AM -0800, Paul E. McKenney wrote: > On Thu, Mar 07, 2024 at 05:09:51PM +0100, Stefan Wiehler wrote: > > With CONFIG_PROVE_RCU_LIST=y and by executing > > > > $ echo 0 > /sys/devices/system/cpu/cpu1/online > > > > one can trigger the following Lockdep-RCU splat on ARM: > > > > ============================= > > WARNING: suspicious RCU usage > > 6.8.0-rc7-00001-g0db1d0ed8958 #10 Not tainted > > ----------------------------- > > kernel/locking/lockdep.c:3762 RCU-list traversed in non-reader section!! > > > > other info that might help us debug this: > > > > RCU used illegally from offline CPU! > > rcu_scheduler_active = 2, debug_locks = 1 > > no locks held by swapper/1/0. 
> > > > stack backtrace: > > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-00001-g0db1d0ed8958 #10 > > Hardware name: Allwinner sun8i Family > > unwind_backtrace from show_stack+0x10/0x14 > > show_stack from dump_stack_lvl+0x60/0x90 > > dump_stack_lvl from lockdep_rcu_suspicious+0x150/0x1a0 > > lockdep_rcu_suspicious from __lock_acquire+0x11fc/0x29f8 > > __lock_acquire from lock_acquire+0x10c/0x348 > > lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c > > _raw_spin_lock_irqsave from check_and_switch_context+0x7c/0x4a8 > > check_and_switch_context from arch_cpu_idle_dead+0x10/0x7c > > arch_cpu_idle_dead from do_idle+0xbc/0x138 > > do_idle from cpu_startup_entry+0x28/0x2c > > cpu_startup_entry from secondary_start_kernel+0x11c/0x124 > > secondary_start_kernel from 0x401018a0 > > > > The CPU is already reported as offline from RCU perspective in > > cpuhp_report_idle_dead() before arch_cpu_idle_dead() is invoked. Above > > RCU-Lockdep splat is then triggered by check_and_switch_context() acquiring the > > ASID spinlock. > > > > Avoid the false-positive Lockdep-RCU splat by briefly reporting the CPU as > > online again while the spinlock is held. > > > > Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com> > > From an RCU perspective, this looks plausible. One question > below. But one additional caution... If execution is delayed during that call to idle_task_exit(), RCU will stall and won't have a reasonable way of motivating this CPU. Such delays could be due to vCPU preemption or due to firmware grabbing the CPU. But this is only a caution, not opposition. After all, you could have the same problem with an online CPU that gets similarly delayed while its interrupts are disabled. 
Thanx, Paul > > --- > > arch/arm/kernel/smp.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c > > index 3431c0553f45..6875e2c5dd50 100644 > > --- a/arch/arm/kernel/smp.c > > +++ b/arch/arm/kernel/smp.c > > @@ -319,7 +319,14 @@ void __noreturn arch_cpu_idle_dead(void) > > { > > unsigned int cpu = smp_processor_id(); > > > > + /* > > + * Briefly report CPU as online again to avoid false positive > > + * Lockdep-RCU splat when check_and_switch_context() acquires ASID > > + * spinlock. > > + */ > > + rcutree_report_cpu_starting(cpu); > > idle_task_exit(); > > + rcutree_report_cpu_dead(); > > > > local_irq_disable(); > > Both rcutree_report_cpu_starting() and rcutree_report_cpu_dead() complain > bitterly via lockdep if interrupts are enabled. And the call sites have > interrupts disabled. So I don't understand what this local_irq_disable() > is needed for. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-03-07 17:45 ` Paul E. McKenney 2024-03-07 17:49 ` Paul E. McKenney @ 2024-03-07 18:54 ` Russell King (Oracle) 2024-03-07 19:08 ` Paul E. McKenney 1 sibling, 1 reply; 20+ messages in thread From: Russell King (Oracle) @ 2024-03-07 18:54 UTC (permalink / raw) To: Paul E. McKenney Cc: Stefan Wiehler, Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel On Thu, Mar 07, 2024 at 09:45:36AM -0800, Paul E. McKenney wrote: > > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c > > index 3431c0553f45..6875e2c5dd50 100644 > > --- a/arch/arm/kernel/smp.c > > +++ b/arch/arm/kernel/smp.c > > @@ -319,7 +319,14 @@ void __noreturn arch_cpu_idle_dead(void) > > { > > unsigned int cpu = smp_processor_id(); > > > > + /* > > + * Briefly report CPU as online again to avoid false positive > > + * Lockdep-RCU splat when check_and_switch_context() acquires ASID > > + * spinlock. > > + */ > > + rcutree_report_cpu_starting(cpu); > > idle_task_exit(); > > + rcutree_report_cpu_dead(); > > > > local_irq_disable(); > > Both rcutree_report_cpu_starting() and rcutree_report_cpu_dead() complain > bitterly via lockdep if interrupts are enabled. And the call sites have > interrupts disabled. So I don't understand what this local_irq_disable() > is needed for. I think that's a question for this commit: commit e78a7614f3876ac649b3df608789cb6ef74d0480 Author: Peter Zijlstra <peterz@infradead.org> Date: Wed Jun 5 07:46:43 2019 -0700 Before this commit, arch_cpu_idle_dead() was called with IRQs enabled. This commit moved the local_irq_disable() before calling arch_cpu_idle_dead() but it seems no one looked at the various arch implementations to clean those up. Quite how arch people are supposed to spot this and clean up after such a commit, I'm not sure. 
The local_irq_disable() that you're asking about has been there ever since the inception of SMP on 32-bit ARM in this commit: commit a054a811597a17ffbe92bc4db04a4dc2f1b1ea55 Author: Russell King <rmk@dyn-67.arm.linux.org.uk> Date: Wed Nov 2 22:24:33 2005 +0000 Where cpu_die() was later renamed to arch_cpu_idle_dead(). So it's purely a case of a change being made to core code and arch code not receiving any fixups for it. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-03-07 18:54 ` Russell King (Oracle) @ 2024-03-07 19:08 ` Paul E. McKenney 0 siblings, 0 replies; 20+ messages in thread From: Paul E. McKenney @ 2024-03-07 19:08 UTC (permalink / raw) To: Russell King (Oracle) Cc: Stefan Wiehler, Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel On Thu, Mar 07, 2024 at 06:54:38PM +0000, Russell King (Oracle) wrote: > On Thu, Mar 07, 2024 at 09:45:36AM -0800, Paul E. McKenney wrote: > > > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c > > > index 3431c0553f45..6875e2c5dd50 100644 > > > --- a/arch/arm/kernel/smp.c > > > +++ b/arch/arm/kernel/smp.c > > > @@ -319,7 +319,14 @@ void __noreturn arch_cpu_idle_dead(void) > > > { > > > unsigned int cpu = smp_processor_id(); > > > > > > + /* > > > + * Briefly report CPU as online again to avoid false positive > > > + * Lockdep-RCU splat when check_and_switch_context() acquires ASID > > > + * spinlock. > > > + */ > > > + rcutree_report_cpu_starting(cpu); > > > idle_task_exit(); > > > + rcutree_report_cpu_dead(); > > > > > > local_irq_disable(); > > > > Both rcutree_report_cpu_starting() and rcutree_report_cpu_dead() complain > > bitterly via lockdep if interrupts are enabled. And the call sites have > > interrupts disabled. So I don't understand what this local_irq_disable() > > is needed for. > > I think that's a question for this commit: > > commit e78a7614f3876ac649b3df608789cb6ef74d0480 > Author: Peter Zijlstra <peterz@infradead.org> > Date: Wed Jun 5 07:46:43 2019 -0700 > > Before this commit, arch_cpu_idle_dead() was called with IRQs enabled. > This commit moved the local_irq_disable() before calling > arch_cpu_idle_dead() but it seems no one looked at the various arch > implementations to clean those up. Quite how arch people are supposed > to spot this and clean up after such a commit, I'm not sure. Telepathy? 
;-) > The local_irq_disable() that you're asking about has been there ever > since the inception of SMP on 32-bit ARM in this commit: > > commit a054a811597a17ffbe92bc4db04a4dc2f1b1ea55 > Author: Russell King <rmk@dyn-67.arm.linux.org.uk> > Date: Wed Nov 2 22:24:33 2005 +0000 > > Where cpu_die() was later renamed to arch_cpu_idle_dead(). So it's > purely a case of a change being made to core code and arch code not > receiving any fixups for it. Thank you for the info! Thanx, Paul _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-03-07 16:09 [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat Stefan Wiehler 2024-03-07 17:45 ` Paul E. McKenney @ 2024-03-07 18:49 ` Boqun Feng 2024-03-08 10:20 ` Stefan Wiehler 2024-03-08 14:57 ` Joel Fernandes 2024-03-11 16:08 ` Russell King (Oracle) 3 siblings, 1 reply; 20+ messages in thread From: Boqun Feng @ 2024-03-07 18:49 UTC (permalink / raw) To: Stefan Wiehler Cc: Russell King, Paul E. McKenney, Joel Fernandes, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel, Frederic Weisbecker [Cc Frederic] On Thu, Mar 07, 2024 at 05:09:51PM +0100, Stefan Wiehler wrote: > With CONFIG_PROVE_RCU_LIST=y and by executing > > $ echo 0 > /sys/devices/system/cpu/cpu1/online > > one can trigger the following Lockdep-RCU splat on ARM: > > ============================= > WARNING: suspicious RCU usage > 6.8.0-rc7-00001-g0db1d0ed8958 #10 Not tainted > ----------------------------- > kernel/locking/lockdep.c:3762 RCU-list traversed in non-reader section!! > > other info that might help us debug this: > > RCU used illegally from offline CPU! > rcu_scheduler_active = 2, debug_locks = 1 > no locks held by swapper/1/0. 
> > stack backtrace: > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-00001-g0db1d0ed8958 #10 > Hardware name: Allwinner sun8i Family > unwind_backtrace from show_stack+0x10/0x14 > show_stack from dump_stack_lvl+0x60/0x90 > dump_stack_lvl from lockdep_rcu_suspicious+0x150/0x1a0 > lockdep_rcu_suspicious from __lock_acquire+0x11fc/0x29f8 > __lock_acquire from lock_acquire+0x10c/0x348 > lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c > _raw_spin_lock_irqsave from check_and_switch_context+0x7c/0x4a8 > check_and_switch_context from arch_cpu_idle_dead+0x10/0x7c > arch_cpu_idle_dead from do_idle+0xbc/0x138 > do_idle from cpu_startup_entry+0x28/0x2c > cpu_startup_entry from secondary_start_kernel+0x11c/0x124 > secondary_start_kernel from 0x401018a0 > > The CPU is already reported as offline from RCU perspective in > cpuhp_report_idle_dead() before arch_cpu_idle_dead() is invoked. Above > RCU-Lockdep splat is then triggered by check_and_switch_context() acquiring the > ASID spinlock. > > Avoid the false-positive Lockdep-RCU splat by briefly reporting the CPU as I'm not sure this is a false-positive: if you traverse an RCU-list without RCU watching the CPU, then a grace period may pass, the next list can be garbage and you can go anywhere from the list. Or am I missing something that a CPU hotplug can block a grace period? But codewise, it looks good to me. Although I hope we can avoid calling rcutree_report_cpu_{starting,dead}() here and use something simple, however after a few attempts, I don't find a better way. Regards, Boqun > online again while the spinlock is held. 
> > Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com> > --- > arch/arm/kernel/smp.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c > index 3431c0553f45..6875e2c5dd50 100644 > --- a/arch/arm/kernel/smp.c > +++ b/arch/arm/kernel/smp.c > @@ -319,7 +319,14 @@ void __noreturn arch_cpu_idle_dead(void) > { > unsigned int cpu = smp_processor_id(); > > + /* > + * Briefly report CPU as online again to avoid false positive > + * Lockdep-RCU splat when check_and_switch_context() acquires ASID > + * spinlock. > + */ > + rcutree_report_cpu_starting(cpu); > idle_task_exit(); > + rcutree_report_cpu_dead(); > > local_irq_disable(); > > -- > 2.42.0 > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 20+ messages in thread
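[Editor's note: Boqun's concern above can be made concrete with a deterministic toy model. If a reader traverses a list while RCU is not watching its CPU, a grace period can elapse underneath it, after which the node it still points at may be freed. All toy_* names are illustrative; the "freed" flag stands in for kfree() so the model stays memory-safe.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_node { bool freed; struct toy_node *next; };

static bool toy_reader_watching;                 /* RCU's view: not watching */

static struct toy_node toy_b = { .freed = false, .next = NULL };
static struct toy_node toy_a = { .freed = false, .next = &toy_b };

static void toy_synchronize_rcu(void)
{
    /* Only needs to wait for readers RCU is watching; ours is not,
     * so the "grace period" ends immediately. */
    while (toy_reader_watching)
        ;
}

static bool toy_offline_traversal_hazard(void)
{
    struct toy_node *p = toy_a.next;   /* offline reader grabs ->next */
    toy_a.next = NULL;                 /* writer unlinks toy_b */
    toy_synchronize_rcu();             /* completes without waiting for us */
    toy_b.freed = true;                /* stand-in for kfree(&toy_b) */
    return p->freed;                   /* reader now holds a "freed" node */
}
```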
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-03-07 18:49 ` Boqun Feng @ 2024-03-08 10:20 ` Stefan Wiehler 0 siblings, 0 replies; 20+ messages in thread From: Stefan Wiehler @ 2024-03-08 10:20 UTC (permalink / raw) To: Boqun Feng Cc: Russell King, Paul E. McKenney, Joel Fernandes, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel, Frederic Weisbecker > I'm not sure this is a false-positive: if you traverse an RCU-list > without RCU watching the CPU, then a grace period may pass, the next > list can be garbage and you can go anywhere from the list. Or am I > missing something that a CPU hotplug can block a grace period? Thanks for the feedback! Yes, calling it a false positive is probably not correct, I will remove that. However, as Russell has explained in a previous email, the spinlock is fundamental for ASID handling and cannot be removed. @Russell, I will submit to the patch tracking system then. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-03-07 16:09 [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat Stefan Wiehler 2024-03-07 17:45 ` Paul E. McKenney 2024-03-07 18:49 ` Boqun Feng @ 2024-03-08 14:57 ` Joel Fernandes 2024-03-09 7:45 ` Stefan Wiehler 2024-03-11 16:08 ` Russell King (Oracle) 3 siblings, 1 reply; 20+ messages in thread From: Joel Fernandes @ 2024-03-08 14:57 UTC (permalink / raw) To: Stefan Wiehler, Russell King, Paul E. McKenney, Josh Triplett, Boqun Feng Cc: Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel On 3/7/2024 11:09 AM, Stefan Wiehler wrote: > With CONFIG_PROVE_RCU_LIST=y and by executing > > $ echo 0 > /sys/devices/system/cpu/cpu1/online > > one can trigger the following Lockdep-RCU splat on ARM: > > ============================= > WARNING: suspicious RCU usage > 6.8.0-rc7-00001-g0db1d0ed8958 #10 Not tainted > ----------------------------- > kernel/locking/lockdep.c:3762 RCU-list traversed in non-reader section!! > > other info that might help us debug this: > > RCU used illegally from offline CPU! > rcu_scheduler_active = 2, debug_locks = 1 > no locks held by swapper/1/0. 
> > stack backtrace: > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-00001-g0db1d0ed8958 #10 > Hardware name: Allwinner sun8i Family > unwind_backtrace from show_stack+0x10/0x14 > show_stack from dump_stack_lvl+0x60/0x90 > dump_stack_lvl from lockdep_rcu_suspicious+0x150/0x1a0 > lockdep_rcu_suspicious from __lock_acquire+0x11fc/0x29f8 > __lock_acquire from lock_acquire+0x10c/0x348 > lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c > _raw_spin_lock_irqsave from check_and_switch_context+0x7c/0x4a8 > check_and_switch_context from arch_cpu_idle_dead+0x10/0x7c > arch_cpu_idle_dead from do_idle+0xbc/0x138 > do_idle from cpu_startup_entry+0x28/0x2c > cpu_startup_entry from secondary_start_kernel+0x11c/0x124 > secondary_start_kernel from 0x401018a0 > > The CPU is already reported as offline from RCU perspective in > cpuhp_report_idle_dead() before arch_cpu_idle_dead() is invoked. Above > RCU-Lockdep splat is then triggered by check_and_switch_context() acquiring the > ASID spinlock. > > Avoid the false-positive Lockdep-RCU splat by briefly reporting the CPU as > online again while the spinlock is held. > > Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com> > --- > arch/arm/kernel/smp.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c > index 3431c0553f45..6875e2c5dd50 100644 > --- a/arch/arm/kernel/smp.c > +++ b/arch/arm/kernel/smp.c > @@ -319,7 +319,14 @@ void __noreturn arch_cpu_idle_dead(void) > { > unsigned int cpu = smp_processor_id(); > > + /* > + * Briefly report CPU as online again to avoid false positive > + * Lockdep-RCU splat when check_and_switch_context() acquires ASID > + * spinlock. > + */ > + rcutree_report_cpu_starting(cpu); > idle_task_exit(); > + rcutree_report_cpu_dead(); I agree with the problem but disagree with the patch because it feels like a terrible workaround. Can we just use arch_spin_lock() for the cpu_asid_lock? 
This might require acquiring the raw_lock within the raw_spinlock_t, but there is precedent: arch/powerpc/kvm/book3s_hv_rm_mmu.c:245: arch_spin_lock(&kvm->mmu_lock.rlock.raw_lock); IMO, lockdep tracking of this lock is not necessary or possible considering the hotplug situation. Or is there a reason you need lockdep working for the cpu_asid_lock? thanks, - Joel _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 20+ messages in thread
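[Editor's note: the layering Joel refers to can be sketched as follows. In the kernel, a raw_spinlock_t wraps an arch_spinlock_t (its .raw_lock field); the wrapper adds lockdep instrumentation, while calling arch_spin_lock() directly on the inner lock acquires it with no instrumentation at all. The toy_* structures below only mimic that shape; they are not the kernel's implementation.]

```c
#include <assert.h>
#include <stdbool.h>

struct toy_arch_lock { bool locked; };
struct toy_raw_spinlock {
    struct toy_arch_lock raw_lock;  /* what arch_spin_lock() operates on */
};

static int toy_lockdep_events;      /* times lockdep instrumented this lock */

static void toy_arch_spin_lock(struct toy_arch_lock *l)
{
    l->locked = true;               /* no lockdep involvement */
}

static void toy_arch_spin_unlock(struct toy_arch_lock *l)
{
    l->locked = false;
}

static void toy_raw_spin_lock(struct toy_raw_spinlock *l)
{
    toy_lockdep_events++;           /* lock_acquire(): may splat when offline */
    toy_arch_spin_lock(&l->raw_lock);
}

static void toy_raw_spin_unlock(struct toy_raw_spinlock *l)
{
    toy_arch_spin_unlock(&l->raw_lock);
    toy_lockdep_events++;           /* lock_release() */
}

static struct toy_raw_spinlock toy_asid_lock;
```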
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-03-08 14:57 ` Joel Fernandes @ 2024-03-09 7:45 ` Stefan Wiehler 2024-03-09 9:57 ` Russell King (Oracle) 0 siblings, 1 reply; 20+ messages in thread From: Stefan Wiehler @ 2024-03-09 7:45 UTC (permalink / raw) To: Joel Fernandes, Russell King, Paul E. McKenney, Josh Triplett, Boqun Feng Cc: Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel > I agree with the problem but disagree with the patch because it feels like a > terrible workaround. > > Can we just use arch_spin_lock() for the cpu_asid_lock? This might require > acquiring the raw_lock within the raw_spinlock_t, but there is precedent: > > arch/powerpc/kvm/book3s_hv_rm_mmu.c:245: > arch_spin_lock(&kvm->mmu_lock.rlock.raw_lock); > > IMO, lockdep tracking of this lock is not necessary or possible considering the > hotplug situation. > > Or is there a reason you need lockdep working for the cpu_asid_lock? I was not aware of this possibility to bypass lockdep tracing, but this seems to work and indeed looks like less of a workaround: diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c index 4204ffa2d104..4fc2c559f1b6 100644 --- a/arch/arm/mm/context.c +++ b/arch/arm/mm/context.c @@ -254,7 +254,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk) && atomic64_xchg(&per_cpu(active_asids, cpu), asid)) goto switch_mm_fastpath; - raw_spin_lock_irqsave(&cpu_asid_lock, flags); + local_irq_save(flags); + arch_spin_lock(&cpu_asid_lock.raw_lock); /* Check that our ASID belongs to the current generation. 
*/ asid = atomic64_read(&mm->context.id); if ((asid ^ atomic64_read(&asid_generation)) >> ASID_BITS) { @@ -269,7 +270,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk) atomic64_set(&per_cpu(active_asids, cpu), asid); cpumask_set_cpu(cpu, mm_cpumask(mm)); - raw_spin_unlock_irqrestore(&cpu_asid_lock, flags); + arch_spin_unlock(&cpu_asid_lock.raw_lock); + local_irq_restore(flags); switch_mm_fastpath: cpu_switch_mm(mm->pgd, mm); @Russell, what do you think? _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-03-09 7:45 ` Stefan Wiehler @ 2024-03-09 9:57 ` Russell King (Oracle) 2024-03-12 22:14 ` Will Deacon 0 siblings, 1 reply; 20+ messages in thread From: Russell King (Oracle) @ 2024-03-09 9:57 UTC (permalink / raw) To: Stefan Wiehler, Will Deacon Cc: Joel Fernandes, Paul E. McKenney, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel On Sat, Mar 09, 2024 at 08:45:35AM +0100, Stefan Wiehler wrote: > > I agree with the problem but disagree with the patch because it feels like a > > terrible workaround. > > > > Can we just use arch_spin_lock() for the cpu_asid_lock? This might require > > acquiring the raw_lock within the raw_spinlock_t, but there is precedent: > > > > arch/powerpc/kvm/book3s_hv_rm_mmu.c:245: > > arch_spin_lock(&kvm->mmu_lock.rlock.raw_lock); > > > > IMO, lockdep tracking of this lock is not necessary or possible considering the > > hotplug situation. > > > > Or is there a reason you need lockdep working for the cpu_asid_lock? > > I was not aware of this possibility to bypass lockdep tracing, but this seems > to work and indeed looks like less of a workaround: > > diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c > index 4204ffa2d104..4fc2c559f1b6 100644 > --- a/arch/arm/mm/context.c > +++ b/arch/arm/mm/context.c > @@ -254,7 +254,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk) > && atomic64_xchg(&per_cpu(active_asids, cpu), asid)) > goto switch_mm_fastpath; > > - raw_spin_lock_irqsave(&cpu_asid_lock, flags); > + local_irq_save(flags); > + arch_spin_lock(&cpu_asid_lock.raw_lock); > /* Check that our ASID belongs to the current generation. 
*/ > asid = atomic64_read(&mm->context.id); > if ((asid ^ atomic64_read(&asid_generation)) >> ASID_BITS) { > @@ -269,7 +270,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk) > > atomic64_set(&per_cpu(active_asids, cpu), asid); > cpumask_set_cpu(cpu, mm_cpumask(mm)); > - raw_spin_unlock_irqrestore(&cpu_asid_lock, flags); > + arch_spin_unlock(&cpu_asid_lock.raw_lock); > + local_irq_restore(flags); > > switch_mm_fastpath: > cpu_switch_mm(mm->pgd, mm); > > @Russell, what do you think? I think this is Will Deacon's code, so we ought to hear from Will... -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-03-09 9:57 ` Russell King (Oracle) @ 2024-03-12 22:14 ` Will Deacon 2024-03-12 22:39 ` Russell King (Oracle) 0 siblings, 1 reply; 20+ messages in thread From: Will Deacon @ 2024-03-12 22:14 UTC (permalink / raw) To: Russell King (Oracle) Cc: Stefan Wiehler, Joel Fernandes, Paul E. McKenney, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel Hi Russell, On Sat, Mar 09, 2024 at 09:57:04AM +0000, Russell King (Oracle) wrote: > On Sat, Mar 09, 2024 at 08:45:35AM +0100, Stefan Wiehler wrote: > > diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c > > index 4204ffa2d104..4fc2c559f1b6 100644 > > --- a/arch/arm/mm/context.c > > +++ b/arch/arm/mm/context.c > > @@ -254,7 +254,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk) > > && atomic64_xchg(&per_cpu(active_asids, cpu), asid)) > > goto switch_mm_fastpath; > > > > - raw_spin_lock_irqsave(&cpu_asid_lock, flags); > > + local_irq_save(flags); > > + arch_spin_lock(&cpu_asid_lock.raw_lock); > > /* Check that our ASID belongs to the current generation. */ > > asid = atomic64_read(&mm->context.id); > > if ((asid ^ atomic64_read(&asid_generation)) >> ASID_BITS) { > > @@ -269,7 +270,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk) > > > > atomic64_set(&per_cpu(active_asids, cpu), asid); > > cpumask_set_cpu(cpu, mm_cpumask(mm)); > > - raw_spin_unlock_irqrestore(&cpu_asid_lock, flags); > > + arch_spin_unlock(&cpu_asid_lock.raw_lock); > > + local_irq_restore(flags); > > > > switch_mm_fastpath: > > cpu_switch_mm(mm->pgd, mm); > > > > @Russell, what do you think? > > I think this is Will Deacon's code, so we ought to hear from Will... Thanks for adding me in. Using arch_spin_lock() really feels like a bodge to me. 
This code isn't run only on the hot-unplug path, but rather this is part of switch_mm() and we really should be able to have lockdep work properly there for the usual case. Now, do we actually need to worry about the ASID when switching to the init_mm? I'd have thought that would be confined to global (kernel) mappings, so I wonder whether we could avoid this slow path code altogether like we do on arm64 in __switch_mm(). But I must confess that I don't recall the details of the pre-LPAE MMU configuration... Will _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-03-12 22:14 ` Will Deacon @ 2024-03-12 22:39 ` Russell King (Oracle) 2024-03-13 0:32 ` Will Deacon 0 siblings, 1 reply; 20+ messages in thread From: Russell King (Oracle) @ 2024-03-12 22:39 UTC (permalink / raw) To: Will Deacon Cc: Stefan Wiehler, Joel Fernandes, Paul E. McKenney, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel On Tue, Mar 12, 2024 at 10:14:40PM +0000, Will Deacon wrote: > Hi Russell, > > On Sat, Mar 09, 2024 at 09:57:04AM +0000, Russell King (Oracle) wrote: > > On Sat, Mar 09, 2024 at 08:45:35AM +0100, Stefan Wiehler wrote: > > > diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c > > > index 4204ffa2d104..4fc2c559f1b6 100644 > > > --- a/arch/arm/mm/context.c > > > +++ b/arch/arm/mm/context.c > > > @@ -254,7 +254,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk) > > > && atomic64_xchg(&per_cpu(active_asids, cpu), asid)) > > > goto switch_mm_fastpath; > > > > > > - raw_spin_lock_irqsave(&cpu_asid_lock, flags); > > > + local_irq_save(flags); > > > + arch_spin_lock(&cpu_asid_lock.raw_lock); > > > /* Check that our ASID belongs to the current generation. */ > > > asid = atomic64_read(&mm->context.id); > > > if ((asid ^ atomic64_read(&asid_generation)) >> ASID_BITS) { > > > @@ -269,7 +270,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk) > > > > > > atomic64_set(&per_cpu(active_asids, cpu), asid); > > > cpumask_set_cpu(cpu, mm_cpumask(mm)); > > > - raw_spin_unlock_irqrestore(&cpu_asid_lock, flags); > > > + arch_spin_unlock(&cpu_asid_lock.raw_lock); > > > + local_irq_restore(flags); > > > > > > switch_mm_fastpath: > > > cpu_switch_mm(mm->pgd, mm); > > > > > > @Russell, what do you think? > > > > I think this is Will Deacon's code, so we ought to hear from Will... > > Thanks for adding me in. 
> > Using arch_spin_lock() really feels like a bodge to me. This code isn't > run only on the hot-unplug path, but rather this is part of switch_mm() > and we really should be able to have lockdep work properly there for > the usual case. > > Now, do we actually need to worry about the ASID when switching to the > init_mm? I'd have thought that would be confined to global (kernel) > mappings, so I wonder whether we could avoid this slow path code > altogether like we do on arm64 in __switch_mm(). But I must confess that > I don't recall the details of the pre-LPAE MMU configuration... As the init_mm shouldn't have any userspace mappings, isn't the ASID entirely redundant? Couldn't check_and_switch_context() just simply do the vmalloc seq check, set the reserved ASID, and then head to switch_mm_fastpath to call the mm switch code? -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-03-12 22:39 ` Russell King (Oracle) @ 2024-03-13 0:32 ` Will Deacon 2024-03-13 9:58 ` Russell King (Oracle) 0 siblings, 1 reply; 20+ messages in thread From: Will Deacon @ 2024-03-13 0:32 UTC (permalink / raw) To: Russell King (Oracle) Cc: Stefan Wiehler, Joel Fernandes, Paul E. McKenney, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel On Tue, Mar 12, 2024 at 10:39:30PM +0000, Russell King (Oracle) wrote: > On Tue, Mar 12, 2024 at 10:14:40PM +0000, Will Deacon wrote: > > On Sat, Mar 09, 2024 at 09:57:04AM +0000, Russell King (Oracle) wrote: > > > On Sat, Mar 09, 2024 at 08:45:35AM +0100, Stefan Wiehler wrote: > > > > diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c > > > > index 4204ffa2d104..4fc2c559f1b6 100644 > > > > --- a/arch/arm/mm/context.c > > > > +++ b/arch/arm/mm/context.c > > > > @@ -254,7 +254,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk) > > > > && atomic64_xchg(&per_cpu(active_asids, cpu), asid)) > > > > goto switch_mm_fastpath; > > > > > > > > - raw_spin_lock_irqsave(&cpu_asid_lock, flags); > > > > + local_irq_save(flags); > > > > + arch_spin_lock(&cpu_asid_lock.raw_lock); > > > > /* Check that our ASID belongs to the current generation. */ > > > > asid = atomic64_read(&mm->context.id); > > > > if ((asid ^ atomic64_read(&asid_generation)) >> ASID_BITS) { > > > > @@ -269,7 +270,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk) > > > > > > > > atomic64_set(&per_cpu(active_asids, cpu), asid); > > > > cpumask_set_cpu(cpu, mm_cpumask(mm)); > > > > - raw_spin_unlock_irqrestore(&cpu_asid_lock, flags); > > > > + arch_spin_unlock(&cpu_asid_lock.raw_lock); > > > > + local_irq_restore(flags); > > > > > > > > switch_mm_fastpath: > > > > cpu_switch_mm(mm->pgd, mm); > > > > > > > > @Russell, what do you think? 
> > > > > > I think this is Will Deacon's code, so we ought to hear from Will... > > > > Thanks for adding me in. > > > > Using arch_spin_lock() really feels like a bodge to me. This code isn't > > run only on the hot-unplug path, but rather this is part of switch_mm() > > and we really should be able to have lockdep work properly there for > > the usual case. > > > > Now, do we actually need to worry about the ASID when switching to the > > init_mm? I'd have thought that would be confined to global (kernel) > > mappings, so I wonder whether we could avoid this slow path code > > altogether like we do on arm64 in __switch_mm(). But I must confess that > > I don't recall the details of the pre-LPAE MMU configuration... > > As the init_mm shouldn't have any userspace mappings, isn't the ASID > entirely redundant? Couldn't check_and_switch_context() just simply > do the vmalloc seq check, set the reserved ASID, and then head to > switch_mm_fastpath to call the mm switch code? Right, that's what I was thinking too, but I have some distant memories of the module space causing potential issues in some configurations. Does that ring a bell with you? Will _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-03-13 0:32 ` Will Deacon @ 2024-03-13 9:58 ` Russell King (Oracle) 2024-08-09 13:37 ` Stefan Wiehler 0 siblings, 1 reply; 20+ messages in thread From: Russell King (Oracle) @ 2024-03-13 9:58 UTC (permalink / raw) To: Will Deacon Cc: Stefan Wiehler, Joel Fernandes, Paul E. McKenney, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel On Wed, Mar 13, 2024 at 12:32:44AM +0000, Will Deacon wrote: > On Tue, Mar 12, 2024 at 10:39:30PM +0000, Russell King (Oracle) wrote: > > On Tue, Mar 12, 2024 at 10:14:40PM +0000, Will Deacon wrote: > > > On Sat, Mar 09, 2024 at 09:57:04AM +0000, Russell King (Oracle) wrote: > > > > On Sat, Mar 09, 2024 at 08:45:35AM +0100, Stefan Wiehler wrote: > > > > > diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c > > > > > index 4204ffa2d104..4fc2c559f1b6 100644 > > > > > --- a/arch/arm/mm/context.c > > > > > +++ b/arch/arm/mm/context.c > > > > > @@ -254,7 +254,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk) > > > > > && atomic64_xchg(&per_cpu(active_asids, cpu), asid)) > > > > > goto switch_mm_fastpath; > > > > > > > > > > - raw_spin_lock_irqsave(&cpu_asid_lock, flags); > > > > > + local_irq_save(flags); > > > > > + arch_spin_lock(&cpu_asid_lock.raw_lock); > > > > > /* Check that our ASID belongs to the current generation. 
*/ > > > > > asid = atomic64_read(&mm->context.id); > > > > > if ((asid ^ atomic64_read(&asid_generation)) >> ASID_BITS) { > > > > > @@ -269,7 +270,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk) > > > > > > > > > > atomic64_set(&per_cpu(active_asids, cpu), asid); > > > > > cpumask_set_cpu(cpu, mm_cpumask(mm)); > > > > > - raw_spin_unlock_irqrestore(&cpu_asid_lock, flags); > > > > > + arch_spin_unlock(&cpu_asid_lock.raw_lock); > > > > > + local_irq_restore(flags); > > > > > > > > > > switch_mm_fastpath: > > > > > cpu_switch_mm(mm->pgd, mm); > > > > > > > > > > @Russell, what do you think? > > > > > > > > I think this is Will Deacon's code, so we ought to hear from Will... > > > > > > Thanks for adding me in. > > > > > > Using arch_spin_lock() really feels like a bodge to me. This code isn't > > > run only on the hot-unplug path, but rather this is part of switch_mm() > > > and we really should be able to have lockdep work properly there for > > > the usual case. > > > > > > Now, do we actually need to worry about the ASID when switching to the > > > init_mm? I'd have thought that would be confined to global (kernel) > > > mappings, so I wonder whether we could avoid this slow path code > > > altogether like we do on arm64 in __switch_mm(). But I must confess that > > > I don't recall the details of the pre-LPAE MMU configuration... > > > > As the init_mm shouldn't have any userspace mappings, isn't the ASID > > entirely redundant? Couldn't check_and_switch_context() just simply > > do the vmalloc seq check, set the reserved ASID, and then head to > > switch_mm_fastpath to call the mm switch code? > > Right, that's what I was thinking too, but I have some distant memories > of the module space causing potential issues in some configurations. Does > that ring a bell with you? 
For 32-bit non-LPAE, I can't recall any issues, nor can I think of any because module space is just another few entries in the L1 page tables below the direct mapping (which isn't a problem because we don't use anything in hardware to separate the kernel space from user space in the page tables.) TTBCR is set to 0. For LPAE, there may be issues there because TTBR0 and TTBR1 are both used, and TTBCR.T1SZ is set non-zero to: arch/arm/include/asm/pgtable-3level-hwdef.h:#define TTBR1_SIZE (((PAGE_OFFSET >> 30) - 1) << 16) so I suspect that's where the problems may lie - but then module mappings should also exist in init_mm (swapper_pg_dir) and should be global. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-03-13 9:58 ` Russell King (Oracle) @ 2024-08-09 13:37 ` Stefan Wiehler 2024-08-09 13:48 ` Russell King (Oracle) 0 siblings, 1 reply; 20+ messages in thread From: Stefan Wiehler @ 2024-08-09 13:37 UTC (permalink / raw) To: Russell King (Oracle), Will Deacon Cc: Joel Fernandes, Paul E. McKenney, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel I'd like to revive the discussion and quickly summarize our options to avoid a false-positive Lockdep-RCU splat when check_and_switch_context() acquires the ASID spinlock: 1. Briefly report CPU as online again via rcutree_report_cpu_{starting|dead}() 2. Replace raw_spin_lock*() by arch_spin_lock() 3. Remove ASID spinlock Both 1. and 2. are workarounds with different pros and cons/proponents. Regarding 3., the last comment from Russell is: > For 32-bit non-LPAE, I can't recall any issues, nor can I think of any > because module space is just another few entries in the L1 page tables > below the direct mapping (which isn't a problem because we don't use > anything in hardware to separate the kernel space from user space in > the page tables.) TTBCR is set to 0. > > For LPAE, there may be issues there because TTBR0 and TTBR1 are both > used, and TTBCR.T1SZ is set non-zero to: > > arch/arm/include/asm/pgtable-3level-hwdef.h:#define TTBR1_SIZE (((PAGE_OFFSET >> 30) - 1) << 16) > > so I suspect that's where the problems may lie - but then module > mappings should also exist in init_mm (swapper_pg_dir) and should > be global. Unfortunately I don't feel qualified to contribute to the discussion on option 3. Russell and Will, would you be able to spare some time to drive this further? Otherwise I would propose to make a decision on going for option 1 or 2. Kind regards, Stefan ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-08-09 13:37 ` Stefan Wiehler @ 2024-08-09 13:48 ` Russell King (Oracle) 2024-08-09 14:06 ` Stefan Wiehler 0 siblings, 1 reply; 20+ messages in thread From: Russell King (Oracle) @ 2024-08-09 13:48 UTC (permalink / raw) To: Stefan Wiehler Cc: Will Deacon, Joel Fernandes, Paul E. McKenney, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel On Fri, Aug 09, 2024 at 03:37:38PM +0200, Stefan Wiehler wrote: > Unfortunately I don't feel qualified to contribute to the discussion > on option 3. Russell and Will, would you be able to spare some time to > drive this further? Otherwise I would propose to make a decision on > going for option 1 or 2. Highly unlikely that I'm going to have any time whatsoever as I'm undergoing a series of operations which are affecting my eyesight (over the next 3/4 days, I have only one usable eye while the vision in the other settles post-op.) I have little idea at the moment how this is going to pan out, or how long I can spend in front of the screen. Worst case, it could be well into September before I'm able to resume any serious kernel work. E.g. if the brain is unable to resolve the 3D image from both eyes next week. Then there'll be the second eye to be done at the earliest 30th August. Then there's waiting for the vision to settle in that eye. Then there's working out whether I need sight correction to see the screen and possibly getting a new set of lenses for the glasses. So... I think it's best to count me out on being able to solve this problem. Sorry. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-08-09 13:48 ` Russell King (Oracle) @ 2024-08-09 14:06 ` Stefan Wiehler 0 siblings, 0 replies; 20+ messages in thread From: Stefan Wiehler @ 2024-08-09 14:06 UTC (permalink / raw) To: Russell King (Oracle) Cc: Will Deacon, Joel Fernandes, Paul E. McKenney, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel > Highly unlikely that I'm going to have any time whatsoever as I'm > undergoing a series of operations which are affecting my eyesight > (over the next 3/4 days, I have only one usable eye while the vision > in the other settles post-op.) I have little idea at the moment how > this is going to pan out, or how long I can spend in front of the > screen. > > Worst case, it could be well into September before I'm able to resume > any serious kernel work. E.g. if the brain is unable to resolve the > 3D image from both eyes next week. Then there'll be the second eye > to be done at the earliest 30th August. Then there's waiting for the > vision to settle in that eye. Then there's working out whether I > need sight correction to see the screen and possibly getting a new > set of lenses for the glasses. > > So... I think it's best to count me out on being able to solve this > problem. > > Sorry. Oh wow. I might be just a stranger from the Internet, but... I sincerely wish you all the best for your surgery and the recovery process. It must be very stressful as a maintainer to get bombarded with all these requests - thank you for your great work, and please prioritize your health! Kind regards, Stefan ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-03-07 16:09 [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat Stefan Wiehler ` (2 preceding siblings ...) 2024-03-08 14:57 ` Joel Fernandes @ 2024-03-11 16:08 ` Russell King (Oracle) 2024-03-11 16:17 ` Stefan Wiehler 3 siblings, 1 reply; 20+ messages in thread From: Russell King (Oracle) @ 2024-03-11 16:08 UTC (permalink / raw) To: Stefan Wiehler Cc: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel On Thu, Mar 07, 2024 at 05:09:51PM +0100, Stefan Wiehler wrote: > With CONFIG_PROVE_RCU_LIST=y and by executing > > $ echo 0 > /sys/devices/system/cpu/cpu1/online > > one can trigger the following Lockdep-RCU splat on ARM: > > ============================= > WARNING: suspicious RCU usage > 6.8.0-rc7-00001-g0db1d0ed8958 #10 Not tainted > ----------------------------- > kernel/locking/lockdep.c:3762 RCU-list traversed in non-reader section!! > > other info that might help us debug this: > > RCU used illegally from offline CPU! > rcu_scheduler_active = 2, debug_locks = 1 > no locks held by swapper/1/0. 
> > stack backtrace: > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-00001-g0db1d0ed8958 #10 > Hardware name: Allwinner sun8i Family > unwind_backtrace from show_stack+0x10/0x14 > show_stack from dump_stack_lvl+0x60/0x90 > dump_stack_lvl from lockdep_rcu_suspicious+0x150/0x1a0 > lockdep_rcu_suspicious from __lock_acquire+0x11fc/0x29f8 > __lock_acquire from lock_acquire+0x10c/0x348 > lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c > _raw_spin_lock_irqsave from check_and_switch_context+0x7c/0x4a8 > check_and_switch_context from arch_cpu_idle_dead+0x10/0x7c > arch_cpu_idle_dead from do_idle+0xbc/0x138 > do_idle from cpu_startup_entry+0x28/0x2c > cpu_startup_entry from secondary_start_kernel+0x11c/0x124 > secondary_start_kernel from 0x401018a0 > > The CPU is already reported as offline from RCU perspective in > cpuhp_report_idle_dead() before arch_cpu_idle_dead() is invoked. Above > RCU-Lockdep splat is then triggered by check_and_switch_context() acquiring the > ASID spinlock. > > Avoid the false-positive Lockdep-RCU splat by briefly reporting the CPU as > online again while the spinlock is held. > > Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com> So what do I do about this? I see you submitted this to the patch system on the 8th March, but proposed a different approach on the 9th March. I don't see evidence that Paul is happy with the original approach either and there's no ack/r-b's on anything here. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-03-11 16:08 ` Russell King (Oracle) @ 2024-03-11 16:17 ` Stefan Wiehler 2024-03-11 19:01 ` Paul E. McKenney 0 siblings, 1 reply; 20+ messages in thread From: Stefan Wiehler @ 2024-03-11 16:17 UTC (permalink / raw) To: Russell King (Oracle) Cc: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel > So what do I do about this? I see you submitted this to the patch system > on the 8th March, but proposed a different approach on the 9th March. I > don't see evidence that Paul is happy with the original approach either > and there's no ack/r-b's on anything here. I think we need to wait for feedback from Will Deacon regarding the new arch_spin_lock() based approach. So please abandon the original proposal in the patch system, at least for now… _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat 2024-03-11 16:17 ` Stefan Wiehler @ 2024-03-11 19:01 ` Paul E. McKenney 0 siblings, 0 replies; 20+ messages in thread From: Paul E. McKenney @ 2024-03-11 19:01 UTC (permalink / raw) To: Stefan Wiehler Cc: Russell King (Oracle), Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, linux-arm-kernel, rcu, linux-kernel On Mon, Mar 11, 2024 at 05:17:43PM +0100, Stefan Wiehler wrote: > > So what do I do about this? I see you submitted this to the patch system > > on the 8th March, but proposed a different approach on the 9th March. I > > don't see evidence that Paul is happy with the original approach either > > and there's no ack/r-b's on anything here. > > I think we need to wait for feedback from Will Deacon regarding the new > arch_spin_lock() based approach. So please abandon the original proposal in the > patch system, at least for now… I prefer the arch_spin_lock() version, but the other approach also works. I agree with Stefan that it would also be good to get Will's feedback. The downside of the arch_spin_lock() approach is that, in theory, it prevents lockdep from seeing deadlocks involving that lock acquisition. But in practice, that is a problem only if additional locks can be acquired while that lock is held, and there are no such additional locks, right? The downside of the original approach is that it further complicates RCU's interaction with offline CPUs, especially should people copy that approach in other pieces of offline code. So, again, I prefer the arch_spin_lock() approach. Thanx, Paul _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2024-08-09 14:07 UTC | newest] Thread overview: 20+ messages -- 2024-03-07 16:09 [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat Stefan Wiehler 2024-03-07 17:45 ` Paul E. McKenney 2024-03-07 17:49 ` Paul E. McKenney 2024-03-07 18:54 ` Russell King (Oracle) 2024-03-07 19:08 ` Paul E. McKenney 2024-03-07 18:49 ` Boqun Feng 2024-03-08 10:20 ` Stefan Wiehler 2024-03-08 14:57 ` Joel Fernandes 2024-03-09 7:45 ` Stefan Wiehler 2024-03-09 9:57 ` Russell King (Oracle) 2024-03-12 22:14 ` Will Deacon 2024-03-12 22:39 ` Russell King (Oracle) 2024-03-13 0:32 ` Will Deacon 2024-03-13 9:58 ` Russell King (Oracle) 2024-08-09 13:37 ` Stefan Wiehler 2024-08-09 13:48 ` Russell King (Oracle) 2024-08-09 14:06 ` Stefan Wiehler 2024-03-11 16:08 ` Russell King (Oracle) 2024-03-11 16:17 ` Stefan Wiehler 2024-03-11 19:01 ` Paul E. McKenney