* Lockdep-RCU splat in ARM CPU hotplug
@ 2024-03-05 16:00 Stefan Wiehler
From: Stefan Wiehler @ 2024-03-05 16:00 UTC (permalink / raw)
To: Russell King (Oracle); +Cc: linux-arm-kernel
Hi,
With CONFIG_PROVE_RCU_LIST=y and by executing
$ echo 0 > /sys/devices/system/cpu/cpu1/online
one can trigger the following Lockdep-RCU splat on ARM (reproducible on an Orange Pi PC in QEMU):
=============================
WARNING: suspicious RCU usage
6.8.0-rc7-00001-g0db1d0ed8958 #10 Not tainted
-----------------------------
kernel/locking/lockdep.c:3762 RCU-list traversed in non-reader section!!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 2, debug_locks = 1
no locks held by swapper/1/0.
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-00001-g0db1d0ed8958 #10
Hardware name: Allwinner sun8i Family
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x60/0x90
dump_stack_lvl from lockdep_rcu_suspicious+0x150/0x1a0
lockdep_rcu_suspicious from __lock_acquire+0x11fc/0x29f8
__lock_acquire from lock_acquire+0x10c/0x348
lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
_raw_spin_lock_irqsave from check_and_switch_context+0x7c/0x4a8
check_and_switch_context from arch_cpu_idle_dead+0x10/0x7c
arch_cpu_idle_dead from do_idle+0xbc/0x138
do_idle from cpu_startup_entry+0x28/0x2c
cpu_startup_entry from secondary_start_kernel+0x11c/0x124
secondary_start_kernel from 0x401018a0
Originally the splat was found on an AXM5516 with v5.15, so the issue has presumably existed for quite some time on all ARM boards.
Lockdep-RCU is triggered by this call of raw_spin_lock_irqsave() in check_and_switch_context() while the CPU is already marked offline: https://elixir.bootlin.com/linux/v6.8-rc7/source/arch/arm/mm/context.c#L257
On ARM64, we have cpu_die_early() calling rcutree_report_cpu_dead() which presumably prevents such a splat from occurring: https://elixir.bootlin.com/linux/v6.8-rc7/source/arch/arm64/kernel/smp.c#L412
Simply calling rcutree_report_cpu_dead() in arch_cpu_idle_dead() on ARM seems to have no effect, though. As my understanding of the ARM CPU hotplug subsystem is somewhat limited, I would appreciate some help here.
Thanks and regards,
Stefan
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: Lockdep-RCU splat in ARM CPU hotplug
From: Russell King (Oracle) @ 2024-03-05 21:04 UTC (permalink / raw)
To: Stefan Wiehler; +Cc: linux-arm-kernel
On Tue, Mar 05, 2024 at 05:00:06PM +0100, Stefan Wiehler wrote:
> Hi,
>
> With CONFIG_PROVE_RCU_LIST=y and by executing
>
> $ echo 0 > /sys/devices/system/cpu/cpu1/online
>
> one can trigger the following Lockdep-RCU splat on ARM (reproducible on an Orange Pi PC in QEMU):
>
> =============================
> WARNING: suspicious RCU usage
> 6.8.0-rc7-00001-g0db1d0ed8958 #10 Not tainted
> -----------------------------
> kernel/locking/lockdep.c:3762 RCU-list traversed in non-reader section!!
>
> other info that might help us debug this:
>
>
> RCU used illegally from offline CPU!
> rcu_scheduler_active = 2, debug_locks = 1
> no locks held by swapper/1/0.
>
> stack backtrace:
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-00001-g0db1d0ed8958 #10
> Hardware name: Allwinner sun8i Family
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x60/0x90
> dump_stack_lvl from lockdep_rcu_suspicious+0x150/0x1a0
> lockdep_rcu_suspicious from __lock_acquire+0x11fc/0x29f8
> __lock_acquire from lock_acquire+0x10c/0x348
> lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
> _raw_spin_lock_irqsave from check_and_switch_context+0x7c/0x4a8
> check_and_switch_context from arch_cpu_idle_dead+0x10/0x7c
> arch_cpu_idle_dead from do_idle+0xbc/0x138
> do_idle from cpu_startup_entry+0x28/0x2c
> cpu_startup_entry from secondary_start_kernel+0x11c/0x124
> secondary_start_kernel from 0x401018a0
>
> Originally the splat was found on an AXM5516 with v5.15, so the issue has presumably existed for quite some time on all ARM boards.
>
> Lockdep-RCU is triggered by this call of raw_spin_lock_irqsave() in check_and_switch_context() while the CPU is already marked offline: https://elixir.bootlin.com/linux/v6.8-rc7/source/arch/arm/mm/context.c#L257
>
> On ARM64, we have cpu_die_early() calling rcutree_report_cpu_dead() which presumably prevents such a splat from occurring: https://elixir.bootlin.com/linux/v6.8-rc7/source/arch/arm64/kernel/smp.c#L412
>
> Simply calling rcutree_report_cpu_dead() in arch_cpu_idle_dead() on ARM seems to have no effect, though. As my understanding of the ARM CPU hotplug subsystem is somewhat limited, I would appreciate some help here.
So I think this is down to what check_and_switch_context() is doing.
Tracing through the paths, idle_task_exit() is called from the
arch_cpu_idle_dead() path on both 32-bit ARM and x86, so this is legal
to do (if it weren't, x86 would have problems).
idle_task_exit() calls switch_mm(), which is an arch-defined function,
and this calls check_and_switch_context(). Anything which switch_mm()
calls has to be safe to be called from the arch_cpu_idle_dead() path.
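In condensed form, the chain looks like this (illustrative sketch; the real code lives in kernel/sched/core.c and arch/arm/mm/context.c):

```
arch_cpu_idle_dead()
  -> idle_task_exit()                   /* kernel/sched/core.c */
       -> switch_mm()                   /* arch-defined */
            -> check_and_switch_context()   /* arch/arm/mm/context.c */
                 -> raw_spin_lock_irqsave(&cpu_asid_lock, flags)  /* splat */
```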
We can't get rid of the spinlock in check_and_switch_context() as that
is fundamental to how the ASID handling works - removing it would
cause all sorts of races.
I don't see how we can solve this at the moment, not helped by my
limited RCU knowledge.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
* Re: Lockdep-RCU splat in ARM CPU hotplug
From: Stefan Wiehler @ 2024-03-07 16:01 UTC (permalink / raw)
To: Russell King (Oracle); +Cc: linux-arm-kernel
> So I think this is down to what check_and_switch_context() is doing.
>
> Tracing through the paths, idle_task_exit() is called from the
> arch_cpu_idle_dead() path on both 32-bit ARM and x86, so this is legal
> to do (if it weren't, x86 would have problems).
>
> idle_task_exit() calls switch_mm(), which is an arch-defined function,
> and this calls check_and_switch_context(). Anything which switch_mm()
> calls has to be safe to be called from the arch_cpu_idle_dead() path.
>
> We can't get rid of the spinlock in check_and_switch_context() as that
> is fundamental to how the ASID handling works - removing it would
> cause all sorts of races.
>
> I don't see how we can solve this at the moment, not helped by my
> limited RCU knowledge.
I apparently misunderstood the warning; it is actually the other way
round: the CPU is reported offline to RCU too early, not too late.
I think the false-positive Lockdep-RCU splat can be avoided by
briefly reporting the CPU as online again while the spinlock is held. I
will send a patch and also ask the RCU maintainers for review.