* [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
@ 2026-06-04 18:24 Jonas Jelonek
2026-06-05 3:01 ` Huacai Chen
2026-06-05 6:42 ` Sebastian Andrzej Siewior
0 siblings, 2 replies; 9+ messages in thread
From: Jonas Jelonek @ 2026-06-04 18:24 UTC (permalink / raw)
To: Thomas Bogendoerfer, linux-mips
Cc: Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Thomas Gleixner, Jiayuan Chen, linux-rt-devel, linux-kernel,
Jonas Jelonek, stable
smp_send_stop() parks all secondary CPUs in stop_this_cpu(). The function
marks the CPU offline for the scheduler via set_cpu_online(false) but
never informs RCU, so RCU keeps expecting a quiescent state from CPUs
that are now spinning forever with interrupts disabled.
As long as nothing waits for an RCU grace period after smp_send_stop()
this is harmless, which is why it went unnoticed. Since commit
91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
however, irq_work_sync() calls synchronize_rcu() on architectures without
an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt() returns
false. That is the asm-generic default used by MIPS. Any irq_work_sync()
issued in the reboot/shutdown path after smp_send_stop() then blocks on
a grace period that can never complete, hanging the reboot:
WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
...
rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
rcu: Offline CPU 1 blocking current GP.
rcu: Offline CPU 2 blocking current GP.
rcu: Offline CPU 3 blocking current GP.
This issue popped up during kernel bump downstream in OpenWrt from
6.18.33 to 6.18.34, since the suspected change has been backported to
6.18 stable branch [1].
Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
waiting on the parked CPUs and grace periods can still complete.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.18.y&id=18c0456ea2615b1a743a6db739c74411c3b42bc6
Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
CC: stable@vger.kernel.org
Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index 4868e79f3b30..0f28b4a62e72 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -20,6 +20,7 @@
 #include <linux/sched/mm.h>
 #include <linux/cpumask.h>
 #include <linux/cpu.h>
+#include <linux/rcupdate.h>
 #include <linux/err.h>
 #include <linux/ftrace.h>
 #include <linux/irqdomain.h>
@@ -422,6 +423,7 @@ static void stop_this_cpu(void *dummy)
 	set_cpu_online(smp_processor_id(), false);
 	calculate_cpu_foreign_map();
 	local_irq_disable();
+	rcutree_report_cpu_dead();
 	while (1);
 }
--
2.51.0
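The stall described in the commit message can be pictured with a small userspace model. This is only a sketch under loud assumptions: struct gp_model and every gp_*() helper are hypothetical names invented here, and real Tree RCU bookkeeping is far more involved. The model only captures the idea that a grace period waits on every CPU RCU still considers online, and that a parked CPU never reports a quiescent state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these names exist in the kernel. */
struct gp_model {
	uint32_t rcu_online; /* CPUs RCU believes can still report */
	uint32_t qsmask;     /* CPUs the current grace period waits on */
};

static uint32_t gp_bit(unsigned int cpu)
{
	assert(cpu < 32);
	return UINT32_C(1) << cpu;
}

/* Stand-in for the effect of rcutree_report_cpu_dead() on RCU's view. */
static void gp_report_cpu_dead(struct gp_model *m, unsigned int cpu)
{
	m->rcu_online &= ~gp_bit(cpu);
	m->qsmask &= ~gp_bit(cpu);
}

/* A new grace period waits on every CPU RCU still considers online. */
static void gp_start(struct gp_model *m)
{
	m->qsmask = m->rcu_online;
}

/* A running CPU passes through a quiescent state. */
static void gp_report_qs(struct gp_model *m, unsigned int cpu)
{
	m->qsmask &= ~gp_bit(cpu);
}

/*
 * Park CPUs 1..ncpus-1, then start a grace period on CPU 0, as a
 * synchronize_rcu() late in reboot would. Returns the CPUs still blocking it.
 */
static uint32_t gp_blockers_after_stop(unsigned int ncpus, bool report_dead)
{
	struct gp_model m = { 0, 0 };
	unsigned int cpu;

	assert(ncpus >= 1 && ncpus <= 32);
	for (cpu = 0; cpu < ncpus; cpu++)
		m.rcu_online |= gp_bit(cpu);

	/* Parked CPUs spin with IRQs off and never call gp_report_qs(). */
	for (cpu = 1; cpu < ncpus; cpu++)
		if (report_dead)
			gp_report_cpu_dead(&m, cpu);

	gp_start(&m);
	gp_report_qs(&m, 0);
	return m.qsmask;
}
```

In this model, gp_blockers_after_stop(4, false) leaves the bits for CPUs 1 to 3 set, which corresponds to the three "Offline CPU n blocking current GP" lines in the log, while gp_blockers_after_stop(4, true) leaves no blockers.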
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
2026-06-04 18:24 [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu() Jonas Jelonek
@ 2026-06-05 3:01 ` Huacai Chen
2026-06-05 6:56 ` Jonas Jelonek
2026-06-05 6:42 ` Sebastian Andrzej Siewior
1 sibling, 1 reply; 9+ messages in thread
From: Huacai Chen @ 2026-06-05 3:01 UTC (permalink / raw)
To: Jonas Jelonek
Cc: Thomas Bogendoerfer, linux-mips, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, Thomas Gleixner, Jiayuan Chen,
linux-rt-devel, linux-kernel, stable
Hi, Jonas,
On Fri, Jun 5, 2026 at 2:25 AM Jonas Jelonek <jelonek.jonas@gmail.com> wrote:
>
> smp_send_stop() parks all secondary CPUs in stop_this_cpu(). The function
> marks the CPU offline for the scheduler via set_cpu_online(false) but
> never informs RCU, so RCU keeps expecting a quiescent state from CPUs
> that are now spinning forever with interrupts disabled.
>
> As long as nothing waits for an RCU grace period after smp_send_stop()
> this is harmless, which is why it went unnoticed. Since commit
> 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
> however, irq_work_sync() calls synchronize_rcu() on architectures without
> an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt() returns
> false. That is the asm-generic default used by MIPS. Any irq_work_sync()
> issued in the reboot/shutdown path after smp_send_stop() then blocks on
> a grace period that can never complete, hanging the reboot:
>
> WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
> ...
> rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> rcu: Offline CPU 1 blocking current GP.
> rcu: Offline CPU 2 blocking current GP.
> rcu: Offline CPU 3 blocking current GP.
>
> This issue popped up during kernel bump downstream in OpenWrt from
> 6.18.33 to 6.18.34, since the suspected change has been backported to
> 6.18 stable branch [1].
Now 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single()
on PREEMPT_RT") has been backported as far back as 6.1 LTS.
>
> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
> waiting on the parked CPUs and grace periods can still complete.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.18.y&id=18c0456ea2615b1a743a6db739c74411c3b42bc6
>
> Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
> CC: stable@vger.kernel.org
> Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>
>
> diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
> index 4868e79f3b30..0f28b4a62e72 100644
> --- a/arch/mips/kernel/smp.c
> +++ b/arch/mips/kernel/smp.c
> @@ -20,6 +20,7 @@
> #include <linux/sched/mm.h>
> #include <linux/cpumask.h>
> #include <linux/cpu.h>
> +#include <linux/rcupdate.h>
> #include <linux/err.h>
> #include <linux/ftrace.h>
> #include <linux/irqdomain.h>
> @@ -422,6 +423,7 @@ static void stop_this_cpu(void *dummy)
> set_cpu_online(smp_processor_id(), false);
> calculate_cpu_foreign_map();
> local_irq_disable();
> + rcutree_report_cpu_dead();
I'm not sure, but maybe it is better to call this before local_irq_disable()?
Huacai
> while (1);
> }
>
> --
> 2.51.0
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
2026-06-05 3:01 ` Huacai Chen
@ 2026-06-05 6:56 ` Jonas Jelonek
0 siblings, 0 replies; 9+ messages in thread
From: Jonas Jelonek @ 2026-06-05 6:56 UTC (permalink / raw)
To: Huacai Chen
Cc: Thomas Bogendoerfer, linux-mips, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, Thomas Gleixner, Jiayuan Chen,
linux-rt-devel, linux-kernel, stable
Hi Huacai,
On 05.06.26 05:01, Huacai Chen wrote:
> Hi, Jonas,
>
> On Fri, Jun 5, 2026 at 2:25 AM Jonas Jelonek <jelonek.jonas@gmail.com> wrote:
>> smp_send_stop() parks all secondary CPUs in stop_this_cpu(). The function
>> marks the CPU offline for the scheduler via set_cpu_online(false) but
>> never informs RCU, so RCU keeps expecting a quiescent state from CPUs
>> that are now spinning forever with interrupts disabled.
>>
>> As long as nothing waits for an RCU grace period after smp_send_stop()
>> this is harmless, which is why it went unnoticed. Since commit
>> 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
>> however, irq_work_sync() calls synchronize_rcu() on architectures without
>> an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt() returns
>> false. That is the asm-generic default used by MIPS. Any irq_work_sync()
>> issued in the reboot/shutdown path after smp_send_stop() then blocks on
>> a grace period that can never complete, hanging the reboot:
>>
>> WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
>> ...
>> rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
>> rcu: Offline CPU 1 blocking current GP.
>> rcu: Offline CPU 2 blocking current GP.
>> rcu: Offline CPU 3 blocking current GP.
>>
>> This issue popped up during kernel bump downstream in OpenWrt from
>> 6.18.33 to 6.18.34, since the suspected change has been backported to
>> 6.18 stable branch [1].
> Now 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single()
> on PREEMPT_RT") has been backported as far back as 6.1 LTS.
Yes, as also pointed out by Sebastian I should adjust this paragraph
to be more accurate.
>> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
>> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
>> waiting on the parked CPUs and grace periods can still complete.
>>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.18.y&id=18c0456ea2615b1a743a6db739c74411c3b42bc6
>>
>> Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
>> CC: stable@vger.kernel.org
>> Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>
>>
>> diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
>> index 4868e79f3b30..0f28b4a62e72 100644
>> --- a/arch/mips/kernel/smp.c
>> +++ b/arch/mips/kernel/smp.c
>> @@ -20,6 +20,7 @@
>> #include <linux/sched/mm.h>
>> #include <linux/cpumask.h>
>> #include <linux/cpu.h>
>> +#include <linux/rcupdate.h>
>> #include <linux/err.h>
>> #include <linux/ftrace.h>
>> #include <linux/irqdomain.h>
>> @@ -422,6 +423,7 @@ static void stop_this_cpu(void *dummy)
>> set_cpu_online(smp_processor_id(), false);
>> calculate_cpu_foreign_map();
>> local_irq_disable();
>> + rcutree_report_cpu_dead();
> I'm not sure, but maybe it is better to call this before local_irq_disable()?
rcutree_report_cpu_dead() starts with lockdep_assert_irqs_disabled() so
it needs IRQs disabled already.
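The ordering constraint can be illustrated with a userspace sketch. Everything here is a hypothetical stand-in: struct irq_model and the irq_*() helpers are invented for illustration, and the boolean return value only imitates the point at which lockdep_assert_irqs_disabled() would complain on a build with lockdep checking enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy per-CPU state; the field names are made up for this sketch. */
struct irq_model {
	bool irqs_disabled;
	bool reported_dead;
};

/* Stand-in for local_irq_disable(). */
static void irq_model_disable(struct irq_model *cpu)
{
	assert(cpu != NULL);
	cpu->irqs_disabled = true;
}

/*
 * Stand-in for rcutree_report_cpu_dead(). Returns false where the real
 * function's lockdep_assert_irqs_disabled() would complain.
 */
static bool irq_model_report_dead(struct irq_model *cpu)
{
	assert(cpu != NULL);
	if (!cpu->irqs_disabled)
		return false;
	cpu->reported_dead = true;
	return true;
}

/* Order used by the patch: disable IRQs first, then report. */
static bool irq_model_patch_order(void)
{
	struct irq_model cpu = { false, false };

	irq_model_disable(&cpu);
	return irq_model_report_dead(&cpu);
}

/* Order suggested in review: report first, then disable IRQs. */
static bool irq_model_swapped_order(void)
{
	struct irq_model cpu = { false, false };
	bool ok = irq_model_report_dead(&cpu);

	irq_model_disable(&cpu);
	return ok;
}
```

Only irq_model_patch_order() passes the check in this model, which is the reason given above for keeping the call after local_irq_disable().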
> Huacai
>> while (1);
>> }
>>
>> --
>> 2.51.0
>>
>>
Best,
Jonas
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
2026-06-04 18:24 [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu() Jonas Jelonek
2026-06-05 3:01 ` Huacai Chen
@ 2026-06-05 6:42 ` Sebastian Andrzej Siewior
2026-06-05 7:12 ` Jonas Jelonek
1 sibling, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-06-05 6:42 UTC (permalink / raw)
To: Jonas Jelonek
Cc: Thomas Bogendoerfer, linux-mips, Clark Williams, Steven Rostedt,
Thomas Gleixner, Jiayuan Chen, linux-rt-devel, linux-kernel,
stable
On 2026-06-04 18:24:07 [+0000], Jonas Jelonek wrote:
…
> This issue popped up during kernel bump downstream in OpenWrt from
> 6.18.33 to 6.18.34, since the suspected change has been backported to
> 6.18 stable branch [1].
I would avoid the link and simply write after the backport of the patch
or so.
> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
> waiting on the parked CPUs and grace periods can still complete.
This is part of cpuhp_report_idle_dead(). Is it now invoked twice? Or is
something else missing/ different?
Sebastian
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
2026-06-05 6:42 ` Sebastian Andrzej Siewior
@ 2026-06-05 7:12 ` Jonas Jelonek
2026-06-05 10:34 ` Sebastian Andrzej Siewior
2026-06-05 14:00 ` Huacai Chen
0 siblings, 2 replies; 9+ messages in thread
From: Jonas Jelonek @ 2026-06-05 7:12 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Thomas Bogendoerfer, linux-mips, Clark Williams, Steven Rostedt,
Thomas Gleixner, Jiayuan Chen, linux-rt-devel, linux-kernel,
stable
Hi Sebastian,
I'm not an expert in this area so please correct me if some claim or
explanation is wrong.
On 05.06.26 08:42, Sebastian Andrzej Siewior wrote:
> On 2026-06-04 18:24:07 [+0000], Jonas Jelonek wrote:
> …
>> This issue popped up during kernel bump downstream in OpenWrt from
>> 6.18.33 to 6.18.34, since the suspected change has been backported to
>> 6.18 stable branch [1].
> I would avoid the link and simply write after the backport of the patch
> or so.
Fine with that, I can adjust that in a v2.
>> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
>> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
>> waiting on the parked CPUs and grace periods can still complete.
> This is part of cpuhp_report_idle_dead(). Is it now invoked twice? Or is
> something else missing/ different?
Those seem to be two different paths. To be honest I'm not confident
under which circumstances which of those paths is used to take down
a CPU. In my case, issuing a reboot command reaches smp_send_stop()
where the issue explained in the patch message then happens.
> Sebastian
Best,
Jonas
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
2026-06-05 7:12 ` Jonas Jelonek
@ 2026-06-05 10:34 ` Sebastian Andrzej Siewior
2026-06-05 11:12 ` Jonas Jelonek
2026-06-05 14:00 ` Huacai Chen
1 sibling, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-06-05 10:34 UTC (permalink / raw)
To: Jonas Jelonek
Cc: Thomas Bogendoerfer, linux-mips, Clark Williams, Steven Rostedt,
Thomas Gleixner, Jiayuan Chen, linux-rt-devel, linux-kernel,
stable
On 2026-06-05 09:12:09 [+0200], Jonas Jelonek wrote:
> Hi Sebastian,
Hi,
> >> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
> >> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
> >> waiting on the parked CPUs and grace periods can still complete.
> > This is part of cpuhp_report_idle_dead(). Is it now invoked twice? Or is
> > something else missing/ different?
>
> Those seem to be two different paths. To be honest I'm not confident
> under which circumstances which of those paths is used to take down
> a CPU. In my case, issuing a reboot command reaches smp_send_stop()
> where the issue explained in the patch message then happens.
>
Does
echo 0 > /sys/devices/system/cpu/cpu1/online
lead to the same problem?
I missed that arm64 also has this, but only if the online path fails
fairly early, see
04e613ded8c26 ("arm64: smp: Tell RCU about CPUs that fail to come online")
so this is not the "normal" case but an exception. MIPS seems to be doing
something different here. I am not sure if this is the only thing that
is missing.
> Best,
> Jonas
Sebastian
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
2026-06-05 10:34 ` Sebastian Andrzej Siewior
@ 2026-06-05 11:12 ` Jonas Jelonek
2026-06-08 8:25 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 9+ messages in thread
From: Jonas Jelonek @ 2026-06-05 11:12 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Thomas Bogendoerfer, linux-mips, Clark Williams, Steven Rostedt,
Thomas Gleixner, Jiayuan Chen, linux-rt-devel, linux-kernel,
stable
Hi,
On 05.06.26 12:34, Sebastian Andrzej Siewior wrote:
> Does
> echo 0 > /sys/devices/system/cpu/cpu1/online
>
> lead to the same problem?
Funny, my device doesn't have this 'online' file, not for the other
CPUs either. So it seems CPU hotplug isn't supported/used here?
Or am I missing a Kconfig option for that?
I'm working on a Realtek RTL931x SoC here, it has MIPS interAptiv
cores. I can provide more information if needed.
> I missed that arm64 also has this, but only if the online path fails
> fairly early, see
> 04e613ded8c26 ("arm64: smp: Tell RCU about CPUs that fail to come online")
>
> so this is not the "normal" case but an exception. MIPS seems to be doing
> something different here. I am not sure if this is the only thing that
> is missing.
>
>> Best,
>> Jonas
> Sebastian
Best,
Jonas
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
2026-06-05 11:12 ` Jonas Jelonek
@ 2026-06-08 8:25 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-06-08 8:25 UTC (permalink / raw)
To: Jonas Jelonek
Cc: Thomas Bogendoerfer, linux-mips, Clark Williams, Steven Rostedt,
Thomas Gleixner, Jiayuan Chen, linux-rt-devel, linux-kernel,
stable
On 2026-06-05 13:12:38 [+0200], Jonas Jelonek wrote:
> Hi,
Hi,
> On 05.06.26 12:34, Sebastian Andrzej Siewior wrote:
> > Does
> > echo 0 > /sys/devices/system/cpu/cpu1/online
> >
> > lead to the same problem?
>
> Funny, my device doesn't have this 'online' file, not for the other
> CPUs either. So it seems CPU hotplug isn't supported/used here?
> Or am I missing a Kconfig option for that?
>
> I'm working on a Realtek RTL931x SoC here, it has MIPS interAptiv
> cores. I can provide more information if needed.
Looking at this again, it sort of makes sense. The arm64 case is an
exception, not the default. It appears that on MIPS it is the default
shutdown action, and probably reboot as well.
I would have to take another look at whether MIPS is really the only arch
doing this or whether others are affected, too.
No objections from my side. Please update the commit message to note
that arm64 is not the default but an exception (the CPU was going online
and is aborting) and that MIPS shuts down all CPUs but does not use the
CPU-hotplug mechanism for doing so.
> Best,
> Jonas
Sebastian
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
2026-06-05 7:12 ` Jonas Jelonek
2026-06-05 10:34 ` Sebastian Andrzej Siewior
@ 2026-06-05 14:00 ` Huacai Chen
1 sibling, 0 replies; 9+ messages in thread
From: Huacai Chen @ 2026-06-05 14:00 UTC (permalink / raw)
To: Jonas Jelonek
Cc: Sebastian Andrzej Siewior, Thomas Bogendoerfer, linux-mips,
Clark Williams, Steven Rostedt, Thomas Gleixner, Jiayuan Chen,
linux-rt-devel, linux-kernel, stable
On Fri, Jun 5, 2026 at 3:28 PM Jonas Jelonek <jelonek.jonas@gmail.com> wrote:
>
> Hi Sebastian,
>
> I'm not an expert in this area so please correct me if some claim or
> explanation is wrong.
>
> On 05.06.26 08:42, Sebastian Andrzej Siewior wrote:
> > On 2026-06-04 18:24:07 [+0000], Jonas Jelonek wrote:
> > …
> >> This issue popped up during kernel bump downstream in OpenWrt from
> >> 6.18.33 to 6.18.34, since the suspected change has been backported to
> >> 6.18 stable branch [1].
> > I would avoid the link and simply write after the backport of the patch
> > or so.
>
> Fine with that, I can adjust that in a v2.
>
> >> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
> >> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
> >> waiting on the parked CPUs and grace periods can still complete.
> > This is part of cpuhp_report_idle_dead(). Is it now invoked twice? Or is
> > something else missing/ different?
>
> Those seem to be two different paths. To be honest I'm not confident
> under which circumstances which of those paths is used to take down
> a CPU. In my case, issuing a reboot command reaches smp_send_stop()
> where the issue explained in the patch message then happens.
I think I know the reason. Halt/poweroff/reboot doesn't call CPU
hotplug functions to disable non-boot CPUs; instead it only calls
migrate_to_reboot_cpu() and then goes to the arch-specific code. The
arch-specific code doesn't call CPU hotplug functions either; it only
calls smp_send_stop() to send IPIs to the non-boot CPUs, which then
call stop_this_cpu(). This is why stop_this_cpu() needs
rcutree_report_cpu_dead().
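The path described here can be sketched as a userspace model. All of it is an assumption made for illustration: struct path_sys and the path_*() functions are hypothetical, the reboot CPU is assumed to be CPU 0, and the real functions do much more than flip flags. The sketch only shows that the stop IPI parks CPUs without ever entering a hotplug path, so RCU is told nothing unless stop_this_cpu() reports it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PATH_MAX_CPUS 8

/* Toy state: which CPUs run and which ones RCU has been told are dead. */
struct path_sys {
	size_t ncpus;
	bool running[PATH_MAX_CPUS];
	bool rcu_dead[PATH_MAX_CPUS];
	bool hotplug_used; /* stays false: this path never enters CPU hotplug */
};

static struct path_sys path_boot(size_t ncpus)
{
	struct path_sys s = { .ncpus = ncpus };

	assert(ncpus >= 1 && ncpus <= PATH_MAX_CPUS);
	for (size_t cpu = 0; cpu < ncpus; cpu++)
		s.running[cpu] = true;
	return s;
}

/* Stand-in for stop_this_cpu(); 'fixed' models the patch being applied. */
static void path_stop_this_cpu(struct path_sys *s, size_t cpu, bool fixed)
{
	s->running[cpu] = false;         /* parked in while (1) */
	if (fixed)
		s->rcu_dead[cpu] = true; /* rcutree_report_cpu_dead() */
}

/* Stand-in for smp_send_stop(): IPI every running CPU except the caller. */
static void path_smp_send_stop(struct path_sys *s, size_t self, bool fixed)
{
	for (size_t cpu = 0; cpu < s->ncpus; cpu++)
		if (cpu != self && s->running[cpu])
			path_stop_this_cpu(s, cpu, fixed);
}

/* Reboot as described: stay on the reboot CPU, stop the others by IPI. */
static void path_reboot(struct path_sys *s, bool fixed)
{
	const size_t reboot_cpu = 0; /* assumed target of migrate_to_reboot_cpu() */

	path_smp_send_stop(s, reboot_cpu, fixed);
}

/* Count parked CPUs that RCU still believes are alive. */
static size_t path_cpus_rcu_waits_on(const struct path_sys *s)
{
	size_t n = 0;

	for (size_t cpu = 0; cpu < s->ncpus; cpu++)
		if (!s->running[cpu] && !s->rcu_dead[cpu])
			n++;
	return n;
}
```

On a four-CPU model, path_reboot() without the fix leaves three parked CPUs that RCU still waits on; with the fix the count drops to zero, and hotplug_used stays false in both cases.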
Huacai
>
> > Sebastian
>
> Best,
> Jonas
>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-06-08 8:25 UTC | newest]
Thread overview: 9+ messages
2026-06-04 18:24 [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu() Jonas Jelonek
2026-06-05 3:01 ` Huacai Chen
2026-06-05 6:56 ` Jonas Jelonek
2026-06-05 6:42 ` Sebastian Andrzej Siewior
2026-06-05 7:12 ` Jonas Jelonek
2026-06-05 10:34 ` Sebastian Andrzej Siewior
2026-06-05 11:12 ` Jonas Jelonek
2026-06-08 8:25 ` Sebastian Andrzej Siewior
2026-06-05 14:00 ` Huacai Chen