Linux MIPS Architecture development
* [PATCH v2] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
@ 2026-06-08  9:37 Jonas Jelonek
  2026-06-10  6:05 ` Huacai Chen
  0 siblings, 1 reply; 7+ messages in thread
From: Jonas Jelonek @ 2026-06-08  9:37 UTC
  To: Thomas Bogendoerfer, linux-mips
  Cc: Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Thomas Gleixner, Jiayuan Chen, linux-rt-devel, linux-kernel,
	Jonas Jelonek, stable

smp_send_stop() parks all secondary CPUs in stop_this_cpu(). The function
marks the CPU offline for the scheduler via set_cpu_online(false) but
never informs RCU, so RCU keeps expecting a quiescent state from CPUs
that are now spinning forever with interrupts disabled.

As long as nothing waits for an RCU grace period after smp_send_stop(),
this is harmless, which is why it went unnoticed. Since commit
91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
however, irq_work_sync() calls synchronize_rcu() on architectures without
an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt() returns
false. That is the asm-generic default used by MIPS. Any irq_work_sync()
issued in the reboot/shutdown path after smp_send_stop() then blocks on
a grace period that can never complete, hanging the reboot:

  WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
  ...
  rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
  rcu: Offline CPU 1 blocking current GP.
  rcu: Offline CPU 2 blocking current GP.
  rcu: Offline CPU 3 blocking current GP.

This issue was noticed on several Realtek MIPS switch SoCs (MIPS
interAptiv) and came up during a downstream kernel bump in OpenWrt from
6.18.33 to 6.18.34, after that commit was backported to the 6.18 stable
branch. It has also been backported to stable branches as far back as 6.1.

Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
generic CPU-hotplug offline path, so RCU stops waiting on the parked CPUs
and grace periods can still complete. MIPS parks the secondary CPUs here
without going through the CPU-hotplug mechanism, so this report is not
otherwise issued. Reporting a dying CPU to RCU outside the regular hotplug
offline path is not unprecedented: arm64 does the same in cpu_die_early().
There, however, it is an exceptional path for a CPU that aborts bringup
while coming online, whereas on MIPS it is the regular shutdown action.

Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
Cc: stable@vger.kernel.org
Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>

diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index 4868e79f3b30..0f28b4a62e72 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -20,6 +20,7 @@
 #include <linux/sched/mm.h>
 #include <linux/cpumask.h>
 #include <linux/cpu.h>
+#include <linux/rcupdate.h>
 #include <linux/err.h>
 #include <linux/ftrace.h>
 #include <linux/irqdomain.h>
@@ -422,6 +423,7 @@ static void stop_this_cpu(void *dummy)
 	set_cpu_online(smp_processor_id(), false);
 	calculate_cpu_foreign_map();
 	local_irq_disable();
+	rcutree_report_cpu_dead();
 	while (1);
 }
 
-- 
2.51.0


* Re: [PATCH v2] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
  2026-06-08  9:37 [PATCH v2] MIPS: smp: report dying CPU to RCU in stop_this_cpu() Jonas Jelonek
@ 2026-06-10  6:05 ` Huacai Chen
  2026-06-15  7:00   ` Jonas Jelonek
  0 siblings, 1 reply; 7+ messages in thread
From: Huacai Chen @ 2026-06-10  6:05 UTC
  To: Jonas Jelonek
  Cc: Thomas Bogendoerfer, linux-mips, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, Thomas Gleixner, Jiayuan Chen,
	linux-rt-devel, linux-kernel, stable

Hi, Jonas,

On Mon, Jun 8, 2026 at 5:37 PM Jonas Jelonek <jelonek.jonas@gmail.com> wrote:
>
> smp_send_stop() parks all secondary CPUs in stop_this_cpu(). The function
> marks the CPU offline for the scheduler via set_cpu_online(false) but
> never informs RCU, so RCU keeps expecting a quiescent state from CPUs
> that are now spinning forever with interrupts disabled.
>
> As long as nothing waits for an RCU grace period after smp_send_stop()
> this is harmless, which is why it went unnoticed. Since commit
> 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
> however, irq_work_sync() calls synchronize_rcu() on architectures without
> an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt() returns
> false. That is the asm-generic default used by MIPS. Any irq_work_sync()
> issued in the reboot/shutdown path after smp_send_stop() then blocks on
> a grace period that can never complete, hanging the reboot:
>
>   WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
>   ...
>   rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
>   rcu: Offline CPU 1 blocking current GP.
>   rcu: Offline CPU 2 blocking current GP.
>   rcu: Offline CPU 3 blocking current GP.
In theory LoongArch has the same problem, but I cannot reproduce it.
Should I enable PREEMPT_RT, or are there some special configurations?

Huacai

* Re: [PATCH v2] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
  2026-06-10  6:05 ` Huacai Chen
@ 2026-06-15  7:00   ` Jonas Jelonek
  2026-06-15  7:09     ` Huacai Chen
  0 siblings, 1 reply; 7+ messages in thread
From: Jonas Jelonek @ 2026-06-15  7:00 UTC
  To: Huacai Chen
  Cc: Thomas Bogendoerfer, linux-mips, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, Thomas Gleixner, Jiayuan Chen,
	linux-rt-devel, linux-kernel, stable

Hi Huacai,

sorry for the reply delay.

On 10.06.26 08:05, Huacai Chen wrote:
> [...]
> In theory LoongArch has the same problem, but I cannot reproduce,
> should I enable PREEMPT_RT? Or there are some special configurations?

Sadly I cannot help with that. For MIPS, this seems to be the default
behavior.

Best,
Jonas

* Re: [PATCH v2] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
  2026-06-15  7:00   ` Jonas Jelonek
@ 2026-06-15  7:09     ` Huacai Chen
  2026-06-15  7:16       ` Jonas Jelonek
  0 siblings, 1 reply; 7+ messages in thread
From: Huacai Chen @ 2026-06-15  7:09 UTC
  To: Jonas Jelonek
  Cc: Thomas Bogendoerfer, linux-mips, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, Thomas Gleixner, Jiayuan Chen,
	linux-rt-devel, linux-kernel, stable

On Mon, Jun 15, 2026 at 3:00 PM Jonas Jelonek <jelonek.jonas@gmail.com> wrote:
>
> Hi Huacai,
>
> sorry for the reply delay.
>
> On 10.06.26 08:05, Huacai Chen wrote:
> > [...]
> > In theory LoongArch has the same problem, but I cannot reproduce,
> > should I enable PREEMPT_RT? Or there are some special configurations?
>
> Sadly I cannot help with that. For MIPS, this seems to be the default
> behavior.
This patch fixes 91840be8f710, and 91840be8f710 adds synchronize_rcu()
in irq_work_sync(). Your problem is caused by this synchronize_rcu(),
right?

However, synchronize_rcu() only gets called in the
IS_ENABLED(CONFIG_PREEMPT_RT) case, so I think your configuration
needs PREEMPT_RT, right?

You said this is the default behavior, but PREEMPT_RT is not enabled by default.

Huacai

* Re: [PATCH v2] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
  2026-06-15  7:09     ` Huacai Chen
@ 2026-06-15  7:16       ` Jonas Jelonek
  2026-06-15  7:30         ` Huacai Chen
  0 siblings, 1 reply; 7+ messages in thread
From: Jonas Jelonek @ 2026-06-15  7:16 UTC
  To: Huacai Chen
  Cc: Thomas Bogendoerfer, linux-mips, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, Thomas Gleixner, Jiayuan Chen,
	linux-rt-devel, linux-kernel, stable

Hi Huacai,

On 15.06.26 09:09, Huacai Chen wrote:
> On Mon, Jun 15, 2026 at 3:00 PM Jonas Jelonek <jelonek.jonas@gmail.com> wrote:
>> Hi Huacai,
>>
>> sorry for the reply delay.
>>
>> On 10.06.26 08:05, Huacai Chen wrote:
>>> [...]
>>> In theory LoongArch has the same problem, but I cannot reproduce,
>>> should I enable PREEMPT_RT? Or there are some special configurations?
>> Sadly I cannot help with that. For MIPS, this seems to be the default
>> behavior.
> This patch fixes 91840be8f710, and 91840be8f710 adds synchronize_rcu()
> in irq_work_sync(). Your problem is caused by this synchronize_rcu(),
> right?

Yes it is.

> However, synchronize_rcu() only gets called in the
> IS_ENABLED(CONFIG_PREEMPT_RT) case, so I think your configuration
> needs PREEMPT_RT, right?
>
> You said this is the default behavior, but PREEMPT_RT is not enabled by default.

The condition under which it is called has two parts, see [1]. While PREEMPT_RT
isn't enabled for MIPS, arch_irq_work_has_interrupt() returns false for MIPS
(MIPS has no implementation of its own, so the generic fallback is used). That
case also calls synchronize_rcu().

> Huacai
>

Best,
Jonas

[1] https://elixir.bootlin.com/linux/v7.1-rc7/source/kernel/irq_work.c#L291-L302

* Re: [PATCH v2] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
  2026-06-15  7:16       ` Jonas Jelonek
@ 2026-06-15  7:30         ` Huacai Chen
  2026-06-15  7:40           ` Jonas Jelonek
  0 siblings, 1 reply; 7+ messages in thread
From: Huacai Chen @ 2026-06-15  7:30 UTC
  To: Jonas Jelonek
  Cc: Thomas Bogendoerfer, linux-mips, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, Thomas Gleixner, Jiayuan Chen,
	linux-rt-devel, linux-kernel, stable

On Mon, Jun 15, 2026 at 3:16 PM Jonas Jelonek <jelonek.jonas@gmail.com> wrote:
>
> Hi Huacai,
>
> On 15.06.26 09:09, Huacai Chen wrote:
> > On Mon, Jun 15, 2026 at 3:00 PM Jonas Jelonek <jelonek.jonas@gmail.com> wrote:
> >> Hi Huacai,
> >>
> >> sorry for the reply delay.
> >>
> >> On 10.06.26 08:05, Huacai Chen wrote:
> >>> [...]
> >>> In theory LoongArch has the same problem, but I cannot reproduce,
> >>> should I enable PREEMPT_RT? Or there are some special configurations?
> >> Sadly I cannot help with that. For MIPS, this seems to be the default
> >> behavior.
> > This patch fixes 91840be8f710, and 91840be8f710 adds synchronize_rcu()
> > in irq_work_sync(). Your problem is caused by this synchronize_rcu(),
> > right?
>
> Yes it is.
>
> > However, synchronize_rcu() only gets called in the
> > IS_ENABLED(CONFIG_PREEMPT_RT) case, so I think your configuration
> > needs PREEMPT_RT, right?
> >
> > You said this is the default behavior, but PREEMPT_RT is not enabled by default.
>
> The condition where this is added has two parts, see [1]. While PREEMPT_RT
> isn't active for MIPS, arch_irq_work_has_interrupt gives false for MIPS (since
> there is no implementation and it falls back to the generic one). This then
> also calls synchronize_rcu.
Sorry, my mistake. Then what's your preemption model? There are too
many config files for MIPS now.

Huacai

>
> > Huacai
> >
>
> Best,
> Jonas
>
> [1] https://elixir.bootlin.com/linux/v7.1-rc7/source/kernel/irq_work.c#L291-L302

* Re: [PATCH v2] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
  2026-06-15  7:30         ` Huacai Chen
@ 2026-06-15  7:40           ` Jonas Jelonek
  0 siblings, 0 replies; 7+ messages in thread
From: Jonas Jelonek @ 2026-06-15  7:40 UTC
  To: Huacai Chen
  Cc: Thomas Bogendoerfer, linux-mips, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, Thomas Gleixner, Jiayuan Chen,
	linux-rt-devel, linux-kernel, stable

On 15.06.26 09:30, Huacai Chen wrote:
>> [...]
>>
>>> However, synchronize_rcu() only gets called in the
>>> IS_ENABLED(CONFIG_PREEMPT_RT) case, so I think your configuration
>>> needs PREEMPT_RT, right?
>>>
>>> You said this is the default behavior, but PREEMPT_RT is not enabled by default.
>> The condition where this is added has two parts, see [1]. While PREEMPT_RT
>> isn't active for MIPS, arch_irq_work_has_interrupt gives false for MIPS (since
>> there is no implementation and it falls back to the generic one). This then
>> also calls synchronize_rcu.
> Sorry, this is my mistake, then what's your preemption model? There
> are too many config files for MIPS now.

I'm using PREEMPT_NONE, which appears to be the default for all OpenWrt targets.

> Huacai
>
>>> Huacai
>>>
>> Best,
>> Jonas
>>
>> [1] https://elixir.bootlin.com/linux/v7.1-rc7/source/kernel/irq_work.c#L291-L302

Best,
Jonas

end of thread

Thread overview: 7+ messages
2026-06-08  9:37 [PATCH v2] MIPS: smp: report dying CPU to RCU in stop_this_cpu() Jonas Jelonek
2026-06-10  6:05 ` Huacai Chen
2026-06-15  7:00   ` Jonas Jelonek
2026-06-15  7:09     ` Huacai Chen
2026-06-15  7:16       ` Jonas Jelonek
2026-06-15  7:30         ` Huacai Chen
2026-06-15  7:40           ` Jonas Jelonek
