* [PATCH v3 1/3] firmware: arm_sdei: add SDEI_EVENT_SIGNAL support
2026-06-15 2:35 [PATCH v3 0/3] arm64: cross-CPU NMI via SDEI Kiryl Shutsemau
@ 2026-06-15 2:35 ` Kiryl Shutsemau
2026-06-15 2:35 ` [PATCH v3 2/3] drivers/firmware: add SDEI cross-CPU NMI service for arm64 Kiryl Shutsemau
2026-06-15 2:35 ` [PATCH v3 3/3] arm64: escalate smp_send_stop() to an SDEI NMI as a last resort Kiryl Shutsemau
2 siblings, 0 replies; 6+ messages in thread
From: Kiryl Shutsemau @ 2026-06-15 2:35 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, James Morse
Cc: Mark Rutland, Marc Zyngier, Doug Anderson, Petr Mladek,
Thomas Gleixner, Andrew Morton, Baoquan He, Puranjay Mohan,
Usama Arif, Breno Leitao, Julien Thierry, Lecopzer Chen,
Sumit Garg, kernel-team, kexec, linux-arm-kernel, linux-kernel,
Kiryl Shutsemau (Meta)
From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
Add sdei_event_signal(), a thin wrapper over the SDEI_EVENT_SIGNAL call
(DEN0054) that makes the software-signalled event (event 0) pending on a
target PE -- delivered NMI-like even when that PE has interrupts masked.
It takes no locks, so it is safe to call from NMI / crash context.
Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
---
drivers/firmware/arm_sdei.c | 12 ++++++++++++
include/linux/arm_sdei.h | 6 ++++++
include/uapi/linux/arm_sdei.h | 1 +
3 files changed, 19 insertions(+)
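For orientation, here is what the one-line uapi addition resolves to as an SMC function ID. This is a minimal user-space sketch, not part of the patch. It assumes the existing uapi header defines SDEI_1_0_FN(n) as SDEI_1_0_FN_BASE + (n) with a base of 0xC4000020, reproduced here from the upstream header rather than from this diff. sdei_fn_id() is a hypothetical helper used only for illustration.

```c
#include <stdint.h>

/* Assumed to mirror include/uapi/linux/arm_sdei.h; not added by this patch. */
#define SDEI_1_0_FN_BASE	0xC4000020u
#define SDEI_1_0_FN(n)		(SDEI_1_0_FN_BASE + (n))

/* The define this patch adds. */
#define SDEI_1_0_FN_SDEI_EVENT_SIGNAL	SDEI_1_0_FN(0x0F)

/* Hypothetical helper: the function ID for SDEI call number @n. */
static inline uint32_t sdei_fn_id(uint32_t n)
{
	return SDEI_1_0_FN(n);
}
```

Under that assumption, the new define evaluates to 0xC400002F, one above SDEI_INTERRUPT_RELEASE (0x0E).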
diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
index f39ed7ba3a38..e3fd604d9894 100644
--- a/drivers/firmware/arm_sdei.c
+++ b/drivers/firmware/arm_sdei.c
@@ -339,6 +339,18 @@ static void _ipi_unmask_cpu(void *ignored)
sdei_unmask_local_cpu();
}
+/*
+ * Signal the software-signalled event (event 0) to @mpidr. Does nothing
+ * but the SMC -- no locks, no event lookup -- so it is safe from NMI /
+ * crash context (e.g. the cross-CPU NMI service).
+ */
+int sdei_event_signal(u32 event_num, u64 mpidr)
+{
+ return invoke_sdei_fn(SDEI_1_0_FN_SDEI_EVENT_SIGNAL, event_num,
+ mpidr, 0, 0, 0, NULL);
+}
+NOKPROBE_SYMBOL(sdei_event_signal);
+
static void _ipi_private_reset(void *ignored)
{
int err;
diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h
index f652a5028b59..3f3ec01155e8 100644
--- a/include/linux/arm_sdei.h
+++ b/include/linux/arm_sdei.h
@@ -37,6 +37,12 @@ int sdei_event_unregister(u32 event_num);
int sdei_event_enable(u32 event_num);
int sdei_event_disable(u32 event_num);
+/*
+ * Signal the software-signalled event (event 0) to another PE, NMI-like.
+ * @mpidr is the target's MPIDR affinity.
+ */
+int sdei_event_signal(u32 event_num, u64 mpidr);
+
/* GHES register/unregister helpers */
int sdei_register_ghes(struct ghes *ghes, sdei_event_callback *normal_cb,
sdei_event_callback *critical_cb);
diff --git a/include/uapi/linux/arm_sdei.h b/include/uapi/linux/arm_sdei.h
index af0630ba5437..22eb61612673 100644
--- a/include/uapi/linux/arm_sdei.h
+++ b/include/uapi/linux/arm_sdei.h
@@ -22,6 +22,7 @@
#define SDEI_1_0_FN_SDEI_PE_UNMASK SDEI_1_0_FN(0x0C)
#define SDEI_1_0_FN_SDEI_INTERRUPT_BIND SDEI_1_0_FN(0x0D)
#define SDEI_1_0_FN_SDEI_INTERRUPT_RELEASE SDEI_1_0_FN(0x0E)
+#define SDEI_1_0_FN_SDEI_EVENT_SIGNAL SDEI_1_0_FN(0x0F)
#define SDEI_1_0_FN_SDEI_PRIVATE_RESET SDEI_1_0_FN(0x11)
#define SDEI_1_0_FN_SDEI_SHARED_RESET SDEI_1_0_FN(0x12)
--
2.54.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v3 2/3] drivers/firmware: add SDEI cross-CPU NMI service for arm64
2026-06-15 2:35 [PATCH v3 0/3] arm64: cross-CPU NMI via SDEI Kiryl Shutsemau
2026-06-15 2:35 ` [PATCH v3 1/3] firmware: arm_sdei: add SDEI_EVENT_SIGNAL support Kiryl Shutsemau
@ 2026-06-15 2:35 ` Kiryl Shutsemau
2026-06-15 10:18 ` Puranjay Mohan
2026-06-15 2:35 ` [PATCH v3 3/3] arm64: escalate smp_send_stop() to an SDEI NMI as a last resort Kiryl Shutsemau
2 siblings, 1 reply; 6+ messages in thread
From: Kiryl Shutsemau @ 2026-06-15 2:35 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, James Morse
Cc: Mark Rutland, Marc Zyngier, Doug Anderson, Petr Mladek,
Thomas Gleixner, Andrew Morton, Baoquan He, Puranjay Mohan,
Usama Arif, Breno Leitao, Julien Thierry, Lecopzer Chen,
Sumit Garg, kernel-team, kexec, linux-arm-kernel, linux-kernel,
Kiryl Shutsemau (Meta)
From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
Deliver an NMI-like event to an interrupt-masked arm64 CPU via the
standard SDEI software-signalled event (event 0), without the pseudo-NMI
hot-path cost: register a handler for event 0 and poke a target with
sdei_event_signal(0, mpidr).
First user is arch_trigger_cpumask_backtrace() (sysrq-l, RCU stalls,
hung-task/soft-lockup dumps), which otherwise rides an IPI that can't
reach a masked CPU. Falls back to the IPI path when SDEI is absent; no
watchdog backend yet, so the stock detector is untouched.
Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
---
MAINTAINERS | 2 +-
arch/arm64/include/asm/nmi.h | 24 +++++
arch/arm64/kernel/smp.c | 11 +++
drivers/firmware/Kconfig | 19 ++++
drivers/firmware/Makefile | 1 +
drivers/firmware/arm_sdei_nmi.c | 149 ++++++++++++++++++++++++++++++++
6 files changed, 205 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/include/asm/nmi.h
create mode 100644 drivers/firmware/arm_sdei_nmi.c
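The "provider first, IPI otherwise" shape described in the commit message can be modelled in user space. This is only a sketch under stated assumptions: CPU masks are plain 64-bit words, and every name below (provider_available, signalled, ipi_sent, provider_trigger_backtrace(), trigger_backtrace()) is a hypothetical stand-in, not a kernel symbol.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for kernel state; all names here are hypothetical. */
static bool provider_available;	/* models "the SDEI provider is active" */
static uint64_t signalled;	/* CPUs poked via the NMI-like path */
static uint64_t ipi_sent;	/* CPUs poked via the plain IPI path */

/* Drop @exclude_cpu from @mask; a negative value excludes nothing. */
static uint64_t mask_without(uint64_t mask, int exclude_cpu)
{
	if (exclude_cpu >= 0 && exclude_cpu < 64)
		mask &= ~(UINT64_C(1) << exclude_cpu);
	return mask;
}

/* Models the provider hook: returns false when inactive so the caller falls back. */
static bool provider_trigger_backtrace(uint64_t mask, int exclude_cpu)
{
	if (!provider_available)
		return false;
	signalled |= mask_without(mask, exclude_cpu);
	return true;
}

/* Models the arch entry point: try the provider, otherwise use the IPI path. */
static void trigger_backtrace(uint64_t mask, int exclude_cpu)
{
	if (provider_trigger_backtrace(mask, exclude_cpu))
		return;
	ipi_sent |= mask_without(mask, exclude_cpu);
}
```

The bool return is what keeps the arch caller unconditional: a stubbed or inactive provider simply returns false and the existing path runs unchanged.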
diff --git a/MAINTAINERS b/MAINTAINERS
index c8d4b913f26c..b5ddfb85dce9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -24797,7 +24797,7 @@ M: James Morse <james.morse@arm.com>
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
S: Maintained
F: Documentation/devicetree/bindings/arm/firmware/sdei.txt
-F: drivers/firmware/arm_sdei.c
+F: drivers/firmware/arm_sdei*
F: include/linux/arm_sdei.h
F: include/uapi/linux/arm_sdei.h
diff --git a/arch/arm64/include/asm/nmi.h b/arch/arm64/include/asm/nmi.h
new file mode 100644
index 000000000000..9366be419d18
--- /dev/null
+++ b/arch/arm64/include/asm/nmi.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_NMI_H
+#define __ASM_NMI_H
+
+#include <linux/cpumask.h>
+
+/*
+ * Cross-CPU NMI provider hooks, consulted by the arm64 arch code before
+ * its regular-IRQ / pseudo-NMI IPI paths. The SDEI provider in
+ * drivers/firmware/arm_sdei_nmi.c implements them when active; a future
+ * FEAT_NMI provider could slot in here too. The stubs let callers stay
+ * unconditional when ARM_SDEI_NMI is off.
+ */
+#ifdef CONFIG_ARM_SDEI_NMI
+bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu);
+#else
+static inline bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
+ int exclude_cpu)
+{
+ return false;
+}
+#endif
+
+#endif /* __ASM_NMI_H */
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 1aa324104afb..a670434a8cae 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -45,6 +45,7 @@
#include <asm/daifflags.h>
#include <asm/kvm_mmu.h>
#include <asm/mmu_context.h>
+#include <asm/nmi.h>
#include <asm/numa.h>
#include <asm/processor.h>
#include <asm/smp_plat.h>
@@ -927,6 +928,16 @@ static void arm64_backtrace_ipi(cpumask_t *mask)
void arch_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu)
{
+ /*
+ * Prefer the SDEI cross-CPU NMI provider when active: firmware
+ * dispatches the event out of EL3 and reaches CPUs that have
+ * interrupts locally masked, without the per-IRQ-mask cost that
+ * pseudo-NMI pays for the same reach. The plain IPI path below
+ * can't reach such a CPU unless pseudo-NMI is enabled.
+ */
+ if (sdei_nmi_trigger_cpumask_backtrace(mask, exclude_cpu))
+ return;
+
/*
* NOTE: though nmi_trigger_cpumask_backtrace() has "nmi_" in the name,
* nothing about it truly needs to be implemented using an NMI, it's
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index bbd2155d8483..6501087ff90d 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -36,6 +36,25 @@ config ARM_SDE_INTERFACE
standard for registering callbacks from the platform firmware
into the OS. This is typically used to implement RAS notifications.
+config ARM_SDEI_NMI
+ bool "SDEI-based cross-CPU NMI service (arm64)"
+ depends on ARM64 && ARM_SDE_INTERFACE
+ help
+ Provides SDEI-based cross-CPU NMI delivery for hooks that need
+ to reach interrupt-masked CPUs on silicon that lacks FEAT_NMI:
+
+ - arch_trigger_cpumask_backtrace() (sysrq-l, RCU stalls,
+ hardlockup_all_cpu_backtrace, soft-lockup secondary dumps,
+ hung-task auxiliary dumps)
+
+ The driver registers a handler for the SDEI software-signalled
+ event (event 0) and reaches a target CPU by signalling it with
+ SDEI_EVENT_SIGNAL. Firmware delivers the event out of EL3
+ regardless of the target's PSTATE.DAIF -- forced delivery into a
+ CPU wedged with interrupts locally masked.
+
+ If unsure, say N.
+
config EDD
tristate "BIOS Enhanced Disk Drive calls determine boot disk"
depends on X86
diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
index 4ddec2820c96..be46f1e1dc77 100644
--- a/drivers/firmware/Makefile
+++ b/drivers/firmware/Makefile
@@ -4,6 +4,7 @@
#
obj-$(CONFIG_ARM_SCPI_PROTOCOL) += arm_scpi.o
obj-$(CONFIG_ARM_SDE_INTERFACE) += arm_sdei.o
+obj-$(CONFIG_ARM_SDEI_NMI) += arm_sdei_nmi.o
obj-$(CONFIG_DMI) += dmi_scan.o
obj-$(CONFIG_DMI_SYSFS) += dmi-sysfs.o
obj-$(CONFIG_EDD) += edd.o
diff --git a/drivers/firmware/arm_sdei_nmi.c b/drivers/firmware/arm_sdei_nmi.c
new file mode 100644
index 000000000000..a82776e7b55a
--- /dev/null
+++ b/drivers/firmware/arm_sdei_nmi.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * arm64 SDEI-based cross-CPU NMI service.
+ *
+ * Delivering an "NMI-shaped" event to an EL1 context that has locally
+ * masked interrupts, on silicon without FEAT_NMI, can be done two ways:
+ *
+ * - pseudo-NMI: mask "interrupts" via the GIC priority register
+ * (ICC_PMR_EL1) instead of PSTATE.DAIF, leaving a high-priority band
+ * deliverable. Functionally this works -- but it reimplements every
+ * local_irq_disable()/enable() and exception entry/exit as a PMR
+ * write plus synchronisation, a cost paid on that hot path forever,
+ * whether or not an NMI is ever delivered.
+ *
+ * - SDEI: leave interrupt masking as the cheap PSTATE.DAIF operation
+ * and have the firmware bounce an EL3-routed Group-0 SGI back to
+ * NS-EL1 as an event callback. The cost is a firmware round-trip,
+ * but only at the rare moment delivery is actually needed.
+ *
+ * This driver takes the second path: it keeps the IRQ-mask hot path
+ * free and pays only when it fires, which is what makes cross-CPU NMI
+ * affordable on hardware where the pseudo-NMI tax isn't, until FEAT_NMI
+ * makes NMI masking cheap in the architecture itself.
+ *
+ * Capabilities provided:
+ *
+ * - sdei_nmi_trigger_cpumask_backtrace() — override for arm64's
+ * arch_trigger_cpumask_backtrace(), so sysrq-l, RCU stall dumps,
+ * hardlockup_all_cpu_backtrace, soft-lockup/hung-task secondary
+ * dumps all reach interrupt-masked CPUs.
+ *
+ * Delivery uses the standard SDEI software-signalled event (event 0) and
+ * SDEI_EVENT_SIGNAL. We register a handler for event 0, enable it, and
+ * poke a target CPU with sdei_event_signal(0, mpidr): firmware makes
+ * event 0 pending on that PE and dispatches the handler NMI-like,
+ * regardless of the target's DAIF.
+ * Availability is simply whether event 0 registers and enables -- if SDEI
+ * and its software-signalled event are present we use it, otherwise the
+ * driver stays inert.
+ */
+
+#define pr_fmt(fmt) "sdei_nmi: " fmt
+
+#include <linux/arm_sdei.h>
+#include <linux/cpumask.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/nmi.h>
+#include <linux/printk.h>
+#include <linux/ptrace.h>
+#include <linux/smp.h>
+#include <linux/types.h>
+
+#include <asm/nmi.h>
+#include <asm/smp_plat.h>
+
+static bool sdei_nmi_available;
+
+#define SDEI_NMI_EVENT 0
+
+static int sdei_nmi_handler(u32 event, struct pt_regs *regs, void *arg)
+{
+ /*
+ * nmi_cpu_backtrace() no-ops unless this CPU's bit is set in the
+ * global backtrace mask (driven by nmi_trigger_cpumask_backtrace()),
+ * so a fire that reaches a CPU not being backtraced is harmless.
+ */
+ nmi_cpu_backtrace(regs);
+ return SDEI_EV_HANDLED;
+}
+NOKPROBE_SYMBOL(sdei_nmi_handler);
+
+static void sdei_nmi_fire(unsigned int target_cpu)
+{
+ int err = sdei_event_signal(SDEI_NMI_EVENT, cpu_logical_map(target_cpu));
+
+ if (err)
+ pr_warn("SDEI_EVENT_SIGNAL to CPU %u failed: %d\n",
+ target_cpu, err);
+}
+
+/*
+ * Raise callback for nmi_trigger_cpumask_backtrace(): signal event 0
+ * at every CPU still pending in @mask. The framework excludes the local
+ * CPU from @mask before calling us.
+ */
+static void sdei_nmi_raise_backtrace(cpumask_t *mask)
+{
+ unsigned int cpu;
+
+ for_each_cpu(cpu, mask)
+ sdei_nmi_fire(cpu);
+}
+
+/*
+ * Override hook for arch_trigger_cpumask_backtrace() (see
+ * arch/arm64/kernel/smp.c). Returns true when SDEI handled the request,
+ * which is the case whenever SDEI is active; on a false return the arch
+ * falls back to its regular-IRQ (or pseudo-NMI, if enabled) IPI.
+ *
+ * On a kernel built without paying the pseudo-NMI hot-path cost (the
+ * usual case for this driver's target), the IPI can't reach a CPU that
+ * has interrupts masked -- so the backtrace of the one CPU you care
+ * about comes back empty. SDEI is dispatched out of EL3 and lands
+ * regardless of the target's DAIF, without taxing the IRQ-mask path.
+ */
+bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu)
+{
+ if (!sdei_nmi_available)
+ return false;
+
+ nmi_trigger_cpumask_backtrace(mask, exclude_cpu,
+ sdei_nmi_raise_backtrace);
+ return true;
+}
+
+/*
+ * device_initcall (after arch_initcall(sdei_init), so the SDEI subsystem
+ * is up): probe the firmware, register the event, and turn on the
+ * cross-CPU service. If the probe fails the driver stays inert and the
+ * override hooks decline, leaving the arch's own paths in place.
+ */
+static int __init sdei_nmi_init(void)
+{
+ int err;
+
+ err = sdei_event_register(SDEI_NMI_EVENT, sdei_nmi_handler, NULL);
+ if (err) {
+ pr_err("sdei_event_register(%u) failed: %d\n",
+ SDEI_NMI_EVENT, err);
+ return 0;
+ }
+
+ err = sdei_event_enable(SDEI_NMI_EVENT);
+ if (err) {
+ pr_err("sdei_event_enable(%u) failed: %d\n",
+ SDEI_NMI_EVENT, err);
+ sdei_event_unregister(SDEI_NMI_EVENT);
+ return 0;
+ }
+
+ sdei_nmi_available = true;
+ pr_info("using SDEI cross-CPU NMI (SDEI_EVENT_SIGNAL, event %u)\n",
+ SDEI_NMI_EVENT);
+
+ return 0;
+}
+device_initcall(sdei_nmi_init);
--
2.54.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v3 2/3] drivers/firmware: add SDEI cross-CPU NMI service for arm64
2026-06-15 2:35 ` [PATCH v3 2/3] drivers/firmware: add SDEI cross-CPU NMI service for arm64 Kiryl Shutsemau
@ 2026-06-15 10:18 ` Puranjay Mohan
0 siblings, 0 replies; 6+ messages in thread
From: Puranjay Mohan @ 2026-06-15 10:18 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Catalin Marinas, Will Deacon, James Morse, Mark Rutland,
Marc Zyngier, Doug Anderson, Petr Mladek, Thomas Gleixner,
Andrew Morton, Baoquan He, Usama Arif, Breno Leitao,
Julien Thierry, Lecopzer Chen, Sumit Garg, kernel-team, kexec,
linux-arm-kernel, linux-kernel, Kiryl Shutsemau (Meta)
On Mon, Jun 15, 2026 at 4:35 AM Kiryl Shutsemau <kirill@shutemov.name> wrote:
>
> From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
>
> Deliver an NMI-like event to an interrupt-masked arm64 CPU via the
> standard SDEI software-signalled event (event 0), without the pseudo-NMI
> hot-path cost: register a handler for event 0 and poke a target with
> sdei_event_signal(0, mpidr).
>
> First user is arch_trigger_cpumask_backtrace() (sysrq-l, RCU stalls,
> hung-task/soft-lockup dumps), which otherwise rides an IPI that can't
> reach a masked CPU. Falls back to the IPI path when SDEI is absent; no
> watchdog backend yet, so the stock detector is untouched.
>
> Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> Reviewed-by: Douglas Anderson <dianders@chromium.org>
> ---
> MAINTAINERS | 2 +-
> arch/arm64/include/asm/nmi.h | 24 +++++
> arch/arm64/kernel/smp.c | 11 +++
> drivers/firmware/Kconfig | 19 ++++
> drivers/firmware/Makefile | 1 +
> drivers/firmware/arm_sdei_nmi.c | 149 ++++++++++++++++++++++++++++++++
> 6 files changed, 205 insertions(+), 1 deletion(-)
> create mode 100644 arch/arm64/include/asm/nmi.h
> create mode 100644 drivers/firmware/arm_sdei_nmi.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c8d4b913f26c..b5ddfb85dce9 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -24797,7 +24797,7 @@ M: James Morse <james.morse@arm.com>
> L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
> S: Maintained
> F: Documentation/devicetree/bindings/arm/firmware/sdei.txt
> -F: drivers/firmware/arm_sdei.c
> +F: drivers/firmware/arm_sdei*
> F: include/linux/arm_sdei.h
> F: include/uapi/linux/arm_sdei.h
>
> diff --git a/arch/arm64/include/asm/nmi.h b/arch/arm64/include/asm/nmi.h
> new file mode 100644
> index 000000000000..9366be419d18
> --- /dev/null
> +++ b/arch/arm64/include/asm/nmi.h
> @@ -0,0 +1,24 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_NMI_H
> +#define __ASM_NMI_H
> +
> +#include <linux/cpumask.h>
> +
> +/*
> + * Cross-CPU NMI provider hooks, consulted by the arm64 arch code before
> + * its regular-IRQ / pseudo-NMI IPI paths. The SDEI provider in
> + * drivers/firmware/arm_sdei_nmi.c implements them when active; a future
> + * FEAT_NMI provider could slot in here too. The stubs let callers stay
> + * unconditional when ARM_SDEI_NMI is off.
> + */
> +#ifdef CONFIG_ARM_SDEI_NMI
> +bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu);
> +#else
> +static inline bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
> + int exclude_cpu)
> +{
> + return false;
> +}
> +#endif
> +
> +#endif /* __ASM_NMI_H */
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 1aa324104afb..a670434a8cae 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -45,6 +45,7 @@
> #include <asm/daifflags.h>
> #include <asm/kvm_mmu.h>
> #include <asm/mmu_context.h>
> +#include <asm/nmi.h>
> #include <asm/numa.h>
> #include <asm/processor.h>
> #include <asm/smp_plat.h>
> @@ -927,6 +928,16 @@ static void arm64_backtrace_ipi(cpumask_t *mask)
>
> void arch_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu)
> {
> + /*
> + * Prefer the SDEI cross-CPU NMI provider when active: firmware
> + * dispatches the event out of EL3 and reaches CPUs that have
> + * interrupts locally masked, without the per-IRQ-mask cost that
> + * pseudo-NMI pays for the same reach. The plain IPI path below
> + * can't reach such a CPU unless pseudo-NMI is enabled.
> + */
> + if (sdei_nmi_trigger_cpumask_backtrace(mask, exclude_cpu))
> + return;
> +
> /*
> * NOTE: though nmi_trigger_cpumask_backtrace() has "nmi_" in the name,
> * nothing about it truly needs to be implemented using an NMI, it's
> diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
> index bbd2155d8483..6501087ff90d 100644
> --- a/drivers/firmware/Kconfig
> +++ b/drivers/firmware/Kconfig
> @@ -36,6 +36,25 @@ config ARM_SDE_INTERFACE
> standard for registering callbacks from the platform firmware
> into the OS. This is typically used to implement RAS notifications.
>
> +config ARM_SDEI_NMI
> + bool "SDEI-based cross-CPU NMI service (arm64)"
> + depends on ARM64 && ARM_SDE_INTERFACE
> + help
> + Provides SDEI-based cross-CPU NMI delivery for hooks that need
> + to reach interrupt-masked CPUs on silicon that lacks FEAT_NMI:
> +
> + - arch_trigger_cpumask_backtrace() (sysrq-l, RCU stalls,
> + hardlockup_all_cpu_backtrace, soft-lockup secondary dumps,
> + hung-task auxiliary dumps)
> +
> + The driver registers a handler for the SDEI software-signalled
> + event (event 0) and reaches a target CPU by signalling it with
> + SDEI_EVENT_SIGNAL. Firmware delivers the event out of EL3
> + regardless of the target's PSTATE.DAIF -- forced delivery into a
> + CPU wedged with interrupts locally masked.
> +
> + If unsure, say N.
> +
> config EDD
> tristate "BIOS Enhanced Disk Drive calls determine boot disk"
> depends on X86
> diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
> index 4ddec2820c96..be46f1e1dc77 100644
> --- a/drivers/firmware/Makefile
> +++ b/drivers/firmware/Makefile
> @@ -4,6 +4,7 @@
> #
> obj-$(CONFIG_ARM_SCPI_PROTOCOL) += arm_scpi.o
> obj-$(CONFIG_ARM_SDE_INTERFACE) += arm_sdei.o
> +obj-$(CONFIG_ARM_SDEI_NMI) += arm_sdei_nmi.o
> obj-$(CONFIG_DMI) += dmi_scan.o
> obj-$(CONFIG_DMI_SYSFS) += dmi-sysfs.o
> obj-$(CONFIG_EDD) += edd.o
> diff --git a/drivers/firmware/arm_sdei_nmi.c b/drivers/firmware/arm_sdei_nmi.c
> new file mode 100644
> index 000000000000..a82776e7b55a
> --- /dev/null
> +++ b/drivers/firmware/arm_sdei_nmi.c
> @@ -0,0 +1,149 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * arm64 SDEI-based cross-CPU NMI service.
> + *
> + * Delivering an "NMI-shaped" event to an EL1 context that has locally
> + * masked interrupts, on silicon without FEAT_NMI, can be done two ways:
> + *
> + * - pseudo-NMI: mask "interrupts" via the GIC priority register
> + * (ICC_PMR_EL1) instead of PSTATE.DAIF, leaving a high-priority band
> + * deliverable. Functionally this works -- but it reimplements every
> + * local_irq_disable()/enable() and exception entry/exit as a PMR
> + * write plus synchronisation, a cost paid on that hot path forever,
> + * whether or not an NMI is ever delivered.
> + *
> + * - SDEI: leave interrupt masking as the cheap PSTATE.DAIF operation
> + * and have the firmware bounce an EL3-routed Group-0 SGI back to
> + * NS-EL1 as an event callback. The cost is a firmware round-trip,
> + * but only at the rare moment delivery is actually needed.
> + *
> + * This driver takes the second path: it keeps the IRQ-mask hot path
> + * free and pays only when it fires, which is what makes cross-CPU NMI
> + * affordable on hardware where the pseudo-NMI tax isn't, until FEAT_NMI
> + * makes NMI masking cheap in the architecture itself.
> + *
> + * Capabilities provided:
> + *
> + * - sdei_nmi_trigger_cpumask_backtrace() — override for arm64's
> + * arch_trigger_cpumask_backtrace(), so sysrq-l, RCU stall dumps,
> + * hardlockup_all_cpu_backtrace, soft-lockup/hung-task secondary
> + * dumps all reach interrupt-masked CPUs.
> + *
> + * Delivery uses the standard SDEI software-signalled event (event 0) and
> + * SDEI_EVENT_SIGNAL. We register a handler for event 0, enable it, and
> + * poke a target CPU with sdei_event_signal(0, mpidr): firmware makes
> + * event 0 pending on that PE and dispatches the handler NMI-like,
> + * regardless of the target's DAIF.
> + * Availability is simply whether event 0 registers and enables -- if SDEI
> + * and its software-signalled event are present we use it, otherwise the
> + * driver stays inert.
> + */
> +
> +#define pr_fmt(fmt) "sdei_nmi: " fmt
> +
> +#include <linux/arm_sdei.h>
> +#include <linux/cpumask.h>
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/kprobes.h>
> +#include <linux/nmi.h>
> +#include <linux/printk.h>
> +#include <linux/ptrace.h>
> +#include <linux/smp.h>
> +#include <linux/types.h>
> +
> +#include <asm/nmi.h>
> +#include <asm/smp_plat.h>
> +
> +static bool sdei_nmi_available;
> +
> +#define SDEI_NMI_EVENT 0
> +
> +static int sdei_nmi_handler(u32 event, struct pt_regs *regs, void *arg)
> +{
> + /*
> + * nmi_cpu_backtrace() no-ops unless this CPU's bit is set in the
> + * global backtrace mask (driven by nmi_trigger_cpumask_backtrace()),
> + * so a fire that reaches a CPU not being backtraced is harmless.
> + */
> + nmi_cpu_backtrace(regs);
> + return SDEI_EV_HANDLED;
> +}
> +NOKPROBE_SYMBOL(sdei_nmi_handler);
> +
> +static void sdei_nmi_fire(unsigned int target_cpu)
> +{
> + int err = sdei_event_signal(SDEI_NMI_EVENT, cpu_logical_map(target_cpu));
> +
> + if (err)
> + pr_warn("SDEI_EVENT_SIGNAL to CPU %u failed: %d\n",
> + target_cpu, err);
> +}
> +
> +/*
> + * Raise callback for nmi_trigger_cpumask_backtrace(): signal event 0
> + * at every CPU still pending in @mask. The framework excludes the local
> + * CPU from @mask before calling us.
> + */
> +static void sdei_nmi_raise_backtrace(cpumask_t *mask)
> +{
> + unsigned int cpu;
> +
> + for_each_cpu(cpu, mask)
> + sdei_nmi_fire(cpu);
> +}
> +
> +/*
> + * Override hook for arch_trigger_cpumask_backtrace() (see
> + * arch/arm64/kernel/smp.c). Returns true when SDEI handled the request,
> + * which is the case whenever SDEI is active; on a false return the arch
> + * falls back to its regular-IRQ (or pseudo-NMI, if enabled) IPI.
> + *
> + * On a kernel built without paying the pseudo-NMI hot-path cost (the
> + * usual case for this driver's target), the IPI can't reach a CPU that
> + * has interrupts masked -- so the backtrace of the one CPU you care
> + * about comes back empty. SDEI is dispatched out of EL3 and lands
> + * regardless of the target's DAIF, without taxing the IRQ-mask path.
> + */
> +bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu)
> +{
> + if (!sdei_nmi_available)
> + return false;
> +
> + nmi_trigger_cpumask_backtrace(mask, exclude_cpu,
> + sdei_nmi_raise_backtrace);
> + return true;
> +}
> +
> +/*
> + * device_initcall (after arch_initcall(sdei_init), so the SDEI subsystem
> + * is up): probe the firmware, register the event, and turn on the
> + * cross-CPU service. If the probe fails the driver stays inert and the
> + * override hooks decline, leaving the arch's own paths in place.
> + */
> +static int __init sdei_nmi_init(void)
> +{
> + int err;
> +
> + err = sdei_event_register(SDEI_NMI_EVENT, sdei_nmi_handler, NULL);
> + if (err) {
> + pr_err("sdei_event_register(%u) failed: %d\n",
> + SDEI_NMI_EVENT, err);
> + return 0;
> + }
This initcall runs unconditionally whenever ARM_SDEI_NMI is built in,
which includes the many arm64 systems that have no SDEI at all. On
those, sdei_event_register() -> sdei_event_create() ->
invoke_sdei_fn() returns -EIO, and the core already complains:
pr_warn("Failed to create event %u: %d\n", event_num, err);
(that one isn't gated on err != -EIO, unlike sdei_mask_local_cpu() & friends).
We then add a second pr_err() on top, so every boot on a non-SDEI box
with this config gets two alarming lines for what is just "no firmware
support". At minimum, don't shout for -EIO/-EOPNOTSUPP here. Better,
skip the probe when SDEI isn't present; there's no exported predicate
today, but -EIO is the de-facto one.
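One possible reading of that suggestion, as a user-space sketch rather than a proposed hunk: classify the error first, then pick the log level. The helper names and the log_level enum are hypothetical. Treating -EIO as "no SDEI firmware" follows the comment above and is a convention, not an exported kernel predicate.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: true when @err looks like "no SDEI firmware"
 * rather than a real failure. This is an assumption based on the
 * review comment, not an existing kernel API.
 */
static bool sdei_err_means_absent(int err)
{
	return err == -EIO || err == -EOPNOTSUPP;
}

/* Hypothetical log levels standing in for pr_debug() and pr_err(). */
enum log_level { LOG_NONE, LOG_DEBUG, LOG_ERR };

/* Pick how loudly to report the result of registering the event. */
static enum log_level register_failure_level(int err)
{
	if (!err)
		return LOG_NONE;
	return sdei_err_means_absent(err) ? LOG_DEBUG : LOG_ERR;
}
```

This only quiets the driver's own message; the pr_warn() in the SDEI core mentioned above would still need separate handling.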
> + err = sdei_event_enable(SDEI_NMI_EVENT);
> + if (err) {
> + pr_err("sdei_event_enable(%u) failed: %d\n",
> + SDEI_NMI_EVENT, err);
> + sdei_event_unregister(SDEI_NMI_EVENT);
> + return 0;
> + }
> +
> + sdei_nmi_available = true;
> + pr_info("using SDEI cross-CPU NMI (SDEI_EVENT_SIGNAL, event %u)\n",
> + SDEI_NMI_EVENT);
> +
> + return 0;
> +}
> +device_initcall(sdei_nmi_init);
> --
> 2.54.0
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v3 3/3] arm64: escalate smp_send_stop() to an SDEI NMI as a last resort
2026-06-15 2:35 [PATCH v3 0/3] arm64: cross-CPU NMI via SDEI Kiryl Shutsemau
2026-06-15 2:35 ` [PATCH v3 1/3] firmware: arm_sdei: add SDEI_EVENT_SIGNAL support Kiryl Shutsemau
2026-06-15 2:35 ` [PATCH v3 2/3] drivers/firmware: add SDEI cross-CPU NMI service for arm64 Kiryl Shutsemau
@ 2026-06-15 2:35 ` Kiryl Shutsemau
2026-06-15 10:25 ` Puranjay Mohan
2 siblings, 1 reply; 6+ messages in thread
From: Kiryl Shutsemau @ 2026-06-15 2:35 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, James Morse
Cc: Mark Rutland, Marc Zyngier, Doug Anderson, Petr Mladek,
Thomas Gleixner, Andrew Morton, Baoquan He, Puranjay Mohan,
Usama Arif, Breno Leitao, Julien Thierry, Lecopzer Chen,
Sumit Garg, kernel-team, kexec, linux-arm-kernel, linux-kernel,
Kiryl Shutsemau (Meta)
From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
A CPU wedged with interrupts masked ignores the stop IPI, and without
pseudo-NMI there is no NMI IPI to escalate to: a reboot proceeds with
the CPU still running, and a kdump misses its registers.
Add a third rung to smp_send_stop(): once the IPI (and pseudo-NMI IPI,
if enabled) rungs have run, signal SDEI event 0 at whatever stayed
online. Firmware delivers it regardless of the target's DAIF, so it
reaches a CPU a plain IPI cannot; the target acks by going offline,
which the caller already polls for.
Fold the stop bookkeeping into one arm64_nmi_cpu_stop(regs,
die_on_crash), shared by the stop IPI handlers, panic_smp_self_stop()
and the SDEI handler, replacing the near-duplicate local_cpu_stop() and
ipi_cpu_crash_stop(). @die_on_crash is the only difference: the IPI
handlers pass true and PSCI CPU_OFF the CPU on a crash stop so a capture
kernel can reclaim it; the SDEI handler and self-stop pass false and
park. The SDEI park is required, not conservative -- its handler runs
inside an SDEI event that is never completed (completing it resumes the
wedged context), and a CPU_OFF from that unfinished-event context wedges
EL3 on some firmware (left as a follow-up). The dump is unaffected; only
re-onlining the CPU in an SMP capture kernel is lost.
Suggested-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
---
arch/arm64/include/asm/nmi.h | 24 +++++++
arch/arm64/kernel/smp.c | 109 +++++++++++++++++++++-----------
drivers/firmware/Kconfig | 2 +
drivers/firmware/arm_sdei_nmi.c | 75 ++++++++++++++++++++++
4 files changed, 172 insertions(+), 38 deletions(-)
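The three-rung escalation described in the commit message can be modelled in user space. This is a sketch under stated assumptions, not the smp_send_stop() code from this patch: CPUs are bits in a 64-bit word, a reached CPU acks at once by clearing its online bit, and the timeouts between rungs are left out. struct stop_model, rung() and model_send_stop() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit n set means CPU n is in that state. */
struct stop_model {
	uint64_t online;	/* CPUs still marked online */
	uint64_t irqs_masked;	/* CPUs wedged with interrupts masked */
	bool pseudo_nmi;	/* pseudo-NMI stop IPI rung available */
	bool sdei_nmi;		/* SDEI event 0 rung available */
};

/* A rung stops the CPUs it can reach; they ack by going offline. */
static void rung(struct stop_model *m, uint64_t reachable)
{
	m->online &= ~reachable;
}

/* Returns the CPUs other than @self that are still online after all rungs. */
static uint64_t model_send_stop(struct stop_model *m, unsigned int self)
{
	uint64_t self_bit = UINT64_C(1) << (self & 63);

	/* Rung 1: plain stop IPI, ignored by CPUs with interrupts masked. */
	rung(m, m->online & ~self_bit & ~m->irqs_masked);

	/* Rung 2: pseudo-NMI stop IPI, only when the kernel has it enabled. */
	if (m->pseudo_nmi)
		rung(m, m->online & ~self_bit);

	/* Rung 3: SDEI event 0, delivered regardless of the target's DAIF. */
	if (m->sdei_nmi)
		rung(m, m->online & ~self_bit);

	return m->online & ~self_bit;
}
```

With neither pseudo-NMI nor SDEI available, a CPU that has interrupts masked stays online in this model, which is the case the commit message sets out to fix.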
diff --git a/arch/arm64/include/asm/nmi.h b/arch/arm64/include/asm/nmi.h
index 9366be419d18..2e8974ff8d63 100644
--- a/arch/arm64/include/asm/nmi.h
+++ b/arch/arm64/include/asm/nmi.h
@@ -4,21 +4,45 @@
#include <linux/cpumask.h>
+struct pt_regs;
+
/*
* Cross-CPU NMI provider hooks, consulted by the arm64 arch code before
* its regular-IRQ / pseudo-NMI IPI paths. The SDEI provider in
* drivers/firmware/arm_sdei_nmi.c implements them when active; a future
* FEAT_NMI provider could slot in here too. The stubs let callers stay
* unconditional when ARM_SDEI_NMI is off.
+ *
+ * sdei_nmi_active() lets a caller test for the service before committing
+ * to (and waiting on) the SDEI stop rung; sdei_nmi_stop_cpus() then signals
+ * the targets, which ack by going offline.
*/
#ifdef CONFIG_ARM_SDEI_NMI
bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu);
+bool sdei_nmi_active(void);
+void sdei_nmi_stop_cpus(const cpumask_t *mask);
#else
static inline bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
int exclude_cpu)
{
return false;
}
+
+static inline bool sdei_nmi_active(void)
+{
+ return false;
+}
+
+static inline void sdei_nmi_stop_cpus(const cpumask_t *mask) { }
#endif
+/*
+ * The common "stop this CPU" entry every arm64 stop path funnels through:
+ * the regular/pseudo-NMI stop IPI handlers, panic_smp_self_stop(), and the
+ * SDEI cross-CPU NMI handler. @die_on_crash powers the CPU off on the kdump
+ * crash path (IPI handlers) instead of parking it (SDEI / self-stop).
+ * Defined in arch/arm64/kernel/smp.c.
+ */
+void __noreturn arm64_nmi_cpu_stop(struct pt_regs *regs, bool die_on_crash);
+
#endif /* __ASM_NMI_H */
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index a670434a8cae..e85a4ba18d5c 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -33,6 +33,7 @@
#include <linux/kernel_stat.h>
#include <linux/kexec.h>
#include <linux/kgdb.h>
+#include <linux/kprobes.h>
#include <linux/kvm_host.h>
#include <linux/nmi.h>
@@ -862,14 +863,58 @@ void arch_irq_work_raise(void)
}
#endif
-static void __noreturn local_cpu_stop(unsigned int cpu)
+/*
+ * Bring the local CPU to a stop, saving its register state into the vmcore
+ * on the kdump crash path first. The single point every arm64 stop path
+ * funnels through, so the bookkeeping (mask interrupts, mark offline, mask
+ * SDEI, optionally power off) lives in one place:
+ *
+ * - the regular IPI_CPU_STOP and pseudo-NMI IPI_CPU_STOP_NMI handlers;
+ * - panic_smp_self_stop(), a CPU parking itself on a parallel panic();
+ * - the SDEI cross-CPU NMI handler (drivers/firmware/arm_sdei_nmi.c),
+ * which reaches CPUs the stop IPIs could not.
+ *
+ * @regs is the register state to record in the vmcore on a crash stop; NULL
+ * means "capture the current context". @die_on_crash decides the kdump crash
+ * path: the IPI stop handlers pass true and power the CPU off (PSCI CPU_OFF,
+ * via __cpu_try_die()) so a capture kernel can reclaim it. The SDEI handler
+ * and panic_smp_self_stop() pass false and only park. For SDEI that is
+ * required, not just conservative: it runs inside an SDEI event that is
+ * deliberately never completed (completing it has firmware resume the wedged
+ * context), and a CPU_OFF from that not-yet-completed context wedges EL3 on
+ * some firmware -- a documented follow-up. Parking also matches this path's
+ * own fallback when CPU_OFF is unavailable.
+ */
+void __noreturn arm64_nmi_cpu_stop(struct pt_regs *regs, bool die_on_crash)
 {
+	unsigned int cpu = smp_processor_id();
+	bool crash = IS_ENABLED(CONFIG_KEXEC_CORE) && crash_stop;
+
+	/*
+	 * Use local_daif_mask() instead of local_irq_disable() to make sure
+	 * that pseudo-NMIs are disabled. The "stop" code starts with an IRQ
+	 * and falls back to NMI (which might be pseudo). If the IRQ finally
+	 * goes through right as we're timing out then the NMI could interrupt
+	 * us. It's better to prevent the NMI and let the IRQ finish since the
+	 * pt_regs will be better.
+	 */
+	local_daif_mask();
+
+	if (crash)
+		crash_save_cpu(regs, cpu);
+
+	/* the ack a stop requester (e.g. smp_send_stop()) polls for */
 	set_cpu_online(cpu, false);
 
-	local_daif_mask();
 	sdei_mask_local_cpu();
+
+	if (crash && die_on_crash)
+		__cpu_try_die(cpu);
+
+	/* just in case */
 	cpu_park_loop();
 }
+NOKPROBE_SYMBOL(arm64_nmi_cpu_stop);
/*
* We need to implement panic_smp_self_stop() for parallel panic() calls, so
@@ -878,36 +923,7 @@ static void __noreturn local_cpu_stop(unsigned int cpu)
*/
void __noreturn panic_smp_self_stop(void)
{
- local_cpu_stop(smp_processor_id());
-}
-
-static void __noreturn ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
-{
-#ifdef CONFIG_KEXEC_CORE
- /*
- * Use local_daif_mask() instead of local_irq_disable() to make sure
- * that pseudo-NMIs are disabled. The "crash stop" code starts with
- * an IRQ and falls back to NMI (which might be pseudo). If the IRQ
- * finally goes through right as we're timing out then the NMI could
- * interrupt us. It's better to prevent the NMI and let the IRQ
- * finish since the pt_regs will be better.
- */
- local_daif_mask();
-
- crash_save_cpu(regs, cpu);
-
- set_cpu_online(cpu, false);
-
- sdei_mask_local_cpu();
-
- if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
- __cpu_try_die(cpu);
-
- /* just in case */
- cpu_park_loop();
-#else
- BUG();
-#endif
+ arm64_nmi_cpu_stop(NULL, false);
}
static void arm64_send_ipi(const cpumask_t *mask, unsigned int nr)
@@ -984,12 +1000,7 @@ static void do_handle_IPI(int ipinr)
case IPI_CPU_STOP:
case IPI_CPU_STOP_NMI:
- if (IS_ENABLED(CONFIG_KEXEC_CORE) && crash_stop) {
- ipi_cpu_crash_stop(cpu, get_irq_regs());
- unreachable();
- } else {
- local_cpu_stop(cpu);
- }
+ arm64_nmi_cpu_stop(get_irq_regs(), true);
break;
#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
@@ -1263,6 +1274,28 @@ void smp_send_stop(void)
udelay(1);
}
+ /*
+ * If CPUs are *still* online, try the SDEI cross-CPU NMI. Firmware
+ * delivers it regardless of the target's DAIF state, so it reaches
+ * a CPU spinning with interrupts masked, which neither rung above
+ * could (without pseudo-NMI there is no NMI rung at all). Allow
+ * 100ms: a firmware round-trip per CPU, with headroom.
+ */
+ if (num_other_online_cpus() && sdei_nmi_active()) {
+ /* re-snapshot after the rungs above took CPUs offline */
+ smp_rmb();
+ cpumask_copy(&mask, cpu_online_mask);
+ cpumask_clear_cpu(smp_processor_id(), &mask);
+
+ pr_info("SMP: retry stop with SDEI NMI for CPUs %*pbl\n",
+ cpumask_pr_args(&mask));
+
+ sdei_nmi_stop_cpus(&mask);
+ timeout = USEC_PER_MSEC * 100;
+ while (num_other_online_cpus() && timeout--)
+ udelay(1);
+ }
+
if (num_other_online_cpus()) {
smp_rmb();
cpumask_copy(&mask, cpu_online_mask);
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index 6501087ff90d..ab0ee36d46e7 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -46,6 +46,8 @@ config ARM_SDEI_NMI
- arch_trigger_cpumask_backtrace() (sysrq-l, RCU stalls,
hardlockup_all_cpu_backtrace, soft-lockup secondary dumps,
hung-task auxiliary dumps)
+ - smp_send_stop() escalation (reboot/halt and the
+ panic / kdump crash stop)
The driver registers a handler for the SDEI software-signalled
event (event 0) and reaches a target CPU by signalling it with
diff --git a/drivers/firmware/arm_sdei_nmi.c b/drivers/firmware/arm_sdei_nmi.c
index a82776e7b55a..b2a69be6008f 100644
--- a/drivers/firmware/arm_sdei_nmi.c
+++ b/drivers/firmware/arm_sdei_nmi.c
@@ -29,6 +29,11 @@
* hardlockup_all_cpu_backtrace, soft-lockup/hung-task secondary
* dumps all reach interrupt-masked CPUs.
*
+ * - sdei_nmi_stop_cpus() — the last rung of smp_send_stop()'s
+ * escalation (reboot/halt and the panic/kdump crash stop alike),
+ * reaching CPUs that ignored the stop IPIs; on the kdump path the
+ * wedged context is captured into the vmcore before the CPU parks.
+ *
* Delivery uses the standard SDEI software-signalled event (event 0) and
* SDEI_EVENT_SIGNAL. We register a handler for event 0, enable it, and
* poke a target CPU with sdei_event_signal(0, mpidr): firmware makes
@@ -59,8 +64,51 @@ static bool sdei_nmi_available;
#define SDEI_NMI_EVENT 0
+/*
+ * Backtrace and stop both ride SDEI event 0. That is not a chosen economy:
+ * event 0 is the only architecturally software-signalled event -- the sole
+ * event SDEI_EVENT_SIGNAL can target at an arbitrary PE. Every other event
+ * number is a firmware/platform interrupt-bound event, not something the
+ * kernel can raise cross-CPU, so a dedicated "stop" event would need
+ * firmware to define and bind it -- exactly the firmware dependency this
+ * driver sets out to avoid.
+ *
+ * Sharing one event means the handler must tell a stop apart from a
+ * backtrace. A stop is terminal and system-wide -- sdei_nmi_stop_cpus() is
+ * only reached from smp_send_stop() (reboot/halt/panic/kdump), which never
+ * returns -- so once a stop is requested, every later event-0 fire is a
+ * stop too. A single write-once flag therefore carries as much as a
+ * per-CPU mask would: sdei_nmi_stop_cpus() sets it before signalling, and
+ * the handler reads a set flag as "stop this CPU" and a clear flag as
+ * "backtrace" (handled by nmi_cpu_backtrace(), which self-gates on the
+ * framework's backtrace mask). A backtrace fire that races in after a stop
+ * has begun just stops that CPU instead -- harmless, it is going down.
+ */
+static bool sdei_nmi_stopping;
+
static int sdei_nmi_handler(u32 event, struct pt_regs *regs, void *arg)
{
+ if (READ_ONCE(sdei_nmi_stopping)) {
+ /*
+ * Never returns, and deliberately never completes the SDEI
+ * event: SDEI_EVENT_COMPLETE has firmware restore the
+ * interrupted context, which would land the CPU back in
+ * the wedged loop (or in do_idle, which BUGs at
+ * cpuhp_report_idle_dead once it sees itself offline).
+ * Returning a modified pt_regs doesn't help --
+ * arch/arm64/kernel/sdei.c::do_sdei_event only honours a PC
+ * override via its IRQ-state heuristic and otherwise hands
+ * EL3 its own saved-context slot back.
+ *
+ * Trade-off: EL3 retains ~one saved-context slot per parked
+ * CPU until the next hardware reset (~hundreds of bytes per
+ * CPU). Recoverability is unchanged versus an IPI-stopped
+ * CPU: neither comes back without a reset.
+ */
+ arm64_nmi_cpu_stop(regs, false);
+ /* unreachable */
+ }
+
/*
* nmi_cpu_backtrace() no-ops unless this CPU's bit is set in the
* global backtrace mask (driven by nmi_trigger_cpumask_backtrace()),
@@ -115,6 +163,33 @@ bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu)
return true;
}
+bool sdei_nmi_active(void)
+{
+ return sdei_nmi_available;
+}
+
+/*
+ * Last rung of the stop escalation in smp_send_stop() (see
+ * arch/arm64/kernel/smp.c). The caller runs the regular stop IPI (and
+ * the pseudo-NMI stop IPI, where available) first; @mask holds whatever
+ * stayed online through those -- typically CPUs wedged with interrupts
+ * masked, unreachable by an IPI. Mark the stop in progress and signal
+ * event 0 at each target; a target acks by marking itself offline, which
+ * the caller polls for. The caller has already confirmed sdei_nmi_active().
+ */
+void sdei_nmi_stop_cpus(const cpumask_t *mask)
+{
+ unsigned int cpu;
+
+ WRITE_ONCE(sdei_nmi_stopping, true);
+
+ /* Publish the flag before the SMCs make targets read it */
+ smp_wmb();
+
+ for_each_cpu(cpu, mask)
+ sdei_nmi_fire(cpu);
+}
+
/*
* device_initcall (after arch_initcall(sdei_init), so the SDEI subsystem
* is up): probe the firmware, register the event, and turn on the
--
2.54.0
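
For illustration only, the write-once flag protocol described in the
arm_sdei_nmi.c comment (set the flag once before signalling, have the
handler read it to choose between "stop" and "backtrace") can be modelled
in user space. This is a minimal sketch under stated assumptions, not the
driver's code: the names stopping, request_stop(), handler_action() and
enum nmi_action are hypothetical, and C11 release/acquire atomics merely
stand in for the kernel's WRITE_ONCE() + smp_wmb() and READ_ONCE(), which
do not have identical semantics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical: what the shared event-0 handler decides to do. */
enum nmi_action { NMI_BACKTRACE, NMI_STOP };

/* A clear flag must mean "backtrace", matching the default state. */
static_assert(NMI_BACKTRACE == 0, "clear flag maps to backtrace");

/* Write-once flag; zero-initialised, so it starts out clear. */
static atomic_bool stopping;

/*
 * Requester side: publish the flag before any target is signalled.
 * Assumption: a release store models WRITE_ONCE() followed by smp_wmb().
 */
static void request_stop(void)
{
	atomic_store_explicit(&stopping, true, memory_order_release);
}

/*
 * Handler side: a set flag means "stop this CPU", a clear flag means
 * "backtrace". The flag is never cleared, so every later call reports
 * NMI_STOP as well.
 */
static enum nmi_action handler_action(void)
{
	if (atomic_load_explicit(&stopping, memory_order_acquire))
		return NMI_STOP;
	return NMI_BACKTRACE;
}
```

Before request_stop() runs, handler_action() reports NMI_BACKTRACE; after
it, every call reports NMI_STOP, which mirrors the "once a stop is
requested, every later event-0 fire is a stop too" argument.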
* Re: [PATCH v3 3/3] arm64: escalate smp_send_stop() to an SDEI NMI as a last resort
2026-06-15 2:35 ` [PATCH v3 3/3] arm64: escalate smp_send_stop() to an SDEI NMI as a last resort Kiryl Shutsemau
@ 2026-06-15 10:25 ` Puranjay Mohan
0 siblings, 0 replies; 6+ messages in thread
From: Puranjay Mohan @ 2026-06-15 10:25 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Catalin Marinas, Will Deacon, James Morse, Mark Rutland,
Marc Zyngier, Doug Anderson, Petr Mladek, Thomas Gleixner,
Andrew Morton, Baoquan He, Usama Arif, Breno Leitao,
Julien Thierry, Lecopzer Chen, Sumit Garg, kernel-team, kexec,
linux-arm-kernel, linux-kernel, Kiryl Shutsemau (Meta)
On Mon, Jun 15, 2026 at 4:36 AM Kiryl Shutsemau <kirill@shutemov.name> wrote:
>
> From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
>
> A CPU wedged with interrupts masked ignores the stop IPI, and without
> pseudo-NMI there is no NMI IPI to escalate to: a reboot proceeds with
> the CPU still running, and a kdump misses its registers.
>
> Add a third rung to smp_send_stop(): once the IPI (and pseudo-NMI IPI,
> if enabled) rungs have run, signal SDEI event 0 at whatever stayed
> online. Firmware delivers it regardless of the target's DAIF, so it
> reaches a CPU a plain IPI cannot; the target acks by going offline,
> which the caller already polls for.
>
> Fold the stop bookkeeping into one arm64_nmi_cpu_stop(regs,
> die_on_crash), shared by the stop IPI handlers, panic_smp_self_stop()
> and the SDEI handler, replacing the near-duplicate local_cpu_stop() and
> ipi_cpu_crash_stop(). @die_on_crash is the only difference: the IPI
> handlers pass true and PSCI CPU_OFF the CPU on a crash stop so a capture
> kernel can reclaim it; the SDEI handler and self-stop pass false and
> park. The SDEI park is required, not conservative -- its handler runs
> inside an SDEI event that is never completed (completing it resumes the
> wedged context), and a CPU_OFF from that unfinished-event context wedges
> EL3 on some firmware (left as a follow-up). The dump is unaffected; only
> re-onlining the CPU in an SMP capture kernel is lost.
>
> Suggested-by: Douglas Anderson <dianders@chromium.org>
> Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> ---
> arch/arm64/include/asm/nmi.h | 24 +++++++
> arch/arm64/kernel/smp.c | 109 +++++++++++++++++++++-----------
> drivers/firmware/Kconfig | 2 +
> drivers/firmware/arm_sdei_nmi.c | 75 ++++++++++++++++++++++
> 4 files changed, 172 insertions(+), 38 deletions(-)
>
> diff --git a/arch/arm64/include/asm/nmi.h b/arch/arm64/include/asm/nmi.h
> index 9366be419d18..2e8974ff8d63 100644
> --- a/arch/arm64/include/asm/nmi.h
> +++ b/arch/arm64/include/asm/nmi.h
> @@ -4,21 +4,45 @@
>
> #include <linux/cpumask.h>
>
> +struct pt_regs;
> +
> /*
> * Cross-CPU NMI provider hooks, consulted by the arm64 arch code before
> * its regular-IRQ / pseudo-NMI IPI paths. The SDEI provider in
> * drivers/firmware/arm_sdei_nmi.c implements them when active; a future
> * FEAT_NMI provider could slot in here too. The stubs let callers stay
> * unconditional when ARM_SDEI_NMI is off.
> + *
> + * sdei_nmi_active() lets a caller test for the service before committing
> + * to (and waiting on) the SDEI stop rung; sdei_nmi_stop_cpus() then signals
> + * the targets, which ack by going offline.
> */
> #ifdef CONFIG_ARM_SDEI_NMI
> bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu);
> +bool sdei_nmi_active(void);
> +void sdei_nmi_stop_cpus(const cpumask_t *mask);
> #else
> static inline bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
> int exclude_cpu)
> {
> return false;
> }
> +
> +static inline bool sdei_nmi_active(void)
> +{
> + return false;
> +}
> +
> +static inline void sdei_nmi_stop_cpus(const cpumask_t *mask) { }
> #endif
>
> +/*
> + * The common "stop this CPU" entry every arm64 stop path funnels through:
> + * the regular/pseudo-NMI stop IPI handlers, panic_smp_self_stop(), and the
> + * SDEI cross-CPU NMI handler. @die_on_crash powers the CPU off on the kdump
> + * crash path (IPI handlers) instead of parking it (SDEI / self-stop).
> + * Defined in arch/arm64/kernel/smp.c.
> + */
> +void __noreturn arm64_nmi_cpu_stop(struct pt_regs *regs, bool die_on_crash);
> +
> #endif /* __ASM_NMI_H */
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index a670434a8cae..e85a4ba18d5c 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -33,6 +33,7 @@
> #include <linux/kernel_stat.h>
> #include <linux/kexec.h>
> #include <linux/kgdb.h>
> +#include <linux/kprobes.h>
> #include <linux/kvm_host.h>
> #include <linux/nmi.h>
>
> @@ -862,14 +863,58 @@ void arch_irq_work_raise(void)
> }
> #endif
>
> -static void __noreturn local_cpu_stop(unsigned int cpu)
> +/*
> + * Bring the local CPU to a stop, saving its register state into the vmcore
> + * on the kdump crash path first. The single point every arm64 stop path
> + * funnels through, so the bookkeeping (mask interrupts, mark offline, mask
> + * SDEI, optionally power off) lives in one place:
> + *
> + * - the regular IPI_CPU_STOP and pseudo-NMI IPI_CPU_STOP_NMI handlers;
> + * - panic_smp_self_stop(), a CPU parking itself on a parallel panic();
> + * - the SDEI cross-CPU NMI handler (drivers/firmware/arm_sdei_nmi.c),
> + * which reaches CPUs the stop IPIs could not.
> + *
> + * @regs is the register state to record in the vmcore on a crash stop; NULL
> + * means "capture the current context". @die_on_crash decides the kdump crash
> + * path: the IPI stop handlers pass true and power the CPU off (PSCI CPU_OFF,
> + * via __cpu_try_die()) so a capture kernel can reclaim it. The SDEI handler
> + * and panic_smp_self_stop() pass false and only park. For SDEI that is
> + * required, not just conservative: it runs inside an SDEI event that is
> + * deliberately never completed (completing it has firmware resume the wedged
> + * context), and a CPU_OFF from that not-yet-completed context wedges EL3 on
> + * some firmware -- a documented follow-up. Parking also matches this path's
> + * own fallback when CPU_OFF is unavailable.
> + */
> +void __noreturn arm64_nmi_cpu_stop(struct pt_regs *regs, bool die_on_crash)
> {
> + unsigned int cpu = smp_processor_id();
> + bool crash = IS_ENABLED(CONFIG_KEXEC_CORE) && crash_stop;
> +
> + /*
> + * Use local_daif_mask() instead of local_irq_disable() to make sure
> + * that pseudo-NMIs are disabled. The "stop" code starts with an IRQ
> + * and falls back to NMI (which might be pseudo). If the IRQ finally
> + * goes through right as we're timing out then the NMI could interrupt
> + * us. It's better to prevent the NMI and let the IRQ finish since the
> + * pt_regs will be better.
> + */
> + local_daif_mask();
> +
> + if (crash)
> + crash_save_cpu(regs, cpu);
> +
> + /* the ack a stop requester (e.g. smp_send_stop()) polls for */
> set_cpu_online(cpu, false);
>
> - local_daif_mask();
> sdei_mask_local_cpu();
> +
> + if (crash && die_on_crash)
> + __cpu_try_die(cpu);
> +
> + /* just in case */
> cpu_park_loop();
> }
> +NOKPROBE_SYMBOL(arm64_nmi_cpu_stop);
>
> /*
> * We need to implement panic_smp_self_stop() for parallel panic() calls, so
> @@ -878,36 +923,7 @@ static void __noreturn local_cpu_stop(unsigned int cpu)
> */
> void __noreturn panic_smp_self_stop(void)
> {
> - local_cpu_stop(smp_processor_id());
> -}
> -
> -static void __noreturn ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
> -{
> -#ifdef CONFIG_KEXEC_CORE
> - /*
> - * Use local_daif_mask() instead of local_irq_disable() to make sure
> - * that pseudo-NMIs are disabled. The "crash stop" code starts with
> - * an IRQ and falls back to NMI (which might be pseudo). If the IRQ
> - * finally goes through right as we're timing out then the NMI could
> - * interrupt us. It's better to prevent the NMI and let the IRQ
> - * finish since the pt_regs will be better.
> - */
> - local_daif_mask();
> -
> - crash_save_cpu(regs, cpu);
> -
> - set_cpu_online(cpu, false);
> -
> - sdei_mask_local_cpu();
> -
> - if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
> - __cpu_try_die(cpu);
> -
> - /* just in case */
> - cpu_park_loop();
> -#else
> - BUG();
> -#endif
> + arm64_nmi_cpu_stop(NULL, false);
> }

panic_smp_self_stop() passes regs == NULL. If a second CPU panics
after the primary has already set crash_stop, it loses the panic_cpu
cmpxchg and calls panic_smp_self_stop(); if it was running with
interrupts masked it never took the stop IPI, so it gets here with
crash_stop == 1. crash is then true and we do crash_save_cpu(NULL,
cpu), which ends up in elf_core_copy_regs(), and on arm64 that is just

	*(struct user_pt_regs *)&(dest) = (regs)->user_regs;

so a straight NULL deref -> synchronous abort while we're in the
middle of crashing. The old local_cpu_stop() never called
crash_save_cpu(), so this is a new regression from the unification.

The comment above the function says NULL means "capture the current
context", but crash_save_cpu() doesn't do that, it just dereferences
regs. If that's the intent, materialise it:

	if (crash) {
		struct pt_regs local;
		if (!regs) {
			crash_setup_regs(&local, NULL);
			regs = &local;
		}
		crash_save_cpu(regs, cpu);
	}

crash_setup_regs(..., NULL) is the existing "capture current" helper. Or just
skip the save when regs is NULL if the self-stop registers aren't
worth having.
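
The shape of the guard suggested above can be tried out in user space.
This is a hedged sketch rather than kernel code: struct fake_regs,
save_cpu(), capture_current() and stop_path_save() are hypothetical
stand-ins for struct pt_regs, crash_save_cpu(), crash_setup_regs(&local,
NULL) and the crash branch of arm64_nmi_cpu_stop(), and the captured
values are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pt_regs. */
struct fake_regs {
	unsigned long pc;
	unsigned long sp;
};

/* Stand-in for the per-CPU crash note that the save routine fills in. */
static struct fake_regs saved_note;
static bool note_valid;

/*
 * Models crash_save_cpu(): it copies *regs unconditionally, so a NULL
 * pointer is a caller bug. The assert makes that precondition explicit.
 */
static void save_cpu(const struct fake_regs *regs)
{
	assert(regs != NULL);
	saved_note = *regs;
	note_valid = true;
}

/* Models the "capture current context" helper with placeholder values. */
static void capture_current(struct fake_regs *out)
{
	out->pc = 0x1000;
	out->sp = 0x2000;
}

/*
 * Models the crash branch of the stop path: when the caller has no
 * interrupted context to hand over (regs == NULL), materialise one
 * locally before saving instead of passing NULL through.
 */
static void stop_path_save(const struct fake_regs *regs, bool crash)
{
	struct fake_regs local;

	if (!crash)
		return;

	if (!regs) {
		capture_current(&local);
		regs = &local;
	}
	save_cpu(regs);
}
```

Calling stop_path_save(NULL, true) then records the locally captured
context instead of dereferencing NULL, while a non-NULL regs pointer is
saved unchanged.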