* [PATCH v4 1/4] firmware: arm_sdei: add sdei_is_present()
2026-06-17 19:20 [PATCH v4 0/4] arm64: cross-CPU NMI via SDEI Kiryl Shutsemau
@ 2026-06-17 19:20 ` Kiryl Shutsemau
2026-06-17 20:02 ` Doug Anderson
2026-06-17 19:20 ` [PATCH v4 2/4] firmware: arm_sdei: add SDEI_EVENT_SIGNAL support Kiryl Shutsemau
` (2 subsequent siblings)
3 siblings, 1 reply; 7+ messages in thread
From: Kiryl Shutsemau @ 2026-06-17 19:20 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, James Morse
Cc: Mark Rutland, Marc Zyngier, Doug Anderson, Petr Mladek,
Thomas Gleixner, Andrew Morton, Baoquan He, Puranjay Mohan,
Usama Arif, Breno Leitao, Julien Thierry, Lecopzer Chen,
Sumit Garg, kernel-team, kexec, linux-arm-kernel, linux-kernel,
Kiryl Shutsemau (Meta)
From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
invoke_sdei_fn() returns -EIO when no SDEI conduit was probed, and the
core warns ("Failed to create event ...") on any registration that hits
that. An optional consumer that registers an event from an unconditional
initcall would therefore make every boot on a non-SDEI system emit that
warning for what is simply absent firmware.
Expose whether SDEI firmware is present so such a consumer can skip
registration -- and the warning -- when there is nothing to talk to.
Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
---
drivers/firmware/arm_sdei.c | 10 ++++++++++
include/linux/arm_sdei.h | 3 +++
2 files changed, 13 insertions(+)
diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
index f39ed7ba3a38..c161cf263547 100644
--- a/drivers/firmware/arm_sdei.c
+++ b/drivers/firmware/arm_sdei.c
@@ -339,6 +339,16 @@ static void _ipi_unmask_cpu(void *ignored)
sdei_unmask_local_cpu();
}
+/*
+ * Was SDEI firmware probed and is it usable? Lets optional consumers skip
+ * registering an event -- and the warning a failed registration emits -- on
+ * systems with no SDEI.
+ */
+bool sdei_is_present(void)
+{
+ return sdei_firmware_call;
+}
+
static void _ipi_private_reset(void *ignored)
{
int err;
diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h
index f652a5028b59..b07113eeeff7 100644
--- a/include/linux/arm_sdei.h
+++ b/include/linux/arm_sdei.h
@@ -37,6 +37,9 @@ int sdei_event_unregister(u32 event_num);
int sdei_event_enable(u32 event_num);
int sdei_event_disable(u32 event_num);
+/* Was SDEI firmware probed and usable? */
+bool sdei_is_present(void);
+
/* GHES register/unregister helpers */
int sdei_register_ghes(struct ghes *ghes, sdei_event_callback *normal_cb,
sdei_event_callback *critical_cb);
--
2.54.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH v4 1/4] firmware: arm_sdei: add sdei_is_present()
2026-06-17 19:20 ` [PATCH v4 1/4] firmware: arm_sdei: add sdei_is_present() Kiryl Shutsemau
@ 2026-06-17 20:02 ` Doug Anderson
0 siblings, 0 replies; 7+ messages in thread
From: Doug Anderson @ 2026-06-17 20:02 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Catalin Marinas, Will Deacon, James Morse, Mark Rutland,
Marc Zyngier, Petr Mladek, Thomas Gleixner, Andrew Morton,
Baoquan He, Puranjay Mohan, Usama Arif, Breno Leitao,
Julien Thierry, Lecopzer Chen, Sumit Garg, kernel-team, kexec,
linux-arm-kernel, linux-kernel, Kiryl Shutsemau (Meta)
Hi,
On Wed, Jun 17, 2026 at 12:20 PM Kiryl Shutsemau <kirill@shutemov.name> wrote:
>
> From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
>
> invoke_sdei_fn() returns -EIO when no SDEI conduit was probed, and the
> core warns ("Failed to create event ...") on any registration that hits
> that. An optional consumer that registers an event from an unconditional
> initcall would therefore make every boot on a non-SDEI system emit that
> warning for what is simply absent firmware.
>
> Expose whether SDEI firmware is present so such a consumer can skip
> registration -- and the warning -- when there is nothing to talk to.
>
> Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> ---
> drivers/firmware/arm_sdei.c | 10 ++++++++++
> include/linux/arm_sdei.h | 3 +++
> 2 files changed, 13 insertions(+)
From a non-expert in SDEI, but FWIW:
Reviewed-by: Douglas Anderson <dianders@chromium.org>
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v4 2/4] firmware: arm_sdei: add SDEI_EVENT_SIGNAL support
2026-06-17 19:20 [PATCH v4 0/4] arm64: cross-CPU NMI via SDEI Kiryl Shutsemau
2026-06-17 19:20 ` [PATCH v4 1/4] firmware: arm_sdei: add sdei_is_present() Kiryl Shutsemau
@ 2026-06-17 19:20 ` Kiryl Shutsemau
2026-06-17 19:20 ` [PATCH v4 3/4] drivers/firmware: add SDEI cross-CPU NMI service for arm64 Kiryl Shutsemau
2026-06-17 19:20 ` [PATCH v4 4/4] arm64: escalate smp_send_stop() to an SDEI NMI as a last resort Kiryl Shutsemau
3 siblings, 0 replies; 7+ messages in thread
From: Kiryl Shutsemau @ 2026-06-17 19:20 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, James Morse
Cc: Mark Rutland, Marc Zyngier, Doug Anderson, Petr Mladek,
Thomas Gleixner, Andrew Morton, Baoquan He, Puranjay Mohan,
Usama Arif, Breno Leitao, Julien Thierry, Lecopzer Chen,
Sumit Garg, kernel-team, kexec, linux-arm-kernel, linux-kernel,
Kiryl Shutsemau (Meta)
From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
Add sdei_event_signal(), a thin wrapper over the SDEI_EVENT_SIGNAL call
(DEN0054) that makes the software-signalled event (event 0) pending on a
target PE -- delivered NMI-like even when that PE has interrupts masked.
It takes no locks, so it is safe to call from NMI / crash context.
Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
---
drivers/firmware/arm_sdei.c | 12 ++++++++++++
include/linux/arm_sdei.h | 6 ++++++
include/uapi/linux/arm_sdei.h | 1 +
3 files changed, 19 insertions(+)
diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
index c161cf263547..e8dd2f0f3919 100644
--- a/drivers/firmware/arm_sdei.c
+++ b/drivers/firmware/arm_sdei.c
@@ -339,6 +339,18 @@ static void _ipi_unmask_cpu(void *ignored)
sdei_unmask_local_cpu();
}
+/*
+ * Signal the software-signalled event (event 0) to @mpidr. Does nothing
+ * but the SMC -- no locks, no event lookup -- so it is safe from NMI /
+ * crash context (e.g. the cross-CPU NMI service).
+ */
+int sdei_event_signal(u32 event_num, u64 mpidr)
+{
+ return invoke_sdei_fn(SDEI_1_0_FN_SDEI_EVENT_SIGNAL, event_num,
+ mpidr, 0, 0, 0, NULL);
+}
+NOKPROBE_SYMBOL(sdei_event_signal);
+
/*
* Was SDEI firmware probed and is it usable? Lets optional consumers skip
* registering an event -- and the warning a failed registration emits -- on
diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h
index b07113eeeff7..b9dc21c241be 100644
--- a/include/linux/arm_sdei.h
+++ b/include/linux/arm_sdei.h
@@ -37,6 +37,12 @@ int sdei_event_unregister(u32 event_num);
int sdei_event_enable(u32 event_num);
int sdei_event_disable(u32 event_num);
+/*
+ * Signal the software-signalled event (event 0) to another PE, NMI-like.
+ * @mpidr is the target's MPIDR affinity.
+ */
+int sdei_event_signal(u32 event_num, u64 mpidr);
+
/* Was SDEI firmware probed and usable? */
bool sdei_is_present(void);
diff --git a/include/uapi/linux/arm_sdei.h b/include/uapi/linux/arm_sdei.h
index af0630ba5437..22eb61612673 100644
--- a/include/uapi/linux/arm_sdei.h
+++ b/include/uapi/linux/arm_sdei.h
@@ -22,6 +22,7 @@
#define SDEI_1_0_FN_SDEI_PE_UNMASK SDEI_1_0_FN(0x0C)
#define SDEI_1_0_FN_SDEI_INTERRUPT_BIND SDEI_1_0_FN(0x0D)
#define SDEI_1_0_FN_SDEI_INTERRUPT_RELEASE SDEI_1_0_FN(0x0E)
+#define SDEI_1_0_FN_SDEI_EVENT_SIGNAL SDEI_1_0_FN(0x0F)
#define SDEI_1_0_FN_SDEI_PRIVATE_RESET SDEI_1_0_FN(0x11)
#define SDEI_1_0_FN_SDEI_SHARED_RESET SDEI_1_0_FN(0x12)
--
2.54.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH v4 3/4] drivers/firmware: add SDEI cross-CPU NMI service for arm64
2026-06-17 19:20 [PATCH v4 0/4] arm64: cross-CPU NMI via SDEI Kiryl Shutsemau
2026-06-17 19:20 ` [PATCH v4 1/4] firmware: arm_sdei: add sdei_is_present() Kiryl Shutsemau
2026-06-17 19:20 ` [PATCH v4 2/4] firmware: arm_sdei: add SDEI_EVENT_SIGNAL support Kiryl Shutsemau
@ 2026-06-17 19:20 ` Kiryl Shutsemau
2026-06-17 19:20 ` [PATCH v4 4/4] arm64: escalate smp_send_stop() to an SDEI NMI as a last resort Kiryl Shutsemau
3 siblings, 0 replies; 7+ messages in thread
From: Kiryl Shutsemau @ 2026-06-17 19:20 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, James Morse
Cc: Mark Rutland, Marc Zyngier, Doug Anderson, Petr Mladek,
Thomas Gleixner, Andrew Morton, Baoquan He, Puranjay Mohan,
Usama Arif, Breno Leitao, Julien Thierry, Lecopzer Chen,
Sumit Garg, kernel-team, kexec, linux-arm-kernel, linux-kernel,
Kiryl Shutsemau (Meta)
From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
Deliver an NMI-like event to an interrupt-masked arm64 CPU via the
standard SDEI software-signalled event (event 0), without the pseudo-NMI
hot-path cost: register a handler for event 0 and poke a target with
sdei_event_signal(0, mpidr).
First user is arch_trigger_cpumask_backtrace() (sysrq-l, RCU stalls,
hung-task/soft-lockup dumps), which otherwise rides an IPI that can't
reach a masked CPU. Falls back to the IPI path when SDEI is absent; no
watchdog backend yet, so the stock detector is untouched.
Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
---
MAINTAINERS | 2 +-
arch/arm64/include/asm/nmi.h | 24 +++++
arch/arm64/kernel/smp.c | 11 +++
drivers/firmware/Kconfig | 19 ++++
drivers/firmware/Makefile | 1 +
drivers/firmware/arm_sdei_nmi.c | 152 ++++++++++++++++++++++++++++++++
6 files changed, 208 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/include/asm/nmi.h
create mode 100644 drivers/firmware/arm_sdei_nmi.c
diff --git a/MAINTAINERS b/MAINTAINERS
index c8d4b913f26c..b5ddfb85dce9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -24797,7 +24797,7 @@ M: James Morse <james.morse@arm.com>
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
S: Maintained
F: Documentation/devicetree/bindings/arm/firmware/sdei.txt
-F: drivers/firmware/arm_sdei.c
+F: drivers/firmware/arm_sdei*
F: include/linux/arm_sdei.h
F: include/uapi/linux/arm_sdei.h
diff --git a/arch/arm64/include/asm/nmi.h b/arch/arm64/include/asm/nmi.h
new file mode 100644
index 000000000000..9366be419d18
--- /dev/null
+++ b/arch/arm64/include/asm/nmi.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_NMI_H
+#define __ASM_NMI_H
+
+#include <linux/cpumask.h>
+
+/*
+ * Cross-CPU NMI provider hooks, consulted by the arm64 arch code before
+ * its regular-IRQ / pseudo-NMI IPI paths. The SDEI provider in
+ * drivers/firmware/arm_sdei_nmi.c implements them when active; a future
+ * FEAT_NMI provider could slot in here too. The stubs let callers stay
+ * unconditional when ARM_SDEI_NMI is off.
+ */
+#ifdef CONFIG_ARM_SDEI_NMI
+bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu);
+#else
+static inline bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
+ int exclude_cpu)
+{
+ return false;
+}
+#endif
+
+#endif /* __ASM_NMI_H */
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 1aa324104afb..a670434a8cae 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -45,6 +45,7 @@
#include <asm/daifflags.h>
#include <asm/kvm_mmu.h>
#include <asm/mmu_context.h>
+#include <asm/nmi.h>
#include <asm/numa.h>
#include <asm/processor.h>
#include <asm/smp_plat.h>
@@ -927,6 +928,16 @@ static void arm64_backtrace_ipi(cpumask_t *mask)
void arch_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu)
{
+ /*
+ * Prefer the SDEI cross-CPU NMI provider when active: firmware
+ * dispatches the event out of EL3 and reaches CPUs that have
+ * interrupts locally masked, without the per-IRQ-mask cost that
+ * pseudo-NMI pays for the same reach. The plain IPI path below
+ * can't reach such a CPU unless pseudo-NMI is enabled.
+ */
+ if (sdei_nmi_trigger_cpumask_backtrace(mask, exclude_cpu))
+ return;
+
/*
* NOTE: though nmi_trigger_cpumask_backtrace() has "nmi_" in the name,
* nothing about it truly needs to be implemented using an NMI, it's
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index bbd2155d8483..6501087ff90d 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -36,6 +36,25 @@ config ARM_SDE_INTERFACE
standard for registering callbacks from the platform firmware
into the OS. This is typically used to implement RAS notifications.
+config ARM_SDEI_NMI
+ bool "SDEI-based cross-CPU NMI service (arm64)"
+ depends on ARM64 && ARM_SDE_INTERFACE
+ help
+ Provides SDEI-based cross-CPU NMI delivery for hooks that need
+ to reach interrupt-masked CPUs on silicon that lacks FEAT_NMI:
+
+ - arch_trigger_cpumask_backtrace() (sysrq-l, RCU stalls,
+ hardlockup_all_cpu_backtrace, soft-lockup secondary dumps,
+ hung-task auxiliary dumps)
+
+ The driver registers a handler for the SDEI software-signalled
+ event (event 0) and reaches a target CPU by signalling it with
+ SDEI_EVENT_SIGNAL. Firmware delivers the event out of EL3
+ regardless of the target's PSTATE.DAIF -- forced delivery into a
+ CPU wedged with interrupts locally masked.
+
+ If unsure, say N.
+
config EDD
tristate "BIOS Enhanced Disk Drive calls determine boot disk"
depends on X86
diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
index 4ddec2820c96..be46f1e1dc77 100644
--- a/drivers/firmware/Makefile
+++ b/drivers/firmware/Makefile
@@ -4,6 +4,7 @@
#
obj-$(CONFIG_ARM_SCPI_PROTOCOL) += arm_scpi.o
obj-$(CONFIG_ARM_SDE_INTERFACE) += arm_sdei.o
+obj-$(CONFIG_ARM_SDEI_NMI) += arm_sdei_nmi.o
obj-$(CONFIG_DMI) += dmi_scan.o
obj-$(CONFIG_DMI_SYSFS) += dmi-sysfs.o
obj-$(CONFIG_EDD) += edd.o
diff --git a/drivers/firmware/arm_sdei_nmi.c b/drivers/firmware/arm_sdei_nmi.c
new file mode 100644
index 000000000000..2a78cee203fb
--- /dev/null
+++ b/drivers/firmware/arm_sdei_nmi.c
@@ -0,0 +1,152 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * arm64 SDEI-based cross-CPU NMI service.
+ *
+ * Delivering an "NMI-shaped" event to an EL1 context that has locally
+ * masked interrupts, on silicon without FEAT_NMI, can be done two ways:
+ *
+ * - pseudo-NMI: mask "interrupts" via the GIC priority register
+ * (ICC_PMR_EL1) instead of PSTATE.DAIF, leaving a high-priority band
+ * deliverable. Functionally this works -- but it reimplements every
+ * local_irq_disable()/enable() and exception entry/exit as a PMR
+ * write plus synchronisation, a cost paid on that hot path forever,
+ * whether or not an NMI is ever delivered.
+ *
+ * - SDEI: leave interrupt masking as the cheap PSTATE.DAIF operation
+ * and have the firmware bounce an EL3-routed Group-0 SGI back to
+ * NS-EL1 as an event callback. The cost is a firmware round-trip,
+ * but only at the rare moment delivery is actually needed.
+ *
+ * This driver takes the second path: it keeps the IRQ-mask hot path
+ * free and pays only when it fires, which is what makes cross-CPU NMI
+ * affordable on hardware where the pseudo-NMI tax isn't, until FEAT_NMI
+ * makes NMI masking cheap in the architecture itself.
+ *
+ * Capabilities provided:
+ *
+ * - sdei_nmi_trigger_cpumask_backtrace() — override for arm64's
+ * arch_trigger_cpumask_backtrace(), so sysrq-l, RCU stall dumps,
+ * hardlockup_all_cpu_backtrace, soft-lockup/hung-task secondary
+ * dumps all reach interrupt-masked CPUs.
+ *
+ * Delivery uses the standard SDEI software-signalled event (event 0) and
+ * SDEI_EVENT_SIGNAL. We register a handler for event 0, enable it, and
+ * poke a target CPU with sdei_event_signal(0, mpidr): firmware makes
+ * event 0 pending on that PE and dispatches the handler NMI-like,
+ * regardless of the target's DAIF.
+ * Availability is simply whether event 0 registers and enables -- if SDEI
+ * and its software-signalled event are present we use it, otherwise the
+ * driver stays inert.
+ */
+
+#define pr_fmt(fmt) "sdei_nmi: " fmt
+
+#include <linux/arm_sdei.h>
+#include <linux/cpumask.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/nmi.h>
+#include <linux/printk.h>
+#include <linux/ptrace.h>
+#include <linux/smp.h>
+#include <linux/types.h>
+
+#include <asm/nmi.h>
+#include <asm/smp_plat.h>
+
+static bool sdei_nmi_available;
+
+#define SDEI_NMI_EVENT 0
+
+static int sdei_nmi_handler(u32 event, struct pt_regs *regs, void *arg)
+{
+ /*
+ * nmi_cpu_backtrace() no-ops unless this CPU's bit is set in the
+ * global backtrace mask (driven by nmi_trigger_cpumask_backtrace()),
+ * so a fire that reaches a CPU not being backtraced is harmless.
+ */
+ nmi_cpu_backtrace(regs);
+ return SDEI_EV_HANDLED;
+}
+NOKPROBE_SYMBOL(sdei_nmi_handler);
+
+static void sdei_nmi_fire(unsigned int target_cpu)
+{
+ int err = sdei_event_signal(SDEI_NMI_EVENT, cpu_logical_map(target_cpu));
+
+ if (err)
+ pr_warn("SDEI_EVENT_SIGNAL to CPU %u failed: %d\n",
+ target_cpu, err);
+}
+
+/*
+ * Raise callback for nmi_trigger_cpumask_backtrace(): signal event 0
+ * at every CPU still pending in @mask. The framework excludes the local
+ * CPU from @mask before calling us.
+ */
+static void sdei_nmi_raise_backtrace(cpumask_t *mask)
+{
+ unsigned int cpu;
+
+ for_each_cpu(cpu, mask)
+ sdei_nmi_fire(cpu);
+}
+
+/*
+ * Override hook for arch_trigger_cpumask_backtrace() (see
+ * arch/arm64/kernel/smp.c). Returns true when SDEI handled the request,
+ * which is the case whenever SDEI is active; on a false return the arch
+ * falls back to its regular-IRQ (or pseudo-NMI, if enabled) IPI.
+ *
+ * On a kernel built without paying the pseudo-NMI hot-path cost (the
+ * usual case for this driver's target), the IPI can't reach a CPU that
+ * has interrupts masked -- so the backtrace of the one CPU you care
+ * about comes back empty. SDEI is dispatched out of EL3 and lands
+ * regardless of the target's DAIF, without taxing the IRQ-mask path.
+ */
+bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu)
+{
+ if (!sdei_nmi_available)
+ return false;
+
+ nmi_trigger_cpumask_backtrace(mask, exclude_cpu,
+ sdei_nmi_raise_backtrace);
+ return true;
+}
+
+/*
+ * device_initcall (after arch_initcall(sdei_init), so the SDEI subsystem
+ * is up): probe the firmware, register the event, and turn on the
+ * cross-CPU service. If the probe fails the driver stays inert and the
+ * override hooks decline, leaving the arch's own paths in place.
+ */
+static int __init sdei_nmi_init(void)
+{
+ int err;
+
+ if (!sdei_is_present())
+ return 0;
+
+ err = sdei_event_register(SDEI_NMI_EVENT, sdei_nmi_handler, NULL);
+ if (err) {
+ pr_err("sdei_event_register(%u) failed: %d\n",
+ SDEI_NMI_EVENT, err);
+ return 0;
+ }
+
+ err = sdei_event_enable(SDEI_NMI_EVENT);
+ if (err) {
+ pr_err("sdei_event_enable(%u) failed: %d\n",
+ SDEI_NMI_EVENT, err);
+ sdei_event_unregister(SDEI_NMI_EVENT);
+ return 0;
+ }
+
+ sdei_nmi_available = true;
+ pr_info("using SDEI cross-CPU NMI (SDEI_EVENT_SIGNAL, event %u)\n",
+ SDEI_NMI_EVENT);
+
+ return 0;
+}
+device_initcall(sdei_nmi_init);
--
2.54.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH v4 4/4] arm64: escalate smp_send_stop() to an SDEI NMI as a last resort
2026-06-17 19:20 [PATCH v4 0/4] arm64: cross-CPU NMI via SDEI Kiryl Shutsemau
` (2 preceding siblings ...)
2026-06-17 19:20 ` [PATCH v4 3/4] drivers/firmware: add SDEI cross-CPU NMI service for arm64 Kiryl Shutsemau
@ 2026-06-17 19:20 ` Kiryl Shutsemau
2026-06-17 20:02 ` Doug Anderson
3 siblings, 1 reply; 7+ messages in thread
From: Kiryl Shutsemau @ 2026-06-17 19:20 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, James Morse
Cc: Mark Rutland, Marc Zyngier, Doug Anderson, Petr Mladek,
Thomas Gleixner, Andrew Morton, Baoquan He, Puranjay Mohan,
Usama Arif, Breno Leitao, Julien Thierry, Lecopzer Chen,
Sumit Garg, kernel-team, kexec, linux-arm-kernel, linux-kernel,
Kiryl Shutsemau (Meta)
From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
A CPU wedged with interrupts masked ignores the stop IPI, and without
pseudo-NMI there is no NMI IPI to escalate to: a reboot proceeds with
the CPU still running, and a kdump misses its registers.
Add a third rung to smp_send_stop(): once the IPI (and pseudo-NMI IPI,
if enabled) rungs have run, signal SDEI event 0 at whatever stayed
online. Firmware delivers it regardless of the target's DAIF, so it
reaches a CPU a plain IPI cannot; the target acks by going offline,
which the caller already polls for.
Fold the stop bookkeeping into one arm64_nmi_cpu_stop(regs,
die_on_crash), shared by the stop IPI handlers, panic_smp_self_stop()
and the SDEI handler, replacing the near-duplicate local_cpu_stop() and
ipi_cpu_crash_stop(). @die_on_crash is the only difference: the IPI
handlers pass true and PSCI CPU_OFF the CPU on a crash stop so a capture
kernel can reclaim it; the SDEI handler and self-stop pass false and
park. The SDEI park is required, not conservative -- its handler runs
inside an SDEI event that is never completed (completing it resumes the
wedged context), and a CPU_OFF from that unfinished-event context wedges
EL3 on some firmware (left as a follow-up). The dump is unaffected; only
re-onlining the CPU in an SMP capture kernel is lost.
Suggested-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
---
arch/arm64/include/asm/nmi.h | 24 +++++++
arch/arm64/kernel/smp.c | 113 +++++++++++++++++++++-----------
drivers/firmware/Kconfig | 2 +
drivers/firmware/arm_sdei_nmi.c | 86 ++++++++++++++++++++++++
4 files changed, 187 insertions(+), 38 deletions(-)
diff --git a/arch/arm64/include/asm/nmi.h b/arch/arm64/include/asm/nmi.h
index 9366be419d18..2e8974ff8d63 100644
--- a/arch/arm64/include/asm/nmi.h
+++ b/arch/arm64/include/asm/nmi.h
@@ -4,21 +4,45 @@
#include <linux/cpumask.h>
+struct pt_regs;
+
/*
* Cross-CPU NMI provider hooks, consulted by the arm64 arch code before
* its regular-IRQ / pseudo-NMI IPI paths. The SDEI provider in
* drivers/firmware/arm_sdei_nmi.c implements them when active; a future
* FEAT_NMI provider could slot in here too. The stubs let callers stay
* unconditional when ARM_SDEI_NMI is off.
+ *
+ * sdei_nmi_active() lets a caller test for the service before committing
+ * to (and waiting on) the SDEI stop rung; sdei_nmi_stop_cpus() then signals
+ * the targets, which ack by going offline.
*/
#ifdef CONFIG_ARM_SDEI_NMI
bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu);
+bool sdei_nmi_active(void);
+void sdei_nmi_stop_cpus(const cpumask_t *mask);
#else
static inline bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
int exclude_cpu)
{
return false;
}
+
+static inline bool sdei_nmi_active(void)
+{
+ return false;
+}
+
+static inline void sdei_nmi_stop_cpus(const cpumask_t *mask) { }
#endif
+/*
+ * The common "stop this CPU" entry every arm64 stop path funnels through:
+ * the regular/pseudo-NMI stop IPI handlers, panic_smp_self_stop(), and the
+ * SDEI cross-CPU NMI handler. @die_on_crash powers the CPU off on the kdump
+ * crash path (IPI handlers) instead of parking it (SDEI / self-stop).
+ * Defined in arch/arm64/kernel/smp.c.
+ */
+void __noreturn arm64_nmi_cpu_stop(struct pt_regs *regs, bool die_on_crash);
+
#endif /* __ASM_NMI_H */
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index a670434a8cae..05c7007a1f4a 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -33,6 +33,7 @@
#include <linux/kernel_stat.h>
#include <linux/kexec.h>
#include <linux/kgdb.h>
+#include <linux/kprobes.h>
#include <linux/kvm_host.h>
#include <linux/nmi.h>
@@ -862,14 +863,62 @@ void arch_irq_work_raise(void)
}
#endif
-static void __noreturn local_cpu_stop(unsigned int cpu)
+/**
+ * arm64_nmi_cpu_stop() - stop the local CPU after it is told to stop.
+ * @regs: register state to record in the vmcore on a crash stop, or NULL for
+ * panic_smp_self_stop(), which has no interrupted context to save.
+ * @die_on_crash: on the kdump crash path, power the CPU off via PSCI CPU_OFF
+ * (so a capture kernel can reclaim it) rather than parking it.
+ *
+ * The single point every arm64 stop path funnels through, keeping the
+ * bookkeeping (mask interrupts, save the crash context, mark offline, mask
+ * SDEI, optionally power off) in one place:
+ *
+ * - the regular IPI_CPU_STOP and pseudo-NMI IPI_CPU_STOP_NMI handlers;
+ * - panic_smp_self_stop(), a CPU parking itself on a parallel panic();
+ * - the SDEI cross-CPU NMI handler (drivers/firmware/arm_sdei_nmi.c),
+ * which reaches CPUs the stop IPIs could not.
+ *
+ * The IPI stop handlers pass @die_on_crash true. The SDEI handler and
+ * panic_smp_self_stop() pass false and only park. For SDEI that is required,
+ * not just conservative: it runs inside an SDEI event that is deliberately
+ * never completed (completing it has firmware resume the wedged context), and
+ * a CPU_OFF from that not-yet-completed context wedges EL3 on some firmware --
+ * a documented follow-up. Parking also matches this path's own fallback when
+ * CPU_OFF is unavailable.
+ */
+void __noreturn arm64_nmi_cpu_stop(struct pt_regs *regs, bool die_on_crash)
{
+ unsigned int cpu = smp_processor_id();
+ bool crash = IS_ENABLED(CONFIG_KEXEC_CORE) && crash_stop;
+
+ /*
+ * Use local_daif_mask() instead of local_irq_disable() to make sure
+ * that pseudo-NMIs are disabled. The "stop" code starts with an IRQ
+ * and falls back to NMI (which might be pseudo). If the IRQ finally
+ * goes through right as we're timing out then the NMI could interrupt
+ * us. It's better to prevent the NMI and let the IRQ finish since the
+ * pt_regs will be better.
+ */
+ local_daif_mask();
+
+#ifdef CONFIG_KEXEC_CORE
+ if (crash && regs)
+ crash_save_cpu(regs, cpu);
+#endif
+
+ /* the ack a stop requester (e.g. smp_send_stop()) polls for */
set_cpu_online(cpu, false);
- local_daif_mask();
sdei_mask_local_cpu();
+
+ if (crash && die_on_crash)
+ __cpu_try_die(cpu);
+
+ /* just in case */
cpu_park_loop();
}
+NOKPROBE_SYMBOL(arm64_nmi_cpu_stop);
/*
* We need to implement panic_smp_self_stop() for parallel panic() calls, so
@@ -878,36 +927,7 @@ static void __noreturn local_cpu_stop(unsigned int cpu)
*/
void __noreturn panic_smp_self_stop(void)
{
- local_cpu_stop(smp_processor_id());
-}
-
-static void __noreturn ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
-{
-#ifdef CONFIG_KEXEC_CORE
- /*
- * Use local_daif_mask() instead of local_irq_disable() to make sure
- * that pseudo-NMIs are disabled. The "crash stop" code starts with
- * an IRQ and falls back to NMI (which might be pseudo). If the IRQ
- * finally goes through right as we're timing out then the NMI could
- * interrupt us. It's better to prevent the NMI and let the IRQ
- * finish since the pt_regs will be better.
- */
- local_daif_mask();
-
- crash_save_cpu(regs, cpu);
-
- set_cpu_online(cpu, false);
-
- sdei_mask_local_cpu();
-
- if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
- __cpu_try_die(cpu);
-
- /* just in case */
- cpu_park_loop();
-#else
- BUG();
-#endif
+ arm64_nmi_cpu_stop(NULL, false);
}
static void arm64_send_ipi(const cpumask_t *mask, unsigned int nr)
@@ -984,12 +1004,7 @@ static void do_handle_IPI(int ipinr)
case IPI_CPU_STOP:
case IPI_CPU_STOP_NMI:
- if (IS_ENABLED(CONFIG_KEXEC_CORE) && crash_stop) {
- ipi_cpu_crash_stop(cpu, get_irq_regs());
- unreachable();
- } else {
- local_cpu_stop(cpu);
- }
+ arm64_nmi_cpu_stop(get_irq_regs(), true);
break;
#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
@@ -1263,6 +1278,28 @@ void smp_send_stop(void)
udelay(1);
}
+ /*
+ * If CPUs are *still* online, try the SDEI cross-CPU NMI. Firmware
+ * delivers it regardless of the target's DAIF state, so it reaches
+ * a CPU spinning with interrupts masked, which neither rung above
+ * could (without pseudo-NMI there is no NMI rung at all). Allow
+ * 100ms: a firmware round-trip per CPU, with headroom.
+ */
+ if (num_other_online_cpus() && sdei_nmi_active()) {
+ /* re-snapshot after the rungs above took CPUs offline */
+ smp_rmb();
+ cpumask_copy(&mask, cpu_online_mask);
+ cpumask_clear_cpu(smp_processor_id(), &mask);
+
+ pr_info("SMP: retry stop with SDEI NMI for CPUs %*pbl\n",
+ cpumask_pr_args(&mask));
+
+ sdei_nmi_stop_cpus(&mask);
+ timeout = USEC_PER_MSEC * 100;
+ while (num_other_online_cpus() && timeout--)
+ udelay(1);
+ }
+
if (num_other_online_cpus()) {
smp_rmb();
cpumask_copy(&mask, cpu_online_mask);
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index 6501087ff90d..ab0ee36d46e7 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -46,6 +46,8 @@ config ARM_SDEI_NMI
- arch_trigger_cpumask_backtrace() (sysrq-l, RCU stalls,
hardlockup_all_cpu_backtrace, soft-lockup secondary dumps,
hung-task auxiliary dumps)
+ - smp_send_stop() escalation (reboot/halt and the
+ panic / kdump crash stop)
The driver registers a handler for the SDEI software-signalled
event (event 0) and reaches a target CPU by signalling it with
diff --git a/drivers/firmware/arm_sdei_nmi.c b/drivers/firmware/arm_sdei_nmi.c
index 2a78cee203fb..68bbf981fdd7 100644
--- a/drivers/firmware/arm_sdei_nmi.c
+++ b/drivers/firmware/arm_sdei_nmi.c
@@ -29,6 +29,11 @@
* hardlockup_all_cpu_backtrace, soft-lockup/hung-task secondary
* dumps all reach interrupt-masked CPUs.
*
+ * - sdei_nmi_stop_cpus() — the last rung of smp_send_stop()'s
+ * escalation (reboot/halt and the panic/kdump crash stop alike),
+ * reaching CPUs that ignored the stop IPIs; on the kdump path the
+ * wedged context is captured into the vmcore before the CPU parks.
+ *
* Delivery uses the standard SDEI software-signalled event (event 0) and
* SDEI_EVENT_SIGNAL. We register a handler for event 0, enable it, and
* poke a target CPU with sdei_event_signal(0, mpidr): firmware makes
@@ -59,8 +64,57 @@ static bool sdei_nmi_available;
#define SDEI_NMI_EVENT 0
+/*
+ * Backtrace and stop both ride SDEI event 0. That is not a chosen economy:
+ * event 0 is the only architecturally software-signalled event -- the sole
+ * event SDEI_EVENT_SIGNAL can target at an arbitrary PE. Every other event
+ * number is a firmware/platform interrupt-bound event, not something the
+ * kernel can raise cross-CPU, so a dedicated "stop" event would need
+ * firmware to define and bind it -- exactly the firmware dependency this
+ * driver sets out to avoid.
+ *
+ * Sharing one event means the handler must tell a stop apart from a
+ * backtrace. A stop is terminal and system-wide -- sdei_nmi_stop_cpus() is
+ * only reached from smp_send_stop() (reboot/halt/panic/kdump), which never
+ * returns -- so once a stop is requested, every later event-0 fire is a
+ * stop too. A single write-once flag therefore carries as much as a
+ * per-CPU mask would: sdei_nmi_stop_cpus() sets it before signalling, and
+ * the handler reads a set flag as "stop this CPU" and a clear flag as
+ * "backtrace" (handled by nmi_cpu_backtrace(), which self-gates on the
+ * framework's backtrace mask). A backtrace fire that races in after a stop
+ * has begun just stops that CPU instead -- harmless, it is going down.
+ */
+static bool sdei_nmi_stopping;
+
static int sdei_nmi_handler(u32 event, struct pt_regs *regs, void *arg)
{
+ /*
+ * No smp_rmb() pairing sdei_nmi_stop_cpus()'s smp_wmb(): the flag is
+ * the only shared value, and this handler runs only because firmware
+ * delivered the event -- a round-trip past that store -- so the read
+ * cannot be stale and there is no second load for a barrier to order.
+ */
+ if (READ_ONCE(sdei_nmi_stopping)) {
+ /*
+ * Never returns, and deliberately never completes the SDEI
+ * event: SDEI_EVENT_COMPLETE has firmware restore the
+ * interrupted context, which would land the CPU back in
+ * the wedged loop (or in do_idle, which BUGs at
+ * cpuhp_report_idle_dead once it sees itself offline).
+ * Returning a modified pt_regs doesn't help --
+ * arch/arm64/kernel/sdei.c::do_sdei_event only honours a PC
+ * override via its IRQ-state heuristic and otherwise hands
+ * EL3 its own saved-context slot back.
+ *
+ * Trade-off: EL3 retains ~one saved-context slot per parked
+ * CPU until the next hardware reset (~hundreds of bytes per
+ * CPU). Recoverability is unchanged versus an IPI-stopped
+ * CPU: neither comes back without a reset.
+ */
+ arm64_nmi_cpu_stop(regs, false);
+ /* unreachable */
+ }
+
/*
* nmi_cpu_backtrace() no-ops unless this CPU's bit is set in the
* global backtrace mask (driven by nmi_trigger_cpumask_backtrace()),
@@ -115,6 +169,38 @@ bool sdei_nmi_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu)
return true;
}
+bool sdei_nmi_active(void)
+{
+ return sdei_nmi_available;
+}
+
+/*
+ * Last rung of the stop escalation in smp_send_stop() (see
+ * arch/arm64/kernel/smp.c). The caller runs the regular stop IPI (and
+ * the pseudo-NMI stop IPI, where available) first; @mask holds whatever
+ * stayed online through those -- typically CPUs wedged with interrupts
+ * masked, unreachable by an IPI. Mark the stop in progress and signal
+ * event 0 at each target; a target acks by marking itself offline, which
+ * the caller polls for. The caller has already confirmed sdei_nmi_active().
+ */
+void sdei_nmi_stop_cpus(const cpumask_t *mask)
+{
+ unsigned int cpu;
+
+ WRITE_ONCE(sdei_nmi_stopping, true);
+
+ /*
+ * Publish the flag before signalling. The SMC is a context-sync
+ * event, not a barrier, so WRITE_ONCE() alone could let the store be
+ * observed after the event it triggers. The barrier is cumulative: a
+ * target that sees the event is guaranteed to see the flag.
+ */
+ smp_wmb();
+
+ for_each_cpu(cpu, mask)
+ sdei_nmi_fire(cpu);
+}
+
/*
* device_initcall (after arch_initcall(sdei_init), so the SDEI subsystem
* is up): probe the firmware, register the event, and turn on the
--
2.54.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH v4 4/4] arm64: escalate smp_send_stop() to an SDEI NMI as a last resort
2026-06-17 19:20 ` [PATCH v4 4/4] arm64: escalate smp_send_stop() to an SDEI NMI as a last resort Kiryl Shutsemau
@ 2026-06-17 20:02 ` Doug Anderson
0 siblings, 0 replies; 7+ messages in thread
From: Doug Anderson @ 2026-06-17 20:02 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Catalin Marinas, Will Deacon, James Morse, Mark Rutland,
Marc Zyngier, Petr Mladek, Thomas Gleixner, Andrew Morton,
Baoquan He, Puranjay Mohan, Usama Arif, Breno Leitao,
Julien Thierry, Lecopzer Chen, Sumit Garg, kernel-team, kexec,
linux-arm-kernel, linux-kernel, Kiryl Shutsemau (Meta)
Hi,
On Wed, Jun 17, 2026 at 12:20 PM Kiryl Shutsemau <kirill@shutemov.name> wrote:
>
> From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
>
> A CPU wedged with interrupts masked ignores the stop IPI, and without
> pseudo-NMI there is no NMI IPI to escalate to: a reboot proceeds with
> the CPU still running, and a kdump misses its registers.
>
> Add a third rung to smp_send_stop(): once the IPI (and pseudo-NMI IPI,
> if enabled) rungs have run, signal SDEI event 0 at whatever stayed
> online. Firmware delivers it regardless of the target's DAIF, so it
> reaches a CPU a plain IPI cannot; the target acks by going offline,
> which the caller already polls for.
>
> Fold the stop bookkeeping into one arm64_nmi_cpu_stop(regs,
> die_on_crash), shared by the stop IPI handlers, panic_smp_self_stop()
> and the SDEI handler, replacing the near-duplicate local_cpu_stop() and
> ipi_cpu_crash_stop(). @die_on_crash is the only difference: the IPI
> handlers pass true and PSCI CPU_OFF the CPU on a crash stop so a capture
> kernel can reclaim it; the SDEI handler and self-stop pass false and
> park. The SDEI park is required, not conservative -- its handler runs
> inside an SDEI event that is never completed (completing it resumes the
> wedged context), and a CPU_OFF from that unfinished-event context wedges
> EL3 on some firmware (left as a follow-up). The dump is unaffected; only
> re-onlining the CPU in an SMP capture kernel is lost.
>
> Suggested-by: Douglas Anderson <dianders@chromium.org>
> Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> ---
> arch/arm64/include/asm/nmi.h | 24 +++++++
> arch/arm64/kernel/smp.c | 113 +++++++++++++++++++++-----------
> drivers/firmware/Kconfig | 2 +
> drivers/firmware/arm_sdei_nmi.c | 86 ++++++++++++++++++++++++
> 4 files changed, 187 insertions(+), 38 deletions(-)
Reviewed-by: Douglas Anderson <dianders@chromium.org>
^ permalink raw reply [flat|nested] 7+ messages in thread