From: Kiryl Shutsemau <kirill@shutemov.name>
To: Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>, James Morse <james.morse@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
Marc Zyngier <maz@kernel.org>,
Doug Anderson <dianders@chromium.org>,
Petr Mladek <pmladek@suse.com>,
Thomas Gleixner <tglx@linutronix.de>,
Andrew Morton <akpm@linux-foundation.org>,
Baoquan He <bhe@redhat.com>, Puranjay Mohan <puranjay@kernel.org>,
Usama Arif <usama.arif@linux.dev>,
Breno Leitao <leitao@debian.org>,
Julien Thierry <julien.thierry.kdev@gmail.com>,
Lecopzer Chen <lecopzer@gmail.com>,
Sumit Garg <sumit.garg@kernel.org>,
kernel-team@meta.com, kexec@lists.infradead.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org,
"Kiryl Shutsemau (Meta)" <kas@kernel.org>
Subject: [PATCH v2 0/3] arm64: cross-CPU NMI via SDEI
Date: Tue, 9 Jun 2026 14:58:32 +0100
Message-ID: <cover.1781013134.git.kas@kernel.org>
In-Reply-To: <cover.1780496779.git.kas@kernel.org>
From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
A class of debug/observability features needs to interrupt a CPU that has
its interrupts locally masked: the all-CPU backtrace behind sysrq-l /
RCU-stall / hung-task / hard-lockup dumps, and crash_smp_send_stop()
capturing a stuck CPU's state into the vmcore. On arm64 these need a
mechanism that reaches a CPU spinning with DAIF masked, which a normal IPI
cannot.
arm64 has two such mechanisms today:
- GICv3 pseudo-NMI (interrupt priority masking). Its cost is on the
interrupt mask/unmask hot path: local_irq_enable() becomes an
ICC_PMR_EL1 write plus a synchronising barrier, and exception
entry/exit save and restore the PMR, paid on every CPU whether or not
an NMI is ever delivered. In our measurements, enabling pseudo-NMI
costs up to ~5% on real workloads, and ~66% on a syscall-in-a-loop
microbenchmark. A fleet-wide ~5% regression is not acceptable, so
these systems run with pseudo-NMI disabled.
- FEAT_NMI (Armv8.8) -- the architectural fix, but absent from deployed
silicon and from most of the fleet for years to come.
For deployments that do not run pseudo-NMI, the backtrace and crash paths
are degraded: a plain IPI can't reach the masked CPU, so the backtrace of
the CPU you care about comes back empty and the kdump is missing the
culprit's registers. The hard-lockup detector on these systems is the
software buddy detector (HARDLOCKUP_DETECTOR_BUDDY): it detects a stall
from a neighbour CPU, but it cannot itself interrupt the wedged CPU, so
its report has no stack for the culprit and (with hardlockup_panic) the
panic runs on the bystander.
This series adds a third delivery backend that costs nothing on the hot
path: SDEI. Firmware delivers an SDEI event into a CPU regardless of its
DAIF state, so interrupt masking stays the cheap PSTATE.DAIF operation and
the firmware round-trip is paid only at the rare moment a CPU must be
interrupted.
It does not add a hard-lockup detector. Detection stays with the buddy
detector (CONFIG_HARDLOCKUP_DETECTOR_PREFER_BUDDY); this series gives the
backtrace and crash-stop paths -- including the buddy detector's
backtrace of the stalled CPU -- a way to actually reach a masked CPU.
Mechanism
=========
The series uses the standard SDEI software-signalled event (event 0) and
the SDEI_EVENT_SIGNAL call (DEN0054) -- a spec-defined cross-PE signal, not a
vendor extension. The driver registers a handler for event 0 and pokes a
target CPU with sdei_event_signal(0, target_mpidr); firmware makes event 0
pending on that PE and dispatches the handler there, NMI-like.
No firmware change is required beyond SDEI being enabled, which
firmware-first RAS (APEI/GHES) deployments already have; the only
SDEI-core addition is a thin sdei_event_signal() wrapper over the standard
call.
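The delivery difference can be modelled as follows. This is an illustrative sketch only. It uses CPU indices where the real sdei_event_signal() call takes a target MPIDR, and it ignores SDEI's own per-PE masking and event priorities. Every name in it (struct model_pe, model_send_ipi(), model_sdei_signal_event0(), backtrace_handler()) is hypothetical. An ordinary IPI stays pending while the target has DAIF masked, but the signalled event 0 runs its handler regardless.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_MODEL_CPUS 4

/* Hypothetical per-CPU state for the model. */
struct model_pe {
	bool daif_masked;  /* PSTATE.DAIF has IRQs masked */
	bool ipi_pending;  /* ordinary IPI waiting for an unmask */
	int handler_runs;  /* how often the handler ran on this CPU */
};

static struct model_pe pes[NR_MODEL_CPUS];

/* Stand-in for the per-CPU work (e.g. dumping a backtrace) both paths want. */
static void backtrace_handler(int cpu)
{
	pes[cpu].handler_runs++;
}

/* Ordinary IPI: handled now only if the target's IRQs are unmasked. */
static bool model_send_ipi(int target)
{
	assert(target >= 0 && target < NR_MODEL_CPUS);
	if (pes[target].daif_masked) {
		pes[target].ipi_pending = true; /* sits until the CPU unmasks */
		return false;
	}
	backtrace_handler(target);
	return true;
}

/*
 * Models sdei_event_signal(0, target): firmware makes event 0 pending
 * on the target and dispatches the handler there. DAIF is not consulted.
 */
static void model_sdei_signal_event0(int target)
{
	assert(target >= 0 && target < NR_MODEL_CPUS);
	backtrace_handler(target);
}
```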
Prior SDEI watchdog work
========================
Out-of-tree SDEI hard-lockup watchdogs exist (e.g. in the openEuler and
Anolis kernels). They bind the secure physical timer as an SDEI event, so
firmware delivers a periodic self-CPU tick that drives a detector. That
requires a new SDEI interrupt-binding API, pushes the watchdog period into
firmware, and adds secure-timer EOI handling on the kexec path. This
series instead uses only the standard software-signalled event 0, keeps
all timing in the kernel (the buddy detector), and uses the same delivery
primitive for the backtrace and crash-stop users, not just lockup
reporting.
Not included / follow-ups
=========================
- No SDEI hard-lockup-detector backend. v1 had one; it is dropped here.
The buddy detector plus this series' backtrace already cover the
no-pseudo-NMI case, and a dedicated SDEI backend duplicated the
perf-NMI detector, which it had to exclude at build time. Use
CONFIG_HARDLOCKUP_DETECTOR_PREFER_BUDDY instead.
- A CPU stopped by the SDEI rung is parked, not powered off via PSCI
CPU_OFF. Reaching and dumping the wedged CPU -- the point of the
series -- works, and this matches ipi_cpu_crash_stop()'s own park
fallback. The consequence is that an SMP crash-capture kernel cannot
re-online such a CPU (it stays "already on"); the capture kernel boots
and runs on the remaining CPUs. Powering the stopped CPU off so a
capture kernel can reclaim it requires completing the SDEI event and
then issuing CPU_OFF, which ran into a firmware-specific issue that is
still under investigation; it is left as a follow-up and does not
affect the dump's contents.
Testing
=======
Developed on QEMU 'virt' (Trusted Firmware-A with SDEI enabled) and
validated on NVIDIA Grace (Neoverse V2) hardware, under
irqchip.gicv3_pseudo_nmi=0 with HARDLOCKUP_DETECTOR_PREFER_BUDDY=y:
- sysrq-l backtrace of an interrupt-masked CPU returns its real stack,
pstate showing DAIF set -- proof SDEI delivered into the masked CPU;
- buddy detector catches a hard lockup (LKDTM) and the wedged CPU's
stack is fetched via the SDEI backtrace;
- reboot/halt and the panic/kdump crash stop reach a wedged CPU via the
SDEI rung ("SMP: retry stop with SDEI NMI for CPUs N"), and the kdump
captures the wedged CPU's registers in the vmcore.
Changes since v1
================
- Dropped the SDEI hard-lockup-detector patch (v1 3/4); use the buddy
detector instead (Doug Anderson).
- Reworked the crash-stop patch (v1 4/4) into a third rung of
smp_send_stop()'s escalation, shared with the IPI stop path and
covering reboot/halt as well as crash; no on-stack cpumask
(Doug Anderson).
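The escalation order can be sketched as a small model. This is not the code from patch 3/3. It drops the per-rung timeouts and the real cpumask handling, and it uses a bitmask of still-running CPUs as its state. Its names (struct stop_model, model_send_stop(), the RUNG_* values) are hypothetical. The model encodes the assumptions described above: a plain IPI misses CPUs that have DAIF masked, the pseudo-NMI rung is tried only when pseudo-NMI is enabled, and the SDEI rung reaches a masked CPU either way.

```c
#include <assert.h>
#include <stdbool.h>

/* Which rung left no CPU running (hypothetical names). */
enum stop_rung { RUNG_IPI = 1, RUNG_PSEUDO_NMI, RUNG_SDEI, RUNG_FAILED };

struct stop_model {
	unsigned long daif_masked; /* bit N: CPU N spins with DAIF masked */
	unsigned long unreachable; /* bit N: nothing can reach CPU N */
	bool pseudo_nmi;           /* irqchip.gicv3_pseudo_nmi=1? */
};

/* Walk the ladder for the CPUs in @running; report which rung finished. */
static enum stop_rung model_send_stop(const struct stop_model *m,
				      unsigned long running)
{
	assert(m);

	/* Rung 1: ordinary IPI; CPUs with DAIF masked ignore it. */
	running &= m->daif_masked;
	if (!running)
		return RUNG_IPI;

	/* Rung 2: pseudo-NMI IPI, skipped when pseudo-NMI is disabled. */
	if (m->pseudo_nmi) {
		running &= m->unreachable;
		if (!running)
			return RUNG_PSEUDO_NMI;
	}

	/* Rung 3: SDEI event 0, delivered regardless of DAIF. */
	running &= m->unreachable;
	if (!running)
		return RUNG_SDEI;

	return RUNG_FAILED;
}
```

With pseudo-NMI disabled and CPU 2 masked, model_send_stop() returns RUNG_SDEI. That corresponds to the "SMP: retry stop with SDEI NMI for CPUs N" case in the testing notes.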
- 2/3: split the merged comment in arch_trigger_cpumask_backtrace()
(Doug Anderson).
- Renamed the driver to drivers/firmware/arm_sdei_nmi.c, to sit beside
the SDEI core it builds on (drivers/firmware/arm_sdei.c), and widened
that entry's MAINTAINERS glob (arm_sdei.c -> arm_sdei*) to cover it.
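The effect of the glob change can be checked with POSIX fnmatch(3). This only approximates how get_maintainer.pl interprets F: patterns. It also assumes the SDEI entry's F: line carries the full drivers/firmware/ prefix, and covered_by() is a hypothetical helper. The old pattern names arm_sdei.c exactly, so arm_sdei_nmi.c falls outside it, while arm_sdei* matches both files.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>

/*
 * Approximates a MAINTAINERS "F:" pattern check with fnmatch(3).
 * FNM_PATHNAME stops '*' from matching across a '/'.
 */
static bool covered_by(const char *pattern, const char *path)
{
	assert(pattern && path);
	return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```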
- Picked up Reviewed-by from Doug Anderson on 1/3 and 2/3 (the changes
above are mechanical / comment-only on those two).
v1: https://lore.kernel.org/all/cover.1780496779.git.kas@kernel.org
Also available at:
git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git sdei-nmi/v2
Kiryl Shutsemau (Meta) (3):
firmware: arm_sdei: add SDEI_EVENT_SIGNAL support
drivers/firmware: add SDEI cross-CPU NMI service for arm64
arm64: escalate smp_send_stop() to an SDEI NMI as a last resort
MAINTAINERS | 2 +-
arch/arm64/include/asm/nmi.h | 38 ++++++
arch/arm64/kernel/smp.c | 64 ++++++++++
drivers/firmware/Kconfig | 21 +++
drivers/firmware/Makefile | 1 +
drivers/firmware/arm_sdei.c | 12 ++
drivers/firmware/arm_sdei_nmi.c | 220 ++++++++++++++++++++++++++++++++
include/linux/arm_sdei.h | 6 +
include/uapi/linux/arm_sdei.h | 1 +
9 files changed, 364 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/include/asm/nmi.h
create mode 100644 drivers/firmware/arm_sdei_nmi.c
base-commit: e7ae89a0c97ce2b68b0983cd01eda67cf373517d
--
2.54.0