Linux-ARM-Kernel Archive on lore.kernel.org
* [PATCH 0/4] arm64: cross-CPU NMI via SDEI
From: Kiryl Shutsemau @ 2026-06-03 14:36 UTC
  To: Catalin Marinas, Will Deacon, James Morse
  Cc: Mark Rutland, Marc Zyngier, Doug Anderson, Petr Mladek,
	Thomas Gleixner, Andrew Morton, Baoquan He, Puranjay Mohan,
	Usama Arif, Breno Leitao, Julien Thierry, Lecopzer Chen,
	Sumit Garg, kernel-team, kexec, linux-arm-kernel, linux-kernel,
	Kiryl Shutsemau (Meta)

From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>

A class of debug/observability features needs to interrupt a CPU that has
its interrupts locally masked: hard-lockup detection, the all-CPU
backtrace behind sysrq-l / RCU-stall / hung-task dumps, and
crash_smp_send_stop() capturing a stuck CPU's state into the vmcore. On
arm64 these need a mechanism that reaches a CPU spinning with DAIF masked,
which a normal IPI cannot do.

arm64 has two such mechanisms today:

  - GICv3 pseudo-NMI (interrupt priority masking). This is the preferred
    path and what the perf-based hard-lockup detector
    (HAVE_HARDLOCKUP_DETECTOR_PERF) is built on. Its cost, however, is on
    the interrupt mask/unmask hot path: local_irq_enable() becomes an
    ICC_PMR_EL1 write plus a synchronising barrier, and exception
    entry/exit save and restore the PMR, paid on every CPU whether or not
    an NMI is ever delivered.

    In our measurements, enabling pseudo-NMI costs up to ~5% on real
    workloads, and ~66% on a syscall-in-a-loop microbenchmark that
    maximises exception entry/exit (where pseudo-NMI adds the PMR
    save/restore). A fleet-wide ~5% regression is not acceptable, so these
    systems run with pseudo-NMI disabled — and therefore have no
    hard-lockup detector and degraded backtrace/crash-stop today.

  - FEAT_NMI (Armv8.8). This is the architectural fix, but it is absent
    from deployed silicon and will stay absent from most of the fleet for
    years to come.

Without pseudo-NMI, a plain IPI can't reach the masked CPU: the lockup
goes undetected, the backtrace of the CPU you care about comes back
empty, and the kdump is missing the culprit's registers.

This series adds a third delivery backend that costs nothing on the hot
path: SDEI. Firmware delivers an SDEI event into a CPU regardless of its
DAIF state, so interrupt masking stays the cheap PSTATE.DAIF operation and
the firmware round-trip is paid only at the rare moment a CPU must be
interrupted.

Mechanism
=========

It uses the standard SDEI software-signalled event (event 0) and the
SDEI_EVENT_SIGNAL call (DEN0054) — a spec-defined cross-PE signal, not a
vendor extension. The driver registers a handler for event 0 and pokes a
target CPU with sdei_event_signal(0, target_mpidr); firmware makes event 0
pending on that PE and dispatches the handler NMI-like.

No firmware change is required beyond SDEI being enabled, which
firmware-first RAS (APEI/GHES) deployments already have; the only
SDEI-core addition is a thin sdei_event_signal() wrapper over the standard
call.

Clean kdump when a CPU panics from inside the SDEI handler (the
hard-lockup case) is handled by the already-merged sdei_handler_abort(),
which crash_smp_send_stop() calls: it issues SDEI_EVENT_COMPLETE_AND_RESUME
so the firmware-side priority is dropped before the capture kernel boots.

Prior SDEI watchdog work
========================

Out-of-tree SDEI hard-lockup watchdogs exist (e.g. in the openEuler and
Anolis kernels). They use a different mechanism: they bind the secure
physical timer as an SDEI event, so firmware delivers a periodic self-CPU
tick that drives the detector. That requires a new SDEI interrupt-binding
API, pushes the watchdog period (watchdog_thresh) into firmware, and adds
secure-timer EOI handling on the kexec path.

This series instead uses only the standard software-signalled event 0:
the kernel keeps the timing (a per-CPU hrtimer with a buddy heartbeat
check) and firmware does nothing but deliver the cross-CPU poke when a
buddy looks stalled. The result is a smaller, far less firmware-coupled
change — no secure-timer dependency, no new SDEI API, no period in
firmware — and the same delivery primitive serves the backtrace and
crash-stop users, not just the watchdog.
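
To make the division of labour concrete, here is a minimal userspace
sketch of a buddy heartbeat check, assuming each CPU watches the next CPU
in a ring and pokes it after a fixed number of missed beats. It is not
the code from patch 3/4: struct cpu_wd, wd_tick(), buddy_of() and the
WD_MISS_THRESHOLD value are all hypothetical, and the real series may
pick buddies and thresholds differently. The timing stays entirely on
the kernel side; the return value marks the single point where firmware
would be asked to deliver the cross-CPU poke.

```c
#include <assert.h>

#define NR_WD_CPUS		4
#define WD_MISS_THRESHOLD	3	/* hypothetical: ticks without progress */

/* Hypothetical per-CPU watchdog state. */
struct cpu_wd {
	unsigned long heartbeat;	/* bumped by this CPU's own timer tick */
	unsigned long buddy_seen;	/* last heartbeat observed on the buddy */
	unsigned int misses;		/* consecutive ticks with no buddy progress */
};

static struct cpu_wd wd[NR_WD_CPUS];

/* Assumed pairing: each CPU checks the next one, wrapping around. */
static unsigned int buddy_of(unsigned int cpu)
{
	return (cpu + 1) % NR_WD_CPUS;
}

/*
 * Model of one timer tick on @cpu. Returns the buddy's CPU number when
 * the buddy looks stalled and should be poked, or -1 otherwise.
 */
static int wd_tick(unsigned int cpu)
{
	struct cpu_wd *me;
	unsigned int buddy;

	assert(cpu < NR_WD_CPUS);
	me = &wd[cpu];
	buddy = buddy_of(cpu);

	me->heartbeat++;		/* prove that this CPU is alive */

	if (wd[buddy].heartbeat != me->buddy_seen) {
		/* Buddy made progress since the last check. */
		me->buddy_seen = wd[buddy].heartbeat;
		me->misses = 0;
		return -1;
	}

	if (++me->misses >= WD_MISS_THRESHOLD) {
		me->misses = 0;
		return (int)buddy;	/* caller would signal event 0 here */
	}
	return -1;
}
```

With CPU 1 never ticking, the third consecutive wd_tick(0) returns 1;
once CPU 1 ticks again, the next wd_tick(0) resets the miss count and
returns -1.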

Testing
=======

Developed on QEMU (Trusted Firmware-A with SDEI enabled) and
validated on NVIDIA Grace (Neoverse V2) hardware, under
irqchip.gicv3_pseudo_nmi=0:

  - hard lockup (LKDTM) caught by the SDEI watchdog, which then panicked
    with the stack pointing at the wedged code;
  - sysrq-l backtrace of an interrupt-masked CPU returning its real stack;
  - kdump via crash_smp_send_stop() with a wedged CPU, and via a watchdog
    panic from inside the event-0 handler — sdei_handler_abort() fires and
    the capture kernel boots to userspace on the formerly-wedged CPU, with
    its registers present in the vmcore.

Series
======

  [1/4] firmware: arm_sdei: add SDEI_EVENT_SIGNAL support
        Thin sdei_event_signal() wrapper over the standard call; NMI/crash
        safe (no locks).
  [2/4] drivers/firmware: add SDEI cross-CPU NMI service for arm64
        Register event 0; first user, arch_trigger_cpumask_backtrace().
  [3/4] arm64: wire SDEI NMI into the hardlockup watchdog
        HAVE_HARDLOCKUP_DETECTOR_ARCH backend; boot-time source selection
        with perf-NMI fallback.
  [4/4] arm64: route crash_smp_send_stop() last resort through SDEI
        SDEI as the final escalation rung for CPUs that ignored the normal
        and pseudo-NMI stop IPIs.
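
The escalation order in [4/4] can be sketched as a ladder of stop
methods applied to a shrinking set of CPUs. This is an illustrative
userspace model only: mask_t, stop_rung_fn, escalate_stop() and the
mock_* rungs are hypothetical names, and the real
crash_smp_send_stop() path also involves timeouts and per-CPU state
that are left out here. Each rung reports which of the still-running
CPUs it managed to stop, and later rungs are tried only for the rest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t mask_t;			/* one bit per CPU */
typedef mask_t (*stop_rung_fn)(mask_t pending);	/* returns CPUs it stopped */

/*
 * Try each rung in order on the CPUs that are still running.
 * Returns the CPUs that no rung could stop (0 means all stopped).
 */
static mask_t escalate_stop(mask_t pending, const stop_rung_fn *rungs,
			    size_t nr_rungs)
{
	assert(rungs != NULL);

	for (size_t i = 0; i < nr_rungs && pending; i++)
		pending &= ~rungs[i](pending);

	return pending;
}

/* Mock scenario: CPU 2 is wedged with interrupts masked. */
static mask_t mock_irq_masked_cpus = 1u << 2;

/* A normal stop IPI only reaches CPUs with interrupts unmasked. */
static mask_t mock_ipi_stop(mask_t pending)
{
	return pending & ~mock_irq_masked_cpus;
}

/* Pseudo-NMI is modelled as disabled, so it stops nothing. */
static mask_t mock_pseudo_nmi_stop(mask_t pending)
{
	(void)pending;
	return 0;
}

/* SDEI is modelled as reaching every CPU it is asked to stop. */
static mask_t mock_sdei_stop(mask_t pending)
{
	return pending;
}
```

Running the ladder { mock_ipi_stop, mock_pseudo_nmi_stop, mock_sdei_stop }
over CPUs 1-3 leaves nothing running in this model; dropping the last
rung leaves CPU 2 behind, which is the gap the series is meant to close.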

Also available at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git sdei-nmi

 arch/arm64/Kconfig            |   1 +
 arch/arm64/include/asm/nmi.h  |  30 ++
 arch/arm64/kernel/smp.c       |  33 +++
 drivers/firmware/Kconfig      |  23 ++
 drivers/firmware/Makefile     |   1 +
 drivers/firmware/arm_sdei.c   |  12 +
 drivers/firmware/sdei_nmi.c   | 523 ++++++++++++++++++++++++++++++++++
 include/linux/arm_sdei.h      |   6 +
 include/uapi/linux/arm_sdei.h |   1 +
 9 files changed, 630 insertions(+)
 create mode 100644 arch/arm64/include/asm/nmi.h
 create mode 100644 drivers/firmware/sdei_nmi.c


base-commit: e7ae89a0c97ce2b68b0983cd01eda67cf373517d
-- 
2.54.0


