All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kiryl Shutsemau <kirill@shutemov.name>
To: Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, James Morse <james.morse@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Marc Zyngier <maz@kernel.org>,
	Doug Anderson <dianders@chromium.org>,
	Petr Mladek <pmladek@suse.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Baoquan He <bhe@redhat.com>, Puranjay Mohan <puranjay@kernel.org>,
	Usama Arif <usama.arif@linux.dev>,
	Breno Leitao <leitao@debian.org>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	Lecopzer Chen <lecopzer@gmail.com>,
	Sumit Garg <sumit.garg@kernel.org>,
	kernel-team@meta.com, kexec@lists.infradead.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,
	"Kiryl Shutsemau (Meta)" <kas@kernel.org>
Subject: [PATCH v2 0/3] arm64: cross-CPU NMI via SDEI
Date: Tue,  9 Jun 2026 14:58:32 +0100	[thread overview]
Message-ID: <cover.1781013134.git.kas@kernel.org> (raw)
In-Reply-To: <cover.1780496779.git.kas@kernel.org>

From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>

A class of debug/observability features needs to interrupt a CPU that has
its interrupts locally masked: the all-CPU backtrace behind sysrq-l /
RCU-stall / hung-task / hard-lockup dumps, and crash_smp_send_stop()
capturing a stuck CPU's state into the vmcore. On arm64 these need a
mechanism that reaches a CPU spinning with DAIF masked, which a normal IPI
cannot.

arm64 has two such mechanisms today:

  - GICv3 pseudo-NMI (interrupt priority masking). Its cost is on the
    interrupt mask/unmask hot path: local_irq_enable() becomes an
    ICC_PMR_EL1 write plus a synchronising barrier, and exception
    entry/exit save and restore the PMR, paid on every CPU whether or not
    an NMI is ever delivered. In our measurements, enabling pseudo-NMI
    costs up to ~5% on real workloads, and ~66% on a syscall-in-a-loop
    microbenchmark. A fleet-wide ~5% regression is not acceptable, so
    these systems run with pseudo-NMI disabled.

  - FEAT_NMI (Armv8.8) -- the architectural fix, but absent from deployed
    silicon and from most of the fleet for years to come.

For deployments that do not run pseudo-NMI, the backtrace and crash paths
are degraded: a plain IPI can't reach the masked CPU, so the backtrace of
the CPU you care about comes back empty and the kdump is missing the
culprit's registers. The hard-lockup detector on these systems is the
software buddy detector (HARDLOCKUP_DETECTOR_BUDDY): it detects a stall
from a neighbour CPU, but it cannot itself interrupt the wedged CPU, so
its report has no stack for the culprit and (with hardlockup_panic) the
panic runs on the bystander.

This series adds a third delivery backend that costs nothing on the hot
path: SDEI. Firmware delivers an SDEI event into a CPU regardless of its
DAIF state, so interrupt masking stays the cheap PSTATE.DAIF operation and
the firmware round-trip is paid only at the rare moment a CPU must be
interrupted.

It does not add a hard-lockup detector. Detection stays with the buddy
detector (CONFIG_HARDLOCKUP_DETECTOR_PREFER_BUDDY); this series gives the
backtrace and crash-stop paths -- including the buddy detector's
backtrace of the stalled CPU -- a way to actually reach a masked CPU.

Mechanism
=========

It uses the standard SDEI software-signalled event (event 0) and the
SDEI_EVENT_SIGNAL call (DEN0054) -- a spec-defined cross-PE signal, not a
vendor extension. The driver registers a handler for event 0 and pokes a
target CPU with sdei_event_signal(0, target_mpidr); firmware makes event 0
pending on that PE and dispatches the handler NMI-like.

No firmware change is required beyond SDEI being enabled, which
firmware-first RAS (APEI/GHES) deployments already have; the only
SDEI-core addition is a thin sdei_event_signal() wrapper over the standard
call.

Prior SDEI watchdog work
========================

Out-of-tree SDEI hard-lockup watchdogs exist (e.g. in the openEuler and
Anolis kernels). They bind the secure physical timer as an SDEI event, so
firmware delivers a periodic self-CPU tick that drives a detector. That
requires a new SDEI interrupt-binding API, pushes the watchdog period into
firmware, and adds secure-timer EOI handling on the kexec path. This
series instead uses only the standard software-signalled event 0, keeps
all timing in the kernel (the buddy detector), and the same delivery
primitive serves the backtrace and crash-stop users, not just lockup
reporting.

Not included / follow-ups
=========================

  - No SDEI hard-lockup-detector backend. v1 had one; it is dropped here.
    The buddy detector plus this series' backtrace already cover the
    no-pseudo-NMI case, and a dedicated SDEI backend duplicated the
    perf-NMI detector it had to compile-exclude. Run PREFER_BUDDY.

  - A CPU stopped by the SDEI rung is parked, not powered off via PSCI
    CPU_OFF. Reaching and dumping the wedged CPU -- the point of the
    series -- works, and this matches ipi_cpu_crash_stop()'s own park
    fallback. The consequence is that an SMP crash-capture kernel cannot
    re-online such a CPU (it stays "already on"); the capture kernel boots
    and runs on the remaining CPUs. Powering the stopped CPU off so a
    capture kernel can reclaim it requires completing the SDEI event and
    then CPU_OFF, which hit a firmware-specific issue still under
    investigation; it is left as a follow-up and does not affect the
    dump's contents.

Testing
=======

Developed on QEMU 'virt' (Trusted Firmware-A with SDEI enabled) and
validated on NVIDIA Grace (Neoverse V2) hardware, under
irqchip.gicv3_pseudo_nmi=0 with HARDLOCKUP_DETECTOR_PREFER_BUDDY=y:

  - sysrq-l backtrace of an interrupt-masked CPU returns its real stack,
    pstate showing DAIF set -- proof SDEI delivered into the masked CPU;
  - buddy detector catches a hard lockup (LKDTM) and the wedged CPU's
    stack is fetched via the SDEI backtrace;
  - reboot/halt and the panic/kdump crash stop reach a wedged CPU via the
    SDEI rung ("SMP: retry stop with SDEI NMI for CPUs N"), and the kdump
    captures the wedged CPU's registers in the vmcore.

Changes since v1
================

  - Dropped the SDEI hard-lockup-detector patch (v1 3/4); use the buddy
    detector instead (Doug Anderson).
  - Reworked the crash-stop patch (v1 4/4) into a third rung of
    smp_send_stop()'s escalation, shared with the IPI stop path and
    covering reboot/halt as well as crash; no on-stack cpumask
    (Doug Anderson).
  - 2/3: split the merged comment in arch_trigger_cpumask_backtrace()
    (Doug Anderson).
  - Renamed the driver to drivers/firmware/arm_sdei_nmi.c, to sit beside
    the SDEI core it builds on (drivers/firmware/arm_sdei.c), and widened
    that entry's MAINTAINERS glob (arm_sdei.c -> arm_sdei*) to cover it.
  - Picked up Reviewed-by from Doug Anderson on 1/3 and 2/3 (the changes
    above are mechanical / comment-only on those two).

v1: https://lore.kernel.org/all/cover.1780496779.git.kas@kernel.org

Also available at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git sdei-nmi/v2

Kiryl Shutsemau (Meta) (3):
  firmware: arm_sdei: add SDEI_EVENT_SIGNAL support
  drivers/firmware: add SDEI cross-CPU NMI service for arm64
  arm64: escalate smp_send_stop() to an SDEI NMI as a last resort

 MAINTAINERS                     |   2 +-
 arch/arm64/include/asm/nmi.h    |  38 ++++++
 arch/arm64/kernel/smp.c         |  64 ++++++++++
 drivers/firmware/Kconfig        |  21 +++
 drivers/firmware/Makefile       |   1 +
 drivers/firmware/arm_sdei.c     |  12 ++
 drivers/firmware/arm_sdei_nmi.c | 220 ++++++++++++++++++++++++++++++++
 include/linux/arm_sdei.h        |   6 +
 include/uapi/linux/arm_sdei.h   |   1 +
 9 files changed, 364 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/include/asm/nmi.h
 create mode 100644 drivers/firmware/arm_sdei_nmi.c


base-commit: e7ae89a0c97ce2b68b0983cd01eda67cf373517d
--
2.54.0



  parent reply	other threads:[~2026-06-09 13:58 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-03 14:36 [PATCH 0/4] arm64: cross-CPU NMI via SDEI Kiryl Shutsemau
2026-06-03 14:36 ` [PATCH 1/4] firmware: arm_sdei: add SDEI_EVENT_SIGNAL support Kiryl Shutsemau
2026-06-05 20:46   ` Doug Anderson
2026-06-03 14:36 ` [PATCH 2/4] drivers/firmware: add SDEI cross-CPU NMI service for arm64 Kiryl Shutsemau
2026-06-05 20:54   ` Doug Anderson
2026-06-05 21:29     ` Kiryl Shutsemau
2026-06-03 14:36 ` [PATCH 3/4] arm64: wire SDEI NMI into the hardlockup watchdog Kiryl Shutsemau
2026-06-05 20:03   ` Doug Anderson
2026-06-05 21:11     ` Kiryl Shutsemau
2026-06-05 22:08       ` Doug Anderson
2026-06-03 14:36 ` [PATCH 4/4] arm64: route crash_smp_send_stop() last resort through SDEI Kiryl Shutsemau
2026-06-05 20:42   ` Doug Anderson
2026-06-05 21:46     ` Kiryl Shutsemau
2026-06-09 10:21       ` Kiryl Shutsemau
2026-06-09 13:58 ` Kiryl Shutsemau [this message]
2026-06-09 13:58   ` [PATCH v2 1/3] firmware: arm_sdei: add SDEI_EVENT_SIGNAL support Kiryl Shutsemau
2026-06-09 13:58   ` [PATCH v2 2/3] drivers/firmware: add SDEI cross-CPU NMI service for arm64 Kiryl Shutsemau
2026-06-09 13:58   ` [PATCH v2 3/3] arm64: escalate smp_send_stop() to an SDEI NMI as a last resort Kiryl Shutsemau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1781013134.git.kas@kernel.org \
    --to=kirill@shutemov.name \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=catalin.marinas@arm.com \
    --cc=dianders@chromium.org \
    --cc=james.morse@arm.com \
    --cc=julien.thierry.kdev@gmail.com \
    --cc=kas@kernel.org \
    --cc=kernel-team@meta.com \
    --cc=kexec@lists.infradead.org \
    --cc=lecopzer@gmail.com \
    --cc=leitao@debian.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=pmladek@suse.com \
    --cc=puranjay@kernel.org \
    --cc=sumit.garg@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=usama.arif@linux.dev \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.