Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Kiryl Shutsemau <kirill@shutemov.name>
To: Marc Zyngier <maz@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	 Will Deacon <will@kernel.org>, James Morse <james.morse@arm.com>,
	 Mark Rutland <mark.rutland@arm.com>,
	Doug Anderson <dianders@chromium.org>,
	 Petr Mladek <pmladek@suse.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Baoquan He <bhe@redhat.com>,
	 Puranjay Mohan <puranjay@kernel.org>,
	Usama Arif <usama.arif@linux.dev>,
	 Breno Leitao <leitao@debian.org>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	 Lecopzer Chen <lecopzer@gmail.com>,
	Sumit Garg <sumit.garg@kernel.org>,
	kernel-team@meta.com,  kexec@lists.infradead.org,
	linux-arm-kernel@lists.infradead.org,
	 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 0/4] arm64: cross-CPU NMI via SDEI
Date: Mon, 22 Jun 2026 14:56:16 +0100	[thread overview]
Message-ID: <ajk-Vge2qhaY-TwJ@thinkstation> (raw)
In-Reply-To: <868q8asj1u.wl-maz@kernel.org>

On Fri, Jun 19, 2026 at 03:26:21PM +0100, Marc Zyngier wrote:
> > Does your firmware set ICC_CTLR_EL1.PMHE? I'd be curious to see the
> > numbers if the DSB was omitted on the enable path.
>
> I certainly don't observe this sort of overhead on the HW I have
> access to, and would like to understand where this is coming from with
> actual profiling data.

Full disclosure: the ~66% figures come from internal testing about a year ago.
I no longer have the details of the machine it ran on and can't confirm whether
ICC_CTLR_EL1.PMHE was set there -- it may well have been. I shouldn't have
carried those numbers forward without being able to stand behind them, so
please disregard them.

Here are fresh numbers from NVIDIA Grace (Neoverse V2). Importantly, this
box reports:

  GICv3: Pseudo-NMIs enabled using relaxed ICC_PMR_EL1 synchronisation

i.e. PMHE == 0, so the synchronising DSB on the unmask path is already
patched to a NOP (ARM64_HAS_GIC_PRIO_RELAXED_SYNC). What's left is the
floor cost of PMR-based masking itself plus the PMR save/restore on
exception entry/exit -- not the DSB. So this is the case Catalin asked
about (DSB omitted), and there is still a measurable cost.

A trivial single-threaded gettid() loop (1e6 calls, median of 5,
performance governor, ASLR off):

  pseudo_nmi=0 (DAIF):       178.4 ns/call
  pseudo_nmi=1 (PMR):        252.5 ns/call
  delta:                     +74.1 ns/call  (~230-250 cycles)
                             +41.5% wall time / 0.706 throughput

  --- u-bench.c ---
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <time.h>
  #include <stdio.h>
  int main(void) {
          struct timespec a, b;
          clock_gettime(CLOCK_MONOTONIC, &a);
          for (long i = 0; i < 1000000; i++)
                  syscall(SYS_gettid);
          clock_gettime(CLOCK_MONOTONIC, &b);
          printf("%f ns\n", (b.tv_sec-a.tv_sec)*1e9 + (b.tv_nsec-a.tv_nsec));
          return 0;
  }

will-it-scale agrees independently. sched_yield (ops/s, median of 5):

                      1 task       72 tasks
  pseudo_nmi=0      3,195,656    230,824,534
  pseudo_nmi=1      2,253,753    163,914,837
  ratio                0.705          0.710

The ratio is flat across the whole 1-to-72 sweep, so -- relevant to the
scalability question -- it's a constant per-syscall tax, not a contention
effect. The impact tracks syscall/exception density: page_fault1, a more
realistic workload, stays within ~5%.

> The direction of travel is to deprecate SDEI. I wouldn't add more stuff
> on top of this interface.

I understand FEAT_NMI is the long-term answer, and I'm not arguing against
deprecating SDEI. My concern is the gap in between. By our estimate it's
10+ years before the last non-FEAT_NMI machine retires from the fleet --
for scale, we're still running Skylake today. So there's roughly a
decade where a large installed base has neither FEAT_NMI nor affordable
pseudo-NMI, and no way to reach a DAIF-masked CPU for an all-CPU
backtrace or to capture a wedged CPU in a crash dump. That's the
functional gap this series tries to cover.

Given the deprecation direction, I deliberately kept the SDEI footprint as
small as I could. The series adds no new firmware interface and no vendor
SMC -- it uses only the standard software-signalled event (event 0) via
SDEI_EVENT_SIGNAL, which is already present on these systems for
firmware-first RAS (APEI/GHES). And SDEI is only ever invoked in a "bad
state": to deliver a backtrace signal to a CPU that a normal IPI can't
reach, or to stop a CPU that ignored the stop IPIs. Nothing on any hot or
steady-state path touches it.

If even that minimal use is unacceptable on a deprecated interface, I'd
rather know now and redirect the effort -- but I'd appreciate a pointer to
what should cover this gap for existing silicon in the meantime.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


  reply	other threads:[~2026-06-22 13:56 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-17 19:20 [PATCH v4 0/4] arm64: cross-CPU NMI via SDEI Kiryl Shutsemau
2026-06-17 19:20 ` [PATCH v4 1/4] firmware: arm_sdei: add sdei_is_present() Kiryl Shutsemau
2026-06-17 20:02   ` Doug Anderson
2026-06-17 19:20 ` [PATCH v4 2/4] firmware: arm_sdei: add SDEI_EVENT_SIGNAL support Kiryl Shutsemau
2026-06-17 19:20 ` [PATCH v4 3/4] drivers/firmware: add SDEI cross-CPU NMI service for arm64 Kiryl Shutsemau
2026-06-18 10:46   ` Julian Braha
2026-06-18 15:48     ` Kiryl Shutsemau
2026-06-17 19:20 ` [PATCH v4 4/4] arm64: escalate smp_send_stop() to an SDEI NMI as a last resort Kiryl Shutsemau
2026-06-17 20:02   ` Doug Anderson
2026-06-19 14:00 ` [PATCH v4 0/4] arm64: cross-CPU NMI via SDEI Catalin Marinas
2026-06-19 14:26   ` Marc Zyngier
2026-06-22 13:56     ` Kiryl Shutsemau [this message]
2026-06-22 16:52       ` Doug Anderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ajk-Vge2qhaY-TwJ@thinkstation \
    --to=kirill@shutemov.name \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=catalin.marinas@arm.com \
    --cc=dianders@chromium.org \
    --cc=james.morse@arm.com \
    --cc=julien.thierry.kdev@gmail.com \
    --cc=kernel-team@meta.com \
    --cc=kexec@lists.infradead.org \
    --cc=lecopzer@gmail.com \
    --cc=leitao@debian.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=pmladek@suse.com \
    --cc=puranjay@kernel.org \
    --cc=sumit.garg@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=usama.arif@linux.dev \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox