* [PATCH RFC] arm64/irqflags: force inline of arch_local_irq_enable()
@ 2026-04-20 12:42 Breno Leitao
2026-04-20 13:06 ` Mark Rutland
0 siblings, 1 reply; 5+ messages in thread
From: Breno Leitao @ 2026-04-20 12:42 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon
Cc: leo.bras, mark.rutland, leo.yan, linux-arm-kernel, linux-kernel,
palmer, paulmck, puranjay, usama.arif, kernel-team, Breno Leitao
arch_local_irq_enable() is a small wrapper that dispatches between two
unmask paths: __daif_local_irq_enable() on most systems, and
__pmr_local_irq_enable() on builds that use GIC PMR-based masking
(Pseudo-NMI). Both leaf primitives are already __always_inline; the
wrapper itself is plain "static inline".
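
For illustration only, the wrapper pattern described above can be modelled in userspace C. This is a sketch under stated assumptions: every model_* name and the uses_prio_masking flag are hypothetical stand-ins, and the bodies just record which path ran rather than touching DAIF or the PMR.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel macro (assumption: GCC or Clang). */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

enum unmask_path { PATH_NONE, PATH_DAIF, PATH_PMR };

/* Hypothetical model of system_uses_irq_prio_masking(). */
static bool uses_prio_masking;
/* Records which leaf primitive ran; the real ones write system registers. */
static enum unmask_path last_path;

static __always_inline void model_daif_local_irq_enable(void)
{
	last_path = PATH_DAIF;
}

static __always_inline void model_pmr_local_irq_enable(void)
{
	last_path = PATH_PMR;
}

/*
 * With plain "static inline" the compiler is free to emit this wrapper
 * out of line; __always_inline forces it into each caller.
 */
static __always_inline void model_arch_local_irq_enable(void)
{
	if (uses_prio_masking)
		model_pmr_local_irq_enable();
	else
		model_daif_local_irq_enable();
}

/* Runs the wrapper once and reports which unmask path was taken. */
enum unmask_path model_unmask(bool prio_masking)
{
	uses_prio_masking = prio_masking;
	last_path = PATH_NONE;
	model_arch_local_irq_enable();
	assert(last_path != PATH_NONE);
	return last_path;
}
```

Calling model_unmask(false) takes the DAIF path and model_unmask(true) takes the PMR path; in both cases the wrapper leaves no symbol of its own in the object file.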
In practice the compiler does not always inline the wrapper. When it
gets emitted out-of-line, samples taken inside it during the post-WFI
IRQ unmask in default_idle_call() show up as arch_local_irq_enable
overhead in profiles, with default_idle_call() lost from the unwound
chain.
This matters most at fleet scale. On a large arm64 fleet, the
aggregate effect is that idle CPUs show up in fleet-wide profilers as
"busy stuck in arch_local_irq_enable" instead of as idle
(default_idle_call / cpu_startup_entry). Engineers looking at
fleet-wide top-symbol dashboards see what looks like significant
CPU-bound work in IRQ unmasking and chase a phantom hot path, when in
fact the cost is the WFI wake-up cycle being attributed to the wrong
function. Tooling has to special-case this symbol to suppress it,
which is fragile across kernel versions. Inlining the wrapper makes
idle CPUs appear idle in profiles - which is what they are.
The same misattribution affects driver stalls. arm64 PMU overflow is
delivered as a regular IRQ (no NMI on default builds), so a driver
that holds local_irq_disable() for milliseconds defers every PMU
sample to the moment it calls local_irq_enable(). With the wrapper
out-of-line, the resulting fat sample is credited to
arch_local_irq_enable rather than to the driver, and the FP-unwinder
points the call chain at the driver's caller instead of the driver
itself (the immediate caller is skipped because arch_local_irq_enable
is a leaf with no saved frame). The driver is still visible in the
profile from its other samples, but the stall cost itself is
mis-attributed and the chain leading to it is one frame off, making
fleet-wide root-cause analysis harder than it needs to be. Inlining
the wrapper attributes the stall sample to the driver function that
actually held IRQs disabled.
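
The off-by-one chain described above can be reproduced with a small userspace model of frame-pointer unwinding. This is only a sketch: driver_fn, driver_caller and top_level are hypothetical function names, and the structures are simplified stand-ins for an arm64 frame record (saved x29, saved x30), not the kernel's unwinder.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified frame record: saved frame pointer plus saved link register. */
struct frame_record {
	const struct frame_record *prev_fp;
	const char *return_into;	/* function the saved LR returns into */
};

/* Register state captured when the PMU interrupt is finally taken. */
struct sample_regs {
	const char *pc_in;		/* function containing the sampled PC */
	const struct frame_record *fp;	/* frame pointer at sample time */
};

/* FP-only unwind: report the PC, then follow saved frame records. */
static size_t fp_unwind(const struct sample_regs *regs,
			const char **chain, size_t max)
{
	size_t n = 0;
	const struct frame_record *rec;

	if (n < max)
		chain[n++] = regs->pc_in;
	for (rec = regs->fp; rec && n < max; rec = rec->prev_fp)
		chain[n++] = rec->return_into;
	return n;
}

/* Models top_level() -> driver_caller() -> driver_fn() -> IRQ unmask. */
size_t sample_at_irq_unmask(int wrapper_inlined,
			    const char **chain, size_t max)
{
	/* Pushed by driver_caller(): its saved LR returns into top_level(). */
	const struct frame_record caller_rec = { NULL, "top_level" };
	/* Pushed by driver_fn(): its saved LR returns into driver_caller(). */
	const struct frame_record driver_rec = { &caller_rec, "driver_caller" };
	struct sample_regs regs;

	assert(max >= 3);
	/* A leaf wrapper pushes no record, so FP still names driver_rec. */
	regs.fp = &driver_rec;
	regs.pc_in = wrapper_inlined ? "driver_fn" : "arch_local_irq_enable";
	return fp_unwind(&regs, chain, max);
}
```

With wrapper_inlined set to 0 the chain reads arch_local_irq_enable, driver_caller, top_level, so driver_fn is missing; with it set to 1 the chain starts at driver_fn. In the out-of-line case the return address into driver_fn lives only in the live link register, which this FP-only walk never consults.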
Trade-offs:
- Minor .text effect: every caller now expands the dispatch +
underlying primitive at its call site. system_uses_irq_prio_masking()
is a static-key check, so on non-pNMI systems the inlined body
collapses to a single MSR daifclr; on pNMI systems it collapses to a
single sysreg write.
- Loss of a debugging convenience: there is no longer an
arch_local_irq_enable symbol to set a breakpoint on. Callers must be
targeted individually.
- Compiler trust: __always_inline overrides size heuristics. The body
is small enough that this should be unobjectionable, but it is a
policy change.
This patch only flips arch_local_irq_enable(). The same reasoning
applies to arch_local_irq_disable()/save()/restore() which share the
identical static-inline-wrapper-around-__always_inline-primitives
pattern. Holding those off until profiles motivate them.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
arch/arm64/include/asm/irqflags.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index d4d7451c2c129..505ef5be53a71 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -40,7 +40,7 @@ static __always_inline void __pmr_local_irq_enable(void)
 	barrier();
 }
 
-static inline void arch_local_irq_enable(void)
+static __always_inline void arch_local_irq_enable(void)
 {
 	if (system_uses_irq_prio_masking()) {
 		__pmr_local_irq_enable();
---
base-commit: 615aad0f61e0c7a898184a394dc895c610100d4f
change-id: 20260420-arm64_always_inline-6bc9dd3c17e6
Best regards,
--
Breno Leitao <leitao@debian.org>
* Re: [PATCH RFC] arm64/irqflags: force inline of arch_local_irq_enable()
From: Mark Rutland @ 2026-04-20 13:06 UTC (permalink / raw)
To: Breno Leitao
Cc: Catalin Marinas, Will Deacon, leo.bras, leo.yan, linux-arm-kernel,
linux-kernel, palmer, paulmck, puranjay, usama.arif, kernel-team
On Mon, Apr 20, 2026 at 05:42:11AM -0700, Breno Leitao wrote:
> arch_local_irq_enable() is a small wrapper that dispatches between two
> unmask paths: __daif_local_irq_enable() on most systems, and
> __pmr_local_irq_enable() on builds that use GIC PMR-based masking
> (Pseudo-NMI). Both leaf primitives are already __always_inline; the
> wrapper itself is plain "static inline".
>
> In practice the compiler does not always inline the wrapper.
I think this was my mistake, and we should have marked all the helpers
as __always_inline for noinstr safety, as x86 did in commit:
7a745be1cc90 ("x86/entry: __always_inline irqflags for noinstr")
I think we should mark all of the following as __always_inline in one
go:
* arch_local_irq_enable()
* arch_local_irq_disable()
* arch_local_save_flags()
* arch_irqs_disabled_flags()
* arch_irqs_disabled()
* arch_local_irq_save()
* arch_local_irq_restore()
... which then ensures noinstr safety, and has the side benefit of
giving nicer traces as you're suggesting here.
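
As a rough userspace illustration of the noinstr concern (not the kernel code), the sketch below places a caller in a separate section and forces every helper it uses inline, so no helper can end up as an out-of-line, instrumentable function in ordinary text. The model_* names, the model_flags variable and the simplified model_noinstr macro are all assumptions made for this example; the real helpers manipulate DAIF or the PMR.

```c
#include <assert.h>

/* Userspace stand-ins (assumption: GCC or Clang). */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

/* Simplified model of noinstr: out of line, in its own section on ELF. */
#if defined(__ELF__)
#define model_noinstr __attribute__((noinline, section(".noinstr.text")))
#else
#define model_noinstr __attribute__((noinline))
#endif

/* Hypothetical interrupt-mask state: non-zero means IRQs are masked. */
static unsigned long model_flags;

static __always_inline void model_arch_local_irq_enable(void)
{
	model_flags = 0;
}

static __always_inline void model_arch_local_irq_disable(void)
{
	model_flags = 1;
}

static __always_inline unsigned long model_arch_local_save_flags(void)
{
	return model_flags;
}

static __always_inline int model_arch_irqs_disabled_flags(unsigned long flags)
{
	return flags != 0;
}

static __always_inline int model_arch_irqs_disabled(void)
{
	return model_arch_irqs_disabled_flags(model_arch_local_save_flags());
}

static __always_inline unsigned long model_arch_local_irq_save(void)
{
	unsigned long flags = model_arch_local_save_flags();

	if (!model_arch_irqs_disabled_flags(flags))
		model_arch_local_irq_disable();
	return flags;
}

static __always_inline void model_arch_local_irq_restore(unsigned long flags)
{
	model_flags = flags;
}

/* Every helper is expanded here, so nothing is called outside the section. */
model_noinstr int model_entry_path(void)
{
	unsigned long flags;
	int masked_inside;

	model_arch_local_irq_enable();
	flags = model_arch_local_irq_save();
	masked_inside = model_arch_irqs_disabled();
	model_arch_local_irq_restore(flags);
	return masked_inside && !model_arch_irqs_disabled();
}
```

model_entry_path() returns 1 when the save/restore pair masked and then unmasked the modelled state. If any helper were plain "static inline", the compiler could emit it as a separate function outside the section, which is the situation the list above is meant to rule out.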
Are you happy to try that?
Mark.
* Re: [PATCH RFC] arm64/irqflags: force inline of arch_local_irq_enable()
From: Breno Leitao @ 2026-04-20 13:15 UTC (permalink / raw)
To: Mark Rutland
Cc: Catalin Marinas, Will Deacon, leo.bras, leo.yan, linux-arm-kernel,
linux-kernel, palmer, paulmck, puranjay, usama.arif, kernel-team
On Mon, Apr 20, 2026 at 02:06:23PM +0100, Mark Rutland wrote:
> On Mon, Apr 20, 2026 at 05:42:11AM -0700, Breno Leitao wrote:
> > arch_local_irq_enable() is a small wrapper that dispatches between two
> > unmask paths: __daif_local_irq_enable() on most systems, and
> > __pmr_local_irq_enable() on builds that use GIC PMR-based masking
> > (Pseudo-NMI). Both leaf primitives are already __always_inline; the
> > wrapper itself is plain "static inline".
> >
> > In practice the compiler does not always inline the wrapper.
>
> I think this was my mistake, and we should have marked all the helpers
> as __always_inline for noinstr safety, as x86 did in commit:
>
> 7a745be1cc90 ("x86/entry: __always_inline irqflags for noinstr")
>
> I think we should mark all of the following as __always_inline in one
> go:
>
> * arch_local_irq_enable()
> * arch_local_irq_disable()
> * arch_local_save_flags()
> * arch_irqs_disabled_flags()
> * arch_irqs_disabled()
> * arch_local_irq_save()
> * arch_local_irq_restore()
>
> ... which then ensures noinstr safety, and has the side benefit of
> giving nicer traces as you're suggesting here.
>
> Are you happy to try that?
Absolutely, I'll work on testing that and put together a patch
addressing all of them.
Should this be targeted for stable backports as well? If so, which
commit should I reference in the Fixes tag?
Thanks for the quick answer,
--breno
* Re: [PATCH RFC] arm64/irqflags: force inline of arch_local_irq_enable()
From: Mark Rutland @ 2026-04-20 14:14 UTC (permalink / raw)
To: Breno Leitao
Cc: Catalin Marinas, Will Deacon, leo.bras, leo.yan, linux-arm-kernel,
linux-kernel, palmer, paulmck, puranjay, usama.arif, kernel-team
On Mon, Apr 20, 2026 at 06:15:24AM -0700, Breno Leitao wrote:
> On Mon, Apr 20, 2026 at 02:06:23PM +0100, Mark Rutland wrote:
> > On Mon, Apr 20, 2026 at 05:42:11AM -0700, Breno Leitao wrote:
> > > arch_local_irq_enable() is a small wrapper that dispatches between two
> > > unmask paths: __daif_local_irq_enable() on most systems, and
> > > __pmr_local_irq_enable() on builds that use GIC PMR-based masking
> > > (Pseudo-NMI). Both leaf primitives are already __always_inline; the
> > > wrapper itself is plain "static inline".
> > >
> > > In practice the compiler does not always inline the wrapper.
> >
> > I think this was my mistake, and we should have marked all the helpers
> > as __always_inline for noinstr safety, as x86 did in commit:
> >
> > 7a745be1cc90 ("x86/entry: __always_inline irqflags for noinstr")
> >
> > I think we should mark all of the following as __always_inline in one
> > go:
> >
> > * arch_local_irq_enable()
> > * arch_local_irq_disable()
> > * arch_local_save_flags()
> > * arch_irqs_disabled_flags()
> > * arch_irqs_disabled()
> > * arch_local_irq_save()
> > * arch_local_irq_restore()
> >
> > ... which then ensures noinstr safety, and has the side benefit of
> > giving nicer traces as you're suggesting here.
> >
> > Are you happy to try that?
>
> Absolutely, I'll work on testing that and put together a patch
> addressing all of them.
>
> Should this be targeted for stable backports as well? If so, which
> commit should I reference in the Fixes tag?
I don't think we need to worry about backporting, and can do this as a
cleanup for now unless someone shouts that they're seeing brokenness in
a stable kernel.
There's no specific commit for a fixes tag; this has always been a bit
dodgy, but we've evidently been getting away with it in practice.
Mark.
* Re: [PATCH RFC] arm64/irqflags: force inline of arch_local_irq_enable()
From: Breno Leitao @ 2026-04-20 14:37 UTC (permalink / raw)
To: Mark Rutland
Cc: Catalin Marinas, Will Deacon, leo.bras, leo.yan, linux-arm-kernel,
linux-kernel, palmer, paulmck, puranjay, usama.arif, kernel-team
On Mon, Apr 20, 2026 at 03:14:49PM +0100, Mark Rutland wrote:
> On Mon, Apr 20, 2026 at 06:15:24AM -0700, Breno Leitao wrote:
> > On Mon, Apr 20, 2026 at 02:06:23PM +0100, Mark Rutland wrote:
> > >
> > > Are you happy to try that?
> >
> > Absolutely, I'll work on testing that and put together a patch
> > addressing all of them.
> >
> > Should this be targeted for stable backports as well? If so, which
> > commit should I reference in the Fixes tag?
>
> I don't think we need to worry about backporting, and can do this as a
> cleanup for now unless someone shouts that they're seeing brokenness in
> a stable kernel.
>
> There's no specific commit for a fixes tag; this has always been a bit
> dodgy, but we've evidently been getting away with it in practice.
Ack. I'll run this through production testing for approximately 24
hours, then submit the patch.
Thanks,
--breno