Date: Mon, 20 Apr 2026 14:06:23 +0100
From: Mark Rutland
To: Breno Leitao
Cc: Catalin Marinas, Will Deacon, leo.bras@arm.com, leo.yan@arm.com,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    palmer@dabbelt.com, paulmck@kernel.org, puranjay@kernel.org,
    usama.arif@linux.dev, kernel-team@meta.com
Subject: Re: [PATCH RFC] arm64/irqflags: force inline of arch_local_irq_enable()
In-Reply-To: <20260420-arm64_always_inline-v1-1-dba919cf46bc@debian.org>

On Mon, Apr 20, 2026 at 05:42:11AM -0700, Breno Leitao wrote:
> arch_local_irq_enable() is a small wrapper that dispatches between two
> unmask paths: __daif_local_irq_enable() on most systems, and
> __pmr_local_irq_enable() on builds that use GIC PMR-based masking
> (Pseudo-NMI). Both leaf primitives are already __always_inline; the
> wrapper itself is plain "static inline".
>
> In practice the compiler does not always inline the wrapper.

I think this was my mistake, and we should have marked all the helpers as
__always_inline for noinstr safety, as x86 did in commit:

  7a745be1cc90 ("x86/entry: __always_inline irqflags for noinstr")

I think we should mark all of the following as __always_inline in one go:

* arch_local_irq_enable()
* arch_local_irq_disable()
* arch_local_save_flags()
* arch_irqs_disabled_flags()
* arch_irqs_disabled()
* arch_local_irq_save()
* arch_local_irq_restore()

... which then ensures noinstr safety, and has the side benefit of giving
nicer traces as you're suggesting here.

Are you happy to try that?

Mark.

> When it gets emitted out-of-line, samples taken inside it during the
> post-WFI IRQ unmask in default_idle_call() show up as
> arch_local_irq_enable overhead in profiles, with default_idle_call()
> lost from the unwound chain.
>
> This matters most at fleet scale. On a large arm64 fleet, the
> aggregate effect is that idle CPUs show up in fleet-wide profilers as
> "busy stuck in arch_local_irq_enable" instead of as idle
> (default_idle_call / cpu_startup_entry). Engineers looking at
> fleet-wide top-symbol dashboards see what looks like significant
> CPU-bound work in IRQ unmasking and chase a phantom hot path, when in
> fact the cost is the WFI wake-up cycle being attributed to the wrong
> function. Tooling has to special-case this symbol to suppress it,
> which is fragile across kernel versions. Inlining the wrapper makes
> idle CPUs appear idle in profiles - which is what they are.
>
> The same misattribution affects driver stalls. arm64 PMU overflow is
> delivered as a regular IRQ (no NMI on default builds), so a driver
> that holds local_irq_disable() for milliseconds defers every PMU
> sample to the moment it calls local_irq_enable(). With the wrapper
> out-of-line, the resulting fat sample is credited to
> arch_local_irq_enable rather than to the driver, and the FP-unwinder
> points the call chain at the driver's caller instead of the driver
> itself (the immediate caller is skipped because arch_local_irq_enable
> is a leaf with no saved frame). The driver is still visible in the
> profile from its other samples, but the stall cost itself is
> mis-attributed and the chain leading to it is one frame off, making
> fleet-wide root-cause analysis harder than it needs to be. Inlining
> the wrapper attributes the stall sample to the driver function that
> actually held IRQs disabled.
>
> Trade-offs:
>
> - Minor .text effect: every caller now expands the dispatch +
>   underlying primitive at its call site. system_uses_irq_prio_masking()
>   is a static-key check, so on non-pNMI systems the inlined body
>   collapses to a single MSR daifclr; on pNMI systems it collapses to a
>   single sysreg write.
>
> - Loss of a debugging convenience: there is no longer an
>   arch_local_irq_enable symbol to set a breakpoint on. Callers must be
>   targeted individually.
>
> - Compiler trust: __always_inline overrides size heuristics. The body
>   is small enough that this should be unobjectionable, but it is a
>   policy change.
>
> This patch only flips arch_local_irq_enable(). The same reasoning
> applies to arch_local_irq_disable()/save()/restore() which share the
> identical static-inline-wrapper-around-__always_inline-primitives
> pattern. Holding those off until profiles motivate them.
>
> Signed-off-by: Breno Leitao
> ---
>  arch/arm64/include/asm/irqflags.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
> index d4d7451c2c129..505ef5be53a71 100644
> --- a/arch/arm64/include/asm/irqflags.h
> +++ b/arch/arm64/include/asm/irqflags.h
> @@ -40,7 +40,7 @@ static __always_inline void __pmr_local_irq_enable(void)
>  	barrier();
>  }
>  
> -static inline void arch_local_irq_enable(void)
> +static __always_inline void arch_local_irq_enable(void)
>  {
>  	if (system_uses_irq_prio_masking()) {
>  		__pmr_local_irq_enable();
>
> ---
> base-commit: 615aad0f61e0c7a898184a394dc895c610100d4f
> change-id: 20260420-arm64_always_inline-6bc9dd3c17e6
>
> Best regards,
> --
> Breno Leitao
>
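
For illustration only, here is a minimal userspace sketch of the pattern
under discussion: a family of irqflags-style wrappers, each forced inline,
dispatching to forced-inline leaf primitives. It is not the arm64 code.
Every sketch_* name is hypothetical, the "hardware" state is a plain
variable, and the macro assumes a GCC- or Clang-compatible compiler that
supports the always_inline attribute.

```c
#include <stdbool.h>

/* Stand-in for the kernel's __always_inline; assumes GCC or Clang. */
#define sketch_always_inline inline __attribute__((__always_inline__))

/* Hypothetical model state, not real hardware registers. */
static bool sketch_use_pmr;                 /* models the static-key check */
static unsigned long sketch_irq_masked = 1; /* 1 = masked, 0 = unmasked */

/* Leaf primitives: one per unmask path, both forced inline. */
static sketch_always_inline void sketch_daif_irq_enable(void)
{
	sketch_irq_masked = 0;
}

static sketch_always_inline void sketch_pmr_irq_enable(void)
{
	sketch_irq_masked = 0;
}

/* Wrappers: forced inline too, so no out-of-line symbol is emitted. */
static sketch_always_inline void sketch_local_irq_enable(void)
{
	if (sketch_use_pmr)
		sketch_pmr_irq_enable();
	else
		sketch_daif_irq_enable();
}

static sketch_always_inline void sketch_local_irq_disable(void)
{
	sketch_irq_masked = 1;
}

static sketch_always_inline unsigned long sketch_local_save_flags(void)
{
	return sketch_irq_masked;
}

static sketch_always_inline bool sketch_irqs_disabled_flags(unsigned long flags)
{
	return flags != 0;
}

static sketch_always_inline bool sketch_irqs_disabled(void)
{
	return sketch_irqs_disabled_flags(sketch_local_save_flags());
}

static sketch_always_inline unsigned long sketch_local_irq_save(void)
{
	unsigned long flags = sketch_local_save_flags();

	sketch_local_irq_disable();
	return flags;
}

static sketch_always_inline void sketch_local_irq_restore(unsigned long flags)
{
	sketch_irq_masked = flags;
}
```

With every wrapper forced inline, a caller that does
sketch_local_irq_save() ... sketch_local_irq_restore(flags) gets the
bodies expanded at its own call site, which is the property the thread
relies on for both noinstr safety and sample attribution.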