From: Mark Rutland <mark.rutland@arm.com>
To: He Ying <heying24@huawei.com>
Cc: catalin.marinas@arm.com, will@kernel.org, marcan@marcan.st,
maz@kernel.org, joey.gouly@arm.com, pcc@google.com,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] arm64: Make CONFIG_ARM64_PSEUDO_NMI macro wrap all the pseudo-NMI code
Date: Fri, 7 Jan 2022 13:19:27 +0000
Message-ID: <Ydg939btY/bzEAe4@FVFF77S0Q05N>
In-Reply-To: <20220107085536.214501-1-heying24@huawei.com>
On Fri, Jan 07, 2022 at 03:55:36AM -0500, He Ying wrote:
> We have recently been updating our product's kernel from 4.4 to 5.10 and
> found a performance issue. We run a business test called the ARP test, which
> measures the latency of ping-pong packet traffic with a certain payload.
> The results are as follows.
>
> - 4.4 kernel: avg = ~20s
> - 5.10 kernel (CONFIG_ARM64_PSEUDO_NMI is not set): avg = ~40s
Have you tested with a recent mainline kernel, e.g. v5.15?
Is this test publicly available, and can you say which hardware (e.g. which
CPU implementation) you're testing with?
> I have just been learning the arm64 pseudo-NMI code and have a question:
> why is the related code not wrapped by CONFIG_ARM64_PSEUDO_NMI?
The code in question is all patched via alternatives, and when
CONFIG_ARM64_PSEUDO_NMI is not selected, the code was expected to only have the
overhead of the regular DAIF manipulation.
> I wonder whether this causes a performance regression.
>
> First, I made this patch and then ran the test again. Here's the result.
>
> - 5.10 kernel with this patch not applied: avg = ~40s
> - 5.10 kernel with this patch applied: avg = ~23s
>
> Amazing! Note that all kernels are built with CONFIG_ARM64_PSEUDO_NMI not
> set. It seems the pseudo-NMI feature actually adds some performance overhead
> even if CONFIG_ARM64_PSEUDO_NMI is not set.
I'm surprised the overhead is so significant; as above, this is all patched in,
and so the overhead when this is disabled is expected to be *extremely* small.
For example, when CONFIG_ARM64_PSEUDO_NMI is not set, in arch_local_irq_enable():
* The portion under the system_has_prio_mask_debugging() test will be removed
entirely by the compiler, as this internally checks
IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI).
* The assembly will be left as a write to DAIFClr. The only additional cost
should be that of generating GIC_PRIO_IRQON into a register.
* The pmr_sync() will be removed entirely by the compiler, as it is defined
conditionally, depending on CONFIG_ARM64_PSEUDO_NMI.
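The first point relies on IS_ENABLED() folding to the constant 0 when a Kconfig symbol is not set. A simplified userspace sketch of the preprocessor trick from include/linux/kconfig.h follows; the macro names are renamed, the real IS_ENABLED() also checks the `=m` (MODULE) variant, and CONFIG_DEMO_* and demo_irq_enable() are hypothetical:

```c
#include <assert.h>

/*
 * Simplified copy of the IS_ENABLED() trick from include/linux/kconfig.h.
 * If the option is defined to 1, PLACEHOLDER_1 expands to "0," so 1 ends up
 * as the second argument; otherwise the junk token absorbs the 1 and 0 ends
 * up as the second argument.
 */
#define PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED(x) IS_DEFINED_1(x)
#define IS_DEFINED_1(val) IS_DEFINED_2(PLACEHOLDER_##val)
#define IS_DEFINED_2(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0)
/* Builtin-only simplification: the kernel version also checks <option>_MODULE. */
#define IS_ENABLED(option) IS_DEFINED(option)

/* Hypothetical Kconfig results: one option "=y", one "is not set". */
#define CONFIG_DEMO_ENABLED 1
/* CONFIG_DEMO_DISABLED is intentionally not defined. */

/* Both expand to integer constant expressions, so they work at compile time. */
static_assert(IS_ENABLED(CONFIG_DEMO_ENABLED) == 1, "option is set");
static_assert(IS_ENABLED(CONFIG_DEMO_DISABLED) == 0, "option is not set");

static int debug_checks_run; /* counts executions of the guarded block */

/* Hypothetical stand-in for the system_has_prio_mask_debugging() block. */
static void demo_irq_enable(void)
{
	if (IS_ENABLED(CONFIG_DEMO_DISABLED))
		debug_checks_run++; /* dead code: the condition is the constant 0 */
}
```

Calling demo_irq_enable() leaves debug_checks_run at 0. Because the guard is a compile-time constant, the compiler can drop the guarded statement entirely instead of testing it at run time.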
I can't spot an obvious issue with that or the other cases. In the common case
those add no new instructions, and in the worst case they only add NOPs.
> Furthermore, I found that the feature also adds some overhead to the vmlinux
> size. I built the 5.10 kernel with and without this patch applied, with
> CONFIG_ARM64_PSEUDO_NMI not set.
>
> - 5.10 kernel with this patch not applied: vmlinux size is 384060600 Bytes.
> - 5.10 kernel with this patch applied: vmlinux size is 383842936 Bytes.
>
> That means the arm64 pseudo-NMI feature may add ~200KB of overhead to the
> vmlinux size.
I suspect that's just the (unused) alternatives, and we could improve that by
passing the config into the alternative blocks.
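For scale, the two vmlinux sizes quoted above differ by the following (a plain subtraction of the reported figures; it does not say which sections account for the bytes):

```latex
\begin{aligned}
\Delta &= 384\,060\,600 - 383\,842\,936 = 217\,664\ \text{bytes} \approx 212.6\ \text{KiB},\\
\frac{\Delta}{384\,060\,600} &\approx 5.7 \times 10^{-4} \approx 0.057\%.
\end{aligned}
```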
> In summary, the arm64 pseudo-NMI feature adds some overhead to vmlinux size
> and performance even if the config is not set. To avoid it, wrap all of the
> related code in CONFIG_ARM64_PSEUDO_NMI checks.
>
> Signed-off-by: He Ying <heying24@huawei.com>
> ---
> arch/arm64/include/asm/irqflags.h | 38 +++++++++++++++++++++++++++++--
> arch/arm64/kernel/entry.S | 4 ++++
> 2 files changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
> index b57b9b1e4344..82f771b41cf5 100644
> --- a/arch/arm64/include/asm/irqflags.h
> +++ b/arch/arm64/include/asm/irqflags.h
> @@ -26,6 +26,7 @@
> */
> static inline void arch_local_irq_enable(void)
> {
> +#ifdef CONFIG_ARM64_PSEUDO_NMI
> if (system_has_prio_mask_debugging()) {
> u32 pmr = read_sysreg_s(SYS_ICC_PMR_EL1);
>
> @@ -41,10 +42,18 @@ static inline void arch_local_irq_enable(void)
> : "memory");
>
> pmr_sync();
> +#else
> + asm volatile(
> + "msr daifclr, #3 // arch_local_irq_enable"
> + :
> + :
> + : "memory");
> +#endif
I'm happy to rework this to improve matters, but I am very much not happy with
duplicating the logic for the !PSEUDO_NMI case. Adding more ifdeffery and
copies of that is not acceptable.
Instead, can you please try changing the alternative to also take the config,
e.g. here:
| asm volatile(ALTERNATIVE(
| "msr daifclr, #3 // arch_local_irq_enable",
| __msr_s(SYS_ICC_PMR_EL1, "%0"),
| ARM64_HAS_IRQ_PRIO_MASKING,
| CONFIG_ARM64_PSEUDO_NMI)
| :
| : "r" ((unsigned long) GIC_PRIO_IRQON)
| : "memory");
... and see if that makes a significant difference?
Likewise for the other cases.
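The `#3` immediate in `msr daifclr, #3`, used both in the quoted patch and in the ALTERNATIVE() suggested above, is a 4-bit mask over the PSTATE D, A, I and F bits (D is bit 3, F is bit 0), so `#3` unmasks IRQs and FIQs only. A small decoder for illustration; daif_imm_describe() is a hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stddef.h>

/* Bits of the 4-bit immediate taken by MSR DAIFSet / MSR DAIFClr. */
enum {
	DAIF_IMM_F = 1 << 0, /* FIQ */
	DAIF_IMM_I = 1 << 1, /* IRQ */
	DAIF_IMM_A = 1 << 2, /* SError (asynchronous abort) */
	DAIF_IMM_D = 1 << 3, /* debug exceptions */
};

/* "msr daifclr, #3" therefore clears (unmasks) the I and F bits only. */
static_assert((DAIF_IMM_I | DAIF_IMM_F) == 3, "daifclr #3 == I|F");

/*
 * Hypothetical helper: write the letters of the bits selected by imm into
 * buf, most significant first (e.g. "IF" for 3). buf needs 5 bytes.
 */
static const char *daif_imm_describe(unsigned int imm, char buf[5])
{
	static const char names[] = "DAIF"; /* names[0] corresponds to bit 3 */
	size_t n = 0;

	for (int bit = 3; bit >= 0; bit--)
		if (imm & (1u << bit))
			buf[n++] = names[3 - bit];
	buf[n] = '\0';
	return buf;
}
```

For comparison, daif_imm_describe(2, buf) yields "I", i.e. an IRQ-only unmask.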
> #endif /* __ASM_IRQFLAGS_H */
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 2f69ae43941d..ffc32d3d909a 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -300,6 +300,7 @@ alternative_else_nop_endif
> str w21, [sp, #S_SYSCALLNO]
> .endif
>
> +#ifdef CONFIG_ARM64_PSEUDO_NMI
> /* Save pmr */
> alternative_if ARM64_HAS_IRQ_PRIO_MASKING
> mrs_s x20, SYS_ICC_PMR_EL1
> @@ -307,6 +308,7 @@ alternative_if ARM64_HAS_IRQ_PRIO_MASKING
> mov x20, #GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET
> msr_s SYS_ICC_PMR_EL1, x20
> alternative_else_nop_endif
> +#endif
>
> /* Re-enable tag checking (TCO set on exception entry) */
> #ifdef CONFIG_ARM64_MTE
> @@ -330,6 +332,7 @@ alternative_else_nop_endif
> disable_daif
> .endif
>
> +#ifdef CONFIG_ARM64_PSEUDO_NMI
> /* Restore pmr */
> alternative_if ARM64_HAS_IRQ_PRIO_MASKING
> ldr x20, [sp, #S_PMR_SAVE]
> @@ -339,6 +342,7 @@ alternative_if ARM64_HAS_IRQ_PRIO_MASKING
> dsb sy // Ensure priority change is seen by redistributor
> .L__skip_pmr_sync\@:
> alternative_else_nop_endif
> +#endif
For these two I think the ifdeffery is fine, but I'm surprised this has a
measurable impact, as the alternatives should be initialized to NOPs (and never
modified).
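As a rough illustration of the expected behaviour, here is a toy userspace model (not the kernel's implementation) of an `alternative_if ... alternative_else_nop_endif` block: it leaves NOPs in the default text plus an entry in an alternatives table, and boot-time patching only rewrites those NOPs when the CPU has the capability. Compiling the block out with `#ifdef`, as in the two entry.S hunks, removes both the NOPs and the entry. All names here (toy_alternative_if(), toy_apply_alternatives(), TOY_CAP_IRQ_PRIO_MASKING) are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: "instructions" are strings, capabilities are bit flags. */
#define TOY_CAP_IRQ_PRIO_MASKING (1u << 0)
#define TOY_MAX 16

static const char toy_nop[] = "nop";

struct toy_alt {
	size_t offset;                  /* first patched slot in text[] */
	size_t nr;                      /* number of slots to patch */
	const char *const *replacement; /* instructions used if cap is present */
	unsigned int cap;               /* capability that enables the patch */
};

struct toy_image {
	const char *text[TOY_MAX];    /* "kernel text" */
	size_t ntext;
	struct toy_alt alts[TOY_MAX]; /* "alternatives table" */
	size_t nalts;
};

/*
 * Model of an "alternative_if cap ... alternative_else_nop_endif" block: the
 * default text is nr NOPs plus one table entry. built == false models the
 * block being compiled out by an #ifdef, so neither is emitted.
 */
static void toy_alternative_if(struct toy_image *img,
			       const char *const *replacement, size_t nr,
			       unsigned int cap, bool built)
{
	if (!built)
		return;

	assert(img->ntext + nr <= TOY_MAX && img->nalts < TOY_MAX);
	img->alts[img->nalts++] =
		(struct toy_alt){ img->ntext, nr, replacement, cap };
	for (size_t i = 0; i < nr; i++)
		img->text[img->ntext++] = toy_nop;
}

/* Model of boot-time patching: only sites whose cap is present are touched. */
static void toy_apply_alternatives(struct toy_image *img, unsigned int cpu_caps)
{
	for (size_t i = 0; i < img->nalts; i++) {
		const struct toy_alt *alt = &img->alts[i];

		if (!(cpu_caps & alt->cap))
			continue; /* capability absent: NOPs stay as they are */
		for (size_t j = 0; j < alt->nr; j++)
			img->text[alt->offset + j] = alt->replacement[j];
	}
}
```

In this model a CPU without the capability simply executes the unmodified NOPs; the residual cost is those NOP slots plus the table entry, which the `#ifdef` variant removes altogether.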
Thanks,
Mark.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel