* [PATCH] arm64: mte: optimize asynchronous tag check fault flag check
@ 2020-11-18 3:20 Peter Collingbourne
2020-11-18 11:23 ` Catalin Marinas
2020-11-18 17:48 ` Catalin Marinas
0 siblings, 2 replies; 5+ messages in thread
From: Peter Collingbourne @ 2020-11-18 3:20 UTC (permalink / raw)
To: Catalin Marinas; +Cc: Peter Collingbourne, Linux ARM
We don't need to check for MTE support before checking the flag
because it can only be set if the hardware supports MTE. As a result
we can unconditionally check the flag bit which is expected to be in
a register and therefore the check can be done in a single instruction
instead of first needing to load the hwcaps.
On a DragonBoard 845c with a kernel built with CONFIG_ARM64_MTE=y with
the powersave governor this reduces the cost of a kernel entry/exit
(invalid syscall) from 465.1ns to 463.8ns.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/If4dc3501fd4e4f287322f17805509613cfe47d24
---
arch/arm64/kernel/syscall.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index e4c0dadf0d92..f533e83fd722 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -123,7 +123,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
local_daif_restore(DAIF_PROCCTX);
user_exit();
- if (system_supports_mte() && (flags & _TIF_MTE_ASYNC_FAULT)) {
+ if (IS_ENABLED(CONFIG_ARM64_MTE) && (flags & _TIF_MTE_ASYNC_FAULT)) {
/*
* Process the asynchronous tag check fault before the actual
* syscall. do_notify_resume() will send a signal to userspace
--
2.29.2.299.gdc1121823c-goog
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] arm64: mte: optimize asynchronous tag check fault flag check
2020-11-18 3:20 [PATCH] arm64: mte: optimize asynchronous tag check fault flag check Peter Collingbourne
@ 2020-11-18 11:23 ` Catalin Marinas
2020-11-18 16:53 ` Peter Collingbourne
2020-11-18 17:48 ` Catalin Marinas
1 sibling, 1 reply; 5+ messages in thread
From: Catalin Marinas @ 2020-11-18 11:23 UTC (permalink / raw)
To: Peter Collingbourne; +Cc: Linux ARM
On Tue, Nov 17, 2020 at 07:20:51PM -0800, Peter Collingbourne wrote:
> We don't need to check for MTE support before checking the flag
> because it can only be set if the hardware supports MTE. As a result
> we can unconditionally check the flag bit which is expected to be in
> a register and therefore the check can be done in a single instruction
> instead of first needing to load the hwcaps.
>
> On a DragonBoard 845c with a kernel built with CONFIG_ARM64_MTE=y with
> the powersave governor this reduces the cost of a kernel entry/exit
> (invalid syscall) from 465.1ns to 463.8ns.
That's less than 0.3%, could be noise as well.
> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> index e4c0dadf0d92..f533e83fd722 100644
> --- a/arch/arm64/kernel/syscall.c
> +++ b/arch/arm64/kernel/syscall.c
> @@ -123,7 +123,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
> local_daif_restore(DAIF_PROCCTX);
> user_exit();
>
> - if (system_supports_mte() && (flags & _TIF_MTE_ASYNC_FAULT)) {
> + if (IS_ENABLED(CONFIG_ARM64_MTE) && (flags & _TIF_MTE_ASYNC_FAULT)) {
system_supports_mte() is actually a static label (well, it expands to
two). So we get a combination of nop/branch jumping over the flags
check. Maybe the difference you are seeing is caused by the extra
instructions changing the cache alignment or confusing the branch
predictor.
I wonder whether changing __cpus_have_const_cap() to use
static_branch_likely() has any effect (given that we expect most
features to be enabled in future CPUs, such change would probably make
sense).
That said, I'm happy to take this patch, most likely it replaces some
static branch with tbnz on this path. Given that MTE is on in defconfig,
do we actually need the IS_ENABLED() check? If we remove it, it would be
more consistent with the similar check in do_notify_resume().
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] arm64: mte: optimize asynchronous tag check fault flag check
2020-11-18 11:23 ` Catalin Marinas
@ 2020-11-18 16:53 ` Peter Collingbourne
2020-11-18 17:07 ` Catalin Marinas
0 siblings, 1 reply; 5+ messages in thread
From: Peter Collingbourne @ 2020-11-18 16:53 UTC (permalink / raw)
To: Catalin Marinas; +Cc: Linux ARM
On Wed, Nov 18, 2020 at 3:24 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Tue, Nov 17, 2020 at 07:20:51PM -0800, Peter Collingbourne wrote:
> > We don't need to check for MTE support before checking the flag
> > because it can only be set if the hardware supports MTE. As a result
> > we can unconditionally check the flag bit which is expected to be in
> > a register and therefore the check can be done in a single instruction
> > instead of first needing to load the hwcaps.
> >
> > On a DragonBoard 845c with a kernel built with CONFIG_ARM64_MTE=y with
> > the powersave governor this reduces the cost of a kernel entry/exit
> > (invalid syscall) from 465.1ns to 463.8ns.
>
> That's less than 0.3%, could be noise as well.
The numbers are quite stable following the methodology of [1] (one
thing that I forgot to mention there is adb shell stop before taking
measurements). Out of 100 samples there were about 75 in the faster
cluster ranging from 464.9966ns-465.1515ns before and
463.7776ns-463.9312ns after.
> > diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> > index e4c0dadf0d92..f533e83fd722 100644
> > --- a/arch/arm64/kernel/syscall.c
> > +++ b/arch/arm64/kernel/syscall.c
> > @@ -123,7 +123,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
> > local_daif_restore(DAIF_PROCCTX);
> > user_exit();
> >
> > - if (system_supports_mte() && (flags & _TIF_MTE_ASYNC_FAULT)) {
> > + if (IS_ENABLED(CONFIG_ARM64_MTE) && (flags & _TIF_MTE_ASYNC_FAULT)) {
>
> system_supports_mte() is actually a static label (well, it expands to
> two). So we get a combination of nop/branch jumping over the flags
> check. Maybe the difference you are seeing is caused by the extra
> instructions changing the cache alignment or confusing the branch
> predictor.
Okay, I wasn't aware of that mechanism. From the disassembly of
syscall.o it did appear that we were loading from hwcap every time.
But looking at the compiler's -S output it looks like there's
something going on with __jump_table which may lead to that happening.
There is still a conditional branch on the presumably static result of
the hwcap which may also be where part of the performance loss is
coming from.
> I wonder whether changing __cpus_have_const_cap() to use
> static_branch_likely() has any effect (given that we expect most
> features to be enabled in future CPUs, such change would probably make
> sense).
>
> That said, I'm happy to take this patch, most likely it replaces some
> static branch with tbnz on this path. Given that MTE is on in defconfig,
> do we actually need the IS_ENABLED() check? If we remove it, it would be
> more consistent with the similar check in do_notify_resume().
If we don't mind the additional conditional branch on
CONFIG_ARM64_MTE=n kernels we don't need the IS_ENABLED() check. I'll
remove it.
Peter
[1] https://lore.kernel.org/linux-arm-kernel/CAMn1gO7HCJiXEzDvBb-=T7znATqyaxE_t-zezqKL0dyXRCG-nQ@mail.gmail.com/
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] arm64: mte: optimize asynchronous tag check fault flag check
2020-11-18 16:53 ` Peter Collingbourne
@ 2020-11-18 17:07 ` Catalin Marinas
0 siblings, 0 replies; 5+ messages in thread
From: Catalin Marinas @ 2020-11-18 17:07 UTC (permalink / raw)
To: Peter Collingbourne; +Cc: Linux ARM
On Wed, Nov 18, 2020 at 08:53:16AM -0800, Peter Collingbourne wrote:
> On Wed, Nov 18, 2020 at 3:24 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Tue, Nov 17, 2020 at 07:20:51PM -0800, Peter Collingbourne wrote:
> > > diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> > > index e4c0dadf0d92..f533e83fd722 100644
> > > --- a/arch/arm64/kernel/syscall.c
> > > +++ b/arch/arm64/kernel/syscall.c
> > > @@ -123,7 +123,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
> > > local_daif_restore(DAIF_PROCCTX);
> > > user_exit();
> > >
> > > - if (system_supports_mte() && (flags & _TIF_MTE_ASYNC_FAULT)) {
> > > + if (IS_ENABLED(CONFIG_ARM64_MTE) && (flags & _TIF_MTE_ASYNC_FAULT)) {
[...]
> > That said, I'm happy to take this patch, most likely it replaces some
> > static branch with tbnz on this path. Given that MTE is on in defconfig,
> > do we actually need the IS_ENABLED() check? If we remove it, it would be
> > more consistent with the similar check in do_notify_resume().
>
> If we don't mind the additional conditional branch on
> CONFIG_ARM64_MTE=n kernels we don't need the IS_ENABLED() check. I'll
> remove it.
No need to, I'll do it when applying the patch.
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] arm64: mte: optimize asynchronous tag check fault flag check
2020-11-18 3:20 [PATCH] arm64: mte: optimize asynchronous tag check fault flag check Peter Collingbourne
2020-11-18 11:23 ` Catalin Marinas
@ 2020-11-18 17:48 ` Catalin Marinas
1 sibling, 0 replies; 5+ messages in thread
From: Catalin Marinas @ 2020-11-18 17:48 UTC (permalink / raw)
To: Peter Collingbourne; +Cc: Will Deacon, Linux ARM
On Tue, 17 Nov 2020 19:20:51 -0800, Peter Collingbourne wrote:
> We don't need to check for MTE support before checking the flag
> because it can only be set if the hardware supports MTE. As a result
> we can unconditionally check the flag bit which is expected to be in
> a register and therefore the check can be done in a single instruction
> instead of first needing to load the hwcaps.
>
> On a DragonBoard 845c with a kernel built with CONFIG_ARM64_MTE=y with
> the powersave governor this reduces the cost of a kernel entry/exit
> (invalid syscall) from 465.1ns to 463.8ns.
Applied to arm64 (for-next/misc), thanks!
[1/1] arm64: mte: optimize asynchronous tag check fault flag check
https://git.kernel.org/arm64/c/739003c64283
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-11-18 17:49 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-11-18 3:20 [PATCH] arm64: mte: optimize asynchronous tag check fault flag check Peter Collingbourne
2020-11-18 11:23 ` Catalin Marinas
2020-11-18 16:53 ` Peter Collingbourne
2020-11-18 17:07 ` Catalin Marinas
2020-11-18 17:48 ` Catalin Marinas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).