* [PATCH v2 0/2] ARM: allow kernel mode NEON in softirq context
@ 2022-12-07 10:39 Ard Biesheuvel
2022-12-07 10:39 ` [PATCH v2 1/2] ARM: vfp: Manipulate VFP state with softirqs disabled Ard Biesheuvel
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2022-12-07 10:39 UTC (permalink / raw)
To: linux-arm-kernel, linux
Cc: linux-crypto, Ard Biesheuvel, Linus Walleij, Arnd Bergmann
Currently on ARM, we only permit kernel mode NEON in task context, and
NEON based processing triggered from softirq context is queued for
asynchronous completion via the crypto API's cryptd layer.
For IPsec packet encryption involving highly performant crypto
implementations, this results in a substantial performance hit, and so
it would be desirable to permit those crypto operations to complete
synchronously even when invoked from softirq context.
For example, on a 1 GHz Cortex-A53 machine (SynQuacer), AES-256-GCM
executes in 7.2 cycles per byte, putting an upper bound of ~140 MB/s
on the achievable throughput of a single CPU.
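The ~140 MB/s bound is simply the clock rate divided by the per-byte cost. A minimal sketch of that arithmetic, assuming 1 MB = 10^6 bytes; the helper name `max_throughput_mb_per_s` is made up for illustration and is not part of the series:

```c
/*
 * Upper bound on single-CPU throughput, in MB/s, for a cipher that costs
 * cycles_per_byte on a core clocked at clock_hz.
 * Hypothetical helper for illustration only; 1 MB is taken as 10^6 bytes.
 */
double max_throughput_mb_per_s(double clock_hz, double cycles_per_byte)
{
	return clock_hz / cycles_per_byte / 1e6;
}
```

With the figures quoted above, `max_throughput_mb_per_s(1e9, 7.2)` comes out at roughly 138.9, which is where the ~140 MB/s estimate comes from.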
Without these changes, an IPsec tunnel from a 32-bit VM to the 64-bit
host can achieve a throughput of 9.5 MB/s TX and 11.9 MB/s RX.
When the crypto algorithm is permitted to execute in softirq context,
the throughput increases to 16.5 MB/s TX and 41 MB/s RX.
(This is measured using Debian's iperf3 3.11 with the default options.)
So let's reorganize the VFP state handling so that its critical
handling of the FPU registers runs with softirqs disabled. Then, update
the kernel_neon_begin()/end() logic to keep softirq processing disabled
as long as the NEON is being used in kernel mode.
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Ard Biesheuvel (2):
ARM: vfp: Manipulate VFP state with softirqs disabled
ARM: permit non-nested kernel mode NEON in softirq context
arch/arm/include/asm/assembler.h | 19 ++++++++++++-------
arch/arm/include/asm/simd.h | 8 ++++++++
arch/arm/kernel/asm-offsets.c | 1 +
arch/arm/vfp/entry.S | 4 ++--
arch/arm/vfp/vfphw.S | 4 ++--
arch/arm/vfp/vfpmodule.c | 19 ++++++++++++-------
6 files changed, 37 insertions(+), 18 deletions(-)
create mode 100644 arch/arm/include/asm/simd.h
--
2.35.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/2] ARM: vfp: Manipulate VFP state with softirqs disabled
2022-12-07 10:39 [PATCH v2 0/2] ARM: allow kernel mode NEON in softirq context Ard Biesheuvel
@ 2022-12-07 10:39 ` Ard Biesheuvel
2022-12-15 10:22 ` Linus Walleij
2022-12-07 10:39 ` [PATCH v2 2/2] ARM: permit non-nested kernel mode NEON in softirq context Ard Biesheuvel
2022-12-12 14:37 ` [PATCH v2 0/2] ARM: allow " Martin Willi
2 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2022-12-07 10:39 UTC (permalink / raw)
To: linux-arm-kernel, linux
Cc: linux-crypto, Ard Biesheuvel, Linus Walleij, Arnd Bergmann

In a subsequent patch, we will relax the kernel mode NEON policy, and
permit kernel mode NEON to be used not only from task context, as is
permitted today, but also from softirq context.

Given that softirqs may trigger over the back of any IRQ unless they are
explicitly disabled, we need to address the resulting races in the VFP
state handling, by disabling softirq processing in two distinct but
related cases:
- kernel mode NEON will leave the FPU disabled after it completes, so
  any kernel code sequence that enables the FPU and subsequently accesses
  its registers needs to disable softirqs until it completes;
- kernel_neon_begin() will preserve the userland VFP state in memory,
  and if it interrupts the ordinary VFP state preserve sequence, the
  latter will resume execution with the VFP registers corrupted, and
  happily save them to memory.

Given that disabling softirqs also disables preemption, we can replace
the existing preempt_disable/enable occurrences in the VFP state
handling asm code with new macros that dis/enable softirqs instead.
In the VFP state handling C code, add local_bh_disable/enable() calls
in those places where the VFP state is preserved.

One thing to keep in mind is that, once we allow NEON use in softirq
context, the result of any such interruption is that the FPEXC_EN bit in
the FPEXC register will be cleared, and vfp_current_hw_state[cpu] will
be NULL. This means that any sequence that [conditionally] clears
FPEXC_EN and/or sets vfp_current_hw_state[cpu] to NULL does not need to
run with softirqs disabled, as the result will be the same. Furthermore,
the handling of THREAD_NOTIFY_SWITCH is guaranteed to run with IRQs
disabled, and so it does not need protection from softirq interruptions
either.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/include/asm/assembler.h | 19 ++++++++++++-------
 arch/arm/kernel/asm-offsets.c    |  1 +
 arch/arm/vfp/entry.S             |  4 ++--
 arch/arm/vfp/vfphw.S             |  4 ++--
 arch/arm/vfp/vfpmodule.c         |  8 +++++++-
 5 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 90fbe4a3f9c8472f..df999b75c0e25b01 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -236,21 +236,26 @@ THUMB(	fpreg	.req	r7	)
 	sub	\tmp, \tmp, #1			@ decrement it
 	str	\tmp, [\ti, #TI_PREEMPT]
 	.endm
-
-	.macro	dec_preempt_count_ti, ti, tmp
-	get_thread_info \ti
-	dec_preempt_count \ti, \tmp
-	.endm
 #else
 	.macro	inc_preempt_count, ti, tmp
 	.endm
 
 	.macro	dec_preempt_count, ti, tmp
 	.endm
+#endif
+
+	.macro	local_bh_disable, ti, tmp
+	ldr	\tmp, [\ti, #TI_PREEMPT]
+	add	\tmp, \tmp, #SOFTIRQ_DISABLE_OFFSET
+	str	\tmp, [\ti, #TI_PREEMPT]
+	.endm
 
-	.macro	dec_preempt_count_ti, ti, tmp
+	.macro	local_bh_enable_ti, ti, tmp
+	get_thread_info \ti
+	ldr	\tmp, [\ti, #TI_PREEMPT]
+	sub	\tmp, \tmp, #SOFTIRQ_DISABLE_OFFSET
+	str	\tmp, [\ti, #TI_PREEMPT]
 	.endm
-#endif
 
 #define USERL(l, x...)				\
 9999:	x;					\
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 2c8d76fd7c66298a..38121c59cbc26cdd 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -56,6 +56,7 @@ int main(void)
   DEFINE(VFP_CPU,		offsetof(union vfp_state, hard.cpu));
 #endif
 #endif
+  DEFINE(SOFTIRQ_DISABLE_OFFSET,SOFTIRQ_DISABLE_OFFSET);
 #ifdef CONFIG_ARM_THUMBEE
   DEFINE(TI_THUMBEE_STATE,	offsetof(struct thread_info, thumbee_state));
 #endif
diff --git a/arch/arm/vfp/entry.S b/arch/arm/vfp/entry.S
index 27b0a1f27fbdf392..9a89264cdcc0b46e 100644
--- a/arch/arm/vfp/entry.S
+++ b/arch/arm/vfp/entry.S
@@ -22,7 +22,7 @@
 @  IRQs enabled.
 @
 ENTRY(do_vfp)
-	inc_preempt_count r10, r4
+	local_bh_disable r10, r4
 	ldr	r4, .LCvfp
 	ldr	r11, [r10, #TI_CPU]	@ CPU number
 	add	r10, r10, #TI_VFPSTATE	@ r10 = workspace
@@ -30,7 +30,7 @@ ENTRY(do_vfp)
 ENDPROC(do_vfp)
 
 ENTRY(vfp_null_entry)
-	dec_preempt_count_ti r10, r4
+	local_bh_enable_ti r10, r4
 	ret	lr
 ENDPROC(vfp_null_entry)
 
diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index 6f7926c9c1790f66..26c4f61ecfa39638 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -175,7 +175,7 @@ vfp_hw_state_valid:
 					@ else it's one 32-bit instruction, so
 					@ always subtract 4 from the following
 					@ instruction address.
-	dec_preempt_count_ti r10, r4
+	local_bh_enable_ti r10, r4
 	ret	r9			@ we think we have handled things
 
 
@@ -200,7 +200,7 @@ skip:
 	@ not recognised by VFP
 
 	DBGSTR	"not VFP"
-	dec_preempt_count_ti r10, r4
+	local_bh_enable_ti r10, r4
 	ret	lr
 
 process_exception:
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 2cb355c1b5b71694..8f5bc672b4aac04a 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -416,7 +416,7 @@ void VFP_bounce(u32 trigger, u32 fpexc, struct pt_regs *regs)
 	if (exceptions)
 		vfp_raise_exceptions(exceptions, trigger, orig_fpscr, regs);
  exit:
-	preempt_enable();
+	local_bh_enable();
 }
 
 static void vfp_enable(void *unused)
@@ -517,6 +517,8 @@ void vfp_sync_hwstate(struct thread_info *thread)
 {
 	unsigned int cpu = get_cpu();
 
+	local_bh_disable();
+
 	if (vfp_state_in_hw(cpu, thread)) {
 		u32 fpexc = fmrx(FPEXC);
 
@@ -528,6 +530,7 @@ void vfp_sync_hwstate(struct thread_info *thread)
 		fmxr(FPEXC, fpexc);
 	}
 
+	local_bh_enable();
 	put_cpu();
 }
 
@@ -717,6 +720,8 @@ void kernel_neon_begin(void)
 	unsigned int cpu;
 	u32 fpexc;
 
+	local_bh_disable();
+
 	/*
 	 * Kernel mode NEON is only allowed outside of interrupt context
 	 * with preemption disabled. This will make sure that the kernel
@@ -739,6 +744,7 @@ void kernel_neon_begin(void)
 		vfp_save_state(vfp_current_hw_state[cpu], fpexc);
 #endif
 	vfp_current_hw_state[cpu] = NULL;
+	local_bh_enable();
 }
 EXPORT_SYMBOL(kernel_neon_begin);
 
-- 
2.35.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
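The commit message's point that disabling softirqs also disables preemption can be illustrated with a small userspace model of the preempt counter. This is only a sketch under stated assumptions: the bit layout (8 preempt bits followed by the softirq field, with SOFTIRQ_DISABLE_OFFSET being twice SOFTIRQ_OFFSET) is taken from include/linux/preempt.h, the `model_*` names are invented for this example, and the IRQs-disabled half of the real preemptibility check is ignored.

```c
#include <stdbool.h>

/* Assumed layout, mirroring include/linux/preempt.h. */
#define PREEMPT_BITS		8
#define SOFTIRQ_SHIFT		PREEMPT_BITS
#define SOFTIRQ_OFFSET		(1UL << SOFTIRQ_SHIFT)
#define SOFTIRQ_DISABLE_OFFSET	(2 * SOFTIRQ_OFFSET)

/* Stand-in for the word the asm macros access at offset TI_PREEMPT. */
struct model_thread_info {
	unsigned long preempt_count;
};

/* Same arithmetic as the local_bh_disable asm macro in the patch. */
void model_local_bh_disable(struct model_thread_info *ti)
{
	ti->preempt_count += SOFTIRQ_DISABLE_OFFSET;
}

/* Same arithmetic as local_bh_enable_ti; it only adjusts the counter. */
void model_local_bh_enable(struct model_thread_info *ti)
{
	ti->preempt_count -= SOFTIRQ_DISABLE_OFFSET;
}

/* In this model, preemption is possible only when the whole counter is zero. */
bool model_preemptible(const struct model_thread_info *ti)
{
	return ti->preempt_count == 0;
}
```

Because the softirq-disable count lives in the same word as the preempt count, any non-zero softirq field makes the counter non-zero, so the old inc/dec_preempt_count calls can be replaced rather than supplemented.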
* Re: [PATCH v2 1/2] ARM: vfp: Manipulate VFP state with softirqs disabled
2022-12-07 10:39 ` [PATCH v2 1/2] ARM: vfp: Manipulate VFP state with softirqs disabled Ard Biesheuvel
@ 2022-12-15 10:22 ` Linus Walleij
0 siblings, 0 replies; 10+ messages in thread
From: Linus Walleij @ 2022-12-15 10:22 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: linux-arm-kernel, linux, linux-crypto, Arnd Bergmann

On Wed, Dec 7, 2022 at 11:39 AM Ard Biesheuvel <ardb@kernel.org> wrote:

> In a subsequent patch, we will relax the kernel mode NEON policy, and
> permit kernel mode NEON to be used not only from task context, as is
> permitted today, but also from softirq context.
>
> Given that softirqs may trigger over the back of any IRQ unless they are
> explicitly disabled, we need to address the resulting races in the VFP
> state handling, by disabling softirq processing in two distinct but
> related cases:
> - kernel mode NEON will leave the FPU disabled after it completes, so
>   any kernel code sequence that enables the FPU and subsequently accesses
>   its registers needs to disable softirqs until it completes;
> - kernel_neon_begin() will preserve the userland VFP state in memory,
>   and if it interrupts the ordinary VFP state preserve sequence, the
>   latter will resume execution with the VFP registers corrupted, and
>   happily save them to memory.
>
> Given that disabling softirqs also disables preemption, we can replace
> the existing preempt_disable/enable occurrences in the VFP state
> handling asm code with new macros that dis/enable softirqs instead.
> In the VFP state handling C code, add local_bh_disable/enable() calls
> in those places where the VFP state is preserved.
>
> One thing to keep in mind is that, once we allow NEON use in softirq
> context, the result of any such interruption is that the FPEXC_EN bit in
> the FPEXC register will be cleared, and vfp_current_hw_state[cpu] will
> be NULL. This means that any sequence that [conditionally] clears
> FPEXC_EN and/or sets vfp_current_hw_state[cpu] to NULL does not need to
> run with softirqs disabled, as the result will be the same. Furthermore,
> the handling of THREAD_NOTIFY_SWITCH is guaranteed to run with IRQs
> disabled, and so it does not need protection from softirq interruptions
> either.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Tricky patch, I had to read it a few times and visualize the concepts,
but I am sufficiently convinced that it does the right thing.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>

Yours,
Linus Walleij
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 2/2] ARM: permit non-nested kernel mode NEON in softirq context
2022-12-07 10:39 [PATCH v2 0/2] ARM: allow kernel mode NEON in softirq context Ard Biesheuvel
2022-12-07 10:39 ` [PATCH v2 1/2] ARM: vfp: Manipulate VFP state with softirqs disabled Ard Biesheuvel
@ 2022-12-07 10:39 ` Ard Biesheuvel
2022-12-15 10:26 ` Linus Walleij
2022-12-12 14:37 ` [PATCH v2 0/2] ARM: allow " Martin Willi
2 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2022-12-07 10:39 UTC (permalink / raw)
To: linux-arm-kernel, linux
Cc: linux-crypto, Ard Biesheuvel, Linus Walleij, Arnd Bergmann

We currently only permit kernel mode NEON in process context, to avoid
the need to preserve/restore the NEON register file when taking an
exception while running in the kernel.

Like we did on arm64, we can relax this restriction substantially, by
permitting kernel mode NEON from softirq context, while ensuring that
softirq processing is disabled when the NEON is being used in task
context. This guarantees that only NEON context belonging to user space
needs to be preserved and restored, which is already taken care of.

This is especially relevant for network encryption, where incoming
frames are typically handled in softirq context, and deferring software
decryption to a kernel thread or falling back to C code are both
undesirable from a performance PoV.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/include/asm/simd.h |  8 ++++++++
 arch/arm/vfp/vfpmodule.c    | 13 ++++++-------
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/arm/include/asm/simd.h b/arch/arm/include/asm/simd.h
new file mode 100644
index 0000000000000000..82191dbd7e78a036
--- /dev/null
+++ b/arch/arm/include/asm/simd.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/hardirq.h>
+
+static __must_check inline bool may_use_simd(void)
+{
+	return IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && !in_hardirq();
+}
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 8f5bc672b4aac04a..4e1a786df76df157 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -723,12 +723,12 @@ void kernel_neon_begin(void)
 	local_bh_disable();
 
 	/*
-	 * Kernel mode NEON is only allowed outside of interrupt context
-	 * with preemption disabled. This will make sure that the kernel
-	 * mode NEON register contents never need to be preserved.
+	 * Kernel mode NEON is only allowed outside of hardirq context with
+	 * preemption and softirq processing disabled. This will make sure that
+	 * the kernel mode NEON register contents never need to be preserved.
 	 */
-	BUG_ON(in_interrupt());
-	cpu = get_cpu();
+	BUG_ON(in_hardirq());
+	cpu = __smp_processor_id();
 
 	fpexc = fmrx(FPEXC) | FPEXC_EN;
 	fmxr(FPEXC, fpexc);
@@ -744,7 +744,6 @@ void kernel_neon_begin(void)
 		vfp_save_state(vfp_current_hw_state[cpu], fpexc);
 #endif
 	vfp_current_hw_state[cpu] = NULL;
-	local_bh_enable();
 }
 EXPORT_SYMBOL(kernel_neon_begin);
 
@@ -752,7 +751,7 @@ void kernel_neon_end(void)
 {
 	/* Disable the NEON/VFP unit. */
 	fmxr(FPEXC, fmrx(FPEXC) & ~FPEXC_EN);
-	put_cpu();
+	local_bh_enable();
 }
 EXPORT_SYMBOL(kernel_neon_end);
 
-- 
2.35.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
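From a caller's point of view, the rules this patch establishes could be sketched as follows. This is a userspace model rather than kernel code: every `model_*` name and the `struct model_cpu` state are stand-ins invented for illustration, CONFIG_KERNEL_MODE_NEON is assumed to be enabled, and only the control flow (check may_use_simd(), bracket the SIMD work with kernel_neon_begin()/kernel_neon_end(), otherwise fall back to scalar code) reflects the diff above.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state standing in for the kernel's context tracking. */
struct model_cpu {
	bool in_hardirq;	/* models in_hardirq() */
	int bh_disable_depth;	/* models the softirq-disable nesting count */
};

/* Mirrors may_use_simd() from the patch, assuming KERNEL_MODE_NEON=y. */
bool model_may_use_simd(const struct model_cpu *c)
{
	return !c->in_hardirq;
}

/* Softirqs stay disabled from begin to end, so no softirq can nest. */
void model_kernel_neon_begin(struct model_cpu *c)
{
	c->bh_disable_depth++;
}

void model_kernel_neon_end(struct model_cpu *c)
{
	c->bh_disable_depth--;
}

/* A new softirq may only start on this CPU while softirqs are enabled. */
bool model_softirq_may_run(const struct model_cpu *c)
{
	return !c->in_hardirq && c->bh_disable_depth == 0;
}

/* Returns true if the SIMD path was taken, false for the scalar fallback. */
bool model_do_crypto(struct model_cpu *c)
{
	if (!model_may_use_simd(c))
		return false;	/* scalar fallback would run here */

	model_kernel_neon_begin(c);
	/* NEON work goes here; model_softirq_may_run(c) is false throughout. */
	model_kernel_neon_end(c);
	return true;
}
```

The "non-nested" part of the subject line corresponds to `model_softirq_may_run()` returning false between begin and end: a softirq that wants NEON can never interrupt another kernel mode NEON user, so only user space state ever needs saving.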
* Re: [PATCH v2 2/2] ARM: permit non-nested kernel mode NEON in softirq context
2022-12-07 10:39 ` [PATCH v2 2/2] ARM: permit non-nested kernel mode NEON in softirq context Ard Biesheuvel
@ 2022-12-15 10:26 ` Linus Walleij
2022-12-15 10:43 ` Ard Biesheuvel
0 siblings, 1 reply; 10+ messages in thread
From: Linus Walleij @ 2022-12-15 10:26 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: linux-arm-kernel, linux, linux-crypto, Arnd Bergmann

On Wed, Dec 7, 2022 at 11:39 AM Ard Biesheuvel <ardb@kernel.org> wrote:

> We currently only permit kernel mode NEON in process context, to avoid
> the need to preserve/restore the NEON register file when taking an
> exception while running in the kernel.
>
> Like we did on arm64, we can relax this restriction substantially, by
> permitting kernel mode NEON from softirq context, while ensuring that
> softirq processing is disabled when the NEON is being used in task
> context. This guarantees that only NEON context belonging to user space
> needs to be preserved and restored, which is already taken care of.
>
> This is especially relevant for network encryption, where incoming
> frames are typically handled in softirq context, and deferring software
> decryption to a kernel thread or falling back to C code are both
> undesirable from a performance PoV.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

So boosting WireGuard as primary SW network encryption user?

This is really neat, BTW:
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>

Yours,
Linus Walleij
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] ARM: permit non-nested kernel mode NEON in softirq context
2022-12-15 10:26 ` Linus Walleij
@ 2022-12-15 10:43 ` Ard Biesheuvel
2022-12-15 10:51 ` Russell King (Oracle)
0 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2022-12-15 10:43 UTC (permalink / raw)
To: Linus Walleij; +Cc: linux-arm-kernel, linux, linux-crypto, Arnd Bergmann

On Thu, 15 Dec 2022 at 11:27, Linus Walleij <linus.walleij@linaro.org> wrote:
>
> On Wed, Dec 7, 2022 at 11:39 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > We currently only permit kernel mode NEON in process context, to avoid
> > the need to preserve/restore the NEON register file when taking an
> > exception while running in the kernel.
> >
> > Like we did on arm64, we can relax this restriction substantially, by
> > permitting kernel mode NEON from softirq context, while ensuring that
> > softirq processing is disabled when the NEON is being used in task
> > context. This guarantees that only NEON context belonging to user space
> > needs to be preserved and restored, which is already taken care of.
> >
> > This is especially relevant for network encryption, where incoming
> > frames are typically handled in softirq context, and deferring software
> > decryption to a kernel thread or falling back to C code are both
> > undesirable from a performance PoV.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>
> So boosting WireGuard as primary SW network encryption user?

Essentially, although the use case that inspired this work is related
to IPsec not WireGuard, and the crypto algorithm in that case (GCM) is
~3x faster than WG's chacha20poly1305, which makes the performance
overhead of asynchronous completion even more significant. (Note that
GCM needs the AES and PMULL instructions which are usually only
available when running the 32-bit kernel on a 64-bit core, whereas
chacha20poly1305 uses ordinary NEON instructions.)

But Martin responded with a Tested-by regarding chacha20poly1305 on
IPsec (not WG) where there is also a noticeable speedup, so WG on
ARM32 should definitely benefit from this as well.

> This is really neat, BTW:
> Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
>

Thanks!
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] ARM: permit non-nested kernel mode NEON in softirq context
2022-12-15 10:43 ` Ard Biesheuvel
@ 2022-12-15 10:51 ` Russell King (Oracle)
2022-12-15 11:48 ` Ard Biesheuvel
0 siblings, 1 reply; 10+ messages in thread
From: Russell King (Oracle) @ 2022-12-15 10:51 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Linus Walleij, linux-arm-kernel, linux-crypto, Arnd Bergmann

On Thu, Dec 15, 2022 at 11:43:22AM +0100, Ard Biesheuvel wrote:
> On Thu, 15 Dec 2022 at 11:27, Linus Walleij <linus.walleij@linaro.org> wrote:
> >
> > On Wed, Dec 7, 2022 at 11:39 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > > We currently only permit kernel mode NEON in process context, to avoid
> > > the need to preserve/restore the NEON register file when taking an
> > > exception while running in the kernel.
> > >
> > > Like we did on arm64, we can relax this restriction substantially, by
> > > permitting kernel mode NEON from softirq context, while ensuring that
> > > softirq processing is disabled when the NEON is being used in task
> > > context. This guarantees that only NEON context belonging to user space
> > > needs to be preserved and restored, which is already taken care of.
> > >
> > > This is especially relevant for network encryption, where incoming
> > > frames are typically handled in softirq context, and deferring software
> > > decryption to a kernel thread or falling back to C code are both
> > > undesirable from a performance PoV.
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> >
> > So boosting WireGuard as primary SW network encryption user?
>
> Essentially, although the use case that inspired this work is related
> to IPsec not WireGuard, and the crypto algorithm in that case (GCM) is
> ~3x faster than WG's chacha20poly1305, which makes the performance
> overhead of asynchronous completion even more significant. (Note that
> GCM needs the AES and PMULL instructions which are usually only
> available when running the 32-bit kernel on a 64-bit core, whereas
> chacha20poly1305 uses ordinary NEON instructions.)
>
> But Martin responded with a Tested-by regarding chacha20poly1305 on
> IPsec (not WG) where there is also a noticeable speedup, so WG on
> ARM32 should definitely benefit from this as well.

It'll be interesting to see whether there is any noticeable difference
with my WG VPN.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] ARM: permit non-nested kernel mode NEON in softirq context
2022-12-15 10:51 ` Russell King (Oracle)
@ 2022-12-15 11:48 ` Ard Biesheuvel
0 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2022-12-15 11:48 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Linus Walleij, linux-arm-kernel, linux-crypto, Arnd Bergmann

On Thu, 15 Dec 2022 at 11:51, Russell King (Oracle) <linux@armlinux.org.uk> wrote:
>
> On Thu, Dec 15, 2022 at 11:43:22AM +0100, Ard Biesheuvel wrote:
> > On Thu, 15 Dec 2022 at 11:27, Linus Walleij <linus.walleij@linaro.org> wrote:
> > >
> > > On Wed, Dec 7, 2022 at 11:39 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > > We currently only permit kernel mode NEON in process context, to avoid
> > > > the need to preserve/restore the NEON register file when taking an
> > > > exception while running in the kernel.
> > > >
> > > > Like we did on arm64, we can relax this restriction substantially, by
> > > > permitting kernel mode NEON from softirq context, while ensuring that
> > > > softirq processing is disabled when the NEON is being used in task
> > > > context. This guarantees that only NEON context belonging to user space
> > > > needs to be preserved and restored, which is already taken care of.
> > > >
> > > > This is especially relevant for network encryption, where incoming
> > > > frames are typically handled in softirq context, and deferring software
> > > > decryption to a kernel thread or falling back to C code are both
> > > > undesirable from a performance PoV.
> > > >
> > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > >
> > > So boosting WireGuard as primary SW network encryption user?
> >
> > Essentially, although the use case that inspired this work is related
> > to IPsec not WireGuard, and the crypto algorithm in that case (GCM) is
> > ~3x faster than WG's chacha20poly1305, which makes the performance
> > overhead of asynchronous completion even more significant. (Note that
> > GCM needs the AES and PMULL instructions which are usually only
> > available when running the 32-bit kernel on a 64-bit core, whereas
> > chacha20poly1305 uses ordinary NEON instructions.)
> >
> > But Martin responded with a Tested-by regarding chacha20poly1305 on
> > IPsec (not WG) where there is also a noticeable speedup, so WG on
> > ARM32 should definitely benefit from this as well.
>
> It'll be interesting to see whether there is any noticeable difference
> with my WG VPN.
>

Using WireGuard with the same 32-bit KVM guest communicating with its
64-bit host using virtio-net, I get a 44% speedup in the host->guest
direction. The other direction performs exactly the same, which is
unsurprising as it doesn't involve NEON crypto in softirq context at
all.

BEFORE
======

ardb@vm32:~$ iperf3 -c 192.168.11.2
Connecting to host 192.168.11.2, port 5201
[  5] local 192.168.11.1 port 40144 connected to 192.168.11.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  25.8 MBytes   216 Mbits/sec    0    397 KBytes
[  5]   1.00-2.00   sec  25.9 MBytes   217 Mbits/sec    0    397 KBytes
[  5]   2.00-3.00   sec  27.0 MBytes   226 Mbits/sec    0    397 KBytes
[  5]   3.00-4.00   sec  26.5 MBytes   222 Mbits/sec    0    397 KBytes
[  5]   4.00-5.00   sec  26.2 MBytes   220 Mbits/sec    0    397 KBytes
[  5]   5.00-6.00   sec  26.1 MBytes   219 Mbits/sec    0    436 KBytes
[  5]   6.00-7.00   sec  26.2 MBytes   220 Mbits/sec    0    458 KBytes
[  5]   7.00-8.00   sec  26.2 MBytes   220 Mbits/sec    0    458 KBytes
[  5]   8.00-9.00   sec  26.5 MBytes   222 Mbits/sec    0    480 KBytes
[  5]   9.00-10.00  sec  26.9 MBytes   225 Mbits/sec    0    480 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   263 MBytes   221 Mbits/sec    0             sender
[  5]   0.00-10.00  sec   262 MBytes   220 Mbits/sec                  receiver

ardb@sudo:~$ iperf3 -c 192.168.11.1
Connecting to host 192.168.11.1, port 5201
[  5] local 192.168.11.2 port 46340 connected to 192.168.11.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  47.5 MBytes   398 Mbits/sec    0   1.75 MBytes
[  5]   1.00-2.00   sec  45.0 MBytes   377 Mbits/sec   18   1.35 MBytes
[  5]   2.00-3.00   sec  43.8 MBytes   367 Mbits/sec    0   1.47 MBytes
[  5]   3.00-4.00   sec  45.0 MBytes   377 Mbits/sec    0   1.56 MBytes
[  5]   4.00-5.00   sec  45.0 MBytes   377 Mbits/sec    0   1.63 MBytes
[  5]   5.00-6.00   sec  42.5 MBytes   357 Mbits/sec    0   1.68 MBytes
[  5]   6.00-7.00   sec  43.8 MBytes   367 Mbits/sec    0   1.71 MBytes
[  5]   7.00-8.00   sec  43.8 MBytes   367 Mbits/sec    0   1.73 MBytes
[  5]   8.00-9.00   sec  45.0 MBytes   377 Mbits/sec    0   1.74 MBytes
[  5]   9.00-10.00  sec  43.8 MBytes   367 Mbits/sec    0   1.75 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   445 MBytes   373 Mbits/sec   18             sender
[  5]   0.00-10.04  sec   444 MBytes   371 Mbits/sec                  receiver

iperf Done.

AFTER
=====

ardb@vm32:~$ iperf3 -c 192.168.11.2
Connecting to host 192.168.11.2, port 5201
[  5] local 192.168.11.1 port 44004 connected to 192.168.11.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  26.2 MBytes   220 Mbits/sec    0    399 KBytes
[  5]   1.00-2.00   sec  25.9 MBytes   217 Mbits/sec    0    399 KBytes
[  5]   2.00-3.00   sec  26.0 MBytes   218 Mbits/sec    0    444 KBytes
[  5]   3.00-4.00   sec  26.8 MBytes   225 Mbits/sec    0    485 KBytes
[  5]   4.00-5.00   sec  26.4 MBytes   222 Mbits/sec    0    542 KBytes
[  5]   5.00-6.00   sec  26.6 MBytes   223 Mbits/sec    0    568 KBytes
[  5]   6.00-7.00   sec  25.4 MBytes   213 Mbits/sec    0    568 KBytes
[  5]   7.00-8.00   sec  25.9 MBytes   217 Mbits/sec    0    568 KBytes
[  5]   8.00-9.00   sec  26.7 MBytes   224 Mbits/sec    0    568 KBytes
[  5]   9.00-10.00  sec  25.9 MBytes   217 Mbits/sec    0    568 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   262 MBytes   220 Mbits/sec    0             sender
[  5]   0.00-9.99   sec   261 MBytes   219 Mbits/sec                  receiver

iperf Done.

ardb@sudo:~$ iperf3 -c 192.168.11.1
Connecting to host 192.168.11.1, port 5201
[  5] local 192.168.11.2 port 49838 connected to 192.168.11.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  61.2 MBytes   514 Mbits/sec    0   1.59 MBytes
[  5]   1.00-2.00   sec  66.2 MBytes   555 Mbits/sec    0   1.67 MBytes
[  5]   2.00-3.00   sec  65.0 MBytes   545 Mbits/sec   79   1.24 MBytes
[  5]   3.00-4.00   sec  63.8 MBytes   535 Mbits/sec    0   1.36 MBytes
[  5]   4.00-5.00   sec  63.8 MBytes   535 Mbits/sec    0   1.46 MBytes
[  5]   5.00-6.00   sec  63.8 MBytes   535 Mbits/sec    0   1.53 MBytes
[  5]   6.00-7.00   sec  62.5 MBytes   524 Mbits/sec    0   1.59 MBytes
[  5]   7.00-8.00   sec  65.0 MBytes   545 Mbits/sec   99   1.18 MBytes
[  5]   8.00-9.00   sec  65.0 MBytes   545 Mbits/sec    0   1.25 MBytes
[  5]   9.00-10.00  sec  65.0 MBytes   545 Mbits/sec    0   1.30 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   641 MBytes   538 Mbits/sec  178             sender
[  5]   0.00-10.02  sec   638 MBytes   535 Mbits/sec                  receiver

iperf Done.
^ permalink raw reply [flat|nested] 10+ messages in thread
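The 44% figure can be checked against the receiver-side totals of the two host->guest runs above (371 Mbits/sec before, 535 Mbits/sec after). A minimal sketch of that calculation; `speedup_percent` is a name made up for this example:

```c
/* Relative throughput gain in percent; hypothetical helper for illustration. */
double speedup_percent(double before, double after)
{
	return (after / before - 1.0) * 100.0;
}
```

`speedup_percent(371.0, 535.0)` gives about 44.2. Applied to the guest->host receiver totals (220 before, 219 after), it gives roughly -0.5, consistent with the observation that this direction is unaffected.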
* Re: [PATCH v2 0/2] ARM: allow kernel mode NEON in softirq context
2022-12-07 10:39 [PATCH v2 0/2] ARM: allow kernel mode NEON in softirq context Ard Biesheuvel
2022-12-07 10:39 ` [PATCH v2 1/2] ARM: vfp: Manipulate VFP state with softirqs disabled Ard Biesheuvel
2022-12-07 10:39 ` [PATCH v2 2/2] ARM: permit non-nested kernel mode NEON in softirq context Ard Biesheuvel
@ 2022-12-12 14:37 ` Martin Willi
2022-12-13 16:56 ` Ard Biesheuvel
2 siblings, 1 reply; 10+ messages in thread
From: Martin Willi @ 2022-12-12 14:37 UTC (permalink / raw)
To: Ard Biesheuvel, linux-arm-kernel, linux
Cc: linux-crypto, Linus Walleij, Arnd Bergmann

Hi Ard,

> Currently on ARM, we only permit kernel mode NEON in task context [...]
> For IPsec packet encryption involving highly performant crypto
> implementations, this results in a substantial performance hit [...]

Thanks for your continued work on this.

> Without these changes, an IPsec tunnel from a 32-bit VM to the 64-bit
> host can achieve a throughput of 9.5 MB/s TX and 11.9 MB/s RX.
>
> When the crypto algorithm is permitted to execute in softirq context,
> the throughput increases to 16.5 MB/s TX and 41 MB/s RX.

In my tests on an Armada 385, I could increase IPsec throughput with
ChaCha20/Poly1305 on RX from ~230 to ~260 MBit/s when using the NEON
code path. So you may add my:

Tested-by: Martin Willi <martin@strongswan.org>

Thanks,
Martin
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] ARM: allow kernel mode NEON in softirq context
2022-12-12 14:37 ` [PATCH v2 0/2] ARM: allow " Martin Willi
@ 2022-12-13 16:56 ` Ard Biesheuvel
0 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2022-12-13 16:56 UTC (permalink / raw)
To: Martin Willi
Cc: linux-arm-kernel, linux, linux-crypto, Linus Walleij, Arnd Bergmann

On Mon, 12 Dec 2022 at 15:38, Martin Willi <martin@strongswan.org> wrote:
>
> Hi Ard,
>
> > Currently on ARM, we only permit kernel mode NEON in task context [...]
> > For IPsec packet encryption involving highly performant crypto
> > implementations, this results in a substantial performance hit [...]
>
> Thanks for your continued work on this.
>
> > Without these changes, an IPsec tunnel from a 32-bit VM to the 64-bit
> > host can achieve a throughput of 9.5 MB/s TX and 11.9 MB/s RX.
> >
> > When the crypto algorithm is permitted to execute in softirq context,
> > the throughput increases to 16.5 MB/s TX and 41 MB/s RX.
>
> In my tests on an Armada 385, I could increase IPsec throughput with
> ChaCha20/Poly1305 on RX from ~230 to ~260 MBit/s when using the NEON
> code path. So you may add my:
>
> Tested-by: Martin Willi <martin@strongswan.org>
>

Thanks!
^ permalink raw reply [flat|nested] 10+ messages in thread