linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack
@ 2025-10-31 10:38 Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 01/21] crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit Ard Biesheuvel
                   ` (21 more replies)
  0 siblings, 22 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:38 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel,
	Marc Zyngier, Will Deacon, Mark Rutland, Kees Cook,
	Catalin Marinas, Mark Brown

From: Ard Biesheuvel <ardb@kernel.org>

Move the buffer for preserving/restoring the kernel mode FPSIMD state on a
context switch out of struct thread_struct, and onto the stack, so that
the memory cost is not imposed needlessly on all tasks in the system.
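
As an illustration of the resulting call pattern, a minimal usage sketch
(do_neon_work() is a hypothetical NEON routine, not part of this series):

	scoped_ksimd()			/* NEON usable within this scope */
		do_neon_work(dst, src, len);

The 'ksimd' scoped guard introduced in patches #4 and #5 replaces
open-coded kernel_neon_begin()/kernel_neon_end() pairs, and gives the
arch code a single place to plumb in the on-stack state buffer without
touching the callers.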

Changes since v3:
- Fix sloppy editing errors in ARM CRC code.

- Add more comments about the state argument to kernel_neon_begin/end
  and the associated field in thread_struct

- Fix bug in generic kernel mode FPU change

- Add Rbs from Eric

Changes since v2:
- Fix the generic kernel mode FPU API instead of removing it.

- Rebase onto v6.18-rc0 and fix the fallout

- Prefer WARN() over BUG() in kernel_neon_begin/end

- Avoid unnecessary cmpxchg() calls

- When invoked in softirq context, use the caller-provided buffer rather
  than the one stored in the task struct - this permits callers from
  task context (including users of the generic kernel mode FPU API) to
  pass NULL as the buffer when running with preemption disabled.

- Add acks from Kees and Eric; Mark's was dropped along with the patch
  in question.

- Fix new occurrence of kernel_neon_begin/end in Mellanox driver.

Changes since v1:
- Add a patch reverting the arm64 support for the generic
  kernel_fpu_begin()/end() API, which is problematic on arm64.

- Introduce a new 'ksimd' scoped guard that encapsulates the calls to
  kernel_neon_begin() and kernel_neon_end() at a higher level of
  abstraction. This makes it straightforward to plumb in the stack
  buffer without complicating the callers.

- Move all kernel mode NEON users on arm64 (and some on ARM) over to the
  new API.

- Add Mark's ack to patches #6 - #8

Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Eric Biggers <ebiggers@kernel.org>

Ard Biesheuvel (21):
  crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
  crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
  crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
  arm64/simd: Add scoped guard API for kernel mode SIMD
  ARM/simd: Add scoped guard API for kernel mode SIMD
  crypto: aegis128-neon - Move to more abstract 'ksimd' guard API
  raid6: Move to more abstract 'ksimd' guard API
  lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API
  lib/crypto: Switch ARM and arm64 to 'ksimd' scoped guard API
  crypto/arm64: aes-ccm - Switch to 'ksimd' scoped guard API
  crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API
  crypto/arm64: aes-gcm - Switch to 'ksimd' scoped guard API
  crypto/arm64: nhpoly1305 - Switch to 'ksimd' scoped guard API
  crypto/arm64: polyval - Switch to 'ksimd' scoped guard API
  crypto/arm64: sha3 - Switch to 'ksimd' scoped guard API
  crypto/arm64: sm3 - Switch to 'ksimd' scoped guard API
  crypto/arm64: sm4 - Switch to 'ksimd' scoped guard API
  arm64/xorblocks:  Switch to 'ksimd' scoped guard API
  net/mlx5: Switch to more abstract scoped ksimd guard API on arm64
  arm64/fpu: Enforce task-context only for generic kernel mode FPU
  arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack

 arch/arm/include/asm/simd.h                  |   7 +
 arch/arm64/crypto/aes-ce-ccm-glue.c          | 116 +++++------
 arch/arm64/crypto/aes-ce-glue.c              |  87 ++++----
 arch/arm64/crypto/aes-glue.c                 | 139 ++++++-------
 arch/arm64/crypto/aes-neonbs-glue.c          | 150 +++++++-------
 arch/arm64/crypto/ghash-ce-glue.c            |  27 ++-
 arch/arm64/crypto/nhpoly1305-neon-glue.c     |   5 +-
 arch/arm64/crypto/polyval-ce-glue.c          |  12 +-
 arch/arm64/crypto/sha3-ce-glue.c             |  10 +-
 arch/arm64/crypto/sm3-ce-glue.c              |  15 +-
 arch/arm64/crypto/sm3-neon-glue.c            |  16 +-
 arch/arm64/crypto/sm4-ce-ccm-glue.c          |  49 ++---
 arch/arm64/crypto/sm4-ce-cipher-glue.c       |  10 +-
 arch/arm64/crypto/sm4-ce-gcm-glue.c          |  62 ++----
 arch/arm64/crypto/sm4-ce-glue.c              | 214 +++++++++-----------
 arch/arm64/crypto/sm4-neon-glue.c            |  25 +--
 arch/arm64/include/asm/fpu.h                 |  16 +-
 arch/arm64/include/asm/neon.h                |   4 +-
 arch/arm64/include/asm/processor.h           |   7 +-
 arch/arm64/include/asm/simd.h                |  10 +
 arch/arm64/include/asm/xor.h                 |  22 +-
 arch/arm64/kernel/fpsimd.c                   |  53 +++--
 crypto/aegis128-neon.c                       |  33 ++-
 drivers/net/ethernet/mellanox/mlx5/core/wc.c |  19 +-
 lib/crc/arm/crc-t10dif.h                     |  16 +-
 lib/crc/arm/crc32.h                          |  11 +-
 lib/crc/arm64/crc-t10dif.h                   |  16 +-
 lib/crc/arm64/crc32.h                        |  16 +-
 lib/crypto/arm/chacha.h                      |   6 +-
 lib/crypto/arm/poly1305.h                    |   6 +-
 lib/crypto/arm/sha1.h                        |  13 +-
 lib/crypto/arm/sha256.h                      |  12 +-
 lib/crypto/arm/sha512.h                      |   5 +-
 lib/crypto/arm64/chacha.h                    |  11 +-
 lib/crypto/arm64/poly1305.h                  |   6 +-
 lib/crypto/arm64/sha1.h                      |   7 +-
 lib/crypto/arm64/sha256.h                    |  19 +-
 lib/crypto/arm64/sha512.h                    |   8 +-
 lib/raid6/neon.c                             |  17 +-
 lib/raid6/recov_neon.c                       |  15 +-
 40 files changed, 601 insertions(+), 691 deletions(-)


base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787
-- 
2.51.1.930.gacf6e81ea2-goog




* [PATCH v4 01/21] crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-11-06  7:08   ` Herbert Xu
  2025-10-31 10:39 ` [PATCH v4 02/21] crypto/arm64: sm4-ce-ccm " Ard Biesheuvel
                   ` (20 subsequent siblings)
  21 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

Kernel mode NEON sections are now preemptible on arm64, so there is no
need to yield the NEON unit explicitly in order to prevent scheduling
latency spikes.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/crypto/aes-ce-ccm-glue.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
index 2d791d51891b..2eb4e76cabc3 100644
--- a/arch/arm64/crypto/aes-ce-ccm-glue.c
+++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
@@ -114,11 +114,8 @@ static u32 ce_aes_ccm_auth_data(u8 mac[], u8 const in[], u32 abytes,
 			in += adv;
 			abytes -= adv;
 
-			if (unlikely(rem)) {
-				kernel_neon_end();
-				kernel_neon_begin();
+			if (unlikely(rem))
 				macp = 0;
-			}
 		} else {
 			u32 l = min(AES_BLOCK_SIZE - macp, abytes);
 
-- 
2.51.1.930.gacf6e81ea2-goog




* [PATCH v4 02/21] crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 01/21] crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-11-06  7:11   ` Herbert Xu
  2025-10-31 10:39 ` [PATCH v4 03/21] crypto/arm64: sm4-ce-gcm " Ard Biesheuvel
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

Kernel mode NEON sections are now preemptible on arm64, so there is no
need to yield the NEON unit when calling APIs that may sleep.

Also, move the calls to kernel_neon_end() to the same scope as
kernel_neon_begin(). This is needed for a subsequent change where a
stack buffer is allocated transparently and passed to
kernel_neon_begin().

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/crypto/sm4-ce-ccm-glue.c | 25 +++++---------------
 1 file changed, 6 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/crypto/sm4-ce-ccm-glue.c b/arch/arm64/crypto/sm4-ce-ccm-glue.c
index e9cc1c1364ec..e92cbdf1aaee 100644
--- a/arch/arm64/crypto/sm4-ce-ccm-glue.c
+++ b/arch/arm64/crypto/sm4-ce-ccm-glue.c
@@ -172,35 +172,22 @@ static int ccm_crypt(struct aead_request *req, struct skcipher_walk *walk,
 	if (req->assoclen)
 		ccm_calculate_auth_mac(req, mac);
 
-	while (walk->nbytes && walk->nbytes != walk->total) {
+	while (walk->nbytes) {
 		unsigned int tail = walk->nbytes % SM4_BLOCK_SIZE;
 
+		if (walk->nbytes == walk->total)
+			tail = 0;
+
 		sm4_ce_ccm_crypt(rkey_enc, walk->dst.virt.addr,
 				 walk->src.virt.addr, walk->iv,
 				 walk->nbytes - tail, mac);
 
-		kernel_neon_end();
-
 		err = skcipher_walk_done(walk, tail);
-
-		kernel_neon_begin();
 	}
 
-	if (walk->nbytes) {
-		sm4_ce_ccm_crypt(rkey_enc, walk->dst.virt.addr,
-				 walk->src.virt.addr, walk->iv,
-				 walk->nbytes, mac);
-
-		sm4_ce_ccm_final(rkey_enc, ctr0, mac);
+	sm4_ce_ccm_final(rkey_enc, ctr0, mac);
 
-		kernel_neon_end();
-
-		err = skcipher_walk_done(walk, 0);
-	} else {
-		sm4_ce_ccm_final(rkey_enc, ctr0, mac);
-
-		kernel_neon_end();
-	}
+	kernel_neon_end();
 
 	return err;
 }
-- 
2.51.1.930.gacf6e81ea2-goog




* [PATCH v4 03/21] crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 01/21] crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 02/21] crypto/arm64: sm4-ce-ccm " Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-11-06  7:15   ` Herbert Xu
  2025-10-31 10:39 ` [PATCH v4 04/21] arm64/simd: Add scoped guard API for kernel mode SIMD Ard Biesheuvel
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

Kernel mode NEON sections are now preemptible on arm64, so there is no
need to yield the NEON unit when calling APIs that may sleep.

Also, move the calls to kernel_neon_end() to the same scope as
kernel_neon_begin(). This is needed for a subsequent change where a
stack buffer is allocated transparently and passed to
kernel_neon_begin().

While at it, simplify the logic.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/crypto/sm4-ce-gcm-glue.c | 25 +++++---------------
 1 file changed, 6 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/crypto/sm4-ce-gcm-glue.c b/arch/arm64/crypto/sm4-ce-gcm-glue.c
index c2ea3d5f690b..8f6fc8c33c3f 100644
--- a/arch/arm64/crypto/sm4-ce-gcm-glue.c
+++ b/arch/arm64/crypto/sm4-ce-gcm-glue.c
@@ -154,36 +154,23 @@ static int gcm_crypt(struct aead_request *req, struct skcipher_walk *walk,
 	if (req->assoclen)
 		gcm_calculate_auth_mac(req, ghash);
 
-	while (walk->nbytes) {
+	do {
 		unsigned int tail = walk->nbytes % SM4_BLOCK_SIZE;
 		const u8 *src = walk->src.virt.addr;
 		u8 *dst = walk->dst.virt.addr;
+		const u8 *l = NULL;
 
 		if (walk->nbytes == walk->total) {
-			sm4_ce_pmull_gcm_crypt(ctx->key.rkey_enc, dst, src, iv,
-					       walk->nbytes, ghash,
-					       ctx->ghash_table,
-					       (const u8 *)&lengths);
-
-			kernel_neon_end();
-
-			return skcipher_walk_done(walk, 0);
+			l = (const u8 *)&lengths;
+			tail = 0;
 		}
 
 		sm4_ce_pmull_gcm_crypt(ctx->key.rkey_enc, dst, src, iv,
 				       walk->nbytes - tail, ghash,
-				       ctx->ghash_table, NULL);
-
-		kernel_neon_end();
+				       ctx->ghash_table, l);
 
 		err = skcipher_walk_done(walk, tail);
-
-		kernel_neon_begin();
-	}
-
-	sm4_ce_pmull_gcm_crypt(ctx->key.rkey_enc, NULL, NULL, iv,
-			       walk->nbytes, ghash, ctx->ghash_table,
-			       (const u8 *)&lengths);
+	} while (walk->nbytes);
 
 	kernel_neon_end();
 
-- 
2.51.1.930.gacf6e81ea2-goog




* [PATCH v4 04/21] arm64/simd: Add scoped guard API for kernel mode SIMD
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 03/21] crypto/arm64: sm4-ce-gcm " Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 13:55   ` Jonathan Cameron
  2025-10-31 10:39 ` [PATCH v4 05/21] ARM/simd: " Ard Biesheuvel
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel,
	Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Encapsulate kernel_neon_begin() and kernel_neon_end() using a 'ksimd'
cleanup guard. This hides the prototypes of those functions, allowing
them to be changed for arm64 but not ARM, without breaking code that is
shared between those architectures (RAID6, AEGIS-128).

It probably makes sense to expose this API more widely across
architectures, as it gives the arch code more flexibility in how the
guard is plumbed in, while imposing stricter rules that the start/end
bookends appear in matched pairs.
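
For illustration (my_neon_func() is a hypothetical routine),

	scoped_ksimd()
		my_neon_func();

behaves like the open-coded sequence

	kernel_neon_begin();
	my_neon_func();
	kernel_neon_end();

except that kernel_neon_end() is also invoked when the scope is left
early, e.g. via a return statement.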

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/simd.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
index 8e86c9e70e48..d9f83c478736 100644
--- a/arch/arm64/include/asm/simd.h
+++ b/arch/arm64/include/asm/simd.h
@@ -6,12 +6,15 @@
 #ifndef __ASM_SIMD_H
 #define __ASM_SIMD_H
 
+#include <linux/cleanup.h>
 #include <linux/compiler.h>
 #include <linux/irqflags.h>
 #include <linux/percpu.h>
 #include <linux/preempt.h>
 #include <linux/types.h>
 
+#include <asm/neon.h>
+
 #ifdef CONFIG_KERNEL_MODE_NEON
 
 /*
@@ -40,4 +43,8 @@ static __must_check inline bool may_use_simd(void) {
 
 #endif /* ! CONFIG_KERNEL_MODE_NEON */
 
+DEFINE_LOCK_GUARD_0(ksimd, kernel_neon_begin(), kernel_neon_end())
+
+#define scoped_ksimd()	scoped_guard(ksimd)
+
 #endif
-- 
2.51.1.930.gacf6e81ea2-goog




* [PATCH v4 05/21] ARM/simd: Add scoped guard API for kernel mode SIMD
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 04/21] arm64/simd: Add scoped guard API for kernel mode SIMD Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 06/21] crypto: aegis128-neon - Move to more abstract 'ksimd' guard API Ard Biesheuvel
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel,
	Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Implement the 'ksimd' scoped guard API for 32-bit ARM as well, so that
it can be used by code that supports both ARM and arm64.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/include/asm/simd.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm/include/asm/simd.h b/arch/arm/include/asm/simd.h
index be08a8da046f..8549fa8b7253 100644
--- a/arch/arm/include/asm/simd.h
+++ b/arch/arm/include/asm/simd.h
@@ -2,14 +2,21 @@
 #ifndef _ASM_SIMD_H
 #define _ASM_SIMD_H
 
+#include <linux/cleanup.h>
 #include <linux/compiler_attributes.h>
 #include <linux/preempt.h>
 #include <linux/types.h>
 
+#include <asm/neon.h>
+
 static __must_check inline bool may_use_simd(void)
 {
 	return IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && !in_hardirq()
 	       && !irqs_disabled();
 }
 
+DEFINE_LOCK_GUARD_0(ksimd, kernel_neon_begin(), kernel_neon_end())
+
+#define scoped_ksimd()	scoped_guard(ksimd)
+
 #endif	/* _ASM_SIMD_H */
-- 
2.51.1.930.gacf6e81ea2-goog




* [PATCH v4 06/21] crypto: aegis128-neon - Move to more abstract 'ksimd' guard API
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 05/21] ARM/simd: " Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 07/21] raid6: " Ard Biesheuvel
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

Move away from calling kernel_neon_begin() and kernel_neon_end()
directly, and instead, use the newly introduced scoped_ksimd() API. This
permits arm64 to modify the kernel mode NEON API without affecting code
that is shared between ARM and arm64.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 crypto/aegis128-neon.c | 33 +++++++-------------
 1 file changed, 12 insertions(+), 21 deletions(-)

diff --git a/crypto/aegis128-neon.c b/crypto/aegis128-neon.c
index 9ee50549e823..b41807e63bd3 100644
--- a/crypto/aegis128-neon.c
+++ b/crypto/aegis128-neon.c
@@ -4,7 +4,7 @@
  */
 
 #include <asm/cpufeature.h>
-#include <asm/neon.h>
+#include <asm/simd.h>
 
 #include "aegis.h"
 #include "aegis-neon.h"
@@ -24,32 +24,28 @@ void crypto_aegis128_init_simd(struct aegis_state *state,
 			       const union aegis_block *key,
 			       const u8 *iv)
 {
-	kernel_neon_begin();
-	crypto_aegis128_init_neon(state, key, iv);
-	kernel_neon_end();
+	scoped_ksimd()
+		crypto_aegis128_init_neon(state, key, iv);
 }
 
 void crypto_aegis128_update_simd(struct aegis_state *state, const void *msg)
 {
-	kernel_neon_begin();
-	crypto_aegis128_update_neon(state, msg);
-	kernel_neon_end();
+	scoped_ksimd()
+		crypto_aegis128_update_neon(state, msg);
 }
 
 void crypto_aegis128_encrypt_chunk_simd(struct aegis_state *state, u8 *dst,
 					const u8 *src, unsigned int size)
 {
-	kernel_neon_begin();
-	crypto_aegis128_encrypt_chunk_neon(state, dst, src, size);
-	kernel_neon_end();
+	scoped_ksimd()
+		crypto_aegis128_encrypt_chunk_neon(state, dst, src, size);
 }
 
 void crypto_aegis128_decrypt_chunk_simd(struct aegis_state *state, u8 *dst,
 					const u8 *src, unsigned int size)
 {
-	kernel_neon_begin();
-	crypto_aegis128_decrypt_chunk_neon(state, dst, src, size);
-	kernel_neon_end();
+	scoped_ksimd()
+		crypto_aegis128_decrypt_chunk_neon(state, dst, src, size);
 }
 
 int crypto_aegis128_final_simd(struct aegis_state *state,
@@ -58,12 +54,7 @@ int crypto_aegis128_final_simd(struct aegis_state *state,
 			       unsigned int cryptlen,
 			       unsigned int authsize)
 {
-	int ret;
-
-	kernel_neon_begin();
-	ret = crypto_aegis128_final_neon(state, tag_xor, assoclen, cryptlen,
-					 authsize);
-	kernel_neon_end();
-
-	return ret;
+	scoped_ksimd()
+		return crypto_aegis128_final_neon(state, tag_xor, assoclen,
+						  cryptlen, authsize);
 }
-- 
2.51.1.930.gacf6e81ea2-goog




* [PATCH v4 07/21] raid6: Move to more abstract 'ksimd' guard API
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 06/21] crypto: aegis128-neon - Move to more abstract 'ksimd' guard API Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 08/21] lib/crc: Switch ARM and arm64 to 'ksimd' scoped " Ard Biesheuvel
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

Move away from calling kernel_neon_begin() and kernel_neon_end()
directly, and instead, use the newly introduced scoped_ksimd() API. This
permits arm64 to modify the kernel mode NEON API without affecting code
that is shared between ARM and arm64.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 lib/raid6/neon.c       | 17 +++++++----------
 lib/raid6/recov_neon.c | 15 ++++++---------
 2 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/lib/raid6/neon.c b/lib/raid6/neon.c
index 0a2e76035ea9..6d9474ce6da9 100644
--- a/lib/raid6/neon.c
+++ b/lib/raid6/neon.c
@@ -8,10 +8,9 @@
 #include <linux/raid/pq.h>
 
 #ifdef __KERNEL__
-#include <asm/neon.h>
+#include <asm/simd.h>
 #else
-#define kernel_neon_begin()
-#define kernel_neon_end()
+#define scoped_ksimd()
 #define cpu_has_neon()		(1)
 #endif
 
@@ -32,10 +31,9 @@
 	{								\
 		void raid6_neon ## _n  ## _gen_syndrome_real(int,	\
 						unsigned long, void**);	\
-		kernel_neon_begin();					\
-		raid6_neon ## _n ## _gen_syndrome_real(disks,		\
+		scoped_ksimd()						\
+			raid6_neon ## _n ## _gen_syndrome_real(disks,	\
 					(unsigned long)bytes, ptrs);	\
-		kernel_neon_end();					\
 	}								\
 	static void raid6_neon ## _n ## _xor_syndrome(int disks,	\
 					int start, int stop, 		\
@@ -43,10 +41,9 @@
 	{								\
 		void raid6_neon ## _n  ## _xor_syndrome_real(int,	\
 				int, int, unsigned long, void**);	\
-		kernel_neon_begin();					\
-		raid6_neon ## _n ## _xor_syndrome_real(disks,		\
-			start, stop, (unsigned long)bytes, ptrs);	\
-		kernel_neon_end();					\
+		scoped_ksimd()						\
+			raid6_neon ## _n ## _xor_syndrome_real(disks,	\
+				start, stop, (unsigned long)bytes, ptrs);\
 	}								\
 	struct raid6_calls const raid6_neonx ## _n = {			\
 		raid6_neon ## _n ## _gen_syndrome,			\
diff --git a/lib/raid6/recov_neon.c b/lib/raid6/recov_neon.c
index 70e1404c1512..9d99aeabd31a 100644
--- a/lib/raid6/recov_neon.c
+++ b/lib/raid6/recov_neon.c
@@ -7,11 +7,10 @@
 #include <linux/raid/pq.h>
 
 #ifdef __KERNEL__
-#include <asm/neon.h>
+#include <asm/simd.h>
 #include "neon.h"
 #else
-#define kernel_neon_begin()
-#define kernel_neon_end()
+#define scoped_ksimd()
 #define cpu_has_neon()		(1)
 #endif
 
@@ -55,9 +54,8 @@ static void raid6_2data_recov_neon(int disks, size_t bytes, int faila,
 	qmul  = raid6_vgfmul[raid6_gfinv[raid6_gfexp[faila] ^
 					 raid6_gfexp[failb]]];
 
-	kernel_neon_begin();
-	__raid6_2data_recov_neon(bytes, p, q, dp, dq, pbmul, qmul);
-	kernel_neon_end();
+	scoped_ksimd()
+		__raid6_2data_recov_neon(bytes, p, q, dp, dq, pbmul, qmul);
 }
 
 static void raid6_datap_recov_neon(int disks, size_t bytes, int faila,
@@ -86,9 +84,8 @@ static void raid6_datap_recov_neon(int disks, size_t bytes, int faila,
 	/* Now, pick the proper data tables */
 	qmul = raid6_vgfmul[raid6_gfinv[raid6_gfexp[faila]]];
 
-	kernel_neon_begin();
-	__raid6_datap_recov_neon(bytes, p, q, dq, qmul);
-	kernel_neon_end();
+	scoped_ksimd()
+		__raid6_datap_recov_neon(bytes, p, q, dq, qmul);
 }
 
 const struct raid6_recov_calls raid6_recov_neon = {
-- 
2.51.1.930.gacf6e81ea2-goog




* [PATCH v4 08/21] lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (6 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 07/21] raid6: " Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 13:49   ` Jonathan Cameron
  2025-10-31 10:39 ` [PATCH v4 09/21] lib/crypto: " Ard Biesheuvel
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

Before modifying the prototypes of kernel_neon_begin() and
kernel_neon_end() to accommodate kernel mode FP/SIMD state buffers
allocated on the stack, move arm64 to the new 'ksimd' scoped guard API,
which encapsulates the calls to those functions.

For symmetry, do the same for 32-bit ARM too.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 lib/crc/arm/crc-t10dif.h   | 16 +++++-----------
 lib/crc/arm/crc32.h        | 11 ++++-------
 lib/crc/arm64/crc-t10dif.h | 16 +++++-----------
 lib/crc/arm64/crc32.h      | 16 ++++++----------
 4 files changed, 20 insertions(+), 39 deletions(-)

diff --git a/lib/crc/arm/crc-t10dif.h b/lib/crc/arm/crc-t10dif.h
index 63441de5e3f1..aaeeab0defb5 100644
--- a/lib/crc/arm/crc-t10dif.h
+++ b/lib/crc/arm/crc-t10dif.h
@@ -5,7 +5,6 @@
  * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
  */
 
-#include <asm/neon.h>
 #include <asm/simd.h>
 
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_neon);
@@ -20,21 +19,16 @@ asmlinkage void crc_t10dif_pmull8(u16 init_crc, const u8 *buf, size_t len,
 static inline u16 crc_t10dif_arch(u16 crc, const u8 *data, size_t length)
 {
 	if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE) {
-		if (static_branch_likely(&have_pmull)) {
-			if (likely(may_use_simd())) {
-				kernel_neon_begin();
-				crc = crc_t10dif_pmull64(crc, data, length);
-				kernel_neon_end();
-				return crc;
-			}
+		if (static_branch_likely(&have_pmull) && likely(may_use_simd())) {
+			scoped_ksimd()
+				return crc_t10dif_pmull64(crc, data, length);
 		} else if (length > CRC_T10DIF_PMULL_CHUNK_SIZE &&
 			   static_branch_likely(&have_neon) &&
 			   likely(may_use_simd())) {
 			u8 buf[16] __aligned(16);
 
-			kernel_neon_begin();
-			crc_t10dif_pmull8(crc, data, length, buf);
-			kernel_neon_end();
+			scoped_ksimd()
+				crc_t10dif_pmull8(crc, data, length, buf);
 
 			return crc_t10dif_generic(0, buf, sizeof(buf));
 		}
diff --git a/lib/crc/arm/crc32.h b/lib/crc/arm/crc32.h
index 7b76f52f6907..f33de6b22cd4 100644
--- a/lib/crc/arm/crc32.h
+++ b/lib/crc/arm/crc32.h
@@ -8,7 +8,6 @@
 #include <linux/cpufeature.h>
 
 #include <asm/hwcap.h>
-#include <asm/neon.h>
 #include <asm/simd.h>
 
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_crc32);
@@ -42,9 +41,8 @@ static inline u32 crc32_le_arch(u32 crc, const u8 *p, size_t len)
 			len -= n;
 		}
 		n = round_down(len, 16);
-		kernel_neon_begin();
-		crc = crc32_pmull_le(p, n, crc);
-		kernel_neon_end();
+		scoped_ksimd()
+			crc = crc32_pmull_le(p, n, crc);
 		p += n;
 		len -= n;
 	}
@@ -71,9 +69,8 @@ static inline u32 crc32c_arch(u32 crc, const u8 *p, size_t len)
 			len -= n;
 		}
 		n = round_down(len, 16);
-		kernel_neon_begin();
-		crc = crc32c_pmull_le(p, n, crc);
-		kernel_neon_end();
+		scoped_ksimd()
+			crc = crc32c_pmull_le(p, n, crc);
 		p += n;
 		len -= n;
 	}
diff --git a/lib/crc/arm64/crc-t10dif.h b/lib/crc/arm64/crc-t10dif.h
index f88db2971805..0de03ab1aeab 100644
--- a/lib/crc/arm64/crc-t10dif.h
+++ b/lib/crc/arm64/crc-t10dif.h
@@ -7,7 +7,6 @@
 
 #include <linux/cpufeature.h>
 
-#include <asm/neon.h>
 #include <asm/simd.h>
 
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_asimd);
@@ -22,21 +21,16 @@ asmlinkage u16 crc_t10dif_pmull_p64(u16 init_crc, const u8 *buf, size_t len);
 static inline u16 crc_t10dif_arch(u16 crc, const u8 *data, size_t length)
 {
 	if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE) {
-		if (static_branch_likely(&have_pmull)) {
-			if (likely(may_use_simd())) {
-				kernel_neon_begin();
-				crc = crc_t10dif_pmull_p64(crc, data, length);
-				kernel_neon_end();
-				return crc;
-			}
+		if (static_branch_likely(&have_pmull) && likely(may_use_simd())) {
+			scoped_ksimd()
+				return crc_t10dif_pmull_p64(crc, data, length);
 		} else if (length > CRC_T10DIF_PMULL_CHUNK_SIZE &&
 			   static_branch_likely(&have_asimd) &&
 			   likely(may_use_simd())) {
 			u8 buf[16];
 
-			kernel_neon_begin();
-			crc_t10dif_pmull_p8(crc, data, length, buf);
-			kernel_neon_end();
+			scoped_ksimd()
+				crc_t10dif_pmull_p8(crc, data, length, buf);
 
 			return crc_t10dif_generic(0, buf, sizeof(buf));
 		}
diff --git a/lib/crc/arm64/crc32.h b/lib/crc/arm64/crc32.h
index 31e649cd40a2..1939a5dee477 100644
--- a/lib/crc/arm64/crc32.h
+++ b/lib/crc/arm64/crc32.h
@@ -2,7 +2,6 @@
 
 #include <asm/alternative.h>
 #include <asm/cpufeature.h>
-#include <asm/neon.h>
 #include <asm/simd.h>
 
 // The minimum input length to consider the 4-way interleaved code path
@@ -23,9 +22,8 @@ static inline u32 crc32_le_arch(u32 crc, const u8 *p, size_t len)
 
 	if (len >= min_len && cpu_have_named_feature(PMULL) &&
 	    likely(may_use_simd())) {
-		kernel_neon_begin();
-		crc = crc32_le_arm64_4way(crc, p, len);
-		kernel_neon_end();
+		scoped_ksimd()
+			crc = crc32_le_arm64_4way(crc, p, len);
 
 		p += round_down(len, 64);
 		len %= 64;
@@ -44,9 +42,8 @@ static inline u32 crc32c_arch(u32 crc, const u8 *p, size_t len)
 
 	if (len >= min_len && cpu_have_named_feature(PMULL) &&
 	    likely(may_use_simd())) {
-		kernel_neon_begin();
-		crc = crc32c_le_arm64_4way(crc, p, len);
-		kernel_neon_end();
+		scoped_ksimd()
+			crc = crc32c_le_arm64_4way(crc, p, len);
 
 		p += round_down(len, 64);
 		len %= 64;
@@ -65,9 +62,8 @@ static inline u32 crc32_be_arch(u32 crc, const u8 *p, size_t len)
 
 	if (len >= min_len && cpu_have_named_feature(PMULL) &&
 	    likely(may_use_simd())) {
-		kernel_neon_begin();
-		crc = crc32_be_arm64_4way(crc, p, len);
-		kernel_neon_end();
+		scoped_ksimd()
+			crc = crc32_be_arm64_4way(crc, p, len);
 
 		p += round_down(len, 64);
 		len %= 64;
-- 
2.51.1.930.gacf6e81ea2-goog




* [PATCH v4 09/21] lib/crypto: Switch ARM and arm64 to 'ksimd' scoped guard API
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (7 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 08/21] lib/crc: Switch ARM and arm64 to 'ksimd' scoped " Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 10/21] crypto/arm64: aes-ccm - Switch " Ard Biesheuvel
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

Before modifying the prototypes of kernel_neon_begin() and
kernel_neon_end() to accommodate kernel mode FP/SIMD state buffers
allocated on the stack, move arm64 to the new 'ksimd' scoped guard API,
which encapsulates the calls to those functions.

For symmetry, do the same for 32-bit ARM too.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 lib/crypto/arm/chacha.h     |  6 ++----
 lib/crypto/arm/poly1305.h   |  6 ++----
 lib/crypto/arm/sha1.h       | 13 ++++++-------
 lib/crypto/arm/sha256.h     | 12 ++++++------
 lib/crypto/arm/sha512.h     |  5 ++---
 lib/crypto/arm64/chacha.h   | 11 ++++-------
 lib/crypto/arm64/poly1305.h |  6 ++----
 lib/crypto/arm64/sha1.h     |  7 +++----
 lib/crypto/arm64/sha256.h   | 19 ++++++++-----------
 lib/crypto/arm64/sha512.h   |  8 ++++----
 10 files changed, 39 insertions(+), 54 deletions(-)

diff --git a/lib/crypto/arm/chacha.h b/lib/crypto/arm/chacha.h
index 0cae30f8ee5d..b27ba00b3b23 100644
--- a/lib/crypto/arm/chacha.h
+++ b/lib/crypto/arm/chacha.h
@@ -12,7 +12,6 @@
 
 #include <asm/cputype.h>
 #include <asm/hwcap.h>
-#include <asm/neon.h>
 #include <asm/simd.h>
 
 asmlinkage void chacha_block_xor_neon(const struct chacha_state *state,
@@ -87,9 +86,8 @@ static void chacha_crypt_arch(struct chacha_state *state, u8 *dst,
 	do {
 		unsigned int todo = min_t(unsigned int, bytes, SZ_4K);
 
-		kernel_neon_begin();
-		chacha_doneon(state, dst, src, todo, nrounds);
-		kernel_neon_end();
+		scoped_ksimd()
+			chacha_doneon(state, dst, src, todo, nrounds);
 
 		bytes -= todo;
 		src += todo;
diff --git a/lib/crypto/arm/poly1305.h b/lib/crypto/arm/poly1305.h
index 0021cf368307..0fe903d8de55 100644
--- a/lib/crypto/arm/poly1305.h
+++ b/lib/crypto/arm/poly1305.h
@@ -6,7 +6,6 @@
  */
 
 #include <asm/hwcap.h>
-#include <asm/neon.h>
 #include <asm/simd.h>
 #include <linux/cpufeature.h>
 #include <linux/jump_label.h>
@@ -32,9 +31,8 @@ static void poly1305_blocks(struct poly1305_block_state *state, const u8 *src,
 		do {
 			unsigned int todo = min_t(unsigned int, len, SZ_4K);
 
-			kernel_neon_begin();
-			poly1305_blocks_neon(state, src, todo, padbit);
-			kernel_neon_end();
+			scoped_ksimd()
+				poly1305_blocks_neon(state, src, todo, padbit);
 
 			len -= todo;
 			src += todo;
diff --git a/lib/crypto/arm/sha1.h b/lib/crypto/arm/sha1.h
index 29f8bcad0447..3e2d8c7cab9f 100644
--- a/lib/crypto/arm/sha1.h
+++ b/lib/crypto/arm/sha1.h
@@ -4,7 +4,6 @@
  *
  * Copyright 2025 Google LLC
  */
-#include <asm/neon.h>
 #include <asm/simd.h>
 
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_neon);
@@ -22,12 +21,12 @@ static void sha1_blocks(struct sha1_block_state *state,
 {
 	if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) &&
 	    static_branch_likely(&have_neon) && likely(may_use_simd())) {
-		kernel_neon_begin();
-		if (static_branch_likely(&have_ce))
-			sha1_ce_transform(state, data, nblocks);
-		else
-			sha1_transform_neon(state, data, nblocks);
-		kernel_neon_end();
+		scoped_ksimd() {
+			if (static_branch_likely(&have_ce))
+				sha1_ce_transform(state, data, nblocks);
+			else
+				sha1_transform_neon(state, data, nblocks);
+		}
 	} else {
 		sha1_block_data_order(state, data, nblocks);
 	}
diff --git a/lib/crypto/arm/sha256.h b/lib/crypto/arm/sha256.h
index 7556457b3094..ae7e52dd6e3b 100644
--- a/lib/crypto/arm/sha256.h
+++ b/lib/crypto/arm/sha256.h
@@ -22,12 +22,12 @@ static void sha256_blocks(struct sha256_block_state *state,
 {
 	if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) &&
 	    static_branch_likely(&have_neon) && likely(may_use_simd())) {
-		kernel_neon_begin();
-		if (static_branch_likely(&have_ce))
-			sha256_ce_transform(state, data, nblocks);
-		else
-			sha256_block_data_order_neon(state, data, nblocks);
-		kernel_neon_end();
+		scoped_ksimd() {
+			if (static_branch_likely(&have_ce))
+				sha256_ce_transform(state, data, nblocks);
+			else
+				sha256_block_data_order_neon(state, data, nblocks);
+		}
 	} else {
 		sha256_block_data_order(state, data, nblocks);
 	}
diff --git a/lib/crypto/arm/sha512.h b/lib/crypto/arm/sha512.h
index d1b485dd275d..ed9bd81d6d78 100644
--- a/lib/crypto/arm/sha512.h
+++ b/lib/crypto/arm/sha512.h
@@ -19,9 +19,8 @@ static void sha512_blocks(struct sha512_block_state *state,
 {
 	if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) &&
 	    static_branch_likely(&have_neon) && likely(may_use_simd())) {
-		kernel_neon_begin();
-		sha512_block_data_order_neon(state, data, nblocks);
-		kernel_neon_end();
+		scoped_ksimd()
+			sha512_block_data_order_neon(state, data, nblocks);
 	} else {
 		sha512_block_data_order(state, data, nblocks);
 	}
diff --git a/lib/crypto/arm64/chacha.h b/lib/crypto/arm64/chacha.h
index ba6c22d46086..ca8c6a8b0578 100644
--- a/lib/crypto/arm64/chacha.h
+++ b/lib/crypto/arm64/chacha.h
@@ -23,7 +23,6 @@
 #include <linux/kernel.h>
 
 #include <asm/hwcap.h>
-#include <asm/neon.h>
 #include <asm/simd.h>
 
 asmlinkage void chacha_block_xor_neon(const struct chacha_state *state,
@@ -65,9 +64,8 @@ static void hchacha_block_arch(const struct chacha_state *state,
 	if (!static_branch_likely(&have_neon) || !crypto_simd_usable()) {
 		hchacha_block_generic(state, out, nrounds);
 	} else {
-		kernel_neon_begin();
-		hchacha_block_neon(state, out, nrounds);
-		kernel_neon_end();
+		scoped_ksimd()
+			hchacha_block_neon(state, out, nrounds);
 	}
 }
 
@@ -81,9 +79,8 @@ static void chacha_crypt_arch(struct chacha_state *state, u8 *dst,
 	do {
 		unsigned int todo = min_t(unsigned int, bytes, SZ_4K);
 
-		kernel_neon_begin();
-		chacha_doneon(state, dst, src, todo, nrounds);
-		kernel_neon_end();
+		scoped_ksimd()
+			chacha_doneon(state, dst, src, todo, nrounds);
 
 		bytes -= todo;
 		src += todo;
diff --git a/lib/crypto/arm64/poly1305.h b/lib/crypto/arm64/poly1305.h
index aed5921ccd9a..b77669767cd6 100644
--- a/lib/crypto/arm64/poly1305.h
+++ b/lib/crypto/arm64/poly1305.h
@@ -6,7 +6,6 @@
  */
 
 #include <asm/hwcap.h>
-#include <asm/neon.h>
 #include <asm/simd.h>
 #include <linux/cpufeature.h>
 #include <linux/jump_label.h>
@@ -31,9 +30,8 @@ static void poly1305_blocks(struct poly1305_block_state *state, const u8 *src,
 		do {
 			unsigned int todo = min_t(unsigned int, len, SZ_4K);
 
-			kernel_neon_begin();
-			poly1305_blocks_neon(state, src, todo, padbit);
-			kernel_neon_end();
+			scoped_ksimd()
+				poly1305_blocks_neon(state, src, todo, padbit);
 
 			len -= todo;
 			src += todo;
diff --git a/lib/crypto/arm64/sha1.h b/lib/crypto/arm64/sha1.h
index aaef4ebfc5e3..bc7071f1be09 100644
--- a/lib/crypto/arm64/sha1.h
+++ b/lib/crypto/arm64/sha1.h
@@ -4,7 +4,6 @@
  *
  * Copyright 2025 Google LLC
  */
-#include <asm/neon.h>
 #include <asm/simd.h>
 #include <linux/cpufeature.h>
 
@@ -20,9 +19,9 @@ static void sha1_blocks(struct sha1_block_state *state,
 		do {
 			size_t rem;
 
-			kernel_neon_begin();
-			rem = __sha1_ce_transform(state, data, nblocks);
-			kernel_neon_end();
+			scoped_ksimd()
+				rem = __sha1_ce_transform(state, data, nblocks);
+
 			data += (nblocks - rem) * SHA1_BLOCK_SIZE;
 			nblocks = rem;
 		} while (nblocks);
diff --git a/lib/crypto/arm64/sha256.h b/lib/crypto/arm64/sha256.h
index 80d06df27d3a..568dff0f276a 100644
--- a/lib/crypto/arm64/sha256.h
+++ b/lib/crypto/arm64/sha256.h
@@ -4,7 +4,6 @@
  *
  * Copyright 2025 Google LLC
  */
-#include <asm/neon.h>
 #include <asm/simd.h>
 #include <linux/cpufeature.h>
 
@@ -27,17 +26,16 @@ static void sha256_blocks(struct sha256_block_state *state,
 			do {
 				size_t rem;
 
-				kernel_neon_begin();
-				rem = __sha256_ce_transform(state,
-							    data, nblocks);
-				kernel_neon_end();
+				scoped_ksimd()
+					rem = __sha256_ce_transform(state, data,
+								    nblocks);
+
 				data += (nblocks - rem) * SHA256_BLOCK_SIZE;
 				nblocks = rem;
 			} while (nblocks);
 		} else {
-			kernel_neon_begin();
-			sha256_block_neon(state, data, nblocks);
-			kernel_neon_end();
+			scoped_ksimd()
+				sha256_block_neon(state, data, nblocks);
 		}
 	} else {
 		sha256_block_data_order(state, data, nblocks);
@@ -66,9 +64,8 @@ static bool sha256_finup_2x_arch(const struct __sha256_ctx *ctx,
 	if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) &&
 	    static_branch_likely(&have_ce) && len >= SHA256_BLOCK_SIZE &&
 	    len <= 65536 && likely(may_use_simd())) {
-		kernel_neon_begin();
-		sha256_ce_finup2x(ctx, data1, data2, len, out1, out2);
-		kernel_neon_end();
+		scoped_ksimd()
+			sha256_ce_finup2x(ctx, data1, data2, len, out1, out2);
 		kmsan_unpoison_memory(out1, SHA256_DIGEST_SIZE);
 		kmsan_unpoison_memory(out2, SHA256_DIGEST_SIZE);
 		return true;
diff --git a/lib/crypto/arm64/sha512.h b/lib/crypto/arm64/sha512.h
index ddb0d256f73a..7eb7ef04d268 100644
--- a/lib/crypto/arm64/sha512.h
+++ b/lib/crypto/arm64/sha512.h
@@ -4,7 +4,7 @@
  *
  * Copyright 2025 Google LLC
  */
-#include <asm/neon.h>
+
 #include <asm/simd.h>
 #include <linux/cpufeature.h>
 
@@ -24,9 +24,9 @@ static void sha512_blocks(struct sha512_block_state *state,
 		do {
 			size_t rem;
 
-			kernel_neon_begin();
-			rem = __sha512_ce_transform(state, data, nblocks);
-			kernel_neon_end();
+			scoped_ksimd()
+				rem = __sha512_ce_transform(state, data, nblocks);
+
 			data += (nblocks - rem) * SHA512_BLOCK_SIZE;
 			nblocks = rem;
 		} while (nblocks);
-- 
2.51.1.930.gacf6e81ea2-goog




* [PATCH v4 10/21] crypto/arm64: aes-ccm - Switch to 'ksimd' scoped guard API
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (8 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 09/21] lib/crypto: " Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 11/21] crypto/arm64: aes-blk " Ard Biesheuvel
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

Switch to the more abstract 'scoped_ksimd()' API, which will be modified
in a future patch to transparently allocate a kernel mode FP/SIMD state
buffer on the stack, so that kernel mode FP/SIMD code remains
preemptible in principle, but without the memory overhead of adding 528
bytes to the size of struct task_struct.
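
For reference, 528 bytes matches sizeof(struct user_fpsimd_state) on
arm64, i.e. a full FP/SIMD register image:

	struct user_fpsimd_state {
		__uint128_t	vregs[32];	/* 32 x 16 bytes = 512 */
		__u32		fpsr;
		__u32		fpcr;
		__u32		__reserved[2];	/* 512 + 4 + 4 + 8 = 528 */
	};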

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/crypto/aes-ce-ccm-glue.c | 135 ++++++++++----------
 1 file changed, 66 insertions(+), 69 deletions(-)

diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
index 2eb4e76cabc3..c4fd648471f1 100644
--- a/arch/arm64/crypto/aes-ce-ccm-glue.c
+++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
@@ -8,7 +8,6 @@
  * Author: Ard Biesheuvel <ardb@kernel.org>
  */
 
-#include <asm/neon.h>
 #include <linux/unaligned.h>
 #include <crypto/aes.h>
 #include <crypto/scatterwalk.h>
@@ -16,6 +15,8 @@
 #include <crypto/internal/skcipher.h>
 #include <linux/module.h>
 
+#include <asm/simd.h>
+
 #include "aes-ce-setkey.h"
 
 MODULE_IMPORT_NS("CRYPTO_INTERNAL");
@@ -184,40 +185,38 @@ static int ccm_encrypt(struct aead_request *req)
 	if (unlikely(err))
 		return err;
 
-	kernel_neon_begin();
-
-	if (req->assoclen)
-		ccm_calculate_auth_mac(req, mac);
-
-	do {
-		u32 tail = walk.nbytes % AES_BLOCK_SIZE;
-		const u8 *src = walk.src.virt.addr;
-		u8 *dst = walk.dst.virt.addr;
-		u8 buf[AES_BLOCK_SIZE];
-		u8 *final_iv = NULL;
-
-		if (walk.nbytes == walk.total) {
-			tail = 0;
-			final_iv = orig_iv;
-		}
-
-		if (unlikely(walk.nbytes < AES_BLOCK_SIZE))
-			src = dst = memcpy(&buf[sizeof(buf) - walk.nbytes],
-					   src, walk.nbytes);
-
-		ce_aes_ccm_encrypt(dst, src, walk.nbytes - tail,
-				   ctx->key_enc, num_rounds(ctx),
-				   mac, walk.iv, final_iv);
-
-		if (unlikely(walk.nbytes < AES_BLOCK_SIZE))
-			memcpy(walk.dst.virt.addr, dst, walk.nbytes);
-
-		if (walk.nbytes) {
-			err = skcipher_walk_done(&walk, tail);
-		}
-	} while (walk.nbytes);
-
-	kernel_neon_end();
+	scoped_ksimd() {
+		if (req->assoclen)
+			ccm_calculate_auth_mac(req, mac);
+
+		do {
+			u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+			const u8 *src = walk.src.virt.addr;
+			u8 *dst = walk.dst.virt.addr;
+			u8 buf[AES_BLOCK_SIZE];
+			u8 *final_iv = NULL;
+
+			if (walk.nbytes == walk.total) {
+				tail = 0;
+				final_iv = orig_iv;
+			}
+
+			if (unlikely(walk.nbytes < AES_BLOCK_SIZE))
+				src = dst = memcpy(&buf[sizeof(buf) - walk.nbytes],
+						   src, walk.nbytes);
+
+			ce_aes_ccm_encrypt(dst, src, walk.nbytes - tail,
+					   ctx->key_enc, num_rounds(ctx),
+					   mac, walk.iv, final_iv);
+
+			if (unlikely(walk.nbytes < AES_BLOCK_SIZE))
+				memcpy(walk.dst.virt.addr, dst, walk.nbytes);
+
+			if (walk.nbytes) {
+				err = skcipher_walk_done(&walk, tail);
+			}
+		} while (walk.nbytes);
+	}
 
 	if (unlikely(err))
 		return err;
@@ -251,40 +250,38 @@ static int ccm_decrypt(struct aead_request *req)
 	if (unlikely(err))
 		return err;
 
-	kernel_neon_begin();
-
-	if (req->assoclen)
-		ccm_calculate_auth_mac(req, mac);
-
-	do {
-		u32 tail = walk.nbytes % AES_BLOCK_SIZE;
-		const u8 *src = walk.src.virt.addr;
-		u8 *dst = walk.dst.virt.addr;
-		u8 buf[AES_BLOCK_SIZE];
-		u8 *final_iv = NULL;
-
-		if (walk.nbytes == walk.total) {
-			tail = 0;
-			final_iv = orig_iv;
-		}
-
-		if (unlikely(walk.nbytes < AES_BLOCK_SIZE))
-			src = dst = memcpy(&buf[sizeof(buf) - walk.nbytes],
-					   src, walk.nbytes);
-
-		ce_aes_ccm_decrypt(dst, src, walk.nbytes - tail,
-				   ctx->key_enc, num_rounds(ctx),
-				   mac, walk.iv, final_iv);
-
-		if (unlikely(walk.nbytes < AES_BLOCK_SIZE))
-			memcpy(walk.dst.virt.addr, dst, walk.nbytes);
-
-		if (walk.nbytes) {
-			err = skcipher_walk_done(&walk, tail);
-		}
-	} while (walk.nbytes);
-
-	kernel_neon_end();
+	scoped_ksimd() {
+		if (req->assoclen)
+			ccm_calculate_auth_mac(req, mac);
+
+		do {
+			u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+			const u8 *src = walk.src.virt.addr;
+			u8 *dst = walk.dst.virt.addr;
+			u8 buf[AES_BLOCK_SIZE];
+			u8 *final_iv = NULL;
+
+			if (walk.nbytes == walk.total) {
+				tail = 0;
+				final_iv = orig_iv;
+			}
+
+			if (unlikely(walk.nbytes < AES_BLOCK_SIZE))
+				src = dst = memcpy(&buf[sizeof(buf) - walk.nbytes],
+						   src, walk.nbytes);
+
+			ce_aes_ccm_decrypt(dst, src, walk.nbytes - tail,
+					   ctx->key_enc, num_rounds(ctx),
+					   mac, walk.iv, final_iv);
+
+			if (unlikely(walk.nbytes < AES_BLOCK_SIZE))
+				memcpy(walk.dst.virt.addr, dst, walk.nbytes);
+
+			if (walk.nbytes) {
+				err = skcipher_walk_done(&walk, tail);
+			}
+		} while (walk.nbytes);
+	}
 
 	if (unlikely(err))
 		return err;
-- 
2.51.1.930.gacf6e81ea2-goog




* [PATCH v4 11/21] crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (9 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 10/21] crypto/arm64: aes-ccm - Switch " Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 12/21] crypto/arm64: aes-gcm " Ard Biesheuvel
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

Switch to the more abstract 'scoped_ksimd()' API, which will be modified
in a future patch to transparently allocate a kernel mode FP/SIMD state
buffer on the stack, so that kernel mode FP/SIMD code remains
preemptible in principle, but without the memory overhead of adding 528
bytes to the size of struct task_struct.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/crypto/aes-ce-glue.c     |  87 ++++++------
 arch/arm64/crypto/aes-glue.c        | 139 ++++++++----------
 arch/arm64/crypto/aes-neonbs-glue.c | 150 ++++++++++----------
 3 files changed, 181 insertions(+), 195 deletions(-)

diff --git a/arch/arm64/crypto/aes-ce-glue.c b/arch/arm64/crypto/aes-ce-glue.c
index 00b8749013c5..a4dad370991d 100644
--- a/arch/arm64/crypto/aes-ce-glue.c
+++ b/arch/arm64/crypto/aes-ce-glue.c
@@ -52,9 +52,8 @@ static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
 		return;
 	}
 
-	kernel_neon_begin();
-	__aes_ce_encrypt(ctx->key_enc, dst, src, num_rounds(ctx));
-	kernel_neon_end();
+	scoped_ksimd()
+		__aes_ce_encrypt(ctx->key_enc, dst, src, num_rounds(ctx));
 }
 
 static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
@@ -66,9 +65,8 @@ static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
 		return;
 	}
 
-	kernel_neon_begin();
-	__aes_ce_decrypt(ctx->key_dec, dst, src, num_rounds(ctx));
-	kernel_neon_end();
+	scoped_ksimd()
+		__aes_ce_decrypt(ctx->key_dec, dst, src, num_rounds(ctx));
 }
 
 int ce_aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key,
@@ -94,47 +92,48 @@ int ce_aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key,
 	for (i = 0; i < kwords; i++)
 		ctx->key_enc[i] = get_unaligned_le32(in_key + i * sizeof(u32));
 
-	kernel_neon_begin();
-	for (i = 0; i < sizeof(rcon); i++) {
-		u32 *rki = ctx->key_enc + (i * kwords);
-		u32 *rko = rki + kwords;
-
-		rko[0] = ror32(__aes_ce_sub(rki[kwords - 1]), 8) ^ rcon[i] ^ rki[0];
-		rko[1] = rko[0] ^ rki[1];
-		rko[2] = rko[1] ^ rki[2];
-		rko[3] = rko[2] ^ rki[3];
-
-		if (key_len == AES_KEYSIZE_192) {
-			if (i >= 7)
-				break;
-			rko[4] = rko[3] ^ rki[4];
-			rko[5] = rko[4] ^ rki[5];
-		} else if (key_len == AES_KEYSIZE_256) {
-			if (i >= 6)
-				break;
-			rko[4] = __aes_ce_sub(rko[3]) ^ rki[4];
-			rko[5] = rko[4] ^ rki[5];
-			rko[6] = rko[5] ^ rki[6];
-			rko[7] = rko[6] ^ rki[7];
+	scoped_ksimd() {
+		for (i = 0; i < sizeof(rcon); i++) {
+			u32 *rki = ctx->key_enc + (i * kwords);
+			u32 *rko = rki + kwords;
+
+			rko[0] = ror32(__aes_ce_sub(rki[kwords - 1]), 8) ^
+				 rcon[i] ^ rki[0];
+			rko[1] = rko[0] ^ rki[1];
+			rko[2] = rko[1] ^ rki[2];
+			rko[3] = rko[2] ^ rki[3];
+
+			if (key_len == AES_KEYSIZE_192) {
+				if (i >= 7)
+					break;
+				rko[4] = rko[3] ^ rki[4];
+				rko[5] = rko[4] ^ rki[5];
+			} else if (key_len == AES_KEYSIZE_256) {
+				if (i >= 6)
+					break;
+				rko[4] = __aes_ce_sub(rko[3]) ^ rki[4];
+				rko[5] = rko[4] ^ rki[5];
+				rko[6] = rko[5] ^ rki[6];
+				rko[7] = rko[6] ^ rki[7];
+			}
 		}
-	}
 
-	/*
-	 * Generate the decryption keys for the Equivalent Inverse Cipher.
-	 * This involves reversing the order of the round keys, and applying
-	 * the Inverse Mix Columns transformation on all but the first and
-	 * the last one.
-	 */
-	key_enc = (struct aes_block *)ctx->key_enc;
-	key_dec = (struct aes_block *)ctx->key_dec;
-	j = num_rounds(ctx);
-
-	key_dec[0] = key_enc[j];
-	for (i = 1, j--; j > 0; i++, j--)
-		__aes_ce_invert(key_dec + i, key_enc + j);
-	key_dec[i] = key_enc[0];
+		/*
+		 * Generate the decryption keys for the Equivalent Inverse
+		 * Cipher.  This involves reversing the order of the round
+		 * keys, and applying the Inverse Mix Columns transformation on
+		 * all but the first and the last one.
+		 */
+		key_enc = (struct aes_block *)ctx->key_enc;
+		key_dec = (struct aes_block *)ctx->key_dec;
+		j = num_rounds(ctx);
+
+		key_dec[0] = key_enc[j];
+		for (i = 1, j--; j > 0; i++, j--)
+			__aes_ce_invert(key_dec + i, key_enc + j);
+		key_dec[i] = key_enc[0];
+	}
 
-	kernel_neon_end();
 	return 0;
 }
 EXPORT_SYMBOL(ce_aes_expandkey);
diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c
index 5e207ff34482..b087b900d279 100644
--- a/arch/arm64/crypto/aes-glue.c
+++ b/arch/arm64/crypto/aes-glue.c
@@ -5,8 +5,6 @@
  * Copyright (C) 2013 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
  */
 
-#include <asm/hwcap.h>
-#include <asm/neon.h>
 #include <crypto/aes.h>
 #include <crypto/ctr.h>
 #include <crypto/internal/hash.h>
@@ -20,6 +18,9 @@
 #include <linux/module.h>
 #include <linux/string.h>
 
+#include <asm/hwcap.h>
+#include <asm/simd.h>
+
 #include "aes-ce-setkey.h"
 
 #ifdef USE_V8_CRYPTO_EXTENSIONS
@@ -186,10 +187,9 @@ static int __maybe_unused ecb_encrypt(struct skcipher_request *req)
 	err = skcipher_walk_virt(&walk, req, false);
 
 	while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
-		kernel_neon_begin();
-		aes_ecb_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				ctx->key_enc, rounds, blocks);
-		kernel_neon_end();
+		scoped_ksimd()
+			aes_ecb_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+					ctx->key_enc, rounds, blocks);
 		err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
 	}
 	return err;
@@ -206,10 +206,9 @@ static int __maybe_unused ecb_decrypt(struct skcipher_request *req)
 	err = skcipher_walk_virt(&walk, req, false);
 
 	while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
-		kernel_neon_begin();
-		aes_ecb_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				ctx->key_dec, rounds, blocks);
-		kernel_neon_end();
+		scoped_ksimd()
+			aes_ecb_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
+					ctx->key_dec, rounds, blocks);
 		err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
 	}
 	return err;
@@ -224,10 +223,9 @@ static int cbc_encrypt_walk(struct skcipher_request *req,
 	unsigned int blocks;
 
 	while ((blocks = (walk->nbytes / AES_BLOCK_SIZE))) {
-		kernel_neon_begin();
-		aes_cbc_encrypt(walk->dst.virt.addr, walk->src.virt.addr,
-				ctx->key_enc, rounds, blocks, walk->iv);
-		kernel_neon_end();
+		scoped_ksimd()
+			aes_cbc_encrypt(walk->dst.virt.addr, walk->src.virt.addr,
+					ctx->key_enc, rounds, blocks, walk->iv);
 		err = skcipher_walk_done(walk, walk->nbytes % AES_BLOCK_SIZE);
 	}
 	return err;
@@ -253,10 +251,9 @@ static int cbc_decrypt_walk(struct skcipher_request *req,
 	unsigned int blocks;
 
 	while ((blocks = (walk->nbytes / AES_BLOCK_SIZE))) {
-		kernel_neon_begin();
-		aes_cbc_decrypt(walk->dst.virt.addr, walk->src.virt.addr,
-				ctx->key_dec, rounds, blocks, walk->iv);
-		kernel_neon_end();
+		scoped_ksimd()
+			aes_cbc_decrypt(walk->dst.virt.addr, walk->src.virt.addr,
+					ctx->key_dec, rounds, blocks, walk->iv);
 		err = skcipher_walk_done(walk, walk->nbytes % AES_BLOCK_SIZE);
 	}
 	return err;
@@ -322,10 +319,9 @@ static int cts_cbc_encrypt(struct skcipher_request *req)
 	if (err)
 		return err;
 
-	kernel_neon_begin();
-	aes_cbc_cts_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
-			    ctx->key_enc, rounds, walk.nbytes, walk.iv);
-	kernel_neon_end();
+	scoped_ksimd()
+		aes_cbc_cts_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+				    ctx->key_enc, rounds, walk.nbytes, walk.iv);
 
 	return skcipher_walk_done(&walk, 0);
 }
@@ -379,10 +375,9 @@ static int cts_cbc_decrypt(struct skcipher_request *req)
 	if (err)
 		return err;
 
-	kernel_neon_begin();
-	aes_cbc_cts_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
-			    ctx->key_dec, rounds, walk.nbytes, walk.iv);
-	kernel_neon_end();
+	scoped_ksimd()
+		aes_cbc_cts_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
+				    ctx->key_dec, rounds, walk.nbytes, walk.iv);
 
 	return skcipher_walk_done(&walk, 0);
 }
@@ -399,11 +394,11 @@ static int __maybe_unused essiv_cbc_encrypt(struct skcipher_request *req)
 
 	blocks = walk.nbytes / AES_BLOCK_SIZE;
 	if (blocks) {
-		kernel_neon_begin();
-		aes_essiv_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				      ctx->key1.key_enc, rounds, blocks,
-				      req->iv, ctx->key2.key_enc);
-		kernel_neon_end();
+		scoped_ksimd()
+			aes_essiv_cbc_encrypt(walk.dst.virt.addr,
+					      walk.src.virt.addr,
+					      ctx->key1.key_enc, rounds, blocks,
+					      req->iv, ctx->key2.key_enc);
 		err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
 	}
 	return err ?: cbc_encrypt_walk(req, &walk);
@@ -421,11 +416,11 @@ static int __maybe_unused essiv_cbc_decrypt(struct skcipher_request *req)
 
 	blocks = walk.nbytes / AES_BLOCK_SIZE;
 	if (blocks) {
-		kernel_neon_begin();
-		aes_essiv_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				      ctx->key1.key_dec, rounds, blocks,
-				      req->iv, ctx->key2.key_enc);
-		kernel_neon_end();
+		scoped_ksimd()
+			aes_essiv_cbc_decrypt(walk.dst.virt.addr,
+					      walk.src.virt.addr,
+					      ctx->key1.key_dec, rounds, blocks,
+					      req->iv, ctx->key2.key_enc);
 		err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
 	}
 	return err ?: cbc_decrypt_walk(req, &walk);
@@ -461,10 +456,9 @@ static int __maybe_unused xctr_encrypt(struct skcipher_request *req)
 		else if (nbytes < walk.total)
 			nbytes &= ~(AES_BLOCK_SIZE - 1);
 
-		kernel_neon_begin();
-		aes_xctr_encrypt(dst, src, ctx->key_enc, rounds, nbytes,
-						 walk.iv, byte_ctr);
-		kernel_neon_end();
+		scoped_ksimd()
+			aes_xctr_encrypt(dst, src, ctx->key_enc, rounds, nbytes,
+							 walk.iv, byte_ctr);
 
 		if (unlikely(nbytes < AES_BLOCK_SIZE))
 			memcpy(walk.dst.virt.addr,
@@ -506,10 +500,9 @@ static int __maybe_unused ctr_encrypt(struct skcipher_request *req)
 		else if (nbytes < walk.total)
 			nbytes &= ~(AES_BLOCK_SIZE - 1);
 
-		kernel_neon_begin();
-		aes_ctr_encrypt(dst, src, ctx->key_enc, rounds, nbytes,
-				walk.iv);
-		kernel_neon_end();
+		scoped_ksimd()
+			aes_ctr_encrypt(dst, src, ctx->key_enc, rounds, nbytes,
+					walk.iv);
 
 		if (unlikely(nbytes < AES_BLOCK_SIZE))
 			memcpy(walk.dst.virt.addr,
@@ -562,11 +555,10 @@ static int __maybe_unused xts_encrypt(struct skcipher_request *req)
 		if (walk.nbytes < walk.total)
 			nbytes &= ~(AES_BLOCK_SIZE - 1);
 
-		kernel_neon_begin();
-		aes_xts_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				ctx->key1.key_enc, rounds, nbytes,
-				ctx->key2.key_enc, walk.iv, first);
-		kernel_neon_end();
+		scoped_ksimd()
+			aes_xts_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+					ctx->key1.key_enc, rounds, nbytes,
+					ctx->key2.key_enc, walk.iv, first);
 		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
 	}
 
@@ -584,11 +576,10 @@ static int __maybe_unused xts_encrypt(struct skcipher_request *req)
 	if (err)
 		return err;
 
-	kernel_neon_begin();
-	aes_xts_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
-			ctx->key1.key_enc, rounds, walk.nbytes,
-			ctx->key2.key_enc, walk.iv, first);
-	kernel_neon_end();
+	scoped_ksimd()
+		aes_xts_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+				ctx->key1.key_enc, rounds, walk.nbytes,
+				ctx->key2.key_enc, walk.iv, first);
 
 	return skcipher_walk_done(&walk, 0);
 }
@@ -634,11 +625,10 @@ static int __maybe_unused xts_decrypt(struct skcipher_request *req)
 		if (walk.nbytes < walk.total)
 			nbytes &= ~(AES_BLOCK_SIZE - 1);
 
-		kernel_neon_begin();
-		aes_xts_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				ctx->key1.key_dec, rounds, nbytes,
-				ctx->key2.key_enc, walk.iv, first);
-		kernel_neon_end();
+		scoped_ksimd()
+			aes_xts_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
+					ctx->key1.key_dec, rounds, nbytes,
+					ctx->key2.key_enc, walk.iv, first);
 		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
 	}
 
@@ -657,11 +647,10 @@ static int __maybe_unused xts_decrypt(struct skcipher_request *req)
 		return err;
 
 
-	kernel_neon_begin();
-	aes_xts_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
-			ctx->key1.key_dec, rounds, walk.nbytes,
-			ctx->key2.key_enc, walk.iv, first);
-	kernel_neon_end();
+	scoped_ksimd()
+		aes_xts_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
+				ctx->key1.key_dec, rounds, walk.nbytes,
+				ctx->key2.key_enc, walk.iv, first);
 
 	return skcipher_walk_done(&walk, 0);
 }
@@ -808,10 +797,9 @@ static int cmac_setkey(struct crypto_shash *tfm, const u8 *in_key,
 		return err;
 
 	/* encrypt the zero vector */
-	kernel_neon_begin();
-	aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, ctx->key.key_enc,
-			rounds, 1);
-	kernel_neon_end();
+	scoped_ksimd()
+		aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){},
+				ctx->key.key_enc, rounds, 1);
 
 	cmac_gf128_mul_by_x(consts, consts);
 	cmac_gf128_mul_by_x(consts + 1, consts);
@@ -837,10 +825,10 @@ static int xcbc_setkey(struct crypto_shash *tfm, const u8 *in_key,
 	if (err)
 		return err;
 
-	kernel_neon_begin();
-	aes_ecb_encrypt(key, ks[0], ctx->key.key_enc, rounds, 1);
-	aes_ecb_encrypt(ctx->consts, ks[1], ctx->key.key_enc, rounds, 2);
-	kernel_neon_end();
+	scoped_ksimd() {
+		aes_ecb_encrypt(key, ks[0], ctx->key.key_enc, rounds, 1);
+		aes_ecb_encrypt(ctx->consts, ks[1], ctx->key.key_enc, rounds, 2);
+	}
 
 	return cbcmac_setkey(tfm, key, sizeof(key));
 }
@@ -860,10 +848,9 @@ static void mac_do_update(struct crypto_aes_ctx *ctx, u8 const in[], int blocks,
 	int rem;
 
 	do {
-		kernel_neon_begin();
-		rem = aes_mac_update(in, ctx->key_enc, rounds, blocks,
-				     dg, enc_before, !enc_before);
-		kernel_neon_end();
+		scoped_ksimd()
+			rem = aes_mac_update(in, ctx->key_enc, rounds, blocks,
+					     dg, enc_before, !enc_before);
 		in += (blocks - rem) * AES_BLOCK_SIZE;
 		blocks = rem;
 	} while (blocks);
diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c
index c4a623e86593..d496effb0a5b 100644
--- a/arch/arm64/crypto/aes-neonbs-glue.c
+++ b/arch/arm64/crypto/aes-neonbs-glue.c
@@ -85,9 +85,8 @@ static int aesbs_setkey(struct crypto_skcipher *tfm, const u8 *in_key,
 
 	ctx->rounds = 6 + key_len / 4;
 
-	kernel_neon_begin();
-	aesbs_convert_key(ctx->rk, rk.key_enc, ctx->rounds);
-	kernel_neon_end();
+	scoped_ksimd()
+		aesbs_convert_key(ctx->rk, rk.key_enc, ctx->rounds);
 
 	return 0;
 }
@@ -110,10 +109,9 @@ static int __ecb_crypt(struct skcipher_request *req,
 			blocks = round_down(blocks,
 					    walk.stride / AES_BLOCK_SIZE);
 
-		kernel_neon_begin();
-		fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->rk,
-		   ctx->rounds, blocks);
-		kernel_neon_end();
+		scoped_ksimd()
+			fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->rk,
+			   ctx->rounds, blocks);
 		err = skcipher_walk_done(&walk,
 					 walk.nbytes - blocks * AES_BLOCK_SIZE);
 	}
@@ -146,9 +144,8 @@ static int aesbs_cbc_ctr_setkey(struct crypto_skcipher *tfm, const u8 *in_key,
 
 	memcpy(ctx->enc, rk.key_enc, sizeof(ctx->enc));
 
-	kernel_neon_begin();
-	aesbs_convert_key(ctx->key.rk, rk.key_enc, ctx->key.rounds);
-	kernel_neon_end();
+	scoped_ksimd()
+		aesbs_convert_key(ctx->key.rk, rk.key_enc, ctx->key.rounds);
 	memzero_explicit(&rk, sizeof(rk));
 
 	return 0;
@@ -167,11 +164,11 @@ static int cbc_encrypt(struct skcipher_request *req)
 		unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
 
 		/* fall back to the non-bitsliced NEON implementation */
-		kernel_neon_begin();
-		neon_aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				     ctx->enc, ctx->key.rounds, blocks,
-				     walk.iv);
-		kernel_neon_end();
+		scoped_ksimd()
+			neon_aes_cbc_encrypt(walk.dst.virt.addr,
+					     walk.src.virt.addr,
+					     ctx->enc, ctx->key.rounds, blocks,
+					     walk.iv);
 		err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
 	}
 	return err;
@@ -193,11 +190,10 @@ static int cbc_decrypt(struct skcipher_request *req)
 			blocks = round_down(blocks,
 					    walk.stride / AES_BLOCK_SIZE);
 
-		kernel_neon_begin();
-		aesbs_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				  ctx->key.rk, ctx->key.rounds, blocks,
-				  walk.iv);
-		kernel_neon_end();
+		scoped_ksimd()
+			aesbs_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
+					  ctx->key.rk, ctx->key.rounds, blocks,
+					  walk.iv);
 		err = skcipher_walk_done(&walk,
 					 walk.nbytes - blocks * AES_BLOCK_SIZE);
 	}
@@ -220,30 +216,32 @@ static int ctr_encrypt(struct skcipher_request *req)
 		const u8 *src = walk.src.virt.addr;
 		u8 *dst = walk.dst.virt.addr;
 
-		kernel_neon_begin();
-		if (blocks >= 8) {
-			aesbs_ctr_encrypt(dst, src, ctx->key.rk, ctx->key.rounds,
-					  blocks, walk.iv);
-			dst += blocks * AES_BLOCK_SIZE;
-			src += blocks * AES_BLOCK_SIZE;
-		}
-		if (nbytes && walk.nbytes == walk.total) {
-			u8 buf[AES_BLOCK_SIZE];
-			u8 *d = dst;
-
-			if (unlikely(nbytes < AES_BLOCK_SIZE))
-				src = dst = memcpy(buf + sizeof(buf) - nbytes,
-						   src, nbytes);
-
-			neon_aes_ctr_encrypt(dst, src, ctx->enc, ctx->key.rounds,
-					     nbytes, walk.iv);
+		scoped_ksimd() {
+			if (blocks >= 8) {
+				aesbs_ctr_encrypt(dst, src, ctx->key.rk,
+						  ctx->key.rounds, blocks,
+						  walk.iv);
+				dst += blocks * AES_BLOCK_SIZE;
+				src += blocks * AES_BLOCK_SIZE;
+			}
+			if (nbytes && walk.nbytes == walk.total) {
+				u8 buf[AES_BLOCK_SIZE];
+				u8 *d = dst;
+
+				if (unlikely(nbytes < AES_BLOCK_SIZE))
+					src = dst = memcpy(buf + sizeof(buf) -
+							   nbytes, src, nbytes);
+
+				neon_aes_ctr_encrypt(dst, src, ctx->enc,
+						     ctx->key.rounds, nbytes,
+						     walk.iv);
 
-			if (unlikely(nbytes < AES_BLOCK_SIZE))
-				memcpy(d, dst, nbytes);
+				if (unlikely(nbytes < AES_BLOCK_SIZE))
+					memcpy(d, dst, nbytes);
 
-			nbytes = 0;
+				nbytes = 0;
+			}
 		}
-		kernel_neon_end();
 		err = skcipher_walk_done(&walk, nbytes);
 	}
 	return err;
@@ -320,33 +318,33 @@ static int __xts_crypt(struct skcipher_request *req, bool encrypt,
 		in = walk.src.virt.addr;
 		nbytes = walk.nbytes;
 
-		kernel_neon_begin();
-		if (blocks >= 8) {
-			if (first == 1)
-				neon_aes_ecb_encrypt(walk.iv, walk.iv,
-						     ctx->twkey,
-						     ctx->key.rounds, 1);
-			first = 2;
-
-			fn(out, in, ctx->key.rk, ctx->key.rounds, blocks,
-			   walk.iv);
-
-			out += blocks * AES_BLOCK_SIZE;
-			in += blocks * AES_BLOCK_SIZE;
-			nbytes -= blocks * AES_BLOCK_SIZE;
+		scoped_ksimd() {
+			if (blocks >= 8) {
+				if (first == 1)
+					neon_aes_ecb_encrypt(walk.iv, walk.iv,
+							     ctx->twkey,
+							     ctx->key.rounds, 1);
+				first = 2;
+
+				fn(out, in, ctx->key.rk, ctx->key.rounds, blocks,
+				   walk.iv);
+
+				out += blocks * AES_BLOCK_SIZE;
+				in += blocks * AES_BLOCK_SIZE;
+				nbytes -= blocks * AES_BLOCK_SIZE;
+			}
+			if (walk.nbytes == walk.total && nbytes > 0) {
+				if (encrypt)
+					neon_aes_xts_encrypt(out, in, ctx->cts.key_enc,
+							     ctx->key.rounds, nbytes,
+							     ctx->twkey, walk.iv, first);
+				else
+					neon_aes_xts_decrypt(out, in, ctx->cts.key_dec,
+							     ctx->key.rounds, nbytes,
+							     ctx->twkey, walk.iv, first);
+				nbytes = first = 0;
+			}
 		}
-		if (walk.nbytes == walk.total && nbytes > 0) {
-			if (encrypt)
-				neon_aes_xts_encrypt(out, in, ctx->cts.key_enc,
-						     ctx->key.rounds, nbytes,
-						     ctx->twkey, walk.iv, first);
-			else
-				neon_aes_xts_decrypt(out, in, ctx->cts.key_dec,
-						     ctx->key.rounds, nbytes,
-						     ctx->twkey, walk.iv, first);
-			nbytes = first = 0;
-		}
-		kernel_neon_end();
 		err = skcipher_walk_done(&walk, nbytes);
 	}
 
@@ -369,14 +367,16 @@ static int __xts_crypt(struct skcipher_request *req, bool encrypt,
 	in = walk.src.virt.addr;
 	nbytes = walk.nbytes;
 
-	kernel_neon_begin();
-	if (encrypt)
-		neon_aes_xts_encrypt(out, in, ctx->cts.key_enc, ctx->key.rounds,
-				     nbytes, ctx->twkey, walk.iv, first);
-	else
-		neon_aes_xts_decrypt(out, in, ctx->cts.key_dec, ctx->key.rounds,
-				     nbytes, ctx->twkey, walk.iv, first);
-	kernel_neon_end();
+	scoped_ksimd() {
+		if (encrypt)
+			neon_aes_xts_encrypt(out, in, ctx->cts.key_enc,
+					     ctx->key.rounds, nbytes, ctx->twkey,
+					     walk.iv, first);
+		else
+			neon_aes_xts_decrypt(out, in, ctx->cts.key_dec,
+					     ctx->key.rounds, nbytes, ctx->twkey,
+					     walk.iv, first);
+	}
 
 	return skcipher_walk_done(&walk, 0);
 }
-- 
2.51.1.930.gacf6e81ea2-goog



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 12/21] crypto/arm64: aes-gcm - Switch to 'ksimd' scoped guard API
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (10 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 11/21] crypto/arm64: aes-blk " Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 13/21] crypto/arm64: nhpoly1305 " Ard Biesheuvel
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

Switch to the more abstract 'scoped_ksimd()' API, which will be modified
in a future patch to transparently allocate a kernel mode FP/SIMD state
buffer on the stack, so that kernel mode FP/SIMD code remains
preemptible in principle, but without the memory overhead of adding 528
bytes to the size of struct task_struct.
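
As an illustration, the conversion pattern applied throughout this patch
boils down to the following sketch, where do_simd_transform() is a
made-up placeholder for whichever asm helper is being wrapped:

	/* before: explicit bracketing */
	kernel_neon_begin();
	do_simd_transform(dst, src, nbytes);
	kernel_neon_end();

	/* after: the guard covers the statement (or block) that follows */
	scoped_ksimd()
		do_simd_transform(dst, src, nbytes);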

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/crypto/ghash-ce-glue.c | 27 ++++++++++----------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
index 4995b6e22335..7951557a285a 100644
--- a/arch/arm64/crypto/ghash-ce-glue.c
+++ b/arch/arm64/crypto/ghash-ce-glue.c
@@ -5,7 +5,6 @@
  * Copyright (C) 2014 - 2018 Linaro Ltd. <ard.biesheuvel@linaro.org>
  */
 
-#include <asm/neon.h>
 #include <crypto/aes.h>
 #include <crypto/b128ops.h>
 #include <crypto/gcm.h>
@@ -22,6 +21,8 @@
 #include <linux/string.h>
 #include <linux/unaligned.h>
 
+#include <asm/simd.h>
+
 MODULE_DESCRIPTION("GHASH and AES-GCM using ARMv8 Crypto Extensions");
 MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
 MODULE_LICENSE("GPL v2");
@@ -74,9 +75,8 @@ void ghash_do_simd_update(int blocks, u64 dg[], const char *src,
 					      u64 const h[][2],
 					      const char *head))
 {
-	kernel_neon_begin();
-	simd_update(blocks, dg, src, key->h, head);
-	kernel_neon_end();
+	scoped_ksimd()
+		simd_update(blocks, dg, src, key->h, head);
 }
 
 /* avoid hogging the CPU for too long */
@@ -329,11 +329,10 @@ static int gcm_encrypt(struct aead_request *req, char *iv, int assoclen)
 			tag = NULL;
 		}
 
-		kernel_neon_begin();
-		pmull_gcm_encrypt(nbytes, dst, src, ctx->ghash_key.h,
-				  dg, iv, ctx->aes_key.key_enc, nrounds,
-				  tag);
-		kernel_neon_end();
+		scoped_ksimd()
+			pmull_gcm_encrypt(nbytes, dst, src, ctx->ghash_key.h,
+					  dg, iv, ctx->aes_key.key_enc, nrounds,
+					  tag);
 
 		if (unlikely(!nbytes))
 			break;
@@ -399,11 +398,11 @@ static int gcm_decrypt(struct aead_request *req, char *iv, int assoclen)
 			tag = NULL;
 		}
 
-		kernel_neon_begin();
-		ret = pmull_gcm_decrypt(nbytes, dst, src, ctx->ghash_key.h,
-					dg, iv, ctx->aes_key.key_enc,
-					nrounds, tag, otag, authsize);
-		kernel_neon_end();
+		scoped_ksimd()
+			ret = pmull_gcm_decrypt(nbytes, dst, src,
+						ctx->ghash_key.h,
+						dg, iv, ctx->aes_key.key_enc,
+						nrounds, tag, otag, authsize);
 
 		if (unlikely(!nbytes))
 			break;
-- 
2.51.1.930.gacf6e81ea2-goog



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 13/21] crypto/arm64: nhpoly1305 - Switch to 'ksimd' scoped guard API
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (11 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 12/21] crypto/arm64: aes-gcm " Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 14/21] crypto/arm64: polyval " Ard Biesheuvel
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

Switch to the more abstract 'scoped_ksimd()' API, which will be modified
in a future patch to transparently allocate a kernel mode FP/SIMD state
buffer on the stack, so that kernel mode FP/SIMD code remains
preemptible in principle, but without the memory overhead of adding 528
bytes to the size of struct task_struct.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/crypto/nhpoly1305-neon-glue.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/crypto/nhpoly1305-neon-glue.c b/arch/arm64/crypto/nhpoly1305-neon-glue.c
index e4a0b463f080..013de6ac569a 100644
--- a/arch/arm64/crypto/nhpoly1305-neon-glue.c
+++ b/arch/arm64/crypto/nhpoly1305-neon-glue.c
@@ -25,9 +25,8 @@ static int nhpoly1305_neon_update(struct shash_desc *desc,
 	do {
 		unsigned int n = min_t(unsigned int, srclen, SZ_4K);
 
-		kernel_neon_begin();
-		crypto_nhpoly1305_update_helper(desc, src, n, nh_neon);
-		kernel_neon_end();
+		scoped_ksimd()
+			crypto_nhpoly1305_update_helper(desc, src, n, nh_neon);
 		src += n;
 		srclen -= n;
 	} while (srclen);
-- 
2.51.1.930.gacf6e81ea2-goog



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 14/21] crypto/arm64: polyval - Switch to 'ksimd' scoped guard API
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (12 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 13/21] crypto/arm64: nhpoly1305 " Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 15/21] crypto/arm64: sha3 " Ard Biesheuvel
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

Switch to the more abstract 'scoped_ksimd()' API, which will be modified
in a future patch to transparently allocate a kernel mode FP/SIMD state
buffer on the stack, so that kernel mode FP/SIMD code remains
preemptible in principle, but without the memory overhead of adding 528
bytes to the size of struct task_struct.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/crypto/polyval-ce-glue.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/crypto/polyval-ce-glue.c b/arch/arm64/crypto/polyval-ce-glue.c
index c4e653688ea0..51eefbe97885 100644
--- a/arch/arm64/crypto/polyval-ce-glue.c
+++ b/arch/arm64/crypto/polyval-ce-glue.c
@@ -15,7 +15,7 @@
  * ARMv8 Crypto Extensions instructions to implement the finite field operations.
  */
 
-#include <asm/neon.h>
+#include <asm/simd.h>
 #include <crypto/internal/hash.h>
 #include <crypto/polyval.h>
 #include <crypto/utils.h>
@@ -45,16 +45,14 @@ asmlinkage void pmull_polyval_mul(u8 *op1, const u8 *op2);
 static void internal_polyval_update(const struct polyval_tfm_ctx *keys,
 	const u8 *in, size_t nblocks, u8 *accumulator)
 {
-	kernel_neon_begin();
-	pmull_polyval_update(keys, in, nblocks, accumulator);
-	kernel_neon_end();
+	scoped_ksimd()
+		pmull_polyval_update(keys, in, nblocks, accumulator);
 }
 
 static void internal_polyval_mul(u8 *op1, const u8 *op2)
 {
-	kernel_neon_begin();
-	pmull_polyval_mul(op1, op2);
-	kernel_neon_end();
+	scoped_ksimd()
+		pmull_polyval_mul(op1, op2);
 }
 
 static int polyval_arm64_setkey(struct crypto_shash *tfm,
-- 
2.51.1.930.gacf6e81ea2-goog



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 15/21] crypto/arm64: sha3 - Switch to 'ksimd' scoped guard API
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (13 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 14/21] crypto/arm64: polyval " Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 16/21] crypto/arm64: sm3 " Ard Biesheuvel
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

Switch to the more abstract 'scoped_ksimd()' API, which will be modified
in a future patch to transparently allocate a kernel mode FP/SIMD state
buffer on the stack, so that kernel mode FP/SIMD code remains
preemptible in principle, but without the memory overhead of adding 528
bytes to the size of struct task_struct.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/crypto/sha3-ce-glue.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/crypto/sha3-ce-glue.c b/arch/arm64/crypto/sha3-ce-glue.c
index b4f1001046c9..22732760edd3 100644
--- a/arch/arm64/crypto/sha3-ce-glue.c
+++ b/arch/arm64/crypto/sha3-ce-glue.c
@@ -46,9 +46,8 @@ static int sha3_update(struct shash_desc *desc, const u8 *data,
 	do {
 		int rem;
 
-		kernel_neon_begin();
-		rem = sha3_ce_transform(sctx->st, data, blocks, ds);
-		kernel_neon_end();
+		scoped_ksimd()
+			rem = sha3_ce_transform(sctx->st, data, blocks, ds);
 		data += (blocks - rem) * bs;
 		blocks = rem;
 	} while (blocks);
@@ -73,9 +72,8 @@ static int sha3_finup(struct shash_desc *desc, const u8 *src, unsigned int len,
 	memset(block + len, 0, bs - len);
 	block[bs - 1] |= 0x80;
 
-	kernel_neon_begin();
-	sha3_ce_transform(sctx->st, block, 1, ds);
-	kernel_neon_end();
+	scoped_ksimd()
+		sha3_ce_transform(sctx->st, block, 1, ds);
 	memzero_explicit(block , sizeof(block));
 
 	for (i = 0; i < ds / 8; i++)
-- 
2.51.1.930.gacf6e81ea2-goog



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 16/21] crypto/arm64: sm3 - Switch to 'ksimd' scoped guard API
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (14 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 15/21] crypto/arm64: sha3 " Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 13:52   ` Jonathan Cameron
  2025-10-31 10:39 ` [PATCH v4 17/21] crypto/arm64: sm4 " Ard Biesheuvel
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

Switch to the more abstract 'scoped_ksimd()' API, which will be modified
in a future patch to transparently allocate a kernel mode FP/SIMD state
buffer on the stack, so that kernel mode FP/SIMD code remains
preemptible in principle, but without the memory overhead of adding 528
bytes to the size of struct task_struct.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/crypto/sm3-ce-glue.c   | 15 ++++++++-------
 arch/arm64/crypto/sm3-neon-glue.c | 16 ++++++----------
 2 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/crypto/sm3-ce-glue.c b/arch/arm64/crypto/sm3-ce-glue.c
index eac6f5fa0abe..24c1fcfae072 100644
--- a/arch/arm64/crypto/sm3-ce-glue.c
+++ b/arch/arm64/crypto/sm3-ce-glue.c
@@ -5,7 +5,6 @@
  * Copyright (C) 2018 Linaro Ltd <ard.biesheuvel@linaro.org>
  */
 
-#include <asm/neon.h>
 #include <crypto/internal/hash.h>
 #include <crypto/sm3.h>
 #include <crypto/sm3_base.h>
@@ -13,6 +12,8 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 
+#include <asm/simd.h>
+
 MODULE_DESCRIPTION("SM3 secure hash using ARMv8 Crypto Extensions");
 MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
 MODULE_LICENSE("GPL v2");
@@ -25,18 +26,18 @@ static int sm3_ce_update(struct shash_desc *desc, const u8 *data,
 {
 	int remain;
 
-	kernel_neon_begin();
-	remain = sm3_base_do_update_blocks(desc, data, len, sm3_ce_transform);
-	kernel_neon_end();
+	scoped_ksimd() {
+		remain = sm3_base_do_update_blocks(desc, data, len, sm3_ce_transform);
+	}
 	return remain;
 }
 
 static int sm3_ce_finup(struct shash_desc *desc, const u8 *data,
 			unsigned int len, u8 *out)
 {
-	kernel_neon_begin();
-	sm3_base_do_finup(desc, data, len, sm3_ce_transform);
-	kernel_neon_end();
+	scoped_ksimd() {
+		sm3_base_do_finup(desc, data, len, sm3_ce_transform);
+	}
 	return sm3_base_finish(desc, out);
 }
 
diff --git a/arch/arm64/crypto/sm3-neon-glue.c b/arch/arm64/crypto/sm3-neon-glue.c
index 6c4611a503a3..15f30cc24f32 100644
--- a/arch/arm64/crypto/sm3-neon-glue.c
+++ b/arch/arm64/crypto/sm3-neon-glue.c
@@ -5,7 +5,7 @@
  * Copyright (C) 2022 Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
  */
 
-#include <asm/neon.h>
+#include <asm/simd.h>
 #include <crypto/internal/hash.h>
 #include <crypto/sm3.h>
 #include <crypto/sm3_base.h>
@@ -20,20 +20,16 @@ asmlinkage void sm3_neon_transform(struct sm3_state *sst, u8 const *src,
 static int sm3_neon_update(struct shash_desc *desc, const u8 *data,
 			   unsigned int len)
 {
-	int remain;
-
-	kernel_neon_begin();
-	remain = sm3_base_do_update_blocks(desc, data, len, sm3_neon_transform);
-	kernel_neon_end();
-	return remain;
+	scoped_ksimd()
+		return sm3_base_do_update_blocks(desc, data, len,
+						 sm3_neon_transform);
 }
 
 static int sm3_neon_finup(struct shash_desc *desc, const u8 *data,
 			  unsigned int len, u8 *out)
 {
-	kernel_neon_begin();
-	sm3_base_do_finup(desc, data, len, sm3_neon_transform);
-	kernel_neon_end();
+	scoped_ksimd()
+		sm3_base_do_finup(desc, data, len, sm3_neon_transform);
 	return sm3_base_finish(desc, out);
 }
 
-- 
2.51.1.930.gacf6e81ea2-goog



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 17/21] crypto/arm64: sm4 - Switch to 'ksimd' scoped guard API
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (15 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 16/21] crypto/arm64: sm3 " Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 18/21] arm64/xorblocks: " Ard Biesheuvel
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

Switch to the more abstract 'scoped_ksimd()' API, which will be modified
in a future patch to transparently allocate a kernel mode FP/SIMD state
buffer on the stack, so that kernel mode FP/SIMD code remains
preemptible in principle, but without the memory overhead of adding 528
bytes to the size of struct task_struct.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/crypto/sm4-ce-ccm-glue.c    |  38 ++--
 arch/arm64/crypto/sm4-ce-cipher-glue.c |  10 +-
 arch/arm64/crypto/sm4-ce-gcm-glue.c    |  53 +++--
 arch/arm64/crypto/sm4-ce-glue.c        | 214 +++++++++-----------
 arch/arm64/crypto/sm4-neon-glue.c      |  25 +--
 5 files changed, 149 insertions(+), 191 deletions(-)

diff --git a/arch/arm64/crypto/sm4-ce-ccm-glue.c b/arch/arm64/crypto/sm4-ce-ccm-glue.c
index e92cbdf1aaee..332f02167a96 100644
--- a/arch/arm64/crypto/sm4-ce-ccm-glue.c
+++ b/arch/arm64/crypto/sm4-ce-ccm-glue.c
@@ -11,7 +11,7 @@
 #include <linux/crypto.h>
 #include <linux/kernel.h>
 #include <linux/cpufeature.h>
-#include <asm/neon.h>
+#include <asm/simd.h>
 #include <crypto/scatterwalk.h>
 #include <crypto/internal/aead.h>
 #include <crypto/internal/skcipher.h>
@@ -35,10 +35,9 @@ static int ccm_setkey(struct crypto_aead *tfm, const u8 *key,
 	if (key_len != SM4_KEY_SIZE)
 		return -EINVAL;
 
-	kernel_neon_begin();
-	sm4_ce_expand_key(key, ctx->rkey_enc, ctx->rkey_dec,
-			  crypto_sm4_fk, crypto_sm4_ck);
-	kernel_neon_end();
+	scoped_ksimd()
+		sm4_ce_expand_key(key, ctx->rkey_enc, ctx->rkey_dec,
+				  crypto_sm4_fk, crypto_sm4_ck);
 
 	return 0;
 }
@@ -167,28 +166,25 @@ static int ccm_crypt(struct aead_request *req, struct skcipher_walk *walk,
 	memcpy(ctr0, walk->iv, SM4_BLOCK_SIZE);
 	crypto_inc(walk->iv, SM4_BLOCK_SIZE);
 
-	kernel_neon_begin();
+	scoped_ksimd() {
+		if (req->assoclen)
+			ccm_calculate_auth_mac(req, mac);
 
-	if (req->assoclen)
-		ccm_calculate_auth_mac(req, mac);
-
-	while (walk->nbytes) {
-		unsigned int tail = walk->nbytes % SM4_BLOCK_SIZE;
+		while (walk->nbytes) {
+			unsigned int tail = walk->nbytes % SM4_BLOCK_SIZE;
 
-		if (walk->nbytes == walk->total)
-			tail = 0;
+			if (walk->nbytes == walk->total)
+				tail = 0;
 
-		sm4_ce_ccm_crypt(rkey_enc, walk->dst.virt.addr,
-				 walk->src.virt.addr, walk->iv,
-				 walk->nbytes - tail, mac);
+			sm4_ce_ccm_crypt(rkey_enc, walk->dst.virt.addr,
+					 walk->src.virt.addr, walk->iv,
+					 walk->nbytes - tail, mac);
 
-		err = skcipher_walk_done(walk, tail);
+			err = skcipher_walk_done(walk, tail);
+		}
+		sm4_ce_ccm_final(rkey_enc, ctr0, mac);
 	}
 
-	sm4_ce_ccm_final(rkey_enc, ctr0, mac);
-
-	kernel_neon_end();
-
 	return err;
 }
 
diff --git a/arch/arm64/crypto/sm4-ce-cipher-glue.c b/arch/arm64/crypto/sm4-ce-cipher-glue.c
index c31d76fb5a17..bceec833ef4e 100644
--- a/arch/arm64/crypto/sm4-ce-cipher-glue.c
+++ b/arch/arm64/crypto/sm4-ce-cipher-glue.c
@@ -32,9 +32,8 @@ static void sm4_ce_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
 	if (!crypto_simd_usable()) {
 		sm4_crypt_block(ctx->rkey_enc, out, in);
 	} else {
-		kernel_neon_begin();
-		sm4_ce_do_crypt(ctx->rkey_enc, out, in);
-		kernel_neon_end();
+		scoped_ksimd()
+			sm4_ce_do_crypt(ctx->rkey_enc, out, in);
 	}
 }
 
@@ -45,9 +44,8 @@ static void sm4_ce_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
 	if (!crypto_simd_usable()) {
 		sm4_crypt_block(ctx->rkey_dec, out, in);
 	} else {
-		kernel_neon_begin();
-		sm4_ce_do_crypt(ctx->rkey_dec, out, in);
-		kernel_neon_end();
+		scoped_ksimd()
+			sm4_ce_do_crypt(ctx->rkey_dec, out, in);
 	}
 }
 
diff --git a/arch/arm64/crypto/sm4-ce-gcm-glue.c b/arch/arm64/crypto/sm4-ce-gcm-glue.c
index 8f6fc8c33c3f..ef06f4f768a1 100644
--- a/arch/arm64/crypto/sm4-ce-gcm-glue.c
+++ b/arch/arm64/crypto/sm4-ce-gcm-glue.c
@@ -11,7 +11,7 @@
 #include <linux/crypto.h>
 #include <linux/kernel.h>
 #include <linux/cpufeature.h>
-#include <asm/neon.h>
+#include <asm/simd.h>
 #include <crypto/b128ops.h>
 #include <crypto/scatterwalk.h>
 #include <crypto/internal/aead.h>
@@ -48,13 +48,11 @@ static int gcm_setkey(struct crypto_aead *tfm, const u8 *key,
 	if (key_len != SM4_KEY_SIZE)
 		return -EINVAL;
 
-	kernel_neon_begin();
-
-	sm4_ce_expand_key(key, ctx->key.rkey_enc, ctx->key.rkey_dec,
-			  crypto_sm4_fk, crypto_sm4_ck);
-	sm4_ce_pmull_ghash_setup(ctx->key.rkey_enc, ctx->ghash_table);
-
-	kernel_neon_end();
+	scoped_ksimd() {
+		sm4_ce_expand_key(key, ctx->key.rkey_enc, ctx->key.rkey_dec,
+				crypto_sm4_fk, crypto_sm4_ck);
+		sm4_ce_pmull_ghash_setup(ctx->key.rkey_enc, ctx->ghash_table);
+	}
 	return 0;
 }
 
@@ -149,31 +147,28 @@ static int gcm_crypt(struct aead_request *req, struct skcipher_walk *walk,
 	memcpy(iv, req->iv, GCM_IV_SIZE);
 	put_unaligned_be32(2, iv + GCM_IV_SIZE);
 
-	kernel_neon_begin();
+	scoped_ksimd() {
+		if (req->assoclen)
+			gcm_calculate_auth_mac(req, ghash);
 
-	if (req->assoclen)
-		gcm_calculate_auth_mac(req, ghash);
+		do {
+			unsigned int tail = walk->nbytes % SM4_BLOCK_SIZE;
+			const u8 *src = walk->src.virt.addr;
+			u8 *dst = walk->dst.virt.addr;
+			const u8 *l = NULL;
 
-	do {
-		unsigned int tail = walk->nbytes % SM4_BLOCK_SIZE;
-		const u8 *src = walk->src.virt.addr;
-		u8 *dst = walk->dst.virt.addr;
-		const u8 *l = NULL;
-
-		if (walk->nbytes == walk->total) {
-			l = (const u8 *)&lengths;
-			tail = 0;
-		}
-
-		sm4_ce_pmull_gcm_crypt(ctx->key.rkey_enc, dst, src, iv,
-				       walk->nbytes - tail, ghash,
-				       ctx->ghash_table, l);
-
-		err = skcipher_walk_done(walk, tail);
-	} while (walk->nbytes);
+			if (walk->nbytes == walk->total) {
+				l = (const u8 *)&lengths;
+				tail = 0;
+			}
 
-	kernel_neon_end();
+			sm4_ce_pmull_gcm_crypt(ctx->key.rkey_enc, dst, src, iv,
+					       walk->nbytes - tail, ghash,
+					       ctx->ghash_table, l);
 
+			err = skcipher_walk_done(walk, tail);
+		} while (walk->nbytes);
+	}
 	return err;
 }
 
diff --git a/arch/arm64/crypto/sm4-ce-glue.c b/arch/arm64/crypto/sm4-ce-glue.c
index 7a60e7b559dc..5569cece5a0b 100644
--- a/arch/arm64/crypto/sm4-ce-glue.c
+++ b/arch/arm64/crypto/sm4-ce-glue.c
@@ -8,7 +8,7 @@
  * Copyright (C) 2022 Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
  */
 
-#include <asm/neon.h>
+#include <asm/simd.h>
 #include <crypto/b128ops.h>
 #include <crypto/internal/hash.h>
 #include <crypto/internal/skcipher.h>
@@ -74,10 +74,9 @@ static int sm4_setkey(struct crypto_skcipher *tfm, const u8 *key,
 	if (key_len != SM4_KEY_SIZE)
 		return -EINVAL;
 
-	kernel_neon_begin();
-	sm4_ce_expand_key(key, ctx->rkey_enc, ctx->rkey_dec,
-			  crypto_sm4_fk, crypto_sm4_ck);
-	kernel_neon_end();
+	scoped_ksimd()
+		sm4_ce_expand_key(key, ctx->rkey_enc, ctx->rkey_dec,
+				  crypto_sm4_fk, crypto_sm4_ck);
 	return 0;
 }
 
@@ -94,12 +93,12 @@ static int sm4_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
 	if (ret)
 		return ret;
 
-	kernel_neon_begin();
-	sm4_ce_expand_key(key, ctx->key1.rkey_enc,
-			  ctx->key1.rkey_dec, crypto_sm4_fk, crypto_sm4_ck);
-	sm4_ce_expand_key(&key[SM4_KEY_SIZE], ctx->key2.rkey_enc,
-			  ctx->key2.rkey_dec, crypto_sm4_fk, crypto_sm4_ck);
-	kernel_neon_end();
+	scoped_ksimd() {
+		sm4_ce_expand_key(key, ctx->key1.rkey_enc,
+				ctx->key1.rkey_dec, crypto_sm4_fk, crypto_sm4_ck);
+		sm4_ce_expand_key(&key[SM4_KEY_SIZE], ctx->key2.rkey_enc,
+				ctx->key2.rkey_dec, crypto_sm4_fk, crypto_sm4_ck);
+	}
 
 	return 0;
 }
@@ -117,16 +116,14 @@ static int sm4_ecb_do_crypt(struct skcipher_request *req, const u32 *rkey)
 		u8 *dst = walk.dst.virt.addr;
 		unsigned int nblks;
 
-		kernel_neon_begin();
-
-		nblks = BYTES2BLKS(nbytes);
-		if (nblks) {
-			sm4_ce_crypt(rkey, dst, src, nblks);
-			nbytes -= nblks * SM4_BLOCK_SIZE;
+		scoped_ksimd() {
+			nblks = BYTES2BLKS(nbytes);
+			if (nblks) {
+				sm4_ce_crypt(rkey, dst, src, nblks);
+				nbytes -= nblks * SM4_BLOCK_SIZE;
+			}
 		}
 
-		kernel_neon_end();
-
 		err = skcipher_walk_done(&walk, nbytes);
 	}
 
@@ -167,16 +164,14 @@ static int sm4_cbc_crypt(struct skcipher_request *req,
 
 		nblocks = nbytes / SM4_BLOCK_SIZE;
 		if (nblocks) {
-			kernel_neon_begin();
-
-			if (encrypt)
-				sm4_ce_cbc_enc(ctx->rkey_enc, dst, src,
-					       walk.iv, nblocks);
-			else
-				sm4_ce_cbc_dec(ctx->rkey_dec, dst, src,
-					       walk.iv, nblocks);
-
-			kernel_neon_end();
+			scoped_ksimd() {
+				if (encrypt)
+					sm4_ce_cbc_enc(ctx->rkey_enc, dst, src,
+						       walk.iv, nblocks);
+				else
+					sm4_ce_cbc_dec(ctx->rkey_dec, dst, src,
+						       walk.iv, nblocks);
+			}
 		}
 
 		err = skcipher_walk_done(&walk, nbytes % SM4_BLOCK_SIZE);
@@ -249,16 +244,14 @@ static int sm4_cbc_cts_crypt(struct skcipher_request *req, bool encrypt)
 	if (err)
 		return err;
 
-	kernel_neon_begin();
-
-	if (encrypt)
-		sm4_ce_cbc_cts_enc(ctx->rkey_enc, walk.dst.virt.addr,
-				   walk.src.virt.addr, walk.iv, walk.nbytes);
-	else
-		sm4_ce_cbc_cts_dec(ctx->rkey_dec, walk.dst.virt.addr,
-				   walk.src.virt.addr, walk.iv, walk.nbytes);
-
-	kernel_neon_end();
+	scoped_ksimd() {
+		if (encrypt)
+			sm4_ce_cbc_cts_enc(ctx->rkey_enc, walk.dst.virt.addr,
+					   walk.src.virt.addr, walk.iv, walk.nbytes);
+		else
+			sm4_ce_cbc_cts_dec(ctx->rkey_dec, walk.dst.virt.addr,
+					   walk.src.virt.addr, walk.iv, walk.nbytes);
+	}
 
 	return skcipher_walk_done(&walk, 0);
 }
@@ -288,28 +281,26 @@ static int sm4_ctr_crypt(struct skcipher_request *req)
 		u8 *dst = walk.dst.virt.addr;
 		unsigned int nblks;
 
-		kernel_neon_begin();
-
-		nblks = BYTES2BLKS(nbytes);
-		if (nblks) {
-			sm4_ce_ctr_enc(ctx->rkey_enc, dst, src, walk.iv, nblks);
-			dst += nblks * SM4_BLOCK_SIZE;
-			src += nblks * SM4_BLOCK_SIZE;
-			nbytes -= nblks * SM4_BLOCK_SIZE;
-		}
-
-		/* tail */
-		if (walk.nbytes == walk.total && nbytes > 0) {
-			u8 keystream[SM4_BLOCK_SIZE];
-
-			sm4_ce_crypt_block(ctx->rkey_enc, keystream, walk.iv);
-			crypto_inc(walk.iv, SM4_BLOCK_SIZE);
-			crypto_xor_cpy(dst, src, keystream, nbytes);
-			nbytes = 0;
+		scoped_ksimd() {
+			nblks = BYTES2BLKS(nbytes);
+			if (nblks) {
+				sm4_ce_ctr_enc(ctx->rkey_enc, dst, src, walk.iv, nblks);
+				dst += nblks * SM4_BLOCK_SIZE;
+				src += nblks * SM4_BLOCK_SIZE;
+				nbytes -= nblks * SM4_BLOCK_SIZE;
+			}
+
+			/* tail */
+			if (walk.nbytes == walk.total && nbytes > 0) {
+				u8 keystream[SM4_BLOCK_SIZE];
+
+				sm4_ce_crypt_block(ctx->rkey_enc, keystream, walk.iv);
+				crypto_inc(walk.iv, SM4_BLOCK_SIZE);
+				crypto_xor_cpy(dst, src, keystream, nbytes);
+				nbytes = 0;
+			}
 		}
 
-		kernel_neon_end();
-
 		err = skcipher_walk_done(&walk, nbytes);
 	}
 
@@ -359,18 +350,16 @@ static int sm4_xts_crypt(struct skcipher_request *req, bool encrypt)
 		if (nbytes < walk.total)
 			nbytes &= ~(SM4_BLOCK_SIZE - 1);
 
-		kernel_neon_begin();
-
-		if (encrypt)
-			sm4_ce_xts_enc(ctx->key1.rkey_enc, walk.dst.virt.addr,
-				       walk.src.virt.addr, walk.iv, nbytes,
-				       rkey2_enc);
-		else
-			sm4_ce_xts_dec(ctx->key1.rkey_dec, walk.dst.virt.addr,
-				       walk.src.virt.addr, walk.iv, nbytes,
-				       rkey2_enc);
-
-		kernel_neon_end();
+		scoped_ksimd() {
+			if (encrypt)
+				sm4_ce_xts_enc(ctx->key1.rkey_enc, walk.dst.virt.addr,
+						walk.src.virt.addr, walk.iv, nbytes,
+						rkey2_enc);
+			else
+				sm4_ce_xts_dec(ctx->key1.rkey_dec, walk.dst.virt.addr,
+						walk.src.virt.addr, walk.iv, nbytes,
+						rkey2_enc);
+		}
 
 		rkey2_enc = NULL;
 
@@ -395,18 +384,16 @@ static int sm4_xts_crypt(struct skcipher_request *req, bool encrypt)
 	if (err)
 		return err;
 
-	kernel_neon_begin();
-
-	if (encrypt)
-		sm4_ce_xts_enc(ctx->key1.rkey_enc, walk.dst.virt.addr,
-			       walk.src.virt.addr, walk.iv, walk.nbytes,
-			       rkey2_enc);
-	else
-		sm4_ce_xts_dec(ctx->key1.rkey_dec, walk.dst.virt.addr,
-			       walk.src.virt.addr, walk.iv, walk.nbytes,
-			       rkey2_enc);
-
-	kernel_neon_end();
+	scoped_ksimd() {
+		if (encrypt)
+			sm4_ce_xts_enc(ctx->key1.rkey_enc, walk.dst.virt.addr,
+					walk.src.virt.addr, walk.iv, walk.nbytes,
+					rkey2_enc);
+		else
+			sm4_ce_xts_dec(ctx->key1.rkey_dec, walk.dst.virt.addr,
+					walk.src.virt.addr, walk.iv, walk.nbytes,
+					rkey2_enc);
+	}
 
 	return skcipher_walk_done(&walk, 0);
 }
@@ -510,11 +497,9 @@ static int sm4_cbcmac_setkey(struct crypto_shash *tfm, const u8 *key,
 	if (key_len != SM4_KEY_SIZE)
 		return -EINVAL;
 
-	kernel_neon_begin();
-	sm4_ce_expand_key(key, ctx->key.rkey_enc, ctx->key.rkey_dec,
-			  crypto_sm4_fk, crypto_sm4_ck);
-	kernel_neon_end();
-
+	scoped_ksimd()
+		sm4_ce_expand_key(key, ctx->key.rkey_enc, ctx->key.rkey_dec,
+				crypto_sm4_fk, crypto_sm4_ck);
 	return 0;
 }
 
@@ -530,15 +515,13 @@ static int sm4_cmac_setkey(struct crypto_shash *tfm, const u8 *key,
 
 	memset(consts, 0, SM4_BLOCK_SIZE);
 
-	kernel_neon_begin();
-
-	sm4_ce_expand_key(key, ctx->key.rkey_enc, ctx->key.rkey_dec,
-			  crypto_sm4_fk, crypto_sm4_ck);
+	scoped_ksimd() {
+		sm4_ce_expand_key(key, ctx->key.rkey_enc, ctx->key.rkey_dec,
+				crypto_sm4_fk, crypto_sm4_ck);
 
-	/* encrypt the zero block */
-	sm4_ce_crypt_block(ctx->key.rkey_enc, (u8 *)consts, (const u8 *)consts);
-
-	kernel_neon_end();
+		/* encrypt the zero block */
+		sm4_ce_crypt_block(ctx->key.rkey_enc, (u8 *)consts, (const u8 *)consts);
+	}
 
 	/* gf(2^128) multiply zero-ciphertext with u and u^2 */
 	a = be64_to_cpu(consts[0].a);
@@ -568,18 +551,16 @@ static int sm4_xcbc_setkey(struct crypto_shash *tfm, const u8 *key,
 	if (key_len != SM4_KEY_SIZE)
 		return -EINVAL;
 
-	kernel_neon_begin();
-
-	sm4_ce_expand_key(key, ctx->key.rkey_enc, ctx->key.rkey_dec,
-			  crypto_sm4_fk, crypto_sm4_ck);
+	scoped_ksimd() {
+		sm4_ce_expand_key(key, ctx->key.rkey_enc, ctx->key.rkey_dec,
+				crypto_sm4_fk, crypto_sm4_ck);
 
-	sm4_ce_crypt_block(ctx->key.rkey_enc, key2, ks[0]);
-	sm4_ce_crypt(ctx->key.rkey_enc, ctx->consts, ks[1], 2);
+		sm4_ce_crypt_block(ctx->key.rkey_enc, key2, ks[0]);
+		sm4_ce_crypt(ctx->key.rkey_enc, ctx->consts, ks[1], 2);
 
-	sm4_ce_expand_key(key2, ctx->key.rkey_enc, ctx->key.rkey_dec,
-			  crypto_sm4_fk, crypto_sm4_ck);
-
-	kernel_neon_end();
+		sm4_ce_expand_key(key2, ctx->key.rkey_enc, ctx->key.rkey_dec,
+				crypto_sm4_fk, crypto_sm4_ck);
+	}
 
 	return 0;
 }
@@ -600,10 +581,9 @@ static int sm4_mac_update(struct shash_desc *desc, const u8 *p,
 	unsigned int nblocks = len / SM4_BLOCK_SIZE;
 
 	len %= SM4_BLOCK_SIZE;
-	kernel_neon_begin();
-	sm4_ce_mac_update(tctx->key.rkey_enc, ctx->digest, p,
-			  nblocks, false, true);
-	kernel_neon_end();
+	scoped_ksimd()
+		sm4_ce_mac_update(tctx->key.rkey_enc, ctx->digest, p,
+				nblocks, false, true);
 	return len;
 }
 
@@ -619,10 +599,9 @@ static int sm4_cmac_finup(struct shash_desc *desc, const u8 *src,
 		ctx->digest[len] ^= 0x80;
 		consts += SM4_BLOCK_SIZE;
 	}
-	kernel_neon_begin();
-	sm4_ce_mac_update(tctx->key.rkey_enc, ctx->digest, consts, 1,
-			  false, true);
-	kernel_neon_end();
+	scoped_ksimd()
+		sm4_ce_mac_update(tctx->key.rkey_enc, ctx->digest, consts, 1,
+				  false, true);
 	memcpy(out, ctx->digest, SM4_BLOCK_SIZE);
 	return 0;
 }
@@ -635,10 +614,9 @@ static int sm4_cbcmac_finup(struct shash_desc *desc, const u8 *src,
 
 	if (len) {
 		crypto_xor(ctx->digest, src, len);
-		kernel_neon_begin();
-		sm4_ce_crypt_block(tctx->key.rkey_enc, ctx->digest,
-				   ctx->digest);
-		kernel_neon_end();
+		scoped_ksimd()
+			sm4_ce_crypt_block(tctx->key.rkey_enc, ctx->digest,
+					   ctx->digest);
 	}
 	memcpy(out, ctx->digest, SM4_BLOCK_SIZE);
 	return 0;
diff --git a/arch/arm64/crypto/sm4-neon-glue.c b/arch/arm64/crypto/sm4-neon-glue.c
index e3500aca2d18..e944c2a2efb0 100644
--- a/arch/arm64/crypto/sm4-neon-glue.c
+++ b/arch/arm64/crypto/sm4-neon-glue.c
@@ -48,11 +48,8 @@ static int sm4_ecb_do_crypt(struct skcipher_request *req, const u32 *rkey)
 
 		nblocks = nbytes / SM4_BLOCK_SIZE;
 		if (nblocks) {
-			kernel_neon_begin();
-
-			sm4_neon_crypt(rkey, dst, src, nblocks);
-
-			kernel_neon_end();
+			scoped_ksimd()
+				sm4_neon_crypt(rkey, dst, src, nblocks);
 		}
 
 		err = skcipher_walk_done(&walk, nbytes % SM4_BLOCK_SIZE);
@@ -126,12 +123,9 @@ static int sm4_cbc_decrypt(struct skcipher_request *req)
 
 		nblocks = nbytes / SM4_BLOCK_SIZE;
 		if (nblocks) {
-			kernel_neon_begin();
-
-			sm4_neon_cbc_dec(ctx->rkey_dec, dst, src,
-					 walk.iv, nblocks);
-
-			kernel_neon_end();
+			scoped_ksimd()
+				sm4_neon_cbc_dec(ctx->rkey_dec, dst, src,
+						 walk.iv, nblocks);
 		}
 
 		err = skcipher_walk_done(&walk, nbytes % SM4_BLOCK_SIZE);
@@ -157,12 +151,9 @@ static int sm4_ctr_crypt(struct skcipher_request *req)
 
 		nblocks = nbytes / SM4_BLOCK_SIZE;
 		if (nblocks) {
-			kernel_neon_begin();
-
-			sm4_neon_ctr_crypt(ctx->rkey_enc, dst, src,
-					   walk.iv, nblocks);
-
-			kernel_neon_end();
+			scoped_ksimd()
+				sm4_neon_ctr_crypt(ctx->rkey_enc, dst, src,
+						   walk.iv, nblocks);
 
 			dst += nblocks * SM4_BLOCK_SIZE;
 			src += nblocks * SM4_BLOCK_SIZE;
-- 
2.51.1.930.gacf6e81ea2-goog



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 18/21] arm64/xorblocks: Switch to 'ksimd' scoped guard API
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (16 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 17/21] crypto/arm64: sm4 " Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 19/21] net/mlx5: Switch to more abstract scoped ksimd guard API on arm64 Ard Biesheuvel
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

Switch to the more abstract 'scoped_ksimd()' API, which will be modified
in a future patch to transparently allocate a kernel mode FP/SIMD state
buffer on the stack, so that kernel mode FP/SIMD code remains
preemptible in principle, but without the memory overhead of adding 528
bytes to the size of struct task_struct.

Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/xor.h | 22 ++++++++------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/xor.h b/arch/arm64/include/asm/xor.h
index befcd8a7abc9..c38e3d017a79 100644
--- a/arch/arm64/include/asm/xor.h
+++ b/arch/arm64/include/asm/xor.h
@@ -9,7 +9,7 @@
 #include <linux/hardirq.h>
 #include <asm-generic/xor.h>
 #include <asm/hwcap.h>
-#include <asm/neon.h>
+#include <asm/simd.h>
 
 #ifdef CONFIG_KERNEL_MODE_NEON
 
@@ -19,9 +19,8 @@ static void
 xor_neon_2(unsigned long bytes, unsigned long * __restrict p1,
 	   const unsigned long * __restrict p2)
 {
-	kernel_neon_begin();
-	xor_block_inner_neon.do_2(bytes, p1, p2);
-	kernel_neon_end();
+	scoped_ksimd()
+		xor_block_inner_neon.do_2(bytes, p1, p2);
 }
 
 static void
@@ -29,9 +28,8 @@ xor_neon_3(unsigned long bytes, unsigned long * __restrict p1,
 	   const unsigned long * __restrict p2,
 	   const unsigned long * __restrict p3)
 {
-	kernel_neon_begin();
-	xor_block_inner_neon.do_3(bytes, p1, p2, p3);
-	kernel_neon_end();
+	scoped_ksimd()
+		xor_block_inner_neon.do_3(bytes, p1, p2, p3);
 }
 
 static void
@@ -40,9 +38,8 @@ xor_neon_4(unsigned long bytes, unsigned long * __restrict p1,
 	   const unsigned long * __restrict p3,
 	   const unsigned long * __restrict p4)
 {
-	kernel_neon_begin();
-	xor_block_inner_neon.do_4(bytes, p1, p2, p3, p4);
-	kernel_neon_end();
+	scoped_ksimd()
+		xor_block_inner_neon.do_4(bytes, p1, p2, p3, p4);
 }
 
 static void
@@ -52,9 +49,8 @@ xor_neon_5(unsigned long bytes, unsigned long * __restrict p1,
 	   const unsigned long * __restrict p4,
 	   const unsigned long * __restrict p5)
 {
-	kernel_neon_begin();
-	xor_block_inner_neon.do_5(bytes, p1, p2, p3, p4, p5);
-	kernel_neon_end();
+	scoped_ksimd()
+		xor_block_inner_neon.do_5(bytes, p1, p2, p3, p4, p5);
 }
 
 static struct xor_block_template xor_block_arm64 = {
-- 
2.51.1.930.gacf6e81ea2-goog



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 19/21] net/mlx5: Switch to more abstract scoped ksimd guard API on arm64
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (17 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 18/21] arm64/xorblocks: " Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 20/21] arm64/fpu: Enforce task-context only for generic kernel mode FPU Ard Biesheuvel
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch

From: Ard Biesheuvel <ardb@kernel.org>

Instead of calling kernel_neon_begin/end directly, switch to the scoped
guard API, which encapsulates those calls. This is needed because the
prototypes of those functions are going to be modified to take a kernel
mode FP/SIMD buffer, which the scoped guard API will provide
transparently.

Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/wc.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wc.c b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
index 05e5fd777d4f..815a7c97d6b0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/wc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
@@ -9,6 +9,7 @@
 
 #if IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && IS_ENABLED(CONFIG_ARM64)
 #include <asm/neon.h>
+#include <asm/simd.h>
 #endif
 
 #define TEST_WC_NUM_WQES 255
@@ -264,15 +265,15 @@ static void mlx5_iowrite64_copy(struct mlx5_wc_sq *sq, __be32 mmio_wqe[16],
 {
 #if IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && IS_ENABLED(CONFIG_ARM64)
 	if (cpu_has_neon()) {
-		kernel_neon_begin();
-		asm volatile
-		(".arch_extension simd\n\t"
-		"ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%0]\n\t"
-		"st1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%1]"
-		:
-		: "r"(mmio_wqe), "r"(sq->bfreg.map + offset)
-		: "memory", "v0", "v1", "v2", "v3");
-		kernel_neon_end();
+		scoped_ksimd() {
+			asm volatile(
+				".arch_extension simd\n\t"
+				"ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%0]\n\t"
+				"st1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%1]"
+				:
+				: "r"(mmio_wqe), "r"(sq->bfreg.map + offset)
+				: "memory", "v0", "v1", "v2", "v3");
+		}
 		return;
 	}
 #endif
-- 
2.51.1.930.gacf6e81ea2-goog



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 20/21] arm64/fpu: Enforce task-context only for generic kernel mode FPU
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (18 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 19/21] net/mlx5: Switch to more abstract scoped ksimd guard API on arm64 Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 10:39 ` [PATCH v4 21/21] arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack Ard Biesheuvel
  2025-11-11 17:12 ` [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to " Catalin Marinas
  21 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

The generic kernel mode FPU API, which is used by the AMDGPU driver to
perform floating point calculations, is modeled after the most
restrictive architecture that supports it. This means it doesn't support
preemption, and can only be used from task context.

The arm64 implementation is a bit more flexible, but supporting that in
the generic API complicates matters slightly, and for no good reason,
given that the only user does not need it.

So enforce that kernel_fpu_begin() can only be called from task context,
and [redundantly] disable preemption. This removes the need for users of
this API to provide a kernel mode FP/SIMD state buffer once a future
patch makes providing one compulsory in preemptible task context.
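
For illustration, a caller of this generic API is then expected to look
roughly like the sketch below (not taken from the AMDGPU code), i.e.
task context only, with no state buffer to supply:

	if (kernel_fpu_available()) {
		kernel_fpu_begin();
		/* floating point / SIMD computation */
		kernel_fpu_end();
	}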

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/fpu.h | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/fpu.h b/arch/arm64/include/asm/fpu.h
index 2ae50bdce59b..bdc4c6304c6a 100644
--- a/arch/arm64/include/asm/fpu.h
+++ b/arch/arm64/include/asm/fpu.h
@@ -6,10 +6,22 @@
 #ifndef __ASM_FPU_H
 #define __ASM_FPU_H
 
+#include <linux/preempt.h>
 #include <asm/neon.h>
 
 #define kernel_fpu_available()	cpu_has_neon()
-#define kernel_fpu_begin()	kernel_neon_begin()
-#define kernel_fpu_end()	kernel_neon_end()
+
+static inline void kernel_fpu_begin(void)
+{
+	BUG_ON(!in_task());
+	preempt_disable();
+	kernel_neon_begin();
+}
+
+static inline void kernel_fpu_end(void)
+{
+	kernel_neon_end();
+	preempt_enable();
+}
 
 #endif /* ! __ASM_FPU_H */
-- 
2.51.1.930.gacf6e81ea2-goog



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 21/21] arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (19 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 20/21] arm64/fpu: Enforce task-context only for generic kernel mode FPU Ard Biesheuvel
@ 2025-10-31 10:39 ` Ard Biesheuvel
  2025-10-31 14:16   ` Ard Biesheuvel
  2025-11-11 17:12 ` [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to " Catalin Marinas
  21 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 10:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, linux-crypto, herbert, ebiggers, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

Commit aefbab8e77eb16b5

  ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch")

added a 'kernel_fpsimd_state' field to struct thread_struct, which is
the arch-specific portion of struct task_struct, and is allocated for
each task in the system. The size of this field is 528 bytes, resulting
in non-negligible bloat of task_struct, and the resulting memory
overhead may impact performance on systems with many processes.

This allocation is only used if the task is scheduled out or interrupted
by a softirq while using the FP/SIMD unit in kernel mode, and so it is
possible to transparently allocate this buffer on the caller's stack
instead.

So tweak the 'ksimd' scoped guard implementation so that a stack buffer
is allocated and passed to both kernel_neon_begin() and
kernel_neon_end(), which either record it in the task struct, or use it
directly to preserve the task's kernel mode FP/SIMD state when running
in softirq context. Passing the address to both functions and checking
the addresses for consistency ensures that callers of the updated bare
begin/end API use it in a manner that is consistent with the new context
switch semantics.
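
Roughly speaking, and ignoring the lock guard plumbing, a scoped_ksimd()
user ends up with code along the lines of the sketch below, where __buf
and do_simd_transform() are illustrative names for the anonymous buffer
in the caller's stack frame and the guarded code, respectively:

	{
		struct user_fpsimd_state __buf = {};

		kernel_neon_begin(&__buf);
		do_simd_transform(dst, src, nbytes);
		kernel_neon_end(&__buf);
	}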

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/fpu.h       |  4 +-
 arch/arm64/include/asm/neon.h      |  4 +-
 arch/arm64/include/asm/processor.h |  7 ++-
 arch/arm64/include/asm/simd.h      |  7 ++-
 arch/arm64/kernel/fpsimd.c         | 53 ++++++++++++++------
 5 files changed, 54 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/fpu.h b/arch/arm64/include/asm/fpu.h
index bdc4c6304c6a..751e88a96734 100644
--- a/arch/arm64/include/asm/fpu.h
+++ b/arch/arm64/include/asm/fpu.h
@@ -15,12 +15,12 @@ static inline void kernel_fpu_begin(void)
 {
 	BUG_ON(!in_task());
 	preempt_disable();
-	kernel_neon_begin();
+	kernel_neon_begin(NULL);
 }
 
 static inline void kernel_fpu_end(void)
 {
-	kernel_neon_end();
+	kernel_neon_end(NULL);
 	preempt_enable();
 }
 
diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h
index d4b1d172a79b..acebee4605b5 100644
--- a/arch/arm64/include/asm/neon.h
+++ b/arch/arm64/include/asm/neon.h
@@ -13,7 +13,7 @@
 
 #define cpu_has_neon()		system_supports_fpsimd()
 
-void kernel_neon_begin(void);
-void kernel_neon_end(void);
+void kernel_neon_begin(struct user_fpsimd_state *);
+void kernel_neon_end(struct user_fpsimd_state *);
 
 #endif /* ! __ASM_NEON_H */
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 61d62bfd5a7b..de3c3b65461d 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -172,7 +172,12 @@ struct thread_struct {
 	unsigned long		fault_code;	/* ESR_EL1 value */
 	struct debug_info	debug;		/* debugging */
 
-	struct user_fpsimd_state	kernel_fpsimd_state;
+	/*
+	 * Set [cleared] by kernel_neon_begin() [kernel_neon_end()] to the
+	 * address of a caller provided buffer that will be used to preserve a
+	 * task's kernel mode FPSIMD state while it is scheduled out.
+	 */
+	struct user_fpsimd_state	*kernel_fpsimd_state;
 	unsigned int			kernel_fpsimd_cpu;
 #ifdef CONFIG_ARM64_PTR_AUTH
 	struct ptrauth_keys_user	keys_user;
diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
index d9f83c478736..7ddb25df5c98 100644
--- a/arch/arm64/include/asm/simd.h
+++ b/arch/arm64/include/asm/simd.h
@@ -43,8 +43,11 @@ static __must_check inline bool may_use_simd(void) {
 
 #endif /* ! CONFIG_KERNEL_MODE_NEON */
 
-DEFINE_LOCK_GUARD_0(ksimd, kernel_neon_begin(), kernel_neon_end())
+DEFINE_LOCK_GUARD_1(ksimd,
+		    struct user_fpsimd_state,
+		    kernel_neon_begin(_T->lock),
+		    kernel_neon_end(_T->lock))
 
-#define scoped_ksimd()	scoped_guard(ksimd)
+#define scoped_ksimd()	scoped_guard(ksimd, &(struct user_fpsimd_state){})
 
 #endif
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index e3f8f51748bc..1c652ce4d40d 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1489,21 +1489,23 @@ static void fpsimd_load_kernel_state(struct task_struct *task)
 	 * Elide the load if this CPU holds the most recent kernel mode
 	 * FPSIMD context of the current task.
 	 */
-	if (last->st == &task->thread.kernel_fpsimd_state &&
+	if (last->st == task->thread.kernel_fpsimd_state &&
 	    task->thread.kernel_fpsimd_cpu == smp_processor_id())
 		return;
 
-	fpsimd_load_state(&task->thread.kernel_fpsimd_state);
+	fpsimd_load_state(task->thread.kernel_fpsimd_state);
 }
 
 static void fpsimd_save_kernel_state(struct task_struct *task)
 {
 	struct cpu_fp_state cpu_fp_state = {
-		.st		= &task->thread.kernel_fpsimd_state,
+		.st		= task->thread.kernel_fpsimd_state,
 		.to_save	= FP_STATE_FPSIMD,
 	};
 
-	fpsimd_save_state(&task->thread.kernel_fpsimd_state);
+	BUG_ON(!cpu_fp_state.st);
+
+	fpsimd_save_state(task->thread.kernel_fpsimd_state);
 	fpsimd_bind_state_to_cpu(&cpu_fp_state);
 
 	task->thread.kernel_fpsimd_cpu = smp_processor_id();
@@ -1774,6 +1776,7 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
 void fpsimd_flush_task_state(struct task_struct *t)
 {
 	t->thread.fpsimd_cpu = NR_CPUS;
+	t->thread.kernel_fpsimd_state = NULL;
 	/*
 	 * If we don't support fpsimd, bail out after we have
 	 * reset the fpsimd_cpu for this task and clear the
@@ -1833,8 +1836,13 @@ void fpsimd_save_and_flush_cpu_state(void)
  *
  * The caller may freely use the FPSIMD registers until kernel_neon_end() is
  * called.
+ *
+ * Unless called from non-preemptible task context, @state must point to a
+ * caller provided buffer that will be used to preserve the task's kernel mode
+ * FPSIMD context when it is scheduled out, or if it is interrupted by kernel
+ * mode FPSIMD occurring in softirq context. May be %NULL otherwise.
  */
-void kernel_neon_begin(void)
+void kernel_neon_begin(struct user_fpsimd_state *state)
 {
 	if (WARN_ON(!system_supports_fpsimd()))
 		return;
@@ -1846,7 +1854,7 @@ void kernel_neon_begin(void)
 	/* Save unsaved fpsimd state, if any: */
 	if (test_thread_flag(TIF_KERNEL_FPSTATE)) {
 		BUG_ON(IS_ENABLED(CONFIG_PREEMPT_RT) || !in_serving_softirq());
-		fpsimd_save_kernel_state(current);
+		fpsimd_save_state(state);
 	} else {
 		fpsimd_save_user_state();
 
@@ -1867,8 +1875,17 @@ void kernel_neon_begin(void)
 		 * mode in task context. So in this case, setting the flag here
 		 * is always appropriate.
 		 */
-		if (IS_ENABLED(CONFIG_PREEMPT_RT) || !in_serving_softirq())
+		if (IS_ENABLED(CONFIG_PREEMPT_RT) || !in_serving_softirq()) {
+			/*
+			 * Record the caller provided buffer as the kernel mode
+			 * FP/SIMD buffer for this task, so that the state can
+			 * be preserved and restored on a context switch.
+			 */
+			WARN_ON(current->thread.kernel_fpsimd_state != NULL);
+			WARN_ON(preemptible() && !state);
+			current->thread.kernel_fpsimd_state = state;
 			set_thread_flag(TIF_KERNEL_FPSTATE);
+		}
 	}
 
 	/* Invalidate any task state remaining in the fpsimd regs: */
@@ -1886,22 +1903,30 @@ EXPORT_SYMBOL_GPL(kernel_neon_begin);
  *
  * The caller must not use the FPSIMD registers after this function is called,
  * unless kernel_neon_begin() is called again in the meantime.
+ *
+ * The value of @state must match the value passed to the preceding call to
+ * kernel_neon_begin().
  */
-void kernel_neon_end(void)
+void kernel_neon_end(struct user_fpsimd_state *state)
 {
 	if (!system_supports_fpsimd())
 		return;
 
+	if (!test_thread_flag(TIF_KERNEL_FPSTATE))
+		return;
+
 	/*
 	 * If we are returning from a nested use of kernel mode FPSIMD, restore
 	 * the task context kernel mode FPSIMD state. This can only happen when
 	 * running in softirq context on non-PREEMPT_RT.
 	 */
-	if (!IS_ENABLED(CONFIG_PREEMPT_RT) && in_serving_softirq() &&
-	    test_thread_flag(TIF_KERNEL_FPSTATE))
-		fpsimd_load_kernel_state(current);
-	else
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT) && in_serving_softirq()) {
+		fpsimd_load_state(state);
+	} else {
 		clear_thread_flag(TIF_KERNEL_FPSTATE);
+		WARN_ON(current->thread.kernel_fpsimd_state != state);
+		current->thread.kernel_fpsimd_state = NULL;
+	}
 }
 EXPORT_SYMBOL_GPL(kernel_neon_end);
 
@@ -1937,7 +1962,7 @@ void __efi_fpsimd_begin(void)
 	WARN_ON(preemptible());
 
 	if (may_use_simd()) {
-		kernel_neon_begin();
+		kernel_neon_begin(&efi_fpsimd_state);
 	} else {
 		/*
 		 * If !efi_sve_state, SVE can't be in use yet and doesn't need
@@ -1986,7 +2011,7 @@ void __efi_fpsimd_end(void)
 		return;
 
 	if (!efi_fpsimd_state_used) {
-		kernel_neon_end();
+		kernel_neon_end(&efi_fpsimd_state);
 	} else {
 		if (system_supports_sve() && efi_sve_state_used) {
 			bool ffr = true;
-- 
2.51.1.930.gacf6e81ea2-goog
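
For anyone skimming just these hunks, a minimal usage sketch (illustrative
only, not part of the patch; my_neon_transform() is a hypothetical NEON
helper) of what the reworked API looks like from the caller's side:

#include <linux/types.h>
#include <asm/simd.h>

asmlinkage void my_neon_transform(u8 *dst, const u8 *src, int blocks); /* hypothetical */

/*
 * scoped_ksimd() places an anonymous struct user_fpsimd_state on the
 * caller's stack and hands its address to both kernel_neon_begin() and
 * kernel_neon_end(), giving the context switch code somewhere to
 * preserve the task's kernel mode FPSIMD state.
 */
static void do_neon_work(u8 *dst, const u8 *src, int blocks)
{
	scoped_ksimd()
		my_neon_transform(dst, src, blocks);
}

/* Users of the bare API must pass the same buffer to both calls: */
static void do_neon_work_bare(u8 *dst, const u8 *src, int blocks)
{
	struct user_fpsimd_state state = {};

	kernel_neon_begin(&state);
	my_neon_transform(dst, src, blocks);
	kernel_neon_end(&state);
}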



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 08/21] lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API
  2025-10-31 10:39 ` [PATCH v4 08/21] lib/crc: Switch ARM and arm64 to 'ksimd' scoped " Ard Biesheuvel
@ 2025-10-31 13:49   ` Jonathan Cameron
  2025-10-31 13:52     ` Ard Biesheuvel
  0 siblings, 1 reply; 37+ messages in thread
From: Jonathan Cameron @ 2025-10-31 13:49 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, linux-kernel, linux-crypto, herbert, ebiggers,
	Ard Biesheuvel

On Fri, 31 Oct 2025 11:39:07 +0100
Ard Biesheuvel <ardb+git@google.com> wrote:

> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Before modifying the prototypes of kernel_neon_begin() and
> kernel_neon_end() to accommodate kernel mode FP/SIMD state buffers
> allocated on the stack, move arm64 to the new 'ksimd' scoped guard API,
> which encapsulates the calls to those functions.
> 
> For symmetry, do the same for 32-bit ARM too.
> 
> Reviewed-by: Eric Biggers <ebiggers@kernel.org>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  lib/crc/arm/crc-t10dif.h   | 16 +++++-----------
>  lib/crc/arm/crc32.h        | 11 ++++-------
>  lib/crc/arm64/crc-t10dif.h | 16 +++++-----------
>  lib/crc/arm64/crc32.h      | 16 ++++++----------
>  4 files changed, 20 insertions(+), 39 deletions(-)
> 
> diff --git a/lib/crc/arm/crc-t10dif.h b/lib/crc/arm/crc-t10dif.h
> index 63441de5e3f1..aaeeab0defb5 100644
> --- a/lib/crc/arm/crc-t10dif.h
> +++ b/lib/crc/arm/crc-t10dif.h

>  static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_neon);
> @@ -20,21 +19,16 @@ asmlinkage void crc_t10dif_pmull8(u16 init_crc, const u8 *buf, size_t len,
>  static inline u16 crc_t10dif_arch(u16 crc, const u8 *data, size_t length)
>  {
>  	if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE) {
> -		if (static_branch_likely(&have_pmull)) {
> -			if (likely(may_use_simd())) {
> -				kernel_neon_begin();
> -				crc = crc_t10dif_pmull64(crc, data, length);
> -				kernel_neon_end();
> -				return crc;
> -			}
> +		if (static_branch_likely(&have_pmull) && likely(may_use_simd())) {
> +			scoped_ksimd()
> +				return crc_t10dif_pmull64(crc, data, length);

>  		} else if (length > CRC_T10DIF_PMULL_CHUNK_SIZE &&
>  			   static_branch_likely(&have_neon) &&
>  			   likely(may_use_simd())) {

I briefly thought this was a functional change, but it's not, because
may_use_simd() isn't going to change between the two evaluations.

Would it hurt at all to pull that up to be
	if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE && likely(may_use_simd())) {
		if (static_branch_likely(&have_pmull)) {
			scoped_ksimd()
				return crc_t10dif_pmull64(crc, data, length);

 		} else if (length > CRC_T10DIF_PMULL_CHUNK_SIZE &&
 			   static_branch_likely(&have_neon)) {
		...

?



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 16/21] crypto/arm64: sm3 - Switch to 'ksimd' scoped guard API
  2025-10-31 10:39 ` [PATCH v4 16/21] crypto/arm64: sm3 " Ard Biesheuvel
@ 2025-10-31 13:52   ` Jonathan Cameron
  2025-10-31 13:55     ` Ard Biesheuvel
  0 siblings, 1 reply; 37+ messages in thread
From: Jonathan Cameron @ 2025-10-31 13:52 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, linux-kernel, linux-crypto, herbert, ebiggers,
	Ard Biesheuvel

On Fri, 31 Oct 2025 11:39:15 +0100
Ard Biesheuvel <ardb+git@google.com> wrote:

> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Switch to the more abstract 'scoped_ksimd()' API, which will be modified
> in a future patch to transparently allocate a kernel mode FP/SIMD state
> buffer on the stack, so that kernel mode FP/SIMD code remains
> preemptible in principle, but without the memory overhead that adds 528
> bytes to the size of struct task_struct.
> 
> Reviewed-by: Eric Biggers <ebiggers@kernel.org>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Hi Ard,

Trivial comment inline.

> ---
>  arch/arm64/crypto/sm3-ce-glue.c   | 15 ++++++++-------
>  arch/arm64/crypto/sm3-neon-glue.c | 16 ++++++----------
>  2 files changed, 14 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/arm64/crypto/sm3-ce-glue.c b/arch/arm64/crypto/sm3-ce-glue.c
> index eac6f5fa0abe..24c1fcfae072 100644
> --- a/arch/arm64/crypto/sm3-ce-glue.c
> +++ b/arch/arm64/crypto/sm3-ce-glue.c
> @@ -5,7 +5,6 @@
>   * Copyright (C) 2018 Linaro Ltd <ard.biesheuvel@linaro.org>
>   */
>  
> -#include <asm/neon.h>
>  #include <crypto/internal/hash.h>
>  #include <crypto/sm3.h>
>  #include <crypto/sm3_base.h>
> @@ -13,6 +12,8 @@
>  #include <linux/kernel.h>
>  #include <linux/module.h>
>  
> +#include <asm/simd.h>
> +
>  MODULE_DESCRIPTION("SM3 secure hash using ARMv8 Crypto Extensions");
>  MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
>  MODULE_LICENSE("GPL v2");
> @@ -25,18 +26,18 @@ static int sm3_ce_update(struct shash_desc *desc, const u8 *data,
>  {
>  	int remain;
>  
> -	kernel_neon_begin();
> -	remain = sm3_base_do_update_blocks(desc, data, len, sm3_ce_transform);
> -	kernel_neon_end();
> +	scoped_ksimd() {

Why does this get brackets unlike other cases?

> +		remain = sm3_base_do_update_blocks(desc, data, len, sm3_ce_transform);
> +	}
>  	return remain;
>  }
>  
>  static int sm3_ce_finup(struct shash_desc *desc, const u8 *data,
>  			unsigned int len, u8 *out)
>  {
> -	kernel_neon_begin();
> -	sm3_base_do_finup(desc, data, len, sm3_ce_transform);
> -	kernel_neon_end();
> +	scoped_ksimd() {
> +		sm3_base_do_finup(desc, data, len, sm3_ce_transform);
> +	}
>  	return sm3_base_finish(desc, out);
>  }
>  
> diff --git a/arch/arm64/crypto/sm3-neon-glue.c b/arch/arm64/crypto/sm3-neon-glue.c
> index 6c4611a503a3..15f30cc24f32 100644
> --- a/arch/arm64/crypto/sm3-neon-glue.c
> +++ b/arch/arm64/crypto/sm3-neon-glue.c
> @@ -5,7 +5,7 @@
>   * Copyright (C) 2022 Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
>   */
>  
> -#include <asm/neon.h>
> +#include <asm/simd.h>
>  #include <crypto/internal/hash.h>
>  #include <crypto/sm3.h>
>  #include <crypto/sm3_base.h>
> @@ -20,20 +20,16 @@ asmlinkage void sm3_neon_transform(struct sm3_state *sst, u8 const *src,
>  static int sm3_neon_update(struct shash_desc *desc, const u8 *data,
>  			   unsigned int len)
>  {
> -	int remain;
> -
> -	kernel_neon_begin();
> -	remain = sm3_base_do_update_blocks(desc, data, len, sm3_neon_transform);
> -	kernel_neon_end();
> -	return remain;
> +	scoped_ksimd()
> +		return sm3_base_do_update_blocks(desc, data, len,
> +						 sm3_neon_transform);
>  }
>  
>  static int sm3_neon_finup(struct shash_desc *desc, const u8 *data,
>  			  unsigned int len, u8 *out)
>  {
> -	kernel_neon_begin();
> -	sm3_base_do_finup(desc, data, len, sm3_neon_transform);
> -	kernel_neon_end();
> +	scoped_ksimd()
> +		sm3_base_do_finup(desc, data, len, sm3_neon_transform);
>  	return sm3_base_finish(desc, out);
>  }
>  



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 08/21] lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API
  2025-10-31 13:49   ` Jonathan Cameron
@ 2025-10-31 13:52     ` Ard Biesheuvel
  2025-10-31 14:05       ` Ard Biesheuvel
  0 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 13:52 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Ard Biesheuvel, linux-arm-kernel, linux-kernel, linux-crypto,
	herbert, ebiggers

On Fri, 31 Oct 2025 at 14:49, Jonathan Cameron
<jonathan.cameron@huawei.com> wrote:
>
> On Fri, 31 Oct 2025 11:39:07 +0100
> Ard Biesheuvel <ardb+git@google.com> wrote:
>
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Before modifying the prototypes of kernel_neon_begin() and
> > kernel_neon_end() to accommodate kernel mode FP/SIMD state buffers
> > allocated on the stack, move arm64 to the new 'ksimd' scoped guard API,
> > which encapsulates the calls to those functions.
> >
> > For symmetry, do the same for 32-bit ARM too.
> >
> > Reviewed-by: Eric Biggers <ebiggers@kernel.org>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  lib/crc/arm/crc-t10dif.h   | 16 +++++-----------
> >  lib/crc/arm/crc32.h        | 11 ++++-------
> >  lib/crc/arm64/crc-t10dif.h | 16 +++++-----------
> >  lib/crc/arm64/crc32.h      | 16 ++++++----------
> >  4 files changed, 20 insertions(+), 39 deletions(-)
> >
> > diff --git a/lib/crc/arm/crc-t10dif.h b/lib/crc/arm/crc-t10dif.h
> > index 63441de5e3f1..aaeeab0defb5 100644
> > --- a/lib/crc/arm/crc-t10dif.h
> > +++ b/lib/crc/arm/crc-t10dif.h
>
> >  static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_neon);
> > @@ -20,21 +19,16 @@ asmlinkage void crc_t10dif_pmull8(u16 init_crc, const u8 *buf, size_t len,
> >  static inline u16 crc_t10dif_arch(u16 crc, const u8 *data, size_t length)
> >  {
> >       if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE) {
> > -             if (static_branch_likely(&have_pmull)) {
> > -                     if (likely(may_use_simd())) {
> > -                             kernel_neon_begin();
> > -                             crc = crc_t10dif_pmull64(crc, data, length);
> > -                             kernel_neon_end();
> > -                             return crc;
> > -                     }
> > +             if (static_branch_likely(&have_pmull) && likely(may_use_simd())) {
> > +                     scoped_ksimd()
> > +                             return crc_t10dif_pmull64(crc, data, length);
>
> >               } else if (length > CRC_T10DIF_PMULL_CHUNK_SIZE &&
> >                          static_branch_likely(&have_neon) &&
> >                          likely(may_use_simd())) {
>
> I briefly thought this was a functional change but it's not because
> of may_use_simd() being something that isn't going to change between
> the two evaluations.
>
> Would it hurt at all to pull that up to be
>         if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE && likely(may_use_simd())) {
>                 if (static_branch_likely(&have_pmull)) {
>                         scoped_ksimd()
>                                 return crc_t10dif_pmull64(crc, data, length);
>
>                 } else if (length > CRC_T10DIF_PMULL_CHUNK_SIZE &&
>                            static_branch_likely(&have_neon)) {
>                 ...
>
> ?
>

Yeah that would be a reasonable cleanup, I guess.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 16/21] crypto/arm64: sm3 - Switch to 'ksimd' scoped guard API
  2025-10-31 13:52   ` Jonathan Cameron
@ 2025-10-31 13:55     ` Ard Biesheuvel
  0 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 13:55 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Ard Biesheuvel, linux-arm-kernel, linux-kernel, linux-crypto,
	herbert, ebiggers

On Fri, 31 Oct 2025 at 14:52, Jonathan Cameron
<jonathan.cameron@huawei.com> wrote:
>
> On Fri, 31 Oct 2025 11:39:15 +0100
> Ard Biesheuvel <ardb+git@google.com> wrote:
>
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Switch to the more abstract 'scoped_ksimd()' API, which will be modified
> > in a future patch to transparently allocate a kernel mode FP/SIMD state
> > buffer on the stack, so that kernel mode FP/SIMD code remains
> > preemptible in principle, but without the memory overhead that adds 528
> > bytes to the size of struct task_struct.
> >
> > Reviewed-by: Eric Biggers <ebiggers@kernel.org>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> Hi Ard,
>
> Trivial comment inline.
>
> > ---
> >  arch/arm64/crypto/sm3-ce-glue.c   | 15 ++++++++-------
> >  arch/arm64/crypto/sm3-neon-glue.c | 16 ++++++----------
> >  2 files changed, 14 insertions(+), 17 deletions(-)
> >
> > diff --git a/arch/arm64/crypto/sm3-ce-glue.c b/arch/arm64/crypto/sm3-ce-glue.c
> > index eac6f5fa0abe..24c1fcfae072 100644
> > --- a/arch/arm64/crypto/sm3-ce-glue.c
> > +++ b/arch/arm64/crypto/sm3-ce-glue.c
> > @@ -5,7 +5,6 @@
> >   * Copyright (C) 2018 Linaro Ltd <ard.biesheuvel@linaro.org>
> >   */
> >
> > -#include <asm/neon.h>
> >  #include <crypto/internal/hash.h>
> >  #include <crypto/sm3.h>
> >  #include <crypto/sm3_base.h>
> > @@ -13,6 +12,8 @@
> >  #include <linux/kernel.h>
> >  #include <linux/module.h>
> >
> > +#include <asm/simd.h>
> > +
> >  MODULE_DESCRIPTION("SM3 secure hash using ARMv8 Crypto Extensions");
> >  MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
> >  MODULE_LICENSE("GPL v2");
> > @@ -25,18 +26,18 @@ static int sm3_ce_update(struct shash_desc *desc, const u8 *data,
> >  {
> >       int remain;
> >
> > -     kernel_neon_begin();
> > -     remain = sm3_base_do_update_blocks(desc, data, len, sm3_ce_transform);
> > -     kernel_neon_end();
> > +     scoped_ksimd() {
>
> Why does this get brackets unlike other cases?
>

No reason other than the fact that some time passed between writing
the other patches and these ones, and there is no functional
difference.
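
To spell it out (sketch only, reusing the calls from the quoted hunks):
scoped_ksimd() guards the statement that follows it, so a lone statement
and a braced block behave identically:

	scoped_ksimd()
		sm3_base_do_finup(desc, data, len, sm3_ce_transform);

	scoped_ksimd() {
		sm3_base_do_finup(desc, data, len, sm3_ce_transform);
	}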


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 04/21] arm64/simd: Add scoped guard API for kernel mode SIMD
  2025-10-31 10:39 ` [PATCH v4 04/21] arm64/simd: Add scoped guard API for kernel mode SIMD Ard Biesheuvel
@ 2025-10-31 13:55   ` Jonathan Cameron
  2025-10-31 14:05     ` Ard Biesheuvel
  0 siblings, 1 reply; 37+ messages in thread
From: Jonathan Cameron @ 2025-10-31 13:55 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, linux-kernel, linux-crypto, herbert, ebiggers,
	Ard Biesheuvel, Kees Cook

On Fri, 31 Oct 2025 11:39:03 +0100
Ard Biesheuvel <ardb+git@google.com> wrote:

> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Encapsulate kernel_neon_begin() and kernel_neon_end() using a 'ksimd'
> cleanup guard. This hides the prototype of those functions, allowing
> them to be changed for arm64 but not ARM, without breaking code that is
> shared between those architectures (RAID6, AEGIS-128)
> 
> It probably makes sense to expose this API more widely across
> architectures, as it affords more flexibility to the arch code to
> plumb it in, while imposing more rigid rules regarding the start/end
> bookends appearing in matched pairs.
> 
> Reviewed-by: Kees Cook <kees@kernel.org>
> Reviewed-by: Eric Biggers <ebiggers@kernel.org>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Very nice.

FWIW I looked at all the usecases and other than a couple of trivial
comments on individual patches they look good to me.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
For patches 4-19


> ---
>  arch/arm64/include/asm/simd.h | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
> index 8e86c9e70e48..d9f83c478736 100644
> --- a/arch/arm64/include/asm/simd.h
> +++ b/arch/arm64/include/asm/simd.h
> @@ -6,12 +6,15 @@
>  #ifndef __ASM_SIMD_H
>  #define __ASM_SIMD_H
>  
> +#include <linux/cleanup.h>
>  #include <linux/compiler.h>
>  #include <linux/irqflags.h>
>  #include <linux/percpu.h>
>  #include <linux/preempt.h>
>  #include <linux/types.h>
>  
> +#include <asm/neon.h>
> +
>  #ifdef CONFIG_KERNEL_MODE_NEON
>  
>  /*
> @@ -40,4 +43,8 @@ static __must_check inline bool may_use_simd(void) {
>  
>  #endif /* ! CONFIG_KERNEL_MODE_NEON */
>  
> +DEFINE_LOCK_GUARD_0(ksimd, kernel_neon_begin(), kernel_neon_end())
> +
> +#define scoped_ksimd()	scoped_guard(ksimd)
> +
>  #endif
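
One property worth noting (a sketch, not part of the patch): the guard is
cleanup-based, so the end bookend runs on every exit path from the scope,
which is what makes the early 'return' inside scoped_ksimd() in the later
conversions safe:

	/*
	 * Illustrative only: leaving the guarded scope, including via
	 * return, still runs kernel_neon_end() before the function
	 * returns.
	 */
	scoped_ksimd()
		return crc_t10dif_pmull64(crc, data, length);

	/* ...which behaves like the open-coded sequence it replaces: */
	kernel_neon_begin();
	crc = crc_t10dif_pmull64(crc, data, length);
	kernel_neon_end();
	return crc;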



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 08/21] lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API
  2025-10-31 13:52     ` Ard Biesheuvel
@ 2025-10-31 14:05       ` Ard Biesheuvel
  2025-11-03 11:28         ` Jonathan Cameron
  0 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 14:05 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Ard Biesheuvel, linux-arm-kernel, linux-kernel, linux-crypto,
	herbert, ebiggers

On Fri, 31 Oct 2025 at 14:52, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Fri, 31 Oct 2025 at 14:49, Jonathan Cameron
> <jonathan.cameron@huawei.com> wrote:
> >
> > On Fri, 31 Oct 2025 11:39:07 +0100
> > Ard Biesheuvel <ardb+git@google.com> wrote:
> >
> > > From: Ard Biesheuvel <ardb@kernel.org>
> > >
> > > Before modifying the prototypes of kernel_neon_begin() and
> > > kernel_neon_end() to accommodate kernel mode FP/SIMD state buffers
> > > allocated on the stack, move arm64 to the new 'ksimd' scoped guard API,
> > > which encapsulates the calls to those functions.
> > >
> > > For symmetry, do the same for 32-bit ARM too.
> > >
> > > Reviewed-by: Eric Biggers <ebiggers@kernel.org>
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > >  lib/crc/arm/crc-t10dif.h   | 16 +++++-----------
> > >  lib/crc/arm/crc32.h        | 11 ++++-------
> > >  lib/crc/arm64/crc-t10dif.h | 16 +++++-----------
> > >  lib/crc/arm64/crc32.h      | 16 ++++++----------
> > >  4 files changed, 20 insertions(+), 39 deletions(-)
> > >
> > > diff --git a/lib/crc/arm/crc-t10dif.h b/lib/crc/arm/crc-t10dif.h
> > > index 63441de5e3f1..aaeeab0defb5 100644
> > > --- a/lib/crc/arm/crc-t10dif.h
> > > +++ b/lib/crc/arm/crc-t10dif.h
> >
> > >  static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_neon);
> > > @@ -20,21 +19,16 @@ asmlinkage void crc_t10dif_pmull8(u16 init_crc, const u8 *buf, size_t len,
> > >  static inline u16 crc_t10dif_arch(u16 crc, const u8 *data, size_t length)
> > >  {
> > >       if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE) {
> > > -             if (static_branch_likely(&have_pmull)) {
> > > -                     if (likely(may_use_simd())) {
> > > -                             kernel_neon_begin();
> > > -                             crc = crc_t10dif_pmull64(crc, data, length);
> > > -                             kernel_neon_end();
> > > -                             return crc;
> > > -                     }
> > > +             if (static_branch_likely(&have_pmull) && likely(may_use_simd())) {
> > > +                     scoped_ksimd()
> > > +                             return crc_t10dif_pmull64(crc, data, length);
> >
> > >               } else if (length > CRC_T10DIF_PMULL_CHUNK_SIZE &&
> > >                          static_branch_likely(&have_neon) &&
> > >                          likely(may_use_simd())) {
> >
> > I briefly thought this was a functional change but it's not because
> > of may_use_simd() being something that isn't going to change between
> > the two evaluations.
> >
> > Would it hurt at all to pull that up to be
> >         if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE && likely(may_use_simd())) {
> >                 if (static_branch_likely(&have_pmull)) {
> >                         scoped_ksimd()
> >                                 return crc_t10dif_pmull64(crc, data, length);
> >
> >                 } else if (length > CRC_T10DIF_PMULL_CHUNK_SIZE &&
> >                            static_branch_likely(&have_neon)) {
> >                 ...
> >
> > ?
> >
>
> Yeah that would be a reasonable cleanup, I guess.

Actually, looking more closely, that would result in may_use_simd()
being evaluated even when the static keys are set to false, given that
the compiler is unlikely to be able to figure out by itself that
may_use_simd() has no side effects.
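
In other words (sketch only, reusing the names from the quoted hunk): with
the static key on the left of the &&, the check short-circuits and
may_use_simd() is never evaluated on CPUs where the key is false, whereas
hoisting it into the outer condition forces it to be evaluated on every
call:

	/*
	 * may_use_simd() is only reached when the static key is true,
	 * since the compiler must assume the call has side effects.
	 */
	if (static_branch_likely(&have_pmull) && likely(may_use_simd())) {
		scoped_ksimd()
			return crc_t10dif_pmull64(crc, data, length);
	}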


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 04/21] arm64/simd: Add scoped guard API for kernel mode SIMD
  2025-10-31 13:55   ` Jonathan Cameron
@ 2025-10-31 14:05     ` Ard Biesheuvel
  0 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 14:05 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Ard Biesheuvel, linux-arm-kernel, linux-kernel, linux-crypto,
	herbert, ebiggers, Kees Cook

On Fri, 31 Oct 2025 at 14:55, Jonathan Cameron
<jonathan.cameron@huawei.com> wrote:
>
> On Fri, 31 Oct 2025 11:39:03 +0100
> Ard Biesheuvel <ardb+git@google.com> wrote:
>
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Encapsulate kernel_neon_begin() and kernel_neon_end() using a 'ksimd'
> > cleanup guard. This hides the prototype of those functions, allowing
> > them to be changed for arm64 but not ARM, without breaking code that is
> > shared between those architectures (RAID6, AEGIS-128)
> >
> > It probably makes sense to expose this API more widely across
> > architectures, as it affords more flexibility to the arch code to
> > plumb it in, while imposing more rigid rules regarding the start/end
> > bookends appearing in matched pairs.
> >
> > Reviewed-by: Kees Cook <kees@kernel.org>
> > Reviewed-by: Eric Biggers <ebiggers@kernel.org>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> Very nice.
>
> FWIW I looked at all the usecases and other than a couple of trivial
> comments on individual patches they look good to me.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> For patches 4-19
>

Thanks!


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 21/21] arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack
  2025-10-31 10:39 ` [PATCH v4 21/21] arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack Ard Biesheuvel
@ 2025-10-31 14:16   ` Ard Biesheuvel
  0 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2025-10-31 14:16 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, linux-kernel, linux-crypto, herbert, ebiggers

On Fri, 31 Oct 2025 at 11:40, Ard Biesheuvel <ardb+git@google.com> wrote:
>
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Commit aefbab8e77eb16b5
>
>   ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch")
>
> added a 'kernel_fpsimd_state' field to struct thread_struct, which is
> the arch-specific portion of struct task_struct, and is allocated for
> each task in the system. The size of this field is 528 bytes, resulting
> in non-negligible bloat of task_struct, and the resulting memory
> overhead may impact performance on systems with many processes.
>
> This allocation is only used if the task is scheduled out or interrupted
> by a softirq while using the FP/SIMD unit in kernel mode, and so it is
> possible to transparently allocate this buffer on the caller's stack
> instead.
>
> So tweak the 'ksimd' scoped guard implementation so that a stack buffer
> is allocated and passed to both kernel_neon_begin() and
> kernel_neon_end(), and either record it in the task struct, or use it
> directly to preserve the task mode kernel FP/SIMD when running in
> softirq context. Passing the address to both functions, and checking the
> addresses for consistency ensures that callers of the updated bare
> begin/end API use it in a manner that is consistent with the new context
> switch semantics.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/include/asm/fpu.h       |  4 +-
>  arch/arm64/include/asm/neon.h      |  4 +-
>  arch/arm64/include/asm/processor.h |  7 ++-
>  arch/arm64/include/asm/simd.h      |  7 ++-
>  arch/arm64/kernel/fpsimd.c         | 53 ++++++++++++++------
>  5 files changed, 54 insertions(+), 21 deletions(-)
>
> diff --git a/arch/arm64/include/asm/fpu.h b/arch/arm64/include/asm/fpu.h
> index bdc4c6304c6a..751e88a96734 100644
> --- a/arch/arm64/include/asm/fpu.h
> +++ b/arch/arm64/include/asm/fpu.h
> @@ -15,12 +15,12 @@ static inline void kernel_fpu_begin(void)
>  {
>         BUG_ON(!in_task());
>         preempt_disable();
> -       kernel_neon_begin();
> +       kernel_neon_begin(NULL);
>  }
>
>  static inline void kernel_fpu_end(void)
>  {
> -       kernel_neon_end();
> +       kernel_neon_end(NULL);
>         preempt_enable();
>  }
>
> diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h
> index d4b1d172a79b..acebee4605b5 100644
> --- a/arch/arm64/include/asm/neon.h
> +++ b/arch/arm64/include/asm/neon.h
> @@ -13,7 +13,7 @@
>
>  #define cpu_has_neon()         system_supports_fpsimd()
>
> -void kernel_neon_begin(void);
> -void kernel_neon_end(void);
> +void kernel_neon_begin(struct user_fpsimd_state *);
> +void kernel_neon_end(struct user_fpsimd_state *);
>
>  #endif /* ! __ASM_NEON_H */
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index 61d62bfd5a7b..de3c3b65461d 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -172,7 +172,12 @@ struct thread_struct {
>         unsigned long           fault_code;     /* ESR_EL1 value */
>         struct debug_info       debug;          /* debugging */
>
> -       struct user_fpsimd_state        kernel_fpsimd_state;
> +       /*
> +        * Set [cleared] by kernel_neon_begin() [kernel_neon_end()] to the
> +        * address of a caller provided buffer that will be used to preserve a
> +        * task's kernel mode FPSIMD state while it is scheduled out.
> +        */
> +       struct user_fpsimd_state        *kernel_fpsimd_state;
>         unsigned int                    kernel_fpsimd_cpu;
>  #ifdef CONFIG_ARM64_PTR_AUTH
>         struct ptrauth_keys_user        keys_user;
> diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
> index d9f83c478736..7ddb25df5c98 100644
> --- a/arch/arm64/include/asm/simd.h
> +++ b/arch/arm64/include/asm/simd.h
> @@ -43,8 +43,11 @@ static __must_check inline bool may_use_simd(void) {
>
>  #endif /* ! CONFIG_KERNEL_MODE_NEON */
>
> -DEFINE_LOCK_GUARD_0(ksimd, kernel_neon_begin(), kernel_neon_end())
> +DEFINE_LOCK_GUARD_1(ksimd,
> +                   struct user_fpsimd_state,
> +                   kernel_neon_begin(_T->lock),
> +                   kernel_neon_end(_T->lock))
>
> -#define scoped_ksimd() scoped_guard(ksimd)
> +#define scoped_ksimd() scoped_guard(ksimd, &(struct user_fpsimd_state){})
>
>  #endif
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index e3f8f51748bc..1c652ce4d40d 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -1489,21 +1489,23 @@ static void fpsimd_load_kernel_state(struct task_struct *task)
>          * Elide the load if this CPU holds the most recent kernel mode
>          * FPSIMD context of the current task.
>          */
> -       if (last->st == &task->thread.kernel_fpsimd_state &&
> +       if (last->st == task->thread.kernel_fpsimd_state &&
>             task->thread.kernel_fpsimd_cpu == smp_processor_id())
>                 return;
>
> -       fpsimd_load_state(&task->thread.kernel_fpsimd_state);
> +       fpsimd_load_state(task->thread.kernel_fpsimd_state);
>  }
>
>  static void fpsimd_save_kernel_state(struct task_struct *task)
>  {
>         struct cpu_fp_state cpu_fp_state = {
> -               .st             = &task->thread.kernel_fpsimd_state,
> +               .st             = task->thread.kernel_fpsimd_state,
>                 .to_save        = FP_STATE_FPSIMD,
>         };
>
> -       fpsimd_save_state(&task->thread.kernel_fpsimd_state);
> +       BUG_ON(!cpu_fp_state.st);
> +
> +       fpsimd_save_state(task->thread.kernel_fpsimd_state);
>         fpsimd_bind_state_to_cpu(&cpu_fp_state);
>
>         task->thread.kernel_fpsimd_cpu = smp_processor_id();
> @@ -1774,6 +1776,7 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
>  void fpsimd_flush_task_state(struct task_struct *t)
>  {
>         t->thread.fpsimd_cpu = NR_CPUS;
> +       t->thread.kernel_fpsimd_state = NULL;
>         /*
>          * If we don't support fpsimd, bail out after we have
>          * reset the fpsimd_cpu for this task and clear the
> @@ -1833,8 +1836,13 @@ void fpsimd_save_and_flush_cpu_state(void)
>   *
>   * The caller may freely use the FPSIMD registers until kernel_neon_end() is
>   * called.
> + *
> + * Unless called from non-preemptible task context, @state must point to a
> + * caller provided buffer that will be used to preserve the task's kernel mode
> + * FPSIMD context when it is scheduled out, or if it is interrupted by kernel
> + * mode FPSIMD occurring in softirq context. May be %NULL otherwise.
>   */
> -void kernel_neon_begin(void)
> +void kernel_neon_begin(struct user_fpsimd_state *state)
>  {
>         if (WARN_ON(!system_supports_fpsimd()))
>                 return;
> @@ -1846,7 +1854,7 @@ void kernel_neon_begin(void)
>         /* Save unsaved fpsimd state, if any: */
>         if (test_thread_flag(TIF_KERNEL_FPSTATE)) {
>                 BUG_ON(IS_ENABLED(CONFIG_PREEMPT_RT) || !in_serving_softirq());
> -               fpsimd_save_kernel_state(current);
> +               fpsimd_save_state(state);
>         } else {
>                 fpsimd_save_user_state();
>
> @@ -1867,8 +1875,17 @@ void kernel_neon_begin(void)
>                  * mode in task context. So in this case, setting the flag here
>                  * is always appropriate.
>                  */
> -               if (IS_ENABLED(CONFIG_PREEMPT_RT) || !in_serving_softirq())
> +               if (IS_ENABLED(CONFIG_PREEMPT_RT) || !in_serving_softirq()) {
> +                       /*
> +                        * Record the caller provided buffer as the kernel mode
> +                        * FP/SIMD buffer for this task, so that the state can
> +                        * be preserved and restored on a context switch.
> +                        */
> +                       WARN_ON(current->thread.kernel_fpsimd_state != NULL);
> +                       WARN_ON(preemptible() && !state);

This is in the wrong place: we are never preemptible here, even when
called from preemptible context.

Will fix for the next rev.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 08/21] lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API
  2025-10-31 14:05       ` Ard Biesheuvel
@ 2025-11-03 11:28         ` Jonathan Cameron
  2025-11-04 15:32           ` Ard Biesheuvel
  0 siblings, 1 reply; 37+ messages in thread
From: Jonathan Cameron @ 2025-11-03 11:28 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ard Biesheuvel, linux-arm-kernel, linux-kernel, linux-crypto,
	herbert, ebiggers

On Fri, 31 Oct 2025 15:05:19 +0100
Ard Biesheuvel <ardb@kernel.org> wrote:

> On Fri, 31 Oct 2025 at 14:52, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Fri, 31 Oct 2025 at 14:49, Jonathan Cameron
> > <jonathan.cameron@huawei.com> wrote:  
> > >
> > > On Fri, 31 Oct 2025 11:39:07 +0100
> > > Ard Biesheuvel <ardb+git@google.com> wrote:
> > >  
> > > > From: Ard Biesheuvel <ardb@kernel.org>
> > > >
> > > > Before modifying the prototypes of kernel_neon_begin() and
> > > > kernel_neon_end() to accommodate kernel mode FP/SIMD state buffers
> > > > allocated on the stack, move arm64 to the new 'ksimd' scoped guard API,
> > > > which encapsulates the calls to those functions.
> > > >
> > > > For symmetry, do the same for 32-bit ARM too.
> > > >
> > > > Reviewed-by: Eric Biggers <ebiggers@kernel.org>
> > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > ---
> > > >  lib/crc/arm/crc-t10dif.h   | 16 +++++-----------
> > > >  lib/crc/arm/crc32.h        | 11 ++++-------
> > > >  lib/crc/arm64/crc-t10dif.h | 16 +++++-----------
> > > >  lib/crc/arm64/crc32.h      | 16 ++++++----------
> > > >  4 files changed, 20 insertions(+), 39 deletions(-)
> > > >
> > > > diff --git a/lib/crc/arm/crc-t10dif.h b/lib/crc/arm/crc-t10dif.h
> > > > index 63441de5e3f1..aaeeab0defb5 100644
> > > > --- a/lib/crc/arm/crc-t10dif.h
> > > > +++ b/lib/crc/arm/crc-t10dif.h  
> > >  
> > > >  static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_neon);
> > > > @@ -20,21 +19,16 @@ asmlinkage void crc_t10dif_pmull8(u16 init_crc, const u8 *buf, size_t len,
> > > >  static inline u16 crc_t10dif_arch(u16 crc, const u8 *data, size_t length)
> > > >  {
> > > >       if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE) {
> > > > -             if (static_branch_likely(&have_pmull)) {
> > > > -                     if (likely(may_use_simd())) {
> > > > -                             kernel_neon_begin();
> > > > -                             crc = crc_t10dif_pmull64(crc, data, length);
> > > > -                             kernel_neon_end();
> > > > -                             return crc;
> > > > -                     }
> > > > +             if (static_branch_likely(&have_pmull) && likely(may_use_simd())) {
> > > > +                     scoped_ksimd()
> > > > +                             return crc_t10dif_pmull64(crc, data, length);  
> > >  
> > > >               } else if (length > CRC_T10DIF_PMULL_CHUNK_SIZE &&
> > > >                          static_branch_likely(&have_neon) &&
> > > >                          likely(may_use_simd())) {  
> > >
> > > I briefly thought this was a functional change but it's not because
> > > of may_use_simd() being something that isn't going to change between
> > > the two evaluations.
> > >
> > > Would it hurt at all to pull that up to be
> > >         if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE && likely(may_use_simd())) {
> > >                 if (static_branch_likely(&have_pmull)) {
> > >                         scoped_ksimd()
> > >                                 return crc_t10dif_pmull64(crc, data, length);
> > >
> > >                 } else if (length > CRC_T10DIF_PMULL_CHUNK_SIZE &&
> > >                            static_branch_likely(&have_neon)) {
> > >                 ...
> > >
> > > ?
> > >  
> >
> > Yeah that would be a reasonable cleanup, I guess.  
> 
> Actually, looking more closely, that would result in may_use_simd()
> being evaluated even when the static keys are set to false, given that
> the compiler is unlikely to be able to figure out by itself that
> may_use_simd() has no side effects.
Yeah. That was why it was a question :)
Given everything is marked as likely I wasn't sure if we cared about when
the static keys aren't set.

Jonathan

> 



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 08/21] lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API
  2025-11-03 11:28         ` Jonathan Cameron
@ 2025-11-04 15:32           ` Ard Biesheuvel
  2025-11-12 20:58             ` Eric Biggers
  0 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2025-11-04 15:32 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Ard Biesheuvel, linux-arm-kernel, linux-kernel, linux-crypto,
	herbert, ebiggers

On Mon, 3 Nov 2025 at 12:28, Jonathan Cameron
<jonathan.cameron@huawei.com> wrote:
>
> On Fri, 31 Oct 2025 15:05:19 +0100
> Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > On Fri, 31 Oct 2025 at 14:52, Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Fri, 31 Oct 2025 at 14:49, Jonathan Cameron
> > > <jonathan.cameron@huawei.com> wrote:
> > > >
> > > > On Fri, 31 Oct 2025 11:39:07 +0100
> > > > Ard Biesheuvel <ardb+git@google.com> wrote:
> > > >
> > > > > From: Ard Biesheuvel <ardb@kernel.org>
> > > > >
> > > > > Before modifying the prototypes of kernel_neon_begin() and
> > > > > kernel_neon_end() to accommodate kernel mode FP/SIMD state buffers
> > > > > allocated on the stack, move arm64 to the new 'ksimd' scoped guard API,
> > > > > which encapsulates the calls to those functions.
> > > > >
> > > > > For symmetry, do the same for 32-bit ARM too.
> > > > >
> > > > > Reviewed-by: Eric Biggers <ebiggers@kernel.org>
> > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > > ---
> > > > >  lib/crc/arm/crc-t10dif.h   | 16 +++++-----------
> > > > >  lib/crc/arm/crc32.h        | 11 ++++-------
> > > > >  lib/crc/arm64/crc-t10dif.h | 16 +++++-----------
> > > > >  lib/crc/arm64/crc32.h      | 16 ++++++----------
> > > > >  4 files changed, 20 insertions(+), 39 deletions(-)
> > > > >
> > > > > diff --git a/lib/crc/arm/crc-t10dif.h b/lib/crc/arm/crc-t10dif.h
> > > > > index 63441de5e3f1..aaeeab0defb5 100644
> > > > > --- a/lib/crc/arm/crc-t10dif.h
> > > > > +++ b/lib/crc/arm/crc-t10dif.h
> > > >
> > > > >  static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_neon);
> > > > > @@ -20,21 +19,16 @@ asmlinkage void crc_t10dif_pmull8(u16 init_crc, const u8 *buf, size_t len,
> > > > >  static inline u16 crc_t10dif_arch(u16 crc, const u8 *data, size_t length)
> > > > >  {
> > > > >       if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE) {
> > > > > -             if (static_branch_likely(&have_pmull)) {
> > > > > -                     if (likely(may_use_simd())) {
> > > > > -                             kernel_neon_begin();
> > > > > -                             crc = crc_t10dif_pmull64(crc, data, length);
> > > > > -                             kernel_neon_end();
> > > > > -                             return crc;
> > > > > -                     }
> > > > > +             if (static_branch_likely(&have_pmull) && likely(may_use_simd())) {
> > > > > +                     scoped_ksimd()
> > > > > +                             return crc_t10dif_pmull64(crc, data, length);
> > > >
> > > > >               } else if (length > CRC_T10DIF_PMULL_CHUNK_SIZE &&
> > > > >                          static_branch_likely(&have_neon) &&
> > > > >                          likely(may_use_simd())) {
> > > >
> > > > I briefly thought this was a functional change but it's not because
> > > > of may_use_simd() being something that isn't going to change between
> > > > the two evaluations.
> > > >
> > > > Would it hurt at all to pull that up to be
> > > >         if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE && likely(may_use_simd())) {
> > > >                 if (static_branch_likely(&have_pmull)) {
> > > >                         scoped_ksimd()
> > > >                                 return crc_t10dif_pmull64(crc, data, length);
> > > >
> > > >                 } else if (length > CRC_T10DIF_PMULL_CHUNK_SIZE &&
> > > >                            static_branch_likely(&have_neon)) {
> > > >                 ...
> > > >
> > > > ?
> > > >
> > >
> > > Yeah that would be a reasonable cleanup, I guess.
> >
> > Actually, looking more closely, that would result in may_use_simd()
> > being evaluated even when the static keys are set to false, given that
> > the compiler is unlikely to be able to figure out by itself that
> > may_use_simd() has no side effects.
> Yeah. That was why it was a question :)
> Given everything is marked as likely I wasn't sure if we cared about when
> the static keys aren't set.
>

Yeah, it is rather doubtful that those annotations (or the use of
static keys, for that matter) make a meaningful difference here.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 01/21] crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
  2025-10-31 10:39 ` [PATCH v4 01/21] crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit Ard Biesheuvel
@ 2025-11-06  7:08   ` Herbert Xu
  0 siblings, 0 replies; 37+ messages in thread
From: Herbert Xu @ 2025-11-06  7:08 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, linux-kernel, linux-crypto, ebiggers,
	Ard Biesheuvel

On Fri, Oct 31, 2025 at 11:39:00AM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Kernel mode NEON sections are now preemptible on arm64, and so there is
> no need to yield it explicitly in order to prevent scheduling latency
> spikes.
> 
> Reviewed-by: Eric Biggers <ebiggers@kernel.org>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/crypto/aes-ce-ccm-glue.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 02/21] crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
  2025-10-31 10:39 ` [PATCH v4 02/21] crypto/arm64: sm4-ce-ccm " Ard Biesheuvel
@ 2025-11-06  7:11   ` Herbert Xu
  0 siblings, 0 replies; 37+ messages in thread
From: Herbert Xu @ 2025-11-06  7:11 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, linux-kernel, linux-crypto, ebiggers,
	Ard Biesheuvel

On Fri, Oct 31, 2025 at 11:39:01AM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Kernel mode NEON sections are now preemptible on arm64, and so there is
> no need to yield it when calling APIs that may sleep.
> 
> Also, move the calls to kernel_neon_end() to the same scope as
> kernel_neon_begin(). This is needed for a subsequent change where a
> stack buffer is allocated transparently and passed to
> kernel_neon_begin().
> 
> Reviewed-by: Eric Biggers <ebiggers@kernel.org>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/crypto/sm4-ce-ccm-glue.c | 25 +++++---------------
>  1 file changed, 6 insertions(+), 19 deletions(-)

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 03/21] crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
  2025-10-31 10:39 ` [PATCH v4 03/21] crypto/arm64: sm4-ce-gcm " Ard Biesheuvel
@ 2025-11-06  7:15   ` Herbert Xu
  0 siblings, 0 replies; 37+ messages in thread
From: Herbert Xu @ 2025-11-06  7:15 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, linux-kernel, linux-crypto, ebiggers,
	Ard Biesheuvel

On Fri, Oct 31, 2025 at 11:39:02AM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Kernel mode NEON sections are now preemptible on arm64, and so there is
> no need to yield it when calling APIs that may sleep.
> 
> Also, move the calls to kernel_neon_end() to the same scope as
> kernel_neon_begin(). This is needed for a subsequent change where a
> stack buffer is allocated transparently and passed to
> kernel_neon_begin().
> 
> While at it, simplify the logic.
> 
> Reviewed-by: Eric Biggers <ebiggers@kernel.org>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/crypto/sm4-ce-gcm-glue.c | 25 +++++---------------
>  1 file changed, 6 insertions(+), 19 deletions(-)

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack
  2025-10-31 10:38 [PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack Ard Biesheuvel
                   ` (20 preceding siblings ...)
  2025-10-31 10:39 ` [PATCH v4 21/21] arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack Ard Biesheuvel
@ 2025-11-11 17:12 ` Catalin Marinas
  21 siblings, 0 replies; 37+ messages in thread
From: Catalin Marinas @ 2025-11-11 17:12 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, linux-kernel, linux-crypto, herbert, ebiggers,
	Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Mark Brown

On Fri, Oct 31, 2025 at 11:38:59AM +0100, Ard Biesheuvel wrote:
> Ard Biesheuvel (21):
>   crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
>   crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
>   crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
>   arm64/simd: Add scoped guard API for kernel mode SIMD
>   ARM/simd: Add scoped guard API for kernel mode SIMD
>   crypto: aegis128-neon - Move to more abstract 'ksimd' guard API
>   raid6: Move to more abstract 'ksimd' guard API
>   lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API
>   lib/crypto: Switch ARM and arm64 to 'ksimd' scoped guard API
>   crypto/arm64: aes-ccm - Switch to 'ksimd' scoped guard API
>   crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API
>   crypto/arm64: aes-gcm - Switch to 'ksimd' scoped guard API
>   crypto/arm64: nhpoly1305 - Switch to 'ksimd' scoped guard API
>   crypto/arm64: polyval - Switch to 'ksimd' scoped guard API
>   crypto/arm64: sha3 - Switch to 'ksimd' scoped guard API
>   crypto/arm64: sm3 - Switch to 'ksimd' scoped guard API
>   crypto/arm64: sm4 - Switch to 'ksimd' scoped guard API
>   arm64/xorblocks:  Switch to 'ksimd' scoped guard API
>   net/mlx5: Switch to more abstract scoped ksimd guard API on arm64
>   arm64/fpu: Enforce task-context only for generic kernel mode FPU
>   arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack

For the series, especially the arm64 bits:

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Since most of this is crypto library changes, I'm fine for it to go
upstream via your tree but please keep it on a stable branch in case we
need to solve any conflicts.

Thanks.

-- 
Catalin


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 08/21] lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API
  2025-11-04 15:32           ` Ard Biesheuvel
@ 2025-11-12 20:58             ` Eric Biggers
  0 siblings, 0 replies; 37+ messages in thread
From: Eric Biggers @ 2025-11-12 20:58 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jonathan Cameron, Ard Biesheuvel, linux-arm-kernel, linux-kernel,
	linux-crypto, herbert

On Tue, Nov 04, 2025 at 04:32:28PM +0100, Ard Biesheuvel wrote:
> On Mon, 3 Nov 2025 at 12:28, Jonathan Cameron
> <jonathan.cameron@huawei.com> wrote:
> >
> > On Fri, 31 Oct 2025 15:05:19 +0100
> > Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > > On Fri, 31 Oct 2025 at 14:52, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > >
> > > > On Fri, 31 Oct 2025 at 14:49, Jonathan Cameron
> > > > <jonathan.cameron@huawei.com> wrote:
> > > > >
> > > > > On Fri, 31 Oct 2025 11:39:07 +0100
> > > > > Ard Biesheuvel <ardb+git@google.com> wrote:
> > > > >
> > > > > > From: Ard Biesheuvel <ardb@kernel.org>
> > > > > >
> > > > > > Before modifying the prototypes of kernel_neon_begin() and
> > > > > > kernel_neon_end() to accommodate kernel mode FP/SIMD state buffers
> > > > > > allocated on the stack, move arm64 to the new 'ksimd' scoped guard API,
> > > > > > which encapsulates the calls to those functions.
> > > > > >
> > > > > > For symmetry, do the same for 32-bit ARM too.
> > > > > >
> > > > > > Reviewed-by: Eric Biggers <ebiggers@kernel.org>
> > > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > > > ---
> > > > > >  lib/crc/arm/crc-t10dif.h   | 16 +++++-----------
> > > > > >  lib/crc/arm/crc32.h        | 11 ++++-------
> > > > > >  lib/crc/arm64/crc-t10dif.h | 16 +++++-----------
> > > > > >  lib/crc/arm64/crc32.h      | 16 ++++++----------
> > > > > >  4 files changed, 20 insertions(+), 39 deletions(-)
> > > > > >
> > > > > > diff --git a/lib/crc/arm/crc-t10dif.h b/lib/crc/arm/crc-t10dif.h
> > > > > > index 63441de5e3f1..aaeeab0defb5 100644
> > > > > > --- a/lib/crc/arm/crc-t10dif.h
> > > > > > +++ b/lib/crc/arm/crc-t10dif.h
> > > > >
> > > > > >  static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_neon);
> > > > > > @@ -20,21 +19,16 @@ asmlinkage void crc_t10dif_pmull8(u16 init_crc, const u8 *buf, size_t len,
> > > > > >  static inline u16 crc_t10dif_arch(u16 crc, const u8 *data, size_t length)
> > > > > >  {
> > > > > >       if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE) {
> > > > > > -             if (static_branch_likely(&have_pmull)) {
> > > > > > -                     if (likely(may_use_simd())) {
> > > > > > -                             kernel_neon_begin();
> > > > > > -                             crc = crc_t10dif_pmull64(crc, data, length);
> > > > > > -                             kernel_neon_end();
> > > > > > -                             return crc;
> > > > > > -                     }
> > > > > > +             if (static_branch_likely(&have_pmull) && likely(may_use_simd())) {
> > > > > > +                     scoped_ksimd()
> > > > > > +                             return crc_t10dif_pmull64(crc, data, length);
> > > > >
> > > > > >               } else if (length > CRC_T10DIF_PMULL_CHUNK_SIZE &&
> > > > > >                          static_branch_likely(&have_neon) &&
> > > > > >                          likely(may_use_simd())) {
> > > > >
> > > > > I briefly thought this was a functional change but it's not because
> > > > > of may_use_simd() being something that isn't going to change between
> > > > > the two evaluations.
> > > > >
> > > > > Would it hurt at all to pull that up to be
> > > > >         if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE && likely(may_use_simd())) {
> > > > >                 if (static_branch_likely(&have_pmull)) {
> > > > >                         scoped_ksimd()
> > > > >                                 return crc_t10dif_pmull64(crc, data, length);
> > > > >
> > > > >                 } else if (length > CRC_T10DIF_PMULL_CHUNK_SIZE &&
> > > > >                            static_branch_likely(&have_neon)) {
> > > > >                 ...
> > > > >
> > > > > ?
> > > > >
> > > >
> > > > Yeah that would be a reasonable cleanup, I guess.
> > >
> > > Actually, looking more closely, that would result in may_use_simd()
> > > being evaluated even when the static keys are set to false, given that
> > > the compiler is unlikely to be able to figure out by itself that
> > > may_use_simd() has no side effects.
> > Yeah. That was why it was a question :)
> > Given everything is marked as likely I wasn't sure if we cared about when
> > the static keys aren't set.
> >
> 
> Yeah, it is rather doubtful that those annotations (or the use of
> static keys, for that matter) make a meaningful difference here.

Well, this change makes crc_t10dif_update() not usable during early boot
on arm64, as it will start hitting the
WARN_ON(!system_capabilities_finalized()) in may_use_simd().

I doubt there are any current callers where that matters, but I've been
trying to avoid this sort of unnecessary pitfall in the CRC functions.

Checking the static key first eliminates this pitfall and is also more
efficient on CPUs that don't support the relevant CPU feature.

- Eric


^ permalink raw reply	[flat|nested] 37+ messages in thread
