linux-crypto.vger.kernel.org archive mirror
* [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper
@ 2025-04-02  0:24 Eric Biggers
  2025-04-02  0:24 ` [PATCH v2 1/9] crypto: x86/aes - drop the avx10_256 AES-XTS and AES-CTR code Eric Biggers
                   ` (10 more replies)
  0 siblings, 11 replies; 27+ messages in thread
From: Eric Biggers @ 2025-04-02  0:24 UTC (permalink / raw)
  To: linux-crypto; +Cc: linux-kernel, x86

Patches 2-9 are almost identical to
https://lore.kernel.org/r/20250220051325.340691-3-ebiggers@kernel.org/
but now split into multiple patches.  Patch 1 is just a resend of
https://lore.kernel.org/r/20250320220648.121990-1-ebiggers@kernel.org/
which is needed for the series to apply cleanly but is otherwise
unrelated.  Description of patches 2-9 follows:

Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c).  The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs.  Specifically, if a
hardirq interrupted a task-context kernel-mode FPU section and then
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU.  This has now been fixed.  In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these algorithms can now just use kernel-mode FPU
unconditionally on x86.

This simplifies the code and improves performance.
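
To illustrate the simplification, here is a rough sketch (not taken
verbatim from any patch in this series; aes_ecb_enc_asm() is a made-up
stand-in for the real per-ISA assembly helpers) of what a typical x86
skcipher ->encrypt can now look like, with no irq_fpu_usable() check
and no SIMD-helper/cryptd indirection:

#include <linux/linkage.h>
#include <crypto/aes.h>
#include <crypto/internal/skcipher.h>
#include <asm/fpu/api.h>

/* Hypothetical assembly helper, standing in for the real ones. */
asmlinkage void aes_ecb_enc_asm(const struct crypto_aes_ctx *key,
				u8 *dst, const u8 *src, unsigned int nbytes);

static int ecb_encrypt_sketch(struct skcipher_request *req)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	const struct crypto_aes_ctx *key = crypto_skcipher_ctx(tfm);
	struct skcipher_walk walk;
	unsigned int nbytes;
	int err;

	err = skcipher_walk_virt(&walk, req, false);
	while ((nbytes = walk.nbytes) != 0) {
		/*
		 * skcipher ->encrypt runs only in task or softirq context,
		 * and kernel-mode FPU is now always usable there on x86,
		 * so no fallback path is needed.
		 */
		kernel_fpu_begin();
		aes_ecb_enc_asm(key, walk.dst.virt.addr, walk.src.virt.addr,
				nbytes & ~(AES_BLOCK_SIZE - 1));
		kernel_fpu_end();
		err = skcipher_walk_done(&walk, nbytes & (AES_BLOCK_SIZE - 1));
	}
	return err;
}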

En/decryption gets at least somewhat faster for everyone, since the
crypto API functions such as crypto_skcipher_encrypt() now go directly
to the underlying algorithm rather than taking a detour through
crypto/simd.c which involved an extra indirect call.  For example, on a
Ryzen 9 9950X desktop processor, AES-256-XTS is now 23% faster for
512-byte messages and 7% faster for 4096-byte messages (when accessed
through crypto_skcipher_encrypt() or crypto_skcipher_decrypt()).

There's also a much larger performance improvement for crypto API users
that only support synchronous algorithms.  These users will now actually
use the x86 SIMD (e.g. AES-NI or VAES) optimized en/decryption modes,
which they couldn't before because those algorithms were marked as
asynchronous.
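
As a concrete (illustrative, not part of the series) example, such a
user asks for a synchronous transform by masking out CRYPTO_ALG_ASYNC;
after this series the lookup resolves to one of the SIMD drivers, e.g.
"xts-aes-vaes-avx512", rather than a non-SIMD fallback:

#include <crypto/skcipher.h>
#include <linux/err.h>
#include <linux/printk.h>

static int sync_xts_example(void)
{
	struct crypto_skcipher *tfm;

	/* Masking out CRYPTO_ALG_ASYNC requests a synchronous algorithm. */
	tfm = crypto_alloc_skcipher("xts(aes)", 0, CRYPTO_ALG_ASYNC);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	pr_info("xts(aes) driver: %s\n", crypto_skcipher_driver_name(tfm));

	crypto_free_skcipher(tfm);
	return 0;
}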

Eric Biggers (9):
  crypto: x86/aes - drop the avx10_256 AES-XTS and AES-CTR code
  crypto: x86/aegis - stop using the SIMD helper
  crypto: x86/aes - stop using the SIMD helper
  crypto: x86/aria - stop using the SIMD helper
  crypto: x86/camellia - stop using the SIMD helper
  crypto: x86/cast - stop using the SIMD helper
  crypto: x86/serpent - stop using the SIMD helper
  crypto: x86/sm4 - stop using the SIMD helper
  crypto: x86/twofish - stop using the SIMD helper

 arch/x86/crypto/Kconfig                    |  14 --
 arch/x86/crypto/aegis128-aesni-glue.c      |  13 +-
 arch/x86/crypto/aes-ctr-avx-x86_64.S       |  47 ++----
 arch/x86/crypto/aes-xts-avx-x86_64.S       | 118 ++++++--------
 arch/x86/crypto/aesni-intel_glue.c         | 174 ++++++++-------------
 arch/x86/crypto/aria_aesni_avx2_glue.c     |  22 +--
 arch/x86/crypto/aria_aesni_avx_glue.c      |  20 +--
 arch/x86/crypto/aria_gfni_avx512_glue.c    |  22 +--
 arch/x86/crypto/camellia_aesni_avx2_glue.c |  21 +--
 arch/x86/crypto/camellia_aesni_avx_glue.c  |  21 +--
 arch/x86/crypto/cast5_avx_glue.c           |  21 +--
 arch/x86/crypto/cast6_avx_glue.c           |  20 +--
 arch/x86/crypto/serpent_avx2_glue.c        |  21 +--
 arch/x86/crypto/serpent_avx_glue.c         |  21 +--
 arch/x86/crypto/serpent_sse2_glue.c        |  21 +--
 arch/x86/crypto/sm4_aesni_avx2_glue.c      |  31 ++--
 arch/x86/crypto/sm4_aesni_avx_glue.c       |  31 ++--
 arch/x86/crypto/twofish_avx_glue.c         |  21 +--
 18 files changed, 227 insertions(+), 432 deletions(-)


base-commit: 91e5bfe317d8f8471fbaa3e70cf66cae1314a516
-- 
2.49.0



* [PATCH v2 1/9] crypto: x86/aes - drop the avx10_256 AES-XTS and AES-CTR code
  2025-04-02  0:24 [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper Eric Biggers
@ 2025-04-02  0:24 ` Eric Biggers
  2025-04-02  0:24 ` [PATCH v2 2/9] crypto: x86/aegis - stop using the SIMD helper Eric Biggers
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Eric Biggers @ 2025-04-02  0:24 UTC (permalink / raw)
  To: linux-crypto; +Cc: linux-kernel, x86

From: Eric Biggers <ebiggers@google.com>

Intel made a late change to the AVX10 specification that removes support
for a 256-bit maximum vector length and enumeration of the maximum
vector length.  AVX10 will imply a maximum vector length of 512 bits.
I.e. there won't be any such thing as AVX10/256 or AVX10/512; there will
just be AVX10, and it will essentially just consolidate AVX512 features.

As a result of this new development, my strategy of providing both
*_avx10_256 and *_avx10_512 functions didn't turn out to be that useful.
The only remaining motivation for the 256-bit AVX512 / AVX10 functions
is to avoid downclocking on older Intel CPUs.  But in the case of
AES-XTS and AES-CTR, I already wrote *_avx2 code too (primarily to
support CPUs without AVX512), which performs almost as well as
*_avx10_256.  So we should just use that.

Therefore, remove the *_avx10_256 AES-XTS and AES-CTR functions and
algorithms, and rename the *_avx10_512 AES-XTS and AES-CTR functions and
algorithms to *_avx512.  Make Ice Lake and Tiger Lake use *_avx2 instead
of the *_avx10_256 code they previously used.

I've left AES-GCM unchanged for now.  There is no VAES+AVX2 optimized
AES-GCM in the kernel yet, so the path forward for that is not as clear.
However, I did write a VAES+AVX2 optimized AES-GCM for BoringSSL.  So
one option is to port that to the kernel and then do the same cleanup.
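
For reference, the mechanism that steers Ice Lake and Tiger Lake to the
*_avx2 code is the existing X86_FEATURE_PREFER_YMM check in the glue
code (visible in the aesni-intel_glue.c hunk below): when that flag is
set, the *_avx512 algorithms are registered at priority 1 so the
VAES+AVX2 ones win.  Condensed:

	if (boot_cpu_has(X86_FEATURE_PREFER_YMM)) {
		int i;

		/* Prefer the 256-bit VAES+AVX2 algorithms on these CPUs. */
		for (i = 0; i < ARRAY_SIZE(skcipher_algs_vaes_avx512); i++)
			skcipher_algs_vaes_avx512[i].base.cra_priority = 1;
	}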

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/aes-ctr-avx-x86_64.S |  47 ++++-------
 arch/x86/crypto/aes-xts-avx-x86_64.S | 118 ++++++++++++---------------
 arch/x86/crypto/aesni-intel_glue.c   |  30 +++----
 3 files changed, 74 insertions(+), 121 deletions(-)

diff --git a/arch/x86/crypto/aes-ctr-avx-x86_64.S b/arch/x86/crypto/aes-ctr-avx-x86_64.S
index 1685d8b24b2ca..bbbfd80f5a502 100644
--- a/arch/x86/crypto/aes-ctr-avx-x86_64.S
+++ b/arch/x86/crypto/aes-ctr-avx-x86_64.S
@@ -46,12 +46,11 @@
 //
 // This file contains x86_64 assembly implementations of AES-CTR and AES-XCTR
 // using the following sets of CPU features:
 //	- AES-NI && AVX
 //	- VAES && AVX2
-//	- VAES && (AVX10/256 || (AVX512BW && AVX512VL)) && BMI2
-//	- VAES && (AVX10/512 || (AVX512BW && AVX512VL)) && BMI2
+//	- VAES && AVX512BW && AVX512VL && BMI2
 //
 // See the function definitions at the bottom of the file for more information.
 
 #include <linux/linkage.h>
 #include <linux/cfi_types.h>
@@ -74,31 +73,29 @@
 	.quad	4, 0
 
 .text
 
 // Move a vector between memory and a register.
-// The register operand must be in the first 16 vector registers.
 .macro	_vmovdqu	src, dst
 .if VL < 64
 	vmovdqu		\src, \dst
 .else
 	vmovdqu8	\src, \dst
 .endif
 .endm
 
 // Move a vector between registers.
-// The registers must be in the first 16 vector registers.
 .macro	_vmovdqa	src, dst
 .if VL < 64
 	vmovdqa		\src, \dst
 .else
 	vmovdqa64	\src, \dst
 .endif
 .endm
 
 // Broadcast a 128-bit value from memory to all 128-bit lanes of a vector
-// register.  The register operand must be in the first 16 vector registers.
+// register.
 .macro	_vbroadcast128	src, dst
 .if VL == 16
 	vmovdqu		\src, \dst
 .elseif VL == 32
 	vbroadcasti128	\src, \dst
@@ -106,11 +103,10 @@
 	vbroadcasti32x4	\src, \dst
 .endif
 .endm
 
 // XOR two vectors together.
-// Any register operands must be in the first 16 vector registers.
 .macro	_vpxor	src1, src2, dst
 .if VL < 64
 	vpxor		\src1, \src2, \dst
 .else
 	vpxord		\src1, \src2, \dst
@@ -197,20 +193,20 @@
 
 // Prepare the next two vectors of AES inputs in AESDATA\i0 and AESDATA\i1, and
 // XOR each with the zero-th round key.  Also update LE_CTR if !\final.
 .macro	_prepare_2_ctr_vecs	is_xctr, i0, i1, final=0
 .if \is_xctr
-  .if USE_AVX10
-	_vmovdqa	LE_CTR, AESDATA\i0
+  .if USE_AVX512
+	vmovdqa64	LE_CTR, AESDATA\i0
 	vpternlogd	$0x96, XCTR_IV, RNDKEY0, AESDATA\i0
   .else
 	vpxor		XCTR_IV, LE_CTR, AESDATA\i0
 	vpxor		RNDKEY0, AESDATA\i0, AESDATA\i0
   .endif
 	vpaddq		LE_CTR_INC1, LE_CTR, AESDATA\i1
 
-  .if USE_AVX10
+  .if USE_AVX512
 	vpternlogd	$0x96, XCTR_IV, RNDKEY0, AESDATA\i1
   .else
 	vpxor		XCTR_IV, AESDATA\i1, AESDATA\i1
 	vpxor		RNDKEY0, AESDATA\i1, AESDATA\i1
   .endif
@@ -479,22 +475,16 @@
 	_vmovdqa	AESDATA3, AESDATA0
 
 .Lxor_tail_partial_vec_0\@:
 	// XOR the remaining 1 <= LEN < VL bytes.  It's easy if masked
 	// loads/stores are available; otherwise it's a bit harder...
-.if USE_AVX10
-  .if VL <= 32
-	mov		$-1, %eax
-	bzhi		LEN, %eax, %eax
-	kmovd		%eax, %k1
-  .else
+.if USE_AVX512
 	mov		$-1, %rax
 	bzhi		LEN64, %rax, %rax
 	kmovq		%rax, %k1
-  .endif
 	vmovdqu8	(SRC), AESDATA1{%k1}{z}
-	_vpxor		AESDATA1, AESDATA0, AESDATA0
+	vpxord		AESDATA1, AESDATA0, AESDATA0
 	vmovdqu8	AESDATA0, (DST){%k1}
 .else
   .if VL == 32
 	cmp		$16, LEN
 	jl		1f
@@ -552,41 +542,32 @@
 // with HCTR2" (https://eprint.iacr.org/2021/1441.pdf).  XCTR is an
 // easier-to-implement variant of CTR that uses little endian byte order and
 // eliminates carries.  |ctr| is the per-message block counter starting at 1.
 
 .set	VL, 16
-.set	USE_AVX10, 0
+.set	USE_AVX512, 0
 SYM_TYPED_FUNC_START(aes_ctr64_crypt_aesni_avx)
 	_aes_ctr_crypt	0
 SYM_FUNC_END(aes_ctr64_crypt_aesni_avx)
 SYM_TYPED_FUNC_START(aes_xctr_crypt_aesni_avx)
 	_aes_ctr_crypt	1
 SYM_FUNC_END(aes_xctr_crypt_aesni_avx)
 
 #if defined(CONFIG_AS_VAES) && defined(CONFIG_AS_VPCLMULQDQ)
 .set	VL, 32
-.set	USE_AVX10, 0
+.set	USE_AVX512, 0
 SYM_TYPED_FUNC_START(aes_ctr64_crypt_vaes_avx2)
 	_aes_ctr_crypt	0
 SYM_FUNC_END(aes_ctr64_crypt_vaes_avx2)
 SYM_TYPED_FUNC_START(aes_xctr_crypt_vaes_avx2)
 	_aes_ctr_crypt	1
 SYM_FUNC_END(aes_xctr_crypt_vaes_avx2)
 
-.set	VL, 32
-.set	USE_AVX10, 1
-SYM_TYPED_FUNC_START(aes_ctr64_crypt_vaes_avx10_256)
-	_aes_ctr_crypt	0
-SYM_FUNC_END(aes_ctr64_crypt_vaes_avx10_256)
-SYM_TYPED_FUNC_START(aes_xctr_crypt_vaes_avx10_256)
-	_aes_ctr_crypt	1
-SYM_FUNC_END(aes_xctr_crypt_vaes_avx10_256)
-
 .set	VL, 64
-.set	USE_AVX10, 1
-SYM_TYPED_FUNC_START(aes_ctr64_crypt_vaes_avx10_512)
+.set	USE_AVX512, 1
+SYM_TYPED_FUNC_START(aes_ctr64_crypt_vaes_avx512)
 	_aes_ctr_crypt	0
-SYM_FUNC_END(aes_ctr64_crypt_vaes_avx10_512)
-SYM_TYPED_FUNC_START(aes_xctr_crypt_vaes_avx10_512)
+SYM_FUNC_END(aes_ctr64_crypt_vaes_avx512)
+SYM_TYPED_FUNC_START(aes_xctr_crypt_vaes_avx512)
 	_aes_ctr_crypt	1
-SYM_FUNC_END(aes_xctr_crypt_vaes_avx10_512)
+SYM_FUNC_END(aes_xctr_crypt_vaes_avx512)
 #endif // CONFIG_AS_VAES && CONFIG_AS_VPCLMULQDQ
diff --git a/arch/x86/crypto/aes-xts-avx-x86_64.S b/arch/x86/crypto/aes-xts-avx-x86_64.S
index 93ba0ddbe0092..bbeaccbd1c51f 100644
--- a/arch/x86/crypto/aes-xts-avx-x86_64.S
+++ b/arch/x86/crypto/aes-xts-avx-x86_64.S
@@ -50,36 +50,29 @@
  * This file implements AES-XTS for modern x86_64 CPUs.  To handle the
  * complexities of coding for x86 SIMD, e.g. where every vector length needs
  * different code, it uses a macro to generate several implementations that
  * share similar source code but are targeted at different CPUs, listed below:
  *
- * AES-NI + AVX
+ * AES-NI && AVX
  *    - 128-bit vectors (1 AES block per vector)
  *    - VEX-coded instructions
  *    - xmm0-xmm15
  *    - This is for older CPUs that lack VAES but do have AVX.
  *
- * VAES + VPCLMULQDQ + AVX2
+ * VAES && VPCLMULQDQ && AVX2
  *    - 256-bit vectors (2 AES blocks per vector)
  *    - VEX-coded instructions
  *    - ymm0-ymm15
- *    - This is for CPUs that have VAES but lack AVX512 or AVX10,
- *      e.g. Intel's Alder Lake and AMD's Zen 3.
+ *    - This is for CPUs that have VAES but either lack AVX512 (e.g. Intel's
+ *      Alder Lake and AMD's Zen 3) or downclock too eagerly when using zmm
+ *      registers (e.g. Intel's Ice Lake).
  *
- * VAES + VPCLMULQDQ + AVX10/256 + BMI2
- *    - 256-bit vectors (2 AES blocks per vector)
+ * VAES && VPCLMULQDQ && AVX512BW && AVX512VL && BMI2
+ *    - 512-bit vectors (4 AES blocks per vector)
  *    - EVEX-coded instructions
- *    - ymm0-ymm31
- *    - This is for CPUs that have AVX512 but where using zmm registers causes
- *      downclocking, and for CPUs that have AVX10/256 but not AVX10/512.
- *    - By "AVX10/256" we really mean (AVX512BW + AVX512VL) || AVX10/256.
- *      To avoid confusion with 512-bit, we just write AVX10/256.
- *
- * VAES + VPCLMULQDQ + AVX10/512 + BMI2
- *    - Same as the previous one, but upgrades to 512-bit vectors
- *      (4 AES blocks per vector) in zmm0-zmm31.
- *    - This is for CPUs that have good AVX512 or AVX10/512 support.
+ *    - zmm0-zmm31
+ *    - This is for CPUs that have good AVX512 support.
  *
  * This file doesn't have an implementation for AES-NI alone (without AVX), as
  * the lack of VEX would make all the assembly code different.
  *
  * When we use VAES, we also use VPCLMULQDQ to parallelize the computation of
@@ -107,11 +100,11 @@
 	// exists when there's a carry out of the low 64 bits of the tweak.
 	.quad	0x87, 1
 
 	// This table contains constants for vpshufb and vpblendvb, used to
 	// handle variable byte shifts and blending during ciphertext stealing
-	// on CPUs that don't support AVX10-style masking.
+	// on CPUs that don't support AVX512-style masking.
 .Lcts_permute_table:
 	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
 	.byte	0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
 	.byte	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
 	.byte	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
@@ -136,11 +129,11 @@
 	// are available, that map to the xmm, ymm, or zmm registers according
 	// to the selected Vector Length (VL).
 .irp i, 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
 	_define_Vi	\i
 .endr
-.if USE_AVX10
+.if USE_AVX512
 .irp i, 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
 	_define_Vi	\i
 .endr
 .endif
 
@@ -191,11 +184,11 @@
 	// AES-128, AES-192, and AES-256 use different numbers of round keys.
 	// To allow handling all three variants efficiently, we align the round
 	// keys to the *end* of this register range.  I.e., AES-128 uses
 	// KEY5-KEY14, AES-192 uses KEY3-KEY14, and AES-256 uses KEY1-KEY14.
 	// (All also use KEY0 for the XOR-only "round" at the beginning.)
-.if USE_AVX10
+.if USE_AVX512
 	.set	KEY1_XMM,	%xmm16
 	.set	KEY1,		V16
 	.set	KEY2_XMM,	%xmm17
 	.set	KEY2,		V17
 	.set	KEY3_XMM,	%xmm18
@@ -225,43 +218,41 @@
 .endif
 	// V30-V31 are currently unused.
 .endm
 
 // Move a vector between memory and a register.
-// The register operand must be in the first 16 vector registers.
 .macro	_vmovdqu	src, dst
 .if VL < 64
 	vmovdqu		\src, \dst
 .else
 	vmovdqu8	\src, \dst
 .endif
 .endm
 
 // Broadcast a 128-bit value into a vector.
 .macro	_vbroadcast128	src, dst
-.if VL == 16 && !USE_AVX10
+.if VL == 16
 	vmovdqu		\src, \dst
-.elseif VL == 32 && !USE_AVX10
+.elseif VL == 32
 	vbroadcasti128	\src, \dst
 .else
 	vbroadcasti32x4	\src, \dst
 .endif
 .endm
 
 // XOR two vectors together.
-// Any register operands must be in the first 16 vector registers.
 .macro	_vpxor	src1, src2, dst
 .if VL < 64
 	vpxor		\src1, \src2, \dst
 .else
 	vpxord		\src1, \src2, \dst
 .endif
 .endm
 
 // XOR three vectors together.
 .macro	_xor3	src1, src2, src3_and_dst
-.if USE_AVX10
+.if USE_AVX512
 	// vpternlogd with immediate 0x96 is a three-argument XOR.
 	vpternlogd	$0x96, \src1, \src2, \src3_and_dst
 .else
 	vpxor		\src1, \src3_and_dst, \src3_and_dst
 	vpxor		\src2, \src3_and_dst, \src3_and_dst
@@ -272,11 +263,11 @@
 // (by multiplying by the polynomial 'x') and write it to \dst.
 .macro	_next_tweak	src, tmp, dst
 	vpshufd		$0x13, \src, \tmp
 	vpaddq		\src, \src, \dst
 	vpsrad		$31, \tmp, \tmp
-.if USE_AVX10
+.if USE_AVX512
 	vpternlogd	$0x78, GF_POLY_XMM, \tmp, \dst
 .else
 	vpand		GF_POLY_XMM, \tmp, \tmp
 	vpxor		\tmp, \dst, \dst
 .endif
@@ -335,11 +326,11 @@
 	vpslldq		$8, V2, V2
 	vpslldq		$8, V4, V4
 	vpsllq		$1*VL/16, TWEAK0, TWEAK1
 	vpsllq		$2*VL/16, TWEAK0, TWEAK2
 	vpsllq		$3*VL/16, TWEAK0, TWEAK3
-.if USE_AVX10
+.if USE_AVX512
 	vpternlogd	$0x96, V0, V1, TWEAK1
 	vpternlogd	$0x96, V2, V3, TWEAK2
 	vpternlogd	$0x96, V4, V5, TWEAK3
 .else
 	vpxor		V0, TWEAK1, TWEAK1
@@ -472,30 +463,30 @@
 	// interleave the AES rounds with the XTS tweak computation, and (c) it
 	// seems unwise to rely *too* heavily on the CPU's branch predictor.
 	lea		OFFS-16(KEY, KEYLEN64, 4), KEY
 
 	// If all 32 SIMD registers are available, cache all the round keys.
-.if USE_AVX10
+.if USE_AVX512
 	cmp		$24, KEYLEN
 	jl		.Laes128\@
 	je		.Laes192\@
-	_vbroadcast128	-6*16(KEY), KEY1
-	_vbroadcast128	-5*16(KEY), KEY2
+	vbroadcasti32x4	-6*16(KEY), KEY1
+	vbroadcasti32x4	-5*16(KEY), KEY2
 .Laes192\@:
-	_vbroadcast128	-4*16(KEY), KEY3
-	_vbroadcast128	-3*16(KEY), KEY4
+	vbroadcasti32x4	-4*16(KEY), KEY3
+	vbroadcasti32x4	-3*16(KEY), KEY4
 .Laes128\@:
-	_vbroadcast128	-2*16(KEY), KEY5
-	_vbroadcast128	-1*16(KEY), KEY6
-	_vbroadcast128	0*16(KEY), KEY7
-	_vbroadcast128	1*16(KEY), KEY8
-	_vbroadcast128	2*16(KEY), KEY9
-	_vbroadcast128	3*16(KEY), KEY10
-	_vbroadcast128	4*16(KEY), KEY11
-	_vbroadcast128	5*16(KEY), KEY12
-	_vbroadcast128	6*16(KEY), KEY13
-	_vbroadcast128	7*16(KEY), KEY14
+	vbroadcasti32x4	-2*16(KEY), KEY5
+	vbroadcasti32x4	-1*16(KEY), KEY6
+	vbroadcasti32x4	0*16(KEY), KEY7
+	vbroadcasti32x4	1*16(KEY), KEY8
+	vbroadcasti32x4	2*16(KEY), KEY9
+	vbroadcasti32x4	3*16(KEY), KEY10
+	vbroadcasti32x4	4*16(KEY), KEY11
+	vbroadcasti32x4	5*16(KEY), KEY12
+	vbroadcasti32x4	6*16(KEY), KEY13
+	vbroadcasti32x4	7*16(KEY), KEY14
 .endif
 .endm
 
 // Do a single non-last round of AES encryption (if \enc==1) or decryption (if
 // \enc==0) on the block(s) in \data using the round key(s) in \key.  The
@@ -519,11 +510,11 @@
 
 // Do a single non-last round of AES en/decryption on the block(s) in \data,
 // using the same key for all block(s).  The round key is loaded from the
 // appropriate register or memory location for round \i.  May clobber \tmp.
 .macro _vaes_1x		enc, i, xmm_suffix, data, tmp
-.if USE_AVX10
+.if USE_AVX512
 	_vaes		\enc, KEY\i\xmm_suffix, \data
 .else
 .ifnb \xmm_suffix
 	_vaes		\enc, (\i-7)*16(KEY), \data
 .else
@@ -536,11 +527,11 @@
 // Do a single non-last round of AES en/decryption on the blocks in registers
 // V0-V3, using the same key for all blocks.  The round key is loaded from the
 // appropriate register or memory location for round \i.  In addition, does two
 // steps of the computation of the next set of tweaks.  May clobber V4 and V5.
 .macro	_vaes_4x	enc, i
-.if USE_AVX10
+.if USE_AVX512
 	_tweak_step	(2*(\i-5))
 	_vaes		\enc, KEY\i, V0
 	_vaes		\enc, KEY\i, V1
 	_tweak_step	(2*(\i-5) + 1)
 	_vaes		\enc, KEY\i, V2
@@ -572,11 +563,11 @@
 	_vaes_1x	\enc, 4, \xmm_suffix, \data, tmp=\tmp
 .Laes128\@:
 .irp i, 5,6,7,8,9,10,11,12,13
 	_vaes_1x	\enc, \i, \xmm_suffix, \data, tmp=\tmp
 .endr
-.if USE_AVX10
+.if USE_AVX512
 	vpxord		KEY14\xmm_suffix, \tweak, \tmp
 .else
 .ifnb \xmm_suffix
 	vpxor		7*16(KEY), \tweak, \tmp
 .else
@@ -615,15 +606,15 @@
 
 .Lmain_loop\@:
 	// This is the main loop, en/decrypting 4*VL bytes per iteration.
 
 	// XOR each source block with its tweak and the zero-th round key.
-.if USE_AVX10
-	_vmovdqu	0*VL(SRC), V0
-	_vmovdqu	1*VL(SRC), V1
-	_vmovdqu	2*VL(SRC), V2
-	_vmovdqu	3*VL(SRC), V3
+.if USE_AVX512
+	vmovdqu8	0*VL(SRC), V0
+	vmovdqu8	1*VL(SRC), V1
+	vmovdqu8	2*VL(SRC), V2
+	vmovdqu8	3*VL(SRC), V3
 	vpternlogd	$0x96, TWEAK0, KEY0, V0
 	vpternlogd	$0x96, TWEAK1, KEY0, V1
 	vpternlogd	$0x96, TWEAK2, KEY0, V2
 	vpternlogd	$0x96, TWEAK3, KEY0, V3
 .else
@@ -652,11 +643,11 @@
 .endr
 	// Do the last AES round, then XOR the results with the tweaks again.
 	// Reduce latency by doing the XOR before the vaesenclast, utilizing the
 	// property vaesenclast(key, a) ^ b == vaesenclast(key ^ b, a)
 	// (and likewise for vaesdeclast).
-.if USE_AVX10
+.if USE_AVX512
 	_tweak_step	18
 	_tweak_step	19
 	vpxord		TWEAK0, KEY14, V4
 	vpxord		TWEAK1, KEY14, V5
 	_vaeslast	\enc, V4, V0
@@ -760,11 +751,11 @@
 	_next_tweak	TWEAK0_XMM, %xmm0, TWEAK1_XMM
 	vmovdqu		(SRC), %xmm0
 	_aes_crypt	\enc, _XMM, TWEAK1_XMM, %xmm0, tmp=%xmm1
 .endif
 
-.if USE_AVX10
+.if USE_AVX512
 	// Create a mask that has the first LEN bits set.
 	mov		$-1, %r9d
 	bzhi		LEN, %r9d, %r9d
 	kmovd		%r9d, %k1
 
@@ -809,11 +800,11 @@
 
 // void aes_xts_encrypt_iv(const struct crypto_aes_ctx *tweak_key,
 //			   u8 iv[AES_BLOCK_SIZE]);
 //
 // Encrypt |iv| using the AES key |tweak_key| to get the first tweak.  Assumes
-// that the CPU supports AES-NI and AVX, but not necessarily VAES or AVX10.
+// that the CPU supports AES-NI and AVX, but not necessarily VAES or AVX512.
 SYM_TYPED_FUNC_START(aes_xts_encrypt_iv)
 	.set	TWEAK_KEY,	%rdi
 	.set	IV,		%rsi
 	.set	KEYLEN,		%eax
 	.set	KEYLEN64,	%rax
@@ -851,41 +842,32 @@ SYM_FUNC_END(aes_xts_encrypt_iv)
 // incremental computation, but |len| must always be >= 16 (AES_BLOCK_SIZE), and
 // |len| must be a multiple of 16 except on the last call.  If |len| is a
 // multiple of 16, then this function updates |tweak| to contain the next tweak.
 
 .set	VL, 16
-.set	USE_AVX10, 0
+.set	USE_AVX512, 0
 SYM_TYPED_FUNC_START(aes_xts_encrypt_aesni_avx)
 	_aes_xts_crypt	1
 SYM_FUNC_END(aes_xts_encrypt_aesni_avx)
 SYM_TYPED_FUNC_START(aes_xts_decrypt_aesni_avx)
 	_aes_xts_crypt	0
 SYM_FUNC_END(aes_xts_decrypt_aesni_avx)
 
 #if defined(CONFIG_AS_VAES) && defined(CONFIG_AS_VPCLMULQDQ)
 .set	VL, 32
-.set	USE_AVX10, 0
+.set	USE_AVX512, 0
 SYM_TYPED_FUNC_START(aes_xts_encrypt_vaes_avx2)
 	_aes_xts_crypt	1
 SYM_FUNC_END(aes_xts_encrypt_vaes_avx2)
 SYM_TYPED_FUNC_START(aes_xts_decrypt_vaes_avx2)
 	_aes_xts_crypt	0
 SYM_FUNC_END(aes_xts_decrypt_vaes_avx2)
 
-.set	VL, 32
-.set	USE_AVX10, 1
-SYM_TYPED_FUNC_START(aes_xts_encrypt_vaes_avx10_256)
-	_aes_xts_crypt	1
-SYM_FUNC_END(aes_xts_encrypt_vaes_avx10_256)
-SYM_TYPED_FUNC_START(aes_xts_decrypt_vaes_avx10_256)
-	_aes_xts_crypt	0
-SYM_FUNC_END(aes_xts_decrypt_vaes_avx10_256)
-
 .set	VL, 64
-.set	USE_AVX10, 1
-SYM_TYPED_FUNC_START(aes_xts_encrypt_vaes_avx10_512)
+.set	USE_AVX512, 1
+SYM_TYPED_FUNC_START(aes_xts_encrypt_vaes_avx512)
 	_aes_xts_crypt	1
-SYM_FUNC_END(aes_xts_encrypt_vaes_avx10_512)
-SYM_TYPED_FUNC_START(aes_xts_decrypt_vaes_avx10_512)
+SYM_FUNC_END(aes_xts_encrypt_vaes_avx512)
+SYM_TYPED_FUNC_START(aes_xts_decrypt_vaes_avx512)
 	_aes_xts_crypt	0
-SYM_FUNC_END(aes_xts_decrypt_vaes_avx10_512)
+SYM_FUNC_END(aes_xts_decrypt_vaes_avx512)
 #endif /* CONFIG_AS_VAES && CONFIG_AS_VPCLMULQDQ */
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index bc655d794a95c..5bdeda39cef65 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -842,12 +842,11 @@ static struct simd_skcipher_alg *					       \
 simd_skcipher_algs_##suffix[ARRAY_SIZE(skcipher_algs_##suffix)]
 
 DEFINE_AVX_SKCIPHER_ALGS(aesni_avx, "aesni-avx", 500);
 #if defined(CONFIG_AS_VAES) && defined(CONFIG_AS_VPCLMULQDQ)
 DEFINE_AVX_SKCIPHER_ALGS(vaes_avx2, "vaes-avx2", 600);
-DEFINE_AVX_SKCIPHER_ALGS(vaes_avx10_256, "vaes-avx10_256", 700);
-DEFINE_AVX_SKCIPHER_ALGS(vaes_avx10_512, "vaes-avx10_512", 800);
+DEFINE_AVX_SKCIPHER_ALGS(vaes_avx512, "vaes-avx512", 800);
 #endif
 
 /* The common part of the x86_64 AES-GCM key struct */
 struct aes_gcm_key {
 	/* Expanded AES key and the AES key length in bytes */
@@ -1590,33 +1589,28 @@ static int __init register_avx_algs(void)
 	    !boot_cpu_has(X86_FEATURE_BMI2) ||
 	    !cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM |
 			       XFEATURE_MASK_AVX512, NULL))
 		return 0;
 
-	err = simd_register_skciphers_compat(skcipher_algs_vaes_avx10_256,
-					     ARRAY_SIZE(skcipher_algs_vaes_avx10_256),
-					     simd_skcipher_algs_vaes_avx10_256);
-	if (err)
-		return err;
 	err = simd_register_aeads_compat(aes_gcm_algs_vaes_avx10_256,
 					 ARRAY_SIZE(aes_gcm_algs_vaes_avx10_256),
 					 aes_gcm_simdalgs_vaes_avx10_256);
 	if (err)
 		return err;
 
 	if (boot_cpu_has(X86_FEATURE_PREFER_YMM)) {
 		int i;
 
-		for (i = 0; i < ARRAY_SIZE(skcipher_algs_vaes_avx10_512); i++)
-			skcipher_algs_vaes_avx10_512[i].base.cra_priority = 1;
+		for (i = 0; i < ARRAY_SIZE(skcipher_algs_vaes_avx512); i++)
+			skcipher_algs_vaes_avx512[i].base.cra_priority = 1;
 		for (i = 0; i < ARRAY_SIZE(aes_gcm_algs_vaes_avx10_512); i++)
 			aes_gcm_algs_vaes_avx10_512[i].base.cra_priority = 1;
 	}
 
-	err = simd_register_skciphers_compat(skcipher_algs_vaes_avx10_512,
-					     ARRAY_SIZE(skcipher_algs_vaes_avx10_512),
-					     simd_skcipher_algs_vaes_avx10_512);
+	err = simd_register_skciphers_compat(skcipher_algs_vaes_avx512,
+					     ARRAY_SIZE(skcipher_algs_vaes_avx512),
+					     simd_skcipher_algs_vaes_avx512);
 	if (err)
 		return err;
 	err = simd_register_aeads_compat(aes_gcm_algs_vaes_avx10_512,
 					 ARRAY_SIZE(aes_gcm_algs_vaes_avx10_512),
 					 aes_gcm_simdalgs_vaes_avx10_512);
@@ -1639,22 +1633,18 @@ static void unregister_avx_algs(void)
 #if defined(CONFIG_AS_VAES) && defined(CONFIG_AS_VPCLMULQDQ)
 	if (simd_skcipher_algs_vaes_avx2[0])
 		simd_unregister_skciphers(skcipher_algs_vaes_avx2,
 					  ARRAY_SIZE(skcipher_algs_vaes_avx2),
 					  simd_skcipher_algs_vaes_avx2);
-	if (simd_skcipher_algs_vaes_avx10_256[0])
-		simd_unregister_skciphers(skcipher_algs_vaes_avx10_256,
-					  ARRAY_SIZE(skcipher_algs_vaes_avx10_256),
-					  simd_skcipher_algs_vaes_avx10_256);
 	if (aes_gcm_simdalgs_vaes_avx10_256[0])
 		simd_unregister_aeads(aes_gcm_algs_vaes_avx10_256,
 				      ARRAY_SIZE(aes_gcm_algs_vaes_avx10_256),
 				      aes_gcm_simdalgs_vaes_avx10_256);
-	if (simd_skcipher_algs_vaes_avx10_512[0])
-		simd_unregister_skciphers(skcipher_algs_vaes_avx10_512,
-					  ARRAY_SIZE(skcipher_algs_vaes_avx10_512),
-					  simd_skcipher_algs_vaes_avx10_512);
+	if (simd_skcipher_algs_vaes_avx512[0])
+		simd_unregister_skciphers(skcipher_algs_vaes_avx512,
+					  ARRAY_SIZE(skcipher_algs_vaes_avx512),
+					  simd_skcipher_algs_vaes_avx512);
 	if (aes_gcm_simdalgs_vaes_avx10_512[0])
 		simd_unregister_aeads(aes_gcm_algs_vaes_avx10_512,
 				      ARRAY_SIZE(aes_gcm_algs_vaes_avx10_512),
 				      aes_gcm_simdalgs_vaes_avx10_512);
 #endif
-- 
2.49.0



* [PATCH v2 2/9] crypto: x86/aegis - stop using the SIMD helper
  2025-04-02  0:24 [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper Eric Biggers
  2025-04-02  0:24 ` [PATCH v2 1/9] crypto: x86/aes - drop the avx10_256 AES-XTS and AES-CTR code Eric Biggers
@ 2025-04-02  0:24 ` Eric Biggers
  2025-04-02  0:24 ` [PATCH v2 3/9] crypto: x86/aes " Eric Biggers
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Eric Biggers @ 2025-04-02  0:24 UTC (permalink / raw)
  To: linux-crypto; +Cc: linux-kernel, x86

From: Eric Biggers <ebiggers@google.com>

Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c).  The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs.  Specifically, if a
hardirq interrupted a task-context kernel-mode FPU section and then
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU.  This has now been fixed.  In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these algorithms can now just use kernel-mode FPU
unconditionally on x86.

This simplifies the code and improves performance.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/Kconfig               |  1 -
 arch/x86/crypto/aegis128-aesni-glue.c | 13 ++++---------
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index 3d948f10c94cd..c15400efac075 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -367,11 +367,10 @@ config CRYPTO_CHACHA20_X86_64
 
 config CRYPTO_AEGIS128_AESNI_SSE2
 	tristate "AEAD ciphers: AEGIS-128 (AES-NI/SSE4.1)"
 	depends on X86 && 64BIT
 	select CRYPTO_AEAD
-	select CRYPTO_SIMD
 	help
 	  AEGIS-128 AEAD algorithm
 
 	  Architecture: x86_64 using:
 	  - AES-NI (AES New Instructions)
diff --git a/arch/x86/crypto/aegis128-aesni-glue.c b/arch/x86/crypto/aegis128-aesni-glue.c
index 26786e15abacf..f1b6d40154e35 100644
--- a/arch/x86/crypto/aegis128-aesni-glue.c
+++ b/arch/x86/crypto/aegis128-aesni-glue.c
@@ -6,11 +6,10 @@
  * Copyright (c) 2017-2018 Ondrej Mosnacek <omosnacek@gmail.com>
  * Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
  */
 
 #include <crypto/internal/aead.h>
-#include <crypto/internal/simd.h>
 #include <crypto/internal/skcipher.h>
 #include <crypto/scatterwalk.h>
 #include <linux/module.h>
 #include <asm/fpu/api.h>
 #include <asm/cpu_device_id.h>
@@ -231,39 +230,35 @@ static struct aead_alg crypto_aegis128_aesni_alg = {
 	.ivsize = AEGIS128_NONCE_SIZE,
 	.maxauthsize = AEGIS128_MAX_AUTH_SIZE,
 	.chunksize = AEGIS128_BLOCK_SIZE,
 
 	.base = {
-		.cra_flags = CRYPTO_ALG_INTERNAL,
 		.cra_blocksize = 1,
 		.cra_ctxsize = sizeof(struct aegis_ctx) +
 			       __alignof__(struct aegis_ctx),
 		.cra_priority = 400,
 
-		.cra_name = "__aegis128",
-		.cra_driver_name = "__aegis128-aesni",
+		.cra_name = "aegis128",
+		.cra_driver_name = "aegis128-aesni",
 
 		.cra_module = THIS_MODULE,
 	}
 };
 
-static struct simd_aead_alg *simd_alg;
-
 static int __init crypto_aegis128_aesni_module_init(void)
 {
 	if (!boot_cpu_has(X86_FEATURE_XMM4_1) ||
 	    !boot_cpu_has(X86_FEATURE_AES) ||
 	    !cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL))
 		return -ENODEV;
 
-	return simd_register_aeads_compat(&crypto_aegis128_aesni_alg, 1,
-					  &simd_alg);
+	return crypto_register_aead(&crypto_aegis128_aesni_alg);
 }
 
 static void __exit crypto_aegis128_aesni_module_exit(void)
 {
-	simd_unregister_aeads(&crypto_aegis128_aesni_alg, 1, &simd_alg);
+	crypto_unregister_aead(&crypto_aegis128_aesni_alg);
 }
 
 module_init(crypto_aegis128_aesni_module_init);
 module_exit(crypto_aegis128_aesni_module_exit);
 
-- 
2.49.0



* [PATCH v2 3/9] crypto: x86/aes - stop using the SIMD helper
  2025-04-02  0:24 [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper Eric Biggers
  2025-04-02  0:24 ` [PATCH v2 1/9] crypto: x86/aes - drop the avx10_256 AES-XTS and AES-CTR code Eric Biggers
  2025-04-02  0:24 ` [PATCH v2 2/9] crypto: x86/aegis - stop using the SIMD helper Eric Biggers
@ 2025-04-02  0:24 ` Eric Biggers
  2025-04-02  0:24 ` [PATCH v2 4/9] crypto: x86/aria " Eric Biggers
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Eric Biggers @ 2025-04-02  0:24 UTC (permalink / raw)
  To: linux-crypto; +Cc: linux-kernel, x86

From: Eric Biggers <ebiggers@google.com>

Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c).  The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs.  Specifically, if a
hardirq interrupted a task-context kernel-mode FPU section and then
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU.  This has now been fixed.  In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these algorithms can now just use kernel-mode FPU
unconditionally on x86.

This simplifies the code and improves performance.
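
One detail worth noting in the glue diff below: with the simd_*_alg
pointer arrays gone, unregister_avx_algs() can no longer test those
pointers to tell whether a given algorithm array was actually
registered.  The replacement checks cra_refcnt instead, which the
crypto core sets nonzero at registration time:

#define unregister_skciphers(A) \
	if (refcount_read(&(A)[0].base.cra_refcnt) != 0) \
		crypto_unregister_skciphers((A), ARRAY_SIZE(A))
#define unregister_aeads(A) \
	if (refcount_read(&(A)[0].base.cra_refcnt) != 0) \
		crypto_unregister_aeads((A), ARRAY_SIZE(A))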

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/Kconfig            |   1 -
 arch/x86/crypto/aesni-intel_glue.c | 158 +++++++++++------------------
 2 files changed, 59 insertions(+), 100 deletions(-)

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index c15400efac075..d8f9d6279cb26 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -21,11 +21,10 @@ config CRYPTO_AES_NI_INTEL
 	select CRYPTO_AEAD
 	select CRYPTO_LIB_AES
 	select CRYPTO_LIB_GF128MUL
 	select CRYPTO_ALGAPI
 	select CRYPTO_SKCIPHER
-	select CRYPTO_SIMD
 	help
 	  Block cipher: AES cipher algorithms
 	  AEAD cipher: AES with GCM
 	  Length-preserving ciphers: AES with ECB, CBC, CTS, CTR, XCTR, XTS
 
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 5bdeda39cef65..061b1ced93c51 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -564,14 +564,13 @@ static struct crypto_alg aesni_cipher_alg = {
 };
 
 static struct skcipher_alg aesni_skciphers[] = {
 	{
 		.base = {
-			.cra_name		= "__ecb(aes)",
-			.cra_driver_name	= "__ecb-aes-aesni",
+			.cra_name		= "ecb(aes)",
+			.cra_driver_name	= "ecb-aes-aesni",
 			.cra_priority		= 400,
-			.cra_flags		= CRYPTO_ALG_INTERNAL,
 			.cra_blocksize		= AES_BLOCK_SIZE,
 			.cra_ctxsize		= CRYPTO_AES_CTX_SIZE,
 			.cra_module		= THIS_MODULE,
 		},
 		.min_keysize	= AES_MIN_KEY_SIZE,
@@ -579,14 +578,13 @@ static struct skcipher_alg aesni_skciphers[] = {
 		.setkey		= aesni_skcipher_setkey,
 		.encrypt	= ecb_encrypt,
 		.decrypt	= ecb_decrypt,
 	}, {
 		.base = {
-			.cra_name		= "__cbc(aes)",
-			.cra_driver_name	= "__cbc-aes-aesni",
+			.cra_name		= "cbc(aes)",
+			.cra_driver_name	= "cbc-aes-aesni",
 			.cra_priority		= 400,
-			.cra_flags		= CRYPTO_ALG_INTERNAL,
 			.cra_blocksize		= AES_BLOCK_SIZE,
 			.cra_ctxsize		= CRYPTO_AES_CTX_SIZE,
 			.cra_module		= THIS_MODULE,
 		},
 		.min_keysize	= AES_MIN_KEY_SIZE,
@@ -595,14 +593,13 @@ static struct skcipher_alg aesni_skciphers[] = {
 		.setkey		= aesni_skcipher_setkey,
 		.encrypt	= cbc_encrypt,
 		.decrypt	= cbc_decrypt,
 	}, {
 		.base = {
-			.cra_name		= "__cts(cbc(aes))",
-			.cra_driver_name	= "__cts-cbc-aes-aesni",
+			.cra_name		= "cts(cbc(aes))",
+			.cra_driver_name	= "cts-cbc-aes-aesni",
 			.cra_priority		= 400,
-			.cra_flags		= CRYPTO_ALG_INTERNAL,
 			.cra_blocksize		= AES_BLOCK_SIZE,
 			.cra_ctxsize		= CRYPTO_AES_CTX_SIZE,
 			.cra_module		= THIS_MODULE,
 		},
 		.min_keysize	= AES_MIN_KEY_SIZE,
@@ -613,14 +610,13 @@ static struct skcipher_alg aesni_skciphers[] = {
 		.encrypt	= cts_cbc_encrypt,
 		.decrypt	= cts_cbc_decrypt,
 #ifdef CONFIG_X86_64
 	}, {
 		.base = {
-			.cra_name		= "__ctr(aes)",
-			.cra_driver_name	= "__ctr-aes-aesni",
+			.cra_name		= "ctr(aes)",
+			.cra_driver_name	= "ctr-aes-aesni",
 			.cra_priority		= 400,
-			.cra_flags		= CRYPTO_ALG_INTERNAL,
 			.cra_blocksize		= 1,
 			.cra_ctxsize		= CRYPTO_AES_CTX_SIZE,
 			.cra_module		= THIS_MODULE,
 		},
 		.min_keysize	= AES_MIN_KEY_SIZE,
@@ -631,14 +627,13 @@ static struct skcipher_alg aesni_skciphers[] = {
 		.encrypt	= ctr_crypt_aesni,
 		.decrypt	= ctr_crypt_aesni,
 #endif
 	}, {
 		.base = {
-			.cra_name		= "__xts(aes)",
-			.cra_driver_name	= "__xts-aes-aesni",
+			.cra_name		= "xts(aes)",
+			.cra_driver_name	= "xts-aes-aesni",
 			.cra_priority		= 401,
-			.cra_flags		= CRYPTO_ALG_INTERNAL,
 			.cra_blocksize		= AES_BLOCK_SIZE,
 			.cra_ctxsize		= XTS_AES_CTX_SIZE,
 			.cra_module		= THIS_MODULE,
 		},
 		.min_keysize	= 2 * AES_MIN_KEY_SIZE,
@@ -649,13 +644,10 @@ static struct skcipher_alg aesni_skciphers[] = {
 		.encrypt	= xts_encrypt_aesni,
 		.decrypt	= xts_decrypt_aesni,
 	}
 };
 
-static
-struct simd_skcipher_alg *aesni_simd_skciphers[ARRAY_SIZE(aesni_skciphers)];
-
 #ifdef CONFIG_X86_64
 asmlinkage void aes_xts_encrypt_iv(const struct crypto_aes_ctx *tweak_key,
 				   u8 iv[AES_BLOCK_SIZE]);
 
 /* __always_inline to avoid indirect call */
@@ -790,14 +782,13 @@ static int xctr_crypt_##suffix(struct skcipher_request *req)		       \
 {									       \
 	return xctr_crypt(req, aes_xctr_crypt_##suffix);		       \
 }									       \
 									       \
 static struct skcipher_alg skcipher_algs_##suffix[] = {{		       \
-	.base.cra_name		= "__xts(aes)",				       \
-	.base.cra_driver_name	= "__xts-aes-" driver_name_suffix,	       \
+	.base.cra_name		= "xts(aes)",				       \
+	.base.cra_driver_name	= "xts-aes-" driver_name_suffix,	       \
 	.base.cra_priority	= priority,				       \
-	.base.cra_flags		= CRYPTO_ALG_INTERNAL,			       \
 	.base.cra_blocksize	= AES_BLOCK_SIZE,			       \
 	.base.cra_ctxsize	= XTS_AES_CTX_SIZE,			       \
 	.base.cra_module	= THIS_MODULE,				       \
 	.min_keysize		= 2 * AES_MIN_KEY_SIZE,			       \
 	.max_keysize		= 2 * AES_MAX_KEY_SIZE,			       \
@@ -805,14 +796,13 @@ static struct skcipher_alg skcipher_algs_##suffix[] = {{		       \
 	.walksize		= 2 * AES_BLOCK_SIZE,			       \
 	.setkey			= xts_setkey_aesni,			       \
 	.encrypt		= xts_encrypt_##suffix,			       \
 	.decrypt		= xts_decrypt_##suffix,			       \
 }, {									       \
-	.base.cra_name		= "__ctr(aes)",				       \
-	.base.cra_driver_name	= "__ctr-aes-" driver_name_suffix,	       \
+	.base.cra_name		= "ctr(aes)",				       \
+	.base.cra_driver_name	= "ctr-aes-" driver_name_suffix,	       \
 	.base.cra_priority	= priority,				       \
-	.base.cra_flags		= CRYPTO_ALG_INTERNAL,			       \
 	.base.cra_blocksize	= 1,					       \
 	.base.cra_ctxsize	= CRYPTO_AES_CTX_SIZE,			       \
 	.base.cra_module	= THIS_MODULE,				       \
 	.min_keysize		= AES_MIN_KEY_SIZE,			       \
 	.max_keysize		= AES_MAX_KEY_SIZE,			       \
@@ -820,28 +810,24 @@ static struct skcipher_alg skcipher_algs_##suffix[] = {{		       \
 	.chunksize		= AES_BLOCK_SIZE,			       \
 	.setkey			= aesni_skcipher_setkey,		       \
 	.encrypt		= ctr_crypt_##suffix,			       \
 	.decrypt		= ctr_crypt_##suffix,			       \
 }, {									       \
-	.base.cra_name		= "__xctr(aes)",			       \
-	.base.cra_driver_name	= "__xctr-aes-" driver_name_suffix,	       \
+	.base.cra_name		= "xctr(aes)",				       \
+	.base.cra_driver_name	= "xctr-aes-" driver_name_suffix,	       \
 	.base.cra_priority	= priority,				       \
-	.base.cra_flags		= CRYPTO_ALG_INTERNAL,			       \
 	.base.cra_blocksize	= 1,					       \
 	.base.cra_ctxsize	= CRYPTO_AES_CTX_SIZE,			       \
 	.base.cra_module	= THIS_MODULE,				       \
 	.min_keysize		= AES_MIN_KEY_SIZE,			       \
 	.max_keysize		= AES_MAX_KEY_SIZE,			       \
 	.ivsize			= AES_BLOCK_SIZE,			       \
 	.chunksize		= AES_BLOCK_SIZE,			       \
 	.setkey			= aesni_skcipher_setkey,		       \
 	.encrypt		= xctr_crypt_##suffix,			       \
 	.decrypt		= xctr_crypt_##suffix,			       \
-}};									       \
-									       \
-static struct simd_skcipher_alg *					       \
-simd_skcipher_algs_##suffix[ARRAY_SIZE(skcipher_algs_##suffix)]
+}}
 
 DEFINE_AVX_SKCIPHER_ALGS(aesni_avx, "aesni-avx", 500);
 #if defined(CONFIG_AS_VAES) && defined(CONFIG_AS_VPCLMULQDQ)
 DEFINE_AVX_SKCIPHER_ALGS(vaes_avx2, "vaes-avx2", 600);
 DEFINE_AVX_SKCIPHER_ALGS(vaes_avx512, "vaes-avx512", 800);
@@ -1496,14 +1482,13 @@ static struct aead_alg aes_gcm_algs_##suffix[] = { {			       \
 	.decrypt		= gcm_decrypt_##suffix,			       \
 	.ivsize			= GCM_AES_IV_SIZE,			       \
 	.chunksize		= AES_BLOCK_SIZE,			       \
 	.maxauthsize		= 16,					       \
 	.base = {							       \
-		.cra_name		= "__gcm(aes)",			       \
-		.cra_driver_name	= "__" generic_driver_name,	       \
+		.cra_name		= "gcm(aes)",			       \
+		.cra_driver_name	= generic_driver_name,		       \
 		.cra_priority		= (priority),			       \
-		.cra_flags		= CRYPTO_ALG_INTERNAL,		       \
 		.cra_blocksize		= 1,				       \
 		.cra_ctxsize		= (ctxsize),			       \
 		.cra_module		= THIS_MODULE,			       \
 	},								       \
 }, {									       \
@@ -1513,21 +1498,18 @@ static struct aead_alg aes_gcm_algs_##suffix[] = { {			       \
 	.decrypt		= rfc4106_decrypt_##suffix,		       \
 	.ivsize			= GCM_RFC4106_IV_SIZE,			       \
 	.chunksize		= AES_BLOCK_SIZE,			       \
 	.maxauthsize		= 16,					       \
 	.base = {							       \
-		.cra_name		= "__rfc4106(gcm(aes))",	       \
-		.cra_driver_name	= "__" rfc_driver_name,		       \
+		.cra_name		= "rfc4106(gcm(aes))",		       \
+		.cra_driver_name	= rfc_driver_name,		       \
 		.cra_priority		= (priority),			       \
-		.cra_flags		= CRYPTO_ALG_INTERNAL,		       \
 		.cra_blocksize		= 1,				       \
 		.cra_ctxsize		= (ctxsize),			       \
 		.cra_module		= THIS_MODULE,			       \
 	},								       \
-} };									       \
-									       \
-static struct simd_aead_alg *aes_gcm_simdalgs_##suffix[2]		       \
+} }
 
 /* aes_gcm_algs_aesni */
 DEFINE_GCM_ALGS(aesni, /* no flags */ 0,
 		"generic-gcm-aesni", "rfc4106-gcm-aesni",
 		AES_GCM_KEY_AESNI_SIZE, 400);
@@ -1553,18 +1535,16 @@ static int __init register_avx_algs(void)
 {
 	int err;
 
 	if (!boot_cpu_has(X86_FEATURE_AVX))
 		return 0;
-	err = simd_register_skciphers_compat(skcipher_algs_aesni_avx,
-					     ARRAY_SIZE(skcipher_algs_aesni_avx),
-					     simd_skcipher_algs_aesni_avx);
+	err = crypto_register_skciphers(skcipher_algs_aesni_avx,
+					ARRAY_SIZE(skcipher_algs_aesni_avx));
 	if (err)
 		return err;
-	err = simd_register_aeads_compat(aes_gcm_algs_aesni_avx,
-					 ARRAY_SIZE(aes_gcm_algs_aesni_avx),
-					 aes_gcm_simdalgs_aesni_avx);
+	err = crypto_register_aeads(aes_gcm_algs_aesni_avx,
+				    ARRAY_SIZE(aes_gcm_algs_aesni_avx));
 	if (err)
 		return err;
 	/*
 	 * Note: not all the algorithms registered below actually require
 	 * VPCLMULQDQ.  But in practice every CPU with VAES also has VPCLMULQDQ.
@@ -1576,26 +1556,24 @@ static int __init register_avx_algs(void)
 	    !boot_cpu_has(X86_FEATURE_VAES) ||
 	    !boot_cpu_has(X86_FEATURE_VPCLMULQDQ) ||
 	    !boot_cpu_has(X86_FEATURE_PCLMULQDQ) ||
 	    !cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL))
 		return 0;
-	err = simd_register_skciphers_compat(skcipher_algs_vaes_avx2,
-					     ARRAY_SIZE(skcipher_algs_vaes_avx2),
-					     simd_skcipher_algs_vaes_avx2);
+	err = crypto_register_skciphers(skcipher_algs_vaes_avx2,
+					ARRAY_SIZE(skcipher_algs_vaes_avx2));
 	if (err)
 		return err;
 
 	if (!boot_cpu_has(X86_FEATURE_AVX512BW) ||
 	    !boot_cpu_has(X86_FEATURE_AVX512VL) ||
 	    !boot_cpu_has(X86_FEATURE_BMI2) ||
 	    !cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM |
 			       XFEATURE_MASK_AVX512, NULL))
 		return 0;
 
-	err = simd_register_aeads_compat(aes_gcm_algs_vaes_avx10_256,
-					 ARRAY_SIZE(aes_gcm_algs_vaes_avx10_256),
-					 aes_gcm_simdalgs_vaes_avx10_256);
+	err = crypto_register_aeads(aes_gcm_algs_vaes_avx10_256,
+				    ARRAY_SIZE(aes_gcm_algs_vaes_avx10_256));
 	if (err)
 		return err;
 
 	if (boot_cpu_has(X86_FEATURE_PREFER_YMM)) {
 		int i;
@@ -1604,56 +1582,42 @@ static int __init register_avx_algs(void)
 			skcipher_algs_vaes_avx512[i].base.cra_priority = 1;
 		for (i = 0; i < ARRAY_SIZE(aes_gcm_algs_vaes_avx10_512); i++)
 			aes_gcm_algs_vaes_avx10_512[i].base.cra_priority = 1;
 	}
 
-	err = simd_register_skciphers_compat(skcipher_algs_vaes_avx512,
-					     ARRAY_SIZE(skcipher_algs_vaes_avx512),
-					     simd_skcipher_algs_vaes_avx512);
+	err = crypto_register_skciphers(skcipher_algs_vaes_avx512,
+					ARRAY_SIZE(skcipher_algs_vaes_avx512));
 	if (err)
 		return err;
-	err = simd_register_aeads_compat(aes_gcm_algs_vaes_avx10_512,
-					 ARRAY_SIZE(aes_gcm_algs_vaes_avx10_512),
-					 aes_gcm_simdalgs_vaes_avx10_512);
+	err = crypto_register_aeads(aes_gcm_algs_vaes_avx10_512,
+				    ARRAY_SIZE(aes_gcm_algs_vaes_avx10_512));
 	if (err)
 		return err;
 #endif /* CONFIG_AS_VAES && CONFIG_AS_VPCLMULQDQ */
 	return 0;
 }
 
+#define unregister_skciphers(A) \
+	if (refcount_read(&(A)[0].base.cra_refcnt) != 0) \
+		crypto_unregister_skciphers((A), ARRAY_SIZE(A))
+#define unregister_aeads(A) \
+	if (refcount_read(&(A)[0].base.cra_refcnt) != 0) \
+		crypto_unregister_aeads((A), ARRAY_SIZE(A))
+
 static void unregister_avx_algs(void)
 {
-	if (simd_skcipher_algs_aesni_avx[0])
-		simd_unregister_skciphers(skcipher_algs_aesni_avx,
-					  ARRAY_SIZE(skcipher_algs_aesni_avx),
-					  simd_skcipher_algs_aesni_avx);
-	if (aes_gcm_simdalgs_aesni_avx[0])
-		simd_unregister_aeads(aes_gcm_algs_aesni_avx,
-				      ARRAY_SIZE(aes_gcm_algs_aesni_avx),
-				      aes_gcm_simdalgs_aesni_avx);
+	unregister_skciphers(skcipher_algs_aesni_avx);
+	unregister_aeads(aes_gcm_algs_aesni_avx);
 #if defined(CONFIG_AS_VAES) && defined(CONFIG_AS_VPCLMULQDQ)
-	if (simd_skcipher_algs_vaes_avx2[0])
-		simd_unregister_skciphers(skcipher_algs_vaes_avx2,
-					  ARRAY_SIZE(skcipher_algs_vaes_avx2),
-					  simd_skcipher_algs_vaes_avx2);
-	if (aes_gcm_simdalgs_vaes_avx10_256[0])
-		simd_unregister_aeads(aes_gcm_algs_vaes_avx10_256,
-				      ARRAY_SIZE(aes_gcm_algs_vaes_avx10_256),
-				      aes_gcm_simdalgs_vaes_avx10_256);
-	if (simd_skcipher_algs_vaes_avx512[0])
-		simd_unregister_skciphers(skcipher_algs_vaes_avx512,
-					  ARRAY_SIZE(skcipher_algs_vaes_avx512),
-					  simd_skcipher_algs_vaes_avx512);
-	if (aes_gcm_simdalgs_vaes_avx10_512[0])
-		simd_unregister_aeads(aes_gcm_algs_vaes_avx10_512,
-				      ARRAY_SIZE(aes_gcm_algs_vaes_avx10_512),
-				      aes_gcm_simdalgs_vaes_avx10_512);
+	unregister_skciphers(skcipher_algs_vaes_avx2);
+	unregister_skciphers(skcipher_algs_vaes_avx512);
+	unregister_aeads(aes_gcm_algs_vaes_avx10_256);
+	unregister_aeads(aes_gcm_algs_vaes_avx10_512);
 #endif
 }
 #else /* CONFIG_X86_64 */
 static struct aead_alg aes_gcm_algs_aesni[0];
-static struct simd_aead_alg *aes_gcm_simdalgs_aesni[0];
 
 static int __init register_avx_algs(void)
 {
 	return 0;
 }
@@ -1678,19 +1642,17 @@ static int __init aesni_init(void)
 
 	err = crypto_register_alg(&aesni_cipher_alg);
 	if (err)
 		return err;
 
-	err = simd_register_skciphers_compat(aesni_skciphers,
-					     ARRAY_SIZE(aesni_skciphers),
-					     aesni_simd_skciphers);
+	err = crypto_register_skciphers(aesni_skciphers,
+					ARRAY_SIZE(aesni_skciphers));
 	if (err)
 		goto unregister_cipher;
 
-	err = simd_register_aeads_compat(aes_gcm_algs_aesni,
-					 ARRAY_SIZE(aes_gcm_algs_aesni),
-					 aes_gcm_simdalgs_aesni);
+	err = crypto_register_aeads(aes_gcm_algs_aesni,
+				    ARRAY_SIZE(aes_gcm_algs_aesni));
 	if (err)
 		goto unregister_skciphers;
 
 	err = register_avx_algs();
 	if (err)
@@ -1698,28 +1660,26 @@ static int __init aesni_init(void)
 
 	return 0;
 
 unregister_avx:
 	unregister_avx_algs();
-	simd_unregister_aeads(aes_gcm_algs_aesni,
-			      ARRAY_SIZE(aes_gcm_algs_aesni),
-			      aes_gcm_simdalgs_aesni);
+	crypto_unregister_aeads(aes_gcm_algs_aesni,
+				ARRAY_SIZE(aes_gcm_algs_aesni));
 unregister_skciphers:
-	simd_unregister_skciphers(aesni_skciphers, ARRAY_SIZE(aesni_skciphers),
-				  aesni_simd_skciphers);
+	crypto_unregister_skciphers(aesni_skciphers,
+				    ARRAY_SIZE(aesni_skciphers));
 unregister_cipher:
 	crypto_unregister_alg(&aesni_cipher_alg);
 	return err;
 }
 
 static void __exit aesni_exit(void)
 {
-	simd_unregister_aeads(aes_gcm_algs_aesni,
-			      ARRAY_SIZE(aes_gcm_algs_aesni),
-			      aes_gcm_simdalgs_aesni);
-	simd_unregister_skciphers(aesni_skciphers, ARRAY_SIZE(aesni_skciphers),
-				  aesni_simd_skciphers);
+	crypto_unregister_aeads(aes_gcm_algs_aesni,
+				ARRAY_SIZE(aes_gcm_algs_aesni));
+	crypto_unregister_skciphers(aesni_skciphers,
+				    ARRAY_SIZE(aesni_skciphers));
 	crypto_unregister_alg(&aesni_cipher_alg);
 	unregister_avx_algs();
 }
 
 module_init(aesni_init);
-- 
2.49.0



* [PATCH v2 4/9] crypto: x86/aria - stop using the SIMD helper
  2025-04-02  0:24 [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper Eric Biggers
                   ` (2 preceding siblings ...)
  2025-04-02  0:24 ` [PATCH v2 3/9] crypto: x86/aes " Eric Biggers
@ 2025-04-02  0:24 ` Eric Biggers
  2025-04-02  0:24 ` [PATCH v2 5/9] crypto: x86/camellia " Eric Biggers
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Eric Biggers @ 2025-04-02  0:24 UTC (permalink / raw)
  To: linux-crypto; +Cc: linux-kernel, x86

From: Eric Biggers <ebiggers@google.com>

Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c).  The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs.  Specifically, if a
hardirq interrupted a task-context kernel-mode FPU section and then
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU.  This has now been fixed.  In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these algorithms can now just use kernel-mode FPU
unconditionally on x86.

This simplifies the code and improves performance.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/Kconfig                 |  3 ---
 arch/x86/crypto/aria_aesni_avx2_glue.c  | 22 +++++++---------------
 arch/x86/crypto/aria_aesni_avx_glue.c   | 20 ++++++--------------
 arch/x86/crypto/aria_gfni_avx512_glue.c | 22 +++++++---------------
 4 files changed, 20 insertions(+), 47 deletions(-)

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index d8f9d6279cb26..ad00d53ab83d8 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -294,11 +294,10 @@ config CRYPTO_TWOFISH_AVX_X86_64
 
 config CRYPTO_ARIA_AESNI_AVX_X86_64
 	tristate "Ciphers: ARIA with modes: ECB, CTR (AES-NI/AVX/GFNI)"
 	depends on X86 && 64BIT
 	select CRYPTO_SKCIPHER
-	select CRYPTO_SIMD
 	select CRYPTO_ALGAPI
 	select CRYPTO_ARIA
 	help
 	  Length-preserving cipher: ARIA cipher algorithms
 	  (RFC 5794) with ECB and CTR modes
@@ -312,11 +311,10 @@ config CRYPTO_ARIA_AESNI_AVX_X86_64
 
 config CRYPTO_ARIA_AESNI_AVX2_X86_64
 	tristate "Ciphers: ARIA with modes: ECB, CTR (AES-NI/AVX2/GFNI)"
 	depends on X86 && 64BIT
 	select CRYPTO_SKCIPHER
-	select CRYPTO_SIMD
 	select CRYPTO_ALGAPI
 	select CRYPTO_ARIA
 	select CRYPTO_ARIA_AESNI_AVX_X86_64
 	help
 	  Length-preserving cipher: ARIA cipher algorithms
@@ -331,11 +329,10 @@ config CRYPTO_ARIA_AESNI_AVX2_X86_64
 
 config CRYPTO_ARIA_GFNI_AVX512_X86_64
 	tristate "Ciphers: ARIA with modes: ECB, CTR (AVX512/GFNI)"
 	depends on X86 && 64BIT && AS_AVX512 && AS_GFNI
 	select CRYPTO_SKCIPHER
-	select CRYPTO_SIMD
 	select CRYPTO_ALGAPI
 	select CRYPTO_ARIA
 	select CRYPTO_ARIA_AESNI_AVX_X86_64
 	select CRYPTO_ARIA_AESNI_AVX2_X86_64
 	help
diff --git a/arch/x86/crypto/aria_aesni_avx2_glue.c b/arch/x86/crypto/aria_aesni_avx2_glue.c
index 87a11804fc77f..b4bddcd584577 100644
--- a/arch/x86/crypto/aria_aesni_avx2_glue.c
+++ b/arch/x86/crypto/aria_aesni_avx2_glue.c
@@ -4,11 +4,10 @@
  *
  * Copyright (c) 2022 Taehee Yoo <ap420073@gmail.com>
  */
 
 #include <crypto/algapi.h>
-#include <crypto/internal/simd.h>
 #include <crypto/aria.h>
 #include <linux/crypto.h>
 #include <linux/err.h>
 #include <linux/module.h>
 #include <linux/types.h>
@@ -163,28 +162,26 @@ static int aria_avx2_init_tfm(struct crypto_skcipher *tfm)
 	return 0;
 }
 
 static struct skcipher_alg aria_algs[] = {
 	{
-		.base.cra_name		= "__ecb(aria)",
-		.base.cra_driver_name	= "__ecb-aria-avx2",
+		.base.cra_name		= "ecb(aria)",
+		.base.cra_driver_name	= "ecb-aria-avx2",
 		.base.cra_priority	= 500,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= ARIA_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct aria_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= ARIA_MIN_KEY_SIZE,
 		.max_keysize		= ARIA_MAX_KEY_SIZE,
 		.setkey			= aria_avx2_set_key,
 		.encrypt		= aria_avx2_ecb_encrypt,
 		.decrypt		= aria_avx2_ecb_decrypt,
 	}, {
-		.base.cra_name		= "__ctr(aria)",
-		.base.cra_driver_name	= "__ctr-aria-avx2",
+		.base.cra_name		= "ctr(aria)",
+		.base.cra_driver_name	= "ctr-aria-avx2",
 		.base.cra_priority	= 500,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL |
-					  CRYPTO_ALG_SKCIPHER_REQSIZE_LARGE,
+		.base.cra_flags		= CRYPTO_ALG_SKCIPHER_REQSIZE_LARGE,
 		.base.cra_blocksize	= 1,
 		.base.cra_ctxsize	= sizeof(struct aria_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= ARIA_MIN_KEY_SIZE,
 		.max_keysize		= ARIA_MAX_KEY_SIZE,
@@ -195,12 +192,10 @@ static struct skcipher_alg aria_algs[] = {
 		.decrypt		= aria_avx2_ctr_encrypt,
 		.init                   = aria_avx2_init_tfm,
 	}
 };
 
-static struct simd_skcipher_alg *aria_simd_algs[ARRAY_SIZE(aria_algs)];
-
 static int __init aria_avx2_init(void)
 {
 	const char *feature_name;
 
 	if (!boot_cpu_has(X86_FEATURE_AVX) ||
@@ -231,19 +226,16 @@ static int __init aria_avx2_init(void)
 		aria_ops.aria_encrypt_32way = aria_aesni_avx2_encrypt_32way;
 		aria_ops.aria_decrypt_32way = aria_aesni_avx2_decrypt_32way;
 		aria_ops.aria_ctr_crypt_32way = aria_aesni_avx2_ctr_crypt_32way;
 	}
 
-	return simd_register_skciphers_compat(aria_algs,
-					      ARRAY_SIZE(aria_algs),
-					      aria_simd_algs);
+	return crypto_register_skciphers(aria_algs, ARRAY_SIZE(aria_algs));
 }
 
 static void __exit aria_avx2_exit(void)
 {
-	simd_unregister_skciphers(aria_algs, ARRAY_SIZE(aria_algs),
-				  aria_simd_algs);
+	crypto_unregister_skciphers(aria_algs, ARRAY_SIZE(aria_algs));
 }
 
 module_init(aria_avx2_init);
 module_exit(aria_avx2_exit);
 
diff --git a/arch/x86/crypto/aria_aesni_avx_glue.c b/arch/x86/crypto/aria_aesni_avx_glue.c
index 4e1516b76669e..ab9b38d05332a 100644
--- a/arch/x86/crypto/aria_aesni_avx_glue.c
+++ b/arch/x86/crypto/aria_aesni_avx_glue.c
@@ -4,11 +4,10 @@
  *
  * Copyright (c) 2022 Taehee Yoo <ap420073@gmail.com>
  */
 
 #include <crypto/algapi.h>
-#include <crypto/internal/simd.h>
 #include <crypto/aria.h>
 #include <linux/crypto.h>
 #include <linux/err.h>
 #include <linux/module.h>
 #include <linux/types.h>
@@ -150,27 +149,25 @@ static int aria_avx_init_tfm(struct crypto_skcipher *tfm)
 	return 0;
 }
 
 static struct skcipher_alg aria_algs[] = {
 	{
-		.base.cra_name		= "__ecb(aria)",
-		.base.cra_driver_name	= "__ecb-aria-avx",
+		.base.cra_name		= "ecb(aria)",
+		.base.cra_driver_name	= "ecb-aria-avx",
 		.base.cra_priority	= 400,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= ARIA_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct aria_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= ARIA_MIN_KEY_SIZE,
 		.max_keysize		= ARIA_MAX_KEY_SIZE,
 		.setkey			= aria_avx_set_key,
 		.encrypt		= aria_avx_ecb_encrypt,
 		.decrypt		= aria_avx_ecb_decrypt,
 	}, {
-		.base.cra_name		= "__ctr(aria)",
-		.base.cra_driver_name	= "__ctr-aria-avx",
+		.base.cra_name		= "ctr(aria)",
+		.base.cra_driver_name	= "ctr-aria-avx",
 		.base.cra_priority	= 400,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= 1,
 		.base.cra_ctxsize	= sizeof(struct aria_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= ARIA_MIN_KEY_SIZE,
 		.max_keysize		= ARIA_MAX_KEY_SIZE,
@@ -182,12 +179,10 @@ static struct skcipher_alg aria_algs[] = {
 		.decrypt		= aria_avx_ctr_encrypt,
 		.init			= aria_avx_init_tfm,
 	}
 };
 
-static struct simd_skcipher_alg *aria_simd_algs[ARRAY_SIZE(aria_algs)];
-
 static int __init aria_avx_init(void)
 {
 	const char *feature_name;
 
 	if (!boot_cpu_has(X86_FEATURE_AVX) ||
@@ -211,19 +206,16 @@ static int __init aria_avx_init(void)
 		aria_ops.aria_encrypt_16way = aria_aesni_avx_encrypt_16way;
 		aria_ops.aria_decrypt_16way = aria_aesni_avx_decrypt_16way;
 		aria_ops.aria_ctr_crypt_16way = aria_aesni_avx_ctr_crypt_16way;
 	}
 
-	return simd_register_skciphers_compat(aria_algs,
-					      ARRAY_SIZE(aria_algs),
-					      aria_simd_algs);
+	return crypto_register_skciphers(aria_algs, ARRAY_SIZE(aria_algs));
 }
 
 static void __exit aria_avx_exit(void)
 {
-	simd_unregister_skciphers(aria_algs, ARRAY_SIZE(aria_algs),
-				  aria_simd_algs);
+	crypto_unregister_skciphers(aria_algs, ARRAY_SIZE(aria_algs));
 }
 
 module_init(aria_avx_init);
 module_exit(aria_avx_exit);
 
diff --git a/arch/x86/crypto/aria_gfni_avx512_glue.c b/arch/x86/crypto/aria_gfni_avx512_glue.c
index f4a2208d26383..363cbf4399cca 100644
--- a/arch/x86/crypto/aria_gfni_avx512_glue.c
+++ b/arch/x86/crypto/aria_gfni_avx512_glue.c
@@ -4,11 +4,10 @@
  *
  * Copyright (c) 2022 Taehee Yoo <ap420073@gmail.com>
  */
 
 #include <crypto/algapi.h>
-#include <crypto/internal/simd.h>
 #include <crypto/aria.h>
 #include <linux/crypto.h>
 #include <linux/err.h>
 #include <linux/module.h>
 #include <linux/types.h>
@@ -163,28 +162,26 @@ static int aria_avx512_init_tfm(struct crypto_skcipher *tfm)
 	return 0;
 }
 
 static struct skcipher_alg aria_algs[] = {
 	{
-		.base.cra_name		= "__ecb(aria)",
-		.base.cra_driver_name	= "__ecb-aria-avx512",
+		.base.cra_name		= "ecb(aria)",
+		.base.cra_driver_name	= "ecb-aria-avx512",
 		.base.cra_priority	= 600,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= ARIA_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct aria_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= ARIA_MIN_KEY_SIZE,
 		.max_keysize		= ARIA_MAX_KEY_SIZE,
 		.setkey			= aria_avx512_set_key,
 		.encrypt		= aria_avx512_ecb_encrypt,
 		.decrypt		= aria_avx512_ecb_decrypt,
 	}, {
-		.base.cra_name		= "__ctr(aria)",
-		.base.cra_driver_name	= "__ctr-aria-avx512",
+		.base.cra_name		= "ctr(aria)",
+		.base.cra_driver_name	= "ctr-aria-avx512",
 		.base.cra_priority	= 600,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL |
-					  CRYPTO_ALG_SKCIPHER_REQSIZE_LARGE,
+		.base.cra_flags		= CRYPTO_ALG_SKCIPHER_REQSIZE_LARGE,
 		.base.cra_blocksize	= 1,
 		.base.cra_ctxsize	= sizeof(struct aria_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= ARIA_MIN_KEY_SIZE,
 		.max_keysize		= ARIA_MAX_KEY_SIZE,
@@ -195,12 +192,10 @@ static struct skcipher_alg aria_algs[] = {
 		.decrypt		= aria_avx512_ctr_encrypt,
 		.init                   = aria_avx512_init_tfm,
 	}
 };
 
-static struct simd_skcipher_alg *aria_simd_algs[ARRAY_SIZE(aria_algs)];
-
 static int __init aria_avx512_init(void)
 {
 	const char *feature_name;
 
 	if (!boot_cpu_has(X86_FEATURE_AVX) ||
@@ -227,19 +222,16 @@ static int __init aria_avx512_init(void)
 	aria_ops.aria_ctr_crypt_32way = aria_aesni_avx2_gfni_ctr_crypt_32way;
 	aria_ops.aria_encrypt_64way = aria_gfni_avx512_encrypt_64way;
 	aria_ops.aria_decrypt_64way = aria_gfni_avx512_decrypt_64way;
 	aria_ops.aria_ctr_crypt_64way = aria_gfni_avx512_ctr_crypt_64way;
 
-	return simd_register_skciphers_compat(aria_algs,
-					      ARRAY_SIZE(aria_algs),
-					      aria_simd_algs);
+	return crypto_register_skciphers(aria_algs, ARRAY_SIZE(aria_algs));
 }
 
 static void __exit aria_avx512_exit(void)
 {
-	simd_unregister_skciphers(aria_algs, ARRAY_SIZE(aria_algs),
-				  aria_simd_algs);
+	crypto_unregister_skciphers(aria_algs, ARRAY_SIZE(aria_algs));
 }
 
 module_init(aria_avx512_init);
 module_exit(aria_avx512_exit);
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 5/9] crypto: x86/camellia - stop using the SIMD helper
  2025-04-02  0:24 [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper Eric Biggers
                   ` (3 preceding siblings ...)
  2025-04-02  0:24 ` [PATCH v2 4/9] crypto: x86/aria " Eric Biggers
@ 2025-04-02  0:24 ` Eric Biggers
  2025-04-02  0:24 ` [PATCH v2 6/9] crypto: x86/cast " Eric Biggers
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Eric Biggers @ 2025-04-02  0:24 UTC (permalink / raw)
  To: linux-crypto; +Cc: linux-kernel, x86

From: Eric Biggers <ebiggers@google.com>

Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c).  The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs.  Specifically, if a
hardirq interrupted a task context kernel-mode FPU section and then
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU.  This has now been fixed.  In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these can now just use kernel-mode FPU unconditionally on x86.

This simplifies the code and improves performance.
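
For illustration, the resulting init/exit code in each converted glue
module takes roughly the following shape (a sketch with generic names,
not the exact camellia driver):

	#include <crypto/internal/skcipher.h>
	#include <linux/kernel.h>
	#include <linux/module.h>

	/* Algorithm array as in the diff below: plain cra_name values,
	 * no "__" prefix and no CRYPTO_ALG_INTERNAL flag. */
	static struct skcipher_alg example_algs[2];

	static int __init example_glue_init(void)
	{
		/* Register the SIMD implementations directly; no
		 * simd_skcipher_alg shadow array and no cryptd-backed
		 * wrapper is created anymore. */
		return crypto_register_skciphers(example_algs,
						 ARRAY_SIZE(example_algs));
	}

	static void __exit example_glue_exit(void)
	{
		crypto_unregister_skciphers(example_algs,
					    ARRAY_SIZE(example_algs));
	}

	module_init(example_glue_init);
	module_exit(example_glue_exit);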

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/Kconfig                    |  1 -
 arch/x86/crypto/camellia_aesni_avx2_glue.c | 21 +++++++--------------
 arch/x86/crypto/camellia_aesni_avx_glue.c  | 21 +++++++--------------
 3 files changed, 14 insertions(+), 29 deletions(-)

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index ad00d53ab83d8..de927df89ccf1 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -61,11 +61,10 @@ config CRYPTO_CAMELLIA_X86_64
 config CRYPTO_CAMELLIA_AESNI_AVX_X86_64
 	tristate "Ciphers: Camellia with modes: ECB, CBC (AES-NI/AVX)"
 	depends on X86 && 64BIT
 	select CRYPTO_SKCIPHER
 	select CRYPTO_CAMELLIA_X86_64
-	select CRYPTO_SIMD
 	imply CRYPTO_XTS
 	help
 	  Length-preserving ciphers: Camellia with ECB and CBC modes
 
 	  Architecture: x86_64 using:
diff --git a/arch/x86/crypto/camellia_aesni_avx2_glue.c b/arch/x86/crypto/camellia_aesni_avx2_glue.c
index e7e4d64e9577e..2d2f4e16537c4 100644
--- a/arch/x86/crypto/camellia_aesni_avx2_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx2_glue.c
@@ -4,11 +4,10 @@
  *
  * Copyright © 2013 Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
  */
 
 #include <crypto/algapi.h>
-#include <crypto/internal/simd.h>
 #include <linux/crypto.h>
 #include <linux/err.h>
 #include <linux/module.h>
 #include <linux/types.h>
 
@@ -67,27 +66,25 @@ static int cbc_decrypt(struct skcipher_request *req)
 	CBC_WALK_END();
 }
 
 static struct skcipher_alg camellia_algs[] = {
 	{
-		.base.cra_name		= "__ecb(camellia)",
-		.base.cra_driver_name	= "__ecb-camellia-aesni-avx2",
+		.base.cra_name		= "ecb(camellia)",
+		.base.cra_driver_name	= "ecb-camellia-aesni-avx2",
 		.base.cra_priority	= 500,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= CAMELLIA_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct camellia_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= CAMELLIA_MIN_KEY_SIZE,
 		.max_keysize		= CAMELLIA_MAX_KEY_SIZE,
 		.setkey			= camellia_setkey,
 		.encrypt		= ecb_encrypt,
 		.decrypt		= ecb_decrypt,
 	}, {
-		.base.cra_name		= "__cbc(camellia)",
-		.base.cra_driver_name	= "__cbc-camellia-aesni-avx2",
+		.base.cra_name		= "cbc(camellia)",
+		.base.cra_driver_name	= "cbc-camellia-aesni-avx2",
 		.base.cra_priority	= 500,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= CAMELLIA_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct camellia_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= CAMELLIA_MIN_KEY_SIZE,
 		.max_keysize		= CAMELLIA_MAX_KEY_SIZE,
@@ -96,12 +93,10 @@ static struct skcipher_alg camellia_algs[] = {
 		.encrypt		= cbc_encrypt,
 		.decrypt		= cbc_decrypt,
 	},
 };
 
-static struct simd_skcipher_alg *camellia_simd_algs[ARRAY_SIZE(camellia_algs)];
-
 static int __init camellia_aesni_init(void)
 {
 	const char *feature_name;
 
 	if (!boot_cpu_has(X86_FEATURE_AVX) ||
@@ -116,19 +111,17 @@ static int __init camellia_aesni_init(void)
 				&feature_name)) {
 		pr_info("CPU feature '%s' is not supported.\n", feature_name);
 		return -ENODEV;
 	}
 
-	return simd_register_skciphers_compat(camellia_algs,
-					      ARRAY_SIZE(camellia_algs),
-					      camellia_simd_algs);
+	return crypto_register_skciphers(camellia_algs,
+					 ARRAY_SIZE(camellia_algs));
 }
 
 static void __exit camellia_aesni_fini(void)
 {
-	simd_unregister_skciphers(camellia_algs, ARRAY_SIZE(camellia_algs),
-				  camellia_simd_algs);
+	crypto_unregister_skciphers(camellia_algs, ARRAY_SIZE(camellia_algs));
 }
 
 module_init(camellia_aesni_init);
 module_exit(camellia_aesni_fini);
 
diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c
index c7ccf63e741e1..a7d1623881424 100644
--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
@@ -4,11 +4,10 @@
  *
  * Copyright © 2012-2013 Jussi Kivilinna <jussi.kivilinna@iki.fi>
  */
 
 #include <crypto/algapi.h>
-#include <crypto/internal/simd.h>
 #include <linux/crypto.h>
 #include <linux/err.h>
 #include <linux/module.h>
 #include <linux/types.h>
 
@@ -67,27 +66,25 @@ static int cbc_decrypt(struct skcipher_request *req)
 	CBC_WALK_END();
 }
 
 static struct skcipher_alg camellia_algs[] = {
 	{
-		.base.cra_name		= "__ecb(camellia)",
-		.base.cra_driver_name	= "__ecb-camellia-aesni",
+		.base.cra_name		= "ecb(camellia)",
+		.base.cra_driver_name	= "ecb-camellia-aesni",
 		.base.cra_priority	= 400,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= CAMELLIA_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct camellia_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= CAMELLIA_MIN_KEY_SIZE,
 		.max_keysize		= CAMELLIA_MAX_KEY_SIZE,
 		.setkey			= camellia_setkey,
 		.encrypt		= ecb_encrypt,
 		.decrypt		= ecb_decrypt,
 	}, {
-		.base.cra_name		= "__cbc(camellia)",
-		.base.cra_driver_name	= "__cbc-camellia-aesni",
+		.base.cra_name		= "cbc(camellia)",
+		.base.cra_driver_name	= "cbc-camellia-aesni",
 		.base.cra_priority	= 400,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= CAMELLIA_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct camellia_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= CAMELLIA_MIN_KEY_SIZE,
 		.max_keysize		= CAMELLIA_MAX_KEY_SIZE,
@@ -96,12 +93,10 @@ static struct skcipher_alg camellia_algs[] = {
 		.encrypt		= cbc_encrypt,
 		.decrypt		= cbc_decrypt,
 	}
 };
 
-static struct simd_skcipher_alg *camellia_simd_algs[ARRAY_SIZE(camellia_algs)];
-
 static int __init camellia_aesni_init(void)
 {
 	const char *feature_name;
 
 	if (!boot_cpu_has(X86_FEATURE_AVX) ||
@@ -115,19 +110,17 @@ static int __init camellia_aesni_init(void)
 				&feature_name)) {
 		pr_info("CPU feature '%s' is not supported.\n", feature_name);
 		return -ENODEV;
 	}
 
-	return simd_register_skciphers_compat(camellia_algs,
-					      ARRAY_SIZE(camellia_algs),
-					      camellia_simd_algs);
+	return crypto_register_skciphers(camellia_algs,
+					 ARRAY_SIZE(camellia_algs));
 }
 
 static void __exit camellia_aesni_fini(void)
 {
-	simd_unregister_skciphers(camellia_algs, ARRAY_SIZE(camellia_algs),
-				  camellia_simd_algs);
+	crypto_unregister_skciphers(camellia_algs, ARRAY_SIZE(camellia_algs));
 }
 
 module_init(camellia_aesni_init);
 module_exit(camellia_aesni_fini);
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 6/9] crypto: x86/cast - stop using the SIMD helper
  2025-04-02  0:24 [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper Eric Biggers
                   ` (4 preceding siblings ...)
  2025-04-02  0:24 ` [PATCH v2 5/9] crypto: x86/camellia " Eric Biggers
@ 2025-04-02  0:24 ` Eric Biggers
  2025-04-02  0:24 ` [PATCH v2 7/9] crypto: x86/serpent " Eric Biggers
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Eric Biggers @ 2025-04-02  0:24 UTC (permalink / raw)
  To: linux-crypto; +Cc: linux-kernel, x86

From: Eric Biggers <ebiggers@google.com>

Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c).  The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs.  Specifically, if a
hardirq interrupted a task context kernel-mode FPU section and then
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU.  This has now been fixed.  In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these can now just use kernel-mode FPU unconditionally on x86.

This simplifies the code and improves performance.
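
One user-visible consequence, shown here as a sketch rather than code
from this patch: a caller that requires a synchronous skcipher can now
be handed the AVX implementation, which previously was only reachable
through the asynchronous simd/cryptd wrapper:

	#include <crypto/skcipher.h>

	static struct crypto_sync_skcipher *example_get_cast5_cbc(void)
	{
		/* Before this series this resolved to the generic C code
		 * for sync-only users; now it can resolve to cbc-cast5-avx. */
		return crypto_alloc_sync_skcipher("cbc(cast5)", 0, 0);
	}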

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/Kconfig          |  2 --
 arch/x86/crypto/cast5_avx_glue.c | 21 +++++++--------------
 arch/x86/crypto/cast6_avx_glue.c | 20 ++++++--------------
 3 files changed, 13 insertions(+), 30 deletions(-)

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index de927df89ccf1..55800d1ce668e 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -86,11 +86,10 @@ config CRYPTO_CAST5_AVX_X86_64
 	tristate "Ciphers: CAST5 with modes: ECB, CBC (AVX)"
 	depends on X86 && 64BIT
 	select CRYPTO_SKCIPHER
 	select CRYPTO_CAST5
 	select CRYPTO_CAST_COMMON
-	select CRYPTO_SIMD
 	imply CRYPTO_CTR
 	help
 	  Length-preserving ciphers: CAST5 (CAST-128) cipher algorithm
 	  (RFC2144) with ECB and CBC modes
 
@@ -103,11 +102,10 @@ config CRYPTO_CAST6_AVX_X86_64
 	tristate "Ciphers: CAST6 with modes: ECB, CBC (AVX)"
 	depends on X86 && 64BIT
 	select CRYPTO_SKCIPHER
 	select CRYPTO_CAST6
 	select CRYPTO_CAST_COMMON
-	select CRYPTO_SIMD
 	imply CRYPTO_XTS
 	imply CRYPTO_CTR
 	help
 	  Length-preserving ciphers: CAST6 (CAST-256) cipher algorithm
 	  (RFC2612) with ECB and CBC modes
diff --git a/arch/x86/crypto/cast5_avx_glue.c b/arch/x86/crypto/cast5_avx_glue.c
index 3976a87f92ad5..3aca04d43b34a 100644
--- a/arch/x86/crypto/cast5_avx_glue.c
+++ b/arch/x86/crypto/cast5_avx_glue.c
@@ -6,11 +6,10 @@
  *     <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
  */
 
 #include <crypto/algapi.h>
 #include <crypto/cast5.h>
-#include <crypto/internal/simd.h>
 #include <linux/crypto.h>
 #include <linux/err.h>
 #include <linux/module.h>
 #include <linux/types.h>
 
@@ -62,27 +61,25 @@ static int cbc_decrypt(struct skcipher_request *req)
 	CBC_WALK_END();
 }
 
 static struct skcipher_alg cast5_algs[] = {
 	{
-		.base.cra_name		= "__ecb(cast5)",
-		.base.cra_driver_name	= "__ecb-cast5-avx",
+		.base.cra_name		= "ecb(cast5)",
+		.base.cra_driver_name	= "ecb-cast5-avx",
 		.base.cra_priority	= 200,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= CAST5_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct cast5_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= CAST5_MIN_KEY_SIZE,
 		.max_keysize		= CAST5_MAX_KEY_SIZE,
 		.setkey			= cast5_setkey_skcipher,
 		.encrypt		= ecb_encrypt,
 		.decrypt		= ecb_decrypt,
 	}, {
-		.base.cra_name		= "__cbc(cast5)",
-		.base.cra_driver_name	= "__cbc-cast5-avx",
+		.base.cra_name		= "cbc(cast5)",
+		.base.cra_driver_name	= "cbc-cast5-avx",
 		.base.cra_priority	= 200,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= CAST5_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct cast5_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= CAST5_MIN_KEY_SIZE,
 		.max_keysize		= CAST5_MAX_KEY_SIZE,
@@ -91,31 +88,27 @@ static struct skcipher_alg cast5_algs[] = {
 		.encrypt		= cbc_encrypt,
 		.decrypt		= cbc_decrypt,
 	}
 };
 
-static struct simd_skcipher_alg *cast5_simd_algs[ARRAY_SIZE(cast5_algs)];
-
 static int __init cast5_init(void)
 {
 	const char *feature_name;
 
 	if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
 				&feature_name)) {
 		pr_info("CPU feature '%s' is not supported.\n", feature_name);
 		return -ENODEV;
 	}
 
-	return simd_register_skciphers_compat(cast5_algs,
-					      ARRAY_SIZE(cast5_algs),
-					      cast5_simd_algs);
+	return crypto_register_skciphers(cast5_algs,
+					 ARRAY_SIZE(cast5_algs));
 }
 
 static void __exit cast5_exit(void)
 {
-	simd_unregister_skciphers(cast5_algs, ARRAY_SIZE(cast5_algs),
-				  cast5_simd_algs);
+	crypto_unregister_skciphers(cast5_algs, ARRAY_SIZE(cast5_algs));
 }
 
 module_init(cast5_init);
 module_exit(cast5_exit);
 
diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c
index 7e2aea3723490..c4dd28c303036 100644
--- a/arch/x86/crypto/cast6_avx_glue.c
+++ b/arch/x86/crypto/cast6_avx_glue.c
@@ -12,11 +12,10 @@
 #include <linux/types.h>
 #include <linux/crypto.h>
 #include <linux/err.h>
 #include <crypto/algapi.h>
 #include <crypto/cast6.h>
-#include <crypto/internal/simd.h>
 
 #include "ecb_cbc_helpers.h"
 
 #define CAST6_PARALLEL_BLOCKS 8
 
@@ -62,27 +61,25 @@ static int cbc_decrypt(struct skcipher_request *req)
 	CBC_WALK_END();
 }
 
 static struct skcipher_alg cast6_algs[] = {
 	{
-		.base.cra_name		= "__ecb(cast6)",
-		.base.cra_driver_name	= "__ecb-cast6-avx",
+		.base.cra_name		= "ecb(cast6)",
+		.base.cra_driver_name	= "ecb-cast6-avx",
 		.base.cra_priority	= 200,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= CAST6_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct cast6_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= CAST6_MIN_KEY_SIZE,
 		.max_keysize		= CAST6_MAX_KEY_SIZE,
 		.setkey			= cast6_setkey_skcipher,
 		.encrypt		= ecb_encrypt,
 		.decrypt		= ecb_decrypt,
 	}, {
-		.base.cra_name		= "__cbc(cast6)",
-		.base.cra_driver_name	= "__cbc-cast6-avx",
+		.base.cra_name		= "cbc(cast6)",
+		.base.cra_driver_name	= "cbc-cast6-avx",
 		.base.cra_priority	= 200,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= CAST6_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct cast6_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= CAST6_MIN_KEY_SIZE,
 		.max_keysize		= CAST6_MAX_KEY_SIZE,
@@ -91,31 +88,26 @@ static struct skcipher_alg cast6_algs[] = {
 		.encrypt		= cbc_encrypt,
 		.decrypt		= cbc_decrypt,
 	},
 };
 
-static struct simd_skcipher_alg *cast6_simd_algs[ARRAY_SIZE(cast6_algs)];
-
 static int __init cast6_init(void)
 {
 	const char *feature_name;
 
 	if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
 				&feature_name)) {
 		pr_info("CPU feature '%s' is not supported.\n", feature_name);
 		return -ENODEV;
 	}
 
-	return simd_register_skciphers_compat(cast6_algs,
-					      ARRAY_SIZE(cast6_algs),
-					      cast6_simd_algs);
+	return crypto_register_skciphers(cast6_algs, ARRAY_SIZE(cast6_algs));
 }
 
 static void __exit cast6_exit(void)
 {
-	simd_unregister_skciphers(cast6_algs, ARRAY_SIZE(cast6_algs),
-				  cast6_simd_algs);
+	crypto_unregister_skciphers(cast6_algs, ARRAY_SIZE(cast6_algs));
 }
 
 module_init(cast6_init);
 module_exit(cast6_exit);
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 7/9] crypto: x86/serpent - stop using the SIMD helper
  2025-04-02  0:24 [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper Eric Biggers
                   ` (5 preceding siblings ...)
  2025-04-02  0:24 ` [PATCH v2 6/9] crypto: x86/cast " Eric Biggers
@ 2025-04-02  0:24 ` Eric Biggers
  2025-04-02  0:24 ` [PATCH v2 8/9] crypto: x86/sm4 " Eric Biggers
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Eric Biggers @ 2025-04-02  0:24 UTC (permalink / raw)
  To: linux-crypto; +Cc: linux-kernel, x86

From: Eric Biggers <ebiggers@google.com>

Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c).  The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs.  Specifically, if a
hardirq interrupted a task context kernel-mode FPU section and then
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU.  This has now been fixed.  In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these can now just use kernel-mode FPU unconditionally on x86.

This simplifies the code and improves performance.
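
The glue functions themselves keep the usual x86 shape; as a simplified
sketch (the real serpent code goes through the ECB/CBC walk helper
macros), kernel_fpu_begin()/kernel_fpu_end() now simply bracket the
vectorized code unconditionally, which is safe because skcipher
requests only arrive in task or softirq context:

	#include <asm/fpu/api.h>
	#include <crypto/internal/skcipher.h>
	#include <linux/types.h>

	static int example_ecb_crypt(struct skcipher_request *req,
				     void (*fn)(const void *ctx, u8 *dst,
						const u8 *src),
				     unsigned int blocksize)
	{
		struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
		struct skcipher_walk walk;
		int err;

		err = skcipher_walk_virt(&walk, req, false);
		while (walk.nbytes) {
			unsigned int nbytes = walk.nbytes -
					      (walk.nbytes % blocksize);
			const u8 *src = walk.src.virt.addr;
			u8 *dst = walk.dst.virt.addr;
			unsigned int i;

			kernel_fpu_begin();
			for (i = 0; i < nbytes; i += blocksize)
				fn(crypto_skcipher_ctx(tfm), dst + i, src + i);
			kernel_fpu_end();

			err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
		}
		return err;
	}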

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/Kconfig             |  3 ---
 arch/x86/crypto/serpent_avx2_glue.c | 21 +++++++--------------
 arch/x86/crypto/serpent_avx_glue.c  | 21 +++++++--------------
 arch/x86/crypto/serpent_sse2_glue.c | 21 +++++++--------------
 4 files changed, 21 insertions(+), 45 deletions(-)

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index 55800d1ce668e..51c74a496126d 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -132,11 +132,10 @@ config CRYPTO_DES3_EDE_X86_64
 config CRYPTO_SERPENT_SSE2_X86_64
 	tristate "Ciphers: Serpent with modes: ECB, CBC (SSE2)"
 	depends on X86 && 64BIT
 	select CRYPTO_SKCIPHER
 	select CRYPTO_SERPENT
-	select CRYPTO_SIMD
 	imply CRYPTO_CTR
 	help
 	  Length-preserving ciphers: Serpent cipher algorithm
 	  with ECB and CBC modes
 
@@ -148,11 +147,10 @@ config CRYPTO_SERPENT_SSE2_X86_64
 config CRYPTO_SERPENT_SSE2_586
 	tristate "Ciphers: Serpent with modes: ECB, CBC (32-bit with SSE2)"
 	depends on X86 && !64BIT
 	select CRYPTO_SKCIPHER
 	select CRYPTO_SERPENT
-	select CRYPTO_SIMD
 	imply CRYPTO_CTR
 	help
 	  Length-preserving ciphers: Serpent cipher algorithm
 	  with ECB and CBC modes
 
@@ -164,11 +162,10 @@ config CRYPTO_SERPENT_SSE2_586
 config CRYPTO_SERPENT_AVX_X86_64
 	tristate "Ciphers: Serpent with modes: ECB, CBC (AVX)"
 	depends on X86 && 64BIT
 	select CRYPTO_SKCIPHER
 	select CRYPTO_SERPENT
-	select CRYPTO_SIMD
 	imply CRYPTO_XTS
 	imply CRYPTO_CTR
 	help
 	  Length-preserving ciphers: Serpent cipher algorithm
 	  with ECB and CBC modes
diff --git a/arch/x86/crypto/serpent_avx2_glue.c b/arch/x86/crypto/serpent_avx2_glue.c
index 347e97f4b713b..f5f2121b79567 100644
--- a/arch/x86/crypto/serpent_avx2_glue.c
+++ b/arch/x86/crypto/serpent_avx2_glue.c
@@ -8,11 +8,10 @@
 #include <linux/module.h>
 #include <linux/types.h>
 #include <linux/crypto.h>
 #include <linux/err.h>
 #include <crypto/algapi.h>
-#include <crypto/internal/simd.h>
 #include <crypto/serpent.h>
 
 #include "serpent-avx.h"
 #include "ecb_cbc_helpers.h"
 
@@ -63,27 +62,25 @@ static int cbc_decrypt(struct skcipher_request *req)
 	CBC_WALK_END();
 }
 
 static struct skcipher_alg serpent_algs[] = {
 	{
-		.base.cra_name		= "__ecb(serpent)",
-		.base.cra_driver_name	= "__ecb-serpent-avx2",
+		.base.cra_name		= "ecb(serpent)",
+		.base.cra_driver_name	= "ecb-serpent-avx2",
 		.base.cra_priority	= 600,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= SERPENT_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct serpent_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= SERPENT_MIN_KEY_SIZE,
 		.max_keysize		= SERPENT_MAX_KEY_SIZE,
 		.setkey			= serpent_setkey_skcipher,
 		.encrypt		= ecb_encrypt,
 		.decrypt		= ecb_decrypt,
 	}, {
-		.base.cra_name		= "__cbc(serpent)",
-		.base.cra_driver_name	= "__cbc-serpent-avx2",
+		.base.cra_name		= "cbc(serpent)",
+		.base.cra_driver_name	= "cbc-serpent-avx2",
 		.base.cra_priority	= 600,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= SERPENT_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct serpent_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= SERPENT_MIN_KEY_SIZE,
 		.max_keysize		= SERPENT_MAX_KEY_SIZE,
@@ -92,12 +89,10 @@ static struct skcipher_alg serpent_algs[] = {
 		.encrypt		= cbc_encrypt,
 		.decrypt		= cbc_decrypt,
 	},
 };
 
-static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];
-
 static int __init serpent_avx2_init(void)
 {
 	const char *feature_name;
 
 	if (!boot_cpu_has(X86_FEATURE_AVX2) || !boot_cpu_has(X86_FEATURE_OSXSAVE)) {
@@ -108,19 +103,17 @@ static int __init serpent_avx2_init(void)
 				&feature_name)) {
 		pr_info("CPU feature '%s' is not supported.\n", feature_name);
 		return -ENODEV;
 	}
 
-	return simd_register_skciphers_compat(serpent_algs,
-					      ARRAY_SIZE(serpent_algs),
-					      serpent_simd_algs);
+	return crypto_register_skciphers(serpent_algs,
+					 ARRAY_SIZE(serpent_algs));
 }
 
 static void __exit serpent_avx2_fini(void)
 {
-	simd_unregister_skciphers(serpent_algs, ARRAY_SIZE(serpent_algs),
-				  serpent_simd_algs);
+	crypto_unregister_skciphers(serpent_algs, ARRAY_SIZE(serpent_algs));
 }
 
 module_init(serpent_avx2_init);
 module_exit(serpent_avx2_fini);
 
diff --git a/arch/x86/crypto/serpent_avx_glue.c b/arch/x86/crypto/serpent_avx_glue.c
index 6c248e1ea4ef7..e640abc1cb8a7 100644
--- a/arch/x86/crypto/serpent_avx_glue.c
+++ b/arch/x86/crypto/serpent_avx_glue.c
@@ -11,11 +11,10 @@
 #include <linux/module.h>
 #include <linux/types.h>
 #include <linux/crypto.h>
 #include <linux/err.h>
 #include <crypto/algapi.h>
-#include <crypto/internal/simd.h>
 #include <crypto/serpent.h>
 
 #include "serpent-avx.h"
 #include "ecb_cbc_helpers.h"
 
@@ -69,27 +68,25 @@ static int cbc_decrypt(struct skcipher_request *req)
 	CBC_WALK_END();
 }
 
 static struct skcipher_alg serpent_algs[] = {
 	{
-		.base.cra_name		= "__ecb(serpent)",
-		.base.cra_driver_name	= "__ecb-serpent-avx",
+		.base.cra_name		= "ecb(serpent)",
+		.base.cra_driver_name	= "ecb-serpent-avx",
 		.base.cra_priority	= 500,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= SERPENT_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct serpent_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= SERPENT_MIN_KEY_SIZE,
 		.max_keysize		= SERPENT_MAX_KEY_SIZE,
 		.setkey			= serpent_setkey_skcipher,
 		.encrypt		= ecb_encrypt,
 		.decrypt		= ecb_decrypt,
 	}, {
-		.base.cra_name		= "__cbc(serpent)",
-		.base.cra_driver_name	= "__cbc-serpent-avx",
+		.base.cra_name		= "cbc(serpent)",
+		.base.cra_driver_name	= "cbc-serpent-avx",
 		.base.cra_priority	= 500,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= SERPENT_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct serpent_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= SERPENT_MIN_KEY_SIZE,
 		.max_keysize		= SERPENT_MAX_KEY_SIZE,
@@ -98,31 +95,27 @@ static struct skcipher_alg serpent_algs[] = {
 		.encrypt		= cbc_encrypt,
 		.decrypt		= cbc_decrypt,
 	},
 };
 
-static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];
-
 static int __init serpent_init(void)
 {
 	const char *feature_name;
 
 	if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
 				&feature_name)) {
 		pr_info("CPU feature '%s' is not supported.\n", feature_name);
 		return -ENODEV;
 	}
 
-	return simd_register_skciphers_compat(serpent_algs,
-					      ARRAY_SIZE(serpent_algs),
-					      serpent_simd_algs);
+	return crypto_register_skciphers(serpent_algs,
+					 ARRAY_SIZE(serpent_algs));
 }
 
 static void __exit serpent_exit(void)
 {
-	simd_unregister_skciphers(serpent_algs, ARRAY_SIZE(serpent_algs),
-				  serpent_simd_algs);
+	crypto_unregister_skciphers(serpent_algs, ARRAY_SIZE(serpent_algs));
 }
 
 module_init(serpent_init);
 module_exit(serpent_exit);
 
diff --git a/arch/x86/crypto/serpent_sse2_glue.c b/arch/x86/crypto/serpent_sse2_glue.c
index d78f37e9b2cf7..80ee17ec21b46 100644
--- a/arch/x86/crypto/serpent_sse2_glue.c
+++ b/arch/x86/crypto/serpent_sse2_glue.c
@@ -16,11 +16,10 @@
 #include <linux/types.h>
 #include <linux/crypto.h>
 #include <linux/err.h>
 #include <crypto/algapi.h>
 #include <crypto/b128ops.h>
-#include <crypto/internal/simd.h>
 #include <crypto/serpent.h>
 
 #include "serpent-sse2.h"
 #include "ecb_cbc_helpers.h"
 
@@ -72,27 +71,25 @@ static int cbc_decrypt(struct skcipher_request *req)
 	CBC_WALK_END();
 }
 
 static struct skcipher_alg serpent_algs[] = {
 	{
-		.base.cra_name		= "__ecb(serpent)",
-		.base.cra_driver_name	= "__ecb-serpent-sse2",
+		.base.cra_name		= "ecb(serpent)",
+		.base.cra_driver_name	= "ecb-serpent-sse2",
 		.base.cra_priority	= 400,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= SERPENT_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct serpent_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= SERPENT_MIN_KEY_SIZE,
 		.max_keysize		= SERPENT_MAX_KEY_SIZE,
 		.setkey			= serpent_setkey_skcipher,
 		.encrypt		= ecb_encrypt,
 		.decrypt		= ecb_decrypt,
 	}, {
-		.base.cra_name		= "__cbc(serpent)",
-		.base.cra_driver_name	= "__cbc-serpent-sse2",
+		.base.cra_name		= "cbc(serpent)",
+		.base.cra_driver_name	= "cbc-serpent-sse2",
 		.base.cra_priority	= 400,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= SERPENT_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct serpent_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= SERPENT_MIN_KEY_SIZE,
 		.max_keysize		= SERPENT_MAX_KEY_SIZE,
@@ -101,28 +98,24 @@ static struct skcipher_alg serpent_algs[] = {
 		.encrypt		= cbc_encrypt,
 		.decrypt		= cbc_decrypt,
 	},
 };
 
-static struct simd_skcipher_alg *serpent_simd_algs[ARRAY_SIZE(serpent_algs)];
-
 static int __init serpent_sse2_init(void)
 {
 	if (!boot_cpu_has(X86_FEATURE_XMM2)) {
 		printk(KERN_INFO "SSE2 instructions are not detected.\n");
 		return -ENODEV;
 	}
 
-	return simd_register_skciphers_compat(serpent_algs,
-					      ARRAY_SIZE(serpent_algs),
-					      serpent_simd_algs);
+	return crypto_register_skciphers(serpent_algs,
+					 ARRAY_SIZE(serpent_algs));
 }
 
 static void __exit serpent_sse2_exit(void)
 {
-	simd_unregister_skciphers(serpent_algs, ARRAY_SIZE(serpent_algs),
-				  serpent_simd_algs);
+	crypto_unregister_skciphers(serpent_algs, ARRAY_SIZE(serpent_algs));
 }
 
 module_init(serpent_sse2_init);
 module_exit(serpent_sse2_exit);
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 8/9] crypto: x86/sm4 - stop using the SIMD helper
  2025-04-02  0:24 [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper Eric Biggers
                   ` (6 preceding siblings ...)
  2025-04-02  0:24 ` [PATCH v2 7/9] crypto: x86/serpent " Eric Biggers
@ 2025-04-02  0:24 ` Eric Biggers
  2025-04-02  0:24 ` [PATCH v2 9/9] crypto: x86/twofish " Eric Biggers
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Eric Biggers @ 2025-04-02  0:24 UTC (permalink / raw)
  To: linux-crypto; +Cc: linux-kernel, x86

From: Eric Biggers <ebiggers@google.com>

Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c).  The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs.  Specifically, if a
hardirq interrupted a task context kernel-mode FPU section and then
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU.  This has now been fixed.  In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these can now just use kernel-mode FPU unconditionally on x86.

This simplifies the code and improves performance.
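
The <asm/simd.h> and <crypto/internal/simd.h> includes can go because
the glue code only needs kernel_fpu_begin()/kernel_fpu_end() from
<asm/fpu/api.h>.  Roughly (a sketch, not the exact sm4-avx helpers):

	#include <asm/fpu/api.h>
	#include <linux/types.h>

	static void example_sm4_call_asm(void (*asm_fn)(const u32 *rk, u8 *dst,
							const u8 *src, u8 *iv),
					 const u32 *rk, u8 *dst, const u8 *src,
					 u8 *iv)
	{
		/* Always permitted here: skcipher code runs only in task or
		 * softirq context, where kernel-mode FPU is now guaranteed. */
		kernel_fpu_begin();
		asm_fn(rk, dst, src, iv);
		kernel_fpu_end();
	}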

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/Kconfig               |  2 --
 arch/x86/crypto/sm4_aesni_avx2_glue.c | 31 ++++++++++-----------------
 arch/x86/crypto/sm4_aesni_avx_glue.c  | 31 ++++++++++-----------------
 3 files changed, 22 insertions(+), 42 deletions(-)

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index 51c74a496126d..afc1a05e663dd 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -190,11 +190,10 @@ config CRYPTO_SERPENT_AVX2_X86_64
 
 config CRYPTO_SM4_AESNI_AVX_X86_64
 	tristate "Ciphers: SM4 with modes: ECB, CBC, CTR (AES-NI/AVX)"
 	depends on X86 && 64BIT
 	select CRYPTO_SKCIPHER
-	select CRYPTO_SIMD
 	select CRYPTO_ALGAPI
 	select CRYPTO_SM4
 	help
 	  Length-preserving ciphers: SM4 cipher algorithms
 	  (OSCCA GB/T 32907-2016) with ECB, CBC, and CTR modes
@@ -211,11 +210,10 @@ config CRYPTO_SM4_AESNI_AVX_X86_64
 
 config CRYPTO_SM4_AESNI_AVX2_X86_64
 	tristate "Ciphers: SM4 with modes: ECB, CBC, CTR (AES-NI/AVX2)"
 	depends on X86 && 64BIT
 	select CRYPTO_SKCIPHER
-	select CRYPTO_SIMD
 	select CRYPTO_ALGAPI
 	select CRYPTO_SM4
 	select CRYPTO_SM4_AESNI_AVX_X86_64
 	help
 	  Length-preserving ciphers: SM4 cipher algorithms
diff --git a/arch/x86/crypto/sm4_aesni_avx2_glue.c b/arch/x86/crypto/sm4_aesni_avx2_glue.c
index 1148fd4cd57f8..fec0ab7a63dd4 100644
--- a/arch/x86/crypto/sm4_aesni_avx2_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx2_glue.c
@@ -6,15 +6,14 @@
  *
  * Copyright (c) 2021, Alibaba Group.
  * Copyright (c) 2021 Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
  */
 
+#include <asm/fpu/api.h>
 #include <linux/module.h>
 #include <linux/crypto.h>
 #include <linux/kernel.h>
-#include <asm/simd.h>
-#include <crypto/internal/simd.h>
 #include <crypto/internal/skcipher.h>
 #include <crypto/sm4.h>
 #include "sm4-avx.h"
 
 #define SM4_CRYPT16_BLOCK_SIZE	(SM4_BLOCK_SIZE * 16)
@@ -46,14 +45,13 @@ static int ctr_crypt(struct skcipher_request *req)
 }
 
 static struct skcipher_alg sm4_aesni_avx2_skciphers[] = {
 	{
 		.base = {
-			.cra_name		= "__ecb(sm4)",
-			.cra_driver_name	= "__ecb-sm4-aesni-avx2",
+			.cra_name		= "ecb(sm4)",
+			.cra_driver_name	= "ecb-sm4-aesni-avx2",
 			.cra_priority		= 500,
-			.cra_flags		= CRYPTO_ALG_INTERNAL,
 			.cra_blocksize		= SM4_BLOCK_SIZE,
 			.cra_ctxsize		= sizeof(struct sm4_ctx),
 			.cra_module		= THIS_MODULE,
 		},
 		.min_keysize	= SM4_KEY_SIZE,
@@ -62,14 +60,13 @@ static struct skcipher_alg sm4_aesni_avx2_skciphers[] = {
 		.setkey		= sm4_skcipher_setkey,
 		.encrypt	= sm4_avx_ecb_encrypt,
 		.decrypt	= sm4_avx_ecb_decrypt,
 	}, {
 		.base = {
-			.cra_name		= "__cbc(sm4)",
-			.cra_driver_name	= "__cbc-sm4-aesni-avx2",
+			.cra_name		= "cbc(sm4)",
+			.cra_driver_name	= "cbc-sm4-aesni-avx2",
 			.cra_priority		= 500,
-			.cra_flags		= CRYPTO_ALG_INTERNAL,
 			.cra_blocksize		= SM4_BLOCK_SIZE,
 			.cra_ctxsize		= sizeof(struct sm4_ctx),
 			.cra_module		= THIS_MODULE,
 		},
 		.min_keysize	= SM4_KEY_SIZE,
@@ -79,14 +76,13 @@ static struct skcipher_alg sm4_aesni_avx2_skciphers[] = {
 		.setkey		= sm4_skcipher_setkey,
 		.encrypt	= sm4_cbc_encrypt,
 		.decrypt	= cbc_decrypt,
 	}, {
 		.base = {
-			.cra_name		= "__ctr(sm4)",
-			.cra_driver_name	= "__ctr-sm4-aesni-avx2",
+			.cra_name		= "ctr(sm4)",
+			.cra_driver_name	= "ctr-sm4-aesni-avx2",
 			.cra_priority		= 500,
-			.cra_flags		= CRYPTO_ALG_INTERNAL,
 			.cra_blocksize		= 1,
 			.cra_ctxsize		= sizeof(struct sm4_ctx),
 			.cra_module		= THIS_MODULE,
 		},
 		.min_keysize	= SM4_KEY_SIZE,
@@ -98,13 +94,10 @@ static struct skcipher_alg sm4_aesni_avx2_skciphers[] = {
 		.encrypt	= ctr_crypt,
 		.decrypt	= ctr_crypt,
 	}
 };
 
-static struct simd_skcipher_alg *
-simd_sm4_aesni_avx2_skciphers[ARRAY_SIZE(sm4_aesni_avx2_skciphers)];
-
 static int __init sm4_init(void)
 {
 	const char *feature_name;
 
 	if (!boot_cpu_has(X86_FEATURE_AVX) ||
@@ -119,20 +112,18 @@ static int __init sm4_init(void)
 				&feature_name)) {
 		pr_info("CPU feature '%s' is not supported.\n", feature_name);
 		return -ENODEV;
 	}
 
-	return simd_register_skciphers_compat(sm4_aesni_avx2_skciphers,
-					ARRAY_SIZE(sm4_aesni_avx2_skciphers),
-					simd_sm4_aesni_avx2_skciphers);
+	return crypto_register_skciphers(sm4_aesni_avx2_skciphers,
+					 ARRAY_SIZE(sm4_aesni_avx2_skciphers));
 }
 
 static void __exit sm4_exit(void)
 {
-	simd_unregister_skciphers(sm4_aesni_avx2_skciphers,
-				ARRAY_SIZE(sm4_aesni_avx2_skciphers),
-				simd_sm4_aesni_avx2_skciphers);
+	crypto_unregister_skciphers(sm4_aesni_avx2_skciphers,
+				    ARRAY_SIZE(sm4_aesni_avx2_skciphers));
 }
 
 module_init(sm4_init);
 module_exit(sm4_exit);
 
diff --git a/arch/x86/crypto/sm4_aesni_avx_glue.c b/arch/x86/crypto/sm4_aesni_avx_glue.c
index 85b4ca78b47b5..72867fc49ce8e 100644
--- a/arch/x86/crypto/sm4_aesni_avx_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx_glue.c
@@ -6,15 +6,14 @@
  *
  * Copyright (c) 2021, Alibaba Group.
  * Copyright (c) 2021 Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
  */
 
+#include <asm/fpu/api.h>
 #include <linux/module.h>
 #include <linux/crypto.h>
 #include <linux/kernel.h>
-#include <asm/simd.h>
-#include <crypto/internal/simd.h>
 #include <crypto/internal/skcipher.h>
 #include <crypto/sm4.h>
 #include "sm4-avx.h"
 
 #define SM4_CRYPT8_BLOCK_SIZE	(SM4_BLOCK_SIZE * 8)
@@ -261,14 +260,13 @@ static int ctr_crypt(struct skcipher_request *req)
 }
 
 static struct skcipher_alg sm4_aesni_avx_skciphers[] = {
 	{
 		.base = {
-			.cra_name		= "__ecb(sm4)",
-			.cra_driver_name	= "__ecb-sm4-aesni-avx",
+			.cra_name		= "ecb(sm4)",
+			.cra_driver_name	= "ecb-sm4-aesni-avx",
 			.cra_priority		= 400,
-			.cra_flags		= CRYPTO_ALG_INTERNAL,
 			.cra_blocksize		= SM4_BLOCK_SIZE,
 			.cra_ctxsize		= sizeof(struct sm4_ctx),
 			.cra_module		= THIS_MODULE,
 		},
 		.min_keysize	= SM4_KEY_SIZE,
@@ -277,14 +275,13 @@ static struct skcipher_alg sm4_aesni_avx_skciphers[] = {
 		.setkey		= sm4_skcipher_setkey,
 		.encrypt	= sm4_avx_ecb_encrypt,
 		.decrypt	= sm4_avx_ecb_decrypt,
 	}, {
 		.base = {
-			.cra_name		= "__cbc(sm4)",
-			.cra_driver_name	= "__cbc-sm4-aesni-avx",
+			.cra_name		= "cbc(sm4)",
+			.cra_driver_name	= "cbc-sm4-aesni-avx",
 			.cra_priority		= 400,
-			.cra_flags		= CRYPTO_ALG_INTERNAL,
 			.cra_blocksize		= SM4_BLOCK_SIZE,
 			.cra_ctxsize		= sizeof(struct sm4_ctx),
 			.cra_module		= THIS_MODULE,
 		},
 		.min_keysize	= SM4_KEY_SIZE,
@@ -294,14 +291,13 @@ static struct skcipher_alg sm4_aesni_avx_skciphers[] = {
 		.setkey		= sm4_skcipher_setkey,
 		.encrypt	= sm4_cbc_encrypt,
 		.decrypt	= cbc_decrypt,
 	}, {
 		.base = {
-			.cra_name		= "__ctr(sm4)",
-			.cra_driver_name	= "__ctr-sm4-aesni-avx",
+			.cra_name		= "ctr(sm4)",
+			.cra_driver_name	= "ctr-sm4-aesni-avx",
 			.cra_priority		= 400,
-			.cra_flags		= CRYPTO_ALG_INTERNAL,
 			.cra_blocksize		= 1,
 			.cra_ctxsize		= sizeof(struct sm4_ctx),
 			.cra_module		= THIS_MODULE,
 		},
 		.min_keysize	= SM4_KEY_SIZE,
@@ -313,13 +309,10 @@ static struct skcipher_alg sm4_aesni_avx_skciphers[] = {
 		.encrypt	= ctr_crypt,
 		.decrypt	= ctr_crypt,
 	}
 };
 
-static struct simd_skcipher_alg *
-simd_sm4_aesni_avx_skciphers[ARRAY_SIZE(sm4_aesni_avx_skciphers)];
-
 static int __init sm4_init(void)
 {
 	const char *feature_name;
 
 	if (!boot_cpu_has(X86_FEATURE_AVX) ||
@@ -333,20 +326,18 @@ static int __init sm4_init(void)
 				&feature_name)) {
 		pr_info("CPU feature '%s' is not supported.\n", feature_name);
 		return -ENODEV;
 	}
 
-	return simd_register_skciphers_compat(sm4_aesni_avx_skciphers,
-					ARRAY_SIZE(sm4_aesni_avx_skciphers),
-					simd_sm4_aesni_avx_skciphers);
+	return crypto_register_skciphers(sm4_aesni_avx_skciphers,
+					 ARRAY_SIZE(sm4_aesni_avx_skciphers));
 }
 
 static void __exit sm4_exit(void)
 {
-	simd_unregister_skciphers(sm4_aesni_avx_skciphers,
-					ARRAY_SIZE(sm4_aesni_avx_skciphers),
-					simd_sm4_aesni_avx_skciphers);
+	crypto_unregister_skciphers(sm4_aesni_avx_skciphers,
+				    ARRAY_SIZE(sm4_aesni_avx_skciphers));
 }
 
 module_init(sm4_init);
 module_exit(sm4_exit);
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 9/9] crypto: x86/twofish - stop using the SIMD helper
  2025-04-02  0:24 [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper Eric Biggers
                   ` (7 preceding siblings ...)
  2025-04-02  0:24 ` [PATCH v2 8/9] crypto: x86/sm4 " Eric Biggers
@ 2025-04-02  0:24 ` Eric Biggers
  2025-04-02  3:14 ` [PATCH v2 0/9] crypto: x86 " Herbert Xu
  2025-04-07  5:25 ` [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper Herbert Xu
  10 siblings, 0 replies; 27+ messages in thread
From: Eric Biggers @ 2025-04-02  0:24 UTC (permalink / raw)
  To: linux-crypto; +Cc: linux-kernel, x86

From: Eric Biggers <ebiggers@google.com>

Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c).  The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs.  Specifically, if a
hardirq interrupted a task context kernel-mode FPU section and then
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU.  This has now been fixed.  In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these can now just use kernel-mode FPU unconditionally on x86.

This simplifies the code and improves performance.
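
For illustration, one way to see what the crypto API now resolves such
a request to (a sketch, not part of this patch):

	#include <crypto/skcipher.h>
	#include <linux/err.h>
	#include <linux/printk.h>

	static void example_show_driver(void)
	{
		struct crypto_skcipher *tfm =
			crypto_alloc_skcipher("ecb(twofish)", 0, 0);

		if (IS_ERR(tfm))
			return;
		/* On AVX machines this still prints "ecb-twofish-avx", but the
		 * tfm is now the (synchronous) AVX algorithm itself instead of
		 * the simd/cryptd wrapper that used to carry that name. */
		pr_info("ecb(twofish) -> %s\n",
			crypto_tfm_alg_driver_name(crypto_skcipher_tfm(tfm)));
		crypto_free_skcipher(tfm);
	}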

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/Kconfig            |  1 -
 arch/x86/crypto/twofish_avx_glue.c | 21 +++++++--------------
 2 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index afc1a05e663dd..f9e46e83440f1 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -268,11 +268,10 @@ config CRYPTO_TWOFISH_X86_64_3WAY
 
 config CRYPTO_TWOFISH_AVX_X86_64
 	tristate "Ciphers: Twofish with modes: ECB, CBC (AVX)"
 	depends on X86 && 64BIT
 	select CRYPTO_SKCIPHER
-	select CRYPTO_SIMD
 	select CRYPTO_TWOFISH_COMMON
 	select CRYPTO_TWOFISH_X86_64
 	select CRYPTO_TWOFISH_X86_64_3WAY
 	imply CRYPTO_XTS
 	help
diff --git a/arch/x86/crypto/twofish_avx_glue.c b/arch/x86/crypto/twofish_avx_glue.c
index 3eb3440b477a8..9e20db0137501 100644
--- a/arch/x86/crypto/twofish_avx_glue.c
+++ b/arch/x86/crypto/twofish_avx_glue.c
@@ -11,11 +11,10 @@
 #include <linux/module.h>
 #include <linux/types.h>
 #include <linux/crypto.h>
 #include <linux/err.h>
 #include <crypto/algapi.h>
-#include <crypto/internal/simd.h>
 #include <crypto/twofish.h>
 
 #include "twofish.h"
 #include "ecb_cbc_helpers.h"
 
@@ -72,27 +71,25 @@ static int cbc_decrypt(struct skcipher_request *req)
 	CBC_WALK_END();
 }
 
 static struct skcipher_alg twofish_algs[] = {
 	{
-		.base.cra_name		= "__ecb(twofish)",
-		.base.cra_driver_name	= "__ecb-twofish-avx",
+		.base.cra_name		= "ecb(twofish)",
+		.base.cra_driver_name	= "ecb-twofish-avx",
 		.base.cra_priority	= 400,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= TF_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct twofish_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= TF_MIN_KEY_SIZE,
 		.max_keysize		= TF_MAX_KEY_SIZE,
 		.setkey			= twofish_setkey_skcipher,
 		.encrypt		= ecb_encrypt,
 		.decrypt		= ecb_decrypt,
 	}, {
-		.base.cra_name		= "__cbc(twofish)",
-		.base.cra_driver_name	= "__cbc-twofish-avx",
+		.base.cra_name		= "cbc(twofish)",
+		.base.cra_driver_name	= "cbc-twofish-avx",
 		.base.cra_priority	= 400,
-		.base.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.base.cra_blocksize	= TF_BLOCK_SIZE,
 		.base.cra_ctxsize	= sizeof(struct twofish_ctx),
 		.base.cra_module	= THIS_MODULE,
 		.min_keysize		= TF_MIN_KEY_SIZE,
 		.max_keysize		= TF_MAX_KEY_SIZE,
@@ -101,30 +98,26 @@ static struct skcipher_alg twofish_algs[] = {
 		.encrypt		= cbc_encrypt,
 		.decrypt		= cbc_decrypt,
 	},
 };
 
-static struct simd_skcipher_alg *twofish_simd_algs[ARRAY_SIZE(twofish_algs)];
-
 static int __init twofish_init(void)
 {
 	const char *feature_name;
 
 	if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, &feature_name)) {
 		pr_info("CPU feature '%s' is not supported.\n", feature_name);
 		return -ENODEV;
 	}
 
-	return simd_register_skciphers_compat(twofish_algs,
-					      ARRAY_SIZE(twofish_algs),
-					      twofish_simd_algs);
+	return crypto_register_skciphers(twofish_algs,
+					 ARRAY_SIZE(twofish_algs));
 }
 
 static void __exit twofish_exit(void)
 {
-	simd_unregister_skciphers(twofish_algs, ARRAY_SIZE(twofish_algs),
-				  twofish_simd_algs);
+	crypto_unregister_skciphers(twofish_algs, ARRAY_SIZE(twofish_algs));
 }
 
 module_init(twofish_init);
 module_exit(twofish_exit);
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper
  2025-04-02  0:24 [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper Eric Biggers
                   ` (8 preceding siblings ...)
  2025-04-02  0:24 ` [PATCH v2 9/9] crypto: x86/twofish " Eric Biggers
@ 2025-04-02  3:14 ` Herbert Xu
  2025-04-02  6:31   ` Ard Biesheuvel
  2025-04-07  5:25 ` [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper Herbert Xu
  10 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2025-04-02  3:14 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-crypto, linux-kernel, x86, Ard Biesheuvel, Michael Ellerman,
	Danny Tsen

Eric Biggers <ebiggers@kernel.org> wrote:
>
> Stop wrapping skcipher and aead algorithms with the crypto simd helper
> (crypto/simd.c).  The only purpose of doing so was to work around x86
> not always supporting kernel-mode FPU in softirqs.  Specifically, if a
> hardirq interrupted a task context kernel-mode FPU section and then a
> softirqs were run at the end of that hardirq, those softirqs could not
> use kernel-mode FPU.  This has now been fixed.  In combination with the
> fact that the skcipher and aead APIs only support task and softirq
> contexts, these can now just use kernel-mode FPU unconditionally on x86.

Nice work!

So which platform still needs the simd wrapper? I believe arm/arm64
have both been fixed but we haven't finished removing the legacy
simd code yet? Ard, would you be able to spare some cycles and
finish the removal of simd on arm?

Darn, it looks like powerpc has just started using the simd wrapper
so we need to fix it first before we can completely eliminate simd.

Michael/Danny, any chance you guys could implement something similar
to what's been done on arm/x86 and make simd usable in softirqs?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper
  2025-04-02  3:14 ` [PATCH v2 0/9] crypto: x86 " Herbert Xu
@ 2025-04-02  6:31   ` Ard Biesheuvel
  2025-04-02  6:34     ` Ard Biesheuvel
  0 siblings, 1 reply; 27+ messages in thread
From: Ard Biesheuvel @ 2025-04-02  6:31 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Eric Biggers, linux-crypto, linux-kernel, x86

On Wed, 2 Apr 2025 at 06:14, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > Stop wrapping skcipher and aead algorithms with the crypto simd helper
> > (crypto/simd.c).  The only purpose of doing so was to work around x86
> > not always supporting kernel-mode FPU in softirqs.  Specifically, if a
> > hardirq interrupted a task context kernel-mode FPU section and then a
> > softirqs were run at the end of that hardirq, those softirqs could not
> > use kernel-mode FPU.  This has now been fixed.  In combination with the
> > fact that the skcipher and aead APIs only support task and softirq
> > contexts, these can now just use kernel-mode FPU unconditionally on x86.
>
> Nice work!
>

Yeah good riddance.

> So which platform still needs the simd wrapper? I believe arm/arm64
> have both been fixed but we haven't finished removing the legacy
> simd code yet? Ard, would you be able to spare some cycles and
> finish the removal of simd on arm?
>

Removal of what, exactly?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper
  2025-04-02  6:31   ` Ard Biesheuvel
@ 2025-04-02  6:34     ` Ard Biesheuvel
  2025-04-02  8:22       ` Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Ard Biesheuvel @ 2025-04-02  6:34 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Eric Biggers, linux-crypto, linux-kernel, x86

On Wed, 2 Apr 2025 at 09:31, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Wed, 2 Apr 2025 at 06:14, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> >
> > Eric Biggers <ebiggers@kernel.org> wrote:
> > >
> > > Stop wrapping skcipher and aead algorithms with the crypto simd helper
> > > (crypto/simd.c).  The only purpose of doing so was to work around x86
> > > not always supporting kernel-mode FPU in softirqs.  Specifically, if a
> > > hardirq interrupted a task context kernel-mode FPU section and then a
> > > softirqs were run at the end of that hardirq, those softirqs could not
> > > use kernel-mode FPU.  This has now been fixed.  In combination with the
> > > fact that the skcipher and aead APIs only support task and softirq
> > > contexts, these can now just use kernel-mode FPU unconditionally on x86.
> >
> > Nice work!
> >
>
> Yeah good riddance.
>
> > So which platform still needs the simd wrapper? I believe arm/arm64
> > have both been fixed but we haven't finished removing the legacy
> > simd code yet? Ard, would you be able to spare some cycles and
> > finish the removal of simd on arm?
> >
>
> Removal of what, exactly?

Ah, never mind - I see some calls on 32-bit ARM to
simd_skcipher_create_compat(), which have become redundant now that
SIMD is guaranteed to be available in softirq context.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper
  2025-04-02  6:34     ` Ard Biesheuvel
@ 2025-04-02  8:22       ` Herbert Xu
  2025-04-02 17:19         ` Eric Biggers
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2025-04-02  8:22 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Eric Biggers, linux-crypto, linux-kernel, x86, Jason A. Donenfeld

On Wed, Apr 02, 2025 at 09:34:30AM +0300, Ard Biesheuvel wrote:
>
> Ah, never mind - I see some calls on 32-bit ARM to
> simd_skcipher_create_compat(), which have become redundant now that
> SIMD is guaranteed to be available in softirq context.

Thanks!

We could also remove all the calls to crypto_simd_usable in the
Crypto API hashing code, e.g., arch/arm64/crypto/sha1-ce-glue.c.

For the lib/crypto code I think we should make it a rule to
not allow any hardirq usage just like the Crypto API.  Does
anyone know of any uses of lib/crypto in a hardirq?

I thought /dev/random might do that but it looks like Jason has
fixed it so that crypto code is no longer used in hardirqs:

commit e3e33fc2ea7fcefd0d761db9d6219f83b4248f5c
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Fri May 6 18:30:51 2022 +0200

    random: do not use input pool from hard IRQs

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper
  2025-04-02  8:22       ` Herbert Xu
@ 2025-04-02 17:19         ` Eric Biggers
  2025-04-03  1:25           ` Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Biggers @ 2025-04-02 17:19 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Ard Biesheuvel, linux-crypto, linux-kernel, x86,
	Jason A. Donenfeld

On Wed, Apr 02, 2025 at 04:22:21PM +0800, Herbert Xu wrote:
> On Wed, Apr 02, 2025 at 09:34:30AM +0300, Ard Biesheuvel wrote:
> >
> > Ah, never mind - I see some calls on 32-bit ARM to
> > simd_skcipher_create_compat(), which have become redundant now that
> > SIMD is guaranteed to be available in softirq context.
> 
> Thanks!
> 
> We could also remove all the calls to crypto_simd_usable in the
> Crypto API hashing code, e.g., arch/arm64/crypto/sha1-ce-glue.c.
> 
> For the lib/crypto code I think we should make it a rule to
> not allow any hardirq usage just like the Crypto API.  Does
> anyone know of any uses of lib/crypto in a hardirq?

This seems premature.  crypto_shash is documented to be usable in any context.
See the "Context:" comments in include/crypto/hash.h.  Similarly, developers
expect lib/ functions to be available in any context unless otherwise
documented.

For skcipher and aead, there are more reasons why it makes sense to limit the
contexts:

- skcipher_walk_first() already explicitly errors out if in_hardirq(), which
  prevents them from working in hardirq context in most cases
- Even if it was allowed, the skcipher and aead APIs are already difficult to
  use correctly in a hardirq
- Because of how the crypto API is designed, it's not straightforward to fall
  back to generic skcipher and aead code in no-SIMD contexts

I could see the limitation being brought into crypto_shash too, though the
crypto_shash documentation will need to be updated.  The crypto API also really
needs to be explicitly checking all its requirements.  (It's probably finally
time to add a kconfig option like CONFIG_DEBUG_CRYPTO to lib/Kconfig.debug, and
put the extra assertions under there.  Then they could be added without
impacting performance for normal users.)
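
A rough sketch of that kind of assertion (CONFIG_DEBUG_CRYPTO is
hypothetical here, as is the helper name):

	#include <linux/bug.h>
	#include <linux/preempt.h>

	static inline void crypto_debug_check_context(void)
	{
	#ifdef CONFIG_DEBUG_CRYPTO	/* hypothetical Kconfig option */
		/* shash/skcipher/aead would be limited to task and softirq
		 * context; catch violations without any cost otherwise. */
		WARN_ON_ONCE(in_hardirq() || in_nmi());
	#endif
	}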

IMO, doing it for lib/ too would be going too far though.  The lib/ functions
should be easy to use and not have random requirements on the calling context.
And since they're just functions, it's easy for them to fall back to the generic
functions when needed.  Also note that for very short inputs it can actually be
faster to use no-SIMD code, as that avoids the overhead of a kernel-mode SIMD
section.  So the fallback sometimes exists anyway for that.
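
The typical shape of such a fallback, as a simplified sketch with
generic (hypothetical) function names:

	#include <asm/fpu/api.h>
	#include <crypto/internal/simd.h>
	#include <linux/types.h>

	/* Hypothetical generic and SIMD implementations, for illustration. */
	void example_hash_update_generic(void *state, const u8 *data, size_t len);
	void example_hash_update_simd(void *state, const u8 *data, size_t len);

	void example_hash_update(void *state, const u8 *data, size_t len)
	{
		/* The generic C code works in any context, including hardirq
		 * and NMI; the SIMD path is taken only when it is usable and
		 * the input is long enough to be worth the kernel_fpu_begin()
		 * overhead (the threshold is illustrative). */
		if (!crypto_simd_usable() || len < 256) {
			example_hash_update_generic(state, data, len);
			return;
		}

		kernel_fpu_begin();
		example_hash_update_simd(state, data, len);
		kernel_fpu_end();
	}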

- Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper
  2025-04-02 17:19         ` Eric Biggers
@ 2025-04-03  1:25           ` Herbert Xu
  2025-04-03  2:14             ` Eric Biggers
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2025-04-03  1:25 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Ard Biesheuvel, linux-crypto, linux-kernel, x86,
	Jason A. Donenfeld

On Wed, Apr 02, 2025 at 10:19:30AM -0700, Eric Biggers wrote:
>
> This seems premature.  crypto_shash is documented to be usable in any context.
> See the "Context:" comments in include/crypto/hash.h.  Similarly, developers
> expect lib/ functions to be available in any context unless otherwise
> documented.

Doing slow computations in a hard IRQ is a bad idea.  The whole
point of a hard IRQ handler is to set a flag and defer everything
to a different context.

Please show me one good reason why we should allow crypto in
a hard IRQ.
 
> IMO, doing it for lib/ too would be going too far though.  The lib/ functions
> should be easy to use and not have random requirements on the calling context.
> And since they're just functions, it's easy for them to fall back to the generic
> functions when needed.  Also note that for very short inputs it can actually be
> faster to use no-SIMD code, as that avoids the overhead of a kernel-mode SIMD
> section.  So the fallback sometimes exists anyway for that.

We already disallow SIMD in hard IRQs anyway (may_use_simd is
always false in that context).  The only thing you could use
is the generic implementation.

So making this change in lib/crypto does not take any functionality
away.  You could still invoke the generic lib/crypto code directly.

It does mean that we take away a completely useless check for
people who are actually doing crypto because crypto work should
never be done in a hard IRQ.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper
  2025-04-03  1:25           ` Herbert Xu
@ 2025-04-03  2:14             ` Eric Biggers
  2025-04-03  2:33               ` [PATCH] crypto: hash - Do not use shash in hard IRQs Herbert Xu
  2025-04-03  2:56               ` [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper Herbert Xu
  0 siblings, 2 replies; 27+ messages in thread
From: Eric Biggers @ 2025-04-03  2:14 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Ard Biesheuvel, linux-crypto, linux-kernel, x86,
	Jason A. Donenfeld

On Thu, Apr 03, 2025 at 09:25:37AM +0800, Herbert Xu wrote:
> On Wed, Apr 02, 2025 at 10:19:30AM -0700, Eric Biggers wrote:
> >
> > This seems premature.  crypto_shash is documented to be usable in any context.
> > See the "Context:" comments in include/crypto/hash.h.  Similarly, developers
> > expect lib/ functions to be available in any context unless otherwise
> > documented.
> 
> Doing slow computations in a hard IRQ is a bad idea.  The whole
> point of a hard IRQ handler is to set a flag and defer everything
> to a different context.
> 
> Please show me one good reason why we should allow crypto in
> a hard IRQ.
>  
> > IMO, doing it for lib/ too would be going too far though.  The lib/ functions
> > should be easy to use and not have random requirements on the calling context.
> > And since they're just functions, it's easy for them to fall back to the generic
> > functions when needed.  Also note that for very short inputs it can actually be
> > faster to use no-SIMD code, as that avoids the overhead of a kernel-mode SIMD
> > section.  So the fallback sometimes exists anyway for that.
> 
> We already disallow SIMD in hard IRQs anyway (may_use_simd is
> always false in that context).  The only thing you could use
> is the generic implementation.
> 
> So making this change in lib/crypto does not take any functionality
> away.  You could still invoke the generic lib/crypto code directly.
> 
> It does mean that we take away a completely useless check for
> people who are actually doing crypto because crypto work should
> never be done in a hard IRQ.

It's not the 90s anymore.  Crypto is fast now, and used ubiquitously.

And "crypto" doesn't necessarily mean a large operation.  It can be hashing just
a few bytes of data, for example.

Also as you know, the crypto API includes some non-cryptographic algorithms too.

BTW, x86 does allow SIMD in hardirq context in some cases.
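
On x86 that case is gated on irq_fpu_usable(); a minimal sketch of how
such a caller has to be structured (the helper name is made up):

	#include <asm/fpu/api.h>
	#include <linux/types.h>

	static bool example_try_simd_in_hardirq(void (*simd_fn)(void))
	{
		/* Callers in hardirq context must still provide a no-SIMD
		 * fallback for when this returns false. */
		if (!irq_fpu_usable())
			return false;

		kernel_fpu_begin();
		simd_fn();
		kernel_fpu_end();
		return true;
	}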

Certainly agreed that crypto in hardirqs is something to be avoided in general,
though.

So maybe your proposal is okay, if it's done properly.

The thing I actually have more of a problem with is that you tend to start
making random API changes without any of the necessary prerequisites like
updating documentation, or adding debug assertions to catch violations of new
requirements.  You've already started removing the fallbacks from shash (commit
3846c01d42526bc31), but neither of those things have been done.  So we're
currently in a weird state where the shash API is explicitly documented to work
in all contexts, but you've broken that.

- Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] crypto: hash - Do not use shash in hard IRQs
  2025-04-03  2:14             ` Eric Biggers
@ 2025-04-03  2:33               ` Herbert Xu
  2025-04-03  2:56               ` [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper Herbert Xu
  1 sibling, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2025-04-03  2:33 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Ard Biesheuvel, linux-crypto, linux-kernel, x86,
	Jason A. Donenfeld

On Thu, Apr 03, 2025 at 02:14:53AM +0000, Eric Biggers wrote:
>
> The thing I actually have more of a problem with is that you tend to start
> making random API changes without any of the necessary prerequisites, like
> updating documentation or adding debug assertions to catch violations of the
> new requirements.  You've already started removing the fallbacks from shash
> (commit 3846c01d42526bc31), but neither of those things has been done.  So
> we're currently in a weird state where the shash API is explicitly documented
> to work in all contexts, but you've broken that.

The documentation is easy enough to fix.

---8<---
Update the documentation to be consistent with the fact that shash
may not be used in hard IRQs.

Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/crypto/hash.h b/include/crypto/hash.h
index 58f9d3c9d006..5fde27039a06 100644
--- a/include/crypto/hash.h
+++ b/include/crypto/hash.h
@@ -847,7 +847,7 @@ static inline void *shash_desc_ctx(struct shash_desc *desc)
  * cipher handle must point to a keyed message digest cipher in order for this
  * function to succeed.
  *
- * Context: Any context.
+ * Context: Softirq or process context.
  * Return: 0 if the setting of the key was successful; < 0 if an error occurred
  */
 int crypto_shash_setkey(struct crypto_shash *tfm, const u8 *key,
@@ -864,7 +864,7 @@ int crypto_shash_setkey(struct crypto_shash *tfm, const u8 *key,
  * crypto_shash_update and crypto_shash_final. The parameters have the same
  * meaning as discussed for those separate three functions.
  *
- * Context: Any context.
+ * Context: Softirq or process context.
  * Return: 0 if the message digest creation was successful; < 0 if an error
  *	   occurred
  */
@@ -884,7 +884,7 @@ int crypto_shash_digest(struct shash_desc *desc, const u8 *data,
  * directly, and it allocates a hash descriptor on the stack internally.
  * Note that this stack allocation may be fairly large.
  *
- * Context: Any context.
+ * Context: Softirq or process context.
  * Return: 0 on success; < 0 if an error occurred.
  */
 int crypto_shash_tfm_digest(struct crypto_shash *tfm, const u8 *data,
@@ -902,7 +902,7 @@ int crypto_hash_digest(struct crypto_ahash *tfm, const u8 *data,
  * caller-allocated output buffer out which must have sufficient size (e.g. by
  * calling crypto_shash_descsize).
  *
- * Context: Any context.
+ * Context: Softirq or process context.
  * Return: 0 if the export creation was successful; < 0 if an error occurred
  */
 int crypto_shash_export(struct shash_desc *desc, void *out);
@@ -916,7 +916,7 @@ int crypto_shash_export(struct shash_desc *desc, void *out);
  * the input buffer. That buffer should have been generated with the
  * crypto_ahash_export function.
  *
- * Context: Any context.
+ * Context: Softirq or process context.
  * Return: 0 if the import was successful; < 0 if an error occurred
  */
 int crypto_shash_import(struct shash_desc *desc, const void *in);
@@ -929,7 +929,7 @@ int crypto_shash_import(struct shash_desc *desc, const void *in);
  * operational state handle. Any potentially existing state created by
  * previous operations is discarded.
  *
- * Context: Any context.
+ * Context: Softirq or process context.
  * Return: 0 if the message digest initialization was successful; < 0 if an
  *	   error occurred
  */
@@ -951,7 +951,7 @@ static inline int crypto_shash_init(struct shash_desc *desc)
  *
  * Updates the message digest state of the operational state handle.
  *
- * Context: Any context.
+ * Context: Softirq or process context.
  * Return: 0 if the message digest update was successful; < 0 if an error
  *	   occurred
  */
@@ -968,7 +968,7 @@ int crypto_shash_update(struct shash_desc *desc, const u8 *data,
  * into the output buffer. The caller must ensure that the output buffer is
  * large enough by using crypto_shash_digestsize.
  *
- * Context: Any context.
+ * Context: Softirq or process context.
  * Return: 0 if the message digest creation was successful; < 0 if an error
  *	   occurred
  */
@@ -985,7 +985,7 @@ int crypto_shash_final(struct shash_desc *desc, u8 *out);
  * crypto_shash_update and crypto_shash_final. The parameters have the same
  * meaning as discussed for those separate functions.
  *
- * Context: Any context.
+ * Context: Softirq or process context.
  * Return: 0 if the message digest creation was successful; < 0 if an error
  *	   occurred
  */
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper
  2025-04-03  2:14             ` Eric Biggers
  2025-04-03  2:33               ` [PATCH] crypto: hash - Do not use shash in hard IRQs Herbert Xu
@ 2025-04-03  2:56               ` Herbert Xu
  2025-04-03  3:20                 ` Eric Biggers
  1 sibling, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2025-04-03  2:56 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Ard Biesheuvel, linux-crypto, linux-kernel, x86,
	Jason A. Donenfeld

On Thu, Apr 03, 2025 at 02:14:53AM +0000, Eric Biggers wrote:
>
> It's not the 90s anymore.  Crypto is fast now, and used ubiquitously.

I have to say that you've done a great job improving crypto
performance on x86, and I'm very pleased that 256 bytes can now be
encrypted in just over 100 CPU cycles and a whole page in under
1000 cycles.

But this is only possible with SIMD instructions which we do not
support in hard IRQ context.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper
  2025-04-03  2:56               ` [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper Herbert Xu
@ 2025-04-03  3:20                 ` Eric Biggers
  2025-04-03  3:42                   ` Banning crypto in hardirq context (was: [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper) Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Biggers @ 2025-04-03  3:20 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Ard Biesheuvel, linux-crypto, linux-kernel, x86,
	Jason A. Donenfeld

On Thu, Apr 03, 2025 at 10:56:35AM +0800, Herbert Xu wrote:
> On Thu, Apr 03, 2025 at 02:14:53AM +0000, Eric Biggers wrote:
> >
> > It's not the 90s anymore.  Crypto is fast now, and used ubiquitously.
> 
> I have to say that you've done a great job improving crypto
> performance on x86, and I'm very pleased that 256 bytes can now be
> encrypted in just over 100 CPU cycles and a whole page in under
> 1000 cycles.
> 
> But this is only possible with SIMD instructions which we do not
> support in hard IRQ context.
> 

What?  Take a look at siphash_1u32(), for example.  That is crypto, and it is
fast.  It doesn't use, or need to use, SIMD instructions.
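
A minimal sketch of that kind of use, keying a hash table (siphash_1u32() is
the real helper; the key, table size, and function names here are made up):

    #include <linux/siphash.h>

    static siphash_key_t ht_key __read_mostly;  /* filled via get_random_bytes() at init */

    static u32 ht_bucket(u32 id)
    {
            /* one u32 in, keyed 64-bit hash out, no SIMD anywhere */
            return (u32)siphash_1u32(id, &ht_key) & (HT_SIZE - 1);
    }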

Also, riscv has scalar AES instructions.  (They aren't used by the kernel yet,
but they could be.  The CRC code already uses scalar carryless multiplication.)

Obviously, it's also very common to genuinely need the SIMD unit; that's just
the way it is.  But that doesn't cover every case.

Also, as I said already, x86 does support SIMD instructions in hardirq context
in some cases.  Whether anyone actually uses that, I don't know, but it is
explicitly supported.  Check out irq_fpu_usable().
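
The pattern is roughly this (a sketch; only irq_fpu_usable() and the
kernel_fpu_begin()/kernel_fpu_end() pair are real APIs, the rest of the
names are made up):

    #include <asm/fpu/api.h>    /* irq_fpu_usable(), kernel_fpu_begin/end() */

    void my_update(struct my_ctx *ctx, const u8 *data, unsigned int len)
    {
            if (irq_fpu_usable()) {         /* can be true even in hardirq */
                    kernel_fpu_begin();
                    my_update_simd(ctx, data, len);
                    kernel_fpu_end();
            } else {
                    my_update_generic(ctx, data, len);
            }
    }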

- Eric


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Banning crypto in hardirq context (was: [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper)
  2025-04-03  3:20                 ` Eric Biggers
@ 2025-04-03  3:42                   ` Herbert Xu
  2025-04-03  3:59                     ` Eric Biggers
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2025-04-03  3:42 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Ard Biesheuvel, linux-crypto, linux-kernel, x86,
	Jason A. Donenfeld, Linus Torvalds

On Wed, Apr 02, 2025 at 08:20:08PM -0700, Eric Biggers wrote:
>
> Also, riscv has scalar AES instructions.  (They aren't used by the kernel yet,
> but they could be.  The CRC code already uses scalar carryless multiplication.)

It still doesn't mean that it's a good idea to use AES in a
hard IRQ handler, especially if the code is meant to be portable.

> Also, as I said already, x86 does support SIMD instructions in hardirq context
> in some cases.  Whether anyone actually uses that, I don't know, but it is
> explicitly supported.  Check out irq_fpu_usable().

This is more of an accident than some deliberate strategy of
supporting FPU usage in hard IRQs.  This test was initially
added for aesni:

commit 54b6a1bd5364aca95cd6ffae00f2b64c6511122c
Author: Ying Huang <huang.ying.caritas@gmail.com>
Date:   Sun Jan 18 16:28:34 2009 +1100

    crypto: aes-ni - Add support to Intel AES-NI instructions for x86_64 platform

It was then improved by:

Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Feb 13 13:56:14 2012 -0800

    i387: make irq_fpu_usable() tests more robust
    
    Some code - especially the crypto layer - wants to use the x86
    FP/MMX/AVX register set in what may be interrupt (typically softirq)
    context.

At no point was there any intention of using this in a hardirq
context.

Until such a time when you have a valid application for using
lib/crypto code in a hardirq context, I don't think we should
be supporting that at the expense of real users who are in
process/softirq context only.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Banning crypto in hardirq context (was: [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper)
  2025-04-03  3:42                   ` Banning crypto in hardirq context (was: [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper) Herbert Xu
@ 2025-04-03  3:59                     ` Eric Biggers
  2025-04-03  4:14                       ` [PATCH] crypto: x86/chacha - Remove SIMD fallback path Herbert Xu
  2025-04-03  7:03                       ` Banning crypto in hardirq context (was: [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper) Ard Biesheuvel
  0 siblings, 2 replies; 27+ messages in thread
From: Eric Biggers @ 2025-04-03  3:59 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Ard Biesheuvel, linux-crypto, linux-kernel, x86,
	Jason A. Donenfeld, Linus Torvalds

On Thu, Apr 03, 2025 at 11:42:34AM +0800, Herbert Xu wrote:
> On Wed, Apr 02, 2025 at 08:20:08PM -0700, Eric Biggers wrote:
> >
> > Also, riscv has scalar AES instructions.  (They aren't used by the kernel yet,
> > but they could be.  The CRC code already uses scalar carryless multiplication.)
> 
> It still doesn't mean that it's a good idea to use AES in a
> hard IRQ handler, especially if the code is meant to be portable.
> 
> > Also, as I said already, x86 does support SIMD instructions in hardirq context
> > in some cases.  Whether anyone actually uses that, I don't know, but it is
> > explicitly supported.  Check out irq_fpu_usable().
> 
> This is more of an accident than some deliberate strategy of
> supporting FPU usage in hard IRQs.  This test was initially
> added for aesni:
> 
> commit 54b6a1bd5364aca95cd6ffae00f2b64c6511122c
> Author: Ying Huang <huang.ying.caritas@gmail.com>
> Date:   Sun Jan 18 16:28:34 2009 +1100
> 
>     crypto: aes-ni - Add support to Intel AES-NI instructions for x86_64 platform
> 
> It was then improved by:
> 
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Mon Feb 13 13:56:14 2012 -0800
> 
>     i387: make irq_fpu_usable() tests more robust
>     
>     Some code - especially the crypto layer - wants to use the x86
>     FP/MMX/AVX register set in what may be interrupt (typically softirq)
>     context.
> 
> At no point was there any intention of using this in a hardirq
> context.
> 
> Until such a time when you have a valid application for using
> lib/crypto code in a hardirq context, I don't think we should
> be supporting that at the expense of real users who are in
> process/softirq context only.

Whatever.  We agree that "crypto in hardirq" is not a good idea in general.  I'm
just pointing out that there are certain cases, like SipHash used in a hash
table, where it easily could happen and would be fine.  And all the shash and
crypto library functions currently work in any context, unlike e.g. skcipher and
aead which do not.  You seem to be trying to claim that it was never supported,
but that is incorrect.  Making it unsupported would be a change that needs to be
properly documented (the functions would no longer be simply "Any context")
*and* have proper debug assertions added to enforce it and prevent usage errors.
But in a lot of cases there is also no reason to even add that restriction.  I'm
not sure why you're so eager to make the library functions harder to use.

- Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] crypto: x86/chacha - Remove SIMD fallback path
  2025-04-03  3:59                     ` Eric Biggers
@ 2025-04-03  4:14                       ` Herbert Xu
  2025-04-07 16:48                         ` Eric Biggers
  2025-04-03  7:03                       ` Banning crypto in hardirq context (was: [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper) Ard Biesheuvel
  1 sibling, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2025-04-03  4:14 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Ard Biesheuvel, linux-crypto, linux-kernel, x86,
	Jason A. Donenfeld, Linus Torvalds

On Wed, Apr 02, 2025 at 08:59:34PM -0700, Eric Biggers wrote:
>
> But in a lot of cases there is also no reason to even add that restriction.  I'm
> not sure why you're so eager to make the library functions harder to use.

I have no intention of making any changes to siphash.  It doesn't
even use SIMD.

All I want to do is get rid of the crypto_simd_usable() fallback
paths that we currently have in arch/x86/crypto.  This code is
never used in hardirq context (and should never be).

For example:

---8<---
Get rid of the fallback path as SIMD is now always usable in softirq
context.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c
index 8bb74a272879..6a3d60cf3192 100644
--- a/arch/x86/crypto/chacha_glue.c
+++ b/arch/x86/crypto/chacha_glue.c
@@ -6,9 +6,7 @@
  * Copyright (C) 2015 Martin Willi
  */
 
-#include <crypto/algapi.h>
 #include <crypto/internal/chacha.h>
-#include <crypto/internal/simd.h>
 #include <crypto/internal/skcipher.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
@@ -35,7 +33,6 @@ asmlinkage void chacha_4block_xor_avx512vl(u32 *state, u8 *dst, const u8 *src,
 asmlinkage void chacha_8block_xor_avx512vl(u32 *state, u8 *dst, const u8 *src,
 					   unsigned int len, int nrounds);
 
-static __ro_after_init DEFINE_STATIC_KEY_FALSE(chacha_use_simd);
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(chacha_use_avx2);
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(chacha_use_avx512vl);
 
@@ -123,23 +120,15 @@ static void chacha_dosimd(u32 *state, u8 *dst, const u8 *src,
 
 void hchacha_block_arch(const u32 *state, u32 *stream, int nrounds)
 {
-	if (!static_branch_likely(&chacha_use_simd) || !crypto_simd_usable()) {
-		hchacha_block_generic(state, stream, nrounds);
-	} else {
-		kernel_fpu_begin();
-		hchacha_block_ssse3(state, stream, nrounds);
-		kernel_fpu_end();
-	}
+	kernel_fpu_begin();
+	hchacha_block_ssse3(state, stream, nrounds);
+	kernel_fpu_end();
 }
 EXPORT_SYMBOL(hchacha_block_arch);
 
 void chacha_crypt_arch(u32 *state, u8 *dst, const u8 *src, unsigned int bytes,
 		       int nrounds)
 {
-	if (!static_branch_likely(&chacha_use_simd) || !crypto_simd_usable() ||
-	    bytes <= CHACHA_BLOCK_SIZE)
-		return chacha_crypt_generic(state, dst, src, bytes, nrounds);
-
 	do {
 		unsigned int todo = min_t(unsigned int, bytes, SZ_4K);
 
@@ -171,18 +160,11 @@ static int chacha_simd_stream_xor(struct skcipher_request *req,
 		if (nbytes < walk.total)
 			nbytes = round_down(nbytes, walk.stride);
 
-		if (!static_branch_likely(&chacha_use_simd) ||
-		    !crypto_simd_usable()) {
-			chacha_crypt_generic(state, walk.dst.virt.addr,
-					     walk.src.virt.addr, nbytes,
-					     ctx->nrounds);
-		} else {
-			kernel_fpu_begin();
-			chacha_dosimd(state, walk.dst.virt.addr,
-				      walk.src.virt.addr, nbytes,
-				      ctx->nrounds);
-			kernel_fpu_end();
-		}
+		kernel_fpu_begin();
+		chacha_dosimd(state, walk.dst.virt.addr,
+			      walk.src.virt.addr, nbytes,
+			      ctx->nrounds);
+		kernel_fpu_end();
 		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
 	}
 
@@ -207,13 +189,9 @@ static int xchacha_simd(struct skcipher_request *req)
 
 	chacha_init(state, ctx->key, req->iv);
 
-	if (req->cryptlen > CHACHA_BLOCK_SIZE && crypto_simd_usable()) {
-		kernel_fpu_begin();
-		hchacha_block_ssse3(state, subctx.key, ctx->nrounds);
-		kernel_fpu_end();
-	} else {
-		hchacha_block_generic(state, subctx.key, ctx->nrounds);
-	}
+	kernel_fpu_begin();
+	hchacha_block_ssse3(state, subctx.key, ctx->nrounds);
+	kernel_fpu_end();
 	subctx.nrounds = ctx->nrounds;
 
 	memcpy(&real_iv[0], req->iv + 24, 8);
@@ -275,8 +253,6 @@ static int __init chacha_simd_mod_init(void)
 	if (!boot_cpu_has(X86_FEATURE_SSSE3))
 		return 0;
 
-	static_branch_enable(&chacha_use_simd);
-
 	if (boot_cpu_has(X86_FEATURE_AVX) &&
 	    boot_cpu_has(X86_FEATURE_AVX2) &&
 	    cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: Banning crypto in hardirq context (was: [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper)
  2025-04-03  3:59                     ` Eric Biggers
  2025-04-03  4:14                       ` [PATCH] crypto: x86/chacha - Remove SIMD fallback path Herbert Xu
@ 2025-04-03  7:03                       ` Ard Biesheuvel
  1 sibling, 0 replies; 27+ messages in thread
From: Ard Biesheuvel @ 2025-04-03  7:03 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Herbert Xu, linux-crypto, linux-kernel, x86, Jason A. Donenfeld,
	Linus Torvalds

On Thu, 3 Apr 2025 at 06:59, Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Thu, Apr 03, 2025 at 11:42:34AM +0800, Herbert Xu wrote:
> > On Wed, Apr 02, 2025 at 08:20:08PM -0700, Eric Biggers wrote:
> > >
> > > Also, riscv has scalar AES instructions.  (They aren't used by the kernel yet,
> > > but they could be.  The CRC code already uses scalar carryless multiplication.)
> >
> > It still doesn't mean that it's a good idea to use AES in a
> > hard IRQ handler, especially if the code is meant to be portable.
> >
> > > Also, as I said already, x86 does support SIMD instructions in hardirq context
> > > in some cases.  Whether anyone actually uses that, I don't know, but it is
> > > explicitly supported.  Check out irq_fpu_usable().
> >
> > This is more of an accident than some deliberate strategy of
> > supporting FPU usage in hard IRQs.  This test was initially
> > added for aesni:
> >
> > commit 54b6a1bd5364aca95cd6ffae00f2b64c6511122c
> > Author: Ying Huang <huang.ying.caritas@gmail.com>
> > Date:   Sun Jan 18 16:28:34 2009 +1100
> >
> >     crypto: aes-ni - Add support to Intel AES-NI instructions for x86_64 platform
> >
> > It was then improved by:
> >
> > Author: Linus Torvalds <torvalds@linux-foundation.org>
> > Date:   Mon Feb 13 13:56:14 2012 -0800
> >
> >     i387: make irq_fpu_usable() tests more robust
> >
> >     Some code - especially the crypto layer - wants to use the x86
> >     FP/MMX/AVX register set in what may be interrupt (typically softirq)
> >     context.
> >
> > At no point was there any intention of using this in a hardirq
> > context.
> >
> > Until such a time when you have a valid application for using
> > lib/crypto code in a hardirq context, I don't think we should
> > be supporting that at the expense of real users who are in
> > process/softirq context only.
>
> Whatever.  We agree that "crypto in hardirq" is not a good idea in general.  I'm
> just pointing out that there are certain cases, like SipHash used in a hash
> table, where it easily could happen and would be fine.  And all the shash and
> crypto library functions currently work in any context, unlike e.g. skcipher and
> aead which do not.  You seem to be trying to claim that it was never supported,
> but that is incorrect.  Making it unsupported would be a change that needs to be
> properly documented (the functions would no longer be simply "Any context")
> *and* have proper debug assertions added to enforce it and prevent usage errors.
> But in a lot of cases there is also no reason to even add that restriction.  I'm
> not sure why you're so eager to make the library functions harder to use.
>

Agree with Eric.

There may be cases where some error condition (machine check, etc.) is
hit while running in hard IRQ context or with IRQs disabled, and the
code that produces the diagnostic, writes to pstore, generates the QR
code, and so on may actually be where the library calls to crc32 etc.
originate from.  So pedantically disallowing that, rather than falling
back to a non-SIMD code path, makes things worse, because now the
original diagnostic may get lost, while the only information left to
debug the issue is an OOPS complaining about a library call in hard
IRQ context.

So while I agree that knowingly invoking library interfaces with IRQs
disabled should be avoided, that is just a variation on the general
adage that IRQs should only be disabled when absolutely necessary. But
that necessity may derive from a condition that exists one or several
layers up.
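
So the property worth preserving is that the library entry point degrades
gracefully instead of oopsing, schematically like this (a sketch; only
crypto_simd_usable() and the kernel FPU helpers are real, the crc32_*
names and the static key are made up):

    static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_simd);

    u32 crc32_lib(u32 crc, const u8 *p, size_t len)
    {
            /* use SIMD only when the current context allows it */
            if (static_branch_likely(&have_simd) && crypto_simd_usable()) {
                    kernel_fpu_begin();
                    crc = crc32_simd(crc, p, len);
                    kernel_fpu_end();
                    return crc;
            }
            return crc32_scalar(crc, p, len);  /* safe anywhere, incl. hard IRQ */
    }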

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper
  2025-04-02  0:24 [PATCH v2 0/9] crypto: x86 - stop using the SIMD helper Eric Biggers
                   ` (9 preceding siblings ...)
  2025-04-02  3:14 ` [PATCH v2 0/9] crypto: x86 " Herbert Xu
@ 2025-04-07  5:25 ` Herbert Xu
  10 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2025-04-07  5:25 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-crypto, linux-kernel, x86

Eric Biggers <ebiggers@kernel.org> wrote:
> Patches 2-9 are almost identical to
> https://lore.kernel.org/r/20250220051325.340691-3-ebiggers@kernel.org/
> but now split into multiple patches.  Patch 1 is just a resend of
> https://lore.kernel.org/r/20250320220648.121990-1-ebiggers@kernel.org/
> which is needed for the series to apply cleanly but is otherwise
> unrelated.  Description of patches 2-9 follows:
> 
> Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
> (crypto/simd.c).  The only purpose of doing so was to work around x86
> not always supporting kernel-mode FPU in softirqs.  Specifically, if a
> hardirq interrupted a task context kernel-mode FPU section and then a
> softirqs were run at the end of that hardirq, those softirqs could not
> use kernel-mode FPU.  This has now been fixed.  In combination with the
> fact that the skcipher and aead APIs only support task and softirq
> contexts, these can now just use kernel-mode FPU unconditionally on x86.
> 
> This simplifies the code and improves performance.
> 
> En/decryption gets at least somewhat faster for everyone, since the
> crypto API functions such as crypto_skcipher_encrypt() now go directly
> to the underlying algorithm rather than taking a detour through
> crypto/simd.c which involved an extra indirect call.  For example, on a
> Ryzen 9 9950X desktop processor, AES-256-XTS is now 23% faster for
> 512-byte messages and 7% faster for 4096-byte messages (when accessed
> through crypto_skcipher_encrypt() or crypto_skcipher_decrypt()).
> 
> There's also a much larger performance improvement for crypto API users
> that only support synchronous algorithms.  These users will now actually
> use the x86 SIMD (e.g. AES-NI or VAES) optimized en/decryption modes,
> which they couldn't before because they were marked as asynchronous.
> 
> Eric Biggers (9):
>  crypto: x86/aes - drop the avx10_256 AES-XTS and AES-CTR code
>  crypto: x86/aegis - stop using the SIMD helper
>  crypto: x86/aes - stop using the SIMD helper
>  crypto: x86/aria - stop using the SIMD helper
>  crypto: x86/camellia - stop using the SIMD helper
>  crypto: x86/cast - stop using the SIMD helper
>  crypto: x86/serpent - stop using the SIMD helper
>  crypto: x86/sm4 - stop using the SIMD helper
>  crypto: x86/twofish - stop using the SIMD helper
> 
> arch/x86/crypto/Kconfig                    |  14 --
> arch/x86/crypto/aegis128-aesni-glue.c      |  13 +-
> arch/x86/crypto/aes-ctr-avx-x86_64.S       |  47 ++----
> arch/x86/crypto/aes-xts-avx-x86_64.S       | 118 ++++++--------
> arch/x86/crypto/aesni-intel_glue.c         | 174 ++++++++-------------
> arch/x86/crypto/aria_aesni_avx2_glue.c     |  22 +--
> arch/x86/crypto/aria_aesni_avx_glue.c      |  20 +--
> arch/x86/crypto/aria_gfni_avx512_glue.c    |  22 +--
> arch/x86/crypto/camellia_aesni_avx2_glue.c |  21 +--
> arch/x86/crypto/camellia_aesni_avx_glue.c  |  21 +--
> arch/x86/crypto/cast5_avx_glue.c           |  21 +--
> arch/x86/crypto/cast6_avx_glue.c           |  20 +--
> arch/x86/crypto/serpent_avx2_glue.c        |  21 +--
> arch/x86/crypto/serpent_avx_glue.c         |  21 +--
> arch/x86/crypto/serpent_sse2_glue.c        |  21 +--
> arch/x86/crypto/sm4_aesni_avx2_glue.c      |  31 ++--
> arch/x86/crypto/sm4_aesni_avx_glue.c       |  31 ++--
> arch/x86/crypto/twofish_avx_glue.c         |  21 +--
> 18 files changed, 227 insertions(+), 432 deletions(-)
> 
> 
> base-commit: 91e5bfe317d8f8471fbaa3e70cf66cae1314a516

All applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] crypto: x86/chacha - Remove SIMD fallback path
  2025-04-03  4:14                       ` [PATCH] crypto: x86/chacha - Remove SIMD fallback path Herbert Xu
@ 2025-04-07 16:48                         ` Eric Biggers
  2025-04-08  2:12                           ` [PATCH] crypto: x86/chacha - Restore SSSE3 " Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Biggers @ 2025-04-07 16:48 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Ard Biesheuvel, linux-crypto, linux-kernel, x86,
	Jason A. Donenfeld, Linus Torvalds

On Thu, Apr 03, 2025 at 12:14:50PM +0800, Herbert Xu wrote:
> On Wed, Apr 02, 2025 at 08:59:34PM -0700, Eric Biggers wrote:
> >
> > But in a lot of cases there is also no reason to even add that restriction.  I'm
> > not sure why you're so eager to make the library functions harder to use.
> 
> I have no intention of making any changes to siphash.  It doesn't
> even use SIMD.
> 
> All I want to do is get rid of the crypto_simd_usable() fallback
> paths that we currently have in arch/x86/crypto.  This code is
> never used in hardirq context (and should never be).
> 
> For example:
> 
> ---8<---
> Get rid of the fallback path as SIMD is now always usable in softirq
> context.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 

It looks like this broken patch already got applied for some reason.

First, there doesn't seem to be agreement yet that the library functions should
have requirements on the calling context.

Second, your patch made unrelated changes that deleted the checks for SSSE3
support.  Thus dropping support for CPUs that don't support SSSE3.

- Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] crypto: x86/chacha - Restore SSSE3 fallback path
  2025-04-07 16:48                         ` Eric Biggers
@ 2025-04-08  2:12                           ` Herbert Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2025-04-08  2:12 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Ard Biesheuvel, linux-crypto, linux-kernel, x86,
	Jason A. Donenfeld, Linus Torvalds

On Mon, Apr 07, 2025 at 09:48:42AM -0700, Eric Biggers wrote:
> 
> First, there doesn't seem to be agreement yet that the library functions should
> have requirements on the calling context.

Do you have a real example of hard IRQ usage for chacha? Not some
imaginary post-crash scenario that ends up calling into generic code.

And if you really wanted to do that, it's much better to fix up
kernel_fpu_begin to support hard IRQs rather than adding useless
may_use_simd() checks all over the place.
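
(For clarity, the per-call-site check in question has roughly this shape,
reusing the chacha_* names from the existing glue code and ignoring the
4K chunking for brevity:)

    void chacha_crypt_arch(u32 *state, u8 *dst, const u8 *src,
                           unsigned int bytes, int nrounds)
    {
            if (!may_use_simd()) {          /* false in hard IRQ context */
                    chacha_crypt_generic(state, dst, src, bytes, nrounds);
                    return;
            }
            kernel_fpu_begin();
            chacha_dosimd(state, dst, src, bytes, nrounds);
            kernel_fpu_end();
    }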

> Second, your patch made unrelated changes that deleted the checks for SSSE3
> support.  Thus dropping support for CPUs that don't support SSSE3.

Sorry.  That was an oversight.

---8<---
The chacha_use_simd static branch is required for x86 machines that
lack SSSE3 support.  Restore it and the generic fallback code.

Reported-by: Eric Biggers <ebiggers@kernel.org>
Fixes: 9b4400215e0e ("crypto: x86/chacha - Remove SIMD fallback path")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c
index b7fd7a1f0e15..fcc14c006bde 100644
--- a/arch/x86/crypto/chacha_glue.c
+++ b/arch/x86/crypto/chacha_glue.c
@@ -5,11 +5,12 @@
  * Copyright (C) 2015 Martin Willi
  */
 
+#include <asm/simd.h>
 #include <crypto/chacha.h>
+#include <linux/jump_label.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/sizes.h>
-#include <asm/simd.h>
 
 asmlinkage void chacha_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src,
 				       unsigned int len, int nrounds);
@@ -31,6 +32,7 @@ asmlinkage void chacha_4block_xor_avx512vl(u32 *state, u8 *dst, const u8 *src,
 asmlinkage void chacha_8block_xor_avx512vl(u32 *state, u8 *dst, const u8 *src,
 					   unsigned int len, int nrounds);
 
+static __ro_after_init DEFINE_STATIC_KEY_FALSE(chacha_use_simd);
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(chacha_use_avx2);
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(chacha_use_avx512vl);
 
@@ -117,15 +119,23 @@ static void chacha_dosimd(u32 *state, u8 *dst, const u8 *src,
 
 void hchacha_block_arch(const u32 *state, u32 *stream, int nrounds)
 {
-	kernel_fpu_begin();
-	hchacha_block_ssse3(state, stream, nrounds);
-	kernel_fpu_end();
+	if (!static_branch_likely(&chacha_use_simd)) {
+		hchacha_block_generic(state, stream, nrounds);
+	} else {
+		kernel_fpu_begin();
+		hchacha_block_ssse3(state, stream, nrounds);
+		kernel_fpu_end();
+	}
 }
 EXPORT_SYMBOL(hchacha_block_arch);
 
 void chacha_crypt_arch(u32 *state, u8 *dst, const u8 *src, unsigned int bytes,
 		       int nrounds)
 {
+	if (!static_branch_likely(&chacha_use_simd) ||
+	    bytes <= CHACHA_BLOCK_SIZE)
+		return chacha_crypt_generic(state, dst, src, bytes, nrounds);
+
 	do {
 		unsigned int todo = min_t(unsigned int, bytes, SZ_4K);
 
@@ -142,7 +152,7 @@ EXPORT_SYMBOL(chacha_crypt_arch);
 
 bool chacha_is_arch_optimized(void)
 {
-	return true;
+	return static_key_enabled(&chacha_use_simd);
 }
 EXPORT_SYMBOL(chacha_is_arch_optimized);
 
@@ -151,6 +161,8 @@ static int __init chacha_simd_mod_init(void)
 	if (!boot_cpu_has(X86_FEATURE_SSSE3))
 		return 0;
 
+	static_branch_enable(&chacha_use_simd);
+
 	if (boot_cpu_has(X86_FEATURE_AVX) &&
 	    boot_cpu_has(X86_FEATURE_AVX2) &&
 	    cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 27+ messages in thread
