public inbox for linux-crypto@vger.kernel.org
* [PATCH 0/5] crc64: Tweak intrinsics code and enable it for ARM
@ 2026-03-30 14:46 Ard Biesheuvel
  2026-03-30 14:46 ` [PATCH 1/5] lib/crc: arm64: Drop unnecessary chunking logic from crc64 Ard Biesheuvel
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2026-03-30 14:46 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-arm-kernel, Ard Biesheuvel, Demian Shulhan, Eric Biggers

Apply some tweaks to the new arm64 crc64 NEON intrinsics code, and wire
it up for the 32-bit ARM build. Note that true 32-bit ARM CPUs usually
don't implement the prerequisite 64x64 PMULL instructions, but 32-bit
kernels are commonly used on 64-bit capable hardware too, and such
hardware does implement the 32-bit versions of the crypto instructions
whenever they are implemented for the 64-bit ISA (as the architecture
requires).

Cc: Demian Shulhan <demyansh@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>

Ard Biesheuvel (5):
  lib/crc: arm64: Drop unnecessary chunking logic from crc64
  lib/crc: arm64: Use existing macros for kernel-mode FPU cflags
  ARM: Add a neon-intrinsics.h header like on arm64
  lib/crc: arm64: Simplify intrinsics implementation
  lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64

 Documentation/arch/arm/kernel_mode_neon.rst |   4 +-
 arch/arm/include/asm/neon-intrinsics.h      |  64 ++++++++++++
 lib/crc/Kconfig                             |   1 +
 lib/crc/Makefile                            |   8 +-
 lib/crc/arm/crc64.h                         |  36 +++++++
 lib/crc/arm64/crc64-neon-inner.c            | 108 ++++++++++++--------
 lib/crc/arm64/crc64.h                       |  12 +--
 7 files changed, 179 insertions(+), 54 deletions(-)
 create mode 100644 arch/arm/include/asm/neon-intrinsics.h
 create mode 100644 lib/crc/arm/crc64.h


base-commit: 63432fd625372a0e79fb00a4009af204f4edc013
-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/5] lib/crc: arm64: Drop unnecessary chunking logic from crc64
  2026-03-30 14:46 [PATCH 0/5] crc64: Tweak intrinsics code and enable it for ARM Ard Biesheuvel
@ 2026-03-30 14:46 ` Ard Biesheuvel
  2026-03-31 22:33   ` Eric Biggers
  2026-03-30 14:46 ` [PATCH 2/5] lib/crc: arm64: Use existing macros for kernel-mode FPU cflags Ard Biesheuvel
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2026-03-30 14:46 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-arm-kernel, Ard Biesheuvel, Demian Shulhan, Eric Biggers

On arm64, kernel mode NEON executes with preemption enabled, so there is
no need to chunk the input by hand.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 lib/crc/arm64/crc64.h | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/lib/crc/arm64/crc64.h b/lib/crc/arm64/crc64.h
index cc65abeee24c..ab052a782c07 100644
--- a/lib/crc/arm64/crc64.h
+++ b/lib/crc/arm64/crc64.h
@@ -16,15 +16,13 @@ static inline u64 crc64_nvme_arch(u64 crc, const u8 *p, size_t len)
 {
 	if (len >= 128 && cpu_have_named_feature(PMULL) &&
 	    likely(may_use_simd())) {
-		do {
-			size_t chunk = min_t(size_t, len & ~15, SZ_4K);
+		size_t chunk = len & ~15;
 
-			scoped_ksimd()
-				crc = crc64_nvme_arm64_c(crc, p, chunk);
+		scoped_ksimd()
+			crc = crc64_nvme_arm64_c(crc, p, chunk);
 
-			p += chunk;
-			len -= chunk;
-		} while (len >= 128);
+		p += chunk;
+		len &= 15;
 	}
 	return crc64_nvme_generic(crc, p, len);
 }
-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/5] lib/crc: arm64: Use existing macros for kernel-mode FPU cflags
  2026-03-30 14:46 [PATCH 0/5] crc64: Tweak intrinsics code and enable it for ARM Ard Biesheuvel
  2026-03-30 14:46 ` [PATCH 1/5] lib/crc: arm64: Drop unnecessary chunking logic from crc64 Ard Biesheuvel
@ 2026-03-30 14:46 ` Ard Biesheuvel
  2026-03-30 14:46 ` [PATCH 3/5] ARM: Add a neon-intrinsics.h header like on arm64 Ard Biesheuvel
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2026-03-30 14:46 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-arm-kernel, Ard Biesheuvel, Demian Shulhan, Eric Biggers

Use the existing CC_FLAGS_FPU and CC_FLAGS_NO_FPU macros to pass the
appropriate compiler command line options for building kernel mode NEON
intrinsics code. This is tidier, and will make it easier to reuse the
code for 32-bit ARM.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 lib/crc/Makefile | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/crc/Makefile b/lib/crc/Makefile
index c9c35419b39c..ff213590e4e3 100644
--- a/lib/crc/Makefile
+++ b/lib/crc/Makefile
@@ -39,9 +39,8 @@ crc64-y := crc64-main.o
 ifeq ($(CONFIG_CRC64_ARCH),y)
 CFLAGS_crc64-main.o += -I$(src)/$(SRCARCH)
 
-CFLAGS_REMOVE_arm64/crc64-neon-inner.o += -mgeneral-regs-only
-CFLAGS_arm64/crc64-neon-inner.o += -ffreestanding -march=armv8-a+crypto
-CFLAGS_arm64/crc64-neon-inner.o += -isystem $(shell $(CC) -print-file-name=include)
+CFLAGS_REMOVE_arm64/crc64-neon-inner.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_arm64/crc64-neon-inner.o += $(CC_FLAGS_FPU) -march=armv8-a+crypto
 crc64-$(CONFIG_ARM64) += arm64/crc64-neon-inner.o
 
 crc64-$(CONFIG_RISCV) += riscv/crc64_lsb.o riscv/crc64_msb.o
-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/5] ARM: Add a neon-intrinsics.h header like on arm64
  2026-03-30 14:46 [PATCH 0/5] crc64: Tweak intrinsics code and enable it for ARM Ard Biesheuvel
  2026-03-30 14:46 ` [PATCH 1/5] lib/crc: arm64: Drop unnecessary chunking logic from crc64 Ard Biesheuvel
  2026-03-30 14:46 ` [PATCH 2/5] lib/crc: arm64: Use existing macros for kernel-mode FPU cflags Ard Biesheuvel
@ 2026-03-30 14:46 ` Ard Biesheuvel
  2026-03-30 14:46 ` [PATCH 4/5] lib/crc: arm64: Simplify intrinsics implementation Ard Biesheuvel
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2026-03-30 14:46 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-arm-kernel, Ard Biesheuvel, Demian Shulhan, Eric Biggers

Add a header asm/neon-intrinsics.h similar to the one that arm64 has.
This makes it possible for NEON intrinsics code to be shared seamlessly
between ARM and arm64.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 Documentation/arch/arm/kernel_mode_neon.rst |  4 +-
 arch/arm/include/asm/neon-intrinsics.h      | 64 ++++++++++++++++++++
 2 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/Documentation/arch/arm/kernel_mode_neon.rst b/Documentation/arch/arm/kernel_mode_neon.rst
index 9bfb71a2a9b9..1efb6d35b7bd 100644
--- a/Documentation/arch/arm/kernel_mode_neon.rst
+++ b/Documentation/arch/arm/kernel_mode_neon.rst
@@ -121,4 +121,6 @@ observe the following in addition to the rules above:
 * Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC
   uses its builtin version of <stdint.h> (this is a C99 header which the kernel
   does not supply);
-* Include <arm_neon.h> last, or at least after <linux/types.h>
+* Do not include <arm_neon.h> directly: instead, include <asm/neon-intrinsics.h>,
+  which tweaks some macro definitions so that system headers can be included
+  safely.
diff --git a/arch/arm/include/asm/neon-intrinsics.h b/arch/arm/include/asm/neon-intrinsics.h
new file mode 100644
index 000000000000..3fe0b5ab9659
--- /dev/null
+++ b/arch/arm/include/asm/neon-intrinsics.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ASM_NEON_INTRINSICS_H
+#define __ASM_NEON_INTRINSICS_H
+
+#ifndef __ARM_NEON__
+#error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
+#endif
+
+#include <asm-generic/int-ll64.h>
+
+/*
+ * The C99 types uintXX_t that are usually defined in 'stdint.h' are not as
+ * unambiguous on ARM as you would expect. For the types below, there is a
+ * difference on ARM between GCC built for bare metal ARM, GCC built for glibc
+ * and the kernel itself, which results in build errors if you try to build
+ * with -ffreestanding and include 'stdint.h' (such as when you include
+ * 'arm_neon.h' in order to use NEON intrinsics).
+ *
+ * As the typedefs for these types in 'stdint.h' are based on builtin defines
+ * supplied by GCC, we can tweak these to align with the kernel's idea of those
+ * types, so 'linux/types.h' and 'stdint.h' can be safely included from the
+ * same source file (provided that -ffreestanding is used).
+ *
+ *                    int32_t     uint32_t          intptr_t     uintptr_t
+ * bare metal GCC     long        unsigned long     int          unsigned int
+ * glibc GCC          int         unsigned int      int          unsigned int
+ * kernel             int         unsigned int      long         unsigned long
+ */
+
+#ifdef __INT32_TYPE__
+#undef __INT32_TYPE__
+#define __INT32_TYPE__		int
+#endif
+
+#ifdef __UINT32_TYPE__
+#undef __UINT32_TYPE__
+#define __UINT32_TYPE__		unsigned int
+#endif
+
+#ifdef __INTPTR_TYPE__
+#undef __INTPTR_TYPE__
+#define __INTPTR_TYPE__		long
+#endif
+
+#ifdef __UINTPTR_TYPE__
+#undef __UINTPTR_TYPE__
+#define __UINTPTR_TYPE__	unsigned long
+#endif
+
+/*
+ * genksyms chokes on the ARM NEON intrinsics system header, but we
+ * don't export anything it defines anyway, so just disregard it when
+ * genksyms executes.
+ */
+#ifndef __GENKSYMS__
+#include <arm_neon.h>
+#endif
+
+#ifdef CONFIG_CC_IS_CLANG
+#pragma clang diagnostic ignored "-Wincompatible-pointer-types"
+#endif
+
+#endif /* __ASM_NEON_INTRINSICS_H */
-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/5] lib/crc: arm64: Simplify intrinsics implementation
  2026-03-30 14:46 [PATCH 0/5] crc64: Tweak intrinsics code and enable it for ARM Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2026-03-30 14:46 ` [PATCH 3/5] ARM: Add a neon-intrinsics.h header like on arm64 Ard Biesheuvel
@ 2026-03-30 14:46 ` Ard Biesheuvel
  2026-03-30 14:46 ` [PATCH 5/5] lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64 Ard Biesheuvel
  2026-04-01 19:59 ` [PATCH 0/5] crc64: Tweak intrinsics code and enable it for ARM Eric Biggers
  5 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2026-03-30 14:46 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-arm-kernel, Ard Biesheuvel, Demian Shulhan, Eric Biggers

NEON intrinsics are useful because they remove the need for manual
register allocation, and the resulting code can be re-compiled and
optimized for different micro-architectures, and shared between arm64
and 32-bit ARM.

However, the strong typing of the vector variables can lead to
incomprehensible gibberish, as is the case with the new CRC64
implementation. To address this, let's repaint all variables as
uint64x2_t to minimize the number of vreinterpretq_xxx() calls, and to
be able to rely on the ^ operator for exclusive OR operations. This
makes the code much more concise and readable.

While at it, wrap the calls to vmull_p64() et al in order to have a more
consistent calling convention, and encapsulate any remaining
vreinterpret() calls that are still needed.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 lib/crc/arm64/crc64-neon-inner.c | 77 ++++++++------------
 1 file changed, 32 insertions(+), 45 deletions(-)

diff --git a/lib/crc/arm64/crc64-neon-inner.c b/lib/crc/arm64/crc64-neon-inner.c
index 881cdafadb37..28527e544ff6 100644
--- a/lib/crc/arm64/crc64-neon-inner.c
+++ b/lib/crc/arm64/crc64-neon-inner.c
@@ -8,9 +8,6 @@
 
 u64 crc64_nvme_arm64_c(u64 crc, const u8 *p, size_t len);
 
-#define GET_P64_0(v) ((poly64_t)vgetq_lane_u64(vreinterpretq_u64_p64(v), 0))
-#define GET_P64_1(v) ((poly64_t)vgetq_lane_u64(vreinterpretq_u64_p64(v), 1))
-
 /* x^191 mod G, x^127 mod G */
 static const u64 fold_consts_val[2] = { 0xeadc41fd2ba3d420ULL,
 					0x21e9761e252621acULL };
@@ -18,61 +15,51 @@ static const u64 fold_consts_val[2] = { 0xeadc41fd2ba3d420ULL,
 static const u64 bconsts_val[2] = { 0x27ecfa329aef9f77ULL,
 				    0x34d926535897936aULL };
 
-u64 crc64_nvme_arm64_c(u64 crc, const u8 *p, size_t len)
+static inline uint64x2_t pmull64(uint64x2_t a, uint64x2_t b)
 {
-	uint64x2_t v0_u64 = { crc, 0 };
-	poly64x2_t v0 = vreinterpretq_p64_u64(v0_u64);
-	poly64x2_t fold_consts =
-		vreinterpretq_p64_u64(vld1q_u64(fold_consts_val));
-	poly64x2_t v1 = vreinterpretq_p64_u8(vld1q_u8(p));
+	return vreinterpretq_u64_p128(vmull_p64(vgetq_lane_u64(a, 0),
+						vgetq_lane_u64(b, 0)));
+}
 
-	v0 = vreinterpretq_p64_u8(veorq_u8(vreinterpretq_u8_p64(v0),
-					   vreinterpretq_u8_p64(v1)));
-	p += 16;
-	len -= 16;
+static inline uint64x2_t pmull64_high(uint64x2_t a, uint64x2_t b)
+{
+	poly64x2_t l = vreinterpretq_p64_u64(a);
+	poly64x2_t m = vreinterpretq_p64_u64(b);
 
-	do {
-		v1 = vreinterpretq_p64_u8(vld1q_u8(p));
+	return vreinterpretq_u64_p128(vmull_high_p64(l, m));
+}
 
-		poly128_t v2 = vmull_high_p64(fold_consts, v0);
-		poly128_t v0_128 =
-			vmull_p64(GET_P64_0(fold_consts), GET_P64_0(v0));
+static inline uint64x2_t pmull64_hi_lo(uint64x2_t a, uint64x2_t b)
+{
+	return vreinterpretq_u64_p128(vmull_p64(vgetq_lane_u64(a, 1),
+						vgetq_lane_u64(b, 0)));
+}
 
-		uint8x16_t x0 = veorq_u8(vreinterpretq_u8_p128(v0_128),
-					 vreinterpretq_u8_p128(v2));
+u64 crc64_nvme_arm64_c(u64 crc, const u8 *p, size_t len)
+{
+	uint64x2_t fold_consts = vld1q_u64(fold_consts_val);
+	uint64x2_t v0 = { crc, 0 };
+	uint64x2_t zero = { };
 
-		x0 = veorq_u8(x0, vreinterpretq_u8_p64(v1));
-		v0 = vreinterpretq_p64_u8(x0);
+	for (;;) {
+		v0 ^= vreinterpretq_u64_u8(vld1q_u8(p));
 
 		p += 16;
 		len -= 16;
-	} while (len >= 16);
-
-	/* Multiply the 128-bit value by x^64 and reduce it back to 128 bits. */
-	poly64x2_t v7 = vreinterpretq_p64_u64((uint64x2_t){ 0, 0 });
-	poly128_t v1_128 = vmull_p64(GET_P64_1(fold_consts), GET_P64_0(v0));
+		if (len < 16)
+			break;
 
-	uint8x16_t ext_v0 =
-		vextq_u8(vreinterpretq_u8_p64(v0), vreinterpretq_u8_p64(v7), 8);
-	uint8x16_t x0 = veorq_u8(ext_v0, vreinterpretq_u8_p128(v1_128));
+		v0 = pmull64(fold_consts, v0) ^ pmull64_high(fold_consts, v0);
+	}
 
-	v0 = vreinterpretq_p64_u8(x0);
+	/* Multiply the 128-bit value by x^64 and reduce it back to 128 bits. */
+	v0 = vextq_u64(v0, zero, 1) ^ pmull64_hi_lo(fold_consts, v0);
 
 	/* Final Barrett reduction */
-	poly64x2_t bconsts = vreinterpretq_p64_u64(vld1q_u64(bconsts_val));
-
-	v1_128 = vmull_p64(GET_P64_0(bconsts), GET_P64_0(v0));
-
-	poly64x2_t v1_64 = vreinterpretq_p64_u8(vreinterpretq_u8_p128(v1_128));
-	poly128_t v3_128 = vmull_p64(GET_P64_1(bconsts), GET_P64_0(v1_64));
-
-	x0 = veorq_u8(vreinterpretq_u8_p64(v0), vreinterpretq_u8_p128(v3_128));
-
-	uint8x16_t ext_v2 = vextq_u8(vreinterpretq_u8_p64(v7),
-				     vreinterpretq_u8_p128(v1_128), 8);
+	uint64x2_t bconsts = vld1q_u64(bconsts_val);
+	uint64x2_t final = pmull64(bconsts, v0);
 
-	x0 = veorq_u8(x0, ext_v2);
+	v0 ^= vextq_u64(zero, final, 1) ^ pmull64_hi_lo(bconsts, final);
 
-	v0 = vreinterpretq_p64_u8(x0);
-	return vgetq_lane_u64(vreinterpretq_u64_p64(v0), 1);
+	return vgetq_lane_u64(v0, 1);
 }
-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/5] lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64
  2026-03-30 14:46 [PATCH 0/5] crc64: Tweak intrinsics code and enable it for ARM Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2026-03-30 14:46 ` [PATCH 4/5] lib/crc: arm64: Simplify intrinsics implementation Ard Biesheuvel
@ 2026-03-30 14:46 ` Ard Biesheuvel
  2026-03-31  6:47   ` Christoph Hellwig
  2026-03-31 22:41   ` Eric Biggers
  2026-04-01 19:59 ` [PATCH 0/5] crc64: Tweak intrinsics code and enable it for ARM Eric Biggers
  5 siblings, 2 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2026-03-30 14:46 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-arm-kernel, Ard Biesheuvel, Demian Shulhan, Eric Biggers

Tweak the NEON intrinsics crc64 code written for arm64 so it can be
built for 32-bit ARM as well. The only workaround needed is to provide
alternatives for vmull_p64() and vmull_high_p64() on Clang, which only
defines those when building for the AArch64 or arm64ec ISA.

KUnit benchmark results (Cortex-A53 @ 1 GHz)

Before:

   # crc64_nvme_benchmark: len=1: 35 MB/s
   # crc64_nvme_benchmark: len=16: 78 MB/s
   # crc64_nvme_benchmark: len=64: 87 MB/s
   # crc64_nvme_benchmark: len=127: 88 MB/s
   # crc64_nvme_benchmark: len=128: 88 MB/s
   # crc64_nvme_benchmark: len=200: 89 MB/s
   # crc64_nvme_benchmark: len=256: 89 MB/s
   # crc64_nvme_benchmark: len=511: 89 MB/s
   # crc64_nvme_benchmark: len=512: 89 MB/s
   # crc64_nvme_benchmark: len=1024: 90 MB/s
   # crc64_nvme_benchmark: len=3173: 90 MB/s
   # crc64_nvme_benchmark: len=4096: 90 MB/s
   # crc64_nvme_benchmark: len=16384: 90 MB/s

After:

   # crc64_nvme_benchmark: len=1: 32 MB/s
   # crc64_nvme_benchmark: len=16: 76 MB/s
   # crc64_nvme_benchmark: len=64: 71 MB/s
   # crc64_nvme_benchmark: len=127: 88 MB/s
   # crc64_nvme_benchmark: len=128: 618 MB/s
   # crc64_nvme_benchmark: len=200: 542 MB/s
   # crc64_nvme_benchmark: len=256: 920 MB/s
   # crc64_nvme_benchmark: len=511: 836 MB/s
   # crc64_nvme_benchmark: len=512: 1261 MB/s
   # crc64_nvme_benchmark: len=1024: 1531 MB/s
   # crc64_nvme_benchmark: len=3173: 1731 MB/s
   # crc64_nvme_benchmark: len=4096: 1851 MB/s
   # crc64_nvme_benchmark: len=16384: 1858 MB/s

Enable big-endian support only on GCC - the code generated by Clang is
horribly broken.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 lib/crc/Kconfig                  |  1 +
 lib/crc/Makefile                 |  5 ++-
 lib/crc/arm/crc64.h              | 36 ++++++++++++++++++++
 lib/crc/arm64/crc64-neon-inner.c | 35 +++++++++++++++++++
 4 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/lib/crc/Kconfig b/lib/crc/Kconfig
index 31038c8d111a..2f93d4c4d52d 100644
--- a/lib/crc/Kconfig
+++ b/lib/crc/Kconfig
@@ -82,6 +82,7 @@ config CRC64
 config CRC64_ARCH
 	bool
 	depends on CRC64 && CRC_OPTIMIZATIONS
+	default y if ARM && KERNEL_MODE_NEON && !(CPU_BIG_ENDIAN && CC_IS_CLANG)
 	default y if ARM64
 	default y if RISCV && RISCV_ISA_ZBC && 64BIT
 	default y if X86_64
diff --git a/lib/crc/Makefile b/lib/crc/Makefile
index ff213590e4e3..b6c381cc66bb 100644
--- a/lib/crc/Makefile
+++ b/lib/crc/Makefile
@@ -39,8 +39,11 @@ crc64-y := crc64-main.o
 ifeq ($(CONFIG_CRC64_ARCH),y)
 CFLAGS_crc64-main.o += -I$(src)/$(SRCARCH)
 
+crc64-cflags-$(CONFIG_ARM) += -march=armv8-a -mfpu=crypto-neon-fp-armv8
+crc64-cflags-$(CONFIG_ARM64) += -march=armv8-a+crypto
 CFLAGS_REMOVE_arm64/crc64-neon-inner.o += $(CC_FLAGS_NO_FPU)
-CFLAGS_arm64/crc64-neon-inner.o += $(CC_FLAGS_FPU) -march=armv8-a+crypto
+CFLAGS_arm64/crc64-neon-inner.o += $(CC_FLAGS_FPU) $(crc64-cflags-y)
+crc64-$(CONFIG_ARM) += arm64/crc64-neon-inner.o
 crc64-$(CONFIG_ARM64) += arm64/crc64-neon-inner.o
 
 crc64-$(CONFIG_RISCV) += riscv/crc64_lsb.o riscv/crc64_msb.o
diff --git a/lib/crc/arm/crc64.h b/lib/crc/arm/crc64.h
new file mode 100644
index 000000000000..7c8d54f38e5c
--- /dev/null
+++ b/lib/crc/arm/crc64.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * CRC64 using ARM PMULL instructions
+ */
+
+#include <asm/simd.h>
+
+static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_pmull);
+
+u64 crc64_nvme_arm64_c(u64 crc, const u8 *p, size_t len);
+
+#define crc64_be_arch crc64_be_generic
+
+static inline u64 crc64_nvme_arch(u64 crc, const u8 *p, size_t len)
+{
+	if (len >= 128 && static_branch_likely(&have_pmull) &&
+	    likely(may_use_simd())) {
+		do {
+			size_t chunk = min_t(size_t, len & ~15, SZ_4K);
+
+			scoped_ksimd()
+				crc = crc64_nvme_arm64_c(crc, p, chunk);
+
+			p += chunk;
+			len -= chunk;
+		} while (len >= 128);
+	}
+	return crc64_nvme_generic(crc, p, len);
+}
+
+#define crc64_mod_init_arch crc64_mod_init_arch
+static void crc64_mod_init_arch(void)
+{
+	if (elf_hwcap2 & HWCAP2_PMULL)
+		static_branch_enable(&have_pmull);
+}
diff --git a/lib/crc/arm64/crc64-neon-inner.c b/lib/crc/arm64/crc64-neon-inner.c
index 28527e544ff6..99607dbb7bfd 100644
--- a/lib/crc/arm64/crc64-neon-inner.c
+++ b/lib/crc/arm64/crc64-neon-inner.c
@@ -15,6 +15,40 @@ static const u64 fold_consts_val[2] = { 0xeadc41fd2ba3d420ULL,
 static const u64 bconsts_val[2] = { 0x27ecfa329aef9f77ULL,
 				    0x34d926535897936aULL };
 
+#if defined(CONFIG_ARM) && defined(CONFIG_CC_IS_CLANG)
+static inline uint64x2_t pmull64(uint64x2_t a, uint64x2_t b)
+{
+	uint64_t l = vgetq_lane_u64(a, 0);
+	uint64_t m = vgetq_lane_u64(b, 0);
+	uint64x2_t result;
+
+	asm("vmull.p64	%q0, %1, %2" : "=w"(result) : "w"(l), "w"(m));
+
+	return result;
+}
+
+static inline uint64x2_t pmull64_high(uint64x2_t a, uint64x2_t b)
+{
+	uint64_t l = vgetq_lane_u64(a, 1);
+	uint64_t m = vgetq_lane_u64(b, 1);
+	uint64x2_t result;
+
+	asm("vmull.p64	%q0, %1, %2" : "=w"(result) : "w"(l), "w"(m));
+
+	return result;
+}
+
+static inline uint64x2_t pmull64_hi_lo(uint64x2_t a, uint64x2_t b)
+{
+	uint64_t l = vgetq_lane_u64(a, 1);
+	uint64_t m = vgetq_lane_u64(b, 0);
+	uint64x2_t result;
+
+	asm("vmull.p64	%q0, %1, %2" : "=w"(result) : "w"(l), "w"(m));
+
+	return result;
+}
+#else
 static inline uint64x2_t pmull64(uint64x2_t a, uint64x2_t b)
 {
 	return vreinterpretq_u64_p128(vmull_p64(vgetq_lane_u64(a, 0),
@@ -34,6 +68,7 @@ static inline uint64x2_t pmull64_hi_lo(uint64x2_t a, uint64x2_t b)
 	return vreinterpretq_u64_p128(vmull_p64(vgetq_lane_u64(a, 1),
 						vgetq_lane_u64(b, 0)));
 }
+#endif
 
 u64 crc64_nvme_arm64_c(u64 crc, const u8 *p, size_t len)
 {
-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64
  2026-03-30 14:46 ` [PATCH 5/5] lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64 Ard Biesheuvel
@ 2026-03-31  6:47   ` Christoph Hellwig
  2026-03-31  8:20     ` Ard Biesheuvel
  2026-03-31 22:41   ` Eric Biggers
  1 sibling, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2026-03-31  6:47 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-crypto, linux-arm-kernel, Demian Shulhan, Eric Biggers

>  	depends on CRC64 && CRC_OPTIMIZATIONS
> +	default y if ARM && KERNEL_MODE_NEON && !(CPU_BIG_ENDIAN && CC_IS_CLANG)

It would be useful to throw in a comment here why it is disabled for
big-endian on clang.

> +#define crc64_be_arch crc64_be_generic
> +
> +static inline u64 crc64_nvme_arch(u64 crc, const u8 *p, size_t len)
> +{
> +	if (len >= 128 && static_branch_likely(&have_pmull) &&
> +	    likely(may_use_simd())) {
> +		do {
> +			size_t chunk = min_t(size_t, len & ~15, SZ_4K);
> +
> +			scoped_ksimd()
> +				crc = crc64_nvme_arm64_c(crc, p, chunk);
> +
> +			p += chunk;
> +			len -= chunk;
> +		} while (len >= 128);
> +	}

From reading the earlier patches, I'll assume arm SIMD code is
non-preemptable and thus you want the chunking here?  Maybe add
a little comment explaining that?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64
  2026-03-31  6:47   ` Christoph Hellwig
@ 2026-03-31  8:20     ` Ard Biesheuvel
  0 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2026-03-31  8:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-crypto, linux-arm-kernel, Demian Shulhan, Eric Biggers


On Tue, 31 Mar 2026, at 08:47, Christoph Hellwig wrote:
>>  	depends on CRC64 && CRC_OPTIMIZATIONS
>> +	default y if ARM && KERNEL_MODE_NEON && !(CPU_BIG_ENDIAN && CC_IS_CLANG)
>
> It would be useful to throw in a comment here why it is disabled for
> big-endian on clang.
>

Ack.

>> +#define crc64_be_arch crc64_be_generic
>> +
>> +static inline u64 crc64_nvme_arch(u64 crc, const u8 *p, size_t len)
>> +{
>> +	if (len >= 128 && static_branch_likely(&have_pmull) &&
>> +	    likely(may_use_simd())) {
>> +		do {
>> +			size_t chunk = min_t(size_t, len & ~15, SZ_4K);
>> +
>> +			scoped_ksimd()
>> +				crc = crc64_nvme_arm64_c(crc, p, chunk);
>> +
>> +			p += chunk;
>> +			len -= chunk;
>> +		} while (len >= 128);
>> +	}
>
> From reading the earlier patches, I'll assume arm SIMD code is
> non-preemptable and thus you want the chunking here?  Maybe add
> a little comment explaining that?

Indeed.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] lib/crc: arm64: Drop unnecessary chunking logic from crc64
  2026-03-30 14:46 ` [PATCH 1/5] lib/crc: arm64: Drop unnecessary chunking logic from crc64 Ard Biesheuvel
@ 2026-03-31 22:33   ` Eric Biggers
  2026-04-01  0:09     ` Eric Biggers
  2026-04-01  6:57     ` Ard Biesheuvel
  0 siblings, 2 replies; 18+ messages in thread
From: Eric Biggers @ 2026-03-31 22:33 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-crypto, linux-arm-kernel, Demian Shulhan

On Mon, Mar 30, 2026 at 04:46:32PM +0200, Ard Biesheuvel wrote:
> On arm64, kernel mode NEON executes with preemption enabled, so there is
> no need to chunk the input by hand.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

There's still similar "chunking" in other arm64 code:

    $ git grep -E 'SZ_4K|cond_yield' lib/crypto/arm64
    lib/crypto/arm64/chacha.h:              unsigned int todo = min_t(unsigned int, bytes, SZ_4K);
    lib/crypto/arm64/poly1305.h:                    unsigned int todo = min_t(unsigned int, len, SZ_4K);
    lib/crypto/arm64/sha1-ce-core.S:        cond_yield      1f, x5, x6
    lib/crypto/arm64/sha256-ce.S:   cond_yield      1f, x5, x6
    lib/crypto/arm64/sha3-ce-core.S:        cond_yield 4f, x8, x9
    lib/crypto/arm64/sha512-ce-core.S:      cond_yield      3f, x4, x5

I thought it was still sticking around, despite kernel-mode NEON now
being preemptible on arm64, because of CONFIG_PREEMPT_VOLUNTARY.

However, I see that support for CONFIG_PREEMPT_VOLUNTARY was recently
removed on arm64.  So that's what finally makes this no longer needed,
and we can now clean up these other cases too, right?

(Though, I can't find where the voluntary preemption points actually
were.  So maybe they weren't actually there anyway.)

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64
  2026-03-30 14:46 ` [PATCH 5/5] lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64 Ard Biesheuvel
  2026-03-31  6:47   ` Christoph Hellwig
@ 2026-03-31 22:41   ` Eric Biggers
  2026-04-01 16:48     ` Ard Biesheuvel
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Biggers @ 2026-03-31 22:41 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-crypto, linux-arm-kernel, Demian Shulhan

On Mon, Mar 30, 2026 at 04:46:36PM +0200, Ard Biesheuvel wrote:
> Enable big-endian support only on GCC - the code generated by Clang is
> horribly broken.
[...]
> +#if defined(CONFIG_ARM) && defined(CONFIG_CC_IS_CLANG)
> +static inline uint64x2_t pmull64(uint64x2_t a, uint64x2_t b)
> +{
> +	uint64_t l = vgetq_lane_u64(a, 0);
> +	uint64_t m = vgetq_lane_u64(b, 0);
> +	uint64x2_t result;
> +
> +	asm("vmull.p64	%q0, %1, %2" : "=w"(result) : "w"(l), "w"(m));
> +
> +	return result;
> +}

Perhaps omit big endian support, and use the inline asm implementation
of these functions with both gcc and clang?  The more unique
combinations need to be tested to cover all the code, the higher the
chance of one being missed in testing.

Also, leaving shared code in lib/crc/arm64/ will be confusing.  How
about lib/crc/arm-common/, and crc64_nvme_arm64_c => crc64_nvme_neon()?
Or even just put crc64-neon.c directly in lib/crc/.

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] lib/crc: arm64: Drop unnecessary chunking logic from crc64
  2026-03-31 22:33   ` Eric Biggers
@ 2026-04-01  0:09     ` Eric Biggers
  2026-04-01  6:57     ` Ard Biesheuvel
  1 sibling, 0 replies; 18+ messages in thread
From: Eric Biggers @ 2026-04-01  0:09 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-crypto, linux-arm-kernel, Demian Shulhan

On Tue, Mar 31, 2026 at 03:33:00PM -0700, Eric Biggers wrote:
> On Mon, Mar 30, 2026 at 04:46:32PM +0200, Ard Biesheuvel wrote:
> > On arm64, kernel mode NEON executes with preemption enabled, so there is
> > no need to chunk the input by hand.
> > 
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> 
> There's still similar "chunking" in other arm64 code:
> 
>     $ git grep -E 'SZ_4K|cond_yield' lib/crypto/arm64
>     lib/crypto/arm64/chacha.h:              unsigned int todo = min_t(unsigned int, bytes, SZ_4K);
>     lib/crypto/arm64/poly1305.h:                    unsigned int todo = min_t(unsigned int, len, SZ_4K);
>     lib/crypto/arm64/sha1-ce-core.S:        cond_yield      1f, x5, x6
>     lib/crypto/arm64/sha256-ce.S:   cond_yield      1f, x5, x6
>     lib/crypto/arm64/sha3-ce-core.S:        cond_yield 4f, x8, x9
>     lib/crypto/arm64/sha512-ce-core.S:      cond_yield      3f, x4, x5
> 
> I thought it was still sticking around, despite kernel-mode NEON now
> being preemptible on arm64, because of CONFIG_PREEMPT_VOLUNTARY.
> 
> However, I see that support for CONFIG_PREEMPT_VOLUNTARY was recently
> removed on arm64.  So that's what finally makes this no longer needed,
> and we can now clean up these other cases too, right?
> 
> (Though, I can't find where the voluntary preemption points actually
> were.  So maybe they weren't actually there anyway.)

https://lore.kernel.org/linux-crypto/20260401000548.133151-1-ebiggers@kernel.org/
cleans up all the similar code in lib/crypto/arm64/.

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] lib/crc: arm64: Drop unnecessary chunking logic from crc64
  2026-03-31 22:33   ` Eric Biggers
  2026-04-01  0:09     ` Eric Biggers
@ 2026-04-01  6:57     ` Ard Biesheuvel
  1 sibling, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2026-04-01  6:57 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-crypto, linux-arm-kernel, Demian Shulhan



On Wed, 1 Apr 2026, at 00:33, Eric Biggers wrote:
> On Mon, Mar 30, 2026 at 04:46:32PM +0200, Ard Biesheuvel wrote:
>> On arm64, kernel mode NEON executes with preemption enabled, so there is
>> no need to chunk the input by hand.
>> 
>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>
> There's still similar "chunking" in other arm64 code:
>
>     $ git grep -E 'SZ_4K|cond_yield' lib/crypto/arm64
>     lib/crypto/arm64/chacha.h:              unsigned int todo = min_t(unsigned int, bytes, SZ_4K);
>     lib/crypto/arm64/poly1305.h:                    unsigned int todo = min_t(unsigned int, len, SZ_4K);
>     lib/crypto/arm64/sha1-ce-core.S:        cond_yield      1f, x5, x6
>     lib/crypto/arm64/sha256-ce.S:   cond_yield      1f, x5, x6
>     lib/crypto/arm64/sha3-ce-core.S:        cond_yield 4f, x8, x9
>     lib/crypto/arm64/sha512-ce-core.S:      cond_yield      3f, x4, x5
>
> I thought it was still sticking around, despite kernel-mode NEON now
> being preemptible on arm64, because of CONFIG_PREEMPT_VOLUNTARY.
>
> However, I see that support for CONFIG_PREEMPT_VOLUNTARY was recently
> removed on arm64.  So that's what finally makes this no longer needed,
> and we can now clean up these other cases too, right?
>

Indeed.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64
  2026-03-31 22:41   ` Eric Biggers
@ 2026-04-01 16:48     ` Ard Biesheuvel
  0 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2026-04-01 16:48 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-crypto, linux-arm-kernel, Demian Shulhan



On Wed, 1 Apr 2026, at 00:41, Eric Biggers wrote:
> On Mon, Mar 30, 2026 at 04:46:36PM +0200, Ard Biesheuvel wrote:
>> Enable big-endian support only on GCC - the code generated by Clang is
>> horribly broken.
> [...]
>> +#if defined(CONFIG_ARM) && defined(CONFIG_CC_IS_CLANG)
>> +static inline uint64x2_t pmull64(uint64x2_t a, uint64x2_t b)
>> +{
>> +	uint64_t l = vgetq_lane_u64(a, 0);
>> +	uint64_t m = vgetq_lane_u64(b, 0);
>> +	uint64x2_t result;
>> +
>> +	asm("vmull.p64	%q0, %1, %2" : "=w"(result) : "w"(l), "w"(m));
>> +
>> +	return result;
>> +}
>
> Perhaps omit big endian support, and use the inline asm implementation
> of these functions with both gcc and clang?  The more unique
> combinations need to be tested to cover all the code, the higher the
> chance of one being missed in testing.
>

Yeah that should work.

> Also, leaving shared code in lib/crc/arm64/ will be confusing.  How
> about lib/crc/arm-common/, and crc64_nvme_arm64_c => crc64_nvme_neon()?
> Or even just put crc64-neon.c directly in lib/crc/.
>

Yeah the latter seems the most straightforward.
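(For context on the pmull64() workaround quoted above: vmull.p64/PMULL performs a 64x64 -> 128-bit carry-less, i.e. GF(2) polynomial, multiplication. The scalar semantics can be modelled in portable C — purely an illustration of what the instruction computes, not the kernel code:)

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scalar model of the vmull.p64 / PMULL semantics: multiply two
 * 64-bit polynomials over GF(2), producing a 128-bit result as a
 * lo/hi pair.  Additions are XORs; there are no carries.
 */
static void clmul64(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
	uint64_t l = 0, h = 0;

	for (int i = 0; i < 64; i++) {
		if (b & (1ULL << i)) {
			l ^= a << i;
			if (i)
				h ^= a >> (64 - i);
		}
	}
	*lo = l;
	*hi = h;
}
```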


* Re: [PATCH 0/5] crc64: Tweak intrinsics code and enable it for ARM
  2026-03-30 14:46 [PATCH 0/5] crc64: Tweak intrinsics code and enable it for ARM Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2026-03-30 14:46 ` [PATCH 5/5] lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64 Ard Biesheuvel
@ 2026-04-01 19:59 ` Eric Biggers
  2026-04-02  8:52   ` Ard Biesheuvel
  5 siblings, 1 reply; 18+ messages in thread
From: Eric Biggers @ 2026-04-01 19:59 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-crypto, linux-arm-kernel, Demian Shulhan

On Mon, Mar 30, 2026 at 04:46:31PM +0200, Ard Biesheuvel wrote:
> Apply some tweaks to the new arm64 crc64 NEON intrinsics code, and wire
> it up for the 32-bit ARM build. Note that true 32-bit ARM CPUs usually
> don't implement the prerequisite 64x64 PMULL instructions, but 32-bit
> kernels are commonly used on 64-bit capable hardware too, which do
> implement the 32-bit versions of the crypto instructions if they are
> implemented for the 64-bit ISA (as per the architecture).
> 
> Cc: Demian Shulhan <demyansh@gmail.com>
> Cc: Eric Biggers <ebiggers@kernel.org>
> 
> Ard Biesheuvel (5):
>   lib/crc: arm64: Drop unnecessary chunking logic from crc64
>   lib/crc: arm64: Use existing macros for kernel-mode FPU cflags
>   ARM: Add a neon-intrinsics.h header like on arm64
>   lib/crc: arm64: Simplify intrinsics implementation
>   lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64

I think patches 3 and 4 should be swapped, so it's cleanups first (which
make sense regardless of the 32-bit ARM support) and then the 32-bit ARM
support.

I do think we should be aware that even with the code mostly shared
using the NEON intrinsics, the 32-bit ARM support (which works only on
CPUs that support PMULL, i.e. are also 64-bit capable) doesn't come for
free.  We should expect to deal with occasional issues related to the
intrinsics with certain compiler versions, compiler flags, etc.

I assume that "32-bit kernels on ARMv8 CPUs" is currently still a big
enough niche to bother with this, despite that niche getting smaller
over time.  But as I mentioned I do think we should try to simplify it
as much as possible, e.g. by supporting little-endian only and avoiding
#ifdefs based on things like the compiler whenever possible.

- Eric


* Re: [PATCH 0/5] crc64: Tweak intrinsics code and enable it for ARM
  2026-04-01 19:59 ` [PATCH 0/5] crc64: Tweak intrinsics code and enable it for ARM Eric Biggers
@ 2026-04-02  8:52   ` Ard Biesheuvel
  2026-04-02 23:40     ` Eric Biggers
  0 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2026-04-02  8:52 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-crypto, linux-arm-kernel, Demian Shulhan


On Wed, 1 Apr 2026, at 21:59, Eric Biggers wrote:
> On Mon, Mar 30, 2026 at 04:46:31PM +0200, Ard Biesheuvel wrote:
>> Apply some tweaks to the new arm64 crc64 NEON intrinsics code, and wire
>> it up for the 32-bit ARM build. Note that true 32-bit ARM CPUs usually
>> don't implement the prerequisite 64x64 PMULL instructions, but 32-bit
>> kernels are commonly used on 64-bit capable hardware too, which do
>> implement the 32-bit versions of the crypto instructions if they are
>> implemented for the 64-bit ISA (as per the architecture).
>> 
>> Cc: Demian Shulhan <demyansh@gmail.com>
>> Cc: Eric Biggers <ebiggers@kernel.org>
>> 
>> Ard Biesheuvel (5):
>>   lib/crc: arm64: Drop unnecessary chunking logic from crc64
>>   lib/crc: arm64: Use existing macros for kernel-mode FPU cflags
>>   ARM: Add a neon-intrinsics.h header like on arm64
>>   lib/crc: arm64: Simplify intrinsics implementation
>>   lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64
>
> I think patches 3 and 4 should be swapped, so it's cleanups first (which
> make sense regardless of the 32-bit ARM support) and then the 32-bit ARM
> support.
>

Ok.

> I do think we should be aware that even with the code mostly shared
> using the NEON intrinsics, the 32-bit ARM support (which works only on
> CPUs that support PMULL, i.e. are also 64-bit capable) doesn't come for
> free.  We should expect to deal with occasional issues related to the
> intrinsics with certain compiler versions, compiler flags, etc.
>
> I assume that "32-bit kernels on ARMv8 CPUs" is currently still a big
> enough niche to bother with this, despite that niche getting smaller
> over time.

Running a 32-bit kernel on 64-bit capable hardware is usually done to reduce the RAM footprint, and that problem hasn't gotten any smaller lately. And a 20x speedup is rather significant.

>  But as I mentioned I do think we should try to simplify it
> as much as possible, e.g. by supporting little-endian only and avoiding
> #ifdefs based on things like the compiler whenever possible.
>

Sure. The only reason I think this is worth the effort is because the same code can be used on ARM and arm64, so once this is no longer the case, I don't think we should bother.

So it makes sense to apply this reasoning to little endian as well - arm64 supports it so we can support it on ARM too.




* Re: [PATCH 0/5] crc64: Tweak intrinsics code and enable it for ARM
  2026-04-02  8:52   ` Ard Biesheuvel
@ 2026-04-02 23:40     ` Eric Biggers
  2026-04-03  6:49       ` Ard Biesheuvel
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Biggers @ 2026-04-02 23:40 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-crypto, linux-arm-kernel, Demian Shulhan

On Thu, Apr 02, 2026 at 10:52:17AM +0200, Ard Biesheuvel wrote:
> 
> On Wed, 1 Apr 2026, at 21:59, Eric Biggers wrote:
> > On Mon, Mar 30, 2026 at 04:46:31PM +0200, Ard Biesheuvel wrote:
> >> Apply some tweaks to the new arm64 crc64 NEON intrinsics code, and wire
> >> it up for the 32-bit ARM build. Note that true 32-bit ARM CPUs usually
> >> don't implement the prerequisite 64x64 PMULL instructions, but 32-bit
> >> kernels are commonly used on 64-bit capable hardware too, which do
> >> implement the 32-bit versions of the crypto instructions if they are
> >> implemented for the 64-bit ISA (as per the architecture).
> >> 
> >> Cc: Demian Shulhan <demyansh@gmail.com>
> >> Cc: Eric Biggers <ebiggers@kernel.org>
> >> 
> >> Ard Biesheuvel (5):
> >>   lib/crc: arm64: Drop unnecessary chunking logic from crc64
> >>   lib/crc: arm64: Use existing macros for kernel-mode FPU cflags
> >>   ARM: Add a neon-intrinsics.h header like on arm64
> >>   lib/crc: arm64: Simplify intrinsics implementation
> >>   lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64
> >
> > I think patches 3 and 4 should be swapped, so it's cleanups first (which
> > make sense regardless of the 32-bit ARM support) and then the 32-bit ARM
> > support.
> >
> 
> Ok.

I can also apply patches 1-2 and 4 now if you want.  Let me know if I
should do that or if a new version is coming.

- Eric


* Re: [PATCH 0/5] crc64: Tweak intrinsics code and enable it for ARM
  2026-04-02 23:40     ` Eric Biggers
@ 2026-04-03  6:49       ` Ard Biesheuvel
  2026-04-03 19:59         ` Eric Biggers
  0 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2026-04-03  6:49 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-crypto, linux-arm-kernel, Demian Shulhan



On Fri, 3 Apr 2026, at 01:40, Eric Biggers wrote:
> On Thu, Apr 02, 2026 at 10:52:17AM +0200, Ard Biesheuvel wrote:
>> 
>> On Wed, 1 Apr 2026, at 21:59, Eric Biggers wrote:
>> > On Mon, Mar 30, 2026 at 04:46:31PM +0200, Ard Biesheuvel wrote:
>> >> Apply some tweaks to the new arm64 crc64 NEON intrinsics code, and wire
>> >> it up for the 32-bit ARM build. Note that true 32-bit ARM CPUs usually
>> >> don't implement the prerequisite 64x64 PMULL instructions, but 32-bit
>> >> kernels are commonly used on 64-bit capable hardware too, which do
>> >> implement the 32-bit versions of the crypto instructions if they are
>> >> implemented for the 64-bit ISA (as per the architecture).
>> >> 
>> >> Cc: Demian Shulhan <demyansh@gmail.com>
>> >> Cc: Eric Biggers <ebiggers@kernel.org>
>> >> 
>> >> Ard Biesheuvel (5):
>> >>   lib/crc: arm64: Drop unnecessary chunking logic from crc64
>> >>   lib/crc: arm64: Use existing macros for kernel-mode FPU cflags
>> >>   ARM: Add a neon-intrinsics.h header like on arm64
>> >>   lib/crc: arm64: Simplify intrinsics implementation
>> >>   lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64
>> >
>> > I think patches 3 and 4 should be swapped, so it's cleanups first (which
>> > make sense regardless of the 32-bit ARM support) and then the 32-bit ARM
>> > support.
>> >
>> 
>> Ok.
>
> I can also apply patches 1-2 and 4 now if you want.  Let me know if I
> should do that or if a new version is coming.
>

Yes, good idea. I'll take care of the ARM stuff next cycle.


* Re: [PATCH 0/5] crc64: Tweak intrinsics code and enable it for ARM
  2026-04-03  6:49       ` Ard Biesheuvel
@ 2026-04-03 19:59         ` Eric Biggers
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Biggers @ 2026-04-03 19:59 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-crypto, linux-arm-kernel, Demian Shulhan

On Fri, Apr 03, 2026 at 08:49:04AM +0200, Ard Biesheuvel wrote:
> 
> 
> On Fri, 3 Apr 2026, at 01:40, Eric Biggers wrote:
> > On Thu, Apr 02, 2026 at 10:52:17AM +0200, Ard Biesheuvel wrote:
> >> 
> >> On Wed, 1 Apr 2026, at 21:59, Eric Biggers wrote:
> >> > On Mon, Mar 30, 2026 at 04:46:31PM +0200, Ard Biesheuvel wrote:
> >> >> Apply some tweaks to the new arm64 crc64 NEON intrinsics code, and wire
> >> >> it up for the 32-bit ARM build. Note that true 32-bit ARM CPUs usually
> >> >> don't implement the prerequisite 64x64 PMULL instructions, but 32-bit
> >> >> kernels are commonly used on 64-bit capable hardware too, which do
> >> >> implement the 32-bit versions of the crypto instructions if they are
> >> >> implemented for the 64-bit ISA (as per the architecture).
> >> >> 
> >> >> Cc: Demian Shulhan <demyansh@gmail.com>
> >> >> Cc: Eric Biggers <ebiggers@kernel.org>
> >> >> 
> >> >> Ard Biesheuvel (5):
> >> >>   lib/crc: arm64: Drop unnecessary chunking logic from crc64
> >> >>   lib/crc: arm64: Use existing macros for kernel-mode FPU cflags
> >> >>   ARM: Add a neon-intrinsics.h header like on arm64
> >> >>   lib/crc: arm64: Simplify intrinsics implementation
> >> >>   lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64
> >> >
> >> > I think patches 3 and 4 should be swapped, so it's cleanups first (which
> >> > make sense regardless of the 32-bit ARM support) and then the 32-bit ARM
> >> > support.
> >> >
> >> 
> >> Ok.
> >
> > I can also apply patches 1-2 and 4 now if you want.  Let me know if I
> > should do that or if a new version is coming.
> >
> 
> Yes, good idea. I'll take care of the ARM stuff next cycle.

I've applied patches 1-2 and 4 to
https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=crc-next

- Eric


end of thread, other threads:[~2026-04-03 20:01 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-30 14:46 [PATCH 0/5] crc64: Tweak intrinsics code and enable it for ARM Ard Biesheuvel
2026-03-30 14:46 ` [PATCH 1/5] lib/crc: arm64: Drop unnecessary chunking logic from crc64 Ard Biesheuvel
2026-03-31 22:33   ` Eric Biggers
2026-04-01  0:09     ` Eric Biggers
2026-04-01  6:57     ` Ard Biesheuvel
2026-03-30 14:46 ` [PATCH 2/5] lib/crc: arm64: Use existing macros for kernel-mode FPU cflags Ard Biesheuvel
2026-03-30 14:46 ` [PATCH 3/5] ARM: Add a neon-intrinsics.h header like on arm64 Ard Biesheuvel
2026-03-30 14:46 ` [PATCH 4/5] lib/crc: arm64: Simplify intrinsics implementation Ard Biesheuvel
2026-03-30 14:46 ` [PATCH 5/5] lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64 Ard Biesheuvel
2026-03-31  6:47   ` Christoph Hellwig
2026-03-31  8:20     ` Ard Biesheuvel
2026-03-31 22:41   ` Eric Biggers
2026-04-01 16:48     ` Ard Biesheuvel
2026-04-01 19:59 ` [PATCH 0/5] crc64: Tweak intrinsics code and enable it for ARM Eric Biggers
2026-04-02  8:52   ` Ard Biesheuvel
2026-04-02 23:40     ` Eric Biggers
2026-04-03  6:49       ` Ard Biesheuvel
2026-04-03 19:59         ` Eric Biggers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox