* [PATCH 0/9] POLYVAL library
@ 2025-11-09 23:47 Eric Biggers
2025-11-09 23:47 ` [PATCH 1/9] crypto: polyval - Rename conflicting functions Eric Biggers
` (10 more replies)
0 siblings, 11 replies; 16+ messages in thread
From: Eric Biggers @ 2025-11-09 23:47 UTC
To: linux-crypto
Cc: linux-kernel, Ard Biesheuvel, Jason A. Donenfeld, Herbert Xu,
linux-arm-kernel, x86, Eric Biggers
This series is targeting libcrypto-next. It can also be retrieved from:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git polyval-lib-v1
This series migrates the POLYVAL code to lib/crypto/. It turns out
that, just as with Poly1305, the library approach is a much better fit
for it.
This series also replaces the generic implementation of POLYVAL with a
much better one.
Notably, this series improves the performance of HCTR2, since it
eliminates unnecessary overhead that was incurred by accessing POLYVAL
via the crypto_shash API. I see a 45% increase in throughput with
64-byte messages, 53% with 128-byte messages, and 6% with 4096-byte
messages.
It also eliminates the need to explicitly enable the optimized POLYVAL
code, which is now selected automatically when HCTR2 support is enabled.
Eric Biggers (9):
crypto: polyval - Rename conflicting functions
lib/crypto: polyval: Add POLYVAL library
lib/crypto: tests: Add KUnit tests for POLYVAL
lib/crypto: arm64/polyval: Migrate optimized code into library
lib/crypto: x86/polyval: Migrate optimized code into library
crypto: hctr2 - Convert to use POLYVAL library
crypto: polyval - Remove the polyval crypto_shash
crypto: testmgr - Remove polyval tests
fscrypt: Drop obsolete recommendation to enable optimized POLYVAL
Documentation/filesystems/fscrypt.rst | 2 -
arch/arm64/crypto/Kconfig | 10 -
arch/arm64/crypto/Makefile | 3 -
arch/arm64/crypto/polyval-ce-glue.c | 158 ---------
arch/x86/crypto/Kconfig | 10 -
arch/x86/crypto/Makefile | 3 -
arch/x86/crypto/polyval-clmulni_glue.c | 180 ----------
crypto/Kconfig | 12 +-
crypto/Makefile | 1 -
crypto/hctr2.c | 226 ++++---------
crypto/polyval-generic.c | 205 ------------
crypto/tcrypt.c | 4 -
crypto/testmgr.c | 9 +-
crypto/testmgr.h | 171 ----------
include/crypto/polyval.h | 182 ++++++++++-
lib/crypto/Kconfig | 12 +
lib/crypto/Makefile | 10 +
.../crypto/arm64}/polyval-ce-core.S | 38 +--
lib/crypto/arm64/polyval.h | 82 +++++
lib/crypto/polyval.c | 307 ++++++++++++++++++
lib/crypto/tests/Kconfig | 9 +
lib/crypto/tests/Makefile | 1 +
lib/crypto/tests/polyval-testvecs.h | 186 +++++++++++
lib/crypto/tests/polyval_kunit.c | 223 +++++++++++++
.../crypto/x86/polyval-pclmul-avx.S | 40 ++-
lib/crypto/x86/polyval.h | 83 +++++
scripts/crypto/gen-hash-testvecs.py | 47 ++-
27 files changed, 1240 insertions(+), 974 deletions(-)
delete mode 100644 arch/arm64/crypto/polyval-ce-glue.c
delete mode 100644 arch/x86/crypto/polyval-clmulni_glue.c
delete mode 100644 crypto/polyval-generic.c
rename {arch/arm64/crypto => lib/crypto/arm64}/polyval-ce-core.S (92%)
create mode 100644 lib/crypto/arm64/polyval.h
create mode 100644 lib/crypto/polyval.c
create mode 100644 lib/crypto/tests/polyval-testvecs.h
create mode 100644 lib/crypto/tests/polyval_kunit.c
rename arch/x86/crypto/polyval-clmulni_asm.S => lib/crypto/x86/polyval-pclmul-avx.S (91%)
create mode 100644 lib/crypto/x86/polyval.h
base-commit: ce59a87d1cbd3fa075aba73efde946e61d5ef089
--
2.51.2
* [PATCH 1/9] crypto: polyval - Rename conflicting functions
2025-11-09 23:47 [PATCH 0/9] POLYVAL library Eric Biggers
@ 2025-11-09 23:47 ` Eric Biggers
2025-11-09 23:47 ` [PATCH 2/9] lib/crypto: polyval: Add POLYVAL library Eric Biggers
` (9 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Eric Biggers @ 2025-11-09 23:47 UTC
To: linux-crypto
Cc: linux-kernel, Ard Biesheuvel, Jason A. Donenfeld, Herbert Xu,
linux-arm-kernel, x86, Eric Biggers
Rename polyval_init() and polyval_update(), in preparation for adding
library functions with the same name to <crypto/polyval.h>.
Note that polyval-generic.c will be removed later, as it will be
superseded by the library. This commit just keeps the kernel building
for the initial introduction of the library.
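To illustrate the conflict being avoided (a sketch assembled from the
prototypes involved, not code from this patch): the library will export
a non-static polyval_update() from <crypto/polyval.h>, which would
collide with the static helper of the same name here:

    /* Declared in <crypto/polyval.h> by the next commit: */
    void polyval_update(struct polyval_ctx *ctx, const u8 *data, size_t len);

    /* Defined in crypto/polyval-generic.c before this rename: */
    static int polyval_update(struct shash_desc *desc,
                              const u8 *src, unsigned int srclen);
    /* error: static declaration of 'polyval_update' follows
     * non-static declaration
     */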
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
crypto/polyval-generic.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/crypto/polyval-generic.c b/crypto/polyval-generic.c
index db8adb56e4ca..fe5b01a4000d 100644
--- a/crypto/polyval-generic.c
+++ b/crypto/polyval-generic.c
@@ -97,21 +97,21 @@ static int polyval_setkey(struct crypto_shash *tfm,
return -ENOMEM;
return 0;
}
-static int polyval_init(struct shash_desc *desc)
+static int polyval_generic_init(struct shash_desc *desc)
{
struct polyval_desc_ctx *dctx = shash_desc_ctx(desc);
memset(dctx, 0, sizeof(*dctx));
return 0;
}
-static int polyval_update(struct shash_desc *desc,
- const u8 *src, unsigned int srclen)
+static int polyval_generic_update(struct shash_desc *desc,
+ const u8 *src, unsigned int srclen)
{
struct polyval_desc_ctx *dctx = shash_desc_ctx(desc);
const struct polyval_tfm_ctx *ctx = crypto_shash_ctx(desc->tfm);
u8 tmp[POLYVAL_BLOCK_SIZE];
@@ -133,11 +133,11 @@ static int polyval_finup(struct shash_desc *desc, const u8 *src,
if (len) {
u8 tmp[POLYVAL_BLOCK_SIZE] = {};
memcpy(tmp, src, len);
- polyval_update(desc, tmp, POLYVAL_BLOCK_SIZE);
+ polyval_generic_update(desc, tmp, POLYVAL_BLOCK_SIZE);
}
copy_and_reverse(dst, dctx->buffer);
return 0;
}
@@ -164,12 +164,12 @@ static void polyval_exit_tfm(struct crypto_shash *tfm)
gf128mul_free_4k(ctx->gf128);
}
static struct shash_alg polyval_alg = {
.digestsize = POLYVAL_DIGEST_SIZE,
- .init = polyval_init,
- .update = polyval_update,
+ .init = polyval_generic_init,
+ .update = polyval_generic_update,
.finup = polyval_finup,
.setkey = polyval_setkey,
.export = polyval_export,
.import = polyval_import,
.exit_tfm = polyval_exit_tfm,
--
2.51.2
* [PATCH 2/9] lib/crypto: polyval: Add POLYVAL library
2025-11-09 23:47 [PATCH 0/9] POLYVAL library Eric Biggers
2025-11-09 23:47 ` [PATCH 1/9] crypto: polyval - Rename conflicting functions Eric Biggers
@ 2025-11-09 23:47 ` Eric Biggers
2025-11-10 15:21 ` Ard Biesheuvel
2025-11-09 23:47 ` [PATCH 3/9] lib/crypto: tests: Add KUnit tests for POLYVAL Eric Biggers
` (8 subsequent siblings)
10 siblings, 1 reply; 16+ messages in thread
From: Eric Biggers @ 2025-11-09 23:47 UTC
To: linux-crypto
Cc: linux-kernel, Ard Biesheuvel, Jason A. Donenfeld, Herbert Xu,
linux-arm-kernel, x86, Eric Biggers
Add support for POLYVAL to lib/crypto/.
This will replace the polyval crypto_shash algorithm and its use in the
hctr2 template, simplifying the code and reducing overhead.
Specifically, this commit introduces the POLYVAL library API and a
generic implementation of it. Later commits will migrate the existing
architecture-optimized implementations of POLYVAL into lib/crypto/ and
add a KUnit test suite.
I've also completely rewritten the generic implementation, replacing the
traditional table-based approach with a more modern one. It's now
constant-time, requires no precomputation or dynamic memory allocations,
decreases the per-key memory usage from 4096 bytes to 16 bytes, and is
faster than the old polyval-generic even on bulk data reusing the same
key (at least on x86_64, where I measured a 15% speedup).
We should do this for GHASH too, but for now just do it for POLYVAL.
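For reference, a minimal sketch of the new API's call pattern (raw_key,
data, and data_len are caller-supplied placeholders):

    #include <crypto/polyval.h>

    struct polyval_key key;
    struct polyval_ctx ctx;
    u8 digest[POLYVAL_DIGEST_SIZE];

    /* One-time key preparation from the 16-byte raw key. */
    polyval_preparekey(&key, raw_key);

    /* One-shot computation: */
    polyval(&key, data, data_len, digest);

    /* Or incrementally (the context keeps a pointer to the key): */
    polyval_init(&ctx, &key);
    polyval_update(&ctx, data, data_len);
    polyval_final(&ctx, digest);  /* zero-pads any final partial block
                                     and zeroizes the context */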
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
include/crypto/polyval.h | 171 +++++++++++++++++++++-
lib/crypto/Kconfig | 10 ++
lib/crypto/Makefile | 8 +
lib/crypto/polyval.c | 307 +++++++++++++++++++++++++++++++++++++++
4 files changed, 493 insertions(+), 3 deletions(-)
create mode 100644 lib/crypto/polyval.c
diff --git a/include/crypto/polyval.h b/include/crypto/polyval.h
index d2e63743e592..5ba4c248cad1 100644
--- a/include/crypto/polyval.h
+++ b/include/crypto/polyval.h
@@ -1,14 +1,179 @@
-/* SPDX-License-Identifier: GPL-2.0 */
+/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
- * Common values for the Polyval hash algorithm
+ * POLYVAL library API
*
- * Copyright 2021 Google LLC
+ * Copyright 2025 Google LLC
*/
#ifndef _CRYPTO_POLYVAL_H
#define _CRYPTO_POLYVAL_H
+#include <linux/string.h>
+#include <linux/types.h>
+
#define POLYVAL_BLOCK_SIZE 16
#define POLYVAL_DIGEST_SIZE 16
+/**
+ * struct polyval_elem - An element of the POLYVAL finite field
+ * @bytes: View of the element as a byte array (unioned with @lo and @hi)
+ * @lo: The low 64 terms of the element's polynomial
+ * @hi: The high 64 terms of the element's polynomial
+ *
+ * This represents an element of the finite field GF(2^128), using the POLYVAL
+ * convention: little-endian byte order and natural bit order.
+ */
+struct polyval_elem {
+ union {
+ u8 bytes[POLYVAL_BLOCK_SIZE];
+ struct {
+ __le64 lo;
+ __le64 hi;
+ };
+ };
+};
+
+/**
+ * struct polyval_key - Prepared key for POLYVAL
+ *
+ * This may contain just the raw key H, or it may contain precomputed key
+ * powers, depending on the platform's POLYVAL implementation. Use
+ * polyval_preparekey() to initialize this.
+ */
+struct polyval_key {
+#ifdef CONFIG_CRYPTO_LIB_POLYVAL_ARCH
+#error "Unhandled arch"
+#else /* CONFIG_CRYPTO_LIB_POLYVAL_ARCH */
+ /** @h: The hash key H */
+ struct polyval_elem h;
+#endif /* !CONFIG_CRYPTO_LIB_POLYVAL_ARCH */
+};
+
+/**
+ * struct polyval_ctx - Context for computing a POLYVAL value
+ * @key: Pointer to the prepared POLYVAL key. The user of the API is
+ * responsible for ensuring that the key lives as long as the context.
+ * @acc: The accumulator
+ * @partial: Number of data bytes processed so far modulo POLYVAL_BLOCK_SIZE
+ */
+struct polyval_ctx {
+ const struct polyval_key *key;
+ struct polyval_elem acc;
+ size_t partial;
+};
+
+/**
+ * polyval_preparekey() - Prepare a POLYVAL key
+ * @key: (output) The key structure to initialize
+ * @raw_key: The raw hash key
+ *
+ * Initialize a POLYVAL key structure from a raw key. This may be a simple
+ * copy, or it may involve precomputing powers of the key, depending on the
+ * platform's POLYVAL implementation.
+ *
+ * Context: Any context.
+ */
+#ifdef CONFIG_CRYPTO_LIB_POLYVAL_ARCH
+void polyval_preparekey(struct polyval_key *key,
+ const u8 raw_key[POLYVAL_BLOCK_SIZE]);
+
+#else
+static inline void polyval_preparekey(struct polyval_key *key,
+ const u8 raw_key[POLYVAL_BLOCK_SIZE])
+{
+ /* Just a simple copy, so inline it. */
+ memcpy(key->h.bytes, raw_key, POLYVAL_BLOCK_SIZE);
+}
#endif
+
+/**
+ * polyval_init() - Initialize a POLYVAL context for a new message
+ * @ctx: The context to initialize
+ * @key: The key to use. Note that a pointer to the key is saved in the
+ * context, so the key must live at least as long as the context.
+ */
+static inline void polyval_init(struct polyval_ctx *ctx,
+ const struct polyval_key *key)
+{
+ *ctx = (struct polyval_ctx){ .key = key };
+}
+
+/**
+ * polyval_import_blkaligned() - Import a POLYVAL accumulator value
+ * @ctx: The context to initialize
+ * @key: The key to use. Note that a pointer to the key is saved in the
+ * context, so the key must live at least as long as the context.
+ * @acc: The accumulator value to import.
+ *
+ * This imports an accumulator that was saved by polyval_export_blkaligned().
+ * The same key must be used.
+ */
+static inline void
+polyval_import_blkaligned(struct polyval_ctx *ctx,
+ const struct polyval_key *key,
+ const struct polyval_elem *acc)
+{
+ *ctx = (struct polyval_ctx){ .key = key, .acc = *acc };
+}
+
+/**
+ * polyval_export_blkaligned() - Export a POLYVAL accumulator value
+ * @ctx: The context to export the accumulator value from
+ * @acc: (output) The exported accumulator value
+ *
+ * This exports the accumulator from a POLYVAL context. The number of data
+ * bytes processed so far must be a multiple of POLYVAL_BLOCK_SIZE.
+ */
+static inline void polyval_export_blkaligned(const struct polyval_ctx *ctx,
+ struct polyval_elem *acc)
+{
+ *acc = ctx->acc;
+}
+
+/**
+ * polyval_update() - Update a POLYVAL context with message data
+ * @ctx: The context to update; must have been initialized
+ * @data: The message data
+ * @len: The data length in bytes. Doesn't need to be block-aligned.
+ *
+ * This can be called any number of times.
+ *
+ * Context: Any context.
+ */
+void polyval_update(struct polyval_ctx *ctx, const u8 *data, size_t len);
+
+/**
+ * polyval_final() - Finish computing a POLYVAL value
+ * @ctx: The context to finalize
+ * @out: The output value
+ *
+ * If the total data length isn't a multiple of POLYVAL_BLOCK_SIZE, then the
+ * final block is automatically zero-padded.
+ *
+ * After finishing, this zeroizes @ctx, so the caller does not need to.
+ *
+ * Context: Any context.
+ */
+void polyval_final(struct polyval_ctx *ctx, u8 out[POLYVAL_BLOCK_SIZE]);
+
+/**
+ * polyval() - Compute a POLYVAL value
+ * @key: The prepared key
+ * @data: The message data
+ * @len: The data length in bytes. Doesn't need to be block-aligned.
+ * @out: The output value
+ *
+ * Context: Any context.
+ */
+static inline void polyval(const struct polyval_key *key,
+ const u8 *data, size_t len,
+ u8 out[POLYVAL_BLOCK_SIZE])
+{
+ struct polyval_ctx ctx;
+
+ polyval_init(&ctx, key);
+ polyval_update(&ctx, data, len);
+ polyval_final(&ctx, out);
+}
+
+#endif /* _CRYPTO_POLYVAL_H */
diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig
index 7445054fc0ad..6545f0e83b83 100644
--- a/lib/crypto/Kconfig
+++ b/lib/crypto/Kconfig
@@ -133,10 +133,20 @@ config CRYPTO_LIB_POLY1305_RSIZE
default 2 if MIPS || RISCV
default 11 if X86_64
default 9 if ARM || ARM64
default 1
+config CRYPTO_LIB_POLYVAL
+ tristate
+ help
+ The POLYVAL library functions. Select this if your module uses any of
+ the functions from <crypto/polyval.h>.
+
+config CRYPTO_LIB_POLYVAL_ARCH
+ bool
+ depends on CRYPTO_LIB_POLYVAL && !UML
+
config CRYPTO_LIB_CHACHA20POLY1305
tristate
select CRYPTO_LIB_CHACHA
select CRYPTO_LIB_POLY1305
select CRYPTO_LIB_UTILS
diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
index 5515e73bfd5e..055e44008805 100644
--- a/lib/crypto/Makefile
+++ b/lib/crypto/Makefile
@@ -196,10 +196,18 @@ clean-files += arm/poly1305-core.S \
riscv/poly1305-core.S \
x86/poly1305-x86_64-cryptogams.S
################################################################################
+obj-$(CONFIG_CRYPTO_LIB_POLYVAL) += libpolyval.o
+libpolyval-y := polyval.o
+ifeq ($(CONFIG_CRYPTO_LIB_POLYVAL_ARCH),y)
+CFLAGS_polyval.o += -I$(src)/$(SRCARCH)
+endif
+
+################################################################################
+
obj-$(CONFIG_CRYPTO_LIB_SHA1) += libsha1.o
libsha1-y := sha1.o
ifeq ($(CONFIG_CRYPTO_LIB_SHA1_ARCH),y)
CFLAGS_sha1.o += -I$(src)/$(SRCARCH)
ifeq ($(CONFIG_ARM),y)
diff --git a/lib/crypto/polyval.c b/lib/crypto/polyval.c
new file mode 100644
index 000000000000..5796275f574a
--- /dev/null
+++ b/lib/crypto/polyval.c
@@ -0,0 +1,307 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * POLYVAL library functions
+ *
+ * Copyright 2025 Google LLC
+ */
+
+#include <crypto/polyval.h>
+#include <linux/export.h>
+#include <linux/module.h>
+#include <linux/string.h>
+#include <linux/unaligned.h>
+
+/*
+ * POLYVAL is an almost-XOR-universal hash function. Similar to GHASH, POLYVAL
+ * interprets the message as the coefficients of a polynomial in GF(2^128) and
+ * evaluates that polynomial at a secret point. POLYVAL has a simple
+ * mathematical relationship with GHASH, but it uses a better field convention
+ * which makes it easier and faster to implement.
+ *
+ * POLYVAL is not a cryptographic hash function, and it should be used only by
+ * algorithms that are specifically designed to use it.
+ *
+ * POLYVAL is specified by "AES-GCM-SIV: Nonce Misuse-Resistant Authenticated
+ * Encryption" (https://datatracker.ietf.org/doc/html/rfc8452)
+ *
+ * POLYVAL is also used by HCTR2. See "Length-preserving encryption with HCTR2"
+ * (https://eprint.iacr.org/2021/1441.pdf).
+ *
+ * This file provides a library API for POLYVAL. This API can delegate to
+ * either a generic implementation or an architecture-optimized implementation.
+ *
+ * For the generic implementation, we don't use the traditional table approach
+ * to GF(2^128) multiplication. That approach is not constant-time and requires
+ * a lot of memory. Instead, we use a different approach which emulates
+ * carryless multiplication using standard multiplications by spreading the data
+ * bits apart using "holes". This allows the carries to spill harmlessly. This
+ * approach is borrowed from BoringSSL, which in turn credits BearSSL's
+ * documentation (https://bearssl.org/constanttime.html#ghash-for-gcm) for the
+ * "holes" trick and a presentation by Shay Gueron
+ * (https://crypto.stanford.edu/RealWorldCrypto/slides/gueron.pdf) for the
+ * 256-bit => 128-bit reduction algorithm.
+ */
+
+#ifdef CONFIG_ARCH_SUPPORTS_INT128
+
+/* Do a 64 x 64 => 128 bit carryless multiplication. */
+static void clmul64(u64 a, u64 b, u64 *out_lo, u64 *out_hi)
+{
+ /*
+ * With 64-bit multiplicands and one term every 4 bits, there would be
+ * up to 64 / 4 = 16 one bits per column when each multiplication is
+ * written out as a series of additions in the schoolbook manner.
+ * Unfortunately, that doesn't work since the value 16 is 1 too large to
+ * fit in 4 bits. Carries would sometimes overflow into the next term.
+ *
+ * Using one term every 5 bits would work. However, that would cost
+ * 5 x 5 = 25 multiplications instead of 4 x 4 = 16.
+ *
+ * Instead, mask off 4 bits from one multiplicand, giving a max of 15
+ * one bits per column. Then handle those 4 bits separately.
+ */
+ u64 a0 = a & 0x1111111111111110;
+ u64 a1 = a & 0x2222222222222220;
+ u64 a2 = a & 0x4444444444444440;
+ u64 a3 = a & 0x8888888888888880;
+
+ u64 b0 = b & 0x1111111111111111;
+ u64 b1 = b & 0x2222222222222222;
+ u64 b2 = b & 0x4444444444444444;
+ u64 b3 = b & 0x8888888888888888;
+
+ /* Multiply the high 60 bits of @a by @b. */
+ u128 c0 = (a0 * (u128)b0) ^ (a1 * (u128)b3) ^
+ (a2 * (u128)b2) ^ (a3 * (u128)b1);
+ u128 c1 = (a0 * (u128)b1) ^ (a1 * (u128)b0) ^
+ (a2 * (u128)b3) ^ (a3 * (u128)b2);
+ u128 c2 = (a0 * (u128)b2) ^ (a1 * (u128)b1) ^
+ (a2 * (u128)b0) ^ (a3 * (u128)b3);
+ u128 c3 = (a0 * (u128)b3) ^ (a1 * (u128)b2) ^
+ (a2 * (u128)b1) ^ (a3 * (u128)b0);
+
+ /* Multiply the low 4 bits of @a by @b. */
+ u64 e0 = -(a & 1) & b;
+ u64 e1 = -((a >> 1) & 1) & b;
+ u64 e2 = -((a >> 2) & 1) & b;
+ u64 e3 = -((a >> 3) & 1) & b;
+ u64 extra_lo = e0 ^ (e1 << 1) ^ (e2 << 2) ^ (e3 << 3);
+ u64 extra_hi = (e1 >> 63) ^ (e2 >> 62) ^ (e3 >> 61);
+
+ /* Add all the intermediate products together. */
+ *out_lo = (((u64)c0) & 0x1111111111111111) ^
+ (((u64)c1) & 0x2222222222222222) ^
+ (((u64)c2) & 0x4444444444444444) ^
+ (((u64)c3) & 0x8888888888888888) ^ extra_lo;
+ *out_hi = (((u64)(c0 >> 64)) & 0x1111111111111111) ^
+ (((u64)(c1 >> 64)) & 0x2222222222222222) ^
+ (((u64)(c2 >> 64)) & 0x4444444444444444) ^
+ (((u64)(c3 >> 64)) & 0x8888888888888888) ^ extra_hi;
+}
+
+#else /* CONFIG_ARCH_SUPPORTS_INT128 */
+
+/* Do a 32 x 32 => 64 bit carryless multiplication. */
+static u64 clmul32(u32 a, u32 b)
+{
+ /*
+ * With 32-bit multiplicands and one term every 4 bits, there are up to
+ * 32 / 4 = 8 one bits per column when each multiplication is written
+ * out as a series of additions in the schoolbook manner. The value 8
+ * fits in 4 bits, so the carries don't overflow into the next term.
+ */
+ u32 a0 = a & 0x11111111;
+ u32 a1 = a & 0x22222222;
+ u32 a2 = a & 0x44444444;
+ u32 a3 = a & 0x88888888;
+
+ u32 b0 = b & 0x11111111;
+ u32 b1 = b & 0x22222222;
+ u32 b2 = b & 0x44444444;
+ u32 b3 = b & 0x88888888;
+
+ u64 c0 = (a0 * (u64)b0) ^ (a1 * (u64)b3) ^
+ (a2 * (u64)b2) ^ (a3 * (u64)b1);
+ u64 c1 = (a0 * (u64)b1) ^ (a1 * (u64)b0) ^
+ (a2 * (u64)b3) ^ (a3 * (u64)b2);
+ u64 c2 = (a0 * (u64)b2) ^ (a1 * (u64)b1) ^
+ (a2 * (u64)b0) ^ (a3 * (u64)b3);
+ u64 c3 = (a0 * (u64)b3) ^ (a1 * (u64)b2) ^
+ (a2 * (u64)b1) ^ (a3 * (u64)b0);
+
+ /* Add all the intermediate products together. */
+ return (c0 & 0x1111111111111111) ^
+ (c1 & 0x2222222222222222) ^
+ (c2 & 0x4444444444444444) ^
+ (c3 & 0x8888888888888888);
+}
+
+/* Do a 64 x 64 => 128 bit carryless multiplication. */
+static void clmul64(u64 a, u64 b, u64 *out_lo, u64 *out_hi)
+{
+ u32 a_lo = (u32)a;
+ u32 a_hi = a >> 32;
+ u32 b_lo = (u32)b;
+ u32 b_hi = b >> 32;
+
+ /* Karatsuba multiplication */
+ u64 lo = clmul32(a_lo, b_lo);
+ u64 hi = clmul32(a_hi, b_hi);
+ u64 mi = clmul32(a_lo ^ a_hi, b_lo ^ b_hi) ^ lo ^ hi;
+
+ *out_lo = lo ^ (mi << 32);
+ *out_hi = hi ^ (mi >> 32);
+}
+#endif /* !CONFIG_ARCH_SUPPORTS_INT128 */
+
+/* Compute @a = @a * @b * x^-128 in the POLYVAL field. */
+static void __maybe_unused
+polyval_mul_generic(struct polyval_elem *a, const struct polyval_elem *b)
+{
+ u64 c0, c1, c2, c3, mi0, mi1;
+
+ /*
+ * Carryless-multiply @a by @b using Karatsuba multiplication. Store
+ * the 256-bit product in @c0 (low) through @c3 (high).
+ */
+ clmul64(le64_to_cpu(a->lo), le64_to_cpu(b->lo), &c0, &c1);
+ clmul64(le64_to_cpu(a->hi), le64_to_cpu(b->hi), &c2, &c3);
+ clmul64(le64_to_cpu(a->lo ^ a->hi), le64_to_cpu(b->lo ^ b->hi),
+ &mi0, &mi1);
+ mi0 ^= c0 ^ c2;
+ mi1 ^= c1 ^ c3;
+ c1 ^= mi0;
+ c2 ^= mi1;
+
+ /*
+ * Cancel out the low 128 bits of the product by adding multiples of
+ * G(x) = x^128 + x^127 + x^126 + x^121 + 1. Do this in two steps, each
+ * of which cancels out 64 bits. Note that we break G(x) into three
+ * parts: 1, x^64 * (x^63 + x^62 + x^57), and x^128 * 1.
+ */
+
+ /*
+ * First, add G(x) times c0 as follows:
+ *
+ * (c0, c1, c2) = (0,
+ * c1 + (c0 * (x^63 + x^62 + x^57) mod x^64),
+ * c2 + c0 + floor((c0 * (x^63 + x^62 + x^57)) / x^64))
+ */
+ c1 ^= (c0 << 63) ^ (c0 << 62) ^ (c0 << 57);
+ c2 ^= c0 ^ (c0 >> 1) ^ (c0 >> 2) ^ (c0 >> 7);
+
+ /*
+ * Second, add G(x) times the new c1:
+ *
+ * (c1, c2, c3) = (0,
+ * c2 + (c1 * (x^63 + x^62 + x^57) mod x^64),
+ * c3 + c1 + floor((c1 * (x^63 + x^62 + x^57)) / x^64))
+ */
+ c2 ^= (c1 << 63) ^ (c1 << 62) ^ (c1 << 57);
+ c3 ^= c1 ^ (c1 >> 1) ^ (c1 >> 2) ^ (c1 >> 7);
+
+ /* Return (c2, c3). This implicitly multiplies by x^-128. */
+ a->lo = cpu_to_le64(c2);
+ a->hi = cpu_to_le64(c3);
+}
+
+static void __maybe_unused
+polyval_blocks_generic(struct polyval_elem *acc, const struct polyval_elem *key,
+ const u8 *data, size_t nblocks)
+{
+ do {
+ acc->lo ^= get_unaligned((__le64 *)data);
+ acc->hi ^= get_unaligned((__le64 *)(data + 8));
+ polyval_mul_generic(acc, key);
+ data += POLYVAL_BLOCK_SIZE;
+ } while (--nblocks);
+}
+
+/* Include the arch-optimized implementation of POLYVAL, if one is available. */
+#ifdef CONFIG_CRYPTO_LIB_POLYVAL_ARCH
+#include "polyval.h" /* $(SRCARCH)/polyval.h */
+void polyval_preparekey(struct polyval_key *key,
+ const u8 raw_key[POLYVAL_BLOCK_SIZE])
+{
+ polyval_preparekey_arch(key, raw_key);
+}
+EXPORT_SYMBOL_GPL(polyval_preparekey);
+#endif /* Else, polyval_preparekey() is an inline function. */
+
+/*
+ * polyval_mul_generic() and polyval_blocks_generic() take the key as a
+ * polyval_elem rather than a polyval_key, so that arch-optimized
+ * implementations with a different key format can use them as fallbacks (if they
+ * have H^1 stored somewhere in their struct). Thus, the following dispatch
+ * code is needed to pass the appropriate key argument.
+ */
+
+static void polyval_mul(struct polyval_ctx *ctx)
+{
+#ifdef CONFIG_CRYPTO_LIB_POLYVAL_ARCH
+ polyval_mul_arch(&ctx->acc, ctx->key);
+#else
+ polyval_mul_generic(&ctx->acc, &ctx->key->h);
+#endif
+}
+
+static void polyval_blocks(struct polyval_ctx *ctx,
+ const u8 *data, size_t nblocks)
+{
+#ifdef CONFIG_CRYPTO_LIB_POLYVAL_ARCH
+ polyval_blocks_arch(&ctx->acc, ctx->key, data, nblocks);
+#else
+ polyval_blocks_generic(&ctx->acc, &ctx->key->h, data, nblocks);
+#endif
+}
+
+void polyval_update(struct polyval_ctx *ctx, const u8 *data, size_t len)
+{
+ if (unlikely(ctx->partial)) {
+ size_t n = min(len, POLYVAL_BLOCK_SIZE - ctx->partial);
+
+ len -= n;
+ while (n--)
+ ctx->acc.bytes[ctx->partial++] ^= *data++;
+ if (ctx->partial < POLYVAL_BLOCK_SIZE)
+ return;
+ polyval_mul(ctx);
+ }
+ if (len >= POLYVAL_BLOCK_SIZE) {
+ size_t nblocks = len / POLYVAL_BLOCK_SIZE;
+
+ polyval_blocks(ctx, data, nblocks);
+ data += len & ~(POLYVAL_BLOCK_SIZE - 1);
+ len &= POLYVAL_BLOCK_SIZE - 1;
+ }
+ for (size_t i = 0; i < len; i++)
+ ctx->acc.bytes[i] ^= data[i];
+ ctx->partial = len;
+}
+EXPORT_SYMBOL_GPL(polyval_update);
+
+void polyval_final(struct polyval_ctx *ctx, u8 out[POLYVAL_BLOCK_SIZE])
+{
+ if (unlikely(ctx->partial))
+ polyval_mul(ctx);
+ memcpy(out, &ctx->acc, POLYVAL_BLOCK_SIZE);
+ memzero_explicit(ctx, sizeof(*ctx));
+}
+EXPORT_SYMBOL_GPL(polyval_final);
+
+#ifdef polyval_mod_init_arch
+static int __init polyval_mod_init(void)
+{
+ polyval_mod_init_arch();
+ return 0;
+}
+subsys_initcall(polyval_mod_init);
+
+static void __exit polyval_mod_exit(void)
+{
+}
+module_exit(polyval_mod_exit);
+#endif
+
+MODULE_DESCRIPTION("POLYVAL almost-XOR-universal hash function");
+MODULE_LICENSE("GPL");
--
2.51.2
* [PATCH 3/9] lib/crypto: tests: Add KUnit tests for POLYVAL
2025-11-09 23:47 [PATCH 0/9] POLYVAL library Eric Biggers
2025-11-09 23:47 ` [PATCH 1/9] crypto: polyval - Rename conflicting functions Eric Biggers
2025-11-09 23:47 ` [PATCH 2/9] lib/crypto: polyval: Add POLYVAL library Eric Biggers
@ 2025-11-09 23:47 ` Eric Biggers
2025-11-09 23:47 ` [PATCH 4/9] lib/crypto: arm64/polyval: Migrate optimized code into library Eric Biggers
` (7 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Eric Biggers @ 2025-11-09 23:47 UTC
To: linux-crypto
Cc: linux-kernel, Ard Biesheuvel, Jason A. Donenfeld, Herbert Xu,
linux-arm-kernel, x86, Eric Biggers
Add a test suite for the POLYVAL library, including:
- All the standard tests and the benchmark from hash-test-template.h
- Comparison with a test vector from the RFC
- Test with key and message containing all one bits
- Additional tests related to the key struct
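The suite can be run with the usual KUnit tooling, for example (a
typical invocation, not something added by this series):

    ./tools/testing/kunit/kunit.py run \
        --kconfig_add CONFIG_CRYPTO_LIB_POLYVAL_KUNIT_TEST=y polyval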
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
lib/crypto/tests/Kconfig | 9 ++
lib/crypto/tests/Makefile | 1 +
lib/crypto/tests/polyval-testvecs.h | 186 +++++++++++++++++++++++
lib/crypto/tests/polyval_kunit.c | 223 ++++++++++++++++++++++++++++
scripts/crypto/gen-hash-testvecs.py | 47 +++++-
5 files changed, 464 insertions(+), 2 deletions(-)
create mode 100644 lib/crypto/tests/polyval-testvecs.h
create mode 100644 lib/crypto/tests/polyval_kunit.c
diff --git a/lib/crypto/tests/Kconfig b/lib/crypto/tests/Kconfig
index 140afd1714ba..61d435c450bb 100644
--- a/lib/crypto/tests/Kconfig
+++ b/lib/crypto/tests/Kconfig
@@ -45,10 +45,19 @@ config CRYPTO_LIB_POLY1305_KUNIT_TEST
select CRYPTO_LIB_BENCHMARK_VISIBLE
select CRYPTO_LIB_POLY1305
help
KUnit tests for the Poly1305 library functions.
+config CRYPTO_LIB_POLYVAL_KUNIT_TEST
+ tristate "KUnit tests for POLYVAL" if !KUNIT_ALL_TESTS
+ depends on KUNIT
+ default KUNIT_ALL_TESTS || CRYPTO_SELFTESTS
+ select CRYPTO_LIB_BENCHMARK_VISIBLE
+ select CRYPTO_LIB_POLYVAL
+ help
+ KUnit tests for the POLYVAL library functions.
+
config CRYPTO_LIB_SHA1_KUNIT_TEST
tristate "KUnit tests for SHA-1" if !KUNIT_ALL_TESTS
depends on KUNIT
default KUNIT_ALL_TESTS || CRYPTO_SELFTESTS
select CRYPTO_LIB_BENCHMARK_VISIBLE
diff --git a/lib/crypto/tests/Makefile b/lib/crypto/tests/Makefile
index f7d1392dc847..5109a0651925 100644
--- a/lib/crypto/tests/Makefile
+++ b/lib/crypto/tests/Makefile
@@ -3,9 +3,10 @@
obj-$(CONFIG_CRYPTO_LIB_BLAKE2B_KUNIT_TEST) += blake2b_kunit.o
obj-$(CONFIG_CRYPTO_LIB_BLAKE2S_KUNIT_TEST) += blake2s_kunit.o
obj-$(CONFIG_CRYPTO_LIB_CURVE25519_KUNIT_TEST) += curve25519_kunit.o
obj-$(CONFIG_CRYPTO_LIB_MD5_KUNIT_TEST) += md5_kunit.o
obj-$(CONFIG_CRYPTO_LIB_POLY1305_KUNIT_TEST) += poly1305_kunit.o
+obj-$(CONFIG_CRYPTO_LIB_POLYVAL_KUNIT_TEST) += polyval_kunit.o
obj-$(CONFIG_CRYPTO_LIB_SHA1_KUNIT_TEST) += sha1_kunit.o
obj-$(CONFIG_CRYPTO_LIB_SHA256_KUNIT_TEST) += sha224_kunit.o sha256_kunit.o
obj-$(CONFIG_CRYPTO_LIB_SHA512_KUNIT_TEST) += sha384_kunit.o sha512_kunit.o
obj-$(CONFIG_CRYPTO_LIB_SHA3_KUNIT_TEST) += sha3_kunit.o
diff --git a/lib/crypto/tests/polyval-testvecs.h b/lib/crypto/tests/polyval-testvecs.h
new file mode 100644
index 000000000000..3d33f60d58bb
--- /dev/null
+++ b/lib/crypto/tests/polyval-testvecs.h
@@ -0,0 +1,186 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* This file was generated by: ./scripts/crypto/gen-hash-testvecs.py polyval */
+
+static const struct {
+ size_t data_len;
+ u8 digest[POLYVAL_DIGEST_SIZE];
+} hash_testvecs[] = {
+ {
+ .data_len = 0,
+ .digest = {
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ },
+ },
+ {
+ .data_len = 1,
+ .digest = {
+ 0xb5, 0x51, 0x69, 0x89, 0xd4, 0x3c, 0x59, 0xca,
+ 0x6a, 0x1c, 0x2a, 0xe9, 0xa1, 0x9c, 0x6c, 0x83,
+ },
+ },
+ {
+ .data_len = 2,
+ .digest = {
+ 0xf4, 0x50, 0xaf, 0x07, 0xda, 0x42, 0xa7, 0x41,
+ 0x4d, 0x24, 0x88, 0x87, 0xe3, 0x40, 0x73, 0x7c,
+ },
+ },
+ {
+ .data_len = 3,
+ .digest = {
+ 0x9e, 0x88, 0x78, 0x71, 0x4c, 0x55, 0x87, 0xe8,
+ 0xb4, 0x96, 0x3d, 0x56, 0xc8, 0xb2, 0xe1, 0x68,
+ },
+ },
+ {
+ .data_len = 16,
+ .digest = {
+ 0x9e, 0x81, 0x37, 0x8f, 0x49, 0xf7, 0xa2, 0xe4,
+ 0x04, 0x45, 0x12, 0x78, 0x45, 0x42, 0x27, 0xad,
+ },
+ },
+ {
+ .data_len = 32,
+ .digest = {
+ 0x60, 0x19, 0xd0, 0xa4, 0xf0, 0xde, 0x9e, 0xe7,
+ 0x6a, 0x89, 0x1a, 0xea, 0x80, 0x14, 0xa9, 0xa3,
+ },
+ },
+ {
+ .data_len = 48,
+ .digest = {
+ 0x0c, 0xa2, 0x70, 0x4d, 0x7c, 0x89, 0xac, 0x41,
+ 0xc2, 0x9e, 0x0d, 0x07, 0x07, 0x6a, 0x7f, 0xd5,
+ },
+ },
+ {
+ .data_len = 49,
+ .digest = {
+ 0x91, 0xd3, 0xa9, 0x5c, 0x79, 0x3d, 0x6b, 0x84,
+ 0x99, 0x54, 0xa7, 0xb4, 0x06, 0x66, 0xfd, 0x1c,
+ },
+ },
+ {
+ .data_len = 63,
+ .digest = {
+ 0x29, 0x37, 0xb8, 0xe5, 0xd8, 0x27, 0x4d, 0xfb,
+ 0x83, 0x4f, 0x67, 0xf7, 0xf9, 0xc1, 0x0a, 0x9d,
+ },
+ },
+ {
+ .data_len = 64,
+ .digest = {
+ 0x17, 0xa9, 0x06, 0x2c, 0xf3, 0xe8, 0x2e, 0xa6,
+ 0x6b, 0xb2, 0x1f, 0x5d, 0x94, 0x3c, 0x02, 0xa2,
+ },
+ },
+ {
+ .data_len = 65,
+ .digest = {
+ 0x7c, 0x80, 0x74, 0xd7, 0xa1, 0x37, 0x30, 0x64,
+ 0x3b, 0xa4, 0xa3, 0x98, 0xde, 0x47, 0x10, 0x23,
+ },
+ },
+ {
+ .data_len = 127,
+ .digest = {
+ 0x27, 0x3a, 0xcf, 0xf5, 0xaf, 0x9f, 0xd8, 0xd8,
+ 0x2d, 0x6a, 0x91, 0xfb, 0xb8, 0xfa, 0xbe, 0x0c,
+ },
+ },
+ {
+ .data_len = 128,
+ .digest = {
+ 0x97, 0x6e, 0xc4, 0xbe, 0x6b, 0x15, 0xa6, 0x7c,
+ 0xc4, 0xa2, 0xb8, 0x0a, 0x0e, 0x9c, 0xc7, 0x3a,
+ },
+ },
+ {
+ .data_len = 129,
+ .digest = {
+ 0x2b, 0xc3, 0x98, 0xba, 0x6e, 0x42, 0xf8, 0x18,
+ 0x85, 0x69, 0x15, 0x37, 0x10, 0x60, 0xe6, 0xac,
+ },
+ },
+ {
+ .data_len = 256,
+ .digest = {
+ 0x88, 0x21, 0x77, 0x89, 0xd7, 0x93, 0x90, 0xfc,
+ 0xf3, 0xb0, 0xe3, 0xfb, 0x14, 0xe2, 0xcf, 0x74,
+ },
+ },
+ {
+ .data_len = 511,
+ .digest = {
+ 0x66, 0x3d, 0x3e, 0x08, 0xa0, 0x49, 0x81, 0x68,
+ 0x3e, 0x3b, 0xc8, 0x80, 0x55, 0xd4, 0x15, 0xe9,
+ },
+ },
+ {
+ .data_len = 513,
+ .digest = {
+ 0x05, 0xf5, 0x06, 0x66, 0xe7, 0x11, 0x08, 0x84,
+ 0xff, 0x94, 0x50, 0x85, 0x65, 0x95, 0x2a, 0x20,
+ },
+ },
+ {
+ .data_len = 1000,
+ .digest = {
+ 0xd3, 0xa0, 0x51, 0x69, 0xb5, 0x38, 0xae, 0x1b,
+ 0xe1, 0xa2, 0x89, 0xc6, 0x8d, 0x2b, 0x62, 0x37,
+ },
+ },
+ {
+ .data_len = 3333,
+ .digest = {
+ 0x37, 0x6d, 0x6a, 0x14, 0xdc, 0xa5, 0x37, 0xfc,
+ 0xfe, 0x67, 0x76, 0xb2, 0x64, 0x68, 0x64, 0x05,
+ },
+ },
+ {
+ .data_len = 4096,
+ .digest = {
+ 0xe3, 0x12, 0x0c, 0x58, 0x46, 0x45, 0x27, 0x7a,
+ 0x0e, 0xa2, 0xfa, 0x2c, 0x35, 0x73, 0x6c, 0x94,
+ },
+ },
+ {
+ .data_len = 4128,
+ .digest = {
+ 0x63, 0x0d, 0xa1, 0xbc, 0x6e, 0x3e, 0xd3, 0x1d,
+ 0x28, 0x52, 0xd2, 0xf4, 0x30, 0x2d, 0xff, 0xc4,
+ },
+ },
+ {
+ .data_len = 4160,
+ .digest = {
+ 0xb2, 0x91, 0x49, 0xe2, 0x02, 0x98, 0x00, 0x79,
+ 0x71, 0xb9, 0xd7, 0xd4, 0xb5, 0x94, 0x6d, 0x7d,
+ },
+ },
+ {
+ .data_len = 4224,
+ .digest = {
+ 0x58, 0x96, 0x48, 0x69, 0x05, 0x17, 0xe1, 0x6d,
+ 0xbc, 0xf2, 0x3d, 0x10, 0x96, 0x00, 0x74, 0x58,
+ },
+ },
+ {
+ .data_len = 16384,
+ .digest = {
+ 0x99, 0x3c, 0xcb, 0x4d, 0x64, 0xc9, 0xa9, 0x41,
+ 0x52, 0x93, 0xfd, 0x65, 0xc4, 0xcc, 0xa5, 0xe5,
+ },
+ },
+};
+
+static const u8 hash_testvec_consolidated[POLYVAL_DIGEST_SIZE] = {
+ 0xdf, 0x68, 0x52, 0x99, 0x92, 0xc3, 0xe8, 0x88,
+ 0x29, 0x13, 0xc8, 0x35, 0x67, 0xa3, 0xd3, 0xad,
+};
+
+static const u8 polyval_allones_hashofhashes[POLYVAL_DIGEST_SIZE] = {
+ 0xd5, 0xf7, 0xfd, 0xb2, 0xa6, 0xef, 0x0b, 0x85,
+ 0x0d, 0x0a, 0x06, 0x10, 0xbc, 0x64, 0x94, 0x73,
+};
diff --git a/lib/crypto/tests/polyval_kunit.c b/lib/crypto/tests/polyval_kunit.c
new file mode 100644
index 000000000000..e59f598c1572
--- /dev/null
+++ b/lib/crypto/tests/polyval_kunit.c
@@ -0,0 +1,223 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright 2025 Google LLC
+ */
+#include <crypto/polyval.h>
+#include "polyval-testvecs.h"
+
+/*
+ * A fixed key used when presenting POLYVAL as an unkeyed hash function in order
+ * to reuse hash-test-template.h. At the beginning of the test suite, this is
+ * initialized to a key prepared from bytes generated from a fixed seed.
+ */
+static struct polyval_key test_key;
+
+static void polyval_init_withtestkey(struct polyval_ctx *ctx)
+{
+ polyval_init(ctx, &test_key);
+}
+
+static void polyval_withtestkey(const u8 *data, size_t len,
+ u8 out[POLYVAL_BLOCK_SIZE])
+{
+ polyval(&test_key, data, len, out);
+}
+
+/* Generate the HASH_KUNIT_CASES using hash-test-template.h. */
+#define HASH polyval_withtestkey
+#define HASH_CTX polyval_ctx
+#define HASH_SIZE POLYVAL_BLOCK_SIZE
+#define HASH_INIT polyval_init_withtestkey
+#define HASH_UPDATE polyval_update
+#define HASH_FINAL polyval_final
+#include "hash-test-template.h"
+
+/*
+ * Test an example from RFC8452 ("AES-GCM-SIV: Nonce Misuse-Resistant
+ * Authenticated Encryption") to ensure compatibility with that.
+ */
+static void test_polyval_rfc8452_testvec(struct kunit *test)
+{
+ static const u8 raw_key[POLYVAL_BLOCK_SIZE] =
+ "\x31\x07\x28\xd9\x91\x1f\x1f\x38"
+ "\x37\xb2\x43\x16\xc3\xfa\xb9\xa0";
+ static const u8 data[48] =
+ "\x65\x78\x61\x6d\x70\x6c\x65\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x48\x65\x6c\x6c\x6f\x20\x77\x6f"
+ "\x72\x6c\x64\x00\x00\x00\x00\x00"
+ "\x38\x00\x00\x00\x00\x00\x00\x00"
+ "\x58\x00\x00\x00\x00\x00\x00\x00";
+ static const u8 expected_hash[POLYVAL_BLOCK_SIZE] =
+ "\xad\x7f\xcf\x0b\x51\x69\x85\x16"
+ "\x62\x67\x2f\x3c\x5f\x95\x13\x8f";
+ u8 hash[POLYVAL_BLOCK_SIZE];
+ struct polyval_key key;
+
+ polyval_preparekey(&key, raw_key);
+ polyval(&key, data, sizeof(data), hash);
+ KUNIT_ASSERT_MEMEQ(test, hash, expected_hash, sizeof(hash));
+}
+
+/*
+ * Test a key and messages containing all one bits. This is useful to detect
+ * overflow bugs in implementations that emulate carryless multiplication using
+ * a series of standard multiplications with the bits spread out.
+ */
+static void test_polyval_allones_key_and_message(struct kunit *test)
+{
+ struct polyval_key key;
+ struct polyval_ctx hashofhashes_ctx;
+ u8 hash[POLYVAL_BLOCK_SIZE];
+
+ static_assert(TEST_BUF_LEN >= 4096);
+ memset(test_buf, 0xff, 4096);
+
+ polyval_preparekey(&key, test_buf);
+ polyval_init(&hashofhashes_ctx, &key);
+ for (size_t len = 0; len <= 4096; len += 16) {
+ polyval(&key, test_buf, len, hash);
+ polyval_update(&hashofhashes_ctx, hash, sizeof(hash));
+ }
+ polyval_final(&hashofhashes_ctx, hash);
+ KUNIT_ASSERT_MEMEQ(test, hash, polyval_allones_hashofhashes,
+ sizeof(hash));
+}
+
+#define MAX_LEN_FOR_KEY_CHECK 1024
+
+/*
+ * Given two prepared keys which should be identical (but may differ in
+ * alignment and/or whether they are followed by a guard page or not), verify
+ * that they produce consistent results on various data lengths.
+ */
+static void check_key_consistency(struct kunit *test,
+ const struct polyval_key *key1,
+ const struct polyval_key *key2)
+{
+ u8 *data = test_buf;
+ u8 hash1[POLYVAL_BLOCK_SIZE];
+ u8 hash2[POLYVAL_BLOCK_SIZE];
+
+ rand_bytes(data, MAX_LEN_FOR_KEY_CHECK);
+ KUNIT_ASSERT_MEMEQ(test, key1, key2, sizeof(*key1));
+
+ for (int i = 0; i < 100; i++) {
+ size_t len = rand_length(MAX_LEN_FOR_KEY_CHECK);
+
+ polyval(key1, data, len, hash1);
+ polyval(key2, data, len, hash2);
+ KUNIT_ASSERT_MEMEQ(test, hash1, hash2, sizeof(hash1));
+ }
+}
+
+/* Test that no buffer overreads occur on either raw_key or polyval_key. */
+static void test_polyval_with_guarded_key(struct kunit *test)
+{
+ u8 raw_key[POLYVAL_BLOCK_SIZE];
+ u8 *guarded_raw_key = &test_buf[TEST_BUF_LEN - sizeof(raw_key)];
+ struct polyval_key key1, key2;
+ struct polyval_key *guarded_key =
+ (struct polyval_key *)&test_buf[TEST_BUF_LEN - sizeof(key1)];
+
+ /* Prepare with regular buffers. */
+ rand_bytes(raw_key, sizeof(raw_key));
+ polyval_preparekey(&key1, raw_key);
+
+ /* Prepare with guarded raw_key, then check that it works. */
+ memcpy(guarded_raw_key, raw_key, sizeof(raw_key));
+ polyval_preparekey(&key2, guarded_raw_key);
+ check_key_consistency(test, &key1, &key2);
+
+ /* Prepare guarded polyval_key, then check that it works. */
+ polyval_preparekey(guarded_key, raw_key);
+ check_key_consistency(test, &key1, guarded_key);
+}
+
+/*
+ * Test that polyval_key only needs to be aligned to
+ * __alignof__(struct polyval_key), i.e. 8 bytes. The assembly code may prefer
+ * 16-byte or higher alignment, but it mustn't require it.
+ */
+static void test_polyval_with_minimally_aligned_key(struct kunit *test)
+{
+ u8 raw_key[POLYVAL_BLOCK_SIZE];
+ struct polyval_key key;
+ struct polyval_key *minaligned_key =
+ (struct polyval_key *)&test_buf[MAX_LEN_FOR_KEY_CHECK +
+ __alignof__(struct polyval_key)];
+
+ KUNIT_ASSERT_TRUE(test, IS_ALIGNED((uintptr_t)minaligned_key,
+ __alignof__(struct polyval_key)));
+ KUNIT_ASSERT_TRUE(test,
+ !IS_ALIGNED((uintptr_t)minaligned_key,
+ 2 * __alignof__(struct polyval_key)));
+
+ rand_bytes(raw_key, sizeof(raw_key));
+ polyval_preparekey(&key, raw_key);
+ polyval_preparekey(minaligned_key, raw_key);
+ check_key_consistency(test, &key, minaligned_key);
+}
+
+struct polyval_irq_test_state {
+ struct polyval_key expected_key;
+ u8 raw_key[POLYVAL_BLOCK_SIZE];
+};
+
+static bool polyval_irq_test_func(void *state_)
+{
+ struct polyval_irq_test_state *state = state_;
+ struct polyval_key key;
+
+ polyval_preparekey(&key, state->raw_key);
+ return memcmp(&key, &state->expected_key, sizeof(key)) == 0;
+}
+
+/*
+ * Test that polyval_preparekey() produces the same output regardless of whether
+ * FPU or vector registers are usable when it is called.
+ */
+static void test_polyval_preparekey_in_irqs(struct kunit *test)
+{
+ struct polyval_irq_test_state state;
+
+ rand_bytes(state.raw_key, sizeof(state.raw_key));
+ polyval_preparekey(&state.expected_key, state.raw_key);
+ kunit_run_irq_test(test, polyval_irq_test_func, 20000, &state);
+}
+
+static int polyval_suite_init(struct kunit_suite *suite)
+{
+ u8 raw_key[POLYVAL_BLOCK_SIZE];
+
+ rand_bytes_seeded_from_len(raw_key, sizeof(raw_key));
+ polyval_preparekey(&test_key, raw_key);
+ return hash_suite_init(suite);
+}
+
+static void polyval_suite_exit(struct kunit_suite *suite)
+{
+ hash_suite_exit(suite);
+}
+
+static struct kunit_case polyval_test_cases[] = {
+ HASH_KUNIT_CASES,
+ KUNIT_CASE(test_polyval_rfc8452_testvec),
+ KUNIT_CASE(test_polyval_allones_key_and_message),
+ KUNIT_CASE(test_polyval_with_guarded_key),
+ KUNIT_CASE(test_polyval_with_minimally_aligned_key),
+ KUNIT_CASE(test_polyval_preparekey_in_irqs),
+ KUNIT_CASE(benchmark_hash),
+ {},
+};
+
+static struct kunit_suite polyval_test_suite = {
+ .name = "polyval",
+ .test_cases = polyval_test_cases,
+ .suite_init = polyval_suite_init,
+ .suite_exit = polyval_suite_exit,
+};
+kunit_test_suite(polyval_test_suite);
+
+MODULE_DESCRIPTION("KUnit tests and benchmark for POLYVAL");
+MODULE_LICENSE("GPL");
diff --git a/scripts/crypto/gen-hash-testvecs.py b/scripts/crypto/gen-hash-testvecs.py
index ae2682882cd1..c1d0517140bd 100755
--- a/scripts/crypto/gen-hash-testvecs.py
+++ b/scripts/crypto/gen-hash-testvecs.py
@@ -1,9 +1,9 @@
#!/usr/bin/env python3
# SPDX-License-Identifier: GPL-2.0-or-later
#
-# Script that generates test vectors for the given cryptographic hash function.
+# Script that generates test vectors for the given hash function.
#
# Copyright 2025 Google LLC
import hashlib
import hmac
@@ -48,15 +48,46 @@ class Poly1305:
# nondestructive, i.e. not changing any field of self.
def digest(self):
m = (self.h + self.s) % 2**128
return m.to_bytes(16, byteorder='little')
+POLYVAL_POLY = sum((1 << i) for i in [128, 127, 126, 121, 0])
+POLYVAL_BLOCK_SIZE = 16
+
+# A straightforward, unoptimized implementation of POLYVAL.
+# Reference: https://datatracker.ietf.org/doc/html/rfc8452
+class Polyval:
+ def __init__(self, key):
+ assert len(key) == 16
+ self.h = int.from_bytes(key, byteorder='little')
+ self.acc = 0
+
+ # Note: this supports partial blocks only at the end.
+ def update(self, data):
+ for i in range(0, len(data), 16):
+ # acc += block
+ self.acc ^= int.from_bytes(data[i:i+16], byteorder='little')
+ # acc = (acc * h * x^-128) mod POLYVAL_POLY
+ product = 0
+ for j in range(128):
+ if (self.h & (1 << j)) != 0:
+ product ^= self.acc << j
+ if (product & (1 << j)) != 0:
+ product ^= POLYVAL_POLY << j
+ self.acc = product >> 128
+ return self
+
+ def digest(self):
+ return self.acc.to_bytes(16, byteorder='little')
+
def hash_init(alg):
if alg == 'poly1305':
# Use a fixed random key here, to present Poly1305 as an unkeyed hash.
# This allows all the test cases for unkeyed hashes to work on Poly1305.
return Poly1305(rand_bytes(POLY1305_KEY_SIZE))
+ if alg == 'polyval':
+ return Polyval(rand_bytes(POLYVAL_BLOCK_SIZE))
return hashlib.new(alg)
def hash_update(ctx, data):
ctx.update(data)
@@ -163,13 +194,22 @@ def gen_additional_poly1305_testvecs():
data += ctx.digest()
print_static_u8_array_definition(
'poly1305_allones_macofmacs[POLY1305_DIGEST_SIZE]',
Poly1305(key).update(data).digest())
+def gen_additional_polyval_testvecs():
+ key = b'\xff' * POLYVAL_BLOCK_SIZE
+ hashes = b''
+ for data_len in range(0, 4097, 16):
+ hashes += Polyval(key).update(b'\xff' * data_len).digest()
+ print_static_u8_array_definition(
+ 'polyval_allones_hashofhashes[POLYVAL_DIGEST_SIZE]',
+ Polyval(key).update(hashes).digest())
+
if len(sys.argv) != 2:
sys.stderr.write('Usage: gen-hash-testvecs.py ALGORITHM\n')
- sys.stderr.write('ALGORITHM may be any supported by Python hashlib, or poly1305 or sha3.\n')
+ sys.stderr.write('ALGORITHM may be any supported by Python hashlib; or poly1305, polyval, or sha3.\n')
sys.stderr.write('Example: gen-hash-testvecs.py sha512\n')
sys.exit(1)
alg = sys.argv[1]
print('/* SPDX-License-Identifier: GPL-2.0-or-later */')
@@ -178,10 +218,13 @@ if alg.startswith('blake2'):
gen_unkeyed_testvecs(alg)
gen_additional_blake2_testvecs(alg)
elif alg == 'poly1305':
gen_unkeyed_testvecs(alg)
gen_additional_poly1305_testvecs()
+elif alg == 'polyval':
+ gen_unkeyed_testvecs(alg)
+ gen_additional_polyval_testvecs()
elif alg == 'sha3':
print()
print('/* SHA3-256 test vectors */')
gen_unkeyed_testvecs('sha3-256')
print()
--
2.51.2
* [PATCH 4/9] lib/crypto: arm64/polyval: Migrate optimized code into library
2025-11-09 23:47 [PATCH 0/9] POLYVAL library Eric Biggers
` (2 preceding siblings ...)
2025-11-09 23:47 ` [PATCH 3/9] lib/crypto: tests: Add KUnit tests for POLYVAL Eric Biggers
@ 2025-11-09 23:47 ` Eric Biggers
2025-11-09 23:47 ` [PATCH 5/9] lib/crypto: x86/polyval: " Eric Biggers
` (6 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Eric Biggers @ 2025-11-09 23:47 UTC
To: linux-crypto
Cc: linux-kernel, Ard Biesheuvel, Jason A. Donenfeld, Herbert Xu,
linux-arm-kernel, x86, Eric Biggers
Migrate the arm64 implementation of POLYVAL into lib/crypto/, wiring it
up to the POLYVAL library interface. This makes the POLYVAL library
properly optimized on arm64.
This drops the arm64 optimizations of polyval in the crypto_shash API.
That's fine, since polyval will be removed from crypto_shash entirely,
as it is unneeded there. But even if it comes back, the crypto_shash
API could just be implemented on top of the library API, as usual.
Adjust the names and prototypes of the assembly functions to align more
closely with the rest of the library code.
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
arch/arm64/crypto/Kconfig | 10 --
arch/arm64/crypto/Makefile | 3 -
arch/arm64/crypto/polyval-ce-glue.c | 158 ------------------
include/crypto/polyval.h | 8 +
lib/crypto/Kconfig | 1 +
lib/crypto/Makefile | 1 +
.../crypto/arm64}/polyval-ce-core.S | 38 ++---
lib/crypto/arm64/polyval.h | 82 +++++++++
8 files changed, 110 insertions(+), 191 deletions(-)
delete mode 100644 arch/arm64/crypto/polyval-ce-glue.c
rename {arch/arm64/crypto => lib/crypto/arm64}/polyval-ce-core.S (92%)
create mode 100644 lib/crypto/arm64/polyval.h
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 376d6b50743f..bdd276a6e540 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -45,20 +45,10 @@ config CRYPTO_SM3_ARM64_CE
SM3 (ShangMi 3) secure hash function (OSCCA GM/T 0004-2012)
Architecture: arm64 using:
- ARMv8.2 Crypto Extensions
-config CRYPTO_POLYVAL_ARM64_CE
- tristate "Hash functions: POLYVAL (ARMv8 Crypto Extensions)"
- depends on KERNEL_MODE_NEON
- select CRYPTO_POLYVAL
- help
- POLYVAL hash function for HCTR2
-
- Architecture: arm64 using:
- - ARMv8 Crypto Extensions
-
config CRYPTO_AES_ARM64
tristate "Ciphers: AES, modes: ECB, CBC, CTR, CTS, XCTR, XTS"
select CRYPTO_AES
select CRYPTO_LIB_SHA256
help
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index fd3d590fa113..1e330aa08d3f 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -27,13 +27,10 @@ obj-$(CONFIG_CRYPTO_SM4_ARM64_NEON_BLK) += sm4-neon.o
sm4-neon-y := sm4-neon-glue.o sm4-neon-core.o
obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o
ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
-obj-$(CONFIG_CRYPTO_POLYVAL_ARM64_CE) += polyval-ce.o
-polyval-ce-y := polyval-ce-glue.o polyval-ce-core.o
-
obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
aes-ce-cipher-y := aes-ce-core.o aes-ce-glue.o
obj-$(CONFIG_CRYPTO_AES_ARM64_CE_CCM) += aes-ce-ccm.o
aes-ce-ccm-y := aes-ce-ccm-glue.o aes-ce-ccm-core.o
diff --git a/arch/arm64/crypto/polyval-ce-glue.c b/arch/arm64/crypto/polyval-ce-glue.c
deleted file mode 100644
index c4e653688ea0..000000000000
--- a/arch/arm64/crypto/polyval-ce-glue.c
+++ /dev/null
@@ -1,158 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Glue code for POLYVAL using ARMv8 Crypto Extensions
- *
- * Copyright (c) 2007 Nokia Siemens Networks - Mikko Herranen <mh1@iki.fi>
- * Copyright (c) 2009 Intel Corp.
- * Author: Huang Ying <ying.huang@intel.com>
- * Copyright 2021 Google LLC
- */
-
-/*
- * Glue code based on ghash-clmulni-intel_glue.c.
- *
- * This implementation of POLYVAL uses montgomery multiplication accelerated by
- * ARMv8 Crypto Extensions instructions to implement the finite field operations.
- */
-
-#include <asm/neon.h>
-#include <crypto/internal/hash.h>
-#include <crypto/polyval.h>
-#include <crypto/utils.h>
-#include <linux/cpufeature.h>
-#include <linux/errno.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/string.h>
-
-#define NUM_KEY_POWERS 8
-
-struct polyval_tfm_ctx {
- /*
- * These powers must be in the order h^8, ..., h^1.
- */
- u8 key_powers[NUM_KEY_POWERS][POLYVAL_BLOCK_SIZE];
-};
-
-struct polyval_desc_ctx {
- u8 buffer[POLYVAL_BLOCK_SIZE];
-};
-
-asmlinkage void pmull_polyval_update(const struct polyval_tfm_ctx *keys,
- const u8 *in, size_t nblocks, u8 *accumulator);
-asmlinkage void pmull_polyval_mul(u8 *op1, const u8 *op2);
-
-static void internal_polyval_update(const struct polyval_tfm_ctx *keys,
- const u8 *in, size_t nblocks, u8 *accumulator)
-{
- kernel_neon_begin();
- pmull_polyval_update(keys, in, nblocks, accumulator);
- kernel_neon_end();
-}
-
-static void internal_polyval_mul(u8 *op1, const u8 *op2)
-{
- kernel_neon_begin();
- pmull_polyval_mul(op1, op2);
- kernel_neon_end();
-}
-
-static int polyval_arm64_setkey(struct crypto_shash *tfm,
- const u8 *key, unsigned int keylen)
-{
- struct polyval_tfm_ctx *tctx = crypto_shash_ctx(tfm);
- int i;
-
- if (keylen != POLYVAL_BLOCK_SIZE)
- return -EINVAL;
-
- memcpy(tctx->key_powers[NUM_KEY_POWERS-1], key, POLYVAL_BLOCK_SIZE);
-
- for (i = NUM_KEY_POWERS-2; i >= 0; i--) {
- memcpy(tctx->key_powers[i], key, POLYVAL_BLOCK_SIZE);
- internal_polyval_mul(tctx->key_powers[i],
- tctx->key_powers[i+1]);
- }
-
- return 0;
-}
-
-static int polyval_arm64_init(struct shash_desc *desc)
-{
- struct polyval_desc_ctx *dctx = shash_desc_ctx(desc);
-
- memset(dctx, 0, sizeof(*dctx));
-
- return 0;
-}
-
-static int polyval_arm64_update(struct shash_desc *desc,
- const u8 *src, unsigned int srclen)
-{
- struct polyval_desc_ctx *dctx = shash_desc_ctx(desc);
- const struct polyval_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm);
- unsigned int nblocks;
-
- do {
- /* allow rescheduling every 4K bytes */
- nblocks = min(srclen, 4096U) / POLYVAL_BLOCK_SIZE;
- internal_polyval_update(tctx, src, nblocks, dctx->buffer);
- srclen -= nblocks * POLYVAL_BLOCK_SIZE;
- src += nblocks * POLYVAL_BLOCK_SIZE;
- } while (srclen >= POLYVAL_BLOCK_SIZE);
-
- return srclen;
-}
-
-static int polyval_arm64_finup(struct shash_desc *desc, const u8 *src,
- unsigned int len, u8 *dst)
-{
- struct polyval_desc_ctx *dctx = shash_desc_ctx(desc);
- const struct polyval_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm);
-
- if (len) {
- crypto_xor(dctx->buffer, src, len);
- internal_polyval_mul(dctx->buffer,
- tctx->key_powers[NUM_KEY_POWERS-1]);
- }
-
- memcpy(dst, dctx->buffer, POLYVAL_BLOCK_SIZE);
-
- return 0;
-}
-
-static struct shash_alg polyval_alg = {
- .digestsize = POLYVAL_DIGEST_SIZE,
- .init = polyval_arm64_init,
- .update = polyval_arm64_update,
- .finup = polyval_arm64_finup,
- .setkey = polyval_arm64_setkey,
- .descsize = sizeof(struct polyval_desc_ctx),
- .base = {
- .cra_name = "polyval",
- .cra_driver_name = "polyval-ce",
- .cra_priority = 200,
- .cra_flags = CRYPTO_AHASH_ALG_BLOCK_ONLY,
- .cra_blocksize = POLYVAL_BLOCK_SIZE,
- .cra_ctxsize = sizeof(struct polyval_tfm_ctx),
- .cra_module = THIS_MODULE,
- },
-};
-
-static int __init polyval_ce_mod_init(void)
-{
- return crypto_register_shash(&polyval_alg);
-}
-
-static void __exit polyval_ce_mod_exit(void)
-{
- crypto_unregister_shash(&polyval_alg);
-}
-
-module_cpu_feature_match(PMULL, polyval_ce_mod_init)
-module_exit(polyval_ce_mod_exit);
-
-MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("POLYVAL hash function accelerated by ARMv8 Crypto Extensions");
-MODULE_ALIAS_CRYPTO("polyval");
-MODULE_ALIAS_CRYPTO("polyval-ce");
diff --git a/include/crypto/polyval.h b/include/crypto/polyval.h
index 5ba4c248cad1..f8aaf4275fbd 100644
--- a/include/crypto/polyval.h
+++ b/include/crypto/polyval.h
@@ -37,14 +37,22 @@ struct polyval_elem {
* struct polyval_key - Prepared key for POLYVAL
*
* This may contain just the raw key H, or it may contain precomputed key
* powers, depending on the platform's POLYVAL implementation. Use
* polyval_preparekey() to initialize this.
+ *
+ * By H^i we mean H^(i-1) * H * x^-128, with base case H^1 = H. I.e. the
+ * exponentiation repeats the POLYVAL dot operation, with its "extra" x^-128.
*/
struct polyval_key {
#ifdef CONFIG_CRYPTO_LIB_POLYVAL_ARCH
+#ifdef CONFIG_ARM64
+ /** @h_powers: Powers of the hash key H^8 through H^1 */
+ struct polyval_elem h_powers[8];
+#else
#error "Unhandled arch"
+#endif
#else /* CONFIG_CRYPTO_LIB_POLYVAL_ARCH */
/** @h: The hash key H */
struct polyval_elem h;
#endif /* !CONFIG_CRYPTO_LIB_POLYVAL_ARCH */
};
diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig
index 6545f0e83b83..430723994142 100644
--- a/lib/crypto/Kconfig
+++ b/lib/crypto/Kconfig
@@ -142,10 +142,11 @@ config CRYPTO_LIB_POLYVAL
the functions from <crypto/polyval.h>.
config CRYPTO_LIB_POLYVAL_ARCH
bool
depends on CRYPTO_LIB_POLYVAL && !UML
+ default y if ARM64 && KERNEL_MODE_NEON
config CRYPTO_LIB_CHACHA20POLY1305
tristate
select CRYPTO_LIB_CHACHA
select CRYPTO_LIB_POLY1305
diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
index 055e44008805..2efa96afcb4b 100644
--- a/lib/crypto/Makefile
+++ b/lib/crypto/Makefile
@@ -200,10 +200,11 @@ clean-files += arm/poly1305-core.S \
obj-$(CONFIG_CRYPTO_LIB_POLYVAL) += libpolyval.o
libpolyval-y := polyval.o
ifeq ($(CONFIG_CRYPTO_LIB_POLYVAL_ARCH),y)
CFLAGS_polyval.o += -I$(src)/$(SRCARCH)
+libpolyval-$(CONFIG_ARM64) += arm64/polyval-ce-core.o
endif
################################################################################
obj-$(CONFIG_CRYPTO_LIB_SHA1) += libsha1.o
diff --git a/arch/arm64/crypto/polyval-ce-core.S b/lib/crypto/arm64/polyval-ce-core.S
similarity index 92%
rename from arch/arm64/crypto/polyval-ce-core.S
rename to lib/crypto/arm64/polyval-ce-core.S
index b5326540d2e3..7c731a044d02 100644
--- a/arch/arm64/crypto/polyval-ce-core.S
+++ b/lib/crypto/arm64/polyval-ce-core.S
@@ -25,14 +25,14 @@
*/
#include <linux/linkage.h>
#define STRIDE_BLOCKS 8
-KEY_POWERS .req x0
-MSG .req x1
-BLOCKS_LEFT .req x2
-ACCUMULATOR .req x3
+ACCUMULATOR .req x0
+KEY_POWERS .req x1
+MSG .req x2
+BLOCKS_LEFT .req x3
KEY_START .req x10
EXTRA_BYTES .req x11
TMP .req x13
M0 .req v0
@@ -298,44 +298,42 @@ GSTAR .req v24
karatsuba2
montgomery_reduction SUM
.endm
/*
- * Perform montgomery multiplication in GF(2^128) and store result in op1.
+ * Computes a = a * b * x^{-128} mod x^128 + x^127 + x^126 + x^121 + 1.
*
- * Computes op1*op2*x^{-128} mod x^128 + x^127 + x^126 + x^121 + 1
- * If op1, op2 are in montgomery form, this computes the montgomery
- * form of op1*op2.
- *
- * void pmull_polyval_mul(u8 *op1, const u8 *op2);
+ * void polyval_mul_pmull(struct polyval_elem *a,
+ * const struct polyval_elem *b);
*/
-SYM_FUNC_START(pmull_polyval_mul)
+SYM_FUNC_START(polyval_mul_pmull)
adr TMP, .Lgstar
ld1 {GSTAR.2d}, [TMP]
ld1 {v0.16b}, [x0]
ld1 {v1.16b}, [x1]
karatsuba1_store v0 v1
karatsuba2
montgomery_reduction SUM
st1 {SUM.16b}, [x0]
ret
-SYM_FUNC_END(pmull_polyval_mul)
+SYM_FUNC_END(polyval_mul_pmull)
/*
* Perform polynomial evaluation as specified by POLYVAL. This computes:
* h^n * accumulator + h^n * m_0 + ... + h^1 * m_{n-1}
* where n=nblocks, h is the hash key, and m_i are the message blocks.
*
- * x0 - pointer to precomputed key powers h^8 ... h^1
- * x1 - pointer to message blocks
- * x2 - number of blocks to hash
- * x3 - pointer to accumulator
+ * x0 - pointer to accumulator
+ * x1 - pointer to precomputed key powers h^8 ... h^1
+ * x2 - pointer to message blocks
+ * x3 - number of blocks to hash
*
- * void pmull_polyval_update(const struct polyval_ctx *ctx, const u8 *in,
- * size_t nblocks, u8 *accumulator);
+ * void polyval_blocks_pmull(struct polyval_elem *acc,
+ * const struct polyval_key *key,
+ * const u8 *data, size_t nblocks);
*/
-SYM_FUNC_START(pmull_polyval_update)
+SYM_FUNC_START(polyval_blocks_pmull)
adr TMP, .Lgstar
mov KEY_START, KEY_POWERS
ld1 {GSTAR.2d}, [TMP]
ld1 {SUM.16b}, [ACCUMULATOR]
subs BLOCKS_LEFT, BLOCKS_LEFT, #STRIDE_BLOCKS
@@ -356,6 +354,6 @@ SYM_FUNC_START(pmull_polyval_update)
beq .LskipPartial
partial_stride
.LskipPartial:
st1 {SUM.16b}, [ACCUMULATOR]
ret
-SYM_FUNC_END(pmull_polyval_update)
+SYM_FUNC_END(polyval_blocks_pmull)
diff --git a/lib/crypto/arm64/polyval.h b/lib/crypto/arm64/polyval.h
new file mode 100644
index 000000000000..2486e80750d0
--- /dev/null
+++ b/lib/crypto/arm64/polyval.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * POLYVAL library functions, arm64 optimized
+ *
+ * Copyright 2025 Google LLC
+ */
+#include <asm/neon.h>
+#include <asm/simd.h>
+#include <linux/cpufeature.h>
+
+#define NUM_H_POWERS 8
+
+static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_pmull);
+
+asmlinkage void polyval_mul_pmull(struct polyval_elem *a,
+ const struct polyval_elem *b);
+asmlinkage void polyval_blocks_pmull(struct polyval_elem *acc,
+ const struct polyval_key *key,
+ const u8 *data, size_t nblocks);
+
+static void polyval_preparekey_arch(struct polyval_key *key,
+ const u8 raw_key[POLYVAL_BLOCK_SIZE])
+{
+ static_assert(ARRAY_SIZE(key->h_powers) == NUM_H_POWERS);
+ memcpy(&key->h_powers[NUM_H_POWERS - 1], raw_key, POLYVAL_BLOCK_SIZE);
+ if (static_branch_likely(&have_pmull) && may_use_simd()) {
+ kernel_neon_begin();
+ for (int i = NUM_H_POWERS - 2; i >= 0; i--) {
+ key->h_powers[i] = key->h_powers[i + 1];
+ polyval_mul_pmull(&key->h_powers[i],
+ &key->h_powers[NUM_H_POWERS - 1]);
+ }
+ kernel_neon_end();
+ } else {
+ for (int i = NUM_H_POWERS - 2; i >= 0; i--) {
+ key->h_powers[i] = key->h_powers[i + 1];
+ polyval_mul_generic(&key->h_powers[i],
+ &key->h_powers[NUM_H_POWERS - 1]);
+ }
+ }
+}
+
+static void polyval_mul_arch(struct polyval_elem *acc,
+ const struct polyval_key *key)
+{
+ if (static_branch_likely(&have_pmull) && may_use_simd()) {
+ kernel_neon_begin();
+ polyval_mul_pmull(acc, &key->h_powers[NUM_H_POWERS - 1]);
+ kernel_neon_end();
+ } else {
+ polyval_mul_generic(acc, &key->h_powers[NUM_H_POWERS - 1]);
+ }
+}
+
+static void polyval_blocks_arch(struct polyval_elem *acc,
+ const struct polyval_key *key,
+ const u8 *data, size_t nblocks)
+{
+ if (static_branch_likely(&have_pmull) && may_use_simd()) {
+ do {
+ /* Allow rescheduling every 4 KiB. */
+ size_t n = min_t(size_t, nblocks,
+ 4096 / POLYVAL_BLOCK_SIZE);
+
+ kernel_neon_begin();
+ polyval_blocks_pmull(acc, key, data, n);
+ kernel_neon_end();
+ data += n * POLYVAL_BLOCK_SIZE;
+ nblocks -= n;
+ } while (nblocks);
+ } else {
+ polyval_blocks_generic(acc, &key->h_powers[NUM_H_POWERS - 1],
+ data, nblocks);
+ }
+}
+
+#define polyval_mod_init_arch polyval_mod_init_arch
+static void polyval_mod_init_arch(void)
+{
+ if (cpu_have_named_feature(PMULL))
+ static_branch_enable(&have_pmull);
+}
--
2.51.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 5/9] lib/crypto: x86/polyval: Migrate optimized code into library
2025-11-09 23:47 [PATCH 0/9] POLYVAL library Eric Biggers
` (3 preceding siblings ...)
2025-11-09 23:47 ` [PATCH 4/9] lib/crypto: arm64/polyval: Migrate optimized code into library Eric Biggers
@ 2025-11-09 23:47 ` Eric Biggers
2025-11-09 23:47 ` [PATCH 6/9] crypto: hctr2 - Convert to use POLYVAL library Eric Biggers
` (5 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Eric Biggers @ 2025-11-09 23:47 UTC (permalink / raw)
To: linux-crypto
Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
linux-arm-kernel, x86, Eric Biggers
Migrate the x86_64 implementation of POLYVAL into lib/crypto/, wiring it
up to the POLYVAL library interface. This makes the POLYVAL library
properly optimized on x86_64.
This drops the x86_64 optimizations of polyval in the crypto_shash API.
That's fine, since polyval will be removed from crypto_shash entirely,
as it is unneeded there. But even if it comes back, the crypto_shash
API could just be implemented on top of the library API, as usual.
Adjust the names and prototypes of the assembly functions to align more
closely with the rest of the library code.
Also replace a movaps instruction with movups to remove the assumption
that the key struct is 16-byte aligned. Users can still align the key
if they want (and at least in this case, movups is just as fast as
movaps), but it's inconvenient to require it.
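Concretely, the only alignment-sensitive instruction was the load of
the next key power (visible in the hunk below), which becomes:

	movups	(KEY_POWERS), %xmm1

movaps would raise #GP on a memory operand that is not 16-byte
aligned, whereas movups accepts any alignment.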
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
arch/x86/crypto/Kconfig | 10 -
arch/x86/crypto/Makefile | 3 -
arch/x86/crypto/polyval-clmulni_glue.c | 180 ------------------
include/crypto/polyval.h | 3 +
lib/crypto/Kconfig | 1 +
lib/crypto/Makefile | 1 +
.../crypto/x86/polyval-pclmul-avx.S | 40 ++--
lib/crypto/x86/polyval.h | 83 ++++++++
8 files changed, 107 insertions(+), 214 deletions(-)
delete mode 100644 arch/x86/crypto/polyval-clmulni_glue.c
rename arch/x86/crypto/polyval-clmulni_asm.S => lib/crypto/x86/polyval-pclmul-avx.S (91%)
create mode 100644 lib/crypto/x86/polyval.h
diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index 48d3076b6053..3fd2423d3cf8 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -351,20 +351,10 @@ config CRYPTO_NHPOLY1305_AVX2
NHPoly1305 hash function for Adiantum
Architecture: x86_64 using:
- AVX2 (Advanced Vector Extensions 2)
-config CRYPTO_POLYVAL_CLMUL_NI
- tristate "Hash functions: POLYVAL (CLMUL-NI)"
- depends on 64BIT
- select CRYPTO_POLYVAL
- help
- POLYVAL hash function for HCTR2
-
- Architecture: x86_64 using:
- - CLMUL-NI (carry-less multiplication new instructions)
-
config CRYPTO_SM3_AVX_X86_64
tristate "Hash functions: SM3 (AVX)"
depends on 64BIT
select CRYPTO_HASH
select CRYPTO_LIB_SM3
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 6409e3009524..5f2fb4f148fe 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -51,13 +51,10 @@ aesni-intel-$(CONFIG_64BIT) += aes-ctr-avx-x86_64.o \
aes-xts-avx-x86_64.o
obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
-obj-$(CONFIG_CRYPTO_POLYVAL_CLMUL_NI) += polyval-clmulni.o
-polyval-clmulni-y := polyval-clmulni_asm.o polyval-clmulni_glue.o
-
obj-$(CONFIG_CRYPTO_NHPOLY1305_SSE2) += nhpoly1305-sse2.o
nhpoly1305-sse2-y := nh-sse2-x86_64.o nhpoly1305-sse2-glue.o
obj-$(CONFIG_CRYPTO_NHPOLY1305_AVX2) += nhpoly1305-avx2.o
nhpoly1305-avx2-y := nh-avx2-x86_64.o nhpoly1305-avx2-glue.o
diff --git a/arch/x86/crypto/polyval-clmulni_glue.c b/arch/x86/crypto/polyval-clmulni_glue.c
deleted file mode 100644
index 6b466867f91a..000000000000
--- a/arch/x86/crypto/polyval-clmulni_glue.c
+++ /dev/null
@@ -1,180 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Glue code for POLYVAL using PCMULQDQ-NI
- *
- * Copyright (c) 2007 Nokia Siemens Networks - Mikko Herranen <mh1@iki.fi>
- * Copyright (c) 2009 Intel Corp.
- * Author: Huang Ying <ying.huang@intel.com>
- * Copyright 2021 Google LLC
- */
-
-/*
- * Glue code based on ghash-clmulni-intel_glue.c.
- *
- * This implementation of POLYVAL uses montgomery multiplication
- * accelerated by PCLMULQDQ-NI to implement the finite field
- * operations.
- */
-
-#include <asm/cpu_device_id.h>
-#include <asm/fpu/api.h>
-#include <crypto/internal/hash.h>
-#include <crypto/polyval.h>
-#include <crypto/utils.h>
-#include <linux/errno.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/string.h>
-
-#define POLYVAL_ALIGN 16
-#define POLYVAL_ALIGN_ATTR __aligned(POLYVAL_ALIGN)
-#define POLYVAL_ALIGN_EXTRA ((POLYVAL_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
-#define POLYVAL_CTX_SIZE (sizeof(struct polyval_tfm_ctx) + POLYVAL_ALIGN_EXTRA)
-#define NUM_KEY_POWERS 8
-
-struct polyval_tfm_ctx {
- /*
- * These powers must be in the order h^8, ..., h^1.
- */
- u8 key_powers[NUM_KEY_POWERS][POLYVAL_BLOCK_SIZE] POLYVAL_ALIGN_ATTR;
-};
-
-struct polyval_desc_ctx {
- u8 buffer[POLYVAL_BLOCK_SIZE];
-};
-
-asmlinkage void clmul_polyval_update(const struct polyval_tfm_ctx *keys,
- const u8 *in, size_t nblocks, u8 *accumulator);
-asmlinkage void clmul_polyval_mul(u8 *op1, const u8 *op2);
-
-static inline struct polyval_tfm_ctx *polyval_tfm_ctx(struct crypto_shash *tfm)
-{
- return PTR_ALIGN(crypto_shash_ctx(tfm), POLYVAL_ALIGN);
-}
-
-static void internal_polyval_update(const struct polyval_tfm_ctx *keys,
- const u8 *in, size_t nblocks, u8 *accumulator)
-{
- kernel_fpu_begin();
- clmul_polyval_update(keys, in, nblocks, accumulator);
- kernel_fpu_end();
-}
-
-static void internal_polyval_mul(u8 *op1, const u8 *op2)
-{
- kernel_fpu_begin();
- clmul_polyval_mul(op1, op2);
- kernel_fpu_end();
-}
-
-static int polyval_x86_setkey(struct crypto_shash *tfm,
- const u8 *key, unsigned int keylen)
-{
- struct polyval_tfm_ctx *tctx = polyval_tfm_ctx(tfm);
- int i;
-
- if (keylen != POLYVAL_BLOCK_SIZE)
- return -EINVAL;
-
- memcpy(tctx->key_powers[NUM_KEY_POWERS-1], key, POLYVAL_BLOCK_SIZE);
-
- for (i = NUM_KEY_POWERS-2; i >= 0; i--) {
- memcpy(tctx->key_powers[i], key, POLYVAL_BLOCK_SIZE);
- internal_polyval_mul(tctx->key_powers[i],
- tctx->key_powers[i+1]);
- }
-
- return 0;
-}
-
-static int polyval_x86_init(struct shash_desc *desc)
-{
- struct polyval_desc_ctx *dctx = shash_desc_ctx(desc);
-
- memset(dctx, 0, sizeof(*dctx));
-
- return 0;
-}
-
-static int polyval_x86_update(struct shash_desc *desc,
- const u8 *src, unsigned int srclen)
-{
- struct polyval_desc_ctx *dctx = shash_desc_ctx(desc);
- const struct polyval_tfm_ctx *tctx = polyval_tfm_ctx(desc->tfm);
- unsigned int nblocks;
-
- do {
- /* Allow rescheduling every 4K bytes. */
- nblocks = min(srclen, 4096U) / POLYVAL_BLOCK_SIZE;
- internal_polyval_update(tctx, src, nblocks, dctx->buffer);
- srclen -= nblocks * POLYVAL_BLOCK_SIZE;
- src += nblocks * POLYVAL_BLOCK_SIZE;
- } while (srclen >= POLYVAL_BLOCK_SIZE);
-
- return srclen;
-}
-
-static int polyval_x86_finup(struct shash_desc *desc, const u8 *src,
- unsigned int len, u8 *dst)
-{
- struct polyval_desc_ctx *dctx = shash_desc_ctx(desc);
- const struct polyval_tfm_ctx *tctx = polyval_tfm_ctx(desc->tfm);
-
- if (len) {
- crypto_xor(dctx->buffer, src, len);
- internal_polyval_mul(dctx->buffer,
- tctx->key_powers[NUM_KEY_POWERS-1]);
- }
-
- memcpy(dst, dctx->buffer, POLYVAL_BLOCK_SIZE);
-
- return 0;
-}
-
-static struct shash_alg polyval_alg = {
- .digestsize = POLYVAL_DIGEST_SIZE,
- .init = polyval_x86_init,
- .update = polyval_x86_update,
- .finup = polyval_x86_finup,
- .setkey = polyval_x86_setkey,
- .descsize = sizeof(struct polyval_desc_ctx),
- .base = {
- .cra_name = "polyval",
- .cra_driver_name = "polyval-clmulni",
- .cra_priority = 200,
- .cra_flags = CRYPTO_AHASH_ALG_BLOCK_ONLY,
- .cra_blocksize = POLYVAL_BLOCK_SIZE,
- .cra_ctxsize = POLYVAL_CTX_SIZE,
- .cra_module = THIS_MODULE,
- },
-};
-
-__maybe_unused static const struct x86_cpu_id pcmul_cpu_id[] = {
- X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL),
- {}
-};
-MODULE_DEVICE_TABLE(x86cpu, pcmul_cpu_id);
-
-static int __init polyval_clmulni_mod_init(void)
-{
- if (!x86_match_cpu(pcmul_cpu_id))
- return -ENODEV;
-
- if (!boot_cpu_has(X86_FEATURE_AVX))
- return -ENODEV;
-
- return crypto_register_shash(&polyval_alg);
-}
-
-static void __exit polyval_clmulni_mod_exit(void)
-{
- crypto_unregister_shash(&polyval_alg);
-}
-
-module_init(polyval_clmulni_mod_init);
-module_exit(polyval_clmulni_mod_exit);
-
-MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("POLYVAL hash function accelerated by PCLMULQDQ-NI");
-MODULE_ALIAS_CRYPTO("polyval");
-MODULE_ALIAS_CRYPTO("polyval-clmulni");
diff --git a/include/crypto/polyval.h b/include/crypto/polyval.h
index f8aaf4275fbd..b28b8ef11353 100644
--- a/include/crypto/polyval.h
+++ b/include/crypto/polyval.h
@@ -46,10 +46,13 @@ struct polyval_elem {
struct polyval_key {
#ifdef CONFIG_CRYPTO_LIB_POLYVAL_ARCH
#ifdef CONFIG_ARM64
/** @h_powers: Powers of the hash key H^8 through H^1 */
struct polyval_elem h_powers[8];
+#elif defined(CONFIG_X86)
+ /** @h_powers: Powers of the hash key H^8 through H^1 */
+ struct polyval_elem h_powers[8];
#else
#error "Unhandled arch"
#endif
#else /* CONFIG_CRYPTO_LIB_POLYVAL_ARCH */
/** @h: The hash key H */
diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig
index 430723994142..9d04b3771ce2 100644
--- a/lib/crypto/Kconfig
+++ b/lib/crypto/Kconfig
@@ -143,10 +143,11 @@ config CRYPTO_LIB_POLYVAL
config CRYPTO_LIB_POLYVAL_ARCH
bool
depends on CRYPTO_LIB_POLYVAL && !UML
default y if ARM64 && KERNEL_MODE_NEON
+ default y if X86_64
config CRYPTO_LIB_CHACHA20POLY1305
tristate
select CRYPTO_LIB_CHACHA
select CRYPTO_LIB_POLY1305
diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
index 2efa96afcb4b..6580991f8e12 100644
--- a/lib/crypto/Makefile
+++ b/lib/crypto/Makefile
@@ -201,10 +201,11 @@ clean-files += arm/poly1305-core.S \
obj-$(CONFIG_CRYPTO_LIB_POLYVAL) += libpolyval.o
libpolyval-y := polyval.o
ifeq ($(CONFIG_CRYPTO_LIB_POLYVAL_ARCH),y)
CFLAGS_polyval.o += -I$(src)/$(SRCARCH)
libpolyval-$(CONFIG_ARM64) += arm64/polyval-ce-core.o
+libpolyval-$(CONFIG_X86) += x86/polyval-pclmul-avx.o
endif
################################################################################
obj-$(CONFIG_CRYPTO_LIB_SHA1) += libsha1.o
diff --git a/arch/x86/crypto/polyval-clmulni_asm.S b/lib/crypto/x86/polyval-pclmul-avx.S
similarity index 91%
rename from arch/x86/crypto/polyval-clmulni_asm.S
rename to lib/crypto/x86/polyval-pclmul-avx.S
index a6ebe4e7dd2b..7f739465ad35 100644
--- a/arch/x86/crypto/polyval-clmulni_asm.S
+++ b/lib/crypto/x86/polyval-pclmul-avx.S
@@ -34,14 +34,14 @@
#define LO %xmm12
#define HI %xmm13
#define MI %xmm14
#define SUM %xmm15
-#define KEY_POWERS %rdi
-#define MSG %rsi
-#define BLOCKS_LEFT %rdx
-#define ACCUMULATOR %rcx
+#define ACCUMULATOR %rdi
+#define KEY_POWERS %rsi
+#define MSG %rdx
+#define BLOCKS_LEFT %rcx
#define TMP %rax
.section .rodata.cst16.gstar, "aM", @progbits, 16
.align 16
@@ -232,11 +232,11 @@
addq $(16*STRIDE_BLOCKS), KEY_POWERS
subq TMP, KEY_POWERS
movups (MSG), %xmm0
pxor SUM, %xmm0
- movaps (KEY_POWERS), %xmm1
+ movups (KEY_POWERS), %xmm1
schoolbook1_noload
dec BLOCKS_LEFT
addq $16, MSG
addq $16, KEY_POWERS
@@ -259,45 +259,43 @@
schoolbook2
montgomery_reduction SUM
.endm
/*
- * Perform montgomery multiplication in GF(2^128) and store result in op1.
+ * Computes a = a * b * x^{-128} mod x^128 + x^127 + x^126 + x^121 + 1.
*
- * Computes op1*op2*x^{-128} mod x^128 + x^127 + x^126 + x^121 + 1
- * If op1, op2 are in montgomery form, this computes the montgomery
- * form of op1*op2.
- *
- * void clmul_polyval_mul(u8 *op1, const u8 *op2);
+ * void polyval_mul_pclmul_avx(struct polyval_elem *a,
+ * const struct polyval_elem *b);
*/
-SYM_FUNC_START(clmul_polyval_mul)
+SYM_FUNC_START(polyval_mul_pclmul_avx)
FRAME_BEGIN
vmovdqa .Lgstar(%rip), GSTAR
movups (%rdi), %xmm0
movups (%rsi), %xmm1
schoolbook1_noload
schoolbook2
montgomery_reduction SUM
movups SUM, (%rdi)
FRAME_END
RET
-SYM_FUNC_END(clmul_polyval_mul)
+SYM_FUNC_END(polyval_mul_pclmul_avx)
/*
* Perform polynomial evaluation as specified by POLYVAL. This computes:
* h^n * accumulator + h^n * m_0 + ... + h^1 * m_{n-1}
* where n=nblocks, h is the hash key, and m_i are the message blocks.
*
- * rdi - pointer to precomputed key powers h^8 ... h^1
- * rsi - pointer to message blocks
- * rdx - number of blocks to hash
- * rcx - pointer to the accumulator
+ * rdi - pointer to the accumulator
+ * rsi - pointer to precomputed key powers h^8 ... h^1
+ * rdx - pointer to message blocks
+ * rcx - number of blocks to hash
*
- * void clmul_polyval_update(const struct polyval_tfm_ctx *keys,
- * const u8 *in, size_t nblocks, u8 *accumulator);
+ * void polyval_blocks_pclmul_avx(struct polyval_elem *acc,
+ * const struct polyval_key *key,
+ * const u8 *data, size_t nblocks);
*/
-SYM_FUNC_START(clmul_polyval_update)
+SYM_FUNC_START(polyval_blocks_pclmul_avx)
FRAME_BEGIN
vmovdqa .Lgstar(%rip), GSTAR
movups (ACCUMULATOR), SUM
subq $STRIDE_BLOCKS, BLOCKS_LEFT
js .LstrideLoopExit
@@ -316,6 +314,6 @@ SYM_FUNC_START(clmul_polyval_update)
partial_stride
.LskipPartial:
movups SUM, (ACCUMULATOR)
FRAME_END
RET
-SYM_FUNC_END(clmul_polyval_update)
+SYM_FUNC_END(polyval_blocks_pclmul_avx)
diff --git a/lib/crypto/x86/polyval.h b/lib/crypto/x86/polyval.h
new file mode 100644
index 000000000000..ef8797521420
--- /dev/null
+++ b/lib/crypto/x86/polyval.h
@@ -0,0 +1,83 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * POLYVAL library functions, x86_64 optimized
+ *
+ * Copyright 2025 Google LLC
+ */
+#include <asm/fpu/api.h>
+#include <linux/cpufeature.h>
+
+#define NUM_H_POWERS 8
+
+static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_pclmul_avx);
+
+asmlinkage void polyval_mul_pclmul_avx(struct polyval_elem *a,
+ const struct polyval_elem *b);
+asmlinkage void polyval_blocks_pclmul_avx(struct polyval_elem *acc,
+ const struct polyval_key *key,
+ const u8 *data, size_t nblocks);
+
+static void polyval_preparekey_arch(struct polyval_key *key,
+ const u8 raw_key[POLYVAL_BLOCK_SIZE])
+{
+ static_assert(ARRAY_SIZE(key->h_powers) == NUM_H_POWERS);
+ memcpy(&key->h_powers[NUM_H_POWERS - 1], raw_key, POLYVAL_BLOCK_SIZE);
+ if (static_branch_likely(&have_pclmul_avx) && irq_fpu_usable()) {
+ kernel_fpu_begin();
+ for (int i = NUM_H_POWERS - 2; i >= 0; i--) {
+ key->h_powers[i] = key->h_powers[i + 1];
+ polyval_mul_pclmul_avx(
+ &key->h_powers[i],
+ &key->h_powers[NUM_H_POWERS - 1]);
+ }
+ kernel_fpu_end();
+ } else {
+ for (int i = NUM_H_POWERS - 2; i >= 0; i--) {
+ key->h_powers[i] = key->h_powers[i + 1];
+ polyval_mul_generic(&key->h_powers[i],
+ &key->h_powers[NUM_H_POWERS - 1]);
+ }
+ }
+}
+
+static void polyval_mul_arch(struct polyval_elem *acc,
+ const struct polyval_key *key)
+{
+ if (static_branch_likely(&have_pclmul_avx) && irq_fpu_usable()) {
+ kernel_fpu_begin();
+ polyval_mul_pclmul_avx(acc, &key->h_powers[NUM_H_POWERS - 1]);
+ kernel_fpu_end();
+ } else {
+ polyval_mul_generic(acc, &key->h_powers[NUM_H_POWERS - 1]);
+ }
+}
+
+static void polyval_blocks_arch(struct polyval_elem *acc,
+ const struct polyval_key *key,
+ const u8 *data, size_t nblocks)
+{
+ if (static_branch_likely(&have_pclmul_avx) && irq_fpu_usable()) {
+ do {
+ /* Allow rescheduling every 4 KiB. */
+ size_t n = min_t(size_t, nblocks,
+ 4096 / POLYVAL_BLOCK_SIZE);
+
+ kernel_fpu_begin();
+ polyval_blocks_pclmul_avx(acc, key, data, n);
+ kernel_fpu_end();
+ data += n * POLYVAL_BLOCK_SIZE;
+ nblocks -= n;
+ } while (nblocks);
+ } else {
+ polyval_blocks_generic(acc, &key->h_powers[NUM_H_POWERS - 1],
+ data, nblocks);
+ }
+}
+
+#define polyval_mod_init_arch polyval_mod_init_arch
+static void polyval_mod_init_arch(void)
+{
+ if (boot_cpu_has(X86_FEATURE_PCLMULQDQ) &&
+ boot_cpu_has(X86_FEATURE_AVX))
+ static_branch_enable(&have_pclmul_avx);
+}
--
2.51.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 6/9] crypto: hctr2 - Convert to use POLYVAL library
2025-11-09 23:47 [PATCH 0/9] POLYVAL library Eric Biggers
` (4 preceding siblings ...)
2025-11-09 23:47 ` [PATCH 5/9] lib/crypto: x86/polyval: " Eric Biggers
@ 2025-11-09 23:47 ` Eric Biggers
2025-11-09 23:47 ` [PATCH 7/9] crypto: polyval - Remove the polyval crypto_shash Eric Biggers
` (4 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Eric Biggers @ 2025-11-09 23:47 UTC (permalink / raw)
To: linux-crypto
Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
linux-arm-kernel, x86, Eric Biggers
The "hash function" in hctr2 is fixed at POLYVAL; it can never vary.
Just use the POLYVAL library, which is much easier to use than the
crypto_shash API. It's faster, uses fixed-size structs, and never fails
(all the functions return void).
Note that this eliminates the only known user of the polyval support in
crypto_shash. A later commit will remove support for polyval from
crypto_shash, given that the library API is sufficient.
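For illustration, one-shot hashing with the library boils down to the
following sketch (function and type names as used in this patch;
raw_key, data, and data_len stand in for the caller's buffers):

	struct polyval_key key;
	struct polyval_ctx ctx;
	u8 digest[POLYVAL_DIGEST_SIZE];

	polyval_preparekey(&key, raw_key);	/* 16-byte raw key */
	polyval_init(&ctx, &key);
	polyval_update(&ctx, data, data_len);	/* any length; may be called repeatedly */
	polyval_final(&ctx, digest);

No error handling is needed at any step.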
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
crypto/Kconfig | 2 +-
crypto/hctr2.c | 226 ++++++++++++++---------------------------------
crypto/testmgr.c | 3 +-
3 files changed, 66 insertions(+), 165 deletions(-)
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 57b85e903cf0..805172f75bf1 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -694,11 +694,11 @@ config CRYPTO_ECB
ECB (Electronic Codebook) mode (NIST SP800-38A)
config CRYPTO_HCTR2
tristate "HCTR2"
select CRYPTO_XCTR
- select CRYPTO_POLYVAL
+ select CRYPTO_LIB_POLYVAL
select CRYPTO_MANAGER
help
HCTR2 length-preserving encryption mode
A mode for storage encryption that is efficient on processors with
diff --git a/crypto/hctr2.c b/crypto/hctr2.c
index c8932777bba8..f4cd6c29b4d3 100644
--- a/crypto/hctr2.c
+++ b/crypto/hctr2.c
@@ -15,11 +15,10 @@
* For more details, see the paper: "Length-preserving encryption with HCTR2"
* (https://eprint.iacr.org/2021/1441.pdf)
*/
#include <crypto/internal/cipher.h>
-#include <crypto/internal/hash.h>
#include <crypto/internal/skcipher.h>
#include <crypto/polyval.h>
#include <crypto/scatterwalk.h>
#include <linux/module.h>
@@ -35,97 +34,65 @@
#define TWEAK_SIZE 32
struct hctr2_instance_ctx {
struct crypto_cipher_spawn blockcipher_spawn;
struct crypto_skcipher_spawn xctr_spawn;
- struct crypto_shash_spawn polyval_spawn;
};
struct hctr2_tfm_ctx {
struct crypto_cipher *blockcipher;
struct crypto_skcipher *xctr;
- struct crypto_shash *polyval;
+ struct polyval_key poly_key;
+ struct polyval_elem hashed_tweaklens[2];
u8 L[BLOCKCIPHER_BLOCK_SIZE];
- int hashed_tweak_offset;
- /*
- * This struct is allocated with extra space for two exported hash
- * states. Since the hash state size is not known at compile-time, we
- * can't add these to the struct directly.
- *
- * hashed_tweaklen_divisible;
- * hashed_tweaklen_remainder;
- */
};
struct hctr2_request_ctx {
u8 first_block[BLOCKCIPHER_BLOCK_SIZE];
u8 xctr_iv[BLOCKCIPHER_BLOCK_SIZE];
struct scatterlist *bulk_part_dst;
struct scatterlist *bulk_part_src;
struct scatterlist sg_src[2];
struct scatterlist sg_dst[2];
+ struct polyval_elem hashed_tweak;
/*
- * Sub-request sizes are unknown at compile-time, so they need to go
- * after the members with known sizes.
+ * skcipher sub-request size is unknown at compile-time, so it needs to
+ * go after the members with known sizes.
*/
union {
- struct shash_desc hash_desc;
+ struct polyval_ctx poly_ctx;
struct skcipher_request xctr_req;
} u;
- /*
- * This struct is allocated with extra space for one exported hash
- * state. Since the hash state size is not known at compile-time, we
- * can't add it to the struct directly.
- *
- * hashed_tweak;
- */
};
-static inline u8 *hctr2_hashed_tweaklen(const struct hctr2_tfm_ctx *tctx,
- bool has_remainder)
-{
- u8 *p = (u8 *)tctx + sizeof(*tctx);
-
- if (has_remainder) /* For messages not a multiple of block length */
- p += crypto_shash_statesize(tctx->polyval);
- return p;
-}
-
-static inline u8 *hctr2_hashed_tweak(const struct hctr2_tfm_ctx *tctx,
- struct hctr2_request_ctx *rctx)
-{
- return (u8 *)rctx + tctx->hashed_tweak_offset;
-}
-
/*
* The input data for each HCTR2 hash step begins with a 16-byte block that
* contains the tweak length and a flag that indicates whether the input is evenly
* divisible into blocks. Since this implementation only supports one tweak
* length, we precompute the two hash states resulting from hashing the two
* possible values of this initial block. This reduces by one block the amount of
* data that needs to be hashed for each encryption/decryption.
*
* These precomputed hashes are stored in hctr2_tfm_ctx.
*/
-static int hctr2_hash_tweaklen(struct hctr2_tfm_ctx *tctx, bool has_remainder)
+static void hctr2_hash_tweaklens(struct hctr2_tfm_ctx *tctx)
{
- SHASH_DESC_ON_STACK(shash, tfm->polyval);
- __le64 tweak_length_block[2];
- int err;
-
- shash->tfm = tctx->polyval;
- memset(tweak_length_block, 0, sizeof(tweak_length_block));
-
- tweak_length_block[0] = cpu_to_le64(TWEAK_SIZE * 8 * 2 + 2 + has_remainder);
- err = crypto_shash_init(shash);
- if (err)
- return err;
- err = crypto_shash_update(shash, (u8 *)tweak_length_block,
- POLYVAL_BLOCK_SIZE);
- if (err)
- return err;
- return crypto_shash_export(shash, hctr2_hashed_tweaklen(tctx, has_remainder));
+ struct polyval_ctx ctx;
+
+ for (int has_remainder = 0; has_remainder < 2; has_remainder++) {
+ const __le64 tweak_length_block[2] = {
+ cpu_to_le64(TWEAK_SIZE * 8 * 2 + 2 + has_remainder),
+ };
+
+ polyval_init(&ctx, &tctx->poly_key);
+ polyval_update(&ctx, (const u8 *)&tweak_length_block,
+ sizeof(tweak_length_block));
+ static_assert(sizeof(tweak_length_block) == POLYVAL_BLOCK_SIZE);
+ polyval_export_blkaligned(
+ &ctx, &tctx->hashed_tweaklens[has_remainder]);
+ }
+ memzero_explicit(&ctx, sizeof(ctx));
}
static int hctr2_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keylen)
{
@@ -154,98 +121,75 @@ static int hctr2_setkey(struct crypto_skcipher *tfm, const u8 *key,
memset(tctx->L, 0, sizeof(tctx->L));
tctx->L[0] = 0x01;
crypto_cipher_encrypt_one(tctx->blockcipher, tctx->L, tctx->L);
- crypto_shash_clear_flags(tctx->polyval, CRYPTO_TFM_REQ_MASK);
- crypto_shash_set_flags(tctx->polyval, crypto_skcipher_get_flags(tfm) &
- CRYPTO_TFM_REQ_MASK);
- err = crypto_shash_setkey(tctx->polyval, hbar, BLOCKCIPHER_BLOCK_SIZE);
- if (err)
- return err;
+ static_assert(sizeof(hbar) == POLYVAL_BLOCK_SIZE);
+ polyval_preparekey(&tctx->poly_key, hbar);
memzero_explicit(hbar, sizeof(hbar));
- return hctr2_hash_tweaklen(tctx, true) ?: hctr2_hash_tweaklen(tctx, false);
+ hctr2_hash_tweaklens(tctx);
+ return 0;
}
-static int hctr2_hash_tweak(struct skcipher_request *req)
+static void hctr2_hash_tweak(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
const struct hctr2_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
struct hctr2_request_ctx *rctx = skcipher_request_ctx(req);
- struct shash_desc *hash_desc = &rctx->u.hash_desc;
- int err;
+ struct polyval_ctx *poly_ctx = &rctx->u.poly_ctx;
bool has_remainder = req->cryptlen % POLYVAL_BLOCK_SIZE;
- hash_desc->tfm = tctx->polyval;
- err = crypto_shash_import(hash_desc, hctr2_hashed_tweaklen(tctx, has_remainder));
- if (err)
- return err;
- err = crypto_shash_update(hash_desc, req->iv, TWEAK_SIZE);
- if (err)
- return err;
+ polyval_import_blkaligned(poly_ctx, &tctx->poly_key,
+ &tctx->hashed_tweaklens[has_remainder]);
+ polyval_update(poly_ctx, req->iv, TWEAK_SIZE);
// Store the hashed tweak, since we need it when computing both
// H(T || N) and H(T || V).
- return crypto_shash_export(hash_desc, hctr2_hashed_tweak(tctx, rctx));
+ static_assert(TWEAK_SIZE % POLYVAL_BLOCK_SIZE == 0);
+ polyval_export_blkaligned(poly_ctx, &rctx->hashed_tweak);
}
-static int hctr2_hash_message(struct skcipher_request *req,
- struct scatterlist *sgl,
- u8 digest[POLYVAL_DIGEST_SIZE])
+static void hctr2_hash_message(struct skcipher_request *req,
+ struct scatterlist *sgl,
+ u8 digest[POLYVAL_DIGEST_SIZE])
{
- static const u8 padding[BLOCKCIPHER_BLOCK_SIZE] = { 0x1 };
+ static const u8 padding = 0x1;
struct hctr2_request_ctx *rctx = skcipher_request_ctx(req);
- struct shash_desc *hash_desc = &rctx->u.hash_desc;
+ struct polyval_ctx *poly_ctx = &rctx->u.poly_ctx;
const unsigned int bulk_len = req->cryptlen - BLOCKCIPHER_BLOCK_SIZE;
struct sg_mapping_iter miter;
- unsigned int remainder = bulk_len % BLOCKCIPHER_BLOCK_SIZE;
int i;
- int err = 0;
int n = 0;
sg_miter_start(&miter, sgl, sg_nents(sgl),
SG_MITER_FROM_SG | SG_MITER_ATOMIC);
for (i = 0; i < bulk_len; i += n) {
sg_miter_next(&miter);
n = min_t(unsigned int, miter.length, bulk_len - i);
- err = crypto_shash_update(hash_desc, miter.addr, n);
- if (err)
- break;
+ polyval_update(poly_ctx, miter.addr, n);
}
sg_miter_stop(&miter);
- if (err)
- return err;
-
- if (remainder) {
- err = crypto_shash_update(hash_desc, padding,
- BLOCKCIPHER_BLOCK_SIZE - remainder);
- if (err)
- return err;
- }
- return crypto_shash_final(hash_desc, digest);
+ if (req->cryptlen % BLOCKCIPHER_BLOCK_SIZE)
+ polyval_update(poly_ctx, &padding, 1);
+ polyval_final(poly_ctx, digest);
}
static int hctr2_finish(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
const struct hctr2_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
struct hctr2_request_ctx *rctx = skcipher_request_ctx(req);
+ struct polyval_ctx *poly_ctx = &rctx->u.poly_ctx;
u8 digest[POLYVAL_DIGEST_SIZE];
- struct shash_desc *hash_desc = &rctx->u.hash_desc;
- int err;
// U = UU ^ H(T || V)
// or M = MM ^ H(T || N)
- hash_desc->tfm = tctx->polyval;
- err = crypto_shash_import(hash_desc, hctr2_hashed_tweak(tctx, rctx));
- if (err)
- return err;
- err = hctr2_hash_message(req, rctx->bulk_part_dst, digest);
- if (err)
- return err;
+ polyval_import_blkaligned(poly_ctx, &tctx->poly_key,
+ &rctx->hashed_tweak);
+ hctr2_hash_message(req, rctx->bulk_part_dst, digest);
crypto_xor(rctx->first_block, digest, BLOCKCIPHER_BLOCK_SIZE);
// Copy U (or M) into dst scatterlist
scatterwalk_map_and_copy(rctx->first_block, req->dst,
0, BLOCKCIPHER_BLOCK_SIZE, 1);
@@ -267,11 +211,10 @@ static int hctr2_crypt(struct skcipher_request *req, bool enc)
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
const struct hctr2_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
struct hctr2_request_ctx *rctx = skcipher_request_ctx(req);
u8 digest[POLYVAL_DIGEST_SIZE];
int bulk_len = req->cryptlen - BLOCKCIPHER_BLOCK_SIZE;
- int err;
// Requests must be at least one block
if (req->cryptlen < BLOCKCIPHER_BLOCK_SIZE)
return -EINVAL;
@@ -285,16 +228,12 @@ static int hctr2_crypt(struct skcipher_request *req, bool enc)
rctx->bulk_part_dst = scatterwalk_ffwd(rctx->sg_dst, req->dst,
BLOCKCIPHER_BLOCK_SIZE);
// MM = M ^ H(T || N)
// or UU = U ^ H(T || V)
- err = hctr2_hash_tweak(req);
- if (err)
- return err;
- err = hctr2_hash_message(req, rctx->bulk_part_src, digest);
- if (err)
- return err;
+ hctr2_hash_tweak(req);
+ hctr2_hash_message(req, rctx->bulk_part_src, digest);
crypto_xor(digest, rctx->first_block, BLOCKCIPHER_BLOCK_SIZE);
// UU = E(MM)
// or MM = D(UU)
if (enc)
@@ -336,12 +275,10 @@ static int hctr2_init_tfm(struct crypto_skcipher *tfm)
struct skcipher_instance *inst = skcipher_alg_instance(tfm);
struct hctr2_instance_ctx *ictx = skcipher_instance_ctx(inst);
struct hctr2_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
struct crypto_skcipher *xctr;
struct crypto_cipher *blockcipher;
- struct crypto_shash *polyval;
- unsigned int subreq_size;
int err;
xctr = crypto_spawn_skcipher(&ictx->xctr_spawn);
if (IS_ERR(xctr))
return PTR_ERR(xctr);
@@ -350,35 +287,21 @@ static int hctr2_init_tfm(struct crypto_skcipher *tfm)
if (IS_ERR(blockcipher)) {
err = PTR_ERR(blockcipher);
goto err_free_xctr;
}
- polyval = crypto_spawn_shash(&ictx->polyval_spawn);
- if (IS_ERR(polyval)) {
- err = PTR_ERR(polyval);
- goto err_free_blockcipher;
- }
-
tctx->xctr = xctr;
tctx->blockcipher = blockcipher;
- tctx->polyval = polyval;
BUILD_BUG_ON(offsetofend(struct hctr2_request_ctx, u) !=
sizeof(struct hctr2_request_ctx));
- subreq_size = max(sizeof_field(struct hctr2_request_ctx, u.hash_desc) +
- crypto_shash_descsize(polyval),
- sizeof_field(struct hctr2_request_ctx, u.xctr_req) +
- crypto_skcipher_reqsize(xctr));
-
- tctx->hashed_tweak_offset = offsetof(struct hctr2_request_ctx, u) +
- subreq_size;
- crypto_skcipher_set_reqsize(tfm, tctx->hashed_tweak_offset +
- crypto_shash_statesize(polyval));
+ crypto_skcipher_set_reqsize(
+ tfm, max(sizeof(struct hctr2_request_ctx),
+ offsetofend(struct hctr2_request_ctx, u.xctr_req) +
+ crypto_skcipher_reqsize(xctr)));
return 0;
-err_free_blockcipher:
- crypto_free_cipher(blockcipher);
err_free_xctr:
crypto_free_skcipher(xctr);
return err;
}
@@ -386,34 +309,29 @@ static void hctr2_exit_tfm(struct crypto_skcipher *tfm)
{
struct hctr2_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
crypto_free_cipher(tctx->blockcipher);
crypto_free_skcipher(tctx->xctr);
- crypto_free_shash(tctx->polyval);
}
static void hctr2_free_instance(struct skcipher_instance *inst)
{
struct hctr2_instance_ctx *ictx = skcipher_instance_ctx(inst);
crypto_drop_cipher(&ictx->blockcipher_spawn);
crypto_drop_skcipher(&ictx->xctr_spawn);
- crypto_drop_shash(&ictx->polyval_spawn);
kfree(inst);
}
-static int hctr2_create_common(struct crypto_template *tmpl,
- struct rtattr **tb,
- const char *xctr_name,
- const char *polyval_name)
+static int hctr2_create_common(struct crypto_template *tmpl, struct rtattr **tb,
+ const char *xctr_name)
{
struct skcipher_alg_common *xctr_alg;
u32 mask;
struct skcipher_instance *inst;
struct hctr2_instance_ctx *ictx;
struct crypto_alg *blockcipher_alg;
- struct shash_alg *polyval_alg;
char blockcipher_name[CRYPTO_MAX_ALG_NAME];
int len;
int err;
err = crypto_check_attr_type(tb, CRYPTO_ALG_TYPE_SKCIPHER, &mask);
@@ -455,46 +373,27 @@ static int hctr2_create_common(struct crypto_template *tmpl,
/* Require blocksize of 16 bytes */
err = -EINVAL;
if (blockcipher_alg->cra_blocksize != BLOCKCIPHER_BLOCK_SIZE)
goto err_free_inst;
- /* Polyval ε-∆U hash function */
- err = crypto_grab_shash(&ictx->polyval_spawn,
- skcipher_crypto_instance(inst),
- polyval_name, 0, mask);
- if (err)
- goto err_free_inst;
- polyval_alg = crypto_spawn_shash_alg(&ictx->polyval_spawn);
-
- /* Ensure Polyval is being used */
- err = -EINVAL;
- if (strcmp(polyval_alg->base.cra_name, "polyval") != 0)
- goto err_free_inst;
-
/* Instance fields */
err = -ENAMETOOLONG;
if (snprintf(inst->alg.base.cra_name, CRYPTO_MAX_ALG_NAME, "hctr2(%s)",
blockcipher_alg->cra_name) >= CRYPTO_MAX_ALG_NAME)
goto err_free_inst;
if (snprintf(inst->alg.base.cra_driver_name, CRYPTO_MAX_ALG_NAME,
- "hctr2_base(%s,%s)",
- xctr_alg->base.cra_driver_name,
- polyval_alg->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
+ "hctr2_base(%s,polyval-lib)",
+ xctr_alg->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
goto err_free_inst;
inst->alg.base.cra_blocksize = BLOCKCIPHER_BLOCK_SIZE;
- inst->alg.base.cra_ctxsize = sizeof(struct hctr2_tfm_ctx) +
- polyval_alg->statesize * 2;
+ inst->alg.base.cra_ctxsize = sizeof(struct hctr2_tfm_ctx);
inst->alg.base.cra_alignmask = xctr_alg->base.cra_alignmask;
- /*
- * The hash function is called twice, so it is weighted higher than the
- * xctr and blockcipher.
- */
inst->alg.base.cra_priority = (2 * xctr_alg->base.cra_priority +
- 4 * polyval_alg->base.cra_priority +
- blockcipher_alg->cra_priority) / 7;
+ blockcipher_alg->cra_priority) /
+ 3;
inst->alg.setkey = hctr2_setkey;
inst->alg.encrypt = hctr2_encrypt;
inst->alg.decrypt = hctr2_decrypt;
inst->alg.init = hctr2_init_tfm;
@@ -523,12 +422,15 @@ static int hctr2_create_base(struct crypto_template *tmpl, struct rtattr **tb)
return PTR_ERR(xctr_name);
polyval_name = crypto_attr_alg_name(tb[2]);
if (IS_ERR(polyval_name))
return PTR_ERR(polyval_name);
+ if (strcmp(polyval_name, "polyval") != 0 &&
+ strcmp(polyval_name, "polyval-lib") != 0)
+ return -ENOENT;
- return hctr2_create_common(tmpl, tb, xctr_name, polyval_name);
+ return hctr2_create_common(tmpl, tb, xctr_name);
}
static int hctr2_create(struct crypto_template *tmpl, struct rtattr **tb)
{
const char *blockcipher_name;
@@ -540,11 +442,11 @@ static int hctr2_create(struct crypto_template *tmpl, struct rtattr **tb)
if (snprintf(xctr_name, CRYPTO_MAX_ALG_NAME, "xctr(%s)",
blockcipher_name) >= CRYPTO_MAX_ALG_NAME)
return -ENAMETOOLONG;
- return hctr2_create_common(tmpl, tb, xctr_name, "polyval");
+ return hctr2_create_common(tmpl, tb, xctr_name);
}
static struct crypto_template hctr2_tmpls[] = {
{
/* hctr2_base(xctr_name, polyval_name) */
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 90d06c3ec967..499e979a56dc 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -5057,12 +5057,11 @@ static const struct alg_test_desc alg_test_descs[] = {
.suite = {
.hash = __VECS(ghash_tv_template)
}
}, {
.alg = "hctr2(aes)",
- .generic_driver =
- "hctr2_base(xctr(aes-generic),polyval-generic)",
+ .generic_driver = "hctr2_base(xctr(aes-generic),polyval-lib)",
.test = alg_test_skcipher,
.suite = {
.cipher = __VECS(aes_hctr2_tv_template)
}
}, {
--
2.51.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 7/9] crypto: polyval - Remove the polyval crypto_shash
2025-11-09 23:47 [PATCH 0/9] POLYVAL library Eric Biggers
` (5 preceding siblings ...)
2025-11-09 23:47 ` [PATCH 6/9] crypto: hctr2 - Convert to use POLYVAL library Eric Biggers
@ 2025-11-09 23:47 ` Eric Biggers
2025-11-09 23:47 ` [PATCH 8/9] crypto: testmgr - Remove polyval tests Eric Biggers
` (3 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Eric Biggers @ 2025-11-09 23:47 UTC (permalink / raw)
To: linux-crypto
Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
linux-arm-kernel, x86, Eric Biggers
Remove polyval support from crypto_shash. It no longer has any user now
that the HCTR2 code uses the POLYVAL library instead.
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
crypto/Kconfig | 10 --
crypto/Makefile | 1 -
crypto/polyval-generic.c | 205 ---------------------------------------
3 files changed, 216 deletions(-)
delete mode 100644 crypto/polyval-generic.c
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 805172f75bf1..bf8b8a60a0c0 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -946,20 +946,10 @@ config CRYPTO_MICHAEL_MIC
known as WPA (Wi-Fi Protected Access).
This algorithm is required for TKIP, but it should not be used for
other purposes because of the weakness of the algorithm.
-config CRYPTO_POLYVAL
- tristate
- select CRYPTO_HASH
- select CRYPTO_LIB_GF128MUL
- help
- POLYVAL hash function for HCTR2
-
- This is used in HCTR2. It is not a general-purpose
- cryptographic hash function.
-
config CRYPTO_RMD160
tristate "RIPEMD-160"
select CRYPTO_HASH
help
RIPEMD-160 hash function (ISO/IEC 10118-3)
diff --git a/crypto/Makefile b/crypto/Makefile
index 0388ff8d219d..093c56a45d3f 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -170,11 +170,10 @@ KASAN_SANITIZE_jitterentropy.o = n
UBSAN_SANITIZE_jitterentropy.o = n
jitterentropy_rng-y := jitterentropy.o jitterentropy-kcapi.o
obj-$(CONFIG_CRYPTO_JITTERENTROPY_TESTINTERFACE) += jitterentropy-testing.o
obj-$(CONFIG_CRYPTO_BENCHMARK) += tcrypt.o
obj-$(CONFIG_CRYPTO_GHASH) += ghash-generic.o
-obj-$(CONFIG_CRYPTO_POLYVAL) += polyval-generic.o
obj-$(CONFIG_CRYPTO_USER_API) += af_alg.o
obj-$(CONFIG_CRYPTO_USER_API_HASH) += algif_hash.o
obj-$(CONFIG_CRYPTO_USER_API_SKCIPHER) += algif_skcipher.o
obj-$(CONFIG_CRYPTO_USER_API_RNG) += algif_rng.o
obj-$(CONFIG_CRYPTO_USER_API_AEAD) += algif_aead.o
diff --git a/crypto/polyval-generic.c b/crypto/polyval-generic.c
deleted file mode 100644
index fe5b01a4000d..000000000000
--- a/crypto/polyval-generic.c
+++ /dev/null
@@ -1,205 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * POLYVAL: hash function for HCTR2.
- *
- * Copyright (c) 2007 Nokia Siemens Networks - Mikko Herranen <mh1@iki.fi>
- * Copyright (c) 2009 Intel Corp.
- * Author: Huang Ying <ying.huang@intel.com>
- * Copyright 2021 Google LLC
- */
-
-/*
- * Code based on crypto/ghash-generic.c
- *
- * POLYVAL is a keyed hash function similar to GHASH. POLYVAL uses a different
- * modulus for finite field multiplication which makes hardware accelerated
- * implementations on little-endian machines faster. POLYVAL is used in the
- * kernel to implement HCTR2, but was originally specified for AES-GCM-SIV
- * (RFC 8452).
- *
- * For more information see:
- * Length-preserving encryption with HCTR2:
- * https://eprint.iacr.org/2021/1441.pdf
- * AES-GCM-SIV: Nonce Misuse-Resistant Authenticated Encryption:
- * https://datatracker.ietf.org/doc/html/rfc8452
- *
- * Like GHASH, POLYVAL is not a cryptographic hash function and should
- * not be used outside of crypto modes explicitly designed to use POLYVAL.
- *
- * This implementation uses a convenient trick involving the GHASH and POLYVAL
- * fields. This trick allows multiplication in the POLYVAL field to be
- * implemented by using multiplication in the GHASH field as a subroutine. An
- * element of the POLYVAL field can be converted to an element of the GHASH
- * field by computing x*REVERSE(a), where REVERSE reverses the byte-ordering of
- * a. Similarly, an element of the GHASH field can be converted back to the
- * POLYVAL field by computing REVERSE(x^{-1}*a). For more information, see:
- * https://datatracker.ietf.org/doc/html/rfc8452#appendix-A
- *
- * By using this trick, we do not need to implement the POLYVAL field for the
- * generic implementation.
- *
- * Warning: this generic implementation is not intended to be used in practice
- * and is not constant time. For practical use, a hardware accelerated
- * implementation of POLYVAL should be used instead.
- *
- */
-
-#include <crypto/gf128mul.h>
-#include <crypto/internal/hash.h>
-#include <crypto/polyval.h>
-#include <crypto/utils.h>
-#include <linux/errno.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/string.h>
-#include <linux/unaligned.h>
-
-struct polyval_tfm_ctx {
- struct gf128mul_4k *gf128;
-};
-
-struct polyval_desc_ctx {
- union {
- u8 buffer[POLYVAL_BLOCK_SIZE];
- be128 buffer128;
- };
-};
-
-static void copy_and_reverse(u8 dst[POLYVAL_BLOCK_SIZE],
- const u8 src[POLYVAL_BLOCK_SIZE])
-{
- u64 a = get_unaligned((const u64 *)&src[0]);
- u64 b = get_unaligned((const u64 *)&src[8]);
-
- put_unaligned(swab64(a), (u64 *)&dst[8]);
- put_unaligned(swab64(b), (u64 *)&dst[0]);
-}
-
-static int polyval_setkey(struct crypto_shash *tfm,
- const u8 *key, unsigned int keylen)
-{
- struct polyval_tfm_ctx *ctx = crypto_shash_ctx(tfm);
- be128 k;
-
- if (keylen != POLYVAL_BLOCK_SIZE)
- return -EINVAL;
-
- gf128mul_free_4k(ctx->gf128);
-
- BUILD_BUG_ON(sizeof(k) != POLYVAL_BLOCK_SIZE);
- copy_and_reverse((u8 *)&k, key);
- gf128mul_x_lle(&k, &k);
-
- ctx->gf128 = gf128mul_init_4k_lle(&k);
- memzero_explicit(&k, POLYVAL_BLOCK_SIZE);
-
- if (!ctx->gf128)
- return -ENOMEM;
-
- return 0;
-}
-
-static int polyval_generic_init(struct shash_desc *desc)
-{
- struct polyval_desc_ctx *dctx = shash_desc_ctx(desc);
-
- memset(dctx, 0, sizeof(*dctx));
-
- return 0;
-}
-
-static int polyval_generic_update(struct shash_desc *desc,
- const u8 *src, unsigned int srclen)
-{
- struct polyval_desc_ctx *dctx = shash_desc_ctx(desc);
- const struct polyval_tfm_ctx *ctx = crypto_shash_ctx(desc->tfm);
- u8 tmp[POLYVAL_BLOCK_SIZE];
-
- do {
- copy_and_reverse(tmp, src);
- crypto_xor(dctx->buffer, tmp, POLYVAL_BLOCK_SIZE);
- gf128mul_4k_lle(&dctx->buffer128, ctx->gf128);
- src += POLYVAL_BLOCK_SIZE;
- srclen -= POLYVAL_BLOCK_SIZE;
- } while (srclen >= POLYVAL_BLOCK_SIZE);
-
- return srclen;
-}
-
-static int polyval_finup(struct shash_desc *desc, const u8 *src,
- unsigned int len, u8 *dst)
-{
- struct polyval_desc_ctx *dctx = shash_desc_ctx(desc);
-
- if (len) {
- u8 tmp[POLYVAL_BLOCK_SIZE] = {};
-
- memcpy(tmp, src, len);
- polyval_generic_update(desc, tmp, POLYVAL_BLOCK_SIZE);
- }
- copy_and_reverse(dst, dctx->buffer);
- return 0;
-}
-
-static int polyval_export(struct shash_desc *desc, void *out)
-{
- struct polyval_desc_ctx *dctx = shash_desc_ctx(desc);
-
- copy_and_reverse(out, dctx->buffer);
- return 0;
-}
-
-static int polyval_import(struct shash_desc *desc, const void *in)
-{
- struct polyval_desc_ctx *dctx = shash_desc_ctx(desc);
-
- copy_and_reverse(dctx->buffer, in);
- return 0;
-}
-
-static void polyval_exit_tfm(struct crypto_shash *tfm)
-{
- struct polyval_tfm_ctx *ctx = crypto_shash_ctx(tfm);
-
- gf128mul_free_4k(ctx->gf128);
-}
-
-static struct shash_alg polyval_alg = {
- .digestsize = POLYVAL_DIGEST_SIZE,
- .init = polyval_generic_init,
- .update = polyval_generic_update,
- .finup = polyval_finup,
- .setkey = polyval_setkey,
- .export = polyval_export,
- .import = polyval_import,
- .exit_tfm = polyval_exit_tfm,
- .statesize = sizeof(struct polyval_desc_ctx),
- .descsize = sizeof(struct polyval_desc_ctx),
- .base = {
- .cra_name = "polyval",
- .cra_driver_name = "polyval-generic",
- .cra_priority = 100,
- .cra_flags = CRYPTO_AHASH_ALG_BLOCK_ONLY,
- .cra_blocksize = POLYVAL_BLOCK_SIZE,
- .cra_ctxsize = sizeof(struct polyval_tfm_ctx),
- .cra_module = THIS_MODULE,
- },
-};
-
-static int __init polyval_mod_init(void)
-{
- return crypto_register_shash(&polyval_alg);
-}
-
-static void __exit polyval_mod_exit(void)
-{
- crypto_unregister_shash(&polyval_alg);
-}
-
-module_init(polyval_mod_init);
-module_exit(polyval_mod_exit);
-
-MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("POLYVAL hash function");
-MODULE_ALIAS_CRYPTO("polyval");
-MODULE_ALIAS_CRYPTO("polyval-generic");
--
2.51.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 8/9] crypto: testmgr - Remove polyval tests
2025-11-09 23:47 [PATCH 0/9] POLYVAL library Eric Biggers
` (6 preceding siblings ...)
2025-11-09 23:47 ` [PATCH 7/9] crypto: polyval - Remove the polyval crypto_shash Eric Biggers
@ 2025-11-09 23:47 ` Eric Biggers
2025-11-09 23:47 ` [PATCH 9/9] fscrypt: Drop obsolete recommendation to enable optimized POLYVAL Eric Biggers
` (2 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Eric Biggers @ 2025-11-09 23:47 UTC (permalink / raw)
To: linux-crypto
Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
linux-arm-kernel, x86, Eric Biggers
These are no longer used, since polyval support has been removed from
the crypto_shash API.
POLYVAL remains supported via lib/crypto/, where it has a KUnit test
suite instead.
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
crypto/tcrypt.c | 4 --
crypto/testmgr.c | 6 --
crypto/testmgr.h | 171 -----------------------------------------------
3 files changed, 181 deletions(-)
diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index d1d88debbd71..32d9eaf2c8af 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1688,14 +1688,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
case 56:
ret = min(ret, tcrypt_test("ccm(sm4)"));
break;
- case 57:
- ret = min(ret, tcrypt_test("polyval"));
- break;
-
case 58:
ret = min(ret, tcrypt_test("gcm(aria)"));
break;
case 59:
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 499e979a56dc..6fb53978df11 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -5368,16 +5368,10 @@ static const struct alg_test_desc alg_test_descs[] = {
.fips_allowed = 1,
}, {
.alg = "pkcs1pad(rsa)",
.test = alg_test_null,
.fips_allowed = 1,
- }, {
- .alg = "polyval",
- .test = alg_test_hash,
- .suite = {
- .hash = __VECS(polyval_tv_template)
- }
}, {
.alg = "rfc3686(ctr(aes))",
.test = alg_test_skcipher,
.fips_allowed = 1,
.suite = {
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 268231227282..a3e4695945ca 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -36233,181 +36233,10 @@ static const struct cipher_testvec aes_xctr_tv_template[] = {
.len = 512,
},
};
-/*
- * Test vectors generated using https://github.com/google/hctr2
- *
- * To ensure compatibility with RFC 8452, some tests were sourced from
- * https://datatracker.ietf.org/doc/html/rfc8452
- */
-static const struct hash_testvec polyval_tv_template[] = {
- { // From RFC 8452
- .key = "\x31\x07\x28\xd9\x91\x1f\x1f\x38"
- "\x37\xb2\x43\x16\xc3\xfa\xb9\xa0",
- .plaintext = "\x65\x78\x61\x6d\x70\x6c\x65\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x48\x65\x6c\x6c\x6f\x20\x77\x6f"
- "\x72\x6c\x64\x00\x00\x00\x00\x00"
- "\x38\x00\x00\x00\x00\x00\x00\x00"
- "\x58\x00\x00\x00\x00\x00\x00\x00",
- .digest = "\xad\x7f\xcf\x0b\x51\x69\x85\x16"
- "\x62\x67\x2f\x3c\x5f\x95\x13\x8f",
- .psize = 48,
- .ksize = 16,
- },
- { // From RFC 8452
- .key = "\xd9\xb3\x60\x27\x96\x94\x94\x1a"
- "\xc5\xdb\xc6\x98\x7a\xda\x73\x77",
- .plaintext = "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00",
- .digest = "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00",
- .psize = 16,
- .ksize = 16,
- },
- { // From RFC 8452
- .key = "\xd9\xb3\x60\x27\x96\x94\x94\x1a"
- "\xc5\xdb\xc6\x98\x7a\xda\x73\x77",
- .plaintext = "\x01\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x40\x00\x00\x00\x00\x00\x00\x00",
- .digest = "\xeb\x93\xb7\x74\x09\x62\xc5\xe4"
- "\x9d\x2a\x90\xa7\xdc\x5c\xec\x74",
- .psize = 32,
- .ksize = 16,
- },
- { // From RFC 8452
- .key = "\xd9\xb3\x60\x27\x96\x94\x94\x1a"
- "\xc5\xdb\xc6\x98\x7a\xda\x73\x77",
- .plaintext = "\x01\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x02\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x03\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x80\x01\x00\x00\x00\x00\x00\x00",
- .digest = "\x81\x38\x87\x46\xbc\x22\xd2\x6b"
- "\x2a\xbc\x3d\xcb\x15\x75\x42\x22",
- .psize = 64,
- .ksize = 16,
- },
- { // From RFC 8452
- .key = "\xd9\xb3\x60\x27\x96\x94\x94\x1a"
- "\xc5\xdb\xc6\x98\x7a\xda\x73\x77",
- .plaintext = "\x01\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x02\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x03\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x04\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x02\x00\x00\x00\x00\x00\x00",
- .digest = "\x1e\x39\xb6\xd3\x34\x4d\x34\x8f"
- "\x60\x44\xf8\x99\x35\xd1\xcf\x78",
- .psize = 80,
- .ksize = 16,
- },
- { // From RFC 8452
- .key = "\xd9\xb3\x60\x27\x96\x94\x94\x1a"
- "\xc5\xdb\xc6\x98\x7a\xda\x73\x77",
- .plaintext = "\x01\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x02\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x03\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x04\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x05\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x08\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x02\x00\x00\x00\x00\x00\x00",
- .digest = "\xff\xcd\x05\xd5\x77\x0f\x34\xad"
- "\x92\x67\xf0\xa5\x99\x94\xb1\x5a",
- .psize = 96,
- .ksize = 16,
- },
- { // Random ( 1)
- .key = "\x90\xcc\xac\xee\xba\xd7\xd4\x68"
- "\x98\xa6\x79\x70\xdf\x66\x15\x6c",
- .plaintext = "",
- .digest = "\x00\x00\x00\x00\x00\x00\x00\x00"
- "\x00\x00\x00\x00\x00\x00\x00\x00",
- .psize = 0,
- .ksize = 16,
- },
- { // Random ( 1)
- .key = "\xc1\x45\x71\xf0\x30\x07\x94\xe7"
- "\x3a\xdd\xe4\xc6\x19\x2d\x02\xa2",
- .plaintext = "\xc1\x5d\x47\xc7\x4c\x7c\x5e\x07"
- "\x85\x14\x8f\x79\xcc\x73\x83\xf7"
- "\x35\xb8\xcb\x73\x61\xf0\x53\x31"
- "\xbf\x84\xde\xb6\xde\xaf\xb0\xb8"
- "\xb7\xd9\x11\x91\x89\xfd\x1e\x4c"
- "\x84\x4a\x1f\x2a\x87\xa4\xaf\x62"
- "\x8d\x7d\x58\xf6\x43\x35\xfc\x53"
- "\x8f\x1a\xf6\x12\xe1\x13\x3f\x66"
- "\x91\x4b\x13\xd6\x45\xfb\xb0\x7a"
- "\xe0\x8b\x8e\x99\xf7\x86\x46\x37"
- "\xd1\x22\x9e\x52\xf3\x3f\xd9\x75"
- "\x2c\x2c\xc6\xbb\x0e\x08\x14\x29"
- "\xe8\x50\x2f\xd8\xbe\xf4\xe9\x69"
- "\x4a\xee\xf7\xae\x15\x65\x35\x1e",
- .digest = "\x00\x4f\x5d\xe9\x3b\xc0\xd6\x50"
- "\x3e\x38\x73\x86\xc6\xda\xca\x7f",
- .psize = 112,
- .ksize = 16,
- },
- { // Random ( 1)
- .key = "\x37\xbe\x68\x16\x50\xb9\x4e\xb0"
- "\x47\xde\xe2\xbd\xde\xe4\x48\x09",
- .plaintext = "\x87\xfc\x68\x9f\xff\xf2\x4a\x1e"
- "\x82\x3b\x73\x8f\xc1\xb2\x1b\x7a"
- "\x6c\x4f\x81\xbc\x88\x9b\x6c\xa3"
- "\x9c\xc2\xa5\xbc\x14\x70\x4c\x9b"
- "\x0c\x9f\x59\x92\x16\x4b\x91\x3d"
- "\x18\x55\x22\x68\x12\x8c\x63\xb2"
- "\x51\xcb\x85\x4b\xd2\xae\x0b\x1c"
- "\x5d\x28\x9d\x1d\xb1\xc8\xf0\x77"
- "\xe9\xb5\x07\x4e\x06\xc8\xee\xf8"
- "\x1b\xed\x72\x2a\x55\x7d\x16\xc9"
- "\xf2\x54\xe7\xe9\xe0\x44\x5b\x33"
- "\xb1\x49\xee\xff\x43\xfb\x82\xcd"
- "\x4a\x70\x78\x81\xa4\x34\x36\xe8"
- "\x4c\x28\x54\xa6\x6c\xc3\x6b\x78"
- "\xe7\xc0\x5d\xc6\x5d\x81\xab\x70"
- "\x08\x86\xa1\xfd\xf4\x77\x55\xfd"
- "\xa3\xe9\xe2\x1b\xdf\x99\xb7\x80"
- "\xf9\x0a\x4f\x72\x4a\xd3\xaf\xbb"
- "\xb3\x3b\xeb\x08\x58\x0f\x79\xce"
- "\xa5\x99\x05\x12\x34\xd4\xf4\x86"
- "\x37\x23\x1d\xc8\x49\xc0\x92\xae"
- "\xa6\xac\x9b\x31\x55\xed\x15\xc6"
- "\x05\x17\x37\x8d\x90\x42\xe4\x87"
- "\x89\x62\x88\x69\x1c\x6a\xfd\xe3"
- "\x00\x2b\x47\x1a\x73\xc1\x51\xc2"
- "\xc0\x62\x74\x6a\x9e\xb2\xe5\x21"
- "\xbe\x90\xb5\xb0\x50\xca\x88\x68"
- "\xe1\x9d\x7a\xdf\x6c\xb7\xb9\x98"
- "\xee\x28\x62\x61\x8b\xd1\x47\xf9"
- "\x04\x7a\x0b\x5d\xcd\x2b\x65\xf5"
- "\x12\xa3\xfe\x1a\xaa\x2c\x78\x42"
- "\xb8\xbe\x7d\x74\xeb\x59\xba\xba",
- .digest = "\xae\x11\xd4\x60\x2a\x5f\x9e\x42"
- "\x89\x04\xc2\x34\x8d\x55\x94\x0a",
- .psize = 256,
- .ksize = 16,
- },
-
-};
-
/*
* Test vectors generated using https://github.com/google/hctr2
*/
static const struct cipher_testvec aes_hctr2_tv_template[] = {
{
--
2.51.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 9/9] fscrypt: Drop obsolete recommendation to enable optimized POLYVAL
2025-11-09 23:47 [PATCH 0/9] POLYVAL library Eric Biggers
` (7 preceding siblings ...)
2025-11-09 23:47 ` [PATCH 8/9] crypto: testmgr - Remove polyval tests Eric Biggers
@ 2025-11-09 23:47 ` Eric Biggers
2025-11-10 15:51 ` [PATCH 0/9] POLYVAL library Ard Biesheuvel
2025-11-11 19:28 ` Eric Biggers
10 siblings, 0 replies; 16+ messages in thread
From: Eric Biggers @ 2025-11-09 23:47 UTC (permalink / raw)
To: linux-crypto
Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
linux-arm-kernel, x86, Eric Biggers
CONFIG_CRYPTO_POLYVAL_ARM64_CE and CONFIG_CRYPTO_POLYVAL_CLMUL_NI no
longer exist. The architecture-optimized POLYVAL code is now just
enabled automatically when HCTR2 support is enabled. Update the fscrypt
documentation accordingly.
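In other words, a kernel config now just needs (sketch):

	CONFIG_CRYPTO_HCTR2=y

which selects CRYPTO_LIB_POLYVAL; CRYPTO_LIB_POLYVAL_ARCH then defaults
to y on arm64 (with KERNEL_MODE_NEON) and on x86_64, so there is no
longer a separate option to enable.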
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
Documentation/filesystems/fscrypt.rst | 2 --
1 file changed, 2 deletions(-)
diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
index 696a5844bfa3..70af896822e1 100644
--- a/Documentation/filesystems/fscrypt.rst
+++ b/Documentation/filesystems/fscrypt.rst
@@ -448,13 +448,11 @@ API, but the filenames mode still does.
- AES-256-HCTR2
- Mandatory:
- CONFIG_CRYPTO_HCTR2
- Recommended:
- arm64: CONFIG_CRYPTO_AES_ARM64_CE_BLK
- - arm64: CONFIG_CRYPTO_POLYVAL_ARM64_CE
- x86: CONFIG_CRYPTO_AES_NI_INTEL
- - x86: CONFIG_CRYPTO_POLYVAL_CLMUL_NI
- Adiantum
- Mandatory:
- CONFIG_CRYPTO_ADIANTUM
- Recommended:
--
2.51.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 2/9] lib/crypto: polyval: Add POLYVAL library
2025-11-09 23:47 ` [PATCH 2/9] lib/crypto: polyval: Add POLYVAL library Eric Biggers
@ 2025-11-10 15:21 ` Ard Biesheuvel
2025-11-11 7:42 ` Ard Biesheuvel
0 siblings, 1 reply; 16+ messages in thread
From: Ard Biesheuvel @ 2025-11-10 15:21 UTC (permalink / raw)
To: Eric Biggers
Cc: linux-crypto, linux-kernel, Jason A . Donenfeld, Herbert Xu,
linux-arm-kernel, x86
Hi,
On Mon, 10 Nov 2025 at 00:49, Eric Biggers <ebiggers@kernel.org> wrote:
>
> Add support for POLYVAL to lib/crypto/.
>
> This will replace the polyval crypto_shash algorithm and its use in the
> hctr2 template, simplifying the code and reducing overhead.
>
> Specifically, this commit introduces the POLYVAL library API and a
> generic implementation of it. Later commits will migrate the existing
> architecture-optimized implementations of POLYVAL into lib/crypto/ and
> add a KUnit test suite.
>
> I've also rewritten the generic implementation completely, using a more
> modern approach instead of the traditional table-based approach. It's
> now constant-time, requires no precomputation or dynamic memory
> allocations, decreases the per-key memory usage from 4096 bytes to 16
> bytes, and is faster than the old polyval-generic even on bulk data
> reusing the same key (at least on x86_64, where I measured 15% faster).
> We should do this for GHASH too, but for now just do it for POLYVAL.
>
Very nice.
GHASH might suffer on 32-bit, I suppose, but taking this approach for
GHASH too, at least on 64-bit, would be a huge improvement.
I had a stab at replacing the int128 arithmetic with
__builtin_bitreverse64(), but it seems to make little difference (and
GCC does not support it [yet]). I've tried both arm64 and x86, and the
perf delta (using your kunit benchmark) is negligible in either case.
(FYI)
> Signed-off-by: Eric Biggers <ebiggers@kernel.org>
> ---
> include/crypto/polyval.h | 171 +++++++++++++++++++++-
> lib/crypto/Kconfig | 10 ++
> lib/crypto/Makefile | 8 +
> lib/crypto/polyval.c | 307 +++++++++++++++++++++++++++++++++++++++
> 4 files changed, 493 insertions(+), 3 deletions(-)
> create mode 100644 lib/crypto/polyval.c
>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/9] POLYVAL library
2025-11-09 23:47 [PATCH 0/9] POLYVAL library Eric Biggers
` (8 preceding siblings ...)
2025-11-09 23:47 ` [PATCH 9/9] fscrypt: Drop obsolete recommendation to enable optimized POLYVAL Eric Biggers
@ 2025-11-10 15:51 ` Ard Biesheuvel
2025-11-11 19:28 ` Eric Biggers
10 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2025-11-10 15:51 UTC (permalink / raw)
To: Eric Biggers
Cc: linux-crypto, linux-kernel, Jason A . Donenfeld, Herbert Xu,
linux-arm-kernel, x86
On Mon, 10 Nov 2025 at 00:49, Eric Biggers <ebiggers@kernel.org> wrote:
>
> This series is targeting libcrypto-next. It can also be retrieved from:
>
> git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git polyval-lib-v1
>
> This series migrates the POLYVAL code to lib/crypto/. It turns out that
> just like Poly1305, the library is a much better fit for it.
>
> This series also replaces the generic implementation of POLYVAL with a
> much better one.
>
> Notably, this series improves the performance of HCTR2, since it
> eliminates unnecessary overhead that was being incurred by accessing
> POLYVAL via the crypto_shash API. I see a 45% increase in throughput
> with 64-byte messages, 53% with 128-byte, or 6% with 4096-byte.
>
> It also eliminates the need to explicitly enable the optimized POLYVAL
> code, as it's now enabled automatically when HCTR2 support is enabled.
>
> Eric Biggers (9):
> crypto: polyval - Rename conflicting functions
> lib/crypto: polyval: Add POLYVAL library
> lib/crypto: tests: Add KUnit tests for POLYVAL
> lib/crypto: arm64/polyval: Migrate optimized code into library
> lib/crypto: x86/polyval: Migrate optimized code into library
> crypto: hctr2 - Convert to use POLYVAL library
> crypto: polyval - Remove the polyval crypto_shash
> crypto: testmgr - Remove polyval tests
> fscrypt: Drop obsolete recommendation to enable optimized POLYVAL
>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/9] lib/crypto: polyval: Add POLYVAL library
2025-11-10 15:21 ` Ard Biesheuvel
@ 2025-11-11 7:42 ` Ard Biesheuvel
2025-11-11 19:46 ` Eric Biggers
0 siblings, 1 reply; 16+ messages in thread
From: Ard Biesheuvel @ 2025-11-11 7:42 UTC (permalink / raw)
To: Eric Biggers
Cc: linux-crypto, linux-kernel, Jason A . Donenfeld, Herbert Xu,
linux-arm-kernel, x86
On Mon, 10 Nov 2025 at 16:21, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> Hi,
>
> On Mon, 10 Nov 2025 at 00:49, Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > Add support for POLYVAL to lib/crypto/.
> >
> > This will replace the polyval crypto_shash algorithm and its use in the
> > hctr2 template, simplifying the code and reducing overhead.
> >
> > Specifically, this commit introduces the POLYVAL library API and a
> > generic implementation of it. Later commits will migrate the existing
> > architecture-optimized implementations of POLYVAL into lib/crypto/ and
> > add a KUnit test suite.
> >
> > I've also rewritten the generic implementation completely, using a more
> > modern approach instead of the traditional table-based approach. It's
> > now constant-time, requires no precomputation or dynamic memory
> > allocations, decreases the per-key memory usage from 4096 bytes to 16
> > bytes, and is faster than the old polyval-generic even on bulk data
> > reusing the same key (at least on x86_64, where I measured 15% faster).
> > We should do this for GHASH too, but for now just do it for POLYVAL.
> >
>
> Very nice.
>
> GHASH might suffer on 32-bit, I suppose, but taking this approach at
> least on 64-bit also for GHASH would be a huge improvement.
>
> I had a stab at replacing the int128 arithmetic with
> __builtin_bitreverse64(), but it seems to make little difference (and
> GCC does not support it [yet]). I've tried both arm64 and x86, and the
> perf delta (using your kunit benchmark) is negligible in either case.
Sigh. I intended to apply only the generic patch and the kunit test,
but ended up applying the whole series, which explains perfectly why
performance was identical with and without my change on both x86_64 and
arm64: the generic code wasn't even being used.
So trying this again, on a Cortex-A72 without Crypto Extensions, I do
get a ~30% performance improvement doing the below. I haven't
re-tested x86, but given that it does not appear to have a native
scalar bit reverse instruction (or __builtin_bitreverse64() is broken
for it), there is probably no point in finding out.
Not saying we should do this for POLYVAL, but something to keep in
mind for gf128mul.c perhaps.
--- a/lib/crypto/polyval.c
+++ b/lib/crypto/polyval.c
@@ -42,11 +42,48 @@
* 256-bit => 128-bit reduction algorithm.
*/
-#ifdef CONFIG_ARCH_SUPPORTS_INT128
+#if defined(CONFIG_ARCH_SUPPORTS_INT128) || __has_builtin(__builtin_bitreverse64)
/* Do a 64 x 64 => 128 bit carryless multiplication. */
static void clmul64(u64 a, u64 b, u64 *out_lo, u64 *out_hi)
{
+ u64 a0 = a & 0x1111111111111111;
+ u64 a1 = a & 0x2222222222222222;
+ u64 a2 = a & 0x4444444444444444;
+ u64 a3 = a & 0x8888888888888888;
+
+ u64 b0 = b & 0x1111111111111111;
+ u64 b1 = b & 0x2222222222222222;
+ u64 b2 = b & 0x4444444444444444;
+ u64 b3 = b & 0x8888888888888888;
+
+#if __has_builtin(__builtin_bitreverse64)
+#define brev64 __builtin_bitreverse64
+ u64 c0 = (a0 * b0) ^ (a1 * b3) ^ (a2 * b2) ^ (a3 * b1);
+ u64 c1 = (a0 * b1) ^ (a1 * b0) ^ (a2 * b3) ^ (a3 * b2);
+ u64 c2 = (a0 * b2) ^ (a1 * b1) ^ (a2 * b0) ^ (a3 * b3);
+ u64 c3 = (a0 * b3) ^ (a1 * b2) ^ (a2 * b1) ^ (a3 * b0);
+
+ a0 = brev64(a0);
+ a1 = brev64(a1);
+ a2 = brev64(a2);
+ a3 = brev64(a3);
+
+ b0 = brev64(b0);
+ b1 = brev64(b1);
+ b2 = brev64(b2);
+ b3 = brev64(b3);
+
+ u64 d0 = (a0 * b0) ^ (a1 * b3) ^ (a2 * b2) ^ (a3 * b1);
+ u64 d1 = (a0 * b1) ^ (a1 * b0) ^ (a2 * b3) ^ (a3 * b2);
+ u64 d2 = (a0 * b2) ^ (a1 * b1) ^ (a2 * b0) ^ (a3 * b3);
+ u64 d3 = (a0 * b3) ^ (a1 * b2) ^ (a2 * b1) ^ (a3 * b0);
+
+ *out_hi = ((brev64(d0) >> 1) & 0x1111111111111111) ^
+ ((brev64(d1) >> 1) & 0x2222222222222222) ^
+ ((brev64(d2) >> 1) & 0x4444444444444444) ^
+ ((brev64(d3) >> 1) & 0x8888888888888888);
+ /* No e terms in this branch; the "^=" below needs this init. */
+ *out_lo = 0;
+#else
/*
* With 64-bit multiplicands and one term every 4 bits, there would be
* up to 64 / 4 = 16 one bits per column when each multiplication is
@@ -60,15 +97,10 @@ static void clmul64(u64 a, u64 b, u64 *out_lo, u64 *out_hi)
* Instead, mask off 4 bits from one multiplicand, giving a max of 15
* one bits per column. Then handle those 4 bits separately.
*/
- u64 a0 = a & 0x1111111111111110;
- u64 a1 = a & 0x2222222222222220;
- u64 a2 = a & 0x4444444444444440;
- u64 a3 = a & 0x8888888888888880;
-
- u64 b0 = b & 0x1111111111111111;
- u64 b1 = b & 0x2222222222222222;
- u64 b2 = b & 0x4444444444444444;
- u64 b3 = b & 0x8888888888888888;
+ a0 &= ~0xfULL;
+ a1 &= ~0xfULL;
+ a2 &= ~0xfULL;
+ a3 &= ~0xfULL;
/* Multiply the high 60 bits of @a by @b. */
u128 c0 = (a0 * (u128)b0) ^ (a1 * (u128)b3) ^
@@ -85,18 +117,20 @@ static void clmul64(u64 a, u64 b, u64 *out_lo, u64 *out_hi)
u64 e1 = -((a >> 1) & 1) & b;
u64 e2 = -((a >> 2) & 1) & b;
u64 e3 = -((a >> 3) & 1) & b;
- u64 extra_lo = e0 ^ (e1 << 1) ^ (e2 << 2) ^ (e3 << 3);
- u64 extra_hi = (e1 >> 63) ^ (e2 >> 62) ^ (e3 >> 61);
/* Add all the intermediate products together. */
- *out_lo = (((u64)c0) & 0x1111111111111111) ^
- (((u64)c1) & 0x2222222222222222) ^
- (((u64)c2) & 0x4444444444444444) ^
- (((u64)c3) & 0x8888888888888888) ^ extra_lo;
*out_hi = (((u64)(c0 >> 64)) & 0x1111111111111111) ^
(((u64)(c1 >> 64)) & 0x2222222222222222) ^
(((u64)(c2 >> 64)) & 0x4444444444444444) ^
- (((u64)(c3 >> 64)) & 0x8888888888888888) ^ extra_hi;
+ (((u64)(c3 >> 64)) & 0x8888888888888888) ^
+ (e1 >> 63) ^ (e2 >> 62) ^ (e3 >> 61);
+
+ *out_lo = e0 ^ (e1 << 1) ^ (e2 << 2) ^ (e3 << 3);
+#endif
+ *out_lo ^= (((u64)c0) & 0x1111111111111111) ^
+ (((u64)c1) & 0x2222222222222222) ^
+ (((u64)c2) & 0x4444444444444444) ^
+ (((u64)c3) & 0x8888888888888888);
}
#else /* CONFIG_ARCH_SUPPORTS_INT128 */
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/9] POLYVAL library
2025-11-09 23:47 [PATCH 0/9] POLYVAL library Eric Biggers
` (9 preceding siblings ...)
2025-11-10 15:51 ` [PATCH 0/9] POLYVAL library Ard Biesheuvel
@ 2025-11-11 19:28 ` Eric Biggers
10 siblings, 0 replies; 16+ messages in thread
From: Eric Biggers @ 2025-11-11 19:28 UTC (permalink / raw)
To: linux-crypto
Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
linux-arm-kernel, x86
On Sun, Nov 09, 2025 at 03:47:15PM -0800, Eric Biggers wrote:
> This series is targeting libcrypto-next. It can also be retrieved from:
>
> git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git polyval-lib-v1
>
> This series migrates the POLYVAL code to lib/crypto/. It turns out that
> just like Poly1305, the library is a much better fit for it.
>
> This series also replaces the generic implementation of POLYVAL with a
> much better one.
>
> Notably, this series improves the performance of HCTR2, since it
> eliminates unnecessary overhead that was being incurred by accessing
> POLYVAL via the crypto_shash API. I see a 45% increase in throughput
> with 64-byte messages, 53% with 128-byte, or 6% with 4096-byte.
>
> It also eliminates the need to explicitly enable the optimized POLYVAL
> code, as it's now enabled automatically when HCTR2 support is enabled.
>
> Eric Biggers (9):
> crypto: polyval - Rename conflicting functions
> lib/crypto: polyval: Add POLYVAL library
> lib/crypto: tests: Add KUnit tests for POLYVAL
> lib/crypto: arm64/polyval: Migrate optimized code into library
> lib/crypto: x86/polyval: Migrate optimized code into library
> crypto: hctr2 - Convert to use POLYVAL library
> crypto: polyval - Remove the polyval crypto_shash
> crypto: testmgr - Remove polyval tests
> fscrypt: Drop obsolete recommendation to enable optimized POLYVAL
>
Applied to https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=libcrypto-next
- Eric
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/9] lib/crypto: polyval: Add POLYVAL library
2025-11-11 7:42 ` Ard Biesheuvel
@ 2025-11-11 19:46 ` Eric Biggers
2025-11-12 10:32 ` Ard Biesheuvel
0 siblings, 1 reply; 16+ messages in thread
From: Eric Biggers @ 2025-11-11 19:46 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-crypto, linux-kernel, Jason A . Donenfeld, Herbert Xu,
linux-arm-kernel, x86
On Tue, Nov 11, 2025 at 08:42:29AM +0100, Ard Biesheuvel wrote:
> On Mon, 10 Nov 2025 at 16:21, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > Hi,
> >
> > On Mon, 10 Nov 2025 at 00:49, Eric Biggers <ebiggers@kernel.org> wrote:
> > >
> > > Add support for POLYVAL to lib/crypto/.
> > >
> > > This will replace the polyval crypto_shash algorithm and its use in the
> > > hctr2 template, simplifying the code and reducing overhead.
> > >
> > > Specifically, this commit introduces the POLYVAL library API and a
> > > generic implementation of it. Later commits will migrate the existing
> > > architecture-optimized implementations of POLYVAL into lib/crypto/ and
> > > add a KUnit test suite.
> > >
> > > I've also rewritten the generic implementation completely, using a more
> > > modern approach instead of the traditional table-based approach. It's
> > > now constant-time, requires no precomputation or dynamic memory
> > > allocations, decreases the per-key memory usage from 4096 bytes to 16
> > > bytes, and is faster than the old polyval-generic even on bulk data
> > > reusing the same key (at least on x86_64, where I measured 15% faster).
> > > We should do this for GHASH too, but for now just do it for POLYVAL.
> > >
> >
> > Very nice.
> >
> > GHASH might suffer on 32-bit, I suppose, but taking this approach at
> > least on 64-bit also for GHASH would be a huge improvement.
> >
> > I had a stab at replacing the int128 arithmetic with
> > __builtin_bitreverse64(), but it seems to make little difference (and
> > GCC does not support it [yet]). I've tried both arm64 and x86, and the
> > perf delta (using your kunit benchmark) is negligible in either case.
>
> Sigh. I intended to apply only the generic patch and the kunit test,
> but ended up applying the whole series, which explains perfectly why
> performance was identical with and without my change on both x86_64 and
> arm64: the generic code wasn't even being used.
>
> So trying this again, on a Cortex-A72 without Crypto Extensions, I do
> get a ~30% performance improvement doing the below. I haven't
> re-tested x86, but given that it does not appear to have a native
> scalar bit reverse instruction (or __builtin_bitreverse64() is broken
> for it), there is probably no point in finding out.
>
> Not saying we should do this for POLYVAL, but something to keep in
> mind for gf128mul.c perhaps.
>
>
> --- a/lib/crypto/polyval.c
> +++ b/lib/crypto/polyval.c
> @@ -42,11 +42,48 @@
> * 256-bit => 128-bit reduction algorithm.
> */
>
> -#ifdef CONFIG_ARCH_SUPPORTS_INT128
> +#if defined(CONFIG_ARCH_SUPPORTS_INT128) || __has_builtin(__builtin_bitreverse64)
>
> /* Do a 64 x 64 => 128 bit carryless multiplication. */
> static void clmul64(u64 a, u64 b, u64 *out_lo, u64 *out_hi)
> {
> + u64 a0 = a & 0x1111111111111111;
> + u64 a1 = a & 0x2222222222222222;
> + u64 a2 = a & 0x4444444444444444;
> + u64 a3 = a & 0x8888888888888888;
> +
> + u64 b0 = b & 0x1111111111111111;
> + u64 b1 = b & 0x2222222222222222;
> + u64 b2 = b & 0x4444444444444444;
> + u64 b3 = b & 0x8888888888888888;
> +
> +#if __has_builtin(__builtin_bitreverse64)
> +#define brev64 __builtin_bitreverse64
> + u64 c0 = (a0 * b0) ^ (a1 * b3) ^ (a2 * b2) ^ (a3 * b1);
> + u64 c1 = (a0 * b1) ^ (a1 * b0) ^ (a2 * b3) ^ (a3 * b2);
> + u64 c2 = (a0 * b2) ^ (a1 * b1) ^ (a2 * b0) ^ (a3 * b3);
> + u64 c3 = (a0 * b3) ^ (a1 * b2) ^ (a2 * b1) ^ (a3 * b0);
> +
> + a0 = brev64(a0);
> + a1 = brev64(a1);
> + a2 = brev64(a2);
> + a3 = brev64(a3);
> +
> + b0 = brev64(b0);
> + b1 = brev64(b1);
> + b2 = brev64(b2);
> + b3 = brev64(b3);
> +
> + u64 d0 = (a0 * b0) ^ (a1 * b3) ^ (a2 * b2) ^ (a3 * b1);
> + u64 d1 = (a0 * b1) ^ (a1 * b0) ^ (a2 * b3) ^ (a3 * b2);
> + u64 d2 = (a0 * b2) ^ (a1 * b1) ^ (a2 * b0) ^ (a3 * b3);
> + u64 d3 = (a0 * b3) ^ (a1 * b2) ^ (a2 * b1) ^ (a3 * b0);
> +
> + *out_hi = ((brev64(d0) >> 1) & 0x1111111111111111) ^
> + ((brev64(d1) >> 1) & 0x2222222222222222) ^
> + ((brev64(d2) >> 1) & 0x4444444444444444) ^
> + ((brev64(d3) >> 1) & 0x8888888888888888);
Yeah, that's an interesting idea! So if we bit-reflect the inputs, do
an n x n => n multiplication, and bit-reflect the output and right-shift
it by 1, we get the high half of the desired n x n => 2n multiplication.
(This relies on the fact that carries are being discarded.) Then we
don't need an instruction that does an n x n => 2n multiplication or
produces the high half of it.
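For illustration, here's a quick userspace sketch checking that identity
(not kernel code; all names here are made up):

#include <stdint.h>
#include <stdio.h>

/* Reference 64 x 64 => 128 bit carryless multiplication. */
static void clmul64_ref(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
	*lo = *hi = 0;
	for (int i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			*lo ^= a << i;
			if (i)
				*hi ^= a >> (64 - i);
		}
	}
}

/* Naive 64-bit bit reversal, good enough for the demonstration. */
static uint64_t brev64(uint64_t x)
{
	uint64_t r = 0;

	for (int i = 0; i < 64; i++)
		r = (r << 1) | ((x >> i) & 1);
	return r;
}

int main(void)
{
	uint64_t a = 0x0123456789abcdefULL, b = 0xfedcba9876543210ULL;
	uint64_t lo, hi, rlo, rhi;

	clmul64_ref(a, b, &lo, &hi);
	clmul64_ref(brev64(a), brev64(b), &rlo, &rhi);
	/* The high half of a clmul equals the bit-reversed low half of
	 * the clmul of the bit-reversed inputs, shifted right by 1. */
	printf("%d\n", hi == (brev64(rlo) >> 1));	/* prints 1 */
	return 0;
}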
The availability of hardware bit-reversal is limited, though. arm32,
arm64, and mips32r6 have it. But all of those also have a "multiply
high" instruction. So the 30% performance improvement you saw on arm64
seems surprising to me, as umulh should have been used. (I verified
that it's indeed used in the generated asm with both gcc and clang.)
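(For reference, the int128 idiom in question amounts to the following
sketch; the helper name is made up, but on arm64 both gcc and clang
compile the high half to a single umulh:)

#include <stdint.h>

typedef unsigned __int128 u128;

/* 64 x 64 => 128 bit multiply via the int128 idiom. */
static inline void mul64_wide(uint64_t a, uint64_t b,
			      uint64_t *lo, uint64_t *hi)
{
	u128 p = (u128)a * b;	/* mul + umulh on arm64 */

	*lo = (uint64_t)p;
	*hi = (uint64_t)(p >> 64);
}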
The available bit-reversal abstractions aren't too great either, with
__builtin_bitreverse64() being clang-specific and <linux/bitrev.h>
having a table-based, i.e. non-constant-time, fallback. So presumably
we'd need to add our own which is guaranteed to use the actual
instructions and not some slow and/or table-based fallback.
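A constant-time fallback wouldn't be hard to write, though. For
example, the classic mask-and-shift approach (a hypothetical helper,
not an existing kernel function):

#include <stdint.h>

/* Constant-time 64-bit bit reversal: swap adjacent bits, then 2-bit
 * groups, then nibbles, then finish with a byte swap. No lookup table
 * and no data-dependent branches. */
static inline uint64_t brev64_ct(uint64_t x)
{
	x = ((x & 0x5555555555555555ULL) << 1) |
	    ((x >> 1) & 0x5555555555555555ULL);
	x = ((x & 0x3333333333333333ULL) << 2) |
	    ((x >> 2) & 0x3333333333333333ULL);
	x = ((x & 0x0f0f0f0f0f0f0f0fULL) << 4) |
	    ((x >> 4) & 0x0f0f0f0f0f0f0f0fULL);
	return __builtin_bswap64(x);	/* gcc and clang builtin */
}

Whether that would actually beat the int128 path anywhere is a separate
question, of course.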
I'll definitely look into this more later when bringing this improvement
to GHASH too. But for now I think we should go with the version I have
in my patch.
- Eric
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/9] lib/crypto: polyval: Add POLYVAL library
2025-11-11 19:46 ` Eric Biggers
@ 2025-11-12 10:32 ` Ard Biesheuvel
0 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2025-11-12 10:32 UTC (permalink / raw)
To: Eric Biggers
Cc: linux-crypto, linux-kernel, Jason A . Donenfeld, Herbert Xu,
linux-arm-kernel, x86
On Tue, 11 Nov 2025 at 20:47, Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Tue, Nov 11, 2025 at 08:42:29AM +0100, Ard Biesheuvel wrote:
> > On Mon, 10 Nov 2025 at 16:21, Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > Hi,
> > >
> > > On Mon, 10 Nov 2025 at 00:49, Eric Biggers <ebiggers@kernel.org> wrote:
> > > >
> > > > Add support for POLYVAL to lib/crypto/.
> > > >
> > > > This will replace the polyval crypto_shash algorithm and its use in the
> > > > hctr2 template, simplifying the code and reducing overhead.
> > > >
> > > > Specifically, this commit introduces the POLYVAL library API and a
> > > > generic implementation of it. Later commits will migrate the existing
> > > > architecture-optimized implementations of POLYVAL into lib/crypto/ and
> > > > add a KUnit test suite.
> > > >
> > > > I've also rewritten the generic implementation completely, using a more
> > > > modern approach instead of the traditional table-based approach. It's
> > > > now constant-time, requires no precomputation or dynamic memory
> > > > allocations, decreases the per-key memory usage from 4096 bytes to 16
> > > > bytes, and is faster than the old polyval-generic even on bulk data
> > > > reusing the same key (at least on x86_64, where I measured 15% faster).
> > > > We should do this for GHASH too, but for now just do it for POLYVAL.
> > > >
> > >
> > > Very nice.
> > >
> > > GHASH might suffer on 32-bit, I suppose, but taking this approach at
> > > least on 64-bit also for GHASH would be a huge improvement.
> > >
> > > I had a stab at replacing the int128 arithmetic with
> > > __builtin_bitreverse64(), but it seems to make little difference (and
> > > GCC does not support it [yet]). I've tried both arm64 and x86, and the
> > > perf delta (using your kunit benchmark) is negligible in either case.
> >
> > Sigh. I intended to apply only the generic patch and the kunit test,
> > but ended up applying the whole series, which explains perfectly why
> > performance was identical with and without my change on both x86_64 and
> > arm64: the generic code wasn't even being used.
> >
> > So trying this again, on a Cortex-A72 without Crypto Extensions, I do
> > get a ~30% performance improvement doing the below. I haven't
> > re-tested x86, but given that it does not appear to have a native
> > scalar bit reverse instruction (or __builtin_bitreverse64() is broken
> > for it), there is probably no point in finding out.
> >
> > Not saying we should do this for POLYVAL, but something to keep in
> > mind for gf128mul.c perhaps.
> >
> >
> > --- a/lib/crypto/polyval.c
> > +++ b/lib/crypto/polyval.c
> > @@ -42,11 +42,48 @@
> > * 256-bit => 128-bit reduction algorithm.
> > */
> >
> > -#ifdef CONFIG_ARCH_SUPPORTS_INT128
> > +#if defined(CONFIG_ARCH_SUPPORTS_INT128) || __has_builtin(__builtin_bitreverse64)
> >
> > /* Do a 64 x 64 => 128 bit carryless multiplication. */
> > static void clmul64(u64 a, u64 b, u64 *out_lo, u64 *out_hi)
> > {
> > + u64 a0 = a & 0x1111111111111111;
> > + u64 a1 = a & 0x2222222222222222;
> > + u64 a2 = a & 0x4444444444444444;
> > + u64 a3 = a & 0x8888888888888888;
> > +
> > + u64 b0 = b & 0x1111111111111111;
> > + u64 b1 = b & 0x2222222222222222;
> > + u64 b2 = b & 0x4444444444444444;
> > + u64 b3 = b & 0x8888888888888888;
> > +
> > +#if __has_builtin(__builtin_bitreverse64)
> > +#define brev64 __builtin_bitreverse64
> > + u64 c0 = (a0 * b0) ^ (a1 * b3) ^ (a2 * b2) ^ (a3 * b1);
> > + u64 c1 = (a0 * b1) ^ (a1 * b0) ^ (a2 * b3) ^ (a3 * b2);
> > + u64 c2 = (a0 * b2) ^ (a1 * b1) ^ (a2 * b0) ^ (a3 * b3);
> > + u64 c3 = (a0 * b3) ^ (a1 * b2) ^ (a2 * b1) ^ (a3 * b0);
> > +
> > + a0 = brev64(a0);
> > + a1 = brev64(a1);
> > + a2 = brev64(a2);
> > + a3 = brev64(a3);
> > +
> > + b0 = brev64(b0);
> > + b1 = brev64(b1);
> > + b2 = brev64(b2);
> > + b3 = brev64(b3);
> > +
> > + u64 d0 = (a0 * b0) ^ (a1 * b3) ^ (a2 * b2) ^ (a3 * b1);
> > + u64 d1 = (a0 * b1) ^ (a1 * b0) ^ (a2 * b3) ^ (a3 * b2);
> > + u64 d2 = (a0 * b2) ^ (a1 * b1) ^ (a2 * b0) ^ (a3 * b3);
> > + u64 d3 = (a0 * b3) ^ (a1 * b2) ^ (a2 * b1) ^ (a3 * b0);
> > +
> > + *out_hi = ((brev64(d0) >> 1) & 0x1111111111111111) ^
> > + ((brev64(d1) >> 1) & 0x2222222222222222) ^
> > + ((brev64(d2) >> 1) & 0x4444444444444444) ^
> > + ((brev64(d3) >> 1) & 0x8888888888888888);
>
> Yeah, that's an interesting idea! So if we bit-reflect the inputs, do
> an n x n => n multiplication, and bit-reflect the output and right-shift
> it by 1, we get the high half of the desired n x n => 2n multiplication.
> (This relies on the fact that carries are being discarded.) Then we
> don't need an instruction that does an n x n => 2n multiplication or
> produces the high half of it.
>
> The availability of hardware bit-reversal is limited, though. arm32,
> arm64, and mips32r6 have it. But all of those also have a "multiply
> high" instruction. So the 30% performance improvement you saw on arm64
> seems surprising to me, as umulh should have been used. (I verified
> that it's indeed used in the generated asm with both gcc and clang.)
>
Yeah, it might just be the compiler making a mess of things. GCC is
already considerably faster than Clang at the u128 arithmetic (75
vs. 67 MB/s on an RPi4). But the bit-reverse code manages 85 MB/s
[which is only 26% faster, btw, so a bit less than when I tried this
the other day].
I re-tested on an Apple M2 (which doesn't need this code, but for
comparison), and there the GCC-generated u128 code is as fast as or
slightly faster than the Clang-generated bitreverse code.
So I guess this is more a matter of fixing the u128-related codegen in Clang.
> The available bit-reversal abstractions aren't too great either, with
> __builtin_bitreverse64() being clang-specific and <linux/bitrev.h>
> having a table-based, i.e. non-constant-time, fallback. So presumably
> we'd need to add our own which is guaranteed to use the actual
> instructions and not some slow and/or table-based fallback.
>
__builtin_bitreverse64() does exist on x86 too, but generates a huge
pile of code, so the mere availability is not a good reason to use it
either.
> I'll definitely look into this more later when bringing this improvement
> to GHASH too. But for now I think we should go with the version I have
> in my patch.
>
For now, definitely. And I'll see if I can file a Clang bug somewhere.
But I don't think we'll be making use of bitreverse for GHASH either.
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads: [~2025-11-12 10:32 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-09 23:47 [PATCH 0/9] POLYVAL library Eric Biggers
2025-11-09 23:47 ` [PATCH 1/9] crypto: polyval - Rename conflicting functions Eric Biggers
2025-11-09 23:47 ` [PATCH 2/9] lib/crypto: polyval: Add POLYVAL library Eric Biggers
2025-11-10 15:21 ` Ard Biesheuvel
2025-11-11 7:42 ` Ard Biesheuvel
2025-11-11 19:46 ` Eric Biggers
2025-11-12 10:32 ` Ard Biesheuvel
2025-11-09 23:47 ` [PATCH 3/9] lib/crypto: tests: Add KUnit tests for POLYVAL Eric Biggers
2025-11-09 23:47 ` [PATCH 4/9] lib/crypto: arm64/polyval: Migrate optimized code into library Eric Biggers
2025-11-09 23:47 ` [PATCH 5/9] lib/crypto: x86/polyval: " Eric Biggers
2025-11-09 23:47 ` [PATCH 6/9] crypto: hctr2 - Convert to use POLYVAL library Eric Biggers
2025-11-09 23:47 ` [PATCH 7/9] crypto: polyval - Remove the polyval crypto_shash Eric Biggers
2025-11-09 23:47 ` [PATCH 8/9] crypto: testmgr - Remove polyval tests Eric Biggers
2025-11-09 23:47 ` [PATCH 9/9] fscrypt: Drop obsolete recommendation to enable optimized POLYVAL Eric Biggers
2025-11-10 15:51 ` [PATCH 0/9] POLYVAL library Ard Biesheuvel
2025-11-11 19:28 ` Eric Biggers