public inbox for linux-arm-kernel@lists.infradead.org
* [PATCH 00/19] GHASH library
@ 2026-03-19  6:17 Eric Biggers
  2026-03-19  6:17 ` [PATCH 01/19] lib/crypto: gf128hash: Rename polyval module to gf128hash Eric Biggers
                   ` (20 more replies)
  0 siblings, 21 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

This series is targeting libcrypto-next.  It can also be retrieved from:

    git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git ghash-lib-v1

This series migrates the standalone GHASH code to lib/crypto/, then
converts the "gcm" template and AES-GCM library code to use it.  (GHASH
is the universal hash function used by GCM mode.)  As was the case with
POLYVAL and Poly1305, the library is a much better fit for it.

Since GHASH and POLYVAL are closely related and it often makes sense to
implement one in terms of the other, the existing "polyval" library
module is renamed to "gf128hash" and the GHASH support is added to it.

The generic implementation of GHASH is also replaced with a better one
utilizing the existing polyval_mul_generic().

Note that some GHASH implementations, often faster ones using more
recent CPU features, still exist in arch/*/crypto/ as internal
components of AES-GCM implementations.  Those are left as-is for now.
The goal with this GHASH library is just to provide parity with the
existing standalone GHASH support, which is used when a full
implementation of AES-GCM (or ${someothercipher}-GCM, if another block
cipher is being used) is unavailable.  Migrating the
architecture-optimized AES-GCM code to lib/crypto/ will be a next step.

Eric Biggers (19):
  lib/crypto: gf128hash: Rename polyval module to gf128hash
  lib/crypto: gf128hash: Support GF128HASH_ARCH without all POLYVAL
    functions
  lib/crypto: gf128hash: Add GHASH support
  lib/crypto: tests: Add KUnit tests for GHASH
  crypto: arm/ghash - Make the "ghash" crypto_shash NEON-only
  crypto: arm/ghash - Move NEON GHASH assembly into its own file
  lib/crypto: arm/ghash: Migrate optimized code into library
  crypto: arm64/ghash - Move NEON GHASH assembly into its own file
  lib/crypto: arm64/ghash: Migrate optimized code into library
  crypto: arm64/aes-gcm - Rename struct ghash_key and make fixed-sized
  lib/crypto: powerpc/ghash: Migrate optimized code into library
  lib/crypto: riscv/ghash: Migrate optimized code into library
  lib/crypto: s390/ghash: Migrate optimized code into library
  lib/crypto: x86/ghash: Migrate optimized code into library
  crypto: gcm - Use GHASH library instead of crypto_ahash
  crypto: ghash - Remove ghash from crypto_shash API
  lib/crypto: gf128mul: Remove unused 4k_lle functions
  lib/crypto: gf128hash: Remove unused content from ghash.h
  lib/crypto: aesgcm: Use GHASH library API

 MAINTAINERS                                   |   4 +-
 arch/arm/crypto/Kconfig                       |  13 +-
 arch/arm/crypto/ghash-ce-core.S               | 171 +-------
 arch/arm/crypto/ghash-ce-glue.c               | 166 +------
 arch/arm64/crypto/Kconfig                     |   5 +-
 arch/arm64/crypto/ghash-ce-core.S             | 221 +---------
 arch/arm64/crypto/ghash-ce-glue.c             | 164 +------
 arch/powerpc/crypto/Kconfig                   |   5 +-
 arch/powerpc/crypto/Makefile                  |   8 +-
 arch/powerpc/crypto/aesp8-ppc.h               |   1 -
 arch/powerpc/crypto/ghash.c                   | 160 -------
 arch/powerpc/crypto/vmx.c                     |  10 +-
 arch/riscv/crypto/Kconfig                     |  11 -
 arch/riscv/crypto/Makefile                    |   3 -
 arch/riscv/crypto/ghash-riscv64-glue.c        | 146 -------
 arch/s390/configs/debug_defconfig             |   1 -
 arch/s390/configs/defconfig                   |   1 -
 arch/s390/crypto/Kconfig                      |  10 -
 arch/s390/crypto/Makefile                     |   1 -
 arch/s390/crypto/ghash_s390.c                 | 144 ------
 arch/x86/crypto/Kconfig                       |  10 -
 arch/x86/crypto/Makefile                      |   3 -
 arch/x86/crypto/aesni-intel_glue.c            |   1 +
 arch/x86/crypto/ghash-clmulni-intel_glue.c    | 163 -------
 crypto/Kconfig                                |  11 +-
 crypto/Makefile                               |   1 -
 crypto/gcm.c                                  | 413 ++++--------------
 crypto/ghash-generic.c                        | 162 -------
 crypto/hctr2.c                                |   2 +-
 crypto/tcrypt.c                               |   9 -
 crypto/testmgr.c                              |  16 +-
 crypto/testmgr.h                              | 109 -----
 drivers/crypto/starfive/jh7110-aes.c          |   2 +-
 include/crypto/gcm.h                          |   4 +-
 include/crypto/{polyval.h => gf128hash.h}     | 126 +++++-
 include/crypto/gf128mul.h                     |  17 +-
 include/crypto/ghash.h                        |  12 -
 lib/crypto/.kunitconfig                       |   1 +
 lib/crypto/Kconfig                            |  31 +-
 lib/crypto/Makefile                           |  47 +-
 lib/crypto/aesgcm.c                           |  55 +--
 lib/crypto/arm/gf128hash.h                    |  43 ++
 lib/crypto/arm/ghash-neon-core.S              | 209 +++++++++
 lib/crypto/arm64/gf128hash.h                  | 137 ++++++
 lib/crypto/arm64/ghash-neon-core.S            | 220 ++++++++++
 lib/crypto/arm64/polyval.h                    |  80 ----
 lib/crypto/{polyval.c => gf128hash.c}         | 183 ++++++--
 lib/crypto/gf128mul.c                         |  73 +---
 lib/crypto/powerpc/.gitignore                 |   1 +
 lib/crypto/powerpc/gf128hash.h                | 109 +++++
 .../crypto/powerpc}/ghashp8-ppc.pl            |   1 +
 lib/crypto/riscv/gf128hash.h                  |  57 +++
 .../crypto/riscv}/ghash-riscv64-zvkg.S        |  13 +-
 lib/crypto/s390/gf128hash.h                   |  54 +++
 lib/crypto/tests/Kconfig                      |  12 +-
 lib/crypto/tests/Makefile                     |   1 +
 lib/crypto/tests/ghash-testvecs.h             | 186 ++++++++
 lib/crypto/tests/ghash_kunit.c                | 194 ++++++++
 lib/crypto/tests/polyval_kunit.c              |   2 +-
 lib/crypto/x86/{polyval.h => gf128hash.h}     |  72 ++-
 .../crypto/x86/ghash-pclmul.S                 |  98 ++---
 scripts/crypto/gen-hash-testvecs.py           |  63 ++-
 62 files changed, 1903 insertions(+), 2345 deletions(-)
 delete mode 100644 arch/powerpc/crypto/ghash.c
 delete mode 100644 arch/riscv/crypto/ghash-riscv64-glue.c
 delete mode 100644 arch/s390/crypto/ghash_s390.c
 delete mode 100644 arch/x86/crypto/ghash-clmulni-intel_glue.c
 delete mode 100644 crypto/ghash-generic.c
 rename include/crypto/{polyval.h => gf128hash.h} (60%)
 create mode 100644 lib/crypto/arm/gf128hash.h
 create mode 100644 lib/crypto/arm/ghash-neon-core.S
 create mode 100644 lib/crypto/arm64/gf128hash.h
 create mode 100644 lib/crypto/arm64/ghash-neon-core.S
 delete mode 100644 lib/crypto/arm64/polyval.h
 rename lib/crypto/{polyval.c => gf128hash.c} (61%)
 create mode 100644 lib/crypto/powerpc/gf128hash.h
 rename {arch/powerpc/crypto => lib/crypto/powerpc}/ghashp8-ppc.pl (98%)
 create mode 100644 lib/crypto/riscv/gf128hash.h
 rename {arch/riscv/crypto => lib/crypto/riscv}/ghash-riscv64-zvkg.S (91%)
 create mode 100644 lib/crypto/s390/gf128hash.h
 create mode 100644 lib/crypto/tests/ghash-testvecs.h
 create mode 100644 lib/crypto/tests/ghash_kunit.c
 rename lib/crypto/x86/{polyval.h => gf128hash.h} (51%)
 rename arch/x86/crypto/ghash-clmulni-intel_asm.S => lib/crypto/x86/ghash-pclmul.S (54%)


base-commit: 520a39fb6916ac3a269ad4ea87a6cb9af9d5a910
-- 
2.53.0



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 01/19] lib/crypto: gf128hash: Rename polyval module to gf128hash
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 02/19] lib/crypto: gf128hash: Support GF128HASH_ARCH without all POLYVAL functions Eric Biggers
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

Currently, the standalone GHASH code is coupled with crypto_shash.  This
has resulted in unnecessary complexity and overhead, as well as the code
being unavailable to library code such as the AES-GCM library.  As was
done with POLYVAL, it needs to find a new home in lib/crypto/.

GHASH and POLYVAL are closely related, and each can be implemented in
terms of the other.  Optimized code for one can be reused for the
other.  Moreover, since GHASH tends to be difficult to implement directly
due to its unnatural bit order, most modern GHASH implementations
(including the existing arm, arm64, powerpc, and x86 optimized GHASH
code, and the new generic GHASH code I'll be adding) actually
reinterpret the GHASH computation as an equivalent POLYVAL computation,
pre- and post-processing the inputs and outputs to map to/from POLYVAL.

Given this close relationship, it makes sense to group the GHASH and
POLYVAL code together in the same module.  This gives us a wide range of
options for implementing them, reusing code between the two and properly
utilizing whatever instructions each architecture provides.

Thus, GHASH support will be added to the library module that is
currently called "polyval".  Rename it to an appropriate name:
"gf128hash".  Rename files, options, functions, etc. where appropriate
to reflect the upcoming sharing with GHASH.  (Note: polyval_kunit is not
renamed, as ghash_kunit will be added alongside it instead.)

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 crypto/Kconfig                              |  2 +-
 crypto/hctr2.c                              |  2 +-
 include/crypto/{polyval.h => gf128hash.h}   | 16 ++++++-------
 lib/crypto/Kconfig                          | 24 +++++++++----------
 lib/crypto/Makefile                         | 20 ++++++++--------
 lib/crypto/arm64/{polyval.h => gf128hash.h} |  4 ++--
 lib/crypto/{polyval.c => gf128hash.c}       | 26 ++++++++++-----------
 lib/crypto/tests/Kconfig                    |  4 ++--
 lib/crypto/tests/polyval_kunit.c            |  2 +-
 lib/crypto/x86/{polyval.h => gf128hash.h}   |  4 ++--
 10 files changed, 52 insertions(+), 52 deletions(-)
 rename include/crypto/{polyval.h => gf128hash.h} (94%)
 rename lib/crypto/arm64/{polyval.h => gf128hash.h} (95%)
 rename lib/crypto/{polyval.c => gf128hash.c} (94%)
 rename lib/crypto/x86/{polyval.h => gf128hash.h} (95%)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index b8608ef6823b..5627b3691561 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -684,11 +684,11 @@ config CRYPTO_ECB
 	  ECB (Electronic Codebook) mode (NIST SP800-38A)
 
 config CRYPTO_HCTR2
 	tristate "HCTR2"
 	select CRYPTO_XCTR
-	select CRYPTO_LIB_POLYVAL
+	select CRYPTO_LIB_GF128HASH
 	select CRYPTO_MANAGER
 	help
 	  HCTR2 length-preserving encryption mode
 
 	  A mode for storage encryption that is efficient on processors with
diff --git a/crypto/hctr2.c b/crypto/hctr2.c
index f4cd6c29b4d3..ad5edf9366ac 100644
--- a/crypto/hctr2.c
+++ b/crypto/hctr2.c
@@ -14,13 +14,13 @@
  *
  * For more details, see the paper: "Length-preserving encryption with HCTR2"
  * (https://eprint.iacr.org/2021/1441.pdf)
  */
 
+#include <crypto/gf128hash.h>
 #include <crypto/internal/cipher.h>
 #include <crypto/internal/skcipher.h>
-#include <crypto/polyval.h>
 #include <crypto/scatterwalk.h>
 #include <linux/module.h>
 
 #define BLOCKCIPHER_BLOCK_SIZE		16
 
diff --git a/include/crypto/polyval.h b/include/crypto/gf128hash.h
similarity index 94%
rename from include/crypto/polyval.h
rename to include/crypto/gf128hash.h
index b28b8ef11353..5ffa86f5c13f 100644
--- a/include/crypto/polyval.h
+++ b/include/crypto/gf128hash.h
@@ -1,14 +1,14 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
 /*
- * POLYVAL library API
+ * GF(2^128) polynomial hashing: GHASH and POLYVAL
  *
  * Copyright 2025 Google LLC
  */
 
-#ifndef _CRYPTO_POLYVAL_H
-#define _CRYPTO_POLYVAL_H
+#ifndef _CRYPTO_GF128HASH_H
+#define _CRYPTO_GF128HASH_H
 
 #include <linux/string.h>
 #include <linux/types.h>
 
 #define POLYVAL_BLOCK_SIZE	16
@@ -42,24 +42,24 @@ struct polyval_elem {
  *
  * By H^i we mean H^(i-1) * H * x^-128, with base case H^1 = H.  I.e. the
  * exponentiation repeats the POLYVAL dot operation, with its "extra" x^-128.
  */
 struct polyval_key {
-#ifdef CONFIG_CRYPTO_LIB_POLYVAL_ARCH
+#ifdef CONFIG_CRYPTO_LIB_GF128HASH_ARCH
 #ifdef CONFIG_ARM64
 	/** @h_powers: Powers of the hash key H^8 through H^1 */
 	struct polyval_elem h_powers[8];
 #elif defined(CONFIG_X86)
 	/** @h_powers: Powers of the hash key H^8 through H^1 */
 	struct polyval_elem h_powers[8];
 #else
 #error "Unhandled arch"
 #endif
-#else /* CONFIG_CRYPTO_LIB_POLYVAL_ARCH */
+#else /* CONFIG_CRYPTO_LIB_GF128HASH_ARCH */
 	/** @h: The hash key H */
 	struct polyval_elem h;
-#endif /* !CONFIG_CRYPTO_LIB_POLYVAL_ARCH */
+#endif /* !CONFIG_CRYPTO_LIB_GF128HASH_ARCH */
 };
 
 /**
  * struct polyval_ctx - Context for computing a POLYVAL value
  * @key: Pointer to the prepared POLYVAL key.  The user of the API is
@@ -82,11 +82,11 @@ struct polyval_ctx {
  * copy, or it may involve precomputing powers of the key, depending on the
  * platform's POLYVAL implementation.
  *
  * Context: Any context.
  */
-#ifdef CONFIG_CRYPTO_LIB_POLYVAL_ARCH
+#ifdef CONFIG_CRYPTO_LIB_GF128HASH_ARCH
 void polyval_preparekey(struct polyval_key *key,
 			const u8 raw_key[POLYVAL_BLOCK_SIZE]);
 
 #else
 static inline void polyval_preparekey(struct polyval_key *key,
@@ -185,6 +185,6 @@ static inline void polyval(const struct polyval_key *key,
 	polyval_init(&ctx, key);
 	polyval_update(&ctx, data, len);
 	polyval_final(&ctx, out);
 }
 
-#endif /* _CRYPTO_POLYVAL_H */
+#endif /* _CRYPTO_GF128HASH_H */
diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig
index 4910fe20e42a..98cedd95c2a5 100644
--- a/lib/crypto/Kconfig
+++ b/lib/crypto/Kconfig
@@ -108,10 +108,22 @@ config CRYPTO_LIB_CURVE25519_GENERIC
 	default y if !CRYPTO_LIB_CURVE25519_ARCH || ARM || X86_64
 
 config CRYPTO_LIB_DES
 	tristate
 
+config CRYPTO_LIB_GF128HASH
+	tristate
+	help
+	  The GHASH and POLYVAL library functions.  Select this if your module
+	  uses any of the functions from <crypto/gf128hash.h>.
+
+config CRYPTO_LIB_GF128HASH_ARCH
+	bool
+	depends on CRYPTO_LIB_GF128HASH && !UML
+	default y if ARM64
+	default y if X86_64
+
 config CRYPTO_LIB_MD5
 	tristate
 	help
 	  The MD5 and HMAC-MD5 library functions.  Select this if your module
 	  uses any of the functions from <crypto/md5.h>.
@@ -176,22 +188,10 @@ config CRYPTO_LIB_POLY1305_RSIZE
 	default 2 if MIPS || RISCV
 	default 11 if X86_64
 	default 9 if ARM || ARM64
 	default 1
 
-config CRYPTO_LIB_POLYVAL
-	tristate
-	help
-	  The POLYVAL library functions.  Select this if your module uses any of
-	  the functions from <crypto/polyval.h>.
-
-config CRYPTO_LIB_POLYVAL_ARCH
-	bool
-	depends on CRYPTO_LIB_POLYVAL && !UML
-	default y if ARM64
-	default y if X86_64
-
 config CRYPTO_LIB_CHACHA20POLY1305
 	tristate
 	select CRYPTO_LIB_CHACHA
 	select CRYPTO_LIB_POLY1305
 	select CRYPTO_LIB_UTILS
diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
index a961615c8c7f..fc30622123d2 100644
--- a/lib/crypto/Makefile
+++ b/lib/crypto/Makefile
@@ -152,10 +152,20 @@ endif
 obj-$(CONFIG_CRYPTO_LIB_DES)			+= libdes.o
 libdes-y					:= des.o
 
 ################################################################################
 
+obj-$(CONFIG_CRYPTO_LIB_GF128HASH) += libgf128hash.o
+libgf128hash-y := gf128hash.o
+ifeq ($(CONFIG_CRYPTO_LIB_GF128HASH_ARCH),y)
+CFLAGS_gf128hash.o += -I$(src)/$(SRCARCH)
+libgf128hash-$(CONFIG_ARM64) += arm64/polyval-ce-core.o
+libgf128hash-$(CONFIG_X86) += x86/polyval-pclmul-avx.o
+endif
+
+################################################################################
+
 obj-$(CONFIG_CRYPTO_LIB_MD5) += libmd5.o
 libmd5-y := md5.o
 ifeq ($(CONFIG_CRYPTO_LIB_MD5_ARCH),y)
 CFLAGS_md5.o += -I$(src)/$(SRCARCH)
 libmd5-$(CONFIG_PPC) += powerpc/md5-asm.o
@@ -249,20 +259,10 @@ clean-files += arm/poly1305-core.S \
 	       riscv/poly1305-core.S \
 	       x86/poly1305-x86_64-cryptogams.S
 
 ################################################################################
 
-obj-$(CONFIG_CRYPTO_LIB_POLYVAL) += libpolyval.o
-libpolyval-y := polyval.o
-ifeq ($(CONFIG_CRYPTO_LIB_POLYVAL_ARCH),y)
-CFLAGS_polyval.o += -I$(src)/$(SRCARCH)
-libpolyval-$(CONFIG_ARM64) += arm64/polyval-ce-core.o
-libpolyval-$(CONFIG_X86) += x86/polyval-pclmul-avx.o
-endif
-
-################################################################################
-
 obj-$(CONFIG_CRYPTO_LIB_SHA1) += libsha1.o
 libsha1-y := sha1.o
 ifeq ($(CONFIG_CRYPTO_LIB_SHA1_ARCH),y)
 CFLAGS_sha1.o += -I$(src)/$(SRCARCH)
 ifeq ($(CONFIG_ARM),y)
diff --git a/lib/crypto/arm64/polyval.h b/lib/crypto/arm64/gf128hash.h
similarity index 95%
rename from lib/crypto/arm64/polyval.h
rename to lib/crypto/arm64/gf128hash.h
index a39763395e9b..c1012007adcf 100644
--- a/lib/crypto/arm64/polyval.h
+++ b/lib/crypto/arm64/gf128hash.h
@@ -70,11 +70,11 @@ static void polyval_blocks_arch(struct polyval_elem *acc,
 		polyval_blocks_generic(acc, &key->h_powers[NUM_H_POWERS - 1],
 				       data, nblocks);
 	}
 }
 
-#define polyval_mod_init_arch polyval_mod_init_arch
-static void polyval_mod_init_arch(void)
+#define gf128hash_mod_init_arch gf128hash_mod_init_arch
+static void gf128hash_mod_init_arch(void)
 {
 	if (cpu_have_named_feature(PMULL))
 		static_branch_enable(&have_pmull);
 }
diff --git a/lib/crypto/polyval.c b/lib/crypto/gf128hash.c
similarity index 94%
rename from lib/crypto/polyval.c
rename to lib/crypto/gf128hash.c
index 5796275f574a..8bb848bf26b7 100644
--- a/lib/crypto/polyval.c
+++ b/lib/crypto/gf128hash.c
@@ -1,13 +1,13 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 /*
- * POLYVAL library functions
+ * GF(2^128) polynomial hashing: GHASH and POLYVAL
  *
  * Copyright 2025 Google LLC
  */
 
-#include <crypto/polyval.h>
+#include <crypto/gf128hash.h>
 #include <linux/export.h>
 #include <linux/module.h>
 #include <linux/string.h>
 #include <linux/unaligned.h>
 
@@ -216,12 +216,12 @@ polyval_blocks_generic(struct polyval_elem *acc, const struct polyval_elem *key,
 		data += POLYVAL_BLOCK_SIZE;
 	} while (--nblocks);
 }
 
 /* Include the arch-optimized implementation of POLYVAL, if one is available. */
-#ifdef CONFIG_CRYPTO_LIB_POLYVAL_ARCH
-#include "polyval.h" /* $(SRCARCH)/polyval.h */
+#ifdef CONFIG_CRYPTO_LIB_GF128HASH_ARCH
+#include "gf128hash.h" /* $(SRCARCH)/gf128hash.h */
 void polyval_preparekey(struct polyval_key *key,
 			const u8 raw_key[POLYVAL_BLOCK_SIZE])
 {
 	polyval_preparekey_arch(key, raw_key);
 }
@@ -236,21 +236,21 @@ EXPORT_SYMBOL_GPL(polyval_preparekey);
  * code is needed to pass the appropriate key argument.
  */
 
 static void polyval_mul(struct polyval_ctx *ctx)
 {
-#ifdef CONFIG_CRYPTO_LIB_POLYVAL_ARCH
+#ifdef CONFIG_CRYPTO_LIB_GF128HASH_ARCH
 	polyval_mul_arch(&ctx->acc, ctx->key);
 #else
 	polyval_mul_generic(&ctx->acc, &ctx->key->h);
 #endif
 }
 
 static void polyval_blocks(struct polyval_ctx *ctx,
 			   const u8 *data, size_t nblocks)
 {
-#ifdef CONFIG_CRYPTO_LIB_POLYVAL_ARCH
+#ifdef CONFIG_CRYPTO_LIB_GF128HASH_ARCH
 	polyval_blocks_arch(&ctx->acc, ctx->key, data, nblocks);
 #else
 	polyval_blocks_generic(&ctx->acc, &ctx->key->h, data, nblocks);
 #endif
 }
@@ -287,21 +287,21 @@ void polyval_final(struct polyval_ctx *ctx, u8 out[POLYVAL_BLOCK_SIZE])
 	memcpy(out, &ctx->acc, POLYVAL_BLOCK_SIZE);
 	memzero_explicit(ctx, sizeof(*ctx));
 }
 EXPORT_SYMBOL_GPL(polyval_final);
 
-#ifdef polyval_mod_init_arch
-static int __init polyval_mod_init(void)
+#ifdef gf128hash_mod_init_arch
+static int __init gf128hash_mod_init(void)
 {
-	polyval_mod_init_arch();
+	gf128hash_mod_init_arch();
 	return 0;
 }
-subsys_initcall(polyval_mod_init);
+subsys_initcall(gf128hash_mod_init);
 
-static void __exit polyval_mod_exit(void)
+static void __exit gf128hash_mod_exit(void)
 {
 }
-module_exit(polyval_mod_exit);
+module_exit(gf128hash_mod_exit);
 #endif
 
-MODULE_DESCRIPTION("POLYVAL almost-XOR-universal hash function");
+MODULE_DESCRIPTION("GF(2^128) polynomial hashing: GHASH and POLYVAL");
 MODULE_LICENSE("GPL");
diff --git a/lib/crypto/tests/Kconfig b/lib/crypto/tests/Kconfig
index 42e1770e1883..aa627b6b9855 100644
--- a/lib/crypto/tests/Kconfig
+++ b/lib/crypto/tests/Kconfig
@@ -67,11 +67,11 @@ config CRYPTO_LIB_POLY1305_KUNIT_TEST
 	help
 	  KUnit tests for the Poly1305 library functions.
 
 config CRYPTO_LIB_POLYVAL_KUNIT_TEST
 	tristate "KUnit tests for POLYVAL" if !KUNIT_ALL_TESTS
-	depends on KUNIT && CRYPTO_LIB_POLYVAL
+	depends on KUNIT && CRYPTO_LIB_GF128HASH
 	default KUNIT_ALL_TESTS
 	select CRYPTO_LIB_BENCHMARK_VISIBLE
 	help
 	  KUnit tests for the POLYVAL library functions.
 
@@ -120,15 +120,15 @@ config CRYPTO_LIB_ENABLE_ALL_FOR_KUNIT
 	tristate "Enable all crypto library code for KUnit tests"
 	depends on KUNIT
 	select CRYPTO_LIB_AES_CBC_MACS
 	select CRYPTO_LIB_BLAKE2B
 	select CRYPTO_LIB_CURVE25519
+	select CRYPTO_LIB_GF128HASH
 	select CRYPTO_LIB_MD5
 	select CRYPTO_LIB_MLDSA
 	select CRYPTO_LIB_NH
 	select CRYPTO_LIB_POLY1305
-	select CRYPTO_LIB_POLYVAL
 	select CRYPTO_LIB_SHA1
 	select CRYPTO_LIB_SHA256
 	select CRYPTO_LIB_SHA512
 	select CRYPTO_LIB_SHA3
 	help
diff --git a/lib/crypto/tests/polyval_kunit.c b/lib/crypto/tests/polyval_kunit.c
index f47f41a39a41..d1f53a690ab8 100644
--- a/lib/crypto/tests/polyval_kunit.c
+++ b/lib/crypto/tests/polyval_kunit.c
@@ -1,10 +1,10 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 /*
  * Copyright 2025 Google LLC
  */
-#include <crypto/polyval.h>
+#include <crypto/gf128hash.h>
 #include "polyval-testvecs.h"
 
 /*
  * A fixed key used when presenting POLYVAL as an unkeyed hash function in order
  * to reuse hash-test-template.h.  At the beginning of the test suite, this is
diff --git a/lib/crypto/x86/polyval.h b/lib/crypto/x86/gf128hash.h
similarity index 95%
rename from lib/crypto/x86/polyval.h
rename to lib/crypto/x86/gf128hash.h
index ef8797521420..fe506cf6431b 100644
--- a/lib/crypto/x86/polyval.h
+++ b/lib/crypto/x86/gf128hash.h
@@ -72,12 +72,12 @@ static void polyval_blocks_arch(struct polyval_elem *acc,
 		polyval_blocks_generic(acc, &key->h_powers[NUM_H_POWERS - 1],
 				       data, nblocks);
 	}
 }
 
-#define polyval_mod_init_arch polyval_mod_init_arch
-static void polyval_mod_init_arch(void)
+#define gf128hash_mod_init_arch gf128hash_mod_init_arch
+static void gf128hash_mod_init_arch(void)
 {
 	if (boot_cpu_has(X86_FEATURE_PCLMULQDQ) &&
 	    boot_cpu_has(X86_FEATURE_AVX))
 		static_branch_enable(&have_pclmul_avx);
 }
-- 
2.53.0




* [PATCH 02/19] lib/crypto: gf128hash: Support GF128HASH_ARCH without all POLYVAL functions
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
  2026-03-19  6:17 ` [PATCH 01/19] lib/crypto: gf128hash: Rename polyval module to gf128hash Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 03/19] lib/crypto: gf128hash: Add GHASH support Eric Biggers
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

Currently, some architectures (arm64 and x86) have optimized code for
both GHASH and POLYVAL.  Others (arm, powerpc, riscv, and s390) have
optimized code only for GHASH.  POLYVAL support could eventually be
implemented on those architectures too, but until then we need to
support the case where arch-optimized functions exist only for GHASH.

Therefore, update the support for arch-optimized POLYVAL functions to
allow architectures to opt into supporting these functions individually.

The new meaning of CONFIG_CRYPTO_LIB_GF128HASH_ARCH is that some level
of GHASH and/or POLYVAL acceleration is provided.

Also provide an implementation of polyval_mul() based on
polyval_blocks_arch(), for when polyval_mul_arch() isn't implemented.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 include/crypto/gf128hash.h   | 22 +++-------------------
 lib/crypto/arm64/gf128hash.h |  3 +++
 lib/crypto/gf128hash.c       | 16 ++++++++++++----
 lib/crypto/x86/gf128hash.h   |  3 +++
 4 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/include/crypto/gf128hash.h b/include/crypto/gf128hash.h
index 5ffa86f5c13f..1052041e3499 100644
--- a/include/crypto/gf128hash.h
+++ b/include/crypto/gf128hash.h
@@ -42,24 +42,18 @@ struct polyval_elem {
  *
  * By H^i we mean H^(i-1) * H * x^-128, with base case H^1 = H.  I.e. the
  * exponentiation repeats the POLYVAL dot operation, with its "extra" x^-128.
  */
 struct polyval_key {
-#ifdef CONFIG_CRYPTO_LIB_GF128HASH_ARCH
-#ifdef CONFIG_ARM64
-	/** @h_powers: Powers of the hash key H^8 through H^1 */
-	struct polyval_elem h_powers[8];
-#elif defined(CONFIG_X86)
+#if defined(CONFIG_CRYPTO_LIB_GF128HASH_ARCH) && \
+	(defined(CONFIG_ARM64) || defined(CONFIG_X86))
 	/** @h_powers: Powers of the hash key H^8 through H^1 */
 	struct polyval_elem h_powers[8];
 #else
-#error "Unhandled arch"
-#endif
-#else /* CONFIG_CRYPTO_LIB_GF128HASH_ARCH */
 	/** @h: The hash key H */
 	struct polyval_elem h;
-#endif /* !CONFIG_CRYPTO_LIB_GF128HASH_ARCH */
+#endif
 };
 
 /**
  * struct polyval_ctx - Context for computing a POLYVAL value
  * @key: Pointer to the prepared POLYVAL key.  The user of the API is
@@ -82,23 +76,13 @@ struct polyval_ctx {
  * copy, or it may involve precomputing powers of the key, depending on the
  * platform's POLYVAL implementation.
  *
  * Context: Any context.
  */
-#ifdef CONFIG_CRYPTO_LIB_GF128HASH_ARCH
 void polyval_preparekey(struct polyval_key *key,
 			const u8 raw_key[POLYVAL_BLOCK_SIZE]);
 
-#else
-static inline void polyval_preparekey(struct polyval_key *key,
-				      const u8 raw_key[POLYVAL_BLOCK_SIZE])
-{
-	/* Just a simple copy, so inline it. */
-	memcpy(key->h.bytes, raw_key, POLYVAL_BLOCK_SIZE);
-}
-#endif
-
 /**
  * polyval_init() - Initialize a POLYVAL context for a new message
  * @ctx: The context to initialize
  * @key: The key to use.  Note that a pointer to the key is saved in the
  *	 context, so the key must live at least as long as the context.
diff --git a/lib/crypto/arm64/gf128hash.h b/lib/crypto/arm64/gf128hash.h
index c1012007adcf..796c36804dda 100644
--- a/lib/crypto/arm64/gf128hash.h
+++ b/lib/crypto/arm64/gf128hash.h
@@ -15,10 +15,11 @@ asmlinkage void polyval_mul_pmull(struct polyval_elem *a,
 				  const struct polyval_elem *b);
 asmlinkage void polyval_blocks_pmull(struct polyval_elem *acc,
 				     const struct polyval_key *key,
 				     const u8 *data, size_t nblocks);
 
+#define polyval_preparekey_arch polyval_preparekey_arch
 static void polyval_preparekey_arch(struct polyval_key *key,
 				    const u8 raw_key[POLYVAL_BLOCK_SIZE])
 {
 	static_assert(ARRAY_SIZE(key->h_powers) == NUM_H_POWERS);
 	memcpy(&key->h_powers[NUM_H_POWERS - 1], raw_key, POLYVAL_BLOCK_SIZE);
@@ -38,10 +39,11 @@ static void polyval_preparekey_arch(struct polyval_key *key,
 					    &key->h_powers[NUM_H_POWERS - 1]);
 		}
 	}
 }
 
+#define polyval_mul_arch polyval_mul_arch
 static void polyval_mul_arch(struct polyval_elem *acc,
 			     const struct polyval_key *key)
 {
 	if (static_branch_likely(&have_pmull) && may_use_simd()) {
 		scoped_ksimd()
@@ -49,10 +51,11 @@ static void polyval_mul_arch(struct polyval_elem *acc,
 	} else {
 		polyval_mul_generic(acc, &key->h_powers[NUM_H_POWERS - 1]);
 	}
 }
 
+#define polyval_blocks_arch polyval_blocks_arch
 static void polyval_blocks_arch(struct polyval_elem *acc,
 				const struct polyval_key *key,
 				const u8 *data, size_t nblocks)
 {
 	if (static_branch_likely(&have_pmull) && may_use_simd()) {
diff --git a/lib/crypto/gf128hash.c b/lib/crypto/gf128hash.c
index 8bb848bf26b7..05f44a9193f7 100644
--- a/lib/crypto/gf128hash.c
+++ b/lib/crypto/gf128hash.c
@@ -215,20 +215,24 @@ polyval_blocks_generic(struct polyval_elem *acc, const struct polyval_elem *key,
 		polyval_mul_generic(acc, key);
 		data += POLYVAL_BLOCK_SIZE;
 	} while (--nblocks);
 }
 
-/* Include the arch-optimized implementation of POLYVAL, if one is available. */
 #ifdef CONFIG_CRYPTO_LIB_GF128HASH_ARCH
 #include "gf128hash.h" /* $(SRCARCH)/gf128hash.h */
+#endif
+
 void polyval_preparekey(struct polyval_key *key,
 			const u8 raw_key[POLYVAL_BLOCK_SIZE])
 {
+#ifdef polyval_preparekey_arch
 	polyval_preparekey_arch(key, raw_key);
+#else
+	memcpy(key->h.bytes, raw_key, POLYVAL_BLOCK_SIZE);
+#endif
 }
 EXPORT_SYMBOL_GPL(polyval_preparekey);
-#endif /* Else, polyval_preparekey() is an inline function. */
 
 /*
  * polyval_mul_generic() and polyval_blocks_generic() take the key as a
  * polyval_elem rather than a polyval_key, so that arch-optimized
  * implementations with a different key format can use it as a fallback (if they
@@ -236,21 +240,25 @@ EXPORT_SYMBOL_GPL(polyval_preparekey);
  * code is needed to pass the appropriate key argument.
  */
 
 static void polyval_mul(struct polyval_ctx *ctx)
 {
-#ifdef CONFIG_CRYPTO_LIB_GF128HASH_ARCH
+#ifdef polyval_mul_arch
 	polyval_mul_arch(&ctx->acc, ctx->key);
+#elif defined(polyval_blocks_arch)
+	static const u8 zeroes[POLYVAL_BLOCK_SIZE];
+
+	polyval_blocks_arch(&ctx->acc, ctx->key, zeroes, 1);
 #else
 	polyval_mul_generic(&ctx->acc, &ctx->key->h);
 #endif
 }
 
 static void polyval_blocks(struct polyval_ctx *ctx,
 			   const u8 *data, size_t nblocks)
 {
-#ifdef CONFIG_CRYPTO_LIB_GF128HASH_ARCH
+#ifdef polyval_blocks_arch
 	polyval_blocks_arch(&ctx->acc, ctx->key, data, nblocks);
 #else
 	polyval_blocks_generic(&ctx->acc, &ctx->key->h, data, nblocks);
 #endif
 }
diff --git a/lib/crypto/x86/gf128hash.h b/lib/crypto/x86/gf128hash.h
index fe506cf6431b..adf6147ea677 100644
--- a/lib/crypto/x86/gf128hash.h
+++ b/lib/crypto/x86/gf128hash.h
@@ -15,10 +15,11 @@ asmlinkage void polyval_mul_pclmul_avx(struct polyval_elem *a,
 				       const struct polyval_elem *b);
 asmlinkage void polyval_blocks_pclmul_avx(struct polyval_elem *acc,
 					  const struct polyval_key *key,
 					  const u8 *data, size_t nblocks);
 
+#define polyval_preparekey_arch polyval_preparekey_arch
 static void polyval_preparekey_arch(struct polyval_key *key,
 				    const u8 raw_key[POLYVAL_BLOCK_SIZE])
 {
 	static_assert(ARRAY_SIZE(key->h_powers) == NUM_H_POWERS);
 	memcpy(&key->h_powers[NUM_H_POWERS - 1], raw_key, POLYVAL_BLOCK_SIZE);
@@ -38,10 +39,11 @@ static void polyval_preparekey_arch(struct polyval_key *key,
 					    &key->h_powers[NUM_H_POWERS - 1]);
 		}
 	}
 }
 
+#define polyval_mul_arch polyval_mul_arch
 static void polyval_mul_arch(struct polyval_elem *acc,
 			     const struct polyval_key *key)
 {
 	if (static_branch_likely(&have_pclmul_avx) && irq_fpu_usable()) {
 		kernel_fpu_begin();
@@ -50,10 +52,11 @@ static void polyval_mul_arch(struct polyval_elem *acc,
 	} else {
 		polyval_mul_generic(acc, &key->h_powers[NUM_H_POWERS - 1]);
 	}
 }
 
+#define polyval_blocks_arch polyval_blocks_arch
 static void polyval_blocks_arch(struct polyval_elem *acc,
 				const struct polyval_key *key,
 				const u8 *data, size_t nblocks)
 {
 	if (static_branch_likely(&have_pclmul_avx) && irq_fpu_usable()) {
-- 
2.53.0




* [PATCH 03/19] lib/crypto: gf128hash: Add GHASH support
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
  2026-03-19  6:17 ` [PATCH 01/19] lib/crypto: gf128hash: Rename polyval module to gf128hash Eric Biggers
  2026-03-19  6:17 ` [PATCH 02/19] lib/crypto: gf128hash: Support GF128HASH_ARCH without all POLYVAL functions Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 04/19] lib/crypto: tests: Add KUnit tests for GHASH Eric Biggers
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

Add GHASH support to the gf128hash module.

This will replace the GHASH support in the crypto_shash API.  It will be
used by the "gcm" template and by the AES-GCM library (when an
arch-optimized implementation of the full AES-GCM is unavailable).

This consists of a simple API that mirrors the existing POLYVAL API, a
generic implementation of that API based on the existing efficient and
side-channel-resistant polyval_mul_generic(), and the framework for
architecture-optimized implementations of the GHASH functions.

The GHASH accumulator is stored in POLYVAL format rather than GHASH
format, since this is what most modern GHASH implementations actually
need.  The few implementations that expect the accumulator in GHASH
format will just convert the accumulator to/from GHASH format
temporarily.  (Supporting architecture-specific accumulator formats
would be possible, but doesn't seem worth the complexity.)

However, architecture-specific formats of struct ghash_key will be
supported, since a variety of formats will be needed there anyway.  The
default format is just the key in POLYVAL format.
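For illustration only (not part of the patch itself): the GHASH-to-POLYVAL key
conversion mentioned above amounts to a multiplication by x, with reduction,
carried out on the two 64-bit halves of the bit-reflected representation.  A
minimal Python sketch of the same computation that the C helper
ghash_key_to_polyval() in this patch performs:

```python
def ghash_key_to_polyval(raw_key: bytes) -> bytes:
    """Convert a 16-byte GHASH key to POLYVAL format.

    Mirrors the C helper ghash_key_to_polyval(): interpret the raw key as
    two big-endian 64-bit halves, multiply by x (a left shift by one bit
    across the 128-bit value), and conditionally fold in the reduction
    constant 0xc2 << 56 when the shifted-out high bit was set.
    """
    assert len(raw_key) == 16
    M64 = (1 << 64) - 1
    hi = int.from_bytes(raw_key[0:8], 'big')
    lo = int.from_bytes(raw_key[8:16], 'big')
    # mask = all-ones if the top bit of hi is set, else zero
    mask = M64 if (hi >> 63) else 0
    hi2 = ((hi << 1) & M64) ^ (lo >> 63) ^ (mask & (0xc2 << 56))
    lo2 = ((lo << 1) & M64) ^ (mask & 1)
    # struct polyval_elem stores the low half first, both little-endian
    return lo2.to_bytes(8, 'little') + hi2.to_bytes(8, 'little')
```

For example, the key whose only set bit is the top bit of the first byte
gets the reduction constant folded in, while a zero key stays zero.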

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 include/crypto/gf128hash.h |  95 ++++++++++++++++++++++++
 lib/crypto/gf128hash.c     | 145 +++++++++++++++++++++++++++++++++----
 2 files changed, 227 insertions(+), 13 deletions(-)

diff --git a/include/crypto/gf128hash.h b/include/crypto/gf128hash.h
index 1052041e3499..5090fbaa87f8 100644
--- a/include/crypto/gf128hash.h
+++ b/include/crypto/gf128hash.h
@@ -9,10 +9,12 @@
 #define _CRYPTO_GF128HASH_H
 
 #include <linux/string.h>
 #include <linux/types.h>
 
+#define GHASH_BLOCK_SIZE	16
+#define GHASH_DIGEST_SIZE	16
 #define POLYVAL_BLOCK_SIZE	16
 #define POLYVAL_DIGEST_SIZE	16
 
 /**
  * struct polyval_elem - An element of the POLYVAL finite field
@@ -31,10 +33,20 @@ struct polyval_elem {
 			__le64 hi;
 		};
 	};
 };
 
+/**
+ * struct ghash_key - Prepared key for GHASH
+ *
+ * Use ghash_preparekey() to initialize this.
+ */
+struct ghash_key {
+	/** @h: The hash key H, in POLYVAL format */
+	struct polyval_elem h;
+};
+
 /**
  * struct polyval_key - Prepared key for POLYVAL
  *
  * This may contain just the raw key H, or it may contain precomputed key
  * powers, depending on the platform's POLYVAL implementation.  Use
@@ -52,10 +64,24 @@ struct polyval_key {
 	/** @h: The hash key H */
 	struct polyval_elem h;
 #endif
 };
 
+/**
+ * struct ghash_ctx - Context for computing a GHASH value
+ * @key: Pointer to the prepared GHASH key.  The user of the API is
+ *	 responsible for ensuring that the key lives as long as the context.
+ * @acc: The accumulator.  It is stored in POLYVAL format rather than GHASH
+ *	 format, since most implementations want it in POLYVAL format.
+ * @partial: Number of data bytes processed so far modulo GHASH_BLOCK_SIZE
+ */
+struct ghash_ctx {
+	const struct ghash_key *key;
+	struct polyval_elem acc;
+	size_t partial;
+};
+
 /**
  * struct polyval_ctx - Context for computing a POLYVAL value
  * @key: Pointer to the prepared POLYVAL key.  The user of the API is
  *	 responsible for ensuring that the key lives as long as the context.
  * @acc: The accumulator
@@ -65,10 +91,22 @@ struct polyval_ctx {
 	const struct polyval_key *key;
 	struct polyval_elem acc;
 	size_t partial;
 };
 
+/**
+ * ghash_preparekey() - Prepare a GHASH key
+ * @key: (output) The key structure to initialize
+ * @raw_key: The raw hash key
+ *
+ * Initialize a GHASH key structure from a raw key.
+ *
+ * Context: Any context.
+ */
+void ghash_preparekey(struct ghash_key *key,
+		      const u8 raw_key[GHASH_BLOCK_SIZE]);
+
 /**
  * polyval_preparekey() - Prepare a POLYVAL key
  * @key: (output) The key structure to initialize
  * @raw_key: The raw hash key
  *
@@ -79,10 +117,22 @@ struct polyval_ctx {
  * Context: Any context.
  */
 void polyval_preparekey(struct polyval_key *key,
 			const u8 raw_key[POLYVAL_BLOCK_SIZE]);
 
+/**
+ * ghash_init() - Initialize a GHASH context for a new message
+ * @ctx: The context to initialize
+ * @key: The key to use.  Note that a pointer to the key is saved in the
+ *	 context, so the key must live at least as long as the context.
+ */
+static inline void ghash_init(struct ghash_ctx *ctx,
+			      const struct ghash_key *key)
+{
+	*ctx = (struct ghash_ctx){ .key = key };
+}
+
 /**
  * polyval_init() - Initialize a POLYVAL context for a new message
  * @ctx: The context to initialize
  * @key: The key to use.  Note that a pointer to the key is saved in the
  *	 context, so the key must live at least as long as the context.
@@ -123,10 +173,22 @@ static inline void polyval_export_blkaligned(const struct polyval_ctx *ctx,
 					     struct polyval_elem *acc)
 {
 	*acc = ctx->acc;
 }
 
+/**
+ * ghash_update() - Update a GHASH context with message data
+ * @ctx: The context to update; must have been initialized
+ * @data: The message data
+ * @len: The data length in bytes.  Doesn't need to be block-aligned.
+ *
+ * This can be called any number of times.
+ *
+ * Context: Any context.
+ */
+void ghash_update(struct ghash_ctx *ctx, const u8 *data, size_t len);
+
 /**
  * polyval_update() - Update a POLYVAL context with message data
  * @ctx: The context to update; must have been initialized
  * @data: The message data
  * @len: The data length in bytes.  Doesn't need to be block-aligned.
@@ -135,10 +197,24 @@ static inline void polyval_export_blkaligned(const struct polyval_ctx *ctx,
  *
  * Context: Any context.
  */
 void polyval_update(struct polyval_ctx *ctx, const u8 *data, size_t len);
 
+/**
+ * ghash_final() - Finish computing a GHASH value
+ * @ctx: The context to finalize
+ * @out: The output value
+ *
+ * If the total data length isn't a multiple of GHASH_BLOCK_SIZE, then the
+ * final block is automatically zero-padded.
+ *
+ * After finishing, this zeroizes @ctx.  So the caller does not need to do it.
+ *
+ * Context: Any context.
+ */
+void ghash_final(struct ghash_ctx *ctx, u8 out[GHASH_BLOCK_SIZE]);
+
 /**
  * polyval_final() - Finish computing a POLYVAL value
  * @ctx: The context to finalize
  * @out: The output value
  *
@@ -149,10 +225,29 @@ void polyval_update(struct polyval_ctx *ctx, const u8 *data, size_t len);
  *
  * Context: Any context.
  */
 void polyval_final(struct polyval_ctx *ctx, u8 out[POLYVAL_BLOCK_SIZE]);
 
+/**
+ * ghash() - Compute a GHASH value
+ * @key: The prepared key
+ * @data: The message data
+ * @len: The data length in bytes.  Doesn't need to be block-aligned.
+ * @out: The output value
+ *
+ * Context: Any context.
+ */
+static inline void ghash(const struct ghash_key *key, const u8 *data,
+			 size_t len, u8 out[GHASH_BLOCK_SIZE])
+{
+	struct ghash_ctx ctx;
+
+	ghash_init(&ctx, key);
+	ghash_update(&ctx, data, len);
+	ghash_final(&ctx, out);
+}
+
 /**
  * polyval() - Compute a POLYVAL value
  * @key: The prepared key
  * @data: The message data
  * @len: The data length in bytes.  Doesn't need to be block-aligned.
diff --git a/lib/crypto/gf128hash.c b/lib/crypto/gf128hash.c
index 05f44a9193f7..2650603d8ba8 100644
--- a/lib/crypto/gf128hash.c
+++ b/lib/crypto/gf128hash.c
@@ -10,27 +10,34 @@
 #include <linux/module.h>
 #include <linux/string.h>
 #include <linux/unaligned.h>
 
 /*
- * POLYVAL is an almost-XOR-universal hash function.  Similar to GHASH, POLYVAL
- * interprets the message as the coefficients of a polynomial in GF(2^128) and
- * evaluates that polynomial at a secret point.  POLYVAL has a simple
- * mathematical relationship with GHASH, but it uses a better field convention
- * which makes it easier and faster to implement.
+ * GHASH and POLYVAL are almost-XOR-universal hash functions.  They interpret
+ * the message as the coefficients of a polynomial in the finite field GF(2^128)
+ * and evaluate that polynomial at a secret point.
  *
- * POLYVAL is not a cryptographic hash function, and it should be used only by
- * algorithms that are specifically designed to use it.
+ * Neither GHASH nor POLYVAL is a cryptographic hash function.  They should be
+ * used only by algorithms that are specifically designed to use them.
  *
- * POLYVAL is specified by "AES-GCM-SIV: Nonce Misuse-Resistant Authenticated
- * Encryption" (https://datatracker.ietf.org/doc/html/rfc8452)
+ * GHASH is the older variant, defined as part of GCM in NIST SP 800-38D
+ * (https://nvlpubs.nist.gov/nistpubs/legacy/sp/nistspecialpublication800-38d.pdf).
+ * GHASH is hard to implement directly, due to its backwards mapping between
+ * bits and polynomial coefficients.  GHASH implementations typically pre- and
+ * post-process the inputs and outputs (mainly by byte-swapping) to convert the
+ * GHASH computation into an equivalent computation over a different,
+ * easier-to-use representation of GF(2^128).
  *
- * POLYVAL is also used by HCTR2.  See "Length-preserving encryption with HCTR2"
- * (https://eprint.iacr.org/2021/1441.pdf).
+ * POLYVAL is a newer GF(2^128) polynomial hash, originally defined as part of
+ * AES-GCM-SIV (https://datatracker.ietf.org/doc/html/rfc8452) and also used by
+ * HCTR2 (https://eprint.iacr.org/2021/1441.pdf).  It uses that easier-to-use
+ * field representation directly, eliminating the data conversion steps.
  *
- * This file provides a library API for POLYVAL.  This API can delegate to
- * either a generic implementation or an architecture-optimized implementation.
+ * This file provides library APIs for GHASH and POLYVAL.  These APIs can
+ * delegate to either a generic implementation or an architecture-optimized
+ * implementation.  Due to the mathematical relationship between GHASH and
+ * POLYVAL, in some cases code for one is reused with the other.
  *
  * For the generic implementation, we don't use the traditional table approach
  * to GF(2^128) multiplication.  That approach is not constant-time and requires
  * a lot of memory.  Instead, we use a different approach which emulates
  * carryless multiplication using standard multiplications by spreading the data
@@ -203,10 +210,23 @@ polyval_mul_generic(struct polyval_elem *a, const struct polyval_elem *b)
 	/* Return (c2, c3).  This implicitly multiplies by x^-128. */
 	a->lo = cpu_to_le64(c2);
 	a->hi = cpu_to_le64(c3);
 }
 
+static void __maybe_unused ghash_blocks_generic(struct polyval_elem *acc,
+						const struct polyval_elem *key,
+						const u8 *data, size_t nblocks)
+{
+	do {
+		acc->lo ^=
+			cpu_to_le64(get_unaligned_be64((__be64 *)(data + 8)));
+		acc->hi ^= cpu_to_le64(get_unaligned_be64((__be64 *)data));
+		polyval_mul_generic(acc, key);
+		data += GHASH_BLOCK_SIZE;
+	} while (--nblocks);
+}
+
 static void __maybe_unused
 polyval_blocks_generic(struct polyval_elem *acc, const struct polyval_elem *key,
 		       const u8 *data, size_t nblocks)
 {
 	do {
@@ -215,14 +235,112 @@ polyval_blocks_generic(struct polyval_elem *acc, const struct polyval_elem *key,
 		polyval_mul_generic(acc, key);
 		data += POLYVAL_BLOCK_SIZE;
 	} while (--nblocks);
 }
 
+/* Convert the key from GHASH format to POLYVAL format. */
+static void __maybe_unused ghash_key_to_polyval(const u8 in[GHASH_BLOCK_SIZE],
+						struct polyval_elem *out)
+{
+	u64 hi = get_unaligned_be64(&in[0]);
+	u64 lo = get_unaligned_be64(&in[8]);
+	u64 mask = (s64)hi >> 63;
+
+	hi = (hi << 1) ^ (lo >> 63) ^ (mask & ((u64)0xc2 << 56));
+	lo = (lo << 1) ^ (mask & 1);
+	out->lo = cpu_to_le64(lo);
+	out->hi = cpu_to_le64(hi);
+}
+
+/* Convert the accumulator from POLYVAL format to GHASH format. */
+static void polyval_acc_to_ghash(const struct polyval_elem *in,
+				 u8 out[GHASH_BLOCK_SIZE])
+{
+	put_unaligned_be64(le64_to_cpu(in->hi), &out[0]);
+	put_unaligned_be64(le64_to_cpu(in->lo), &out[8]);
+}
+
+/* Convert the accumulator from GHASH format to POLYVAL format. */
+static void __maybe_unused ghash_acc_to_polyval(const u8 in[GHASH_BLOCK_SIZE],
+						struct polyval_elem *out)
+{
+	out->lo = cpu_to_le64(get_unaligned_be64(&in[8]));
+	out->hi = cpu_to_le64(get_unaligned_be64(&in[0]));
+}
+
 #ifdef CONFIG_CRYPTO_LIB_GF128HASH_ARCH
 #include "gf128hash.h" /* $(SRCARCH)/gf128hash.h */
 #endif
 
+void ghash_preparekey(struct ghash_key *key, const u8 raw_key[GHASH_BLOCK_SIZE])
+{
+#ifdef ghash_preparekey_arch
+	ghash_preparekey_arch(key, raw_key);
+#else
+	ghash_key_to_polyval(raw_key, &key->h);
+#endif
+}
+EXPORT_SYMBOL_GPL(ghash_preparekey);
+
+static void ghash_mul(struct ghash_ctx *ctx)
+{
+#ifdef ghash_mul_arch
+	ghash_mul_arch(&ctx->acc, ctx->key);
+#elif defined(ghash_blocks_arch)
+	static const u8 zeroes[GHASH_BLOCK_SIZE];
+
+	ghash_blocks_arch(&ctx->acc, ctx->key, zeroes, 1);
+#else
+	polyval_mul_generic(&ctx->acc, &ctx->key->h);
+#endif
+}
+
+/* nblocks is always >= 1. */
+static void ghash_blocks(struct ghash_ctx *ctx, const u8 *data, size_t nblocks)
+{
+#ifdef ghash_blocks_arch
+	ghash_blocks_arch(&ctx->acc, ctx->key, data, nblocks);
+#else
+	ghash_blocks_generic(&ctx->acc, &ctx->key->h, data, nblocks);
+#endif
+}
+
+void ghash_update(struct ghash_ctx *ctx, const u8 *data, size_t len)
+{
+	if (unlikely(ctx->partial)) {
+		size_t n = min(len, GHASH_BLOCK_SIZE - ctx->partial);
+
+		len -= n;
+		while (n--)
+			ctx->acc.bytes[GHASH_BLOCK_SIZE - 1 - ctx->partial++] ^=
+				*data++;
+		if (ctx->partial < GHASH_BLOCK_SIZE)
+			return;
+		ghash_mul(ctx);
+	}
+	if (len >= GHASH_BLOCK_SIZE) {
+		size_t nblocks = len / GHASH_BLOCK_SIZE;
+
+		ghash_blocks(ctx, data, nblocks);
+		data += len & ~(GHASH_BLOCK_SIZE - 1);
+		len &= GHASH_BLOCK_SIZE - 1;
+	}
+	for (size_t i = 0; i < len; i++)
+		ctx->acc.bytes[GHASH_BLOCK_SIZE - 1 - i] ^= data[i];
+	ctx->partial = len;
+}
+EXPORT_SYMBOL_GPL(ghash_update);
+
+void ghash_final(struct ghash_ctx *ctx, u8 out[GHASH_BLOCK_SIZE])
+{
+	if (unlikely(ctx->partial))
+		ghash_mul(ctx);
+	polyval_acc_to_ghash(&ctx->acc, out);
+	memzero_explicit(ctx, sizeof(*ctx));
+}
+EXPORT_SYMBOL_GPL(ghash_final);
+
 void polyval_preparekey(struct polyval_key *key,
 			const u8 raw_key[POLYVAL_BLOCK_SIZE])
 {
 #ifdef polyval_preparekey_arch
 	polyval_preparekey_arch(key, raw_key);
@@ -251,10 +369,11 @@ static void polyval_mul(struct polyval_ctx *ctx)
 #else
 	polyval_mul_generic(&ctx->acc, &ctx->key->h);
 #endif
 }
 
+/* nblocks is always >= 1. */
 static void polyval_blocks(struct polyval_ctx *ctx,
 			   const u8 *data, size_t nblocks)
 {
 #ifdef polyval_blocks_arch
 	polyval_blocks_arch(&ctx->acc, ctx->key, data, nblocks);
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 04/19] lib/crypto: tests: Add KUnit tests for GHASH
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (2 preceding siblings ...)
  2026-03-19  6:17 ` [PATCH 03/19] lib/crypto: gf128hash: Add GHASH support Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 05/19] crypto: arm/ghash - Make the "ghash" crypto_shash NEON-only Eric Biggers
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

Add a KUnit test suite for the GHASH library functions.

It closely mirrors the POLYVAL test suite.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 lib/crypto/.kunitconfig             |   1 +
 lib/crypto/tests/Kconfig            |   8 ++
 lib/crypto/tests/Makefile           |   1 +
 lib/crypto/tests/ghash-testvecs.h   | 186 ++++++++++++++++++++++++++
 lib/crypto/tests/ghash_kunit.c      | 194 ++++++++++++++++++++++++++++
 scripts/crypto/gen-hash-testvecs.py |  63 ++++++++-
 6 files changed, 452 insertions(+), 1 deletion(-)
 create mode 100644 lib/crypto/tests/ghash-testvecs.h
 create mode 100644 lib/crypto/tests/ghash_kunit.c

diff --git a/lib/crypto/.kunitconfig b/lib/crypto/.kunitconfig
index 63a592731d1d..391836511c8b 100644
--- a/lib/crypto/.kunitconfig
+++ b/lib/crypto/.kunitconfig
@@ -4,10 +4,11 @@ CONFIG_CRYPTO_LIB_ENABLE_ALL_FOR_KUNIT=y
 
 CONFIG_CRYPTO_LIB_AES_CBC_MACS_KUNIT_TEST=y
 CONFIG_CRYPTO_LIB_BLAKE2B_KUNIT_TEST=y
 CONFIG_CRYPTO_LIB_BLAKE2S_KUNIT_TEST=y
 CONFIG_CRYPTO_LIB_CURVE25519_KUNIT_TEST=y
+CONFIG_CRYPTO_LIB_GHASH_KUNIT_TEST=y
 CONFIG_CRYPTO_LIB_MD5_KUNIT_TEST=y
 CONFIG_CRYPTO_LIB_MLDSA_KUNIT_TEST=y
 CONFIG_CRYPTO_LIB_NH_KUNIT_TEST=y
 CONFIG_CRYPTO_LIB_POLY1305_KUNIT_TEST=y
 CONFIG_CRYPTO_LIB_POLYVAL_KUNIT_TEST=y
diff --git a/lib/crypto/tests/Kconfig b/lib/crypto/tests/Kconfig
index aa627b6b9855..279ff1a339be 100644
--- a/lib/crypto/tests/Kconfig
+++ b/lib/crypto/tests/Kconfig
@@ -33,10 +33,18 @@ config CRYPTO_LIB_CURVE25519_KUNIT_TEST
 	default KUNIT_ALL_TESTS
 	select CRYPTO_LIB_BENCHMARK_VISIBLE
 	help
 	  KUnit tests for the Curve25519 Diffie-Hellman function.
 
+config CRYPTO_LIB_GHASH_KUNIT_TEST
+	tristate "KUnit tests for GHASH" if !KUNIT_ALL_TESTS
+	depends on KUNIT && CRYPTO_LIB_GF128HASH
+	default KUNIT_ALL_TESTS
+	select CRYPTO_LIB_BENCHMARK_VISIBLE
+	help
+	  KUnit tests for GHASH library functions.
+
 config CRYPTO_LIB_MD5_KUNIT_TEST
 	tristate "KUnit tests for MD5" if !KUNIT_ALL_TESTS
 	depends on KUNIT && CRYPTO_LIB_MD5
 	default KUNIT_ALL_TESTS
 	select CRYPTO_LIB_BENCHMARK_VISIBLE
diff --git a/lib/crypto/tests/Makefile b/lib/crypto/tests/Makefile
index f864e0ffbee4..751ae507fdd0 100644
--- a/lib/crypto/tests/Makefile
+++ b/lib/crypto/tests/Makefile
@@ -2,10 +2,11 @@
 
 obj-$(CONFIG_CRYPTO_LIB_AES_CBC_MACS_KUNIT_TEST) += aes_cbc_macs_kunit.o
 obj-$(CONFIG_CRYPTO_LIB_BLAKE2B_KUNIT_TEST) += blake2b_kunit.o
 obj-$(CONFIG_CRYPTO_LIB_BLAKE2S_KUNIT_TEST) += blake2s_kunit.o
 obj-$(CONFIG_CRYPTO_LIB_CURVE25519_KUNIT_TEST) += curve25519_kunit.o
+obj-$(CONFIG_CRYPTO_LIB_GHASH_KUNIT_TEST) += ghash_kunit.o
 obj-$(CONFIG_CRYPTO_LIB_MD5_KUNIT_TEST) += md5_kunit.o
 obj-$(CONFIG_CRYPTO_LIB_MLDSA_KUNIT_TEST) += mldsa_kunit.o
 obj-$(CONFIG_CRYPTO_LIB_NH_KUNIT_TEST) += nh_kunit.o
 obj-$(CONFIG_CRYPTO_LIB_POLY1305_KUNIT_TEST) += poly1305_kunit.o
 obj-$(CONFIG_CRYPTO_LIB_POLYVAL_KUNIT_TEST) += polyval_kunit.o
diff --git a/lib/crypto/tests/ghash-testvecs.h b/lib/crypto/tests/ghash-testvecs.h
new file mode 100644
index 000000000000..759eb4072336
--- /dev/null
+++ b/lib/crypto/tests/ghash-testvecs.h
@@ -0,0 +1,186 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* This file was generated by: ./scripts/crypto/gen-hash-testvecs.py ghash */
+
+static const struct {
+	size_t data_len;
+	u8 digest[GHASH_DIGEST_SIZE];
+} hash_testvecs[] = {
+	{
+		.data_len = 0,
+		.digest = {
+			0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		},
+	},
+	{
+		.data_len = 1,
+		.digest = {
+			0x13, 0x91, 0xa1, 0x11, 0x08, 0xc3, 0x7e, 0xeb,
+			0x21, 0x42, 0x4a, 0xd6, 0x45, 0x0f, 0x41, 0xa7,
+		},
+	},
+	{
+		.data_len = 2,
+		.digest = {
+			0xde, 0x00, 0x63, 0x3f, 0x71, 0x0f, 0xc6, 0x29,
+			0x53, 0x2e, 0x49, 0xd9, 0xc2, 0xb7, 0x73, 0xce,
+		},
+	},
+	{
+		.data_len = 3,
+		.digest = {
+			0xcf, 0xc7, 0xa8, 0x20, 0x24, 0xe9, 0x7a, 0x6c,
+			0x2c, 0x2a, 0x34, 0x70, 0x26, 0xba, 0xd5, 0x9a,
+		},
+	},
+	{
+		.data_len = 16,
+		.digest = {
+			0xaa, 0xe0, 0xdc, 0x7f, 0xcf, 0x8b, 0xe6, 0x0c,
+			0x2e, 0x93, 0x89, 0x7d, 0x68, 0x4e, 0xc2, 0x63,
+		},
+	},
+	{
+		.data_len = 32,
+		.digest = {
+			0x4b, 0x8b, 0x93, 0x5c, 0x79, 0xad, 0x85, 0x08,
+			0xd3, 0x8a, 0xcd, 0xdd, 0x4c, 0x6e, 0x0e, 0x6f,
+		},
+	},
+	{
+		.data_len = 48,
+		.digest = {
+			0xfa, 0xa0, 0x25, 0xdd, 0x61, 0x9a, 0x52, 0x9a,
+			0xea, 0xee, 0xc6, 0x62, 0xb2, 0xba, 0x11, 0x49,
+		},
+	},
+	{
+		.data_len = 49,
+		.digest = {
+			0x23, 0xf1, 0x05, 0xeb, 0x30, 0x40, 0xb9, 0x1d,
+			0xe6, 0x35, 0x51, 0x4e, 0x0f, 0xc0, 0x1b, 0x9e,
+		},
+	},
+	{
+		.data_len = 63,
+		.digest = {
+			0x8d, 0xcf, 0xa0, 0xc8, 0x83, 0x21, 0x06, 0x81,
+			0xc6, 0x36, 0xd5, 0x62, 0xbf, 0xa0, 0xcd, 0x9c,
+		},
+	},
+	{
+		.data_len = 64,
+		.digest = {
+			0xe7, 0xca, 0xbe, 0xe7, 0x66, 0xc8, 0x85, 0xad,
+			0xbc, 0xaf, 0x58, 0x21, 0xd7, 0x67, 0x82, 0x15,
+		},
+	},
+	{
+		.data_len = 65,
+		.digest = {
+			0x9f, 0x48, 0x10, 0xd9, 0xa2, 0x6b, 0x9d, 0xe0,
+			0xb1, 0x87, 0xe1, 0x39, 0xc3, 0xd7, 0xee, 0x09,
+		},
+	},
+	{
+		.data_len = 127,
+		.digest = {
+			0xa4, 0x36, 0xb7, 0x82, 0xd2, 0x67, 0x7e, 0xaf,
+			0x5d, 0xfd, 0x67, 0x9c, 0x1d, 0x9f, 0xe4, 0xf7,
+		},
+	},
+	{
+		.data_len = 128,
+		.digest = {
+			0x57, 0xe7, 0x1d, 0x78, 0xf0, 0x8e, 0xc7, 0x0c,
+			0x15, 0xee, 0x18, 0xc4, 0xd1, 0x75, 0x90, 0xaa,
+		},
+	},
+	{
+		.data_len = 129,
+		.digest = {
+			0x9b, 0xad, 0x81, 0xa9, 0x22, 0xb2, 0x19, 0x53,
+			0x60, 0x30, 0xe7, 0xa0, 0x4f, 0xd6, 0x72, 0x42,
+		},
+	},
+	{
+		.data_len = 256,
+		.digest = {
+			0xf7, 0x33, 0x42, 0xbf, 0x58, 0xde, 0x88, 0x0f,
+			0x8d, 0x3d, 0xa6, 0x11, 0x14, 0xc3, 0xf1, 0xdc,
+		},
+	},
+	{
+		.data_len = 511,
+		.digest = {
+			0x59, 0xdc, 0xa9, 0xc0, 0x4e, 0xd6, 0x97, 0xb3,
+			0x60, 0xaf, 0xa8, 0xa0, 0xea, 0x54, 0x8e, 0xc3,
+		},
+	},
+	{
+		.data_len = 513,
+		.digest = {
+			0xa2, 0x23, 0x37, 0xcc, 0x97, 0xec, 0xea, 0xbe,
+			0xd6, 0xc7, 0x13, 0xf7, 0x93, 0x73, 0xc0, 0x64,
+		},
+	},
+	{
+		.data_len = 1000,
+		.digest = {
+			0x46, 0x8b, 0x43, 0x77, 0x9b, 0xc2, 0xfc, 0xa4,
+			0x68, 0x6a, 0x6c, 0x07, 0xa4, 0x6f, 0x47, 0x65,
+		},
+	},
+	{
+		.data_len = 3333,
+		.digest = {
+			0x69, 0x7f, 0x19, 0xc3, 0xb9, 0xa4, 0xff, 0x40,
+			0xe3, 0x03, 0x71, 0xa3, 0x88, 0x8a, 0xf1, 0xbd,
+		},
+	},
+	{
+		.data_len = 4096,
+		.digest = {
+			0x4d, 0x65, 0xe6, 0x9c, 0xeb, 0x6a, 0x46, 0x8d,
+			0xe9, 0x32, 0x96, 0x72, 0xb3, 0x0d, 0x08, 0xa9,
+		},
+	},
+	{
+		.data_len = 4128,
+		.digest = {
+			0xfc, 0xa1, 0x74, 0x46, 0x21, 0x64, 0xa7, 0x64,
+			0xbe, 0x47, 0x03, 0x1e, 0x05, 0xf7, 0xd8, 0x37,
+		},
+	},
+	{
+		.data_len = 4160,
+		.digest = {
+			0x70, 0x5b, 0xe9, 0x17, 0xab, 0xd5, 0xa2, 0xee,
+			0xcb, 0x39, 0xa4, 0x81, 0x2f, 0x41, 0x70, 0xae,
+		},
+	},
+	{
+		.data_len = 4224,
+		.digest = {
+			0x07, 0xbd, 0xb6, 0x52, 0xe2, 0x75, 0x2c, 0x33,
+			0x6d, 0x1b, 0x63, 0x56, 0x58, 0xda, 0x98, 0x55,
+		},
+	},
+	{
+		.data_len = 16384,
+		.digest = {
+			0x9c, 0xb5, 0xf4, 0x14, 0xe8, 0xa8, 0x4a, 0xde,
+			0xee, 0x7b, 0xbb, 0xd6, 0x21, 0x6d, 0x6a, 0x69,
+		},
+	},
+};
+
+static const u8 hash_testvec_consolidated[GHASH_DIGEST_SIZE] = {
+	0x08, 0xef, 0xf5, 0x27, 0xb1, 0xca, 0xd4, 0x1d,
+	0xad, 0x38, 0x69, 0x88, 0x6b, 0x16, 0xdf, 0xa8,
+};
+
+static const u8 ghash_allones_hashofhashes[GHASH_DIGEST_SIZE] = {
+	0xef, 0x85, 0x58, 0xf8, 0x54, 0x9c, 0x5e, 0x54,
+	0xd9, 0xbe, 0x04, 0x1f, 0xff, 0xff, 0xff, 0xff,
+};
diff --git a/lib/crypto/tests/ghash_kunit.c b/lib/crypto/tests/ghash_kunit.c
new file mode 100644
index 000000000000..68b3837a3607
--- /dev/null
+++ b/lib/crypto/tests/ghash_kunit.c
@@ -0,0 +1,194 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright 2026 Google LLC
+ */
+#include <crypto/gf128hash.h>
+#include "ghash-testvecs.h"
+
+/*
+ * A fixed key used when presenting GHASH as an unkeyed hash function in order
+ * to reuse hash-test-template.h.  At the beginning of the test suite, this is
+ * initialized to a key prepared from bytes generated from a fixed seed.
+ */
+static struct ghash_key test_key;
+
+static void ghash_init_withtestkey(struct ghash_ctx *ctx)
+{
+	ghash_init(ctx, &test_key);
+}
+
+static void ghash_withtestkey(const u8 *data, size_t len,
+			      u8 out[GHASH_BLOCK_SIZE])
+{
+	ghash(&test_key, data, len, out);
+}
+
+/* Generate the HASH_KUNIT_CASES using hash-test-template.h. */
+#define HASH ghash_withtestkey
+#define HASH_CTX ghash_ctx
+#define HASH_SIZE GHASH_BLOCK_SIZE
+#define HASH_INIT ghash_init_withtestkey
+#define HASH_UPDATE ghash_update
+#define HASH_FINAL ghash_final
+#include "hash-test-template.h"
+
+/*
+ * Test a key and messages containing all one bits.  This is useful to detect
+ * overflow bugs in implementations that emulate carryless multiplication using
+ * a series of standard multiplications with the bits spread out.
+ */
+static void test_ghash_allones_key_and_message(struct kunit *test)
+{
+	struct ghash_key key;
+	struct ghash_ctx hashofhashes_ctx;
+	u8 hash[GHASH_BLOCK_SIZE];
+
+	static_assert(TEST_BUF_LEN >= 4096);
+	memset(test_buf, 0xff, 4096);
+
+	ghash_preparekey(&key, test_buf);
+	ghash_init(&hashofhashes_ctx, &key);
+	for (size_t len = 0; len <= 4096; len += 16) {
+		ghash(&key, test_buf, len, hash);
+		ghash_update(&hashofhashes_ctx, hash, sizeof(hash));
+	}
+	ghash_final(&hashofhashes_ctx, hash);
+	KUNIT_ASSERT_MEMEQ(test, hash, ghash_allones_hashofhashes,
+			   sizeof(hash));
+}
+
+#define MAX_LEN_FOR_KEY_CHECK 1024
+
+/*
+ * Given two prepared keys which should be identical (but may differ in
+ * alignment and/or whether they are followed by a guard page or not), verify
+ * that they produce consistent results on various data lengths.
+ */
+static void check_key_consistency(struct kunit *test,
+				  const struct ghash_key *key1,
+				  const struct ghash_key *key2)
+{
+	u8 *data = test_buf;
+	u8 hash1[GHASH_BLOCK_SIZE];
+	u8 hash2[GHASH_BLOCK_SIZE];
+
+	rand_bytes(data, MAX_LEN_FOR_KEY_CHECK);
+	KUNIT_ASSERT_MEMEQ(test, key1, key2, sizeof(*key1));
+
+	for (int i = 0; i < 100; i++) {
+		size_t len = rand_length(MAX_LEN_FOR_KEY_CHECK);
+
+		ghash(key1, data, len, hash1);
+		ghash(key2, data, len, hash2);
+		KUNIT_ASSERT_MEMEQ(test, hash1, hash2, sizeof(hash1));
+	}
+}
+
+/* Test that no buffer overreads occur on either raw_key or ghash_key. */
+static void test_ghash_with_guarded_key(struct kunit *test)
+{
+	u8 raw_key[GHASH_BLOCK_SIZE];
+	u8 *guarded_raw_key = &test_buf[TEST_BUF_LEN - sizeof(raw_key)];
+	struct ghash_key key1, key2;
+	struct ghash_key *guarded_key =
+		(struct ghash_key *)&test_buf[TEST_BUF_LEN - sizeof(key1)];
+
+	/* Prepare with regular buffers. */
+	rand_bytes(raw_key, sizeof(raw_key));
+	ghash_preparekey(&key1, raw_key);
+
+	/* Prepare with guarded raw_key, then check that it works. */
+	memcpy(guarded_raw_key, raw_key, sizeof(raw_key));
+	ghash_preparekey(&key2, guarded_raw_key);
+	check_key_consistency(test, &key1, &key2);
+
+	/* Prepare guarded ghash_key, then check that it works. */
+	ghash_preparekey(guarded_key, raw_key);
+	check_key_consistency(test, &key1, guarded_key);
+}
+
+/*
+ * Test that ghash_key only needs to be aligned to
+ * __alignof__(struct ghash_key), i.e. 8 bytes.  The assembly code may prefer
+ * 16-byte or higher alignment, but it mustn't require it.
+ */
+static void test_ghash_with_minimally_aligned_key(struct kunit *test)
+{
+	u8 raw_key[GHASH_BLOCK_SIZE];
+	struct ghash_key key;
+	struct ghash_key *minaligned_key =
+		(struct ghash_key *)&test_buf[MAX_LEN_FOR_KEY_CHECK +
+					      __alignof__(struct ghash_key)];
+
+	KUNIT_ASSERT_TRUE(test, IS_ALIGNED((uintptr_t)minaligned_key,
+					   __alignof__(struct ghash_key)));
+	KUNIT_ASSERT_TRUE(test, !IS_ALIGNED((uintptr_t)minaligned_key,
+					    2 * __alignof__(struct ghash_key)));
+
+	rand_bytes(raw_key, sizeof(raw_key));
+	ghash_preparekey(&key, raw_key);
+	ghash_preparekey(minaligned_key, raw_key);
+	check_key_consistency(test, &key, minaligned_key);
+}
+
+struct ghash_irq_test_state {
+	struct ghash_key expected_key;
+	u8 raw_key[GHASH_BLOCK_SIZE];
+};
+
+static bool ghash_irq_test_func(void *state_)
+{
+	struct ghash_irq_test_state *state = state_;
+	struct ghash_key key;
+
+	ghash_preparekey(&key, state->raw_key);
+	return memcmp(&key, &state->expected_key, sizeof(key)) == 0;
+}
+
+/*
+ * Test that ghash_preparekey() produces the same output regardless of whether
+ * FPU or vector registers are usable when it is called.
+ */
+static void test_ghash_preparekey_in_irqs(struct kunit *test)
+{
+	struct ghash_irq_test_state state;
+
+	rand_bytes(state.raw_key, sizeof(state.raw_key));
+	ghash_preparekey(&state.expected_key, state.raw_key);
+	kunit_run_irq_test(test, ghash_irq_test_func, 200000, &state);
+}
+
+static int ghash_suite_init(struct kunit_suite *suite)
+{
+	u8 raw_key[GHASH_BLOCK_SIZE];
+
+	rand_bytes_seeded_from_len(raw_key, sizeof(raw_key));
+	ghash_preparekey(&test_key, raw_key);
+	return hash_suite_init(suite);
+}
+
+static void ghash_suite_exit(struct kunit_suite *suite)
+{
+	hash_suite_exit(suite);
+}
+
+static struct kunit_case ghash_test_cases[] = {
+	HASH_KUNIT_CASES,
+	KUNIT_CASE(test_ghash_allones_key_and_message),
+	KUNIT_CASE(test_ghash_with_guarded_key),
+	KUNIT_CASE(test_ghash_with_minimally_aligned_key),
+	KUNIT_CASE(test_ghash_preparekey_in_irqs),
+	KUNIT_CASE(benchmark_hash),
+	{},
+};
+
+static struct kunit_suite ghash_test_suite = {
+	.name = "ghash",
+	.test_cases = ghash_test_cases,
+	.suite_init = ghash_suite_init,
+	.suite_exit = ghash_suite_exit,
+};
+kunit_test_suite(ghash_test_suite);
+
+MODULE_DESCRIPTION("KUnit tests and benchmark for GHASH");
+MODULE_LICENSE("GPL");
diff --git a/scripts/crypto/gen-hash-testvecs.py b/scripts/crypto/gen-hash-testvecs.py
index 34b7c48f3456..e69ce213fb33 100755
--- a/scripts/crypto/gen-hash-testvecs.py
+++ b/scripts/crypto/gen-hash-testvecs.py
@@ -66,10 +66,56 @@ class Poly1305:
     # nondestructive, i.e. not changing any field of self.
     def digest(self):
         m = (self.h + self.s) % 2**128
         return m.to_bytes(16, byteorder='little')
 
+GHASH_POLY = sum((1 << i) for i in [128, 7, 2, 1, 0])
+GHASH_BLOCK_SIZE = 16
+
+# A straightforward, unoptimized implementation of GHASH.
+class Ghash:
+
+    @staticmethod
+    def reflect_bits_in_bytes(v):
+        res = 0
+        for offs in range(0, 128, 8):
+            for bit in range(8):
+                if (v & (1 << (offs + bit))) != 0:
+                    res ^= 1 << (offs + 7 - bit)
+        return res
+
+    @staticmethod
+    def bytes_to_poly(data):
+        return Ghash.reflect_bits_in_bytes(int.from_bytes(data, byteorder='little'))
+
+    @staticmethod
+    def poly_to_bytes(poly):
+        return Ghash.reflect_bits_in_bytes(poly).to_bytes(16, byteorder='little')
+
+    def __init__(self, key):
+        assert len(key) == 16
+        self.h = Ghash.bytes_to_poly(key)
+        self.acc = 0
+
+    # Note: this supports partial blocks only at the end.
+    def update(self, data):
+        for i in range(0, len(data), 16):
+            # acc += block
+            self.acc ^= Ghash.bytes_to_poly(data[i:i+16])
+            # acc = (acc * h) mod GHASH_POLY
+            product = 0
+            for j in range(127, -1, -1):
+                if (self.h & (1 << j)) != 0:
+                    product ^= self.acc << j
+                if (product & (1 << (128 + j))) != 0:
+                    product ^= GHASH_POLY << j
+            self.acc = product
+        return self
+
+    def digest(self):
+        return Ghash.poly_to_bytes(self.acc)
+
 POLYVAL_POLY = sum((1 << i) for i in [128, 127, 126, 121, 0])
 POLYVAL_BLOCK_SIZE = 16
 
 # A straightforward, unoptimized implementation of POLYVAL.
 # Reference: https://datatracker.ietf.org/doc/html/rfc8452
@@ -101,10 +147,12 @@ def hash_init(alg):
     # The keyed hash functions are assigned a fixed random key here, to present
     # them as unkeyed hash functions.  This allows all the test cases for
     # unkeyed hash functions to work on them.
     if alg == 'aes-cmac':
         return AesCmac(rand_bytes(AES_256_KEY_SIZE))
+    if alg == 'ghash':
+        return Ghash(rand_bytes(GHASH_BLOCK_SIZE))
     if alg == 'poly1305':
         return Poly1305(rand_bytes(POLY1305_KEY_SIZE))
     if alg == 'polyval':
         return Polyval(rand_bytes(POLYVAL_BLOCK_SIZE))
     return hashlib.new(alg)
@@ -255,10 +303,19 @@ def gen_additional_poly1305_testvecs():
             data += ctx.digest()
     print_static_u8_array_definition(
             'poly1305_allones_macofmacs[POLY1305_DIGEST_SIZE]',
             Poly1305(key).update(data).digest())
 
+def gen_additional_ghash_testvecs():
+    key = b'\xff' * GHASH_BLOCK_SIZE
+    hashes = b''
+    for data_len in range(0, 4097, 16):
+        hashes += Ghash(key).update(b'\xff' * data_len).digest()
+    print_static_u8_array_definition(
+            'ghash_allones_hashofhashes[GHASH_DIGEST_SIZE]',
+            Ghash(key).update(hashes).digest())
+
 def gen_additional_polyval_testvecs():
     key = b'\xff' * POLYVAL_BLOCK_SIZE
     hashes = b''
     for data_len in range(0, 4097, 16):
         hashes += Polyval(key).update(b'\xff' * data_len).digest()
@@ -266,11 +323,12 @@ def gen_additional_polyval_testvecs():
             'polyval_allones_hashofhashes[POLYVAL_DIGEST_SIZE]',
             Polyval(key).update(hashes).digest())
 
 if len(sys.argv) != 2:
     sys.stderr.write('Usage: gen-hash-testvecs.py ALGORITHM\n')
-    sys.stderr.write('ALGORITHM may be any supported by Python hashlib; or poly1305, polyval, or sha3.\n')
+    sys.stderr.write('ALGORITHM may be any supported by Python hashlib;\n')
+    sys.stderr.write('  or aes-cmac, ghash, nh, poly1305, polyval, or sha3.\n')
     sys.stderr.write('Example: gen-hash-testvecs.py sha512\n')
     sys.exit(1)
 
 alg = sys.argv[1]
 print('/* SPDX-License-Identifier: GPL-2.0-or-later */')
@@ -278,10 +336,13 @@ print(f'/* This file was generated by: {sys.argv[0]} {" ".join(sys.argv[1:])} */
 if alg == 'aes-cmac':
     gen_unkeyed_testvecs(alg)
 elif alg.startswith('blake2'):
     gen_unkeyed_testvecs(alg)
     gen_additional_blake2_testvecs(alg)
+elif alg == 'ghash':
+    gen_unkeyed_testvecs(alg)
+    gen_additional_ghash_testvecs()
 elif alg == 'nh':
     gen_nh_testvecs()
 elif alg == 'poly1305':
     gen_unkeyed_testvecs(alg)
     gen_additional_poly1305_testvecs()
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 05/19] crypto: arm/ghash - Make the "ghash" crypto_shash NEON-only
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (3 preceding siblings ...)
  2026-03-19  6:17 ` [PATCH 04/19] lib/crypto: tests: Add KUnit tests for GHASH Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 06/19] crypto: arm/ghash - Move NEON GHASH assembly into its own file Eric Biggers
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

arch/arm/crypto/ghash-ce-glue.c originally provided only a "ghash"
crypto_shash algorithm using PMULL if available, else NEON.

Significantly later, it was updated to also provide a full AES-GCM
implementation using PMULL.

This made the PMULL support in the "ghash" crypto_shash largely
obsolete.  Indeed, the arm64 equivalent of this file unconditionally
uses only ASIMD in its "ghash" crypto_shash.

Given that inconsistency, and since the NEON-only code is more easily
separable into the GHASH library than the PMULL-based code is, let's
align with arm64 and just support NEON-only for the pure GHASH.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 arch/arm/crypto/ghash-ce-glue.c | 32 ++++++--------------------------
 1 file changed, 6 insertions(+), 26 deletions(-)

diff --git a/arch/arm/crypto/ghash-ce-glue.c b/arch/arm/crypto/ghash-ce-glue.c
index 454adcc62cc6..d7d787de7dd3 100644
--- a/arch/arm/crypto/ghash-ce-glue.c
+++ b/arch/arm/crypto/ghash-ce-glue.c
@@ -34,11 +34,11 @@ MODULE_ALIAS_CRYPTO("rfc4106(gcm(aes))");
 
 #define RFC4106_NONCE_SIZE	4
 
 struct ghash_key {
 	be128	k;
-	u64	h[][2];
+	u64	h[1][2];
 };
 
 struct gcm_key {
 	u64	h[4][2];
 	u32	rk[AES_MAX_KEYLENGTH_U32];
@@ -49,16 +49,14 @@ struct gcm_key {
 struct arm_ghash_desc_ctx {
 	u64 digest[GHASH_DIGEST_SIZE/sizeof(u64)];
 };
 
 asmlinkage void pmull_ghash_update_p64(int blocks, u64 dg[], const char *src,
-				       u64 const h[][2], const char *head);
+				       u64 const h[4][2], const char *head);
 
 asmlinkage void pmull_ghash_update_p8(int blocks, u64 dg[], const char *src,
-				      u64 const h[][2], const char *head);
-
-static __ro_after_init DEFINE_STATIC_KEY_FALSE(use_p64);
+				      u64 const h[1][2], const char *head);
 
 static int ghash_init(struct shash_desc *desc)
 {
 	struct arm_ghash_desc_ctx *ctx = shash_desc_ctx(desc);
 
@@ -68,14 +66,11 @@ static int ghash_init(struct shash_desc *desc)
 
 static void ghash_do_update(int blocks, u64 dg[], const char *src,
 			    struct ghash_key *key, const char *head)
 {
 	kernel_neon_begin();
-	if (static_branch_likely(&use_p64))
-		pmull_ghash_update_p64(blocks, dg, src, key->h, head);
-	else
-		pmull_ghash_update_p8(blocks, dg, src, key->h, head);
+	pmull_ghash_update_p8(blocks, dg, src, key->h, head);
 	kernel_neon_end();
 }
 
 static int ghash_update(struct shash_desc *desc, const u8 *src,
 			unsigned int len)
@@ -145,23 +140,10 @@ static int ghash_setkey(struct crypto_shash *tfm,
 		return -EINVAL;
 
 	/* needed for the fallback */
 	memcpy(&key->k, inkey, GHASH_BLOCK_SIZE);
 	ghash_reflect(key->h[0], &key->k);
-
-	if (static_branch_likely(&use_p64)) {
-		be128 h = key->k;
-
-		gf128mul_lle(&h, &key->k);
-		ghash_reflect(key->h[1], &h);
-
-		gf128mul_lle(&h, &key->k);
-		ghash_reflect(key->h[2], &h);
-
-		gf128mul_lle(&h, &key->k);
-		ghash_reflect(key->h[3], &h);
-	}
 	return 0;
 }
 
 static struct shash_alg ghash_alg = {
 	.digestsize		= GHASH_DIGEST_SIZE,
@@ -173,15 +155,15 @@ static struct shash_alg ghash_alg = {
 	.import			= ghash_import,
 	.descsize		= sizeof(struct arm_ghash_desc_ctx),
 	.statesize		= sizeof(struct ghash_desc_ctx),
 
 	.base.cra_name		= "ghash",
-	.base.cra_driver_name	= "ghash-ce",
+	.base.cra_driver_name	= "ghash-neon",
 	.base.cra_priority	= 300,
 	.base.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
 	.base.cra_blocksize	= GHASH_BLOCK_SIZE,
-	.base.cra_ctxsize	= sizeof(struct ghash_key) + sizeof(u64[2]),
+	.base.cra_ctxsize	= sizeof(struct ghash_key),
 	.base.cra_module	= THIS_MODULE,
 };
 
 void pmull_gcm_encrypt(int blocks, u64 dg[], const char *src,
 		       struct gcm_key const *k, char *dst,
@@ -569,12 +551,10 @@ static int __init ghash_ce_mod_init(void)
 	if (elf_hwcap2 & HWCAP2_PMULL) {
 		err = crypto_register_aeads(gcm_aes_algs,
 					    ARRAY_SIZE(gcm_aes_algs));
 		if (err)
 			return err;
-		ghash_alg.base.cra_ctxsize += 3 * sizeof(u64[2]);
-		static_branch_enable(&use_p64);
 	}
 
 	err = crypto_register_shash(&ghash_alg);
 	if (err)
 		goto err_aead;
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 06/19] crypto: arm/ghash - Move NEON GHASH assembly into its own file
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (4 preceding siblings ...)
  2026-03-19  6:17 ` [PATCH 05/19] crypto: arm/ghash - Make the "ghash" crypto_shash NEON-only Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 07/19] lib/crypto: arm/ghash: Migrate optimized code into library Eric Biggers
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

arch/arm/crypto/ghash-ce-core.S implements pmull_ghash_update_p8(),
which is used only by a crypto_shash implementation of GHASH.  It also
implements other functions, such as pmull_ghash_update_p64(), which are
used only by a crypto_aead implementation of AES-GCM.

While some code is shared between pmull_ghash_update_p8() and
pmull_ghash_update_p64(), it's not very much.  Since
pmull_ghash_update_p8() will also need to be migrated into lib/crypto/
to achieve parity in the standalone GHASH support, let's move it into a
separate file ghash-neon-core.S.
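
Conceptually, both pmull_ghash_update_p64() and pmull_ghash_update_p8()
build on the same primitive: a carryless (GF(2)[x]) 64x64 -> 128-bit
polynomial multiplication.  The p64 variant gets it from a single
vmull.p64 instruction, while the p8 variant synthesizes it from 8x8 ->
16 vmull.p8 multiplies per the Camara/Gouvea/Lopez/Dahab construction.
A minimal Python sketch of that primitive (illustration only; the
function name clmul64 is not kernel code, and the assembly is heavily
optimized compared to this schoolbook loop):

```python
def clmul64(a, b):
    """Carryless 64x64 -> 128-bit multiply in GF(2)[x].

    XOR replaces addition, so there are no carries between bit
    positions.  This models what vmull.p64 does in one instruction.
    """
    result = 0
    for i in range(64):
        if (b >> i) & 1:
            result ^= a << i
    return result

# (x + 1)^2 = x^2 + 1 in GF(2)[x]: the cross terms cancel, unlike
# in an ordinary integer multiply where 3 * 3 = 9.
assert clmul64(0b11, 0b11) == 0b101
```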

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 arch/arm/crypto/Makefile          |   2 +-
 arch/arm/crypto/ghash-ce-core.S   | 171 ++----------------------
 arch/arm/crypto/ghash-neon-core.S | 207 ++++++++++++++++++++++++++++++
 3 files changed, 222 insertions(+), 158 deletions(-)
 create mode 100644 arch/arm/crypto/ghash-neon-core.S

diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index e73099e120b3..cedce94d5ee5 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -8,6 +8,6 @@ obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
 obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
 obj-$(CONFIG_CRYPTO_GHASH_ARM_CE) += ghash-arm-ce.o
 
 aes-arm-bs-y	:= aes-neonbs-core.o aes-neonbs-glue.o
 aes-arm-ce-y	:= aes-ce-core.o aes-ce-glue.o
-ghash-arm-ce-y	:= ghash-ce-core.o ghash-ce-glue.o
+ghash-arm-ce-y	:= ghash-ce-core.o ghash-ce-glue.o ghash-neon-core.o
diff --git a/arch/arm/crypto/ghash-ce-core.S b/arch/arm/crypto/ghash-ce-core.S
index 858c0d66798b..a449525d61f8 100644
--- a/arch/arm/crypto/ghash-ce-core.S
+++ b/arch/arm/crypto/ghash-ce-core.S
@@ -1,8 +1,8 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 /*
- * Accelerated GHASH implementation with NEON/ARMv8 vmull.p8/64 instructions.
+ * Accelerated AES-GCM implementation with ARMv8 Crypto Extensions.
  *
  * Copyright (C) 2015 - 2017 Linaro Ltd.
  * Copyright (C) 2023 Google LLC. <ardb@google.com>
  */
 
@@ -27,43 +27,14 @@
 	XL_H		.req	d5
 	XM_L		.req	d6
 	XM_H		.req	d7
 	XH_L		.req	d8
 
-	t0l		.req	d10
-	t0h		.req	d11
-	t1l		.req	d12
-	t1h		.req	d13
-	t2l		.req	d14
-	t2h		.req	d15
-	t3l		.req	d16
-	t3h		.req	d17
-	t4l		.req	d18
-	t4h		.req	d19
-
-	t0q		.req	q5
-	t1q		.req	q6
-	t2q		.req	q7
-	t3q		.req	q8
-	t4q		.req	q9
 	XH2		.req	q9
 
-	s1l		.req	d20
-	s1h		.req	d21
-	s2l		.req	d22
-	s2h		.req	d23
-	s3l		.req	d24
-	s3h		.req	d25
-	s4l		.req	d26
-	s4h		.req	d27
-
 	MASK		.req	d28
-	SHASH2_p8	.req	d28
 
-	k16		.req	d29
-	k32		.req	d30
-	k48		.req	d31
 	SHASH2_p64	.req	d31
 
 	HH		.req	q10
 	HH3		.req	q11
 	HH4		.req	q12
@@ -91,76 +62,10 @@
 	T3_L		.req	d16
 	T3_H		.req	d17
 
 	.text
 
-	.macro		__pmull_p64, rd, rn, rm, b1, b2, b3, b4
-	vmull.p64	\rd, \rn, \rm
-	.endm
-
-	/*
-	 * This implementation of 64x64 -> 128 bit polynomial multiplication
-	 * using vmull.p8 instructions (8x8 -> 16) is taken from the paper
-	 * "Fast Software Polynomial Multiplication on ARM Processors Using
-	 * the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and
-	 * Ricardo Dahab (https://hal.inria.fr/hal-01506572)
-	 *
-	 * It has been slightly tweaked for in-order performance, and to allow
-	 * 'rq' to overlap with 'ad' or 'bd'.
-	 */
-	.macro		__pmull_p8, rq, ad, bd, b1=t4l, b2=t3l, b3=t4l, b4=t3l
-	vext.8		t0l, \ad, \ad, #1	@ A1
-	.ifc		\b1, t4l
-	vext.8		t4l, \bd, \bd, #1	@ B1
-	.endif
-	vmull.p8	t0q, t0l, \bd		@ F = A1*B
-	vext.8		t1l, \ad, \ad, #2	@ A2
-	vmull.p8	t4q, \ad, \b1		@ E = A*B1
-	.ifc		\b2, t3l
-	vext.8		t3l, \bd, \bd, #2	@ B2
-	.endif
-	vmull.p8	t1q, t1l, \bd		@ H = A2*B
-	vext.8		t2l, \ad, \ad, #3	@ A3
-	vmull.p8	t3q, \ad, \b2		@ G = A*B2
-	veor		t0q, t0q, t4q		@ L = E + F
-	.ifc		\b3, t4l
-	vext.8		t4l, \bd, \bd, #3	@ B3
-	.endif
-	vmull.p8	t2q, t2l, \bd		@ J = A3*B
-	veor		t0l, t0l, t0h		@ t0 = (L) (P0 + P1) << 8
-	veor		t1q, t1q, t3q		@ M = G + H
-	.ifc		\b4, t3l
-	vext.8		t3l, \bd, \bd, #4	@ B4
-	.endif
-	vmull.p8	t4q, \ad, \b3		@ I = A*B3
-	veor		t1l, t1l, t1h		@ t1 = (M) (P2 + P3) << 16
-	vmull.p8	t3q, \ad, \b4		@ K = A*B4
-	vand		t0h, t0h, k48
-	vand		t1h, t1h, k32
-	veor		t2q, t2q, t4q		@ N = I + J
-	veor		t0l, t0l, t0h
-	veor		t1l, t1l, t1h
-	veor		t2l, t2l, t2h		@ t2 = (N) (P4 + P5) << 24
-	vand		t2h, t2h, k16
-	veor		t3l, t3l, t3h		@ t3 = (K) (P6 + P7) << 32
-	vmov.i64	t3h, #0
-	vext.8		t0q, t0q, t0q, #15
-	veor		t2l, t2l, t2h
-	vext.8		t1q, t1q, t1q, #14
-	vmull.p8	\rq, \ad, \bd		@ D = A*B
-	vext.8		t2q, t2q, t2q, #13
-	vext.8		t3q, t3q, t3q, #12
-	veor		t0q, t0q, t1q
-	veor		t2q, t2q, t3q
-	veor		\rq, \rq, t0q
-	veor		\rq, \rq, t2q
-	.endm
-
-	//
-	// PMULL (64x64->128) based reduction for CPUs that can do
-	// it in a single instruction.
-	//
 	.macro		__pmull_reduce_p64
 	vmull.p64	T1, XL_L, MASK
 
 	veor		XH_L, XH_L, XM_H
 	vext.8		T1, T1, T1, #8
@@ -168,34 +73,11 @@
 	veor		T1, T1, XL
 
 	vmull.p64	XL, T1_H, MASK
 	.endm
 
-	//
-	// Alternative reduction for CPUs that lack support for the
-	// 64x64->128 PMULL instruction
-	//
-	.macro		__pmull_reduce_p8
-	veor		XL_H, XL_H, XM_L
-	veor		XH_L, XH_L, XM_H
-
-	vshl.i64	T1, XL, #57
-	vshl.i64	T2, XL, #62
-	veor		T1, T1, T2
-	vshl.i64	T2, XL, #63
-	veor		T1, T1, T2
-	veor		XL_H, XL_H, T1_L
-	veor		XH_L, XH_L, T1_H
-
-	vshr.u64	T1, XL, #1
-	veor		XH, XH, XL
-	veor		XL, XL, T1
-	vshr.u64	T1, T1, #6
-	vshr.u64	XL, XL, #1
-	.endm
-
-	.macro		ghash_update, pn, enc, aggregate=1, head=1
+	.macro		ghash_update, enc, aggregate=1, head=1
 	vld1.64		{XL}, [r1]
 
 	.if		\head
 	/* do the head block first, if supplied */
 	ldr		ip, [sp]
@@ -204,12 +86,11 @@
 	vld1.64		{T1}, [ip]
 	teq		r0, #0
 	b		3f
 	.endif
 
-0:	.ifc		\pn, p64
-	.if		\aggregate
+0:	.if		\aggregate
 	tst		r0, #3			// skip until #blocks is a
 	bne		2f			// round multiple of 4
 
 	vld1.8		{XL2-XM2}, [r2]!
 1:	vld1.8		{T2-T3}, [r2]!
@@ -286,11 +167,10 @@
 	veor		T1, T1, XH
 	veor		XL, XL, T1
 
 	b		1b
 	.endif
-	.endif
 
 2:	vld1.8		{T1}, [r2]!
 
 	.ifnb		\enc
 	\enc\()_1x	T1
@@ -306,29 +186,29 @@
 
 	vext.8		IN1, T1, T1, #8
 	veor		T1_L, T1_L, XL_H
 	veor		XL, XL, IN1
 
-	__pmull_\pn	XH, XL_H, SHASH_H, s1h, s2h, s3h, s4h	@ a1 * b1
+	vmull.p64	XH, XL_H, SHASH_H		@ a1 * b1
 	veor		T1, T1, XL
-	__pmull_\pn	XL, XL_L, SHASH_L, s1l, s2l, s3l, s4l	@ a0 * b0
-	__pmull_\pn	XM, T1_L, SHASH2_\pn			@ (a1+a0)(b1+b0)
+	vmull.p64	XL, XL_L, SHASH_L		@ a0 * b0
+	vmull.p64	XM, T1_L, SHASH2_p64		@ (a1+a0)(b1+b0)
 
 4:	veor		T1, XL, XH
 	veor		XM, XM, T1
 
-	__pmull_reduce_\pn
+	__pmull_reduce_p64
 
 	veor		T1, T1, XH
 	veor		XL, XL, T1
 
 	bne		0b
 	.endm
 
 	/*
-	 * void pmull_ghash_update(int blocks, u64 dg[], const char *src,
-	 *			   struct ghash_key const *k, const char *head)
+	 * void pmull_ghash_update_p64(int blocks, u64 dg[], const char *src,
+	 *			       u64 const h[4][2], const char *head)
 	 */
 ENTRY(pmull_ghash_update_p64)
 	vld1.64		{SHASH}, [r3]!
 	vld1.64		{HH}, [r3]!
 	vld1.64		{HH3-HH4}, [r3]
@@ -339,39 +219,16 @@ ENTRY(pmull_ghash_update_p64)
 	veor		HH34_H, HH4_L, HH4_H
 
 	vmov.i8		MASK, #0xe1
 	vshl.u64	MASK, MASK, #57
 
-	ghash_update	p64
+	ghash_update
 	vst1.64		{XL}, [r1]
 
 	bx		lr
 ENDPROC(pmull_ghash_update_p64)
 
-ENTRY(pmull_ghash_update_p8)
-	vld1.64		{SHASH}, [r3]
-	veor		SHASH2_p8, SHASH_L, SHASH_H
-
-	vext.8		s1l, SHASH_L, SHASH_L, #1
-	vext.8		s2l, SHASH_L, SHASH_L, #2
-	vext.8		s3l, SHASH_L, SHASH_L, #3
-	vext.8		s4l, SHASH_L, SHASH_L, #4
-	vext.8		s1h, SHASH_H, SHASH_H, #1
-	vext.8		s2h, SHASH_H, SHASH_H, #2
-	vext.8		s3h, SHASH_H, SHASH_H, #3
-	vext.8		s4h, SHASH_H, SHASH_H, #4
-
-	vmov.i64	k16, #0xffff
-	vmov.i64	k32, #0xffffffff
-	vmov.i64	k48, #0xffffffffffff
-
-	ghash_update	p8
-	vst1.64		{XL}, [r1]
-
-	bx		lr
-ENDPROC(pmull_ghash_update_p8)
-
 	e0		.req	q9
 	e1		.req	q10
 	e2		.req	q11
 	e3		.req	q12
 	e0l		.req	d18
@@ -534,11 +391,11 @@ ENTRY(pmull_gcm_encrypt)
 	ldrd		r4, r5, [sp, #24]
 	ldrd		r6, r7, [sp, #32]
 
 	vld1.64		{SHASH}, [r3]
 
-	ghash_update	p64, enc, head=0
+	ghash_update	enc, head=0
 	vst1.64		{XL}, [r1]
 
 	pop		{r4-r8, pc}
 ENDPROC(pmull_gcm_encrypt)
 
@@ -552,11 +409,11 @@ ENTRY(pmull_gcm_decrypt)
 	ldrd		r4, r5, [sp, #24]
 	ldrd		r6, r7, [sp, #32]
 
 	vld1.64		{SHASH}, [r3]
 
-	ghash_update	p64, dec, head=0
+	ghash_update	dec, head=0
 	vst1.64		{XL}, [r1]
 
 	pop		{r4-r8, pc}
 ENDPROC(pmull_gcm_decrypt)
 
@@ -601,11 +458,11 @@ ENTRY(pmull_gcm_enc_final)
 	vmov.i8		MASK, #0xe1
 	veor		SHASH2_p64, SHASH_L, SHASH_H
 	vshl.u64	MASK, MASK, #57
 	mov		r0, #1
 	bne		3f			// process head block first
-	ghash_update	p64, aggregate=0, head=0
+	ghash_update	aggregate=0, head=0
 
 	vrev64.8	XL, XL
 	vext.8		XL, XL, XL, #8
 	veor		XL, XL, e1
 
@@ -658,11 +515,11 @@ ENTRY(pmull_gcm_dec_final)
 	vmov.i8		MASK, #0xe1
 	veor		SHASH2_p64, SHASH_L, SHASH_H
 	vshl.u64	MASK, MASK, #57
 	mov		r0, #1
 	bne		3f			// process head block first
-	ghash_update	p64, aggregate=0, head=0
+	ghash_update	aggregate=0, head=0
 
 	vrev64.8	XL, XL
 	vext.8		XL, XL, XL, #8
 	veor		XL, XL, e1
 
diff --git a/arch/arm/crypto/ghash-neon-core.S b/arch/arm/crypto/ghash-neon-core.S
new file mode 100644
index 000000000000..bdf6fb6d063c
--- /dev/null
+++ b/arch/arm/crypto/ghash-neon-core.S
@@ -0,0 +1,207 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Accelerated GHASH implementation with NEON vmull.p8 instructions.
+ *
+ * Copyright (C) 2015 - 2017 Linaro Ltd.
+ * Copyright (C) 2023 Google LLC. <ardb@google.com>
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+	.fpu		neon
+
+	SHASH		.req	q0
+	T1		.req	q1
+	XL		.req	q2
+	XM		.req	q3
+	XH		.req	q4
+	IN1		.req	q4
+
+	SHASH_L		.req	d0
+	SHASH_H		.req	d1
+	T1_L		.req	d2
+	T1_H		.req	d3
+	XL_L		.req	d4
+	XL_H		.req	d5
+	XM_L		.req	d6
+	XM_H		.req	d7
+	XH_L		.req	d8
+
+	t0l		.req	d10
+	t0h		.req	d11
+	t1l		.req	d12
+	t1h		.req	d13
+	t2l		.req	d14
+	t2h		.req	d15
+	t3l		.req	d16
+	t3h		.req	d17
+	t4l		.req	d18
+	t4h		.req	d19
+
+	t0q		.req	q5
+	t1q		.req	q6
+	t2q		.req	q7
+	t3q		.req	q8
+	t4q		.req	q9
+
+	s1l		.req	d20
+	s1h		.req	d21
+	s2l		.req	d22
+	s2h		.req	d23
+	s3l		.req	d24
+	s3h		.req	d25
+	s4l		.req	d26
+	s4h		.req	d27
+
+	SHASH2_p8	.req	d28
+
+	k16		.req	d29
+	k32		.req	d30
+	k48		.req	d31
+
+	T2		.req	q7
+
+	.text
+
+	/*
+	 * This implementation of 64x64 -> 128 bit polynomial multiplication
+	 * using vmull.p8 instructions (8x8 -> 16) is taken from the paper
+	 * "Fast Software Polynomial Multiplication on ARM Processors Using
+	 * the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and
+	 * Ricardo Dahab (https://hal.inria.fr/hal-01506572)
+	 *
+	 * It has been slightly tweaked for in-order performance, and to allow
+	 * 'rq' to overlap with 'ad' or 'bd'.
+	 */
+	.macro		__pmull_p8, rq, ad, bd, b1=t4l, b2=t3l, b3=t4l, b4=t3l
+	vext.8		t0l, \ad, \ad, #1	@ A1
+	.ifc		\b1, t4l
+	vext.8		t4l, \bd, \bd, #1	@ B1
+	.endif
+	vmull.p8	t0q, t0l, \bd		@ F = A1*B
+	vext.8		t1l, \ad, \ad, #2	@ A2
+	vmull.p8	t4q, \ad, \b1		@ E = A*B1
+	.ifc		\b2, t3l
+	vext.8		t3l, \bd, \bd, #2	@ B2
+	.endif
+	vmull.p8	t1q, t1l, \bd		@ H = A2*B
+	vext.8		t2l, \ad, \ad, #3	@ A3
+	vmull.p8	t3q, \ad, \b2		@ G = A*B2
+	veor		t0q, t0q, t4q		@ L = E + F
+	.ifc		\b3, t4l
+	vext.8		t4l, \bd, \bd, #3	@ B3
+	.endif
+	vmull.p8	t2q, t2l, \bd		@ J = A3*B
+	veor		t0l, t0l, t0h		@ t0 = (L) (P0 + P1) << 8
+	veor		t1q, t1q, t3q		@ M = G + H
+	.ifc		\b4, t3l
+	vext.8		t3l, \bd, \bd, #4	@ B4
+	.endif
+	vmull.p8	t4q, \ad, \b3		@ I = A*B3
+	veor		t1l, t1l, t1h		@ t1 = (M) (P2 + P3) << 16
+	vmull.p8	t3q, \ad, \b4		@ K = A*B4
+	vand		t0h, t0h, k48
+	vand		t1h, t1h, k32
+	veor		t2q, t2q, t4q		@ N = I + J
+	veor		t0l, t0l, t0h
+	veor		t1l, t1l, t1h
+	veor		t2l, t2l, t2h		@ t2 = (N) (P4 + P5) << 24
+	vand		t2h, t2h, k16
+	veor		t3l, t3l, t3h		@ t3 = (K) (P6 + P7) << 32
+	vmov.i64	t3h, #0
+	vext.8		t0q, t0q, t0q, #15
+	veor		t2l, t2l, t2h
+	vext.8		t1q, t1q, t1q, #14
+	vmull.p8	\rq, \ad, \bd		@ D = A*B
+	vext.8		t2q, t2q, t2q, #13
+	vext.8		t3q, t3q, t3q, #12
+	veor		t0q, t0q, t1q
+	veor		t2q, t2q, t3q
+	veor		\rq, \rq, t0q
+	veor		\rq, \rq, t2q
+	.endm
+
+	.macro		__pmull_reduce_p8
+	veor		XL_H, XL_H, XM_L
+	veor		XH_L, XH_L, XM_H
+
+	vshl.i64	T1, XL, #57
+	vshl.i64	T2, XL, #62
+	veor		T1, T1, T2
+	vshl.i64	T2, XL, #63
+	veor		T1, T1, T2
+	veor		XL_H, XL_H, T1_L
+	veor		XH_L, XH_L, T1_H
+
+	vshr.u64	T1, XL, #1
+	veor		XH, XH, XL
+	veor		XL, XL, T1
+	vshr.u64	T1, T1, #6
+	vshr.u64	XL, XL, #1
+	.endm
+
+	.macro		ghash_update
+	vld1.64		{XL}, [r1]
+
+	/* do the head block first, if supplied */
+	ldr		ip, [sp]
+	teq		ip, #0
+	beq		0f
+	vld1.64		{T1}, [ip]
+	teq		r0, #0
+	b		3f
+
+0:
+	vld1.8		{T1}, [r2]!
+	subs		r0, r0, #1
+
+3:	/* multiply XL by SHASH in GF(2^128) */
+	vrev64.8	T1, T1
+
+	vext.8		IN1, T1, T1, #8
+	veor		T1_L, T1_L, XL_H
+	veor		XL, XL, IN1
+
+	__pmull_p8	XH, XL_H, SHASH_H, s1h, s2h, s3h, s4h	@ a1 * b1
+	veor		T1, T1, XL
+	__pmull_p8	XL, XL_L, SHASH_L, s1l, s2l, s3l, s4l	@ a0 * b0
+	__pmull_p8	XM, T1_L, SHASH2_p8			@ (a1+a0)(b1+b0)
+
+	veor		T1, XL, XH
+	veor		XM, XM, T1
+
+	__pmull_reduce_p8
+
+	veor		T1, T1, XH
+	veor		XL, XL, T1
+
+	bne		0b
+	.endm
+
+	/*
+	 * void pmull_ghash_update_p8(int blocks, u64 dg[], const char *src,
+	 *			      u64 const h[1][2], const char *head)
+	 */
+ENTRY(pmull_ghash_update_p8)
+	vld1.64		{SHASH}, [r3]
+	veor		SHASH2_p8, SHASH_L, SHASH_H
+
+	vext.8		s1l, SHASH_L, SHASH_L, #1
+	vext.8		s2l, SHASH_L, SHASH_L, #2
+	vext.8		s3l, SHASH_L, SHASH_L, #3
+	vext.8		s4l, SHASH_L, SHASH_L, #4
+	vext.8		s1h, SHASH_H, SHASH_H, #1
+	vext.8		s2h, SHASH_H, SHASH_H, #2
+	vext.8		s3h, SHASH_H, SHASH_H, #3
+	vext.8		s4h, SHASH_H, SHASH_H, #4
+
+	vmov.i64	k16, #0xffff
+	vmov.i64	k32, #0xffffffff
+	vmov.i64	k48, #0xffffffffffff
+
+	ghash_update
+	vst1.64		{XL}, [r1]
+
+	bx		lr
+ENDPROC(pmull_ghash_update_p8)
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 07/19] lib/crypto: arm/ghash: Migrate optimized code into library
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (5 preceding siblings ...)
  2026-03-19  6:17 ` [PATCH 06/19] crypto: arm/ghash - Move NEON GHASH assembly into its own file Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 08/19] crypto: arm64/ghash - Move NEON GHASH assembly into its own file Eric Biggers
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

Remove the "ghash-neon" crypto_shash algorithm.  Move the corresponding
assembly code into lib/crypto/, and wire it up to the GHASH library.

This makes the GHASH library optimized on arm (though only with NEON,
not PMULL; for now the goal is just parity with crypto_shash).  It
greatly reduces the amount of arm-specific glue code that is needed, and
it fixes the issue where this optimization was disabled by default.

To integrate the assembly code correctly with the library, make the
following tweaks:

- Change the type of 'blocks' from int to size_t.
- Change the types of 'dg' and 'k' to polyval_elem.  Note that this
  simply reflects the format that the code was already using, at least
  on little endian CPUs.  For big endian CPUs, add byte-swaps.
- Remove the 'head' argument, which is no longer needed.
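
For reference, the key format consumed by the assembly is produced by
the existing ghash_reflect() helper, which shifts the GHASH key H left
by one bit and folds the carry back in with the GHASH reduction
constant.  A Python model of that helper (a sketch mirroring the C
code, not part of the patch; the two-tuple return value stands in for
the u64 h[2] array):

```python
MASK64 = (1 << 64) - 1

def ghash_reflect(k_hi, k_lo):
    """Model of the kernel's ghash_reflect().

    Takes the GHASH key H as its two big-endian u64 halves and returns
    the shifted-by-one form that the PMULL/NEON multiplication code
    expects, with the carry out of the top bit folded back in via the
    reduction constant 0xc2 << 56.
    """
    carry = k_hi >> 63
    h0 = ((k_lo << 1) | carry) & MASK64
    h1 = ((k_hi << 1) | (k_lo >> 63)) & MASK64
    if carry:
        h1 ^= 0xc200000000000000
    return h0, h1
```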

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 arch/arm/crypto/Kconfig                       |  13 +-
 arch/arm/crypto/Makefile                      |   2 +-
 arch/arm/crypto/ghash-ce-glue.c               | 144 +-----------------
 lib/crypto/Kconfig                            |   1 +
 lib/crypto/Makefile                           |   1 +
 lib/crypto/arm/gf128hash.h                    |  43 ++++++
 .../crypto/arm}/ghash-neon-core.S             |  24 +--
 7 files changed, 66 insertions(+), 162 deletions(-)
 create mode 100644 lib/crypto/arm/gf128hash.h
 rename {arch/arm/crypto => lib/crypto/arm}/ghash-neon-core.S (92%)

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index b9c28c818b7c..f884b8b2fd93 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -1,30 +1,21 @@
 # SPDX-License-Identifier: GPL-2.0
 
 menu "Accelerated Cryptographic Algorithms for CPU (arm)"
 
 config CRYPTO_GHASH_ARM_CE
-	tristate "Hash functions: GHASH (PMULL/NEON/ARMv8 Crypto Extensions)"
+	tristate "AEAD cipher: AES in GCM mode (ARMv8 Crypto Extensions)"
 	depends on KERNEL_MODE_NEON
 	select CRYPTO_AEAD
-	select CRYPTO_HASH
-	select CRYPTO_CRYPTD
 	select CRYPTO_LIB_AES
 	select CRYPTO_LIB_GF128MUL
 	help
-	  GCM GHASH function (NIST SP800-38D)
+	  AEAD cipher: AES-GCM
 
 	  Architecture: arm using
-	  - PMULL (Polynomial Multiply Long) instructions
-	  - NEON (Advanced SIMD) extensions
 	  - ARMv8 Crypto Extensions
 
-	  Use an implementation of GHASH (used by the GCM AEAD chaining mode)
-	  that uses the 64x64 to 128 bit polynomial multiplication (vmull.p64)
-	  that is part of the ARMv8 Crypto Extensions, or a slower variant that
-	  uses the vmull.p8 instruction that is part of the basic NEON ISA.
-
 config CRYPTO_AES_ARM_BS
 	tristate "Ciphers: AES, modes: ECB/CBC/CTR/XTS (bit-sliced NEON)"
 	depends on KERNEL_MODE_NEON
 	select CRYPTO_SKCIPHER
 	select CRYPTO_LIB_AES
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index cedce94d5ee5..e73099e120b3 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -8,6 +8,6 @@ obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
 obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
 obj-$(CONFIG_CRYPTO_GHASH_ARM_CE) += ghash-arm-ce.o
 
 aes-arm-bs-y	:= aes-neonbs-core.o aes-neonbs-glue.o
 aes-arm-ce-y	:= aes-ce-core.o aes-ce-glue.o
-ghash-arm-ce-y	:= ghash-ce-core.o ghash-ce-glue.o ghash-neon-core.o
+ghash-arm-ce-y	:= ghash-ce-core.o ghash-ce-glue.o
diff --git a/arch/arm/crypto/ghash-ce-glue.c b/arch/arm/crypto/ghash-ce-glue.c
index d7d787de7dd3..9aa0ada5b627 100644
--- a/arch/arm/crypto/ghash-ce-glue.c
+++ b/arch/arm/crypto/ghash-ce-glue.c
@@ -1,8 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Accelerated GHASH implementation with ARMv8 vmull.p64 instructions.
+ * AES-GCM using ARMv8 Crypto Extensions
  *
  * Copyright (C) 2015 - 2018 Linaro Ltd.
  * Copyright (C) 2023 Google LLC.
  */
 
@@ -12,116 +12,38 @@
 #include <crypto/b128ops.h>
 #include <crypto/gcm.h>
 #include <crypto/gf128mul.h>
 #include <crypto/ghash.h>
 #include <crypto/internal/aead.h>
-#include <crypto/internal/hash.h>
 #include <crypto/internal/skcipher.h>
 #include <crypto/scatterwalk.h>
 #include <linux/cpufeature.h>
 #include <linux/errno.h>
 #include <linux/jump_label.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/string.h>
 #include <linux/unaligned.h>
 
-MODULE_DESCRIPTION("GHASH hash function using ARMv8 Crypto Extensions");
+MODULE_DESCRIPTION("AES-GCM using ARMv8 Crypto Extensions");
 MODULE_AUTHOR("Ard Biesheuvel <ardb@kernel.org>");
 MODULE_LICENSE("GPL");
-MODULE_ALIAS_CRYPTO("ghash");
 MODULE_ALIAS_CRYPTO("gcm(aes)");
 MODULE_ALIAS_CRYPTO("rfc4106(gcm(aes))");
 
 #define RFC4106_NONCE_SIZE	4
 
-struct ghash_key {
-	be128	k;
-	u64	h[1][2];
-};
-
 struct gcm_key {
 	u64	h[4][2];
 	u32	rk[AES_MAX_KEYLENGTH_U32];
 	int	rounds;
 	u8	nonce[];	// for RFC4106 nonce
 };
 
-struct arm_ghash_desc_ctx {
-	u64 digest[GHASH_DIGEST_SIZE/sizeof(u64)];
-};
-
 asmlinkage void pmull_ghash_update_p64(int blocks, u64 dg[], const char *src,
 				       u64 const h[4][2], const char *head);
 
-asmlinkage void pmull_ghash_update_p8(int blocks, u64 dg[], const char *src,
-				      u64 const h[1][2], const char *head);
-
-static int ghash_init(struct shash_desc *desc)
-{
-	struct arm_ghash_desc_ctx *ctx = shash_desc_ctx(desc);
-
-	*ctx = (struct arm_ghash_desc_ctx){};
-	return 0;
-}
-
-static void ghash_do_update(int blocks, u64 dg[], const char *src,
-			    struct ghash_key *key, const char *head)
-{
-	kernel_neon_begin();
-	pmull_ghash_update_p8(blocks, dg, src, key->h, head);
-	kernel_neon_end();
-}
-
-static int ghash_update(struct shash_desc *desc, const u8 *src,
-			unsigned int len)
-{
-	struct ghash_key *key = crypto_shash_ctx(desc->tfm);
-	struct arm_ghash_desc_ctx *ctx = shash_desc_ctx(desc);
-	int blocks;
-
-	blocks = len / GHASH_BLOCK_SIZE;
-	ghash_do_update(blocks, ctx->digest, src, key, NULL);
-	return len - blocks * GHASH_BLOCK_SIZE;
-}
-
-static int ghash_export(struct shash_desc *desc, void *out)
-{
-	struct arm_ghash_desc_ctx *ctx = shash_desc_ctx(desc);
-	u8 *dst = out;
-
-	put_unaligned_be64(ctx->digest[1], dst);
-	put_unaligned_be64(ctx->digest[0], dst + 8);
-	return 0;
-}
-
-static int ghash_import(struct shash_desc *desc, const void *in)
-{
-	struct arm_ghash_desc_ctx *ctx = shash_desc_ctx(desc);
-	const u8 *src = in;
-
-	ctx->digest[1] = get_unaligned_be64(src);
-	ctx->digest[0] = get_unaligned_be64(src + 8);
-	return 0;
-}
-
-static int ghash_finup(struct shash_desc *desc, const u8 *src,
-		       unsigned int len, u8 *dst)
-{
-	struct ghash_key *key = crypto_shash_ctx(desc->tfm);
-	struct arm_ghash_desc_ctx *ctx = shash_desc_ctx(desc);
-
-	if (len) {
-		u8 buf[GHASH_BLOCK_SIZE] = {};
-
-		memcpy(buf, src, len);
-		ghash_do_update(1, ctx->digest, buf, key, NULL);
-		memzero_explicit(buf, sizeof(buf));
-	}
-	return ghash_export(desc, dst);
-}
-
 static void ghash_reflect(u64 h[], const be128 *k)
 {
 	u64 carry = be64_to_cpu(k->a) >> 63;
 
 	h[0] = (be64_to_cpu(k->b) << 1) | carry;
@@ -129,44 +51,10 @@ static void ghash_reflect(u64 h[], const be128 *k)
 
 	if (carry)
 		h[1] ^= 0xc200000000000000UL;
 }
 
-static int ghash_setkey(struct crypto_shash *tfm,
-			const u8 *inkey, unsigned int keylen)
-{
-	struct ghash_key *key = crypto_shash_ctx(tfm);
-
-	if (keylen != GHASH_BLOCK_SIZE)
-		return -EINVAL;
-
-	/* needed for the fallback */
-	memcpy(&key->k, inkey, GHASH_BLOCK_SIZE);
-	ghash_reflect(key->h[0], &key->k);
-	return 0;
-}
-
-static struct shash_alg ghash_alg = {
-	.digestsize		= GHASH_DIGEST_SIZE,
-	.init			= ghash_init,
-	.update			= ghash_update,
-	.finup			= ghash_finup,
-	.setkey			= ghash_setkey,
-	.export			= ghash_export,
-	.import			= ghash_import,
-	.descsize		= sizeof(struct arm_ghash_desc_ctx),
-	.statesize		= sizeof(struct ghash_desc_ctx),
-
-	.base.cra_name		= "ghash",
-	.base.cra_driver_name	= "ghash-neon",
-	.base.cra_priority	= 300,
-	.base.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
-	.base.cra_blocksize	= GHASH_BLOCK_SIZE,
-	.base.cra_ctxsize	= sizeof(struct ghash_key),
-	.base.cra_module	= THIS_MODULE,
-};
-
 void pmull_gcm_encrypt(int blocks, u64 dg[], const char *src,
 		       struct gcm_key const *k, char *dst,
 		       const char *iv, int rounds, u32 counter);
 
 void pmull_gcm_enc_final(int blocks, u64 dg[], char *tag,
@@ -541,40 +429,18 @@ static struct aead_alg gcm_aes_algs[] = {{
 	.base.cra_module	= THIS_MODULE,
 }};
 
 static int __init ghash_ce_mod_init(void)
 {
-	int err;
-
-	if (!(elf_hwcap & HWCAP_NEON))
+	if (!(elf_hwcap & HWCAP_NEON) || !(elf_hwcap2 & HWCAP2_PMULL))
 		return -ENODEV;
 
-	if (elf_hwcap2 & HWCAP2_PMULL) {
-		err = crypto_register_aeads(gcm_aes_algs,
-					    ARRAY_SIZE(gcm_aes_algs));
-		if (err)
-			return err;
-	}
-
-	err = crypto_register_shash(&ghash_alg);
-	if (err)
-		goto err_aead;
-
-	return 0;
-
-err_aead:
-	if (elf_hwcap2 & HWCAP2_PMULL)
-		crypto_unregister_aeads(gcm_aes_algs,
-					ARRAY_SIZE(gcm_aes_algs));
-	return err;
+	return crypto_register_aeads(gcm_aes_algs, ARRAY_SIZE(gcm_aes_algs));
 }
 
 static void __exit ghash_ce_mod_exit(void)
 {
-	crypto_unregister_shash(&ghash_alg);
-	if (elf_hwcap2 & HWCAP2_PMULL)
-		crypto_unregister_aeads(gcm_aes_algs,
-					ARRAY_SIZE(gcm_aes_algs));
+	crypto_unregister_aeads(gcm_aes_algs, ARRAY_SIZE(gcm_aes_algs));
 }
 
 module_init(ghash_ce_mod_init);
 module_exit(ghash_ce_mod_exit);
diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig
index 98cedd95c2a5..4f1a79883a56 100644
--- a/lib/crypto/Kconfig
+++ b/lib/crypto/Kconfig
@@ -117,10 +117,11 @@ config CRYPTO_LIB_GF128HASH
 	  uses any of the functions from <crypto/gf128hash.h>.
 
 config CRYPTO_LIB_GF128HASH_ARCH
 	bool
 	depends on CRYPTO_LIB_GF128HASH && !UML
+	default y if ARM && KERNEL_MODE_NEON
 	default y if ARM64
 	default y if X86_64
 
 config CRYPTO_LIB_MD5
 	tristate
diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
index fc30622123d2..8a06dd6a43ea 100644
--- a/lib/crypto/Makefile
+++ b/lib/crypto/Makefile
@@ -156,10 +156,11 @@ libdes-y					:= des.o
 
 obj-$(CONFIG_CRYPTO_LIB_GF128HASH) += libgf128hash.o
 libgf128hash-y := gf128hash.o
 ifeq ($(CONFIG_CRYPTO_LIB_GF128HASH_ARCH),y)
 CFLAGS_gf128hash.o += -I$(src)/$(SRCARCH)
+libgf128hash-$(CONFIG_ARM) += arm/ghash-neon-core.o
 libgf128hash-$(CONFIG_ARM64) += arm64/polyval-ce-core.o
 libgf128hash-$(CONFIG_X86) += x86/polyval-pclmul-avx.o
 endif
 
 ################################################################################
diff --git a/lib/crypto/arm/gf128hash.h b/lib/crypto/arm/gf128hash.h
new file mode 100644
index 000000000000..cb929bed29d5
--- /dev/null
+++ b/lib/crypto/arm/gf128hash.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * GHASH, arm optimized
+ *
+ * Copyright 2026 Google LLC
+ */
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#include <asm/simd.h>
+
+static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_neon);
+
+void pmull_ghash_update_p8(size_t blocks, struct polyval_elem *dg,
+			   const u8 *src, const struct polyval_elem *k);
+
+#define ghash_blocks_arch ghash_blocks_arch
+static void ghash_blocks_arch(struct polyval_elem *acc,
+			      const struct ghash_key *key,
+			      const u8 *data, size_t nblocks)
+{
+	if (static_branch_likely(&have_neon) && may_use_simd()) {
+		do {
+			/* Allow rescheduling every 4 KiB. */
+			size_t n =
+				min_t(size_t, nblocks, 4096 / GHASH_BLOCK_SIZE);
+
+			scoped_ksimd()
+				pmull_ghash_update_p8(n, acc, data, &key->h);
+			data += n * GHASH_BLOCK_SIZE;
+			nblocks -= n;
+		} while (nblocks);
+	} else {
+		ghash_blocks_generic(acc, &key->h, data, nblocks);
+	}
+}
+
+#define gf128hash_mod_init_arch gf128hash_mod_init_arch
+static void gf128hash_mod_init_arch(void)
+{
+	if (elf_hwcap & HWCAP_NEON)
+		static_branch_enable(&have_neon);
+}
diff --git a/arch/arm/crypto/ghash-neon-core.S b/lib/crypto/arm/ghash-neon-core.S
similarity index 92%
rename from arch/arm/crypto/ghash-neon-core.S
rename to lib/crypto/arm/ghash-neon-core.S
index bdf6fb6d063c..bf423fb06a75 100644
--- a/arch/arm/crypto/ghash-neon-core.S
+++ b/lib/crypto/arm/ghash-neon-core.S
@@ -139,26 +139,25 @@
 	veor		XL, XL, T1
 	vshr.u64	T1, T1, #6
 	vshr.u64	XL, XL, #1
 	.endm
 
+	.macro		vrev64_if_be	a
+#ifdef CONFIG_CPU_BIG_ENDIAN
+	vrev64.8	\a, \a
+#endif
+	.endm
+
 	.macro		ghash_update
 	vld1.64		{XL}, [r1]
-
-	/* do the head block first, if supplied */
-	ldr		ip, [sp]
-	teq		ip, #0
-	beq		0f
-	vld1.64		{T1}, [ip]
-	teq		r0, #0
-	b		3f
+	vrev64_if_be	XL
 
 0:
 	vld1.8		{T1}, [r2]!
 	subs		r0, r0, #1
 
-3:	/* multiply XL by SHASH in GF(2^128) */
+	/* multiply XL by SHASH in GF(2^128) */
 	vrev64.8	T1, T1
 
 	vext.8		IN1, T1, T1, #8
 	veor		T1_L, T1_L, XL_H
 	veor		XL, XL, IN1
@@ -178,15 +177,17 @@
 
 	bne		0b
 	.endm
 
 	/*
-	 * void pmull_ghash_update_p8(int blocks, u64 dg[], const char *src,
-	 *			      u64 const h[1][2], const char *head)
+	 * void pmull_ghash_update_p8(size_t blocks, struct polyval_elem *dg,
+	 *			      const u8 *src,
+	 *			      const struct polyval_elem *k)
 	 */
 ENTRY(pmull_ghash_update_p8)
 	vld1.64		{SHASH}, [r3]
+	vrev64_if_be	SHASH
 	veor		SHASH2_p8, SHASH_L, SHASH_H
 
 	vext.8		s1l, SHASH_L, SHASH_L, #1
 	vext.8		s2l, SHASH_L, SHASH_L, #2
 	vext.8		s3l, SHASH_L, SHASH_L, #3
@@ -199,9 +200,10 @@ ENTRY(pmull_ghash_update_p8)
 	vmov.i64	k16, #0xffff
 	vmov.i64	k32, #0xffffffff
 	vmov.i64	k48, #0xffffffffffff
 
 	ghash_update
+	vrev64_if_be	XL
 	vst1.64		{XL}, [r1]
 
 	bx		lr
 ENDPROC(pmull_ghash_update_p8)
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 08/19] crypto: arm64/ghash - Move NEON GHASH assembly into its own file
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (6 preceding siblings ...)
  2026-03-19  6:17 ` [PATCH 07/19] lib/crypto: arm/ghash: Migrate optimized code into library Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 09/19] lib/crypto: arm64/ghash: Migrate optimized code into library Eric Biggers
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

arch/arm64/crypto/ghash-ce-core.S implements pmull_ghash_update_p8(),
which is used only by a crypto_shash implementation of GHASH.  It also
implements other functions, such as pmull_ghash_update_p64(), which are
used only by a crypto_aead implementation of AES-GCM.

While some code is shared between pmull_ghash_update_p8() and
pmull_ghash_update_p64(), it's not very much.  Since
pmull_ghash_update_p8() will also need to be migrated into lib/crypto/
to achieve parity in the standalone GHASH support, let's move it into a
separate file ghash-neon-core.S.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 arch/arm64/crypto/Makefile          |   2 +-
 arch/arm64/crypto/ghash-ce-core.S   | 207 ++-----------------------
 arch/arm64/crypto/ghash-neon-core.S | 226 ++++++++++++++++++++++++++++
 3 files changed, 239 insertions(+), 196 deletions(-)
 create mode 100644 arch/arm64/crypto/ghash-neon-core.S

diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index 8a8e3e551ed3..b7ba43ce8584 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -25,11 +25,11 @@ sm4-ce-gcm-y := sm4-ce-gcm-glue.o sm4-ce-gcm-core.o
 
 obj-$(CONFIG_CRYPTO_SM4_ARM64_NEON_BLK) += sm4-neon.o
 sm4-neon-y := sm4-neon-glue.o sm4-neon-core.o
 
 obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o
-ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
+ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o ghash-neon-core.o
 
 obj-$(CONFIG_CRYPTO_AES_ARM64_CE_CCM) += aes-ce-ccm.o
 aes-ce-ccm-y := aes-ce-ccm-glue.o aes-ce-ccm-core.o
 
 obj-$(CONFIG_CRYPTO_AES_ARM64_CE_BLK) += aes-ce-blk.o
diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S
index 23ee9a5eaf27..4344fe213d14 100644
--- a/arch/arm64/crypto/ghash-ce-core.S
+++ b/arch/arm64/crypto/ghash-ce-core.S
@@ -1,8 +1,8 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 /*
- * Accelerated GHASH implementation with ARMv8 PMULL instructions.
+ * Accelerated AES-GCM implementation with ARMv8 Crypto Extensions.
  *
  * Copyright (C) 2014 - 2018 Linaro Ltd. <ard.biesheuvel@linaro.org>
  */
 
 #include <linux/linkage.h>
@@ -17,35 +17,10 @@
 	XM		.req	v5
 	XL		.req	v6
 	XH		.req	v7
 	IN1		.req	v7
 
-	k00_16		.req	v8
-	k32_48		.req	v9
-
-	t3		.req	v10
-	t4		.req	v11
-	t5		.req	v12
-	t6		.req	v13
-	t7		.req	v14
-	t8		.req	v15
-	t9		.req	v16
-
-	perm1		.req	v17
-	perm2		.req	v18
-	perm3		.req	v19
-
-	sh1		.req	v20
-	sh2		.req	v21
-	sh3		.req	v22
-	sh4		.req	v23
-
-	ss1		.req	v24
-	ss2		.req	v25
-	ss3		.req	v26
-	ss4		.req	v27
-
 	XL2		.req	v8
 	XM2		.req	v9
 	XH2		.req	v10
 	XL3		.req	v11
 	XM3		.req	v12
@@ -58,94 +33,10 @@
 	HH34		.req	v19
 
 	.text
 	.arch		armv8-a+crypto
 
-	.macro		__pmull_p64, rd, rn, rm
-	pmull		\rd\().1q, \rn\().1d, \rm\().1d
-	.endm
-
-	.macro		__pmull2_p64, rd, rn, rm
-	pmull2		\rd\().1q, \rn\().2d, \rm\().2d
-	.endm
-
-	.macro		__pmull_p8, rq, ad, bd
-	ext		t3.8b, \ad\().8b, \ad\().8b, #1		// A1
-	ext		t5.8b, \ad\().8b, \ad\().8b, #2		// A2
-	ext		t7.8b, \ad\().8b, \ad\().8b, #3		// A3
-
-	__pmull_p8_\bd	\rq, \ad
-	.endm
-
-	.macro		__pmull2_p8, rq, ad, bd
-	tbl		t3.16b, {\ad\().16b}, perm1.16b		// A1
-	tbl		t5.16b, {\ad\().16b}, perm2.16b		// A2
-	tbl		t7.16b, {\ad\().16b}, perm3.16b		// A3
-
-	__pmull2_p8_\bd	\rq, \ad
-	.endm
-
-	.macro		__pmull_p8_SHASH, rq, ad
-	__pmull_p8_tail	\rq, \ad\().8b, SHASH.8b, 8b,, sh1, sh2, sh3, sh4
-	.endm
-
-	.macro		__pmull_p8_SHASH2, rq, ad
-	__pmull_p8_tail	\rq, \ad\().8b, SHASH2.8b, 8b,, ss1, ss2, ss3, ss4
-	.endm
-
-	.macro		__pmull2_p8_SHASH, rq, ad
-	__pmull_p8_tail	\rq, \ad\().16b, SHASH.16b, 16b, 2, sh1, sh2, sh3, sh4
-	.endm
-
-	.macro		__pmull_p8_tail, rq, ad, bd, nb, t, b1, b2, b3, b4
-	pmull\t		t3.8h, t3.\nb, \bd			// F = A1*B
-	pmull\t		t4.8h, \ad, \b1\().\nb			// E = A*B1
-	pmull\t		t5.8h, t5.\nb, \bd			// H = A2*B
-	pmull\t		t6.8h, \ad, \b2\().\nb			// G = A*B2
-	pmull\t		t7.8h, t7.\nb, \bd			// J = A3*B
-	pmull\t		t8.8h, \ad, \b3\().\nb			// I = A*B3
-	pmull\t		t9.8h, \ad, \b4\().\nb			// K = A*B4
-	pmull\t		\rq\().8h, \ad, \bd			// D = A*B
-
-	eor		t3.16b, t3.16b, t4.16b			// L = E + F
-	eor		t5.16b, t5.16b, t6.16b			// M = G + H
-	eor		t7.16b, t7.16b, t8.16b			// N = I + J
-
-	uzp1		t4.2d, t3.2d, t5.2d
-	uzp2		t3.2d, t3.2d, t5.2d
-	uzp1		t6.2d, t7.2d, t9.2d
-	uzp2		t7.2d, t7.2d, t9.2d
-
-	// t3 = (L) (P0 + P1) << 8
-	// t5 = (M) (P2 + P3) << 16
-	eor		t4.16b, t4.16b, t3.16b
-	and		t3.16b, t3.16b, k32_48.16b
-
-	// t7 = (N) (P4 + P5) << 24
-	// t9 = (K) (P6 + P7) << 32
-	eor		t6.16b, t6.16b, t7.16b
-	and		t7.16b, t7.16b, k00_16.16b
-
-	eor		t4.16b, t4.16b, t3.16b
-	eor		t6.16b, t6.16b, t7.16b
-
-	zip2		t5.2d, t4.2d, t3.2d
-	zip1		t3.2d, t4.2d, t3.2d
-	zip2		t9.2d, t6.2d, t7.2d
-	zip1		t7.2d, t6.2d, t7.2d
-
-	ext		t3.16b, t3.16b, t3.16b, #15
-	ext		t5.16b, t5.16b, t5.16b, #14
-	ext		t7.16b, t7.16b, t7.16b, #13
-	ext		t9.16b, t9.16b, t9.16b, #12
-
-	eor		t3.16b, t3.16b, t5.16b
-	eor		t7.16b, t7.16b, t9.16b
-	eor		\rq\().16b, \rq\().16b, t3.16b
-	eor		\rq\().16b, \rq\().16b, t7.16b
-	.endm
-
 	.macro		__pmull_pre_p64
 	add		x8, x3, #16
 	ld1		{HH.2d-HH4.2d}, [x8]
 
 	trn1		SHASH2.2d, SHASH.2d, HH.2d
@@ -158,47 +49,10 @@
 
 	movi		MASK.16b, #0xe1
 	shl		MASK.2d, MASK.2d, #57
 	.endm
 
-	.macro		__pmull_pre_p8
-	ext		SHASH2.16b, SHASH.16b, SHASH.16b, #8
-	eor		SHASH2.16b, SHASH2.16b, SHASH.16b
-
-	// k00_16 := 0x0000000000000000_000000000000ffff
-	// k32_48 := 0x00000000ffffffff_0000ffffffffffff
-	movi		k32_48.2d, #0xffffffff
-	mov		k32_48.h[2], k32_48.h[0]
-	ushr		k00_16.2d, k32_48.2d, #32
-
-	// prepare the permutation vectors
-	mov_q		x5, 0x080f0e0d0c0b0a09
-	movi		T1.8b, #8
-	dup		perm1.2d, x5
-	eor		perm1.16b, perm1.16b, T1.16b
-	ushr		perm2.2d, perm1.2d, #8
-	ushr		perm3.2d, perm1.2d, #16
-	ushr		T1.2d, perm1.2d, #24
-	sli		perm2.2d, perm1.2d, #56
-	sli		perm3.2d, perm1.2d, #48
-	sli		T1.2d, perm1.2d, #40
-
-	// precompute loop invariants
-	tbl		sh1.16b, {SHASH.16b}, perm1.16b
-	tbl		sh2.16b, {SHASH.16b}, perm2.16b
-	tbl		sh3.16b, {SHASH.16b}, perm3.16b
-	tbl		sh4.16b, {SHASH.16b}, T1.16b
-	ext		ss1.8b, SHASH2.8b, SHASH2.8b, #1
-	ext		ss2.8b, SHASH2.8b, SHASH2.8b, #2
-	ext		ss3.8b, SHASH2.8b, SHASH2.8b, #3
-	ext		ss4.8b, SHASH2.8b, SHASH2.8b, #4
-	.endm
-
-	//
-	// PMULL (64x64->128) based reduction for CPUs that can do
-	// it in a single instruction.
-	//
 	.macro		__pmull_reduce_p64
 	pmull		T2.1q, XL.1d, MASK.1d
 	eor		XM.16b, XM.16b, T1.16b
 
 	mov		XH.d[0], XM.d[1]
@@ -207,51 +61,27 @@
 	eor		XL.16b, XM.16b, T2.16b
 	ext		T2.16b, XL.16b, XL.16b, #8
 	pmull		XL.1q, XL.1d, MASK.1d
 	.endm
 
-	//
-	// Alternative reduction for CPUs that lack support for the
-	// 64x64->128 PMULL instruction
-	//
-	.macro		__pmull_reduce_p8
-	eor		XM.16b, XM.16b, T1.16b
-
-	mov		XL.d[1], XM.d[0]
-	mov		XH.d[0], XM.d[1]
-
-	shl		T1.2d, XL.2d, #57
-	shl		T2.2d, XL.2d, #62
-	eor		T2.16b, T2.16b, T1.16b
-	shl		T1.2d, XL.2d, #63
-	eor		T2.16b, T2.16b, T1.16b
-	ext		T1.16b, XL.16b, XH.16b, #8
-	eor		T2.16b, T2.16b, T1.16b
-
-	mov		XL.d[1], T2.d[0]
-	mov		XH.d[0], T2.d[1]
-
-	ushr		T2.2d, XL.2d, #1
-	eor		XH.16b, XH.16b, XL.16b
-	eor		XL.16b, XL.16b, T2.16b
-	ushr		T2.2d, T2.2d, #6
-	ushr		XL.2d, XL.2d, #1
-	.endm
-
-	.macro		__pmull_ghash, pn
+	/*
+	 * void pmull_ghash_update_p64(int blocks, u64 dg[], const char *src,
+	 *			       u64 const h[][2], const char *head)
+	 */
+SYM_TYPED_FUNC_START(pmull_ghash_update_p64)
 	ld1		{SHASH.2d}, [x3]
 	ld1		{XL.2d}, [x1]
 
-	__pmull_pre_\pn
+	__pmull_pre_p64
 
 	/* do the head block first, if supplied */
 	cbz		x4, 0f
 	ld1		{T1.2d}, [x4]
 	mov		x4, xzr
 	b		3f
 
-0:	.ifc		\pn, p64
+0:
 	tbnz		w0, #0, 2f		// skip until #blocks is a
 	tbnz		w0, #1, 2f		// round multiple of 4
 
 1:	ld1		{XM3.16b-TT4.16b}, [x2], #64
 
@@ -312,11 +142,10 @@
 	eor		T2.16b, T2.16b, XH.16b
 	eor		XL.16b, XL.16b, T2.16b
 
 	cbz		w0, 5f
 	b		1b
-	.endif
 
 2:	ld1		{T1.2d}, [x2], #16
 	sub		w0, w0, #1
 
 3:	/* multiply XL by SHASH in GF(2^128) */
@@ -325,42 +154,30 @@ CPU_LE(	rev64		T1.16b, T1.16b	)
 	ext		T2.16b, XL.16b, XL.16b, #8
 	ext		IN1.16b, T1.16b, T1.16b, #8
 	eor		T1.16b, T1.16b, T2.16b
 	eor		XL.16b, XL.16b, IN1.16b
 
-	__pmull2_\pn	XH, XL, SHASH			// a1 * b1
+	pmull2		XH.1q, XL.2d, SHASH.2d		// a1 * b1
 	eor		T1.16b, T1.16b, XL.16b
-	__pmull_\pn 	XL, XL, SHASH			// a0 * b0
-	__pmull_\pn	XM, T1, SHASH2			// (a1 + a0)(b1 + b0)
+	pmull		XL.1q, XL.1d, SHASH.1d		// a0 * b0
+	pmull		XM.1q, T1.1d, SHASH2.1d		// (a1 + a0)(b1 + b0)
 
 4:	eor		T2.16b, XL.16b, XH.16b
 	ext		T1.16b, XL.16b, XH.16b, #8
 	eor		XM.16b, XM.16b, T2.16b
 
-	__pmull_reduce_\pn
+	__pmull_reduce_p64
 
 	eor		T2.16b, T2.16b, XH.16b
 	eor		XL.16b, XL.16b, T2.16b
 
 	cbnz		w0, 0b
 
 5:	st1		{XL.2d}, [x1]
 	ret
-	.endm
-
-	/*
-	 * void pmull_ghash_update(int blocks, u64 dg[], const char *src,
-	 *			   struct ghash_key const *k, const char *head)
-	 */
-SYM_TYPED_FUNC_START(pmull_ghash_update_p64)
-	__pmull_ghash	p64
 SYM_FUNC_END(pmull_ghash_update_p64)
 
-SYM_TYPED_FUNC_START(pmull_ghash_update_p8)
-	__pmull_ghash	p8
-SYM_FUNC_END(pmull_ghash_update_p8)
-
 	KS0		.req	v8
 	KS1		.req	v9
 	KS2		.req	v10
 	KS3		.req	v11
 
diff --git a/arch/arm64/crypto/ghash-neon-core.S b/arch/arm64/crypto/ghash-neon-core.S
new file mode 100644
index 000000000000..6157135ad566
--- /dev/null
+++ b/arch/arm64/crypto/ghash-neon-core.S
@@ -0,0 +1,226 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Accelerated GHASH implementation with ARMv8 ASIMD instructions.
+ *
+ * Copyright (C) 2014 - 2018 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ */
+
+#include <linux/linkage.h>
+#include <linux/cfi_types.h>
+#include <asm/assembler.h>
+
+	SHASH		.req	v0
+	SHASH2		.req	v1
+	T1		.req	v2
+	T2		.req	v3
+	XM		.req	v5
+	XL		.req	v6
+	XH		.req	v7
+	IN1		.req	v7
+
+	k00_16		.req	v8
+	k32_48		.req	v9
+
+	t3		.req	v10
+	t4		.req	v11
+	t5		.req	v12
+	t6		.req	v13
+	t7		.req	v14
+	t8		.req	v15
+	t9		.req	v16
+
+	perm1		.req	v17
+	perm2		.req	v18
+	perm3		.req	v19
+
+	sh1		.req	v20
+	sh2		.req	v21
+	sh3		.req	v22
+	sh4		.req	v23
+
+	ss1		.req	v24
+	ss2		.req	v25
+	ss3		.req	v26
+	ss4		.req	v27
+
+	.text
+
+	.macro		__pmull_p8, rq, ad, bd
+	ext		t3.8b, \ad\().8b, \ad\().8b, #1		// A1
+	ext		t5.8b, \ad\().8b, \ad\().8b, #2		// A2
+	ext		t7.8b, \ad\().8b, \ad\().8b, #3		// A3
+
+	__pmull_p8_\bd	\rq, \ad
+	.endm
+
+	.macro		__pmull2_p8, rq, ad, bd
+	tbl		t3.16b, {\ad\().16b}, perm1.16b		// A1
+	tbl		t5.16b, {\ad\().16b}, perm2.16b		// A2
+	tbl		t7.16b, {\ad\().16b}, perm3.16b		// A3
+
+	__pmull2_p8_\bd	\rq, \ad
+	.endm
+
+	.macro		__pmull_p8_SHASH, rq, ad
+	__pmull_p8_tail	\rq, \ad\().8b, SHASH.8b, 8b,, sh1, sh2, sh3, sh4
+	.endm
+
+	.macro		__pmull_p8_SHASH2, rq, ad
+	__pmull_p8_tail	\rq, \ad\().8b, SHASH2.8b, 8b,, ss1, ss2, ss3, ss4
+	.endm
+
+	.macro		__pmull2_p8_SHASH, rq, ad
+	__pmull_p8_tail	\rq, \ad\().16b, SHASH.16b, 16b, 2, sh1, sh2, sh3, sh4
+	.endm
+
+	.macro		__pmull_p8_tail, rq, ad, bd, nb, t, b1, b2, b3, b4
+	pmull\t		t3.8h, t3.\nb, \bd			// F = A1*B
+	pmull\t		t4.8h, \ad, \b1\().\nb			// E = A*B1
+	pmull\t		t5.8h, t5.\nb, \bd			// H = A2*B
+	pmull\t		t6.8h, \ad, \b2\().\nb			// G = A*B2
+	pmull\t		t7.8h, t7.\nb, \bd			// J = A3*B
+	pmull\t		t8.8h, \ad, \b3\().\nb			// I = A*B3
+	pmull\t		t9.8h, \ad, \b4\().\nb			// K = A*B4
+	pmull\t		\rq\().8h, \ad, \bd			// D = A*B
+
+	eor		t3.16b, t3.16b, t4.16b			// L = E + F
+	eor		t5.16b, t5.16b, t6.16b			// M = G + H
+	eor		t7.16b, t7.16b, t8.16b			// N = I + J
+
+	uzp1		t4.2d, t3.2d, t5.2d
+	uzp2		t3.2d, t3.2d, t5.2d
+	uzp1		t6.2d, t7.2d, t9.2d
+	uzp2		t7.2d, t7.2d, t9.2d
+
+	// t3 = (L) (P0 + P1) << 8
+	// t5 = (M) (P2 + P3) << 16
+	eor		t4.16b, t4.16b, t3.16b
+	and		t3.16b, t3.16b, k32_48.16b
+
+	// t7 = (N) (P4 + P5) << 24
+	// t9 = (K) (P6 + P7) << 32
+	eor		t6.16b, t6.16b, t7.16b
+	and		t7.16b, t7.16b, k00_16.16b
+
+	eor		t4.16b, t4.16b, t3.16b
+	eor		t6.16b, t6.16b, t7.16b
+
+	zip2		t5.2d, t4.2d, t3.2d
+	zip1		t3.2d, t4.2d, t3.2d
+	zip2		t9.2d, t6.2d, t7.2d
+	zip1		t7.2d, t6.2d, t7.2d
+
+	ext		t3.16b, t3.16b, t3.16b, #15
+	ext		t5.16b, t5.16b, t5.16b, #14
+	ext		t7.16b, t7.16b, t7.16b, #13
+	ext		t9.16b, t9.16b, t9.16b, #12
+
+	eor		t3.16b, t3.16b, t5.16b
+	eor		t7.16b, t7.16b, t9.16b
+	eor		\rq\().16b, \rq\().16b, t3.16b
+	eor		\rq\().16b, \rq\().16b, t7.16b
+	.endm
+
+	.macro		__pmull_pre_p8
+	ext		SHASH2.16b, SHASH.16b, SHASH.16b, #8
+	eor		SHASH2.16b, SHASH2.16b, SHASH.16b
+
+	// k00_16 := 0x0000000000000000_000000000000ffff
+	// k32_48 := 0x00000000ffffffff_0000ffffffffffff
+	movi		k32_48.2d, #0xffffffff
+	mov		k32_48.h[2], k32_48.h[0]
+	ushr		k00_16.2d, k32_48.2d, #32
+
+	// prepare the permutation vectors
+	mov_q		x5, 0x080f0e0d0c0b0a09
+	movi		T1.8b, #8
+	dup		perm1.2d, x5
+	eor		perm1.16b, perm1.16b, T1.16b
+	ushr		perm2.2d, perm1.2d, #8
+	ushr		perm3.2d, perm1.2d, #16
+	ushr		T1.2d, perm1.2d, #24
+	sli		perm2.2d, perm1.2d, #56
+	sli		perm3.2d, perm1.2d, #48
+	sli		T1.2d, perm1.2d, #40
+
+	// precompute loop invariants
+	tbl		sh1.16b, {SHASH.16b}, perm1.16b
+	tbl		sh2.16b, {SHASH.16b}, perm2.16b
+	tbl		sh3.16b, {SHASH.16b}, perm3.16b
+	tbl		sh4.16b, {SHASH.16b}, T1.16b
+	ext		ss1.8b, SHASH2.8b, SHASH2.8b, #1
+	ext		ss2.8b, SHASH2.8b, SHASH2.8b, #2
+	ext		ss3.8b, SHASH2.8b, SHASH2.8b, #3
+	ext		ss4.8b, SHASH2.8b, SHASH2.8b, #4
+	.endm
+
+	.macro		__pmull_reduce_p8
+	eor		XM.16b, XM.16b, T1.16b
+
+	mov		XL.d[1], XM.d[0]
+	mov		XH.d[0], XM.d[1]
+
+	shl		T1.2d, XL.2d, #57
+	shl		T2.2d, XL.2d, #62
+	eor		T2.16b, T2.16b, T1.16b
+	shl		T1.2d, XL.2d, #63
+	eor		T2.16b, T2.16b, T1.16b
+	ext		T1.16b, XL.16b, XH.16b, #8
+	eor		T2.16b, T2.16b, T1.16b
+
+	mov		XL.d[1], T2.d[0]
+	mov		XH.d[0], T2.d[1]
+
+	ushr		T2.2d, XL.2d, #1
+	eor		XH.16b, XH.16b, XL.16b
+	eor		XL.16b, XL.16b, T2.16b
+	ushr		T2.2d, T2.2d, #6
+	ushr		XL.2d, XL.2d, #1
+	.endm
+
+	/*
+	 * void pmull_ghash_update_p8(int blocks, u64 dg[], const char *src,
+	 *			      u64 const h[][2], const char *head)
+	 */
+SYM_TYPED_FUNC_START(pmull_ghash_update_p8)
+	ld1		{SHASH.2d}, [x3]
+	ld1		{XL.2d}, [x1]
+
+	__pmull_pre_p8
+
+	/* do the head block first, if supplied */
+	cbz		x4, 0f
+	ld1		{T1.2d}, [x4]
+	mov		x4, xzr
+	b		3f
+
+0:	ld1		{T1.2d}, [x2], #16
+	sub		w0, w0, #1
+
+3:	/* multiply XL by SHASH in GF(2^128) */
+CPU_LE(	rev64		T1.16b, T1.16b	)
+
+	ext		T2.16b, XL.16b, XL.16b, #8
+	ext		IN1.16b, T1.16b, T1.16b, #8
+	eor		T1.16b, T1.16b, T2.16b
+	eor		XL.16b, XL.16b, IN1.16b
+
+	__pmull2_p8	XH, XL, SHASH			// a1 * b1
+	eor		T1.16b, T1.16b, XL.16b
+	__pmull_p8 	XL, XL, SHASH			// a0 * b0
+	__pmull_p8	XM, T1, SHASH2			// (a1 + a0)(b1 + b0)
+
+	eor		T2.16b, XL.16b, XH.16b
+	ext		T1.16b, XL.16b, XH.16b, #8
+	eor		XM.16b, XM.16b, T2.16b
+
+	__pmull_reduce_p8
+
+	eor		T2.16b, T2.16b, XH.16b
+	eor		XL.16b, XL.16b, T2.16b
+
+	cbnz		w0, 0b
+
+	st1		{XL.2d}, [x1]
+	ret
+SYM_FUNC_END(pmull_ghash_update_p8)
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 09/19] lib/crypto: arm64/ghash: Migrate optimized code into library
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (7 preceding siblings ...)
  2026-03-19  6:17 ` [PATCH 08/19] crypto: arm64/ghash - Move NEON GHASH assembly into its own file Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 10/19] crypto: arm64/aes-gcm - Rename struct ghash_key and make fixed-sized Eric Biggers
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

Remove the "ghash-neon" crypto_shash algorithm.  Move the corresponding
assembly code into lib/crypto/, and wire it up to the GHASH library.

This optimizes the GHASH library on arm64 (though only with NEON, not
PMULL; for now the goal is just parity with crypto_shash).  It greatly
reduces the amount of arm64-specific glue code that is needed, and it
fixes the issue where this optimization was disabled by default.
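
[Editor's note: for readers unfamiliar with the underlying math, here is
a minimal Python model of the per-block GHASH computation that
pmull_ghash_update_p8() accelerates, following NIST SP 800-38D.  It is
purely illustrative; the names gf128_mul() and ghash_blocks() are not
kernel functions.]

```python
# Illustrative model of GHASH (NIST SP 800-38D).  Each 16-byte block is
# folded into the accumulator as acc = (acc XOR block) * H in GF(2^128),
# using GHASH's bit-reflected convention.

GHASH_BLOCK_SIZE = 16
R = 0xE1 << 120  # reduction constant for x^128 + x^7 + x^2 + x + 1

def gf128_mul(x: int, y: int) -> int:
    """Multiply two field elements (128-bit ints, big-endian block order)."""
    z, v = 0, y
    for i in range(128):
        # Bit i of the block (MSB-first) is the coefficient of alpha^i.
        if (x >> (127 - i)) & 1:
            z ^= v
        # Multiply v by alpha: right shift, reduce if a bit falls off.
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z

def ghash_blocks(acc: int, h: int, data: bytes) -> int:
    """Fold whole 16-byte blocks of 'data' into the accumulator."""
    for off in range(0, len(data), GHASH_BLOCK_SIZE):
        blk = int.from_bytes(data[off:off + GHASH_BLOCK_SIZE], 'big')
        acc = gf128_mul(acc ^ blk, h)
    return acc
```

Note that hashing a single all-zero block reduces to multiplying the
accumulator by H; this is the same observation the arm64 glue in this
series relies on to implement a plain field multiplication on top of
pmull_ghash_update_p8().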

To integrate the assembly code correctly with the library, make the
following tweaks:

- Change the type of 'blocks' from int to size_t
- Change the types of 'dg' and 'k' to polyval_elem.  Note that this
  simply reflects the format that the code was already using.
- Remove the 'head' argument, which is no longer needed.
- Remove the CFI stubs, as indirect calls are no longer used.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 arch/arm64/crypto/Kconfig                     |   5 +-
 arch/arm64/crypto/Makefile                    |   2 +-
 arch/arm64/crypto/ghash-ce-core.S             |   3 +-
 arch/arm64/crypto/ghash-ce-glue.c             | 146 ++----------------
 lib/crypto/Makefile                           |   3 +-
 lib/crypto/arm64/gf128hash.h                  |  68 +++++++-
 .../crypto/arm64}/ghash-neon-core.S           |  20 +--
 7 files changed, 86 insertions(+), 161 deletions(-)
 rename {arch/arm64/crypto => lib/crypto/arm64}/ghash-neon-core.S (93%)

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 82794afaffc9..1a0c553fbfd7 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -1,18 +1,17 @@
 # SPDX-License-Identifier: GPL-2.0
 
 menu "Accelerated Cryptographic Algorithms for CPU (arm64)"
 
 config CRYPTO_GHASH_ARM64_CE
-	tristate "Hash functions: GHASH (ARMv8 Crypto Extensions)"
+	tristate "AEAD cipher: AES in GCM mode (ARMv8 Crypto Extensions)"
 	depends on KERNEL_MODE_NEON
-	select CRYPTO_HASH
 	select CRYPTO_LIB_AES
 	select CRYPTO_LIB_GF128MUL
 	select CRYPTO_AEAD
 	help
-	  GCM GHASH function (NIST SP800-38D)
+	  AEAD cipher: AES-GCM
 
 	  Architecture: arm64 using:
 	  - ARMv8 Crypto Extensions
 
 config CRYPTO_SM3_NEON
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index b7ba43ce8584..8a8e3e551ed3 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -25,11 +25,11 @@ sm4-ce-gcm-y := sm4-ce-gcm-glue.o sm4-ce-gcm-core.o
 
 obj-$(CONFIG_CRYPTO_SM4_ARM64_NEON_BLK) += sm4-neon.o
 sm4-neon-y := sm4-neon-glue.o sm4-neon-core.o
 
 obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o
-ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o ghash-neon-core.o
+ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
 
 obj-$(CONFIG_CRYPTO_AES_ARM64_CE_CCM) += aes-ce-ccm.o
 aes-ce-ccm-y := aes-ce-ccm-glue.o aes-ce-ccm-core.o
 
 obj-$(CONFIG_CRYPTO_AES_ARM64_CE_BLK) += aes-ce-blk.o
diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S
index 4344fe213d14..a01f136f4fb2 100644
--- a/arch/arm64/crypto/ghash-ce-core.S
+++ b/arch/arm64/crypto/ghash-ce-core.S
@@ -4,11 +4,10 @@
  *
  * Copyright (C) 2014 - 2018 Linaro Ltd. <ard.biesheuvel@linaro.org>
  */
 
 #include <linux/linkage.h>
-#include <linux/cfi_types.h>
 #include <asm/assembler.h>
 
 	SHASH		.req	v0
 	SHASH2		.req	v1
 	T1		.req	v2
@@ -65,11 +64,11 @@
 
 	/*
 	 * void pmull_ghash_update_p64(int blocks, u64 dg[], const char *src,
 	 *			       u64 const h[][2], const char *head)
 	 */
-SYM_TYPED_FUNC_START(pmull_ghash_update_p64)
+SYM_FUNC_START(pmull_ghash_update_p64)
 	ld1		{SHASH.2d}, [x3]
 	ld1		{XL.2d}, [x1]
 
 	__pmull_pre_p64
 
diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
index 63bb9e062251..42fb46bdc124 100644
--- a/arch/arm64/crypto/ghash-ce-glue.c
+++ b/arch/arm64/crypto/ghash-ce-glue.c
@@ -1,19 +1,18 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Accelerated GHASH implementation with ARMv8 PMULL instructions.
+ * AES-GCM using ARMv8 Crypto Extensions
  *
  * Copyright (C) 2014 - 2018 Linaro Ltd. <ard.biesheuvel@linaro.org>
  */
 
 #include <crypto/aes.h>
 #include <crypto/b128ops.h>
 #include <crypto/gcm.h>
 #include <crypto/ghash.h>
 #include <crypto/gf128mul.h>
 #include <crypto/internal/aead.h>
-#include <crypto/internal/hash.h>
 #include <crypto/internal/skcipher.h>
 #include <crypto/scatterwalk.h>
 #include <linux/cpufeature.h>
 #include <linux/errno.h>
 #include <linux/kernel.h>
@@ -21,14 +20,15 @@
 #include <linux/string.h>
 #include <linux/unaligned.h>
 
 #include <asm/simd.h>
 
-MODULE_DESCRIPTION("GHASH and AES-GCM using ARMv8 Crypto Extensions");
+MODULE_DESCRIPTION("AES-GCM using ARMv8 Crypto Extensions");
 MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
 MODULE_LICENSE("GPL v2");
-MODULE_ALIAS_CRYPTO("ghash");
+MODULE_ALIAS_CRYPTO("gcm(aes)");
+MODULE_ALIAS_CRYPTO("rfc4106(gcm(aes))");
 
 #define RFC4106_NONCE_SIZE	4
 
 struct ghash_key {
 	be128			k;
@@ -46,100 +46,23 @@ struct gcm_aes_ctx {
 };
 
 asmlinkage void pmull_ghash_update_p64(int blocks, u64 dg[], const char *src,
 				       u64 const h[][2], const char *head);
 
-asmlinkage void pmull_ghash_update_p8(int blocks, u64 dg[], const char *src,
-				      u64 const h[][2], const char *head);
-
 asmlinkage void pmull_gcm_encrypt(int bytes, u8 dst[], const u8 src[],
 				  u64 const h[][2], u64 dg[], u8 ctr[],
 				  u32 const rk[], int rounds, u8 tag[]);
 asmlinkage int pmull_gcm_decrypt(int bytes, u8 dst[], const u8 src[],
 				 u64 const h[][2], u64 dg[], u8 ctr[],
 				 u32 const rk[], int rounds, const u8 l[],
 				 const u8 tag[], u64 authsize);
 
-static int ghash_init(struct shash_desc *desc)
-{
-	struct arm_ghash_desc_ctx *ctx = shash_desc_ctx(desc);
-
-	*ctx = (struct arm_ghash_desc_ctx){};
-	return 0;
-}
-
-static __always_inline
-void ghash_do_simd_update(int blocks, u64 dg[], const char *src,
-			  struct ghash_key *key, const char *head,
-			  void (*simd_update)(int blocks, u64 dg[],
-					      const char *src,
-					      u64 const h[][2],
-					      const char *head))
+static void ghash_do_simd_update(int blocks, u64 dg[], const char *src,
+				 struct ghash_key *key, const char *head)
 {
 	scoped_ksimd()
-		simd_update(blocks, dg, src, key->h, head);
-}
-
-/* avoid hogging the CPU for too long */
-#define MAX_BLOCKS	(SZ_64K / GHASH_BLOCK_SIZE)
-
-static int ghash_update(struct shash_desc *desc, const u8 *src,
-			unsigned int len)
-{
-	struct arm_ghash_desc_ctx *ctx = shash_desc_ctx(desc);
-	struct ghash_key *key = crypto_shash_ctx(desc->tfm);
-	int blocks;
-
-	blocks = len / GHASH_BLOCK_SIZE;
-	len -= blocks * GHASH_BLOCK_SIZE;
-
-	do {
-		int chunk = min(blocks, MAX_BLOCKS);
-
-		ghash_do_simd_update(chunk, ctx->digest, src, key, NULL,
-				     pmull_ghash_update_p8);
-		blocks -= chunk;
-		src += chunk * GHASH_BLOCK_SIZE;
-	} while (unlikely(blocks > 0));
-	return len;
-}
-
-static int ghash_export(struct shash_desc *desc, void *out)
-{
-	struct arm_ghash_desc_ctx *ctx = shash_desc_ctx(desc);
-	u8 *dst = out;
-
-	put_unaligned_be64(ctx->digest[1], dst);
-	put_unaligned_be64(ctx->digest[0], dst + 8);
-	return 0;
-}
-
-static int ghash_import(struct shash_desc *desc, const void *in)
-{
-	struct arm_ghash_desc_ctx *ctx = shash_desc_ctx(desc);
-	const u8 *src = in;
-
-	ctx->digest[1] = get_unaligned_be64(src);
-	ctx->digest[0] = get_unaligned_be64(src + 8);
-	return 0;
-}
-
-static int ghash_finup(struct shash_desc *desc, const u8 *src,
-		       unsigned int len, u8 *dst)
-{
-	struct arm_ghash_desc_ctx *ctx = shash_desc_ctx(desc);
-	struct ghash_key *key = crypto_shash_ctx(desc->tfm);
-
-	if (len) {
-		u8 buf[GHASH_BLOCK_SIZE] = {};
-
-		memcpy(buf, src, len);
-		ghash_do_simd_update(1, ctx->digest, buf, key, NULL,
-				     pmull_ghash_update_p8);
-		memzero_explicit(buf, sizeof(buf));
-	}
-	return ghash_export(desc, dst);
+		pmull_ghash_update_p64(blocks, dg, src, key->h, head);
 }
 
 static void ghash_reflect(u64 h[], const be128 *k)
 {
 	u64 carry = be64_to_cpu(k->a) & BIT(63) ? 1 : 0;
@@ -149,45 +72,10 @@ static void ghash_reflect(u64 h[], const be128 *k)
 
 	if (carry)
 		h[1] ^= 0xc200000000000000UL;
 }
 
-static int ghash_setkey(struct crypto_shash *tfm,
-			const u8 *inkey, unsigned int keylen)
-{
-	struct ghash_key *key = crypto_shash_ctx(tfm);
-
-	if (keylen != GHASH_BLOCK_SIZE)
-		return -EINVAL;
-
-	/* needed for the fallback */
-	memcpy(&key->k, inkey, GHASH_BLOCK_SIZE);
-
-	ghash_reflect(key->h[0], &key->k);
-	return 0;
-}
-
-static struct shash_alg ghash_alg = {
-	.base.cra_name		= "ghash",
-	.base.cra_driver_name	= "ghash-neon",
-	.base.cra_priority	= 150,
-	.base.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
-	.base.cra_blocksize	= GHASH_BLOCK_SIZE,
-	.base.cra_ctxsize	= sizeof(struct ghash_key) + sizeof(u64[2]),
-	.base.cra_module	= THIS_MODULE,
-
-	.digestsize		= GHASH_DIGEST_SIZE,
-	.init			= ghash_init,
-	.update			= ghash_update,
-	.finup			= ghash_finup,
-	.setkey			= ghash_setkey,
-	.export			= ghash_export,
-	.import			= ghash_import,
-	.descsize		= sizeof(struct arm_ghash_desc_ctx),
-	.statesize		= sizeof(struct ghash_desc_ctx),
-};
-
 static int gcm_aes_setkey(struct crypto_aead *tfm, const u8 *inkey,
 			  unsigned int keylen)
 {
 	struct gcm_aes_ctx *ctx = crypto_aead_ctx(tfm);
 	u8 key[GHASH_BLOCK_SIZE];
@@ -238,13 +126,11 @@ static void gcm_update_mac(u64 dg[], const u8 *src, int count, u8 buf[],
 
 	if (count >= GHASH_BLOCK_SIZE || *buf_count == GHASH_BLOCK_SIZE) {
 		int blocks = count / GHASH_BLOCK_SIZE;
 
 		ghash_do_simd_update(blocks, dg, src, &ctx->ghash_key,
-				     *buf_count ? buf : NULL,
-				     pmull_ghash_update_p64);
-
+				     *buf_count ? buf : NULL);
 		src += blocks * GHASH_BLOCK_SIZE;
 		count %= GHASH_BLOCK_SIZE;
 		*buf_count = 0;
 	}
 
@@ -273,12 +159,11 @@ static void gcm_calculate_auth_mac(struct aead_request *req, u64 dg[], u32 len)
 		len -= n;
 	} while (len);
 
 	if (buf_count) {
 		memset(&buf[buf_count], 0, GHASH_BLOCK_SIZE - buf_count);
-		ghash_do_simd_update(1, dg, buf, &ctx->ghash_key, NULL,
-				     pmull_ghash_update_p64);
+		ghash_do_simd_update(1, dg, buf, &ctx->ghash_key, NULL);
 	}
 }
 
 static int gcm_encrypt(struct aead_request *req, char *iv, int assoclen)
 {
@@ -503,26 +388,19 @@ static struct aead_alg gcm_aes_algs[] = {{
 	.base.cra_module	= THIS_MODULE,
 }};
 
 static int __init ghash_ce_mod_init(void)
 {
-	if (!cpu_have_named_feature(ASIMD))
+	if (!cpu_have_named_feature(ASIMD) || !cpu_have_named_feature(PMULL))
 		return -ENODEV;
 
-	if (cpu_have_named_feature(PMULL))
-		return crypto_register_aeads(gcm_aes_algs,
-					     ARRAY_SIZE(gcm_aes_algs));
-
-	return crypto_register_shash(&ghash_alg);
+	return crypto_register_aeads(gcm_aes_algs, ARRAY_SIZE(gcm_aes_algs));
 }
 
 static void __exit ghash_ce_mod_exit(void)
 {
-	if (cpu_have_named_feature(PMULL))
-		crypto_unregister_aeads(gcm_aes_algs, ARRAY_SIZE(gcm_aes_algs));
-	else
-		crypto_unregister_shash(&ghash_alg);
+	crypto_unregister_aeads(gcm_aes_algs, ARRAY_SIZE(gcm_aes_algs));
 }
 
 static const struct cpu_feature __maybe_unused ghash_cpu_feature[] = {
 	{ cpu_feature(PMULL) }, { }
 };
diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
index 8a06dd6a43ea..4ce0bac8fd93 100644
--- a/lib/crypto/Makefile
+++ b/lib/crypto/Makefile
@@ -157,11 +157,12 @@ libdes-y					:= des.o
 obj-$(CONFIG_CRYPTO_LIB_GF128HASH) += libgf128hash.o
 libgf128hash-y := gf128hash.o
 ifeq ($(CONFIG_CRYPTO_LIB_GF128HASH_ARCH),y)
 CFLAGS_gf128hash.o += -I$(src)/$(SRCARCH)
 libgf128hash-$(CONFIG_ARM) += arm/ghash-neon-core.o
-libgf128hash-$(CONFIG_ARM64) += arm64/polyval-ce-core.o
+libgf128hash-$(CONFIG_ARM64) += arm64/ghash-neon-core.o \
+				arm64/polyval-ce-core.o
 libgf128hash-$(CONFIG_X86) += x86/polyval-pclmul-avx.o
 endif
 
 ################################################################################
 
diff --git a/lib/crypto/arm64/gf128hash.h b/lib/crypto/arm64/gf128hash.h
index 796c36804dda..d5ef1b1b77e1 100644
--- a/lib/crypto/arm64/gf128hash.h
+++ b/lib/crypto/arm64/gf128hash.h
@@ -1,23 +1,27 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
 /*
- * POLYVAL library functions, arm64 optimized
+ * GHASH and POLYVAL, arm64 optimized
  *
  * Copyright 2025 Google LLC
  */
 #include <asm/simd.h>
 #include <linux/cpufeature.h>
 
 #define NUM_H_POWERS 8
 
+static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_asimd);
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_pmull);
 
 asmlinkage void polyval_mul_pmull(struct polyval_elem *a,
 				  const struct polyval_elem *b);
 asmlinkage void polyval_blocks_pmull(struct polyval_elem *acc,
 				     const struct polyval_key *key,
 				     const u8 *data, size_t nblocks);
+asmlinkage void pmull_ghash_update_p8(size_t blocks, struct polyval_elem *dg,
+				      const u8 *src,
+				      const struct polyval_elem *k);
 
 #define polyval_preparekey_arch polyval_preparekey_arch
 static void polyval_preparekey_arch(struct polyval_key *key,
 				    const u8 raw_key[POLYVAL_BLOCK_SIZE])
 {
@@ -39,19 +43,66 @@ static void polyval_preparekey_arch(struct polyval_key *key,
 					    &key->h_powers[NUM_H_POWERS - 1]);
 		}
 	}
 }
 
+static void polyval_mul_arm64(struct polyval_elem *a,
+			      const struct polyval_elem *b)
+{
+	if (static_branch_likely(&have_asimd) && may_use_simd()) {
+		static const u8 zeroes[GHASH_BLOCK_SIZE];
+
+		scoped_ksimd() {
+			if (static_branch_likely(&have_pmull)) {
+				polyval_mul_pmull(a, b);
+			} else {
+				/*
+				 * Note that this is indeed equivalent to a
+				 * POLYVAL multiplication, since it takes the
+				 * accumulator and key in POLYVAL format, and
+				 * byte-swapping a block of zeroes is a no-op.
+				 */
+				pmull_ghash_update_p8(1, a, zeroes, b);
+			}
+		}
+	} else {
+		polyval_mul_generic(a, b);
+	}
+}
+
+#define ghash_mul_arch ghash_mul_arch
+static void ghash_mul_arch(struct polyval_elem *acc,
+			   const struct ghash_key *key)
+{
+	polyval_mul_arm64(acc, &key->h);
+}
+
 #define polyval_mul_arch polyval_mul_arch
 static void polyval_mul_arch(struct polyval_elem *acc,
 			     const struct polyval_key *key)
 {
-	if (static_branch_likely(&have_pmull) && may_use_simd()) {
-		scoped_ksimd()
-			polyval_mul_pmull(acc, &key->h_powers[NUM_H_POWERS - 1]);
+	polyval_mul_arm64(acc, &key->h_powers[NUM_H_POWERS - 1]);
+}
+
+#define ghash_blocks_arch ghash_blocks_arch
+static void ghash_blocks_arch(struct polyval_elem *acc,
+			      const struct ghash_key *key,
+			      const u8 *data, size_t nblocks)
+{
+	if (static_branch_likely(&have_asimd) && may_use_simd()) {
+		do {
+			/* Allow rescheduling every 4 KiB. */
+			size_t n =
+				min_t(size_t, nblocks, 4096 / GHASH_BLOCK_SIZE);
+
+			scoped_ksimd()
+				pmull_ghash_update_p8(n, acc, data, &key->h);
+			data += n * GHASH_BLOCK_SIZE;
+			nblocks -= n;
+		} while (nblocks);
 	} else {
-		polyval_mul_generic(acc, &key->h_powers[NUM_H_POWERS - 1]);
+		ghash_blocks_generic(acc, &key->h, data, nblocks);
 	}
 }
 
 #define polyval_blocks_arch polyval_blocks_arch
 static void polyval_blocks_arch(struct polyval_elem *acc,
@@ -76,8 +127,11 @@ static void polyval_blocks_arch(struct polyval_elem *acc,
 }
 
 #define gf128hash_mod_init_arch gf128hash_mod_init_arch
 static void gf128hash_mod_init_arch(void)
 {
-	if (cpu_have_named_feature(PMULL))
-		static_branch_enable(&have_pmull);
+	if (cpu_have_named_feature(ASIMD)) {
+		static_branch_enable(&have_asimd);
+		if (cpu_have_named_feature(PMULL))
+			static_branch_enable(&have_pmull);
+	}
 }
diff --git a/arch/arm64/crypto/ghash-neon-core.S b/lib/crypto/arm64/ghash-neon-core.S
similarity index 93%
rename from arch/arm64/crypto/ghash-neon-core.S
rename to lib/crypto/arm64/ghash-neon-core.S
index 6157135ad566..eadd6da47247 100644
--- a/arch/arm64/crypto/ghash-neon-core.S
+++ b/lib/crypto/arm64/ghash-neon-core.S
@@ -4,11 +4,10 @@
  *
  * Copyright (C) 2014 - 2018 Linaro Ltd. <ard.biesheuvel@linaro.org>
  */
 
 #include <linux/linkage.h>
-#include <linux/cfi_types.h>
 #include <asm/assembler.h>
 
 	SHASH		.req	v0
 	SHASH2		.req	v1
 	T1		.req	v2
@@ -177,29 +176,24 @@
 	ushr		T2.2d, T2.2d, #6
 	ushr		XL.2d, XL.2d, #1
 	.endm
 
 	/*
-	 * void pmull_ghash_update_p8(int blocks, u64 dg[], const char *src,
-	 *			      u64 const h[][2], const char *head)
+	 * void pmull_ghash_update_p8(size_t blocks, struct polyval_elem *dg,
+	 *			      const u8 *src,
+	 *			      const struct polyval_elem *k)
 	 */
-SYM_TYPED_FUNC_START(pmull_ghash_update_p8)
+SYM_FUNC_START(pmull_ghash_update_p8)
 	ld1		{SHASH.2d}, [x3]
 	ld1		{XL.2d}, [x1]
 
 	__pmull_pre_p8
 
-	/* do the head block first, if supplied */
-	cbz		x4, 0f
-	ld1		{T1.2d}, [x4]
-	mov		x4, xzr
-	b		3f
-
 0:	ld1		{T1.2d}, [x2], #16
-	sub		w0, w0, #1
+	sub		x0, x0, #1
 
-3:	/* multiply XL by SHASH in GF(2^128) */
+	/* multiply XL by SHASH in GF(2^128) */
 CPU_LE(	rev64		T1.16b, T1.16b	)
 
 	ext		T2.16b, XL.16b, XL.16b, #8
 	ext		IN1.16b, T1.16b, T1.16b, #8
 	eor		T1.16b, T1.16b, T2.16b
@@ -217,10 +211,10 @@ CPU_LE(	rev64		T1.16b, T1.16b	)
 	__pmull_reduce_p8
 
 	eor		T2.16b, T2.16b, XH.16b
 	eor		XL.16b, XL.16b, T2.16b
 
-	cbnz		w0, 0b
+	cbnz		x0, 0b
 
 	st1		{XL.2d}, [x1]
 	ret
 SYM_FUNC_END(pmull_ghash_update_p8)
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread
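[Editor's note: the "reschedule every 4 KiB" pattern used by ghash_blocks_arch() in the arm64 patch above can be sketched in standalone C.  This is an illustrative model, not kernel code: `split_into_chunks()` is a hypothetical helper that only computes the chunk sizes, with the `scoped_ksimd()` / `pmull_ghash_update_p8()` work represented by a comment.]

```c
#include <assert.h>
#include <stddef.h>

#define GHASH_BLOCK_SIZE 16

/*
 * Sketch of the chunking in ghash_blocks_arch(): each SIMD section is
 * capped at 4 KiB of input so the scheduler gets a chance to run
 * between chunks.  Records each chunk's block count in out[] and
 * returns the number of SIMD sections.  As in the kernel code, the
 * caller must pass nblocks > 0.
 */
static size_t split_into_chunks(size_t nblocks, size_t out[], size_t out_max)
{
	const size_t max_blocks = 4096 / GHASH_BLOCK_SIZE; /* 256 blocks */
	size_t calls = 0;

	do {
		size_t n = nblocks < max_blocks ? nblocks : max_blocks;

		/* scoped_ksimd() { pmull_ghash_update_p8(n, ...); } */
		if (calls < out_max)
			out[calls] = n;
		calls++;
		nblocks -= n;
	} while (nblocks);

	return calls;
}
```

For example, a 600-block (9600-byte) message is processed as three SIMD sections of 256, 256, and 88 blocks.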

* [PATCH 10/19] crypto: arm64/aes-gcm - Rename struct ghash_key and make fixed-sized
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (8 preceding siblings ...)
  2026-03-19  6:17 ` [PATCH 09/19] lib/crypto: arm64/ghash: Migrate optimized code into library Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 11/19] lib/crypto: powerpc/ghash: Migrate optimized code into library Eric Biggers
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

Rename the 'struct ghash_key' in arch/arm64/crypto/ghash-ce-glue.c to
prevent a naming conflict with the library 'struct ghash_key'.  In
addition, declare the 'h' field with an explicit size, now that there's
no longer any reason for it to be a flexible array.

Update the comments in the assembly file to match the C code.  Note that
some of these were out-of-date.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 arch/arm64/crypto/ghash-ce-core.S | 15 ++++++++-------
 arch/arm64/crypto/ghash-ce-glue.c | 20 +++++++++-----------
 2 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S
index a01f136f4fb2..33772d8fe6b5 100644
--- a/arch/arm64/crypto/ghash-ce-core.S
+++ b/arch/arm64/crypto/ghash-ce-core.S
@@ -62,11 +62,11 @@
 	pmull		XL.1q, XL.1d, MASK.1d
 	.endm
 
 	/*
 	 * void pmull_ghash_update_p64(int blocks, u64 dg[], const char *src,
-	 *			       u64 const h[][2], const char *head)
+	 *			       u64 const h[4][2], const char *head)
 	 */
 SYM_FUNC_START(pmull_ghash_update_p64)
 	ld1		{SHASH.2d}, [x3]
 	ld1		{XL.2d}, [x1]
 
@@ -411,22 +411,23 @@ CPU_LE(	rev		w8, w8		)
 	.endif
 	b		3b
 	.endm
 
 	/*
-	 * void pmull_gcm_encrypt(int blocks, u8 dst[], const u8 src[],
-	 *			  struct ghash_key const *k, u64 dg[], u8 ctr[],
-	 *			  int rounds, u8 tag)
+	 * void pmull_gcm_encrypt(int bytes, u8 dst[], const u8 src[],
+	 *			  u64 const h[4][2], u64 dg[], u8 ctr[],
+	 *			  u32 const rk[], int rounds, u8 tag[])
 	 */
 SYM_FUNC_START(pmull_gcm_encrypt)
 	pmull_gcm_do_crypt	1
 SYM_FUNC_END(pmull_gcm_encrypt)
 
 	/*
-	 * void pmull_gcm_decrypt(int blocks, u8 dst[], const u8 src[],
-	 *			  struct ghash_key const *k, u64 dg[], u8 ctr[],
-	 *			  int rounds, u8 tag)
+	 * int pmull_gcm_decrypt(int bytes, u8 dst[], const u8 src[],
+	 *			 u64 const h[4][2], u64 dg[], u8 ctr[],
+	 *			 u32 const rk[], int rounds, const u8 l[],
+	 *			 const u8 tag[], u64 authsize)
 	 */
 SYM_FUNC_START(pmull_gcm_decrypt)
 	pmull_gcm_do_crypt	0
 SYM_FUNC_END(pmull_gcm_decrypt)
 
diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
index 42fb46bdc124..c74066d430fa 100644
--- a/arch/arm64/crypto/ghash-ce-glue.c
+++ b/arch/arm64/crypto/ghash-ce-glue.c
@@ -28,38 +28,38 @@ MODULE_LICENSE("GPL v2");
 MODULE_ALIAS_CRYPTO("gcm(aes)");
 MODULE_ALIAS_CRYPTO("rfc4106(gcm(aes))");
 
 #define RFC4106_NONCE_SIZE	4
 
-struct ghash_key {
+struct arm_ghash_key {
 	be128			k;
-	u64			h[][2];
+	u64			h[4][2];
 };
 
 struct arm_ghash_desc_ctx {
 	u64 digest[GHASH_DIGEST_SIZE/sizeof(u64)];
 };
 
 struct gcm_aes_ctx {
 	struct aes_enckey	aes_key;
 	u8			nonce[RFC4106_NONCE_SIZE];
-	struct ghash_key	ghash_key;
+	struct arm_ghash_key	ghash_key;
 };
 
 asmlinkage void pmull_ghash_update_p64(int blocks, u64 dg[], const char *src,
-				       u64 const h[][2], const char *head);
+				       u64 const h[4][2], const char *head);
 
 asmlinkage void pmull_gcm_encrypt(int bytes, u8 dst[], const u8 src[],
-				  u64 const h[][2], u64 dg[], u8 ctr[],
+				  u64 const h[4][2], u64 dg[], u8 ctr[],
 				  u32 const rk[], int rounds, u8 tag[]);
 asmlinkage int pmull_gcm_decrypt(int bytes, u8 dst[], const u8 src[],
-				 u64 const h[][2], u64 dg[], u8 ctr[],
+				 u64 const h[4][2], u64 dg[], u8 ctr[],
 				 u32 const rk[], int rounds, const u8 l[],
 				 const u8 tag[], u64 authsize);
 
 static void ghash_do_simd_update(int blocks, u64 dg[], const char *src,
-				 struct ghash_key *key, const char *head)
+				 struct arm_ghash_key *key, const char *head)
 {
 	scoped_ksimd()
 		pmull_ghash_update_p64(blocks, dg, src, key->h, head);
 }
 
@@ -365,12 +365,11 @@ static struct aead_alg gcm_aes_algs[] = {{
 
 	.base.cra_name		= "gcm(aes)",
 	.base.cra_driver_name	= "gcm-aes-ce",
 	.base.cra_priority	= 300,
 	.base.cra_blocksize	= 1,
-	.base.cra_ctxsize	= sizeof(struct gcm_aes_ctx) +
-				  4 * sizeof(u64[2]),
+	.base.cra_ctxsize	= sizeof(struct gcm_aes_ctx),
 	.base.cra_module	= THIS_MODULE,
 }, {
 	.ivsize			= GCM_RFC4106_IV_SIZE,
 	.chunksize		= AES_BLOCK_SIZE,
 	.maxauthsize		= AES_BLOCK_SIZE,
@@ -381,12 +380,11 @@ static struct aead_alg gcm_aes_algs[] = {{
 
 	.base.cra_name		= "rfc4106(gcm(aes))",
 	.base.cra_driver_name	= "rfc4106-gcm-aes-ce",
 	.base.cra_priority	= 300,
 	.base.cra_blocksize	= 1,
-	.base.cra_ctxsize	= sizeof(struct gcm_aes_ctx) +
-				  4 * sizeof(u64[2]),
+	.base.cra_ctxsize	= sizeof(struct gcm_aes_ctx),
 	.base.cra_module	= THIS_MODULE,
 }};
 
 static int __init ghash_ce_mod_init(void)
 {
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread
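[Editor's note: the cra_ctxsize change in the patch above follows directly from replacing the flexible array member with a fixed-size array.  The sketch below uses simplified stand-in structs (not the kernel's actual types; `uint64_t k[2]` stands in for the `be128 k` field) to show why the manual `4 * sizeof(u64[2])` addition becomes unnecessary.]

```c
#include <assert.h>
#include <stdint.h>

/*
 * With a flexible array member, sizeof() excludes 'h', so the context
 * size had to add room for the four key powers by hand.
 */
struct ghash_key_flex {
	uint64_t k[2];		/* stand-in for be128 k */
	uint64_t h[][2];	/* flexible array member: not counted */
};

/*
 * With an explicit size, sizeof() already covers all four key powers,
 * so sizeof(struct gcm_aes_ctx) alone is the correct cra_ctxsize.
 */
struct ghash_key_fixed {
	uint64_t k[2];
	uint64_t h[4][2];	/* counted by sizeof() */
};
```

Both structs have 8-byte alignment and no tail padding here, so the fixed-size variant is exactly `4 * sizeof(uint64_t[2])` bytes larger.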

* [PATCH 11/19] lib/crypto: powerpc/ghash: Migrate optimized code into library
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (9 preceding siblings ...)
  2026-03-19  6:17 ` [PATCH 10/19] crypto: arm64/aes-gcm - Rename struct ghash_key and make fixed-sized Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 12/19] lib/crypto: riscv/ghash: " Eric Biggers
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

Remove the "p8_ghash" crypto_shash algorithm.  Move the corresponding
assembly code into lib/crypto/, and wire it up to the GHASH library.

This makes the GHASH library optimized for POWER8.  It also greatly
reduces the amount of powerpc-specific glue code needed, and it fixes
the issue where this optimized GHASH code was disabled by default.

Note that previously the C code defined the POWER8 GHASH key format as
"u128 htable[16]", despite the assembly code only using four entries.
Fix the C code to use the correct key format.  To fulfill the library
API contract, also make the key preparation work in all contexts.

Note that the POWER8 assembly code takes the accumulator in GHASH
format, but it actually byte-reflects it to get it into POLYVAL format.
The library already works with POLYVAL natively.  For now, just wire up
this existing code by converting it to/from GHASH format in C code.
This should be cleaned up to eliminate the unnecessary conversion later.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 MAINTAINERS                                   |   4 +-
 arch/powerpc/crypto/Kconfig                   |   5 +-
 arch/powerpc/crypto/Makefile                  |   8 +-
 arch/powerpc/crypto/aesp8-ppc.h               |   1 -
 arch/powerpc/crypto/ghash.c                   | 160 ------------------
 arch/powerpc/crypto/vmx.c                     |  10 +-
 include/crypto/gf128hash.h                    |   4 +
 lib/crypto/Kconfig                            |   1 +
 lib/crypto/Makefile                           |  25 ++-
 lib/crypto/powerpc/.gitignore                 |   1 +
 lib/crypto/powerpc/gf128hash.h                | 109 ++++++++++++
 .../crypto/powerpc}/ghashp8-ppc.pl            |   1 +
 12 files changed, 143 insertions(+), 186 deletions(-)
 delete mode 100644 arch/powerpc/crypto/ghash.c
 create mode 100644 lib/crypto/powerpc/gf128hash.h
 rename {arch/powerpc/crypto => lib/crypto/powerpc}/ghashp8-ppc.pl (98%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 77fdfcb55f06..f088f4085653 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12265,14 +12265,14 @@ F:	arch/powerpc/crypto/Makefile
 F:	arch/powerpc/crypto/aes.c
 F:	arch/powerpc/crypto/aes_cbc.c
 F:	arch/powerpc/crypto/aes_ctr.c
 F:	arch/powerpc/crypto/aes_xts.c
 F:	arch/powerpc/crypto/aesp8-ppc.*
-F:	arch/powerpc/crypto/ghash.c
-F:	arch/powerpc/crypto/ghashp8-ppc.pl
 F:	arch/powerpc/crypto/ppc-xlate.pl
 F:	arch/powerpc/crypto/vmx.c
+F:	lib/crypto/powerpc/gf128hash.h
+F:	lib/crypto/powerpc/ghashp8-ppc.pl
 
 IBM ServeRAID RAID DRIVER
 S:	Orphan
 F:	drivers/scsi/ips.*
 
diff --git a/arch/powerpc/crypto/Kconfig b/arch/powerpc/crypto/Kconfig
index 2d056f1fc90f..b247f7ed973e 100644
--- a/arch/powerpc/crypto/Kconfig
+++ b/arch/powerpc/crypto/Kconfig
@@ -52,14 +52,13 @@ config CRYPTO_DEV_VMX_ENCRYPT
 	tristate "Encryption acceleration support on P8 CPU"
 	depends on CRYPTO_DEV_VMX
 	select CRYPTO_AES
 	select CRYPTO_CBC
 	select CRYPTO_CTR
-	select CRYPTO_GHASH
 	select CRYPTO_XTS
 	default m
 	help
 	  Support for VMX cryptographic acceleration instructions on Power8 CPU.
-	  This module supports acceleration for AES and GHASH in hardware. If you
-	  choose 'M' here, this module will be called vmx-crypto.
+	  This module supports acceleration for AES in hardware. If you choose
+	  'M' here, this module will be called vmx-crypto.
 
 endmenu
diff --git a/arch/powerpc/crypto/Makefile b/arch/powerpc/crypto/Makefile
index 3ac0886282a2..a1fe102a90ae 100644
--- a/arch/powerpc/crypto/Makefile
+++ b/arch/powerpc/crypto/Makefile
@@ -9,11 +9,11 @@ obj-$(CONFIG_CRYPTO_AES_PPC_SPE) += aes-ppc-spe.o
 obj-$(CONFIG_CRYPTO_AES_GCM_P10) += aes-gcm-p10-crypto.o
 obj-$(CONFIG_CRYPTO_DEV_VMX_ENCRYPT) += vmx-crypto.o
 
 aes-ppc-spe-y := aes-spe-glue.o
 aes-gcm-p10-crypto-y := aes-gcm-p10-glue.o aes-gcm-p10.o ghashp10-ppc.o aesp10-ppc.o
-vmx-crypto-objs := vmx.o ghashp8-ppc.o aes_cbc.o aes_ctr.o aes_xts.o ghash.o
+vmx-crypto-objs := vmx.o aes_cbc.o aes_ctr.o aes_xts.o
 
 ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y)
 override flavour := linux-ppc64le
 else
 ifdef CONFIG_PPC64_ELF_ABI_V2
@@ -24,16 +24,12 @@ endif
 endif
 
 quiet_cmd_perl = PERL    $@
       cmd_perl = $(PERL) $< $(flavour) > $@
 
-targets += aesp10-ppc.S ghashp10-ppc.S ghashp8-ppc.S
+targets += aesp10-ppc.S ghashp10-ppc.S
 
 $(obj)/aesp10-ppc.S $(obj)/ghashp10-ppc.S: $(obj)/%.S: $(src)/%.pl FORCE
 	$(call if_changed,perl)
 
-$(obj)/ghashp8-ppc.S: $(obj)/%.S: $(src)/%.pl FORCE
-	$(call if_changed,perl)
-
 OBJECT_FILES_NON_STANDARD_aesp10-ppc.o := y
 OBJECT_FILES_NON_STANDARD_ghashp10-ppc.o := y
-OBJECT_FILES_NON_STANDARD_ghashp8-ppc.o := y
diff --git a/arch/powerpc/crypto/aesp8-ppc.h b/arch/powerpc/crypto/aesp8-ppc.h
index 6862c605cc33..c68f5b6965fa 100644
--- a/arch/powerpc/crypto/aesp8-ppc.h
+++ b/arch/powerpc/crypto/aesp8-ppc.h
@@ -1,8 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include <linux/types.h>
 #include <crypto/aes.h>
 
-extern struct shash_alg p8_ghash_alg;
 extern struct skcipher_alg p8_aes_cbc_alg;
 extern struct skcipher_alg p8_aes_ctr_alg;
 extern struct skcipher_alg p8_aes_xts_alg;
diff --git a/arch/powerpc/crypto/ghash.c b/arch/powerpc/crypto/ghash.c
deleted file mode 100644
index 7308735bdb33..000000000000
--- a/arch/powerpc/crypto/ghash.c
+++ /dev/null
@@ -1,160 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * GHASH routines supporting VMX instructions on the Power 8
- *
- * Copyright (C) 2015, 2019 International Business Machines Inc.
- *
- * Author: Marcelo Henrique Cerri <mhcerri@br.ibm.com>
- *
- * Extended by Daniel Axtens <dja@axtens.net> to replace the fallback
- * mechanism. The new approach is based on arm64 code, which is:
- *   Copyright (C) 2014 - 2018 Linaro Ltd. <ard.biesheuvel@linaro.org>
- */
-
-#include "aesp8-ppc.h"
-#include <asm/switch_to.h>
-#include <crypto/aes.h>
-#include <crypto/gf128mul.h>
-#include <crypto/ghash.h>
-#include <crypto/internal/hash.h>
-#include <crypto/internal/simd.h>
-#include <linux/err.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/string.h>
-#include <linux/uaccess.h>
-
-void gcm_init_p8(u128 htable[16], const u64 Xi[2]);
-void gcm_gmult_p8(u64 Xi[2], const u128 htable[16]);
-void gcm_ghash_p8(u64 Xi[2], const u128 htable[16],
-		  const u8 *in, size_t len);
-
-struct p8_ghash_ctx {
-	/* key used by vector asm */
-	u128 htable[16];
-	/* key used by software fallback */
-	be128 key;
-};
-
-struct p8_ghash_desc_ctx {
-	u64 shash[2];
-};
-
-static int p8_ghash_init(struct shash_desc *desc)
-{
-	struct p8_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-
-	memset(dctx->shash, 0, GHASH_DIGEST_SIZE);
-	return 0;
-}
-
-static int p8_ghash_setkey(struct crypto_shash *tfm, const u8 *key,
-			   unsigned int keylen)
-{
-	struct p8_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(tfm));
-
-	if (keylen != GHASH_BLOCK_SIZE)
-		return -EINVAL;
-
-	preempt_disable();
-	pagefault_disable();
-	enable_kernel_vsx();
-	gcm_init_p8(ctx->htable, (const u64 *) key);
-	disable_kernel_vsx();
-	pagefault_enable();
-	preempt_enable();
-
-	memcpy(&ctx->key, key, GHASH_BLOCK_SIZE);
-
-	return 0;
-}
-
-static inline void __ghash_block(struct p8_ghash_ctx *ctx,
-				 struct p8_ghash_desc_ctx *dctx,
-				 const u8 *src)
-{
-	if (crypto_simd_usable()) {
-		preempt_disable();
-		pagefault_disable();
-		enable_kernel_vsx();
-		gcm_ghash_p8(dctx->shash, ctx->htable, src, GHASH_BLOCK_SIZE);
-		disable_kernel_vsx();
-		pagefault_enable();
-		preempt_enable();
-	} else {
-		crypto_xor((u8 *)dctx->shash, src, GHASH_BLOCK_SIZE);
-		gf128mul_lle((be128 *)dctx->shash, &ctx->key);
-	}
-}
-
-static inline int __ghash_blocks(struct p8_ghash_ctx *ctx,
-				 struct p8_ghash_desc_ctx *dctx,
-				 const u8 *src, unsigned int srclen)
-{
-	int remain = srclen - round_down(srclen, GHASH_BLOCK_SIZE);
-
-	srclen -= remain;
-	if (crypto_simd_usable()) {
-		preempt_disable();
-		pagefault_disable();
-		enable_kernel_vsx();
-		gcm_ghash_p8(dctx->shash, ctx->htable,
-				src, srclen);
-		disable_kernel_vsx();
-		pagefault_enable();
-		preempt_enable();
-	} else {
-		do {
-			crypto_xor((u8 *)dctx->shash, src, GHASH_BLOCK_SIZE);
-			gf128mul_lle((be128 *)dctx->shash, &ctx->key);
-			srclen -= GHASH_BLOCK_SIZE;
-			src += GHASH_BLOCK_SIZE;
-		} while (srclen);
-	}
-
-	return remain;
-}
-
-static int p8_ghash_update(struct shash_desc *desc,
-			   const u8 *src, unsigned int srclen)
-{
-	struct p8_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
-	struct p8_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-
-	return __ghash_blocks(ctx, dctx, src, srclen);
-}
-
-static int p8_ghash_finup(struct shash_desc *desc, const u8 *src,
-			  unsigned int len, u8 *out)
-{
-	struct p8_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
-	struct p8_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-
-	if (len) {
-		u8 buf[GHASH_BLOCK_SIZE] = {};
-
-		memcpy(buf, src, len);
-		__ghash_block(ctx, dctx, buf);
-		memzero_explicit(buf, sizeof(buf));
-	}
-	memcpy(out, dctx->shash, GHASH_DIGEST_SIZE);
-	return 0;
-}
-
-struct shash_alg p8_ghash_alg = {
-	.digestsize = GHASH_DIGEST_SIZE,
-	.init = p8_ghash_init,
-	.update = p8_ghash_update,
-	.finup = p8_ghash_finup,
-	.setkey = p8_ghash_setkey,
-	.descsize = sizeof(struct p8_ghash_desc_ctx),
-	.base = {
-		 .cra_name = "ghash",
-		 .cra_driver_name = "p8_ghash",
-		 .cra_priority = 1000,
-		 .cra_flags = CRYPTO_AHASH_ALG_BLOCK_ONLY,
-		 .cra_blocksize = GHASH_BLOCK_SIZE,
-		 .cra_ctxsize = sizeof(struct p8_ghash_ctx),
-		 .cra_module = THIS_MODULE,
-	},
-};
diff --git a/arch/powerpc/crypto/vmx.c b/arch/powerpc/crypto/vmx.c
index 7d2beb774f99..08da5311dfdf 100644
--- a/arch/powerpc/crypto/vmx.c
+++ b/arch/powerpc/crypto/vmx.c
@@ -12,26 +12,21 @@
 #include <linux/types.h>
 #include <linux/err.h>
 #include <linux/cpufeature.h>
 #include <linux/crypto.h>
 #include <asm/cputable.h>
-#include <crypto/internal/hash.h>
 #include <crypto/internal/skcipher.h>
 
 #include "aesp8-ppc.h"
 
 static int __init p8_init(void)
 {
 	int ret;
 
-	ret = crypto_register_shash(&p8_ghash_alg);
-	if (ret)
-		goto err;
-
 	ret = crypto_register_skcipher(&p8_aes_cbc_alg);
 	if (ret)
-		goto err_unregister_ghash;
+		goto err;
 
 	ret = crypto_register_skcipher(&p8_aes_ctr_alg);
 	if (ret)
 		goto err_unregister_aes_cbc;
 
@@ -43,22 +38,19 @@ static int __init p8_init(void)
 
 err_unregister_aes_ctr:
 	crypto_unregister_skcipher(&p8_aes_ctr_alg);
 err_unregister_aes_cbc:
 	crypto_unregister_skcipher(&p8_aes_cbc_alg);
-err_unregister_ghash:
-	crypto_unregister_shash(&p8_ghash_alg);
 err:
 	return ret;
 }
 
 static void __exit p8_exit(void)
 {
 	crypto_unregister_skcipher(&p8_aes_xts_alg);
 	crypto_unregister_skcipher(&p8_aes_ctr_alg);
 	crypto_unregister_skcipher(&p8_aes_cbc_alg);
-	crypto_unregister_shash(&p8_ghash_alg);
 }
 
 module_cpu_feature_match(PPC_MODULE_FEATURE_VEC_CRYPTO, p8_init);
 module_exit(p8_exit);
 
diff --git a/include/crypto/gf128hash.h b/include/crypto/gf128hash.h
index 5090fbaa87f8..650652dd6003 100644
--- a/include/crypto/gf128hash.h
+++ b/include/crypto/gf128hash.h
@@ -39,10 +39,14 @@ struct polyval_elem {
  * struct ghash_key - Prepared key for GHASH
  *
  * Use ghash_preparekey() to initialize this.
  */
 struct ghash_key {
+#if defined(CONFIG_CRYPTO_LIB_GF128HASH_ARCH) && defined(CONFIG_PPC64)
+	/** @htable: GHASH key format used by the POWER8 assembly code */
+	u64 htable[4][2];
+#endif
 	/** @h: The hash key H, in POLYVAL format */
 	struct polyval_elem h;
 };
 
 /**
diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig
index 4f1a79883a56..f54add7d9070 100644
--- a/lib/crypto/Kconfig
+++ b/lib/crypto/Kconfig
@@ -119,10 +119,11 @@ config CRYPTO_LIB_GF128HASH
 config CRYPTO_LIB_GF128HASH_ARCH
 	bool
 	depends on CRYPTO_LIB_GF128HASH && !UML
 	default y if ARM && KERNEL_MODE_NEON
 	default y if ARM64
+	default y if PPC64 && VSX
 	default y if X86_64
 
 config CRYPTO_LIB_MD5
 	tristate
 	help
diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
index 4ce0bac8fd93..8a9084188778 100644
--- a/lib/crypto/Makefile
+++ b/lib/crypto/Makefile
@@ -6,10 +6,14 @@ quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) > $(@)
 
 quiet_cmd_perlasm_with_args = PERLASM $@
       cmd_perlasm_with_args = $(PERL) $(<) void $(@)
 
+ppc64-perlasm-flavour-y := linux-ppc64
+ppc64-perlasm-flavour-$(CONFIG_PPC64_ELF_ABI_V2) := linux-ppc64-elfv2
+ppc64-perlasm-flavour-$(CONFIG_CPU_LITTLE_ENDIAN) := linux-ppc64le
+
 obj-$(CONFIG_KUNIT)				+= tests/
 
 obj-$(CONFIG_CRYPTO_HASH_INFO)			+= hash_info.o
 
 obj-$(CONFIG_CRYPTO_LIB_UTILS)			+= libcryptoutils.o
@@ -34,15 +38,12 @@ libaes-y += powerpc/aes-spe-core.o \
 	    powerpc/aes-spe-keys.o \
 	    powerpc/aes-spe-modes.o \
 	    powerpc/aes-tab-4k.o
 else
 libaes-y += powerpc/aesp8-ppc.o
-aes-perlasm-flavour-y := linux-ppc64
-aes-perlasm-flavour-$(CONFIG_PPC64_ELF_ABI_V2) := linux-ppc64-elfv2
-aes-perlasm-flavour-$(CONFIG_CPU_LITTLE_ENDIAN) := linux-ppc64le
 quiet_cmd_perlasm_aes = PERLASM $@
-      cmd_perlasm_aes = $(PERL) $< $(aes-perlasm-flavour-y) $@
+      cmd_perlasm_aes = $(PERL) $< $(ppc64-perlasm-flavour-y) $@
 # Use if_changed instead of cmd, in case the flavour changed.
 $(obj)/powerpc/aesp8-ppc.S: $(src)/powerpc/aesp8-ppc.pl FORCE
 	$(call if_changed,perlasm_aes)
 targets += powerpc/aesp8-ppc.S
 OBJECT_FILES_NON_STANDARD_powerpc/aesp8-ppc.o := y
@@ -159,13 +160,27 @@ libgf128hash-y := gf128hash.o
 ifeq ($(CONFIG_CRYPTO_LIB_GF128HASH_ARCH),y)
 CFLAGS_gf128hash.o += -I$(src)/$(SRCARCH)
 libgf128hash-$(CONFIG_ARM) += arm/ghash-neon-core.o
 libgf128hash-$(CONFIG_ARM64) += arm64/ghash-neon-core.o \
 				arm64/polyval-ce-core.o
-libgf128hash-$(CONFIG_X86) += x86/polyval-pclmul-avx.o
+
+ifeq ($(CONFIG_PPC),y)
+libgf128hash-y += powerpc/ghashp8-ppc.o
+quiet_cmd_perlasm_ghash = PERLASM $@
+      cmd_perlasm_ghash = $(PERL) $< $(ppc64-perlasm-flavour-y) $@
+$(obj)/powerpc/ghashp8-ppc.S: $(src)/powerpc/ghashp8-ppc.pl FORCE
+	$(call if_changed,perlasm_ghash)
+targets += powerpc/ghashp8-ppc.S
+OBJECT_FILES_NON_STANDARD_powerpc/ghashp8-ppc.o := y
 endif
 
+libgf128hash-$(CONFIG_X86) += x86/polyval-pclmul-avx.o
+endif # CONFIG_CRYPTO_LIB_GF128HASH_ARCH
+
+# clean-files must be defined unconditionally
+clean-files += powerpc/ghashp8-ppc.S
+
 ################################################################################
 
 obj-$(CONFIG_CRYPTO_LIB_MD5) += libmd5.o
 libmd5-y := md5.o
 ifeq ($(CONFIG_CRYPTO_LIB_MD5_ARCH),y)
diff --git a/lib/crypto/powerpc/.gitignore b/lib/crypto/powerpc/.gitignore
index 598ca7aff6b1..7aa71d83f739 100644
--- a/lib/crypto/powerpc/.gitignore
+++ b/lib/crypto/powerpc/.gitignore
@@ -1,2 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0-only
 aesp8-ppc.S
+ghashp8-ppc.S
diff --git a/lib/crypto/powerpc/gf128hash.h b/lib/crypto/powerpc/gf128hash.h
new file mode 100644
index 000000000000..629cd325d0c7
--- /dev/null
+++ b/lib/crypto/powerpc/gf128hash.h
@@ -0,0 +1,109 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * GHASH routines supporting VMX instructions on the Power 8
+ *
+ * Copyright (C) 2015, 2019 International Business Machines Inc.
+ * Copyright (C) 2014 - 2018 Linaro Ltd.
+ * Copyright 2026 Google LLC
+ */
+
+#include <asm/simd.h>
+#include <asm/switch_to.h>
+#include <linux/cpufeature.h>
+#include <linux/jump_label.h>
+#include <linux/preempt.h>
+#include <linux/uaccess.h>
+
+static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_vec_crypto);
+
+void gcm_init_p8(u64 htable[4][2], const u8 h[16]);
+void gcm_gmult_p8(u8 Xi[16], const u64 htable[4][2]);
+void gcm_ghash_p8(u8 Xi[16], const u64 htable[4][2], const u8 *in, size_t len);
+
+#define ghash_preparekey_arch ghash_preparekey_arch
+static void ghash_preparekey_arch(struct ghash_key *key,
+				  const u8 raw_key[GHASH_BLOCK_SIZE])
+{
+	ghash_key_to_polyval(raw_key, &key->h);
+
+	if (static_branch_likely(&have_vec_crypto) && likely(may_use_simd())) {
+		preempt_disable();
+		pagefault_disable();
+		enable_kernel_vsx();
+		gcm_init_p8(key->htable, raw_key);
+		disable_kernel_vsx();
+		pagefault_enable();
+		preempt_enable();
+	} else {
+		/* This reproduces gcm_init_p8() on both LE and BE systems. */
+		key->htable[0][0] = 0;
+		key->htable[0][1] = 0xc200000000000000;
+
+		key->htable[1][0] = 0;
+		key->htable[1][1] = le64_to_cpu(key->h.lo);
+
+		key->htable[2][0] = le64_to_cpu(key->h.lo);
+		key->htable[2][1] = le64_to_cpu(key->h.hi);
+
+		key->htable[3][0] = le64_to_cpu(key->h.hi);
+		key->htable[3][1] = 0;
+	}
+}
+
+#define ghash_mul_arch ghash_mul_arch
+static void ghash_mul_arch(struct polyval_elem *acc,
+			   const struct ghash_key *key)
+{
+	if (static_branch_likely(&have_vec_crypto) && likely(may_use_simd())) {
+		u8 ghash_acc[GHASH_BLOCK_SIZE];
+
+		polyval_acc_to_ghash(acc, ghash_acc);
+
+		preempt_disable();
+		pagefault_disable();
+		enable_kernel_vsx();
+		gcm_gmult_p8(ghash_acc, key->htable);
+		disable_kernel_vsx();
+		pagefault_enable();
+		preempt_enable();
+
+		ghash_acc_to_polyval(ghash_acc, acc);
+		memzero_explicit(ghash_acc, sizeof(ghash_acc));
+	} else {
+		polyval_mul_generic(acc, &key->h);
+	}
+}
+
+#define ghash_blocks_arch ghash_blocks_arch
+static void ghash_blocks_arch(struct polyval_elem *acc,
+			      const struct ghash_key *key,
+			      const u8 *data, size_t nblocks)
+{
+	if (static_branch_likely(&have_vec_crypto) && likely(may_use_simd())) {
+		u8 ghash_acc[GHASH_BLOCK_SIZE];
+
+		polyval_acc_to_ghash(acc, ghash_acc);
+
+		preempt_disable();
+		pagefault_disable();
+		enable_kernel_vsx();
+		gcm_ghash_p8(ghash_acc, key->htable, data,
+			     nblocks * GHASH_BLOCK_SIZE);
+		disable_kernel_vsx();
+		pagefault_enable();
+		preempt_enable();
+
+		ghash_acc_to_polyval(ghash_acc, acc);
+		memzero_explicit(ghash_acc, sizeof(ghash_acc));
+	} else {
+		ghash_blocks_generic(acc, &key->h, data, nblocks);
+	}
+}
+
+#define gf128hash_mod_init_arch gf128hash_mod_init_arch
+static void gf128hash_mod_init_arch(void)
+{
+	if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
+	    (cur_cpu_spec->cpu_user_features2 & PPC_FEATURE2_VEC_CRYPTO))
+		static_branch_enable(&have_vec_crypto);
+}
diff --git a/arch/powerpc/crypto/ghashp8-ppc.pl b/lib/crypto/powerpc/ghashp8-ppc.pl
similarity index 98%
rename from arch/powerpc/crypto/ghashp8-ppc.pl
rename to lib/crypto/powerpc/ghashp8-ppc.pl
index 041e633c214f..7c38eedc02cc 100644
--- a/arch/powerpc/crypto/ghashp8-ppc.pl
+++ b/lib/crypto/powerpc/ghashp8-ppc.pl
@@ -45,10 +45,11 @@ if ($flavour =~ /64/) {
 } else { die "nonsense $flavour"; }
 
 $0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1;
 ( $xlate="${dir}ppc-xlate.pl" and -f $xlate ) or
 ( $xlate="${dir}../../perlasm/ppc-xlate.pl" and -f $xlate) or
+( $xlate="${dir}../../../arch/powerpc/crypto/ppc-xlate.pl" and -f $xlate) or
 die "can't locate ppc-xlate.pl";
 
 open STDOUT,"| $^X $xlate $flavour $output" || die "can't call $xlate: $!";
 
 my ($Xip,$Htbl,$inp,$len)=map("r$_",(3..6));	# argument block
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread
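[Editor's note: the commit message above notes that the POWER8 assembly takes the accumulator in GHASH format and byte-reflects it into POLYVAL format, with the C glue converting to/from GHASH format around each call.  The sketch below models that conversion as a plain reversal of the 16 accumulator bytes; `reflect16()` is a hypothetical stand-in for the library's `polyval_acc_to_ghash()` / `ghash_acc_to_polyval()` helpers, whose real internal representation may differ.]

```c
#include <string.h>

/*
 * GHASH and POLYVAL use opposite bit/byte orderings for field
 * elements; converting one accumulator convention to the other is
 * modeled here as reversing the 16 bytes.  Applying the reflection
 * twice is the identity, which is what lets the glue code round-trip
 * the accumulator around each assembly call.
 */
static void reflect16(const unsigned char in[16], unsigned char out[16])
{
	for (int i = 0; i < 16; i++)
		out[i] = in[15 - i];
}
```

A cleanup eliminating this per-call conversion (teaching the assembly to work on the POLYVAL-format accumulator directly) is left for later, as the commit message says.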

* [PATCH 12/19] lib/crypto: riscv/ghash: Migrate optimized code into library
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (10 preceding siblings ...)
  2026-03-19  6:17 ` [PATCH 11/19] lib/crypto: powerpc/ghash: Migrate optimized code into library Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 13/19] lib/crypto: s390/ghash: " Eric Biggers
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

Remove the "ghash-riscv64-zvkg" crypto_shash algorithm.  Move the
corresponding assembly code into lib/crypto/, modify it to take the
length in blocks instead of bytes, and wire it up to the GHASH library.

This makes the GHASH library optimized with the RISC-V Vector
Cryptography Extension.  It also greatly reduces the amount of
riscv-specific glue code needed, and it fixes the issue where this
optimized GHASH code was disabled by default.

Note that this RISC-V code has multiple opportunities for improvement,
such as adding more parallelism, providing an optimized multiplication
function, and directly supporting POLYVAL.  But for now, this commit
simply tweaks ghash_zvkg() slightly to make it compatible with the
library, then wires it up to ghash_blocks_arch().

ghash_preparekey_arch() is also implemented to store the copy of the raw
key needed by the vghsh.vv instruction.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 arch/riscv/crypto/Kconfig                     |  11 --
 arch/riscv/crypto/Makefile                    |   3 -
 arch/riscv/crypto/ghash-riscv64-glue.c        | 146 ------------------
 include/crypto/gf128hash.h                    |   3 +
 lib/crypto/Kconfig                            |   2 +
 lib/crypto/Makefile                           |   1 +
 lib/crypto/riscv/gf128hash.h                  |  57 +++++++
 .../crypto/riscv}/ghash-riscv64-zvkg.S        |  13 +-
 8 files changed, 69 insertions(+), 167 deletions(-)
 delete mode 100644 arch/riscv/crypto/ghash-riscv64-glue.c
 create mode 100644 lib/crypto/riscv/gf128hash.h
 rename {arch/riscv/crypto => lib/crypto/riscv}/ghash-riscv64-zvkg.S (91%)

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 22d4eaab15f3..c208f54afbcd 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -15,21 +15,10 @@ config CRYPTO_AES_RISCV64
 	  - Zvkned vector crypto extension
 	  - Zvbb vector extension (XTS)
 	  - Zvkb vector crypto extension (CTR)
 	  - Zvkg vector crypto extension (XTS)
 
-config CRYPTO_GHASH_RISCV64
-	tristate "Hash functions: GHASH"
-	depends on 64BIT && TOOLCHAIN_HAS_VECTOR_CRYPTO && \
-		   RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS
-	select CRYPTO_GCM
-	help
-	  GCM GHASH function (NIST SP 800-38D)
-
-	  Architecture: riscv64 using:
-	  - Zvkg vector crypto extension
-
 config CRYPTO_SM3_RISCV64
 	tristate "Hash functions: SM3 (ShangMi 3)"
 	depends on 64BIT && TOOLCHAIN_HAS_VECTOR_CRYPTO && \
 		   RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS
 	select CRYPTO_HASH
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 183495a95cc0..5c9ee1b876fa 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -2,13 +2,10 @@
 
 obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o
 aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o \
 		 aes-riscv64-zvkned-zvbb-zvkg.o aes-riscv64-zvkned-zvkb.o
 
-obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
-ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o
-
 obj-$(CONFIG_CRYPTO_SM3_RISCV64) += sm3-riscv64.o
 sm3-riscv64-y := sm3-riscv64-glue.o sm3-riscv64-zvksh-zvkb.o
 
 obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o
 sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed-zvkb.o
diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/ghash-riscv64-glue.c
deleted file mode 100644
index d86073d25387..000000000000
--- a/arch/riscv/crypto/ghash-riscv64-glue.c
+++ /dev/null
@@ -1,146 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * GHASH using the RISC-V vector crypto extensions
- *
- * Copyright (C) 2023 VRULL GmbH
- * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
- *
- * Copyright (C) 2023 SiFive, Inc.
- * Author: Jerry Shih <jerry.shih@sifive.com>
- */
-
-#include <asm/simd.h>
-#include <asm/vector.h>
-#include <crypto/b128ops.h>
-#include <crypto/gf128mul.h>
-#include <crypto/ghash.h>
-#include <crypto/internal/hash.h>
-#include <crypto/internal/simd.h>
-#include <crypto/utils.h>
-#include <linux/errno.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/string.h>
-
-asmlinkage void ghash_zvkg(be128 *accumulator, const be128 *key, const u8 *data,
-			   size_t len);
-
-struct riscv64_ghash_tfm_ctx {
-	be128 key;
-};
-
-struct riscv64_ghash_desc_ctx {
-	be128 accumulator;
-};
-
-static int riscv64_ghash_setkey(struct crypto_shash *tfm, const u8 *key,
-				unsigned int keylen)
-{
-	struct riscv64_ghash_tfm_ctx *tctx = crypto_shash_ctx(tfm);
-
-	if (keylen != GHASH_BLOCK_SIZE)
-		return -EINVAL;
-
-	memcpy(&tctx->key, key, GHASH_BLOCK_SIZE);
-
-	return 0;
-}
-
-static int riscv64_ghash_init(struct shash_desc *desc)
-{
-	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-
-	*dctx = (struct riscv64_ghash_desc_ctx){};
-
-	return 0;
-}
-
-static inline void
-riscv64_ghash_blocks(const struct riscv64_ghash_tfm_ctx *tctx,
-		     struct riscv64_ghash_desc_ctx *dctx,
-		     const u8 *src, size_t srclen)
-{
-	/* The srclen is nonzero and a multiple of 16. */
-	if (crypto_simd_usable()) {
-		kernel_vector_begin();
-		ghash_zvkg(&dctx->accumulator, &tctx->key, src, srclen);
-		kernel_vector_end();
-	} else {
-		do {
-			crypto_xor((u8 *)&dctx->accumulator, src,
-				   GHASH_BLOCK_SIZE);
-			gf128mul_lle(&dctx->accumulator, &tctx->key);
-			src += GHASH_BLOCK_SIZE;
-			srclen -= GHASH_BLOCK_SIZE;
-		} while (srclen);
-	}
-}
-
-static int riscv64_ghash_update(struct shash_desc *desc, const u8 *src,
-				unsigned int srclen)
-{
-	const struct riscv64_ghash_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm);
-	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-
-	riscv64_ghash_blocks(tctx, dctx, src,
-			     round_down(srclen, GHASH_BLOCK_SIZE));
-	return srclen - round_down(srclen, GHASH_BLOCK_SIZE);
-}
-
-static int riscv64_ghash_finup(struct shash_desc *desc, const u8 *src,
-			       unsigned int len, u8 *out)
-{
-	const struct riscv64_ghash_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm);
-	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-
-	if (len) {
-		u8 buf[GHASH_BLOCK_SIZE] = {};
-
-		memcpy(buf, src, len);
-		riscv64_ghash_blocks(tctx, dctx, buf, GHASH_BLOCK_SIZE);
-		memzero_explicit(buf, sizeof(buf));
-	}
-
-	memcpy(out, &dctx->accumulator, GHASH_DIGEST_SIZE);
-	return 0;
-}
-
-static struct shash_alg riscv64_ghash_alg = {
-	.init = riscv64_ghash_init,
-	.update = riscv64_ghash_update,
-	.finup = riscv64_ghash_finup,
-	.setkey = riscv64_ghash_setkey,
-	.descsize = sizeof(struct riscv64_ghash_desc_ctx),
-	.digestsize = GHASH_DIGEST_SIZE,
-	.base = {
-		.cra_blocksize = GHASH_BLOCK_SIZE,
-		.cra_ctxsize = sizeof(struct riscv64_ghash_tfm_ctx),
-		.cra_priority = 300,
-		.cra_flags = CRYPTO_AHASH_ALG_BLOCK_ONLY,
-		.cra_name = "ghash",
-		.cra_driver_name = "ghash-riscv64-zvkg",
-		.cra_module = THIS_MODULE,
-	},
-};
-
-static int __init riscv64_ghash_mod_init(void)
-{
-	if (riscv_isa_extension_available(NULL, ZVKG) &&
-	    riscv_vector_vlen() >= 128)
-		return crypto_register_shash(&riscv64_ghash_alg);
-
-	return -ENODEV;
-}
-
-static void __exit riscv64_ghash_mod_exit(void)
-{
-	crypto_unregister_shash(&riscv64_ghash_alg);
-}
-
-module_init(riscv64_ghash_mod_init);
-module_exit(riscv64_ghash_mod_exit);
-
-MODULE_DESCRIPTION("GHASH (RISC-V accelerated)");
-MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
-MODULE_LICENSE("GPL");
-MODULE_ALIAS_CRYPTO("ghash");
diff --git a/include/crypto/gf128hash.h b/include/crypto/gf128hash.h
index 650652dd6003..b798438cce23 100644
--- a/include/crypto/gf128hash.h
+++ b/include/crypto/gf128hash.h
@@ -42,10 +42,13 @@ struct polyval_elem {
  */
 struct ghash_key {
 #if defined(CONFIG_CRYPTO_LIB_GF128HASH_ARCH) && defined(CONFIG_PPC64)
 	/** @htable: GHASH key format used by the POWER8 assembly code */
 	u64 htable[4][2];
+#elif defined(CONFIG_CRYPTO_LIB_GF128HASH_ARCH) && defined(CONFIG_RISCV)
+	/** @h_raw: The hash key H, in GHASH format */
+	u8 h_raw[GHASH_BLOCK_SIZE];
 #endif
 	/** @h: The hash key H, in POLYVAL format */
 	struct polyval_elem h;
 };
 
diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig
index f54add7d9070..027802e0de33 100644
--- a/lib/crypto/Kconfig
+++ b/lib/crypto/Kconfig
@@ -120,10 +120,12 @@ config CRYPTO_LIB_GF128HASH_ARCH
 	bool
 	depends on CRYPTO_LIB_GF128HASH && !UML
 	default y if ARM && KERNEL_MODE_NEON
 	default y if ARM64
 	default y if PPC64 && VSX
+	default y if RISCV && 64BIT && TOOLCHAIN_HAS_VECTOR_CRYPTO && \
+		     RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS
 	default y if X86_64
 
 config CRYPTO_LIB_MD5
 	tristate
 	help
diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
index 8a9084188778..8950509833af 100644
--- a/lib/crypto/Makefile
+++ b/lib/crypto/Makefile
@@ -171,10 +171,11 @@ $(obj)/powerpc/ghashp8-ppc.S: $(src)/powerpc/ghashp8-ppc.pl FORCE
 	$(call if_changed,perlasm_ghash)
 targets += powerpc/ghashp8-ppc.S
 OBJECT_FILES_NON_STANDARD_powerpc/ghashp8-ppc.o := y
 endif
 
+libgf128hash-$(CONFIG_RISCV) += riscv/ghash-riscv64-zvkg.o
 libgf128hash-$(CONFIG_X86) += x86/polyval-pclmul-avx.o
 endif # CONFIG_CRYPTO_LIB_GF128HASH_ARCH
 
 # clean-files must be defined unconditionally
 clean-files += powerpc/ghashp8-ppc.S
diff --git a/lib/crypto/riscv/gf128hash.h b/lib/crypto/riscv/gf128hash.h
new file mode 100644
index 000000000000..4301a0384f60
--- /dev/null
+++ b/lib/crypto/riscv/gf128hash.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * GHASH, RISC-V optimized
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Copyright (C) 2023 SiFive, Inc.
+ * Copyright 2026 Google LLC
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+
+static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_zvkg);
+
+asmlinkage void ghash_zvkg(u8 accumulator[GHASH_BLOCK_SIZE],
+			   const u8 key[GHASH_BLOCK_SIZE],
+			   const u8 *data, size_t nblocks);
+
+#define ghash_preparekey_arch ghash_preparekey_arch
+static void ghash_preparekey_arch(struct ghash_key *key,
+				  const u8 raw_key[GHASH_BLOCK_SIZE])
+{
+	/* Save key in POLYVAL format for fallback */
+	ghash_key_to_polyval(raw_key, &key->h);
+
+	/* Save key in GHASH format for zvkg */
+	memcpy(key->h_raw, raw_key, GHASH_BLOCK_SIZE);
+}
+
+#define ghash_blocks_arch ghash_blocks_arch
+static void ghash_blocks_arch(struct polyval_elem *acc,
+			      const struct ghash_key *key,
+			      const u8 *data, size_t nblocks)
+{
+	if (static_branch_likely(&have_zvkg) && likely(may_use_simd())) {
+		u8 ghash_acc[GHASH_BLOCK_SIZE];
+
+		polyval_acc_to_ghash(acc, ghash_acc);
+
+		kernel_vector_begin();
+		ghash_zvkg(ghash_acc, key->h_raw, data, nblocks);
+		kernel_vector_end();
+
+		ghash_acc_to_polyval(ghash_acc, acc);
+		memzero_explicit(ghash_acc, sizeof(ghash_acc));
+	} else {
+		ghash_blocks_generic(acc, &key->h, data, nblocks);
+	}
+}
+
+#define gf128hash_mod_init_arch gf128hash_mod_init_arch
+static void gf128hash_mod_init_arch(void)
+{
+	if (riscv_isa_extension_available(NULL, ZVKG) &&
+	    riscv_vector_vlen() >= 128)
+		static_branch_enable(&have_zvkg);
+}
diff --git a/arch/riscv/crypto/ghash-riscv64-zvkg.S b/lib/crypto/riscv/ghash-riscv64-zvkg.S
similarity index 91%
rename from arch/riscv/crypto/ghash-riscv64-zvkg.S
rename to lib/crypto/riscv/ghash-riscv64-zvkg.S
index f2b43fb4d434..2839ff1a990c 100644
--- a/arch/riscv/crypto/ghash-riscv64-zvkg.S
+++ b/lib/crypto/riscv/ghash-riscv64-zvkg.S
@@ -48,25 +48,24 @@
 .option arch, +zvkg
 
 #define ACCUMULATOR	a0
 #define KEY		a1
 #define DATA		a2
-#define LEN		a3
+#define NBLOCKS		a3
 
-// void ghash_zvkg(be128 *accumulator, const be128 *key, const u8 *data,
-//		   size_t len);
-//
-// |len| must be nonzero and a multiple of 16 (GHASH_BLOCK_SIZE).
+// void ghash_zvkg(u8 accumulator[GHASH_BLOCK_SIZE],
+//		   const u8 key[GHASH_BLOCK_SIZE],
+//		   const u8 *data, size_t nblocks);
 SYM_FUNC_START(ghash_zvkg)
 	vsetivli	zero, 4, e32, m1, ta, ma
 	vle32.v		v1, (ACCUMULATOR)
 	vle32.v		v2, (KEY)
 .Lnext_block:
 	vle32.v		v3, (DATA)
 	vghsh.vv	v1, v2, v3
 	addi		DATA, DATA, 16
-	addi		LEN, LEN, -16
-	bnez		LEN, .Lnext_block
+	addi		NBLOCKS, NBLOCKS, -1
+	bnez		NBLOCKS, .Lnext_block
 
 	vse32.v		v1, (ACCUMULATOR)
 	ret
 SYM_FUNC_END(ghash_zvkg)
-- 
2.53.0




* [PATCH 13/19] lib/crypto: s390/ghash: Migrate optimized code into library
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (11 preceding siblings ...)
  2026-03-19  6:17 ` [PATCH 12/19] lib/crypto: riscv/ghash: " Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 14/19] lib/crypto: x86/ghash: " Eric Biggers
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

Remove the "ghash-s390" crypto_shash algorithm, and replace it with an
implementation of ghash_blocks_arch() for the GHASH library.

This optimizes the GHASH library with CPACF.  It also greatly reduces
the amount of s390-specific glue code needed, and it fixes the issue
where this GHASH optimization was disabled by default.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 arch/s390/configs/debug_defconfig |   1 -
 arch/s390/configs/defconfig       |   1 -
 arch/s390/crypto/Kconfig          |  10 ---
 arch/s390/crypto/Makefile         |   1 -
 arch/s390/crypto/ghash_s390.c     | 144 ------------------------------
 include/crypto/gf128hash.h        |   3 +-
 lib/crypto/Kconfig                |   1 +
 lib/crypto/s390/gf128hash.h       |  54 +++++++++++
 8 files changed, 57 insertions(+), 158 deletions(-)
 delete mode 100644 arch/s390/crypto/ghash_s390.c
 create mode 100644 lib/crypto/s390/gf128hash.h

diff --git a/arch/s390/configs/debug_defconfig b/arch/s390/configs/debug_defconfig
index 98fd0a2f51c6..aa862d4fcc68 100644
--- a/arch/s390/configs/debug_defconfig
+++ b/arch/s390/configs/debug_defconfig
@@ -807,11 +807,10 @@ CONFIG_CRYPTO_LZ4HC=m
 CONFIG_CRYPTO_ZSTD=m
 CONFIG_CRYPTO_USER_API_HASH=m
 CONFIG_CRYPTO_USER_API_SKCIPHER=m
 CONFIG_CRYPTO_USER_API_RNG=m
 CONFIG_CRYPTO_USER_API_AEAD=m
-CONFIG_CRYPTO_GHASH_S390=m
 CONFIG_CRYPTO_AES_S390=m
 CONFIG_CRYPTO_DES_S390=m
 CONFIG_CRYPTO_HMAC_S390=m
 CONFIG_ZCRYPT=m
 CONFIG_PKEY=m
diff --git a/arch/s390/configs/defconfig b/arch/s390/configs/defconfig
index 0f4cedcab3ce..74f943307c46 100644
--- a/arch/s390/configs/defconfig
+++ b/arch/s390/configs/defconfig
@@ -792,11 +792,10 @@ CONFIG_CRYPTO_ZSTD=m
 CONFIG_CRYPTO_JITTERENTROPY_OSR=1
 CONFIG_CRYPTO_USER_API_HASH=m
 CONFIG_CRYPTO_USER_API_SKCIPHER=m
 CONFIG_CRYPTO_USER_API_RNG=m
 CONFIG_CRYPTO_USER_API_AEAD=m
-CONFIG_CRYPTO_GHASH_S390=m
 CONFIG_CRYPTO_AES_S390=m
 CONFIG_CRYPTO_DES_S390=m
 CONFIG_CRYPTO_HMAC_S390=m
 CONFIG_ZCRYPT=m
 CONFIG_PKEY=m
diff --git a/arch/s390/crypto/Kconfig b/arch/s390/crypto/Kconfig
index 79a2d0034258..ee83052dbc15 100644
--- a/arch/s390/crypto/Kconfig
+++ b/arch/s390/crypto/Kconfig
@@ -1,19 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0
 
 menu "Accelerated Cryptographic Algorithms for CPU (s390)"
 
-config CRYPTO_GHASH_S390
-	tristate "Hash functions: GHASH"
-	select CRYPTO_HASH
-	help
-	  GCM GHASH hash function (NIST SP800-38D)
-
-	  Architecture: s390
-
-	  It is available as of z196.
-
 config CRYPTO_AES_S390
 	tristate "Ciphers: AES, modes: ECB, CBC, CTR, XTS, GCM"
 	select CRYPTO_SKCIPHER
 	help
 	  AEAD cipher: AES with GCM
diff --git a/arch/s390/crypto/Makefile b/arch/s390/crypto/Makefile
index 387a229e1038..4449c1b19ef5 100644
--- a/arch/s390/crypto/Makefile
+++ b/arch/s390/crypto/Makefile
@@ -5,9 +5,8 @@
 
 obj-$(CONFIG_CRYPTO_DES_S390) += des_s390.o
 obj-$(CONFIG_CRYPTO_AES_S390) += aes_s390.o
 obj-$(CONFIG_CRYPTO_PAES_S390) += paes_s390.o
 obj-$(CONFIG_S390_PRNG) += prng.o
-obj-$(CONFIG_CRYPTO_GHASH_S390) += ghash_s390.o
 obj-$(CONFIG_CRYPTO_HMAC_S390) += hmac_s390.o
 obj-$(CONFIG_CRYPTO_PHMAC_S390) += phmac_s390.o
 obj-y += arch_random.o
diff --git a/arch/s390/crypto/ghash_s390.c b/arch/s390/crypto/ghash_s390.c
deleted file mode 100644
index dcbcee37cb63..000000000000
--- a/arch/s390/crypto/ghash_s390.c
+++ /dev/null
@@ -1,144 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Cryptographic API.
- *
- * s390 implementation of the GHASH algorithm for GCM (Galois/Counter Mode).
- *
- * Copyright IBM Corp. 2011
- * Author(s): Gerald Schaefer <gerald.schaefer@de.ibm.com>
- */
-
-#include <asm/cpacf.h>
-#include <crypto/ghash.h>
-#include <crypto/internal/hash.h>
-#include <linux/cpufeature.h>
-#include <linux/err.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/string.h>
-
-struct s390_ghash_ctx {
-	u8 key[GHASH_BLOCK_SIZE];
-};
-
-struct s390_ghash_desc_ctx {
-	u8 icv[GHASH_BLOCK_SIZE];
-	u8 key[GHASH_BLOCK_SIZE];
-};
-
-static int ghash_init(struct shash_desc *desc)
-{
-	struct s390_ghash_ctx *ctx = crypto_shash_ctx(desc->tfm);
-	struct s390_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-
-	memset(dctx, 0, sizeof(*dctx));
-	memcpy(dctx->key, ctx->key, GHASH_BLOCK_SIZE);
-
-	return 0;
-}
-
-static int ghash_setkey(struct crypto_shash *tfm,
-			const u8 *key, unsigned int keylen)
-{
-	struct s390_ghash_ctx *ctx = crypto_shash_ctx(tfm);
-
-	if (keylen != GHASH_BLOCK_SIZE)
-		return -EINVAL;
-
-	memcpy(ctx->key, key, GHASH_BLOCK_SIZE);
-
-	return 0;
-}
-
-static int ghash_update(struct shash_desc *desc,
-			 const u8 *src, unsigned int srclen)
-{
-	struct s390_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-	unsigned int n;
-
-	n = srclen & ~(GHASH_BLOCK_SIZE - 1);
-	cpacf_kimd(CPACF_KIMD_GHASH, dctx, src, n);
-	return srclen - n;
-}
-
-static void ghash_flush(struct s390_ghash_desc_ctx *dctx, const u8 *src,
-			unsigned int len)
-{
-	if (len) {
-		u8 buf[GHASH_BLOCK_SIZE] = {};
-
-		memcpy(buf, src, len);
-		cpacf_kimd(CPACF_KIMD_GHASH, dctx, buf, GHASH_BLOCK_SIZE);
-		memzero_explicit(buf, sizeof(buf));
-	}
-}
-
-static int ghash_finup(struct shash_desc *desc, const u8 *src,
-		       unsigned int len, u8 *dst)
-{
-	struct s390_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-
-	ghash_flush(dctx, src, len);
-	memcpy(dst, dctx->icv, GHASH_BLOCK_SIZE);
-	return 0;
-}
-
-static int ghash_export(struct shash_desc *desc, void *out)
-{
-	struct s390_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-
-	memcpy(out, dctx->icv, GHASH_DIGEST_SIZE);
-	return 0;
-}
-
-static int ghash_import(struct shash_desc *desc, const void *in)
-{
-	struct s390_ghash_ctx *ctx = crypto_shash_ctx(desc->tfm);
-	struct s390_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-
-	memcpy(dctx->icv, in, GHASH_DIGEST_SIZE);
-	memcpy(dctx->key, ctx->key, GHASH_BLOCK_SIZE);
-	return 0;
-}
-
-static struct shash_alg ghash_alg = {
-	.digestsize	= GHASH_DIGEST_SIZE,
-	.init		= ghash_init,
-	.update		= ghash_update,
-	.finup		= ghash_finup,
-	.setkey		= ghash_setkey,
-	.export		= ghash_export,
-	.import		= ghash_import,
-	.statesize	= sizeof(struct ghash_desc_ctx),
-	.descsize	= sizeof(struct s390_ghash_desc_ctx),
-	.base		= {
-		.cra_name		= "ghash",
-		.cra_driver_name	= "ghash-s390",
-		.cra_priority		= 300,
-		.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
-		.cra_blocksize		= GHASH_BLOCK_SIZE,
-		.cra_ctxsize		= sizeof(struct s390_ghash_ctx),
-		.cra_module		= THIS_MODULE,
-	},
-};
-
-static int __init ghash_mod_init(void)
-{
-	if (!cpacf_query_func(CPACF_KIMD, CPACF_KIMD_GHASH))
-		return -ENODEV;
-
-	return crypto_register_shash(&ghash_alg);
-}
-
-static void __exit ghash_mod_exit(void)
-{
-	crypto_unregister_shash(&ghash_alg);
-}
-
-module_cpu_feature_match(S390_CPU_FEATURE_MSA, ghash_mod_init);
-module_exit(ghash_mod_exit);
-
-MODULE_ALIAS_CRYPTO("ghash");
-
-MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("GHASH hash function, s390 implementation");
diff --git a/include/crypto/gf128hash.h b/include/crypto/gf128hash.h
index b798438cce23..0bc649d01e12 100644
--- a/include/crypto/gf128hash.h
+++ b/include/crypto/gf128hash.h
@@ -42,11 +42,12 @@ struct polyval_elem {
  */
 struct ghash_key {
 #if defined(CONFIG_CRYPTO_LIB_GF128HASH_ARCH) && defined(CONFIG_PPC64)
 	/** @htable: GHASH key format used by the POWER8 assembly code */
 	u64 htable[4][2];
-#elif defined(CONFIG_CRYPTO_LIB_GF128HASH_ARCH) && defined(CONFIG_RISCV)
+#elif defined(CONFIG_CRYPTO_LIB_GF128HASH_ARCH) && \
+	(defined(CONFIG_RISCV) || defined(CONFIG_S390))
 	/** @h_raw: The hash key H, in GHASH format */
 	u8 h_raw[GHASH_BLOCK_SIZE];
 #endif
 	/** @h: The hash key H, in POLYVAL format */
 	struct polyval_elem h;
diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig
index 027802e0de33..a39e7707e9ee 100644
--- a/lib/crypto/Kconfig
+++ b/lib/crypto/Kconfig
@@ -122,10 +122,11 @@ config CRYPTO_LIB_GF128HASH_ARCH
 	default y if ARM && KERNEL_MODE_NEON
 	default y if ARM64
 	default y if PPC64 && VSX
 	default y if RISCV && 64BIT && TOOLCHAIN_HAS_VECTOR_CRYPTO && \
 		     RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS
+	default y if S390
 	default y if X86_64
 
 config CRYPTO_LIB_MD5
 	tristate
 	help
diff --git a/lib/crypto/s390/gf128hash.h b/lib/crypto/s390/gf128hash.h
new file mode 100644
index 000000000000..1e46ce4bca40
--- /dev/null
+++ b/lib/crypto/s390/gf128hash.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * GHASH optimized using the CP Assist for Cryptographic Functions (CPACF)
+ *
+ * Copyright 2026 Google LLC
+ */
+#include <asm/cpacf.h>
+#include <linux/cpufeature.h>
+
+static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_cpacf_ghash);
+
+#define ghash_preparekey_arch ghash_preparekey_arch
+static void ghash_preparekey_arch(struct ghash_key *key,
+				  const u8 raw_key[GHASH_BLOCK_SIZE])
+{
+	/* Save key in POLYVAL format for fallback */
+	ghash_key_to_polyval(raw_key, &key->h);
+
+	/* Save key in GHASH format for CPACF_KIMD_GHASH */
+	memcpy(key->h_raw, raw_key, GHASH_BLOCK_SIZE);
+}
+
+#define ghash_blocks_arch ghash_blocks_arch
+static void ghash_blocks_arch(struct polyval_elem *acc,
+			      const struct ghash_key *key,
+			      const u8 *data, size_t nblocks)
+{
+	if (static_branch_likely(&have_cpacf_ghash)) {
+		/*
+		 * CPACF_KIMD_GHASH requires the accumulator and key in a single
+		 * buffer, each using the GHASH convention.
+		 */
+		u8 ctx[2][GHASH_BLOCK_SIZE] __aligned(8);
+
+		polyval_acc_to_ghash(acc, ctx[0]);
+		memcpy(ctx[1], key->h_raw, GHASH_BLOCK_SIZE);
+
+		cpacf_kimd(CPACF_KIMD_GHASH, ctx, data,
+			   nblocks * GHASH_BLOCK_SIZE);
+
+		ghash_acc_to_polyval(ctx[0], acc);
+		memzero_explicit(ctx, sizeof(ctx));
+	} else {
+		ghash_blocks_generic(acc, &key->h, data, nblocks);
+	}
+}
+
+#define gf128hash_mod_init_arch gf128hash_mod_init_arch
+static void gf128hash_mod_init_arch(void)
+{
+	if (cpu_have_feature(S390_CPU_FEATURE_MSA) &&
+	    cpacf_query_func(CPACF_KIMD, CPACF_KIMD_GHASH))
+		static_branch_enable(&have_cpacf_ghash);
+}
-- 
2.53.0




* [PATCH 14/19] lib/crypto: x86/ghash: Migrate optimized code into library
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (12 preceding siblings ...)
  2026-03-19  6:17 ` [PATCH 13/19] lib/crypto: s390/ghash: " Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 15/19] crypto: gcm - Use GHASH library instead of crypto_ahash Eric Biggers
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

Remove the "ghash-pclmulqdqni" crypto_shash algorithm.  Move the
corresponding assembly code into lib/crypto/, and wire it up to the
GHASH library.

This optimizes the GHASH library with x86's carry-less multiplication
instructions.  It also greatly reduces the amount of x86-specific glue
code needed, and it fixes the issue where this GHASH optimization was
disabled by default.

Rename and adjust the prototypes of the assembly functions to make them
fit better with the library.  Remove the byte-swaps (pshufb
instructions) that are no longer necessary because the library keeps the
accumulator in POLYVAL format rather than GHASH format.

Rename clmul_ghash_mul() to polyval_mul_pclmul() to reflect that it
really performs a POLYVAL-style multiplication.  Wire it up to both
ghash_mul_arch() and polyval_mul_arch().
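
For illustration, the shift sequence that the removed ghash_setkey()
used to precompute the key can be cross-checked against a plain
multiply-by-x in the POLYVAL field x^128 + x^127 + x^126 + x^121 + 1.
A Python sketch (the names x86_ghash_setkey() and mulx_polyval() are
mine; the integer returned models the 128-bit le128 value interpreted
little-endian, with bit i holding the coefficient of x^i):

```python
MASK64 = (1 << 64) - 1

def x86_ghash_setkey(key):
    """Model of the shift sequence from the removed x86 ghash_setkey():
    read the raw GHASH key as two big-endian u64s, shift left by one
    across the word boundary, and fold the carry back per the POLYVAL
    reduction.  Returns the 128-bit field element as an integer."""
    a = int.from_bytes(key[:8], 'big')
    b = int.from_bytes(key[8:], 'big')
    hi = ((a << 1) | (b >> 63)) & MASK64
    lo = ((b << 1) | (a >> 63)) & MASK64   # carry-out folds in as the +1 term
    if a >> 63:
        hi ^= 0xc2 << 56                   # x^127 + x^126 + x^121 terms
    return (hi << 64) | lo

def mulx_polyval(v):
    """Multiply by x in the POLYVAL field, with bit i of the integer
    holding the coefficient of x^i."""
    v <<= 1
    if v >> 128:  # reduce: x^128 = x^127 + x^126 + x^121 + 1
        v = (v & ((1 << 128) - 1)) ^ ((1 << 127) | (1 << 126) | (1 << 121) | 1)
    return v
```

The two paths agree because reading the key big-endian and emitting the
result little-endian is the byte reversal relating the GHASH and POLYVAL
conventions, and the shift-plus-0xc2 fold is exactly the multiply-by-x
reduction applied to the byte-swapped key.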

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 arch/x86/crypto/Kconfig                       |  10 --
 arch/x86/crypto/Makefile                      |   3 -
 arch/x86/crypto/ghash-clmulni-intel_glue.c    | 163 ------------------
 lib/crypto/Makefile                           |   3 +-
 lib/crypto/x86/gf128hash.h                    |  65 ++++++-
 .../crypto/x86/ghash-pclmul.S                 |  98 +++++------
 6 files changed, 104 insertions(+), 238 deletions(-)
 delete mode 100644 arch/x86/crypto/ghash-clmulni-intel_glue.c
 rename arch/x86/crypto/ghash-clmulni-intel_asm.S => lib/crypto/x86/ghash-pclmul.S (54%)

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index 7fb2319a0916..905e8a23cec3 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -342,16 +342,6 @@ config CRYPTO_SM3_AVX_X86_64
 	  Architecture: x86_64 using:
 	  - AVX (Advanced Vector Extensions)
 
 	  If unsure, say N.
 
-config CRYPTO_GHASH_CLMUL_NI_INTEL
-	tristate "Hash functions: GHASH (CLMUL-NI)"
-	depends on 64BIT
-	select CRYPTO_CRYPTD
-	help
-	  GCM GHASH hash function (NIST SP800-38D)
-
-	  Architecture: x86_64 using:
-	  - CLMUL-NI (carry-less multiplication new instructions)
-
 endmenu
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index b21ad0978c52..d562f4341da6 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -48,13 +48,10 @@ aesni-intel-$(CONFIG_64BIT) += aes-ctr-avx-x86_64.o \
 			       aes-gcm-aesni-x86_64.o \
 			       aes-gcm-vaes-avx2.o \
 			       aes-gcm-vaes-avx512.o \
 			       aes-xts-avx-x86_64.o
 
-obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
-ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
-
 obj-$(CONFIG_CRYPTO_SM3_AVX_X86_64) += sm3-avx-x86_64.o
 sm3-avx-x86_64-y := sm3-avx-asm_64.o sm3_avx_glue.o
 
 obj-$(CONFIG_CRYPTO_SM4_AESNI_AVX_X86_64) += sm4-aesni-avx-x86_64.o
 sm4-aesni-avx-x86_64-y := sm4-aesni-avx-asm_64.o sm4_aesni_avx_glue.o
diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
deleted file mode 100644
index aea5d4d06be7..000000000000
--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
+++ /dev/null
@@ -1,163 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Accelerated GHASH implementation with Intel PCLMULQDQ-NI
- * instructions. This file contains glue code.
- *
- * Copyright (c) 2009 Intel Corp.
- *   Author: Huang Ying <ying.huang@intel.com>
- */
-
-#include <asm/cpu_device_id.h>
-#include <asm/simd.h>
-#include <crypto/b128ops.h>
-#include <crypto/ghash.h>
-#include <crypto/internal/hash.h>
-#include <crypto/utils.h>
-#include <linux/errno.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/string.h>
-#include <linux/unaligned.h>
-
-asmlinkage void clmul_ghash_mul(char *dst, const le128 *shash);
-
-asmlinkage int clmul_ghash_update(char *dst, const char *src,
-				  unsigned int srclen, const le128 *shash);
-
-struct x86_ghash_ctx {
-	le128 shash;
-};
-
-static int ghash_init(struct shash_desc *desc)
-{
-	struct ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-
-	memset(dctx, 0, sizeof(*dctx));
-
-	return 0;
-}
-
-static int ghash_setkey(struct crypto_shash *tfm,
-			const u8 *key, unsigned int keylen)
-{
-	struct x86_ghash_ctx *ctx = crypto_shash_ctx(tfm);
-	u64 a, b;
-
-	if (keylen != GHASH_BLOCK_SIZE)
-		return -EINVAL;
-
-	/*
-	 * GHASH maps bits to polynomial coefficients backwards, which makes it
-	 * hard to implement.  But it can be shown that the GHASH multiplication
-	 *
-	 *	D * K (mod x^128 + x^7 + x^2 + x + 1)
-	 *
-	 * (where D is a data block and K is the key) is equivalent to:
-	 *
-	 *	bitreflect(D) * bitreflect(K) * x^(-127)
-	 *		(mod x^128 + x^127 + x^126 + x^121 + 1)
-	 *
-	 * So, the code below precomputes:
-	 *
-	 *	bitreflect(K) * x^(-127) (mod x^128 + x^127 + x^126 + x^121 + 1)
-	 *
-	 * ... but in Montgomery form (so that Montgomery multiplication can be
-	 * used), i.e. with an extra x^128 factor, which means actually:
-	 *
-	 *	bitreflect(K) * x (mod x^128 + x^127 + x^126 + x^121 + 1)
-	 *
-	 * The within-a-byte part of bitreflect() cancels out GHASH's built-in
-	 * reflection, and thus bitreflect() is actually a byteswap.
-	 */
-	a = get_unaligned_be64(key);
-	b = get_unaligned_be64(key + 8);
-	ctx->shash.a = cpu_to_le64((a << 1) | (b >> 63));
-	ctx->shash.b = cpu_to_le64((b << 1) | (a >> 63));
-	if (a >> 63)
-		ctx->shash.a ^= cpu_to_le64((u64)0xc2 << 56);
-	return 0;
-}
-
-static int ghash_update(struct shash_desc *desc,
-			 const u8 *src, unsigned int srclen)
-{
-	struct x86_ghash_ctx *ctx = crypto_shash_ctx(desc->tfm);
-	struct ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-	u8 *dst = dctx->buffer;
-	int remain;
-
-	kernel_fpu_begin();
-	remain = clmul_ghash_update(dst, src, srclen, &ctx->shash);
-	kernel_fpu_end();
-	return remain;
-}
-
-static void ghash_flush(struct x86_ghash_ctx *ctx, struct ghash_desc_ctx *dctx,
-			const u8 *src, unsigned int len)
-{
-	u8 *dst = dctx->buffer;
-
-	kernel_fpu_begin();
-	if (len) {
-		crypto_xor(dst, src, len);
-		clmul_ghash_mul(dst, &ctx->shash);
-	}
-	kernel_fpu_end();
-}
-
-static int ghash_finup(struct shash_desc *desc, const u8 *src,
-		       unsigned int len, u8 *dst)
-{
-	struct x86_ghash_ctx *ctx = crypto_shash_ctx(desc->tfm);
-	struct ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-	u8 *buf = dctx->buffer;
-
-	ghash_flush(ctx, dctx, src, len);
-	memcpy(dst, buf, GHASH_BLOCK_SIZE);
-
-	return 0;
-}
-
-static struct shash_alg ghash_alg = {
-	.digestsize	= GHASH_DIGEST_SIZE,
-	.init		= ghash_init,
-	.update		= ghash_update,
-	.finup		= ghash_finup,
-	.setkey		= ghash_setkey,
-	.descsize	= sizeof(struct ghash_desc_ctx),
-	.base		= {
-		.cra_name		= "ghash",
-		.cra_driver_name	= "ghash-pclmulqdqni",
-		.cra_priority		= 400,
-		.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
-		.cra_blocksize		= GHASH_BLOCK_SIZE,
-		.cra_ctxsize		= sizeof(struct x86_ghash_ctx),
-		.cra_module		= THIS_MODULE,
-	},
-};
-
-static const struct x86_cpu_id pcmul_cpu_id[] = {
-	X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL), /* Pickle-Mickle-Duck */
-	{}
-};
-MODULE_DEVICE_TABLE(x86cpu, pcmul_cpu_id);
-
-static int __init ghash_pclmulqdqni_mod_init(void)
-{
-	if (!x86_match_cpu(pcmul_cpu_id))
-		return -ENODEV;
-
-	return crypto_register_shash(&ghash_alg);
-}
-
-static void __exit ghash_pclmulqdqni_mod_exit(void)
-{
-	crypto_unregister_shash(&ghash_alg);
-}
-
-module_init(ghash_pclmulqdqni_mod_init);
-module_exit(ghash_pclmulqdqni_mod_exit);
-
-MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("GHASH hash function, accelerated by PCLMULQDQ-NI");
-MODULE_ALIAS_CRYPTO("ghash");
diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
index 8950509833af..19c67f70fb38 100644
--- a/lib/crypto/Makefile
+++ b/lib/crypto/Makefile
@@ -172,11 +172,12 @@ $(obj)/powerpc/ghashp8-ppc.S: $(src)/powerpc/ghashp8-ppc.pl FORCE
 targets += powerpc/ghashp8-ppc.S
 OBJECT_FILES_NON_STANDARD_powerpc/ghashp8-ppc.o := y
 endif
 
 libgf128hash-$(CONFIG_RISCV) += riscv/ghash-riscv64-zvkg.o
-libgf128hash-$(CONFIG_X86) += x86/polyval-pclmul-avx.o
+libgf128hash-$(CONFIG_X86) += x86/ghash-pclmul.o \
+			      x86/polyval-pclmul-avx.o
 endif # CONFIG_CRYPTO_LIB_GF128HASH_ARCH
 
 # clean-files must be defined unconditionally
 clean-files += powerpc/ghashp8-ppc.S
 
diff --git a/lib/crypto/x86/gf128hash.h b/lib/crypto/x86/gf128hash.h
index adf6147ea677..6b79b06caab0 100644
--- a/lib/crypto/x86/gf128hash.h
+++ b/lib/crypto/x86/gf128hash.h
@@ -1,20 +1,27 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
 /*
- * POLYVAL library functions, x86_64 optimized
+ * GHASH and POLYVAL, x86_64 optimized
  *
  * Copyright 2025 Google LLC
  */
 #include <asm/fpu/api.h>
 #include <linux/cpufeature.h>
 
 #define NUM_H_POWERS 8
 
+static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_pclmul);
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_pclmul_avx);
 
+asmlinkage void polyval_mul_pclmul(struct polyval_elem *a,
+				   const struct polyval_elem *b);
 asmlinkage void polyval_mul_pclmul_avx(struct polyval_elem *a,
 				       const struct polyval_elem *b);
+
+asmlinkage void ghash_blocks_pclmul(struct polyval_elem *acc,
+				    const struct polyval_elem *key,
+				    const u8 *data, size_t nblocks);
 asmlinkage void polyval_blocks_pclmul_avx(struct polyval_elem *acc,
 					  const struct polyval_key *key,
 					  const u8 *data, size_t nblocks);
 
 #define polyval_preparekey_arch polyval_preparekey_arch
@@ -39,20 +46,58 @@ static void polyval_preparekey_arch(struct polyval_key *key,
 					    &key->h_powers[NUM_H_POWERS - 1]);
 		}
 	}
 }
 
+static void polyval_mul_x86(struct polyval_elem *a,
+			    const struct polyval_elem *b)
+{
+	if (static_branch_likely(&have_pclmul) && irq_fpu_usable()) {
+		kernel_fpu_begin();
+		if (static_branch_likely(&have_pclmul_avx))
+			polyval_mul_pclmul_avx(a, b);
+		else
+			polyval_mul_pclmul(a, b);
+		kernel_fpu_end();
+	} else {
+		polyval_mul_generic(a, b);
+	}
+}
+
+#define ghash_mul_arch ghash_mul_arch
+static void ghash_mul_arch(struct polyval_elem *acc,
+			   const struct ghash_key *key)
+{
+	polyval_mul_x86(acc, &key->h);
+}
+
 #define polyval_mul_arch polyval_mul_arch
 static void polyval_mul_arch(struct polyval_elem *acc,
 			     const struct polyval_key *key)
 {
-	if (static_branch_likely(&have_pclmul_avx) && irq_fpu_usable()) {
-		kernel_fpu_begin();
-		polyval_mul_pclmul_avx(acc, &key->h_powers[NUM_H_POWERS - 1]);
-		kernel_fpu_end();
+	polyval_mul_x86(acc, &key->h_powers[NUM_H_POWERS - 1]);
+}
+
+#define ghash_blocks_arch ghash_blocks_arch
+static void ghash_blocks_arch(struct polyval_elem *acc,
+			      const struct ghash_key *key,
+			      const u8 *data, size_t nblocks)
+{
+	if (static_branch_likely(&have_pclmul) && irq_fpu_usable()) {
+		do {
+			/* Allow rescheduling every 4 KiB. */
+			size_t n = min_t(size_t, nblocks,
+					 4096 / GHASH_BLOCK_SIZE);
+
+			kernel_fpu_begin();
+			ghash_blocks_pclmul(acc, &key->h, data, n);
+			kernel_fpu_end();
+			data += n * GHASH_BLOCK_SIZE;
+			nblocks -= n;
+		} while (nblocks);
 	} else {
-		polyval_mul_generic(acc, &key->h_powers[NUM_H_POWERS - 1]);
+		ghash_blocks_generic(acc, &key->h, data, nblocks);
 	}
 }
 
 #define polyval_blocks_arch polyval_blocks_arch
 static void polyval_blocks_arch(struct polyval_elem *acc,
@@ -78,9 +123,11 @@ static void polyval_blocks_arch(struct polyval_elem *acc,
 }
 
 #define gf128hash_mod_init_arch gf128hash_mod_init_arch
 static void gf128hash_mod_init_arch(void)
 {
-	if (boot_cpu_has(X86_FEATURE_PCLMULQDQ) &&
-	    boot_cpu_has(X86_FEATURE_AVX))
-		static_branch_enable(&have_pclmul_avx);
+	if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
+		static_branch_enable(&have_pclmul);
+		if (boot_cpu_has(X86_FEATURE_AVX))
+			static_branch_enable(&have_pclmul_avx);
+	}
 }
diff --git a/arch/x86/crypto/ghash-clmulni-intel_asm.S b/lib/crypto/x86/ghash-pclmul.S
similarity index 54%
rename from arch/x86/crypto/ghash-clmulni-intel_asm.S
rename to lib/crypto/x86/ghash-pclmul.S
index c4fbaa82ed7a..6ffb5aea6063 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_asm.S
+++ b/lib/crypto/x86/ghash-pclmul.S
@@ -19,12 +19,12 @@
 .section	.rodata.cst16.bswap_mask, "aM", @progbits, 16
 .align 16
 .Lbswap_mask:
 	.octa 0x000102030405060708090a0b0c0d0e0f
 
-#define DATA	%xmm0
-#define SHASH	%xmm1
+#define ACC	%xmm0
+#define KEY	%xmm1
 #define T1	%xmm2
 #define T2	%xmm3
 #define T3	%xmm4
 #define BSWAP	%xmm5
 #define IN1	%xmm6
@@ -32,102 +32,96 @@
 .text
 
 /*
  * __clmul_gf128mul_ble:	internal ABI
  * input:
- *	DATA:			operand1
- *	SHASH:			operand2, hash_key << 1 mod poly
+ *	ACC:			operand1
+ *	KEY:			operand2, hash_key << 1 mod poly
  * output:
- *	DATA:			operand1 * operand2 mod poly
+ *	ACC:			operand1 * operand2 mod poly
  * changed:
  *	T1
  *	T2
  *	T3
  */
 SYM_FUNC_START_LOCAL(__clmul_gf128mul_ble)
-	movaps DATA, T1
-	pshufd $0b01001110, DATA, T2
-	pshufd $0b01001110, SHASH, T3
-	pxor DATA, T2
-	pxor SHASH, T3
+	movaps ACC, T1
+	pshufd $0b01001110, ACC, T2
+	pshufd $0b01001110, KEY, T3
+	pxor ACC, T2
+	pxor KEY, T3
 
-	pclmulqdq $0x00, SHASH, DATA	# DATA = a0 * b0
-	pclmulqdq $0x11, SHASH, T1	# T1 = a1 * b1
+	pclmulqdq $0x00, KEY, ACC	# ACC = a0 * b0
+	pclmulqdq $0x11, KEY, T1	# T1 = a1 * b1
 	pclmulqdq $0x00, T3, T2		# T2 = (a1 + a0) * (b1 + b0)
-	pxor DATA, T2
+	pxor ACC, T2
 	pxor T1, T2			# T2 = a0 * b1 + a1 * b0
 
 	movaps T2, T3
 	pslldq $8, T3
 	psrldq $8, T2
-	pxor T3, DATA
-	pxor T2, T1			# <T1:DATA> is result of
+	pxor T3, ACC
+	pxor T2, T1			# <T1:ACC> is result of
 					# carry-less multiplication
 
 	# first phase of the reduction
-	movaps DATA, T3
+	movaps ACC, T3
 	psllq $1, T3
-	pxor DATA, T3
+	pxor ACC, T3
 	psllq $5, T3
-	pxor DATA, T3
+	pxor ACC, T3
 	psllq $57, T3
 	movaps T3, T2
 	pslldq $8, T2
 	psrldq $8, T3
-	pxor T2, DATA
+	pxor T2, ACC
 	pxor T3, T1
 
 	# second phase of the reduction
-	movaps DATA, T2
+	movaps ACC, T2
 	psrlq $5, T2
-	pxor DATA, T2
+	pxor ACC, T2
 	psrlq $1, T2
-	pxor DATA, T2
+	pxor ACC, T2
 	psrlq $1, T2
 	pxor T2, T1
-	pxor T1, DATA
+	pxor T1, ACC
 	RET
 SYM_FUNC_END(__clmul_gf128mul_ble)
 
-/* void clmul_ghash_mul(char *dst, const le128 *shash) */
-SYM_FUNC_START(clmul_ghash_mul)
+/*
+ * void polyval_mul_pclmul(struct polyval_elem *a,
+ *			   const struct polyval_elem *b)
+ */
+SYM_FUNC_START(polyval_mul_pclmul)
 	FRAME_BEGIN
-	movups (%rdi), DATA
-	movups (%rsi), SHASH
-	movaps .Lbswap_mask(%rip), BSWAP
-	pshufb BSWAP, DATA
+	movups (%rdi), ACC
+	movups (%rsi), KEY
 	call __clmul_gf128mul_ble
-	pshufb BSWAP, DATA
-	movups DATA, (%rdi)
+	movups ACC, (%rdi)
 	FRAME_END
 	RET
-SYM_FUNC_END(clmul_ghash_mul)
+SYM_FUNC_END(polyval_mul_pclmul)
 
 /*
- * int clmul_ghash_update(char *dst, const char *src, unsigned int srclen,
- *			  const le128 *shash);
+ * void ghash_blocks_pclmul(struct polyval_elem *acc,
+ *			    const struct polyval_elem *key,
+ *			    const u8 *data, size_t nblocks)
  */
-SYM_FUNC_START(clmul_ghash_update)
+SYM_FUNC_START(ghash_blocks_pclmul)
 	FRAME_BEGIN
-	cmp $16, %rdx
-	jb .Lupdate_just_ret	# check length
 	movaps .Lbswap_mask(%rip), BSWAP
-	movups (%rdi), DATA
-	movups (%rcx), SHASH
-	pshufb BSWAP, DATA
+	movups (%rdi), ACC
+	movups (%rsi), KEY
 .align 4
-.Lupdate_loop:
-	movups (%rsi), IN1
+.Lnext_block:
+	movups (%rdx), IN1
 	pshufb BSWAP, IN1
-	pxor IN1, DATA
+	pxor IN1, ACC
 	call __clmul_gf128mul_ble
-	sub $16, %rdx
-	add $16, %rsi
-	cmp $16, %rdx
-	jge .Lupdate_loop
-	pshufb BSWAP, DATA
-	movups DATA, (%rdi)
-.Lupdate_just_ret:
-	mov %rdx, %rax
+	add $16, %rdx
+	dec %rcx
+	jnz .Lnext_block
+	movups ACC, (%rdi)
 	FRAME_END
 	RET
-SYM_FUNC_END(clmul_ghash_update)
+SYM_FUNC_END(ghash_blocks_pclmul)
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 15/19] crypto: gcm - Use GHASH library instead of crypto_ahash
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (13 preceding siblings ...)
  2026-03-19  6:17 ` [PATCH 14/19] lib/crypto: x86/ghash: " Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 16/19] crypto: ghash - Remove ghash from crypto_shash API Eric Biggers
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

Make the "gcm" template access GHASH using the library API instead of
crypto_ahash.  This is much simpler and more efficient, especially given
that all GHASH implementations are synchronous and CPU-based anyway.

Note that this allows "ghash" to be removed from the crypto_ahash (and
crypto_shash) API, which a later commit will do.

This mirrors the similar cleanup that was done with POLYVAL.
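For context, the GHASH that gcm_hash() computes is polynomial evaluation over GF(2^128) in GCM's bit-reflected representation: acc = (acc ^ block) * H per 16-byte block, over the AAD and ciphertext (each zero-padded to a block boundary) followed by the lengths block.  The following standalone C sketch is illustrative only — it is not the kernel's implementation, and the names gf128, gf128_mul, and ghash_blocks are made up here — but it shows the bit-at-a-time math that optimized versions such as the PCLMULQDQ assembly replace:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * One GF(2^128) element in GHASH's bit-reflected convention: bit i of the
 * field element is bit (7 - i % 8) of byte i / 8, so the x^0 coefficient
 * is the most significant bit of byte 0.
 */
typedef struct { uint8_t b[16]; } gf128;

/* a = a * b mod x^128 + x^7 + x^2 + x + 1, one bit at a time. */
static void gf128_mul(gf128 *a, const gf128 *b)
{
	gf128 z = { {0} };
	gf128 v = *a;

	for (int i = 0; i < 128; i++) {
		/* If bit i of b is set, accumulate the current v = a * x^i. */
		if ((b->b[i / 8] >> (7 - i % 8)) & 1)
			for (int j = 0; j < 16; j++)
				z.b[j] ^= v.b[j];
		/* v *= x: right-shift the bit string; on carry-out, reduce
		 * by folding in the reduction constant 0xE1 (x^7+x^2+x+1). */
		int carry = v.b[15] & 1;

		for (int j = 15; j > 0; j--)
			v.b[j] = (uint8_t)((v.b[j] >> 1) | (v.b[j - 1] << 7));
		v.b[0] >>= 1;
		if (carry)
			v.b[0] ^= 0xE1;
	}
	*a = z;
}

/* acc = (acc ^ block) * h for each 16-byte block of data. */
static void ghash_blocks(gf128 *acc, const gf128 *h,
			 const uint8_t *data, size_t nblocks)
{
	while (nblocks--) {
		for (int j = 0; j < 16; j++)
			acc->b[j] ^= data[j];
		gf128_mul(acc, h);
		data += 16;
	}
}
```

Real implementations carry-less-multiply whole 64-bit halves at once (e.g. with PCLMULQDQ, as in the assembly in this series) rather than looping over 128 bits, but the reduction polynomial and bit ordering are the same.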

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 crypto/Kconfig                       |   2 +-
 crypto/gcm.c                         | 413 +++++----------------------
 crypto/testmgr.c                     |  10 +-
 drivers/crypto/starfive/jh7110-aes.c |   2 +-
 4 files changed, 85 insertions(+), 342 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 5627b3691561..13ccf5ac2f1a 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -792,11 +792,11 @@ config CRYPTO_CCM
 
 config CRYPTO_GCM
 	tristate "GCM (Galois/Counter Mode) and GMAC (GCM MAC)"
 	select CRYPTO_CTR
 	select CRYPTO_AEAD
-	select CRYPTO_GHASH
+	select CRYPTO_LIB_GF128HASH
 	select CRYPTO_MANAGER
 	help
 	  GCM (Galois/Counter Mode) authenticated encryption mode and GMAC
 	  (GCM Message Authentication Code) (NIST SP800-38D)
 
diff --git a/crypto/gcm.c b/crypto/gcm.c
index e1e878d37410..5f16b237b3c5 100644
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -3,31 +3,28 @@
  * GCM: Galois/Counter Mode.
  *
  * Copyright (c) 2007 Nokia Siemens Networks - Mikko Herranen <mh1@iki.fi>
  */
 
-#include <crypto/gf128mul.h>
 #include <crypto/internal/aead.h>
 #include <crypto/internal/skcipher.h>
-#include <crypto/internal/hash.h>
 #include <crypto/scatterwalk.h>
 #include <crypto/gcm.h>
-#include <crypto/hash.h>
+#include <crypto/gf128hash.h>
 #include <linux/err.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/slab.h>
 
 struct gcm_instance_ctx {
 	struct crypto_skcipher_spawn ctr;
-	struct crypto_ahash_spawn ghash;
 };
 
 struct crypto_gcm_ctx {
 	struct crypto_skcipher *ctr;
-	struct crypto_ahash *ghash;
+	struct ghash_key ghash;
 };
 
 struct crypto_rfc4106_ctx {
 	struct crypto_aead *child;
 	u8 nonce[4];
@@ -50,35 +47,19 @@ struct crypto_rfc4543_ctx {
 
 struct crypto_rfc4543_req_ctx {
 	struct aead_request subreq;
 };
 
-struct crypto_gcm_ghash_ctx {
-	unsigned int cryptlen;
-	struct scatterlist *src;
-	int (*complete)(struct aead_request *req, u32 flags);
-};
-
 struct crypto_gcm_req_priv_ctx {
 	u8 iv[16];
 	u8 auth_tag[16];
 	u8 iauth_tag[16];
 	struct scatterlist src[3];
 	struct scatterlist dst[3];
-	struct scatterlist sg;
-	struct crypto_gcm_ghash_ctx ghash_ctx;
-	union {
-		struct ahash_request ahreq;
-		struct skcipher_request skreq;
-	} u;
+	struct skcipher_request skreq; /* Must be last */
 };
 
-static struct {
-	u8 buf[16];
-	struct scatterlist sg;
-} *gcm_zeroes;
-
 static inline struct crypto_gcm_req_priv_ctx *crypto_gcm_reqctx(
 	struct aead_request *req)
 {
 	unsigned long align = crypto_aead_alignmask(crypto_aead_reqtfm(req));
 
@@ -87,14 +68,13 @@ static inline struct crypto_gcm_req_priv_ctx *crypto_gcm_reqctx(
 
 static int crypto_gcm_setkey(struct crypto_aead *aead, const u8 *key,
 			     unsigned int keylen)
 {
 	struct crypto_gcm_ctx *ctx = crypto_aead_ctx(aead);
-	struct crypto_ahash *ghash = ctx->ghash;
 	struct crypto_skcipher *ctr = ctx->ctr;
 	struct {
-		be128 hash;
+		u8 h[GHASH_BLOCK_SIZE];
 		u8 iv[16];
 
 		struct crypto_wait wait;
 
 		struct scatterlist sg[1];
@@ -113,29 +93,26 @@ static int crypto_gcm_setkey(struct crypto_aead *aead, const u8 *key,
 		       GFP_KERNEL);
 	if (!data)
 		return -ENOMEM;
 
 	crypto_init_wait(&data->wait);
-	sg_init_one(data->sg, &data->hash, sizeof(data->hash));
+	sg_init_one(data->sg, data->h, sizeof(data->h));
 	skcipher_request_set_tfm(&data->req, ctr);
 	skcipher_request_set_callback(&data->req, CRYPTO_TFM_REQ_MAY_SLEEP |
 						  CRYPTO_TFM_REQ_MAY_BACKLOG,
 				      crypto_req_done,
 				      &data->wait);
 	skcipher_request_set_crypt(&data->req, data->sg, data->sg,
-				   sizeof(data->hash), data->iv);
+				   sizeof(data->h), data->iv);
 
 	err = crypto_wait_req(crypto_skcipher_encrypt(&data->req),
 							&data->wait);
 
 	if (err)
 		goto out;
 
-	crypto_ahash_clear_flags(ghash, CRYPTO_TFM_REQ_MASK);
-	crypto_ahash_set_flags(ghash, crypto_aead_get_flags(aead) &
-			       CRYPTO_TFM_REQ_MASK);
-	err = crypto_ahash_setkey(ghash, (u8 *)&data->hash, sizeof(be128));
+	ghash_preparekey(&ctx->ghash, data->h);
 out:
 	kfree_sensitive(data);
 	return err;
 }
 
@@ -174,288 +151,106 @@ static void crypto_gcm_init_crypt(struct aead_request *req,
 				  unsigned int cryptlen)
 {
 	struct crypto_aead *aead = crypto_aead_reqtfm(req);
 	struct crypto_gcm_ctx *ctx = crypto_aead_ctx(aead);
 	struct crypto_gcm_req_priv_ctx *pctx = crypto_gcm_reqctx(req);
-	struct skcipher_request *skreq = &pctx->u.skreq;
+	struct skcipher_request *skreq = &pctx->skreq;
 	struct scatterlist *dst;
 
 	dst = req->src == req->dst ? pctx->src : pctx->dst;
 
 	skcipher_request_set_tfm(skreq, ctx->ctr);
 	skcipher_request_set_crypt(skreq, pctx->src, dst,
 				     cryptlen + sizeof(pctx->auth_tag),
 				     pctx->iv);
 }
 
-static inline unsigned int gcm_remain(unsigned int len)
-{
-	len &= 0xfU;
-	return len ? 16 - len : 0;
-}
-
-static void gcm_hash_len_done(void *data, int err);
-
-static int gcm_hash_update(struct aead_request *req,
-			   crypto_completion_t compl,
-			   struct scatterlist *src,
-			   unsigned int len, u32 flags)
-{
-	struct crypto_gcm_req_priv_ctx *pctx = crypto_gcm_reqctx(req);
-	struct ahash_request *ahreq = &pctx->u.ahreq;
-
-	ahash_request_set_callback(ahreq, flags, compl, req);
-	ahash_request_set_crypt(ahreq, src, NULL, len);
-
-	return crypto_ahash_update(ahreq);
-}
-
-static int gcm_hash_remain(struct aead_request *req,
-			   unsigned int remain,
-			   crypto_completion_t compl, u32 flags)
-{
-	return gcm_hash_update(req, compl, &gcm_zeroes->sg, remain, flags);
-}
-
-static int gcm_hash_len(struct aead_request *req, u32 flags)
-{
-	struct crypto_gcm_req_priv_ctx *pctx = crypto_gcm_reqctx(req);
-	struct ahash_request *ahreq = &pctx->u.ahreq;
-	struct crypto_gcm_ghash_ctx *gctx = &pctx->ghash_ctx;
-	be128 lengths;
-
-	lengths.a = cpu_to_be64(req->assoclen * 8);
-	lengths.b = cpu_to_be64(gctx->cryptlen * 8);
-	memcpy(pctx->iauth_tag, &lengths, 16);
-	sg_init_one(&pctx->sg, pctx->iauth_tag, 16);
-	ahash_request_set_callback(ahreq, flags, gcm_hash_len_done, req);
-	ahash_request_set_crypt(ahreq, &pctx->sg,
-				pctx->iauth_tag, sizeof(lengths));
-
-	return crypto_ahash_finup(ahreq);
-}
-
-static int gcm_hash_len_continue(struct aead_request *req, u32 flags)
-{
-	struct crypto_gcm_req_priv_ctx *pctx = crypto_gcm_reqctx(req);
-	struct crypto_gcm_ghash_ctx *gctx = &pctx->ghash_ctx;
-
-	return gctx->complete(req, flags);
-}
-
-static void gcm_hash_len_done(void *data, int err)
-{
-	struct aead_request *req = data;
-
-	if (err)
-		goto out;
-
-	err = gcm_hash_len_continue(req, 0);
-	if (err == -EINPROGRESS)
-		return;
-
-out:
-	aead_request_complete(req, err);
-}
-
-static int gcm_hash_crypt_remain_continue(struct aead_request *req, u32 flags)
-{
-	return gcm_hash_len(req, flags) ?:
-	       gcm_hash_len_continue(req, flags);
-}
-
-static void gcm_hash_crypt_remain_done(void *data, int err)
-{
-	struct aead_request *req = data;
-
-	if (err)
-		goto out;
-
-	err = gcm_hash_crypt_remain_continue(req, 0);
-	if (err == -EINPROGRESS)
-		return;
-
-out:
-	aead_request_complete(req, err);
-}
-
-static int gcm_hash_crypt_continue(struct aead_request *req, u32 flags)
-{
-	struct crypto_gcm_req_priv_ctx *pctx = crypto_gcm_reqctx(req);
-	struct crypto_gcm_ghash_ctx *gctx = &pctx->ghash_ctx;
-	unsigned int remain;
-
-	remain = gcm_remain(gctx->cryptlen);
-	if (remain)
-		return gcm_hash_remain(req, remain,
-				       gcm_hash_crypt_remain_done, flags) ?:
-		       gcm_hash_crypt_remain_continue(req, flags);
-
-	return gcm_hash_crypt_remain_continue(req, flags);
-}
-
-static void gcm_hash_crypt_done(void *data, int err)
-{
-	struct aead_request *req = data;
-
-	if (err)
-		goto out;
-
-	err = gcm_hash_crypt_continue(req, 0);
-	if (err == -EINPROGRESS)
-		return;
-
-out:
-	aead_request_complete(req, err);
-}
-
-static int gcm_hash_assoc_remain_continue(struct aead_request *req, u32 flags)
-{
-	struct crypto_gcm_req_priv_ctx *pctx = crypto_gcm_reqctx(req);
-	struct crypto_gcm_ghash_ctx *gctx = &pctx->ghash_ctx;
-
-	if (gctx->cryptlen)
-		return gcm_hash_update(req, gcm_hash_crypt_done,
-				       gctx->src, gctx->cryptlen, flags) ?:
-		       gcm_hash_crypt_continue(req, flags);
-
-	return gcm_hash_crypt_remain_continue(req, flags);
-}
-
-static void gcm_hash_assoc_remain_done(void *data, int err)
-{
-	struct aead_request *req = data;
-
-	if (err)
-		goto out;
-
-	err = gcm_hash_assoc_remain_continue(req, 0);
-	if (err == -EINPROGRESS)
-		return;
-
-out:
-	aead_request_complete(req, err);
-}
-
-static int gcm_hash_assoc_continue(struct aead_request *req, u32 flags)
+static void ghash_update_sg_and_pad(struct ghash_ctx *ghash,
+				    struct scatterlist *sg, unsigned int len)
 {
-	unsigned int remain;
+	static const u8 zeroes[GHASH_BLOCK_SIZE];
 
-	remain = gcm_remain(req->assoclen);
-	if (remain)
-		return gcm_hash_remain(req, remain,
-				       gcm_hash_assoc_remain_done, flags) ?:
-		       gcm_hash_assoc_remain_continue(req, flags);
+	if (len) {
+		unsigned int pad_len = -len % GHASH_BLOCK_SIZE;
+		struct scatter_walk walk;
 
-	return gcm_hash_assoc_remain_continue(req, flags);
-}
+		scatterwalk_start(&walk, sg);
+		do {
+			unsigned int n = scatterwalk_next(&walk, len);
 
-static void gcm_hash_assoc_done(void *data, int err)
-{
-	struct aead_request *req = data;
+			ghash_update(ghash, walk.addr, n);
+			scatterwalk_done_src(&walk, n);
+			len -= n;
+		} while (len);
 
-	if (err)
-		goto out;
-
-	err = gcm_hash_assoc_continue(req, 0);
-	if (err == -EINPROGRESS)
-		return;
-
-out:
-	aead_request_complete(req, err);
-}
-
-static int gcm_hash_init_continue(struct aead_request *req, u32 flags)
-{
-	if (req->assoclen)
-		return gcm_hash_update(req, gcm_hash_assoc_done,
-				       req->src, req->assoclen, flags) ?:
-		       gcm_hash_assoc_continue(req, flags);
-
-	return gcm_hash_assoc_remain_continue(req, flags);
+		if (pad_len)
+			ghash_update(ghash, zeroes, pad_len);
+	}
 }
 
-static void gcm_hash_init_done(void *data, int err)
+static void gcm_hash(struct aead_request *req, struct scatterlist *ctext,
+		     unsigned int datalen, u8 out[GHASH_BLOCK_SIZE])
 {
-	struct aead_request *req = data;
-
-	if (err)
-		goto out;
+	const struct crypto_gcm_ctx *ctx =
+		crypto_aead_ctx(crypto_aead_reqtfm(req));
+	__be64 lengths[2] = {
+		cpu_to_be64(8 * (u64)req->assoclen),
+		cpu_to_be64(8 * (u64)datalen),
+	};
+	struct ghash_ctx ghash;
 
-	err = gcm_hash_init_continue(req, 0);
-	if (err == -EINPROGRESS)
-		return;
+	ghash_init(&ghash, &ctx->ghash);
 
-out:
-	aead_request_complete(req, err);
-}
+	/* Associated data, then zero-padding to the next 16-byte boundary */
+	ghash_update_sg_and_pad(&ghash, req->src, req->assoclen);
 
-static int gcm_hash(struct aead_request *req, u32 flags)
-{
-	struct crypto_gcm_req_priv_ctx *pctx = crypto_gcm_reqctx(req);
-	struct ahash_request *ahreq = &pctx->u.ahreq;
-	struct crypto_gcm_ctx *ctx = crypto_aead_ctx(crypto_aead_reqtfm(req));
+	/* Ciphertext, then zero-padding to the next 16-byte boundary */
+	ghash_update_sg_and_pad(&ghash, ctext, datalen);
 
-	ahash_request_set_tfm(ahreq, ctx->ghash);
+	/* Lengths block */
+	ghash_update(&ghash, (const u8 *)lengths, sizeof(lengths));
 
-	ahash_request_set_callback(ahreq, flags, gcm_hash_init_done, req);
-	return crypto_ahash_init(ahreq) ?:
-	       gcm_hash_init_continue(req, flags);
+	ghash_final(&ghash, out);
 }
 
-static int gcm_enc_copy_hash(struct aead_request *req, u32 flags)
+static int gcm_add_auth_tag(struct aead_request *req)
 {
-	struct crypto_gcm_req_priv_ctx *pctx = crypto_gcm_reqctx(req);
 	struct crypto_aead *aead = crypto_aead_reqtfm(req);
-	u8 *auth_tag = pctx->auth_tag;
-
-	crypto_xor(auth_tag, pctx->iauth_tag, 16);
-	scatterwalk_map_and_copy(auth_tag, req->dst,
-				 req->assoclen + req->cryptlen,
-				 crypto_aead_authsize(aead), 1);
-	return 0;
-}
-
-static int gcm_encrypt_continue(struct aead_request *req, u32 flags)
-{
 	struct crypto_gcm_req_priv_ctx *pctx = crypto_gcm_reqctx(req);
-	struct crypto_gcm_ghash_ctx *gctx = &pctx->ghash_ctx;
 
-	gctx->src = sg_next(req->src == req->dst ? pctx->src : pctx->dst);
-	gctx->cryptlen = req->cryptlen;
-	gctx->complete = gcm_enc_copy_hash;
-
-	return gcm_hash(req, flags);
+	gcm_hash(req, sg_next(req->src == req->dst ? pctx->src : pctx->dst),
+		 req->cryptlen, pctx->iauth_tag);
+	crypto_xor(pctx->auth_tag, pctx->iauth_tag, 16);
+	memcpy_to_sglist(req->dst, req->assoclen + req->cryptlen,
+			 pctx->auth_tag, crypto_aead_authsize(aead));
+	return 0;
 }
 
 static void gcm_encrypt_done(void *data, int err)
 {
 	struct aead_request *req = data;
 
 	if (err)
 		goto out;
 
-	err = gcm_encrypt_continue(req, 0);
-	if (err == -EINPROGRESS)
-		return;
+	err = gcm_add_auth_tag(req);
 
 out:
 	aead_request_complete(req, err);
 }
 
 static int crypto_gcm_encrypt(struct aead_request *req)
 {
 	struct crypto_gcm_req_priv_ctx *pctx = crypto_gcm_reqctx(req);
-	struct skcipher_request *skreq = &pctx->u.skreq;
+	struct skcipher_request *skreq = &pctx->skreq;
 	u32 flags = aead_request_flags(req);
 
 	crypto_gcm_init_common(req);
 	crypto_gcm_init_crypt(req, req->cryptlen);
 	skcipher_request_set_callback(skreq, flags, gcm_encrypt_done, req);
 
-	return crypto_skcipher_encrypt(skreq) ?:
-	       gcm_encrypt_continue(req, flags);
+	return crypto_skcipher_encrypt(skreq) ?: gcm_add_auth_tag(req);
 }
 
 static int crypto_gcm_verify(struct aead_request *req)
 {
 	struct crypto_gcm_req_priv_ctx *pctx = crypto_gcm_reqctx(req);
@@ -479,106 +274,71 @@ static void gcm_decrypt_done(void *data, int err)
 		err = crypto_gcm_verify(req);
 
 	aead_request_complete(req, err);
 }
 
-static int gcm_dec_hash_continue(struct aead_request *req, u32 flags)
-{
-	struct crypto_gcm_req_priv_ctx *pctx = crypto_gcm_reqctx(req);
-	struct skcipher_request *skreq = &pctx->u.skreq;
-	struct crypto_gcm_ghash_ctx *gctx = &pctx->ghash_ctx;
-
-	crypto_gcm_init_crypt(req, gctx->cryptlen);
-	skcipher_request_set_callback(skreq, flags, gcm_decrypt_done, req);
-	return crypto_skcipher_decrypt(skreq) ?: crypto_gcm_verify(req);
-}
-
 static int crypto_gcm_decrypt(struct aead_request *req)
 {
 	struct crypto_aead *aead = crypto_aead_reqtfm(req);
 	struct crypto_gcm_req_priv_ctx *pctx = crypto_gcm_reqctx(req);
-	struct crypto_gcm_ghash_ctx *gctx = &pctx->ghash_ctx;
-	unsigned int authsize = crypto_aead_authsize(aead);
-	unsigned int cryptlen = req->cryptlen;
-	u32 flags = aead_request_flags(req);
-
-	cryptlen -= authsize;
+	struct skcipher_request *skreq = &pctx->skreq;
+	unsigned int datalen = req->cryptlen - crypto_aead_authsize(aead);
 
 	crypto_gcm_init_common(req);
 
-	gctx->src = sg_next(pctx->src);
-	gctx->cryptlen = cryptlen;
-	gctx->complete = gcm_dec_hash_continue;
+	gcm_hash(req, sg_next(pctx->src), datalen, pctx->iauth_tag);
 
-	return gcm_hash(req, flags);
+	crypto_gcm_init_crypt(req, datalen);
+	skcipher_request_set_callback(skreq, aead_request_flags(req),
+				      gcm_decrypt_done, req);
+	return crypto_skcipher_decrypt(skreq) ?: crypto_gcm_verify(req);
 }
 
 static int crypto_gcm_init_tfm(struct crypto_aead *tfm)
 {
 	struct aead_instance *inst = aead_alg_instance(tfm);
 	struct gcm_instance_ctx *ictx = aead_instance_ctx(inst);
 	struct crypto_gcm_ctx *ctx = crypto_aead_ctx(tfm);
 	struct crypto_skcipher *ctr;
-	struct crypto_ahash *ghash;
 	unsigned long align;
-	int err;
-
-	ghash = crypto_spawn_ahash(&ictx->ghash);
-	if (IS_ERR(ghash))
-		return PTR_ERR(ghash);
 
 	ctr = crypto_spawn_skcipher(&ictx->ctr);
-	err = PTR_ERR(ctr);
 	if (IS_ERR(ctr))
-		goto err_free_hash;
+		return PTR_ERR(ctr);
 
 	ctx->ctr = ctr;
-	ctx->ghash = ghash;
 
 	align = crypto_aead_alignmask(tfm);
 	align &= ~(crypto_tfm_ctx_alignment() - 1);
 	crypto_aead_set_reqsize(tfm,
-		align + offsetof(struct crypto_gcm_req_priv_ctx, u) +
-		max(sizeof(struct skcipher_request) +
-		    crypto_skcipher_reqsize(ctr),
-		    sizeof(struct ahash_request) +
-		    crypto_ahash_reqsize(ghash)));
-
+				align + sizeof(struct crypto_gcm_req_priv_ctx) +
+					crypto_skcipher_reqsize(ctr));
 	return 0;
-
-err_free_hash:
-	crypto_free_ahash(ghash);
-	return err;
 }
 
 static void crypto_gcm_exit_tfm(struct crypto_aead *tfm)
 {
 	struct crypto_gcm_ctx *ctx = crypto_aead_ctx(tfm);
 
-	crypto_free_ahash(ctx->ghash);
 	crypto_free_skcipher(ctx->ctr);
 }
 
 static void crypto_gcm_free(struct aead_instance *inst)
 {
 	struct gcm_instance_ctx *ctx = aead_instance_ctx(inst);
 
 	crypto_drop_skcipher(&ctx->ctr);
-	crypto_drop_ahash(&ctx->ghash);
 	kfree(inst);
 }
 
 static int crypto_gcm_create_common(struct crypto_template *tmpl,
-				    struct rtattr **tb,
-				    const char *ctr_name,
-				    const char *ghash_name)
+				    struct rtattr **tb, const char *ctr_name)
 {
 	struct skcipher_alg_common *ctr;
 	u32 mask;
 	struct aead_instance *inst;
 	struct gcm_instance_ctx *ctx;
-	struct hash_alg_common *ghash;
 	int err;
 
 	err = crypto_check_attr_type(tb, CRYPTO_ALG_TYPE_AEAD, &mask);
 	if (err)
 		return err;
@@ -586,21 +346,10 @@ static int crypto_gcm_create_common(struct crypto_template *tmpl,
 	inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL);
 	if (!inst)
 		return -ENOMEM;
 	ctx = aead_instance_ctx(inst);
 
-	err = crypto_grab_ahash(&ctx->ghash, aead_crypto_instance(inst),
-				ghash_name, 0, mask);
-	if (err)
-		goto err_free_inst;
-	ghash = crypto_spawn_ahash_alg(&ctx->ghash);
-
-	err = -EINVAL;
-	if (strcmp(ghash->base.cra_name, "ghash") != 0 ||
-	    ghash->digestsize != 16)
-		goto err_free_inst;
-
 	err = crypto_grab_skcipher(&ctx->ctr, aead_crypto_instance(inst),
 				   ctr_name, 0, mask);
 	if (err)
 		goto err_free_inst;
 	ctr = crypto_spawn_skcipher_alg_common(&ctx->ctr);
@@ -615,17 +364,15 @@ static int crypto_gcm_create_common(struct crypto_template *tmpl,
 	if (snprintf(inst->alg.base.cra_name, CRYPTO_MAX_ALG_NAME,
 		     "gcm(%s", ctr->base.cra_name + 4) >= CRYPTO_MAX_ALG_NAME)
 		goto err_free_inst;
 
 	if (snprintf(inst->alg.base.cra_driver_name, CRYPTO_MAX_ALG_NAME,
-		     "gcm_base(%s,%s)", ctr->base.cra_driver_name,
-		     ghash->base.cra_driver_name) >=
-	    CRYPTO_MAX_ALG_NAME)
+		     "gcm_base(%s,ghash-lib)",
+		     ctr->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
 		goto err_free_inst;
 
-	inst->alg.base.cra_priority = (ghash->base.cra_priority +
-				       ctr->base.cra_priority) / 2;
+	inst->alg.base.cra_priority = ctr->base.cra_priority;
 	inst->alg.base.cra_blocksize = 1;
 	inst->alg.base.cra_alignmask = ctr->base.cra_alignmask;
 	inst->alg.base.cra_ctxsize = sizeof(struct crypto_gcm_ctx);
 	inst->alg.ivsize = GCM_AES_IV_SIZE;
 	inst->alg.chunksize = ctr->chunksize;
@@ -658,11 +405,11 @@ static int crypto_gcm_create(struct crypto_template *tmpl, struct rtattr **tb)
 
 	if (snprintf(ctr_name, CRYPTO_MAX_ALG_NAME, "ctr(%s)", cipher_name) >=
 	    CRYPTO_MAX_ALG_NAME)
 		return -ENAMETOOLONG;
 
-	return crypto_gcm_create_common(tmpl, tb, ctr_name, "ghash");
+	return crypto_gcm_create_common(tmpl, tb, ctr_name);
 }
 
 static int crypto_gcm_base_create(struct crypto_template *tmpl,
 				  struct rtattr **tb)
 {
@@ -675,11 +422,20 @@ static int crypto_gcm_base_create(struct crypto_template *tmpl,
 
 	ghash_name = crypto_attr_alg_name(tb[2]);
 	if (IS_ERR(ghash_name))
 		return PTR_ERR(ghash_name);
 
-	return crypto_gcm_create_common(tmpl, tb, ctr_name, ghash_name);
+	/*
+	 * Originally this parameter allowed requesting a specific
+	 * implementation of GHASH.  This is no longer supported.  Now the best
+	 * implementation of GHASH is just always used.
+	 */
+	if (strcmp(ghash_name, "ghash") != 0 &&
+	    strcmp(ghash_name, "ghash-lib") != 0)
+		return -EINVAL;
+
+	return crypto_gcm_create_common(tmpl, tb, ctr_name);
 }
 
 static int crypto_rfc4106_setkey(struct crypto_aead *parent, const u8 *key,
 				 unsigned int keylen)
 {
@@ -1094,29 +850,16 @@ static struct crypto_template crypto_gcm_tmpls[] = {
 	},
 };
 
 static int __init crypto_gcm_module_init(void)
 {
-	int err;
-
-	gcm_zeroes = kzalloc_obj(*gcm_zeroes);
-	if (!gcm_zeroes)
-		return -ENOMEM;
-
-	sg_init_one(&gcm_zeroes->sg, gcm_zeroes->buf, sizeof(gcm_zeroes->buf));
-
-	err = crypto_register_templates(crypto_gcm_tmpls,
-					ARRAY_SIZE(crypto_gcm_tmpls));
-	if (err)
-		kfree(gcm_zeroes);
-
-	return err;
+	return crypto_register_templates(crypto_gcm_tmpls,
+					 ARRAY_SIZE(crypto_gcm_tmpls));
 }
 
 static void __exit crypto_gcm_module_exit(void)
 {
-	kfree(gcm_zeroes);
 	crypto_unregister_templates(crypto_gcm_tmpls,
 				    ARRAY_SIZE(crypto_gcm_tmpls));
 }
 
 module_init(crypto_gcm_module_init);
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index fec950f1628b..0b0ad358e091 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -4963,26 +4963,26 @@ static const struct alg_test_desc alg_test_descs[] = {
 			.kpp = __VECS(ffdhe8192_dh_tv_template)
 		}
 	}, {
 #endif /* CONFIG_CRYPTO_DH_RFC7919_GROUPS */
 		.alg = "gcm(aes)",
-		.generic_driver = "gcm_base(ctr(aes-lib),ghash-generic)",
+		.generic_driver = "gcm_base(ctr(aes-lib),ghash-lib)",
 		.test = alg_test_aead,
 		.fips_allowed = 1,
 		.suite = {
 			.aead = __VECS(aes_gcm_tv_template)
 		}
 	}, {
 		.alg = "gcm(aria)",
-		.generic_driver = "gcm_base(ctr(aria-generic),ghash-generic)",
+		.generic_driver = "gcm_base(ctr(aria-generic),ghash-lib)",
 		.test = alg_test_aead,
 		.suite = {
 			.aead = __VECS(aria_gcm_tv_template)
 		}
 	}, {
 		.alg = "gcm(sm4)",
-		.generic_driver = "gcm_base(ctr(sm4-generic),ghash-generic)",
+		.generic_driver = "gcm_base(ctr(sm4-generic),ghash-lib)",
 		.test = alg_test_aead,
 		.suite = {
 			.aead = __VECS(sm4_gcm_tv_template)
 		}
 	}, {
@@ -5312,11 +5312,11 @@ static const struct alg_test_desc alg_test_descs[] = {
 		.suite = {
 			.cipher = __VECS(sm4_ctr_rfc3686_tv_template)
 		}
 	}, {
 		.alg = "rfc4106(gcm(aes))",
-		.generic_driver = "rfc4106(gcm_base(ctr(aes-lib),ghash-generic))",
+		.generic_driver = "rfc4106(gcm_base(ctr(aes-lib),ghash-lib))",
 		.test = alg_test_aead,
 		.fips_allowed = 1,
 		.suite = {
 			.aead = {
 				____VECS(aes_gcm_rfc4106_tv_template),
@@ -5336,11 +5336,11 @@ static const struct alg_test_desc alg_test_descs[] = {
 				.aad_iv = 1,
 			}
 		}
 	}, {
 		.alg = "rfc4543(gcm(aes))",
-		.generic_driver = "rfc4543(gcm_base(ctr(aes-lib),ghash-generic))",
+		.generic_driver = "rfc4543(gcm_base(ctr(aes-lib),ghash-lib))",
 		.test = alg_test_aead,
 		.suite = {
 			.aead = {
 				____VECS(aes_gcm_rfc4543_tv_template),
 				.einval_allowed = 1,
diff --git a/drivers/crypto/starfive/jh7110-aes.c b/drivers/crypto/starfive/jh7110-aes.c
index 2e2d97d17e6c..a0713aa21250 100644
--- a/drivers/crypto/starfive/jh7110-aes.c
+++ b/drivers/crypto/starfive/jh7110-aes.c
@@ -1006,11 +1006,11 @@ static int starfive_aes_ccm_init_tfm(struct crypto_aead *tfm)
 	return starfive_aes_aead_init_tfm(tfm, "ccm_base(ctr(aes-lib),cbcmac-aes-lib)");
 }
 
 static int starfive_aes_gcm_init_tfm(struct crypto_aead *tfm)
 {
-	return starfive_aes_aead_init_tfm(tfm, "gcm_base(ctr(aes-lib),ghash-generic)");
+	return starfive_aes_aead_init_tfm(tfm, "gcm_base(ctr(aes-lib),ghash-lib)");
 }
 
 static struct skcipher_engine_alg skcipher_algs[] = {
 {
 	.base.init			= starfive_aes_ecb_init_tfm,
-- 
2.53.0




* [PATCH 16/19] crypto: ghash - Remove ghash from crypto_shash API
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (14 preceding siblings ...)
  2026-03-19  6:17 ` [PATCH 15/19] crypto: gcm - Use GHASH library instead of crypto_ahash Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 17/19] lib/crypto: gf128mul: Remove unused 4k_lle functions Eric Biggers
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

Now that there are no users of the "ghash" crypto_shash algorithm,
remove it.  GHASH remains supported via the library API.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 crypto/Kconfig         |   7 --
 crypto/Makefile        |   1 -
 crypto/ghash-generic.c | 162 -----------------------------------------
 crypto/tcrypt.c        |   9 ---
 crypto/testmgr.c       |   6 --
 crypto/testmgr.h       | 109 ---------------------------
 6 files changed, 294 deletions(-)
 delete mode 100644 crypto/ghash-generic.c

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 13ccf5ac2f1a..efb482ea192d 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -886,17 +886,10 @@ config CRYPTO_CMAC
 	select CRYPTO_MANAGER
 	help
 	  CMAC (Cipher-based Message Authentication Code) authentication
 	  mode (NIST SP800-38B and IETF RFC4493)
 
-config CRYPTO_GHASH
-	tristate "GHASH"
-	select CRYPTO_HASH
-	select CRYPTO_LIB_GF128MUL
-	help
-	  GCM GHASH function (NIST SP800-38D)
-
 config CRYPTO_HMAC
 	tristate "HMAC (Keyed-Hash MAC)"
 	select CRYPTO_HASH
 	select CRYPTO_MANAGER
 	help
diff --git a/crypto/Makefile b/crypto/Makefile
index 04e269117589..17f4fca9b9e5 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -169,11 +169,10 @@ CFLAGS_jitterentropy.o = -O0
 KASAN_SANITIZE_jitterentropy.o = n
 UBSAN_SANITIZE_jitterentropy.o = n
 jitterentropy_rng-y := jitterentropy.o jitterentropy-kcapi.o
 obj-$(CONFIG_CRYPTO_JITTERENTROPY_TESTINTERFACE) += jitterentropy-testing.o
 obj-$(CONFIG_CRYPTO_BENCHMARK) += tcrypt.o
-obj-$(CONFIG_CRYPTO_GHASH) += ghash-generic.o
 obj-$(CONFIG_CRYPTO_USER_API) += af_alg.o
 obj-$(CONFIG_CRYPTO_USER_API_HASH) += algif_hash.o
 obj-$(CONFIG_CRYPTO_USER_API_SKCIPHER) += algif_skcipher.o
 obj-$(CONFIG_CRYPTO_USER_API_RNG) += algif_rng.o
 obj-$(CONFIG_CRYPTO_USER_API_AEAD) += algif_aead.o
diff --git a/crypto/ghash-generic.c b/crypto/ghash-generic.c
deleted file mode 100644
index e5803c249c12..000000000000
--- a/crypto/ghash-generic.c
+++ /dev/null
@@ -1,162 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * GHASH: hash function for GCM (Galois/Counter Mode).
- *
- * Copyright (c) 2007 Nokia Siemens Networks - Mikko Herranen <mh1@iki.fi>
- * Copyright (c) 2009 Intel Corp.
- *   Author: Huang Ying <ying.huang@intel.com>
- */
-
-/*
- * GHASH is a keyed hash function used in GCM authentication tag generation.
- *
- * The original GCM paper [1] presents GHASH as a function GHASH(H, A, C) which
- * takes a 16-byte hash key H, additional authenticated data A, and a ciphertext
- * C.  It formats A and C into a single byte string X, interprets X as a
- * polynomial over GF(2^128), and evaluates this polynomial at the point H.
- *
- * However, the NIST standard for GCM [2] presents GHASH as GHASH(H, X) where X
- * is the already-formatted byte string containing both A and C.
- *
- * "ghash" in the Linux crypto API uses the 'X' (pre-formatted) convention,
- * since the API supports only a single data stream per hash.  Thus, the
- * formatting of 'A' and 'C' is done in the "gcm" template, not in "ghash".
- *
- * The reason "ghash" is separate from "gcm" is to allow "gcm" to use an
- * accelerated "ghash" when a standalone accelerated "gcm(aes)" is unavailable.
- * It is generally inappropriate to use "ghash" for other purposes, since it is
- * an "ε-almost-XOR-universal hash function", not a cryptographic hash function.
- * It can only be used securely in crypto modes specially designed to use it.
- *
- * [1] The Galois/Counter Mode of Operation (GCM)
- *     (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.694.695&rep=rep1&type=pdf)
- * [2] Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC
- *     (https://csrc.nist.gov/publications/detail/sp/800-38d/final)
- */
-
-#include <crypto/gf128mul.h>
-#include <crypto/ghash.h>
-#include <crypto/internal/hash.h>
-#include <crypto/utils.h>
-#include <linux/err.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/string.h>
-
-static int ghash_init(struct shash_desc *desc)
-{
-	struct ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-
-	memset(dctx, 0, sizeof(*dctx));
-
-	return 0;
-}
-
-static int ghash_setkey(struct crypto_shash *tfm,
-			const u8 *key, unsigned int keylen)
-{
-	struct ghash_ctx *ctx = crypto_shash_ctx(tfm);
-	be128 k;
-
-	if (keylen != GHASH_BLOCK_SIZE)
-		return -EINVAL;
-
-	if (ctx->gf128)
-		gf128mul_free_4k(ctx->gf128);
-
-	BUILD_BUG_ON(sizeof(k) != GHASH_BLOCK_SIZE);
-	memcpy(&k, key, GHASH_BLOCK_SIZE); /* avoid violating alignment rules */
-	ctx->gf128 = gf128mul_init_4k_lle(&k);
-	memzero_explicit(&k, GHASH_BLOCK_SIZE);
-
-	if (!ctx->gf128)
-		return -ENOMEM;
-
-	return 0;
-}
-
-static int ghash_update(struct shash_desc *desc,
-			 const u8 *src, unsigned int srclen)
-{
-	struct ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-	struct ghash_ctx *ctx = crypto_shash_ctx(desc->tfm);
-	u8 *dst = dctx->buffer;
-
-	do {
-		crypto_xor(dst, src, GHASH_BLOCK_SIZE);
-		gf128mul_4k_lle((be128 *)dst, ctx->gf128);
-		src += GHASH_BLOCK_SIZE;
-		srclen -= GHASH_BLOCK_SIZE;
-	} while (srclen >= GHASH_BLOCK_SIZE);
-
-	return srclen;
-}
-
-static void ghash_flush(struct shash_desc *desc, const u8 *src,
-			unsigned int len)
-{
-	struct ghash_ctx *ctx = crypto_shash_ctx(desc->tfm);
-	struct ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-	u8 *dst = dctx->buffer;
-
-	if (len) {
-		crypto_xor(dst, src, len);
-		gf128mul_4k_lle((be128 *)dst, ctx->gf128);
-	}
-}
-
-static int ghash_finup(struct shash_desc *desc, const u8 *src,
-		       unsigned int len, u8 *dst)
-{
-	struct ghash_desc_ctx *dctx = shash_desc_ctx(desc);
-	u8 *buf = dctx->buffer;
-
-	ghash_flush(desc, src, len);
-	memcpy(dst, buf, GHASH_BLOCK_SIZE);
-
-	return 0;
-}
-
-static void ghash_exit_tfm(struct crypto_tfm *tfm)
-{
-	struct ghash_ctx *ctx = crypto_tfm_ctx(tfm);
-	if (ctx->gf128)
-		gf128mul_free_4k(ctx->gf128);
-}
-
-static struct shash_alg ghash_alg = {
-	.digestsize	= GHASH_DIGEST_SIZE,
-	.init		= ghash_init,
-	.update		= ghash_update,
-	.finup		= ghash_finup,
-	.setkey		= ghash_setkey,
-	.descsize	= sizeof(struct ghash_desc_ctx),
-	.base		= {
-		.cra_name		= "ghash",
-		.cra_driver_name	= "ghash-generic",
-		.cra_priority		= 100,
-		.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
-		.cra_blocksize		= GHASH_BLOCK_SIZE,
-		.cra_ctxsize		= sizeof(struct ghash_ctx),
-		.cra_module		= THIS_MODULE,
-		.cra_exit		= ghash_exit_tfm,
-	},
-};
-
-static int __init ghash_mod_init(void)
-{
-	return crypto_register_shash(&ghash_alg);
-}
-
-static void __exit ghash_mod_exit(void)
-{
-	crypto_unregister_shash(&ghash_alg);
-}
-
-module_init(ghash_mod_init);
-module_exit(ghash_mod_exit);
-
-MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("GHASH hash function");
-MODULE_ALIAS_CRYPTO("ghash");
-MODULE_ALIAS_CRYPTO("ghash-generic");
diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index aded37546137..1773f5f71351 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1648,14 +1648,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
 
 	case 45:
 		ret = min(ret, tcrypt_test("rfc4309(ccm(aes))"));
 		break;
 
-	case 46:
-		ret = min(ret, tcrypt_test("ghash"));
-		break;
-
 	case 48:
 		ret = min(ret, tcrypt_test("sha3-224"));
 		break;
 
 	case 49:
@@ -2249,15 +2245,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
 		fallthrough;
 	case 317:
 		test_hash_speed("blake2b-512", sec, generic_hash_speed_template);
 		if (mode > 300 && mode < 400) break;
 		fallthrough;
-	case 318:
-		klen = 16;
-		test_hash_speed("ghash", sec, generic_hash_speed_template);
-		if (mode > 300 && mode < 400) break;
-		fallthrough;
 	case 319:
 		test_hash_speed("crc32c", sec, generic_hash_speed_template);
 		if (mode > 300 && mode < 400) break;
 		fallthrough;
 	case 322:
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 0b0ad358e091..dd01f86dd6fe 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -4983,16 +4983,10 @@ static const struct alg_test_desc alg_test_descs[] = {
 		.generic_driver = "gcm_base(ctr(sm4-generic),ghash-lib)",
 		.test = alg_test_aead,
 		.suite = {
 			.aead = __VECS(sm4_gcm_tv_template)
 		}
-	}, {
-		.alg = "ghash",
-		.test = alg_test_hash,
-		.suite = {
-			.hash = __VECS(ghash_tv_template)
-		}
 	}, {
 		.alg = "hctr2(aes)",
 		.generic_driver = "hctr2_base(xctr(aes-lib),polyval-lib)",
 		.test = alg_test_skcipher,
 		.suite = {
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 1c69c11c0cdb..a3274abacfde 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -6181,119 +6181,10 @@ static const struct hash_testvec wp256_tv_template[] = {
 			  "\x8A\x7A\x5A\x52\xDE\xEE\x65\x62"
 			  "\x07\xC5\x62\xF9\x88\xE9\x5C\x69",
 	},
 };
 
-static const struct hash_testvec ghash_tv_template[] =
-{
-	{
-		.key	= "\xdf\xa6\xbf\x4d\xed\x81\xdb\x03"
-			  "\xff\xca\xff\x95\xf8\x30\xf0\x61",
-		.ksize	= 16,
-		.plaintext = "\x95\x2b\x2a\x56\xa5\x60\x04a\xc0"
-			     "\xb3\x2b\x66\x56\xa0\x5b\x40\xb6",
-		.psize	= 16,
-		.digest	= "\xda\x53\xeb\x0a\xd2\xc5\x5b\xb6"
-			  "\x4f\xc4\x80\x2c\xc3\xfe\xda\x60",
-	}, {
-		.key	= "\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b"
-			  "\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b",
-		.ksize	= 16,
-		.plaintext = "what do ya want for nothing?",
-		.psize	= 28,
-		.digest	= "\x3e\x1f\x5c\x4d\x65\xf0\xef\xce"
-			  "\x0d\x61\x06\x27\x66\x51\xd5\xe2",
-	}, {
-		.key	= "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa"
-			  "\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa",
-		.ksize	= 16,
-		.plaintext = "\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd"
-			"\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd"
-			"\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd"
-			"\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd",
-		.psize	= 50,
-		.digest	= "\xfb\x49\x8a\x36\xe1\x96\xe1\x96"
-			  "\xe1\x96\xe1\x96\xe1\x96\xe1\x96",
-	}, {
-		.key	= "\xda\x53\xeb\x0a\xd2\xc5\x5b\xb6"
-			  "\x4f\xc4\x80\x2c\xc3\xfe\xda\x60",
-		.ksize	= 16,
-		.plaintext = "\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd"
-			"\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd"
-			"\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd"
-			"\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd\xcd",
-		.psize	= 50,
-		.digest	= "\x2b\x5c\x0c\x7f\x52\xd1\x60\xc2"
-			  "\x49\xed\x6e\x32\x7a\xa9\xbe\x08",
-	}, {
-		.key	= "\x95\x2b\x2a\x56\xa5\x60\x04a\xc0"
-			  "\xb3\x2b\x66\x56\xa0\x5b\x40\xb6",
-		.ksize	= 16,
-		.plaintext = "Test With Truncation",
-		.psize	= 20,
-		.digest	= "\xf8\x94\x87\x2a\x4b\x63\x99\x28"
-			  "\x23\xf7\x93\xf7\x19\xf5\x96\xd9",
-	}, {
-		.key	= "\x0a\x1b\x2c\x3d\x4e\x5f\x64\x71"
-			"\x82\x93\xa4\xb5\xc6\xd7\xe8\xf9",
-		.ksize	= 16,
-		.plaintext = "\x56\x6f\x72\x20\x6c\x61\x75\x74"
-			"\x65\x72\x20\x4c\x61\x75\x73\x63"
-			"\x68\x65\x6e\x20\x75\x6e\x64\x20"
-			"\x53\x74\x61\x75\x6e\x65\x6e\x20"
-			"\x73\x65\x69\x20\x73\x74\x69\x6c"
-			"\x6c\x2c\x0a\x64\x75\x20\x6d\x65"
-			"\x69\x6e\x20\x74\x69\x65\x66\x74"
-			"\x69\x65\x66\x65\x73\x20\x4c\x65"
-			"\x62\x65\x6e\x3b\x0a\x64\x61\x73"
-			"\x73\x20\x64\x75\x20\x77\x65\x69"
-			"\xc3\x9f\x74\x20\x77\x61\x73\x20"
-			"\x64\x65\x72\x20\x57\x69\x6e\x64"
-			"\x20\x64\x69\x72\x20\x77\x69\x6c"
-			"\x6c\x2c\x0a\x65\x68\x20\x6e\x6f"
-			"\x63\x68\x20\x64\x69\x65\x20\x42"
-			"\x69\x72\x6b\x65\x6e\x20\x62\x65"
-			"\x62\x65\x6e\x2e\x0a\x0a\x55\x6e"
-			"\x64\x20\x77\x65\x6e\x6e\x20\x64"
-			"\x69\x72\x20\x65\x69\x6e\x6d\x61"
-			"\x6c\x20\x64\x61\x73\x20\x53\x63"
-			"\x68\x77\x65\x69\x67\x65\x6e\x20"
-			"\x73\x70\x72\x61\x63\x68\x2c\x0a"
-			"\x6c\x61\x73\x73\x20\x64\x65\x69"
-			"\x6e\x65\x20\x53\x69\x6e\x6e\x65"
-			"\x20\x62\x65\x73\x69\x65\x67\x65"
-			"\x6e\x2e\x0a\x4a\x65\x64\x65\x6d"
-			"\x20\x48\x61\x75\x63\x68\x65\x20"
-			"\x67\x69\x62\x74\x20\x64\x69\x63"
-			"\x68\x2c\x20\x67\x69\x62\x20\x6e"
-			"\x61\x63\x68\x2c\x0a\x65\x72\x20"
-			"\x77\x69\x72\x64\x20\x64\x69\x63"
-			"\x68\x20\x6c\x69\x65\x62\x65\x6e"
-			"\x20\x75\x6e\x64\x20\x77\x69\x65"
-			"\x67\x65\x6e\x2e\x0a\x0a\x55\x6e"
-			"\x64\x20\x64\x61\x6e\x6e\x20\x6d"
-			"\x65\x69\x6e\x65\x20\x53\x65\x65"
-			"\x6c\x65\x20\x73\x65\x69\x74\x20"
-			"\x77\x65\x69\x74\x2c\x20\x73\x65"
-			"\x69\x20\x77\x65\x69\x74\x2c\x0a"
-			"\x64\x61\x73\x73\x20\x64\x69\x72"
-			"\x20\x64\x61\x73\x20\x4c\x65\x62"
-			"\x65\x6e\x20\x67\x65\x6c\x69\x6e"
-			"\x67\x65\x2c\x0a\x62\x72\x65\x69"
-			"\x74\x65\x20\x64\x69\x63\x68\x20"
-			"\x77\x69\x65\x20\x65\x69\x6e\x20"
-			"\x46\x65\x69\x65\x72\x6b\x6c\x65"
-			"\x69\x64\x0a\xc3\xbc\x62\x65\x72"
-			"\x20\x64\x69\x65\x20\x73\x69\x6e"
-			"\x6e\x65\x6e\x64\x65\x6e\x20\x44"
-			"\x69\x6e\x67\x65\x2e\x2e\x2e\x0a",
-		.psize	= 400,
-		.digest = "\xad\xb1\xc1\xe9\x56\x70\x31\x1d"
-			"\xbb\x5b\xdf\x5e\x70\x72\x1a\x57",
-	},
-};
-
 /*
  * HMAC-MD5 test vectors from RFC2202
  * (These need to be fixed to not use strlen).
  */
 static const struct hash_testvec hmac_md5_tv_template[] =
-- 
2.53.0




* [PATCH 17/19] lib/crypto: gf128mul: Remove unused 4k_lle functions
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (15 preceding siblings ...)
  2026-03-19  6:17 ` [PATCH 16/19] crypto: ghash - Remove ghash from crypto_shash API Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 18/19] lib/crypto: gf128hash: Remove unused content from ghash.h Eric Biggers
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

Remove the 4k_lle multiplication functions and the associated
gf128mul_table_le data table.  Their only user was the generic
implementation of GHASH, which has now been changed to use a different
implementation based on standard integer multiplication.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 include/crypto/gf128mul.h | 17 ++-------
 lib/crypto/gf128mul.c     | 73 +--------------------------------------
 2 files changed, 4 insertions(+), 86 deletions(-)

diff --git a/include/crypto/gf128mul.h b/include/crypto/gf128mul.h
index b0853f7cada0..6ed2a8351902 100644
--- a/include/crypto/gf128mul.h
+++ b/include/crypto/gf128mul.h
@@ -213,29 +213,18 @@ static inline void gf128mul_x_ble(le128 *r, const le128 *x)
 
 	r->a = cpu_to_le64((a << 1) | (b >> 63));
 	r->b = cpu_to_le64((b << 1) ^ _tt);
 }
 
-/* 4k table optimization */
-
-struct gf128mul_4k {
-	be128 t[256];
-};
-
-struct gf128mul_4k *gf128mul_init_4k_lle(const be128 *g);
-void gf128mul_4k_lle(be128 *a, const struct gf128mul_4k *t);
 void gf128mul_x8_ble(le128 *r, const le128 *x);
-static inline void gf128mul_free_4k(struct gf128mul_4k *t)
-{
-	kfree_sensitive(t);
-}
-
 
 /* 64k table optimization, implemented for bbe */
 
 struct gf128mul_64k {
-	struct gf128mul_4k *t[16];
+	struct {
+		be128 t[256];
+	} *t[16];
 };
 
 /* First initialize with the constant factor with which you
  * want to multiply and then call gf128mul_64k_bbe with the other
  * factor in the first argument, and the table in the second.
diff --git a/lib/crypto/gf128mul.c b/lib/crypto/gf128mul.c
index e5a727b15f07..7ebf07ce1168 100644
--- a/lib/crypto/gf128mul.c
+++ b/lib/crypto/gf128mul.c
@@ -125,31 +125,13 @@
 	(i & 0x20 ? 0x3840 : 0) ^ (i & 0x10 ? 0x1c20 : 0) ^ \
 	(i & 0x08 ? 0x0e10 : 0) ^ (i & 0x04 ? 0x0708 : 0) ^ \
 	(i & 0x02 ? 0x0384 : 0) ^ (i & 0x01 ? 0x01c2 : 0) \
 )
 
-static const u16 gf128mul_table_le[256] = gf128mul_dat(xda_le);
 static const u16 gf128mul_table_be[256] = gf128mul_dat(xda_be);
 
-/*
- * The following functions multiply a field element by x^8 in
- * the polynomial field representation.  They use 64-bit word operations
- * to gain speed but compensate for machine endianness and hence work
- * correctly on both styles of machine.
- */
-
-static void gf128mul_x8_lle(be128 *x)
-{
-	u64 a = be64_to_cpu(x->a);
-	u64 b = be64_to_cpu(x->b);
-	u64 _tt = gf128mul_table_le[b & 0xff];
-
-	x->b = cpu_to_be64((b >> 8) | (a << 56));
-	x->a = cpu_to_be64((a >> 8) ^ (_tt << 48));
-}
-
-/* time invariant version of gf128mul_x8_lle */
+/* A table-less implementation of multiplying by x^8 */
 static void gf128mul_x8_lle_ti(be128 *x)
 {
 	u64 a = be64_to_cpu(x->a);
 	u64 b = be64_to_cpu(x->b);
 	u64 _tt = xda_le(b & 0xff); /* avoid table lookup */
@@ -303,60 +285,7 @@ void gf128mul_64k_bbe(be128 *a, const struct gf128mul_64k *t)
 		be128_xor(r, r, &t->t[i]->t[ap[15 - i]]);
 	*a = *r;
 }
 EXPORT_SYMBOL(gf128mul_64k_bbe);
 
-/*      This version uses 4k bytes of table space.
-    A 16 byte buffer has to be multiplied by a 16 byte key
-    value in GF(2^128).  If we consider a GF(2^128) value in a
-    single byte, we can construct a table of the 256 16 byte
-    values that result from the 256 values of this byte.
-    This requires 4096 bytes. If we take the highest byte in
-    the buffer and use this table to get the result, we then
-    have to multiply by x^120 to get the final value. For the
-    next highest byte the result has to be multiplied by x^112
-    and so on. But we can do this by accumulating the result
-    in an accumulator starting with the result for the top
-    byte.  We repeatedly multiply the accumulator value by
-    x^8 and then add in (i.e. xor) the 16 bytes of the next
-    lower byte in the buffer, stopping when we reach the
-    lowest byte. This requires a 4096 byte table.
-*/
-struct gf128mul_4k *gf128mul_init_4k_lle(const be128 *g)
-{
-	struct gf128mul_4k *t;
-	int j, k;
-
-	t = kzalloc_obj(*t);
-	if (!t)
-		goto out;
-
-	t->t[128] = *g;
-	for (j = 64; j > 0; j >>= 1)
-		gf128mul_x_lle(&t->t[j], &t->t[j+j]);
-
-	for (j = 2; j < 256; j += j)
-		for (k = 1; k < j; ++k)
-			be128_xor(&t->t[j + k], &t->t[j], &t->t[k]);
-
-out:
-	return t;
-}
-EXPORT_SYMBOL(gf128mul_init_4k_lle);
-
-void gf128mul_4k_lle(be128 *a, const struct gf128mul_4k *t)
-{
-	u8 *ap = (u8 *)a;
-	be128 r[1];
-	int i = 15;
-
-	*r = t->t[ap[15]];
-	while (i--) {
-		gf128mul_x8_lle(r);
-		be128_xor(r, r, &t->t[ap[i]]);
-	}
-	*a = *r;
-}
-EXPORT_SYMBOL(gf128mul_4k_lle);
-
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("Functions for multiplying elements of GF(2^128)");
-- 
2.53.0




* [PATCH 18/19] lib/crypto: gf128hash: Remove unused content from ghash.h
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (16 preceding siblings ...)
  2026-03-19  6:17 ` [PATCH 17/19] lib/crypto: gf128mul: Remove unused 4k_lle functions Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-19  6:17 ` [PATCH 19/19] lib/crypto: aesgcm: Use GHASH library API Eric Biggers
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

Now that the structures in <crypto/ghash.h> are no longer used, remove
them.  Since this leaves <crypto/ghash.h> containing just constants,
include it from <crypto/gf128hash.h> to deduplicate these definitions.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 include/crypto/gf128hash.h |  3 +--
 include/crypto/ghash.h     | 12 ------------
 2 files changed, 1 insertion(+), 14 deletions(-)

diff --git a/include/crypto/gf128hash.h b/include/crypto/gf128hash.h
index 0bc649d01e12..41c557d55965 100644
--- a/include/crypto/gf128hash.h
+++ b/include/crypto/gf128hash.h
@@ -6,15 +6,14 @@
  */
 
 #ifndef _CRYPTO_GF128HASH_H
 #define _CRYPTO_GF128HASH_H
 
+#include <crypto/ghash.h>
 #include <linux/string.h>
 #include <linux/types.h>
 
-#define GHASH_BLOCK_SIZE	16
-#define GHASH_DIGEST_SIZE	16
 #define POLYVAL_BLOCK_SIZE	16
 #define POLYVAL_DIGEST_SIZE	16
 
 /**
  * struct polyval_elem - An element of the POLYVAL finite field
diff --git a/include/crypto/ghash.h b/include/crypto/ghash.h
index 043d938e9a2c..d187e5af9925 100644
--- a/include/crypto/ghash.h
+++ b/include/crypto/ghash.h
@@ -4,21 +4,9 @@
  */
 
 #ifndef __CRYPTO_GHASH_H__
 #define __CRYPTO_GHASH_H__
 
-#include <linux/types.h>
-
 #define GHASH_BLOCK_SIZE	16
 #define GHASH_DIGEST_SIZE	16
 
-struct gf128mul_4k;
-
-struct ghash_ctx {
-	struct gf128mul_4k *gf128;
-};
-
-struct ghash_desc_ctx {
-	u8 buffer[GHASH_BLOCK_SIZE];
-};
-
 #endif
-- 
2.53.0




* [PATCH 19/19] lib/crypto: aesgcm: Use GHASH library API
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (17 preceding siblings ...)
  2026-03-19  6:17 ` [PATCH 18/19] lib/crypto: gf128hash: Remove unused content from ghash.h Eric Biggers
@ 2026-03-19  6:17 ` Eric Biggers
  2026-03-23 14:14 ` [PATCH 00/19] GHASH library Ard Biesheuvel
  2026-03-24  0:50 ` Eric Biggers
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-19  6:17 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86,
	Eric Biggers

Make the AES-GCM library use the GHASH library instead of directly
calling gf128mul_lle().  This allows the architecture-optimized GHASH
implementations to be used, or the improved generic implementation if no
architecture-optimized implementation is usable.

Note: this means that <crypto/gcm.h> no longer needs to include
<crypto/gf128mul.h>.  Remove that inclusion, and include
<crypto/gf128mul.h> explicitly from arch/x86/crypto/aesni-intel_glue.c
which previously was relying on the transitive inclusion.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 arch/x86/crypto/aesni-intel_glue.c |  1 +
 include/crypto/gcm.h               |  4 +--
 lib/crypto/Kconfig                 |  2 +-
 lib/crypto/aesgcm.c                | 55 +++++++++++++++---------------
 4 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index e6c38d1d8a92..f522fff9231e 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -23,10 +23,11 @@
 #include <linux/err.h>
 #include <crypto/algapi.h>
 #include <crypto/aes.h>
 #include <crypto/b128ops.h>
 #include <crypto/gcm.h>
+#include <crypto/gf128mul.h>
 #include <crypto/xts.h>
 #include <asm/cpu_device_id.h>
 #include <asm/simd.h>
 #include <crypto/scatterwalk.h>
 #include <crypto/internal/aead.h>
diff --git a/include/crypto/gcm.h b/include/crypto/gcm.h
index b524e47bd4d0..1d5f39ff1dc4 100644
--- a/include/crypto/gcm.h
+++ b/include/crypto/gcm.h
@@ -2,11 +2,11 @@
 #define _CRYPTO_GCM_H
 
 #include <linux/errno.h>
 
 #include <crypto/aes.h>
-#include <crypto/gf128mul.h>
+#include <crypto/gf128hash.h>
 
 #define GCM_AES_IV_SIZE 12
 #define GCM_RFC4106_IV_SIZE 8
 #define GCM_RFC4543_IV_SIZE 8
 
@@ -63,11 +63,11 @@ static inline int crypto_ipsec_check_assoclen(unsigned int assoclen)
 
 	return 0;
 }
 
 struct aesgcm_ctx {
-	be128			ghash_key;
+	struct ghash_key	ghash_key;
 	struct aes_enckey	aes_key;
 	unsigned int		authsize;
 };
 
 int aesgcm_expandkey(struct aesgcm_ctx *ctx, const u8 *key,
diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig
index a39e7707e9ee..32fafe245f47 100644
--- a/lib/crypto/Kconfig
+++ b/lib/crypto/Kconfig
@@ -39,11 +39,11 @@ config CRYPTO_LIB_AES_CBC_MACS
 	  <crypto/aes-cbc-macs.h>.
 
 config CRYPTO_LIB_AESGCM
 	tristate
 	select CRYPTO_LIB_AES
-	select CRYPTO_LIB_GF128MUL
+	select CRYPTO_LIB_GF128HASH
 	select CRYPTO_LIB_UTILS
 
 config CRYPTO_LIB_ARC4
 	tristate
 
diff --git a/lib/crypto/aesgcm.c b/lib/crypto/aesgcm.c
index 02f5b5f32c76..8c7e74d2d147 100644
--- a/lib/crypto/aesgcm.c
+++ b/lib/crypto/aesgcm.c
@@ -3,13 +3,12 @@
  * Minimal library implementation of GCM
  *
  * Copyright 2022 Google LLC
  */
 
-#include <crypto/algapi.h>
 #include <crypto/gcm.h>
-#include <crypto/ghash.h>
+#include <crypto/utils.h>
 #include <linux/export.h>
 #include <linux/module.h>
 #include <asm/irqflags.h>
 
 static void aesgcm_encrypt_block(const struct aes_enckey *key, void *dst,
@@ -43,37 +42,26 @@ static void aesgcm_encrypt_block(const struct aes_enckey *key, void *dst,
  * that are not permitted by the GCM specification.
  */
 int aesgcm_expandkey(struct aesgcm_ctx *ctx, const u8 *key,
 		     unsigned int keysize, unsigned int authsize)
 {
-	u8 kin[AES_BLOCK_SIZE] = {};
+	u8 h[AES_BLOCK_SIZE] = {};
 	int ret;
 
 	ret = crypto_gcm_check_authsize(authsize) ?:
 	      aes_prepareenckey(&ctx->aes_key, key, keysize);
 	if (ret)
 		return ret;
 
 	ctx->authsize = authsize;
-	aesgcm_encrypt_block(&ctx->aes_key, &ctx->ghash_key, kin);
-
+	aesgcm_encrypt_block(&ctx->aes_key, h, h);
+	ghash_preparekey(&ctx->ghash_key, h);
+	memzero_explicit(h, sizeof(h));
 	return 0;
 }
 EXPORT_SYMBOL(aesgcm_expandkey);
 
-static void aesgcm_ghash(be128 *ghash, const be128 *key, const void *src,
-			 int len)
-{
-	while (len > 0) {
-		crypto_xor((u8 *)ghash, src, min(len, GHASH_BLOCK_SIZE));
-		gf128mul_lle(ghash, key);
-
-		src += GHASH_BLOCK_SIZE;
-		len -= GHASH_BLOCK_SIZE;
-	}
-}
-
 /**
  * aesgcm_mac - Generates the authentication tag using AES-GCM algorithm.
  * @ctx: The data structure that will hold the AES-GCM key schedule
  * @src: The input source data.
  * @src_len: Length of the source data.
@@ -86,24 +74,37 @@ static void aesgcm_ghash(be128 *ghash, const be128 *key, const void *src,
  * and an output buffer for the authentication tag.
  */
 static void aesgcm_mac(const struct aesgcm_ctx *ctx, const u8 *src, int src_len,
 		       const u8 *assoc, int assoc_len, __be32 *ctr, u8 *authtag)
 {
-	be128 tail = { cpu_to_be64(assoc_len * 8), cpu_to_be64(src_len * 8) };
-	u8 buf[AES_BLOCK_SIZE];
-	be128 ghash = {};
+	static const u8 zeroes[GHASH_BLOCK_SIZE];
+	__be64 tail[2] = {
+		cpu_to_be64((u64)assoc_len * 8),
+		cpu_to_be64((u64)src_len * 8),
+	};
+	struct ghash_ctx ghash;
+	u8 ghash_out[AES_BLOCK_SIZE];
+	u8 enc_ctr[AES_BLOCK_SIZE];
+
+	ghash_init(&ghash, &ctx->ghash_key);
+
+	ghash_update(&ghash, assoc, assoc_len);
+	ghash_update(&ghash, zeroes, -assoc_len & (GHASH_BLOCK_SIZE - 1));
 
-	aesgcm_ghash(&ghash, &ctx->ghash_key, assoc, assoc_len);
-	aesgcm_ghash(&ghash, &ctx->ghash_key, src, src_len);
-	aesgcm_ghash(&ghash, &ctx->ghash_key, &tail, sizeof(tail));
+	ghash_update(&ghash, src, src_len);
+	ghash_update(&ghash, zeroes, -src_len & (GHASH_BLOCK_SIZE - 1));
+
+	ghash_update(&ghash, (const u8 *)&tail, sizeof(tail));
+
+	ghash_final(&ghash, ghash_out);
 
 	ctr[3] = cpu_to_be32(1);
-	aesgcm_encrypt_block(&ctx->aes_key, buf, ctr);
-	crypto_xor_cpy(authtag, buf, (u8 *)&ghash, ctx->authsize);
+	aesgcm_encrypt_block(&ctx->aes_key, enc_ctr, ctr);
+	crypto_xor_cpy(authtag, ghash_out, enc_ctr, ctx->authsize);
 
-	memzero_explicit(&ghash, sizeof(ghash));
-	memzero_explicit(buf, sizeof(buf));
+	memzero_explicit(ghash_out, sizeof(ghash_out));
+	memzero_explicit(enc_ctr, sizeof(enc_ctr));
 }
 
 static void aesgcm_crypt(const struct aesgcm_ctx *ctx, u8 *dst, const u8 *src,
 			 int len, __be32 *ctr)
 {
-- 
2.53.0




* Re: [PATCH 00/19] GHASH library
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (18 preceding siblings ...)
  2026-03-19  6:17 ` [PATCH 19/19] lib/crypto: aesgcm: Use GHASH library API Eric Biggers
@ 2026-03-23 14:14 ` Ard Biesheuvel
  2026-03-24  0:50 ` Eric Biggers
  20 siblings, 0 replies; 22+ messages in thread
From: Ard Biesheuvel @ 2026-03-23 14:14 UTC (permalink / raw)
  To: Eric Biggers, linux-crypto
  Cc: linux-kernel, Jason A . Donenfeld, Herbert Xu, linux-arm-kernel,
	linuxppc-dev, linux-riscv, linux-s390, x86



On Thu, 19 Mar 2026, at 07:17, Eric Biggers wrote:
> This series is targeting libcrypto-next.  It can also be retrieved from:
>
>     git fetch 
> https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git 
> ghash-lib-v1
>
> This series migrates the standalone GHASH code to lib/crypto/, then
> converts the "gcm" template and AES-GCM library code to use it.  (GHASH
> is the universal hash function used by GCM mode.)  As was the case with
> POLYVAL and Poly1305 as well, the library is a much better fit for it.
>
> Since GHASH and POLYVAL are closely related and it often makes sense to
> implement one in terms of the other, the existing "polyval" library
> module is renamed to "gf128hash" and the GHASH support is added to it.
>
> The generic implementation of GHASH is also replaced with a better one
> utilizing the existing polyval_mul_generic().
>
> Note that some GHASH implementations, often faster ones using more
> recent CPU features, still exist in arch/*/crypto/ as internal
> components of AES-GCM implementations.  Those are left as-is for now.
> The goal with this GHASH library is just to provide parity with the
> existing standalone GHASH support, which is used when a full
> implementation of AES-GCM (or ${someothercipher}-GCM, if another block
> cipher is being used) is unavailable.  Migrating the
> architecture-optimized AES-GCM code to lib/crypto/ will be a next step.
>
> Eric Biggers (19):
>   lib/crypto: gf128hash: Rename polyval module to gf128hash
>   lib/crypto: gf128hash: Support GF128HASH_ARCH without all POLYVAL
>     functions
>   lib/crypto: gf128hash: Add GHASH support
>   lib/crypto: tests: Add KUnit tests for GHASH
>   crypto: arm/ghash - Make the "ghash" crypto_shash NEON-only
>   crypto: arm/ghash - Move NEON GHASH assembly into its own file
>   lib/crypto: arm/ghash: Migrate optimized code into library
>   crypto: arm64/ghash - Move NEON GHASH assembly into its own file
>   lib/crypto: arm64/ghash: Migrate optimized code into library
>   crypto: arm64/aes-gcm - Rename struct ghash_key and make fixed-sized
>   lib/crypto: powerpc/ghash: Migrate optimized code into library
>   lib/crypto: riscv/ghash: Migrate optimized code into library
>   lib/crypto: s390/ghash: Migrate optimized code into library
>   lib/crypto: x86/ghash: Migrate optimized code into library
>   crypto: gcm - Use GHASH library instead of crypto_ahash
>   crypto: ghash - Remove ghash from crypto_shash API
>   lib/crypto: gf128mul: Remove unused 4k_lle functions
>   lib/crypto: gf128hash: Remove unused content from ghash.h
>   lib/crypto: aesgcm: Use GHASH library API
>

Acked-by: Ard Biesheuvel <ardb@kernel.org>



* Re: [PATCH 00/19] GHASH library
  2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
                   ` (19 preceding siblings ...)
  2026-03-23 14:14 ` [PATCH 00/19] GHASH library Ard Biesheuvel
@ 2026-03-24  0:50 ` Eric Biggers
  20 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2026-03-24  0:50 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-s390, x86

On Wed, Mar 18, 2026 at 11:17:01PM -0700, Eric Biggers wrote:
> This series is targeting libcrypto-next.  It can also be retrieved from:
> 
>     git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git ghash-lib-v1
> 
> This series migrates the standalone GHASH code to lib/crypto/, then
> converts the "gcm" template and AES-GCM library code to use it.  (GHASH
> is the universal hash function used by GCM mode.)  As was the case with
> POLYVAL and Poly1305 as well, the library is a much better fit for it.
> 
> Since GHASH and POLYVAL are closely related and it often makes sense to
> implement one in terms of the other, the existing "polyval" library
> module is renamed to "gf128hash" and the GHASH support is added to it.
> 
> The generic implementation of GHASH is also replaced with a better one
> utilizing the existing polyval_mul_generic().
> 
> Note that some GHASH implementations, often faster ones using more
> recent CPU features, still exist in arch/*/crypto/ as internal
> components of AES-GCM implementations.  Those are left as-is for now.
> The goal with this GHASH library is just to provide parity with the
> existing standalone GHASH support, which is used when a full
> implementation of AES-GCM (or ${someothercipher}-GCM, if another block
> cipher is being used) is unavailable.  Migrating the
> architecture-optimized AES-GCM code to lib/crypto/ will be a next step.
> 

Applied to https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=libcrypto-next

As usual, the s390 code will need to be tested by someone who has the
privilege of access to a z/Architecture mainframe.  That is the only way
to test that code, given that the s390 community has not yet updated
QEMU to support the CPACF_KIMD_GHASH instruction.

From another review pass, I also folded in some trivial cleanups that
don't seem worth sending a v2 for unless something else comes up:
removed a leftover definition, dropped the unnecessary rename of 'h' to
'k', improved consistency in a couple of places, etc.

diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
index c74066d430fa..eaf2932ceaf5 100644
--- a/arch/arm64/crypto/ghash-ce-glue.c
+++ b/arch/arm64/crypto/ghash-ce-glue.c
@@ -35,10 +35,6 @@ struct arm_ghash_key {
 	u64			h[4][2];
 };
 
-struct arm_ghash_desc_ctx {
-	u64 digest[GHASH_DIGEST_SIZE/sizeof(u64)];
-};
-
 struct gcm_aes_ctx {
 	struct aes_enckey	aes_key;
 	u8			nonce[RFC4106_NONCE_SIZE];
diff --git a/lib/crypto/arm/gf128hash.h b/lib/crypto/arm/gf128hash.h
index cb929bed29d5..c33c8cbe51fe 100644
--- a/lib/crypto/arm/gf128hash.h
+++ b/lib/crypto/arm/gf128hash.h
@@ -12,7 +12,7 @@
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_neon);
 
 void pmull_ghash_update_p8(size_t blocks, struct polyval_elem *dg,
-			   const u8 *src, const struct polyval_elem *k);
+			   const u8 *src, const struct polyval_elem *h);
 
 #define ghash_blocks_arch ghash_blocks_arch
 static void ghash_blocks_arch(struct polyval_elem *acc,
diff --git a/lib/crypto/arm/ghash-neon-core.S b/lib/crypto/arm/ghash-neon-core.S
index bf423fb06a75..eeffd12504a9 100644
--- a/lib/crypto/arm/ghash-neon-core.S
+++ b/lib/crypto/arm/ghash-neon-core.S
@@ -181,7 +181,7 @@
 	/*
 	 * void pmull_ghash_update_p8(size_t blocks, struct polyval_elem *dg,
 	 *			      const u8 *src,
-	 *			      const struct polyval_elem *k)
+	 *			      const struct polyval_elem *h)
 	 */
 ENTRY(pmull_ghash_update_p8)
 	vld1.64		{SHASH}, [r3]
diff --git a/lib/crypto/arm64/gf128hash.h b/lib/crypto/arm64/gf128hash.h
index d5ef1b1b77e1..b2c85585b758 100644
--- a/lib/crypto/arm64/gf128hash.h
+++ b/lib/crypto/arm64/gf128hash.h
@@ -12,14 +12,14 @@
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_asimd);
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_pmull);
 
+asmlinkage void pmull_ghash_update_p8(size_t blocks, struct polyval_elem *dg,
+				      const u8 *src,
+				      const struct polyval_elem *h);
 asmlinkage void polyval_mul_pmull(struct polyval_elem *a,
 				  const struct polyval_elem *b);
 asmlinkage void polyval_blocks_pmull(struct polyval_elem *acc,
 				     const struct polyval_key *key,
 				     const u8 *data, size_t nblocks);
-asmlinkage void pmull_ghash_update_p8(size_t blocks, struct polyval_elem *dg,
-				      const u8 *src,
-				      const struct polyval_elem *k);
 
 #define polyval_preparekey_arch polyval_preparekey_arch
 static void polyval_preparekey_arch(struct polyval_key *key,
@@ -91,8 +91,8 @@ static void ghash_blocks_arch(struct polyval_elem *acc,
 	if (static_branch_likely(&have_asimd) && may_use_simd()) {
 		do {
 			/* Allow rescheduling every 4 KiB. */
-			size_t n =
-				min_t(size_t, nblocks, 4096 / GHASH_BLOCK_SIZE);
+			size_t n = min_t(size_t, nblocks,
+					 4096 / GHASH_BLOCK_SIZE);
 
 			scoped_ksimd()
 				pmull_ghash_update_p8(n, acc, data, &key->h);
diff --git a/lib/crypto/arm64/ghash-neon-core.S b/lib/crypto/arm64/ghash-neon-core.S
index eadd6da47247..85b20fcd98fe 100644
--- a/lib/crypto/arm64/ghash-neon-core.S
+++ b/lib/crypto/arm64/ghash-neon-core.S
@@ -180,7 +180,7 @@
 	/*
 	 * void pmull_ghash_update_p8(size_t blocks, struct polyval_elem *dg,
 	 *			      const u8 *src,
-	 *			      const struct polyval_elem *k)
+	 *			      const struct polyval_elem *h)
 	 */
 SYM_FUNC_START(pmull_ghash_update_p8)
 	ld1		{SHASH.2d}, [x3]
diff --git a/lib/crypto/riscv/ghash-riscv64-zvkg.S b/lib/crypto/riscv/ghash-riscv64-zvkg.S
index 2839ff1a990c..6a2a2f2bc7c8 100644
--- a/lib/crypto/riscv/ghash-riscv64-zvkg.S
+++ b/lib/crypto/riscv/ghash-riscv64-zvkg.S
@@ -55,6 +55,8 @@
 // void ghash_zvkg(u8 accumulator[GHASH_BLOCK_SIZE],
 //		   const u8 key[GHASH_BLOCK_SIZE],
 //		   const u8 *data, size_t nblocks);
+//
+// |nblocks| must be nonzero.
 SYM_FUNC_START(ghash_zvkg)
 	vsetivli	zero, 4, e32, m1, ta, ma
 	vle32.v		v1, (ACCUMULATOR)
diff --git a/lib/crypto/tests/Kconfig b/lib/crypto/tests/Kconfig
index 279ff1a339be..5b60d5c3644b 100644
--- a/lib/crypto/tests/Kconfig
+++ b/lib/crypto/tests/Kconfig
@@ -41,7 +41,7 @@ config CRYPTO_LIB_GHASH_KUNIT_TEST
 	default KUNIT_ALL_TESTS
 	select CRYPTO_LIB_BENCHMARK_VISIBLE
 	help
-	  KUnit tests for GHASH library functions.
+	  KUnit tests for the GHASH library functions.
 
 config CRYPTO_LIB_MD5_KUNIT_TEST
 	tristate "KUnit tests for MD5" if !KUNIT_ALL_TESTS


^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2026-03-24  0:50 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-19  6:17 [PATCH 00/19] GHASH library Eric Biggers
2026-03-19  6:17 ` [PATCH 01/19] lib/crypto: gf128hash: Rename polyval module to gf128hash Eric Biggers
2026-03-19  6:17 ` [PATCH 02/19] lib/crypto: gf128hash: Support GF128HASH_ARCH without all POLYVAL functions Eric Biggers
2026-03-19  6:17 ` [PATCH 03/19] lib/crypto: gf128hash: Add GHASH support Eric Biggers
2026-03-19  6:17 ` [PATCH 04/19] lib/crypto: tests: Add KUnit tests for GHASH Eric Biggers
2026-03-19  6:17 ` [PATCH 05/19] crypto: arm/ghash - Make the "ghash" crypto_shash NEON-only Eric Biggers
2026-03-19  6:17 ` [PATCH 06/19] crypto: arm/ghash - Move NEON GHASH assembly into its own file Eric Biggers
2026-03-19  6:17 ` [PATCH 07/19] lib/crypto: arm/ghash: Migrate optimized code into library Eric Biggers
2026-03-19  6:17 ` [PATCH 08/19] crypto: arm64/ghash - Move NEON GHASH assembly into its own file Eric Biggers
2026-03-19  6:17 ` [PATCH 09/19] lib/crypto: arm64/ghash: Migrate optimized code into library Eric Biggers
2026-03-19  6:17 ` [PATCH 10/19] crypto: arm64/aes-gcm - Rename struct ghash_key and make fixed-sized Eric Biggers
2026-03-19  6:17 ` [PATCH 11/19] lib/crypto: powerpc/ghash: Migrate optimized code into library Eric Biggers
2026-03-19  6:17 ` [PATCH 12/19] lib/crypto: riscv/ghash: " Eric Biggers
2026-03-19  6:17 ` [PATCH 13/19] lib/crypto: s390/ghash: " Eric Biggers
2026-03-19  6:17 ` [PATCH 14/19] lib/crypto: x86/ghash: " Eric Biggers
2026-03-19  6:17 ` [PATCH 15/19] crypto: gcm - Use GHASH library instead of crypto_ahash Eric Biggers
2026-03-19  6:17 ` [PATCH 16/19] crypto: ghash - Remove ghash from crypto_shash API Eric Biggers
2026-03-19  6:17 ` [PATCH 17/19] lib/crypto: gf128mul: Remove unused 4k_lle functions Eric Biggers
2026-03-19  6:17 ` [PATCH 18/19] lib/crypto: gf128hash: Remove unused content from ghash.h Eric Biggers
2026-03-19  6:17 ` [PATCH 19/19] lib/crypto: aesgcm: Use GHASH library API Eric Biggers
2026-03-23 14:14 ` [PATCH 00/19] GHASH library Ard Biesheuvel
2026-03-24  0:50 ` Eric Biggers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox