linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v2 00/15] SHA-3 library
@ 2025-10-26  5:50 Eric Biggers
  2025-10-26  5:50 ` [PATCH v2 01/15] crypto: s390/sha3 - Rename conflicting functions Eric Biggers
                   ` (17 more replies)
  0 siblings, 18 replies; 34+ messages in thread
From: Eric Biggers @ 2025-10-26  5:50 UTC
  To: linux-crypto
  Cc: David Howells, Ard Biesheuvel, Jason A . Donenfeld, Eric Biggers,
	Holger Dengler, Harald Freudenberger, Herbert Xu,
	linux-arm-kernel, linux-s390, linux-kernel

This series is targeting libcrypto-next.  It can also be retrieved from:

    git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git sha3-lib-v2

This series adds SHA-3 support to lib/crypto/.  This includes support
for the digest algorithms SHA3-224, SHA3-256, SHA3-384, and SHA3-512,
and also support for the extendable-output functions SHAKE128 and
SHAKE256.  The SHAKE128 and SHAKE256 support will be needed by ML-DSA.
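
As a hedged aside for readers less familiar with the sponge parameters:
the six algorithms differ in their block size ("rate"), which the header
in this series derives as 200 minus twice a per-algorithm size in bytes.
The userspace sketch below only restates that arithmetic.  It is not
code from the series, and keccak_rate_bytes() is a hypothetical helper
name introduced here for illustration.

```c
#include <stddef.h>

/* Size of the Keccak-f[1600] state in bytes (25 64-bit words). */
#define KECCAK_STATE_BYTES 200

/*
 * Hypothetical helper, not part of the series: block size (rate) in
 * bytes.  For SHA3-N pass the digest size N in bits; for SHAKE128 and
 * SHAKE256 pass 128 and 256 respectively.
 */
static size_t keccak_rate_bytes(unsigned int param_bits)
{
	return KECCAK_STATE_BYTES - 2 * (param_bits / 8);
}
```

For instance, keccak_rate_bytes(256) is 136, so SHA3-256 and SHAKE256
share a block size even though their domain separation suffixes and
output lengths differ.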

The architecture-optimized SHA-3 code for arm64 and s390 is migrated
into lib/crypto/.  (The existing s390 code couldn't really be reused, so
I rewrote it from scratch.)  As a result, the SHA-3 library functions
are accelerated on these architectures.

Finally, the sha3-224, sha3-256, sha3-384, and sha3-512 crypto_shash
algorithms are reimplemented on top of the library API.

If the s390 folks could re-test the s390 optimized SHA-3 code (by
enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
would be helpful.  QEMU doesn't support the instructions it uses.  Also,
it would be helpful to provide the benchmark output from just before
"lib/crypto: s390/sha3: Add optimized Keccak functions", just after it,
and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
functions".  Then we can verify that each change is useful.

Changed in v2:
  - Added missing selection of CRYPTO_LIB_SHA3 from CRYPTO_SHA3.
  - Fixed a bug where incorrect SHAKE output was produced if a
    zero-length squeeze was followed by a nonzero-length squeeze.
  - Improved the SHAKE tests.
  - Utilized the one-shot SHA-3 digest instructions on s390.
  - Split the s390 changes into several patches.
  - Folded some of my patches into David's.
  - Dropped some unnecessary changes from the first 2 patches.
  - Lots more cleanups, mainly to "lib/crypto: sha3: Add SHA-3 support".
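
The zero-length-squeeze item above concerns an invariant that is easy to
state: a sequence of squeezes, including empty ones, must produce the
same bytes as a single squeeze of the combined length.  The sketch below
is a toy model of that invariant under loud assumptions: toy_permute()
is a placeholder and NOT Keccak, and struct toy_xof, TOY_RATE, and every
function name are hypothetical.  It does not reproduce the bug or the
fix in this series; it only shows the offset bookkeeping that makes
split squeezes line up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_RATE 8	/* bytes exposed per permutation call (assumption) */

/* Toy XOF output state; hypothetical, not the series' struct shake_ctx. */
struct toy_xof {
	uint8_t block[TOY_RATE];	/* current output block */
	size_t squeeze_offset;		/* next unread byte in block[] */
};

/* Stand-in for the real permutation: any deterministic update will do. */
static void toy_permute(uint8_t block[TOY_RATE])
{
	for (size_t i = 0; i < TOY_RATE; i++)
		block[i] = (uint8_t)(block[i] * 31u + (uint8_t)i + 7u);
}

static void toy_init(struct toy_xof *x, uint8_t seed)
{
	memset(x->block, seed, TOY_RATE);
	toy_permute(x->block);
	x->squeeze_offset = 0;
}

/* Copy out_len bytes, permuting only when the current block is used up. */
static void toy_squeeze(struct toy_xof *x, uint8_t *out, size_t out_len)
{
	while (out_len) {
		if (x->squeeze_offset == TOY_RATE) {
			toy_permute(x->block);
			x->squeeze_offset = 0;
		}
		size_t n = TOY_RATE - x->squeeze_offset;
		if (n > out_len)
			n = out_len;
		memcpy(out, &x->block[x->squeeze_offset], n);
		x->squeeze_offset += n;
		out += n;
		out_len -= n;
	}
}

/* Returns 1 if split squeezes (one of them empty) match a single one. */
static int toy_split_matches_single(void)
{
	uint8_t whole[20], parts[20];
	struct toy_xof a, b;

	toy_init(&a, 0x42);
	toy_squeeze(&a, whole, sizeof(whole));

	toy_init(&b, 0x42);
	toy_squeeze(&b, parts, 0);	/* zero-length squeeze first */
	toy_squeeze(&b, parts, 3);
	toy_squeeze(&b, parts + 3, 10);
	toy_squeeze(&b, parts + 13, 7);

	return memcmp(whole, parts, sizeof(whole)) == 0;
}
```

In this toy, a zero-length squeeze never enters the loop, so it cannot
disturb squeeze_offset or trigger an extra permutation.  A real
implementation has more state to get right (for example, when padding
and the first permutation happen), which this sketch deliberately leaves
out.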

Changed in v1 (vs. first 5 patches of David's v6 patchset):
  - Migrated the arm64 and s390 code into lib/crypto/
  - Simplified the library API
  - Added FIPS test
  - Many other fixes and improvements

The first 5 patches are derived from David's v6 patchset
(https://lore.kernel.org/linux-crypto/20251017144311.817771-1-dhowells@redhat.com/).
Earlier changelogs can be found there.

David Howells (5):
  crypto: s390/sha3 - Rename conflicting functions
  crypto: arm64/sha3 - Rename conflicting function
  lib/crypto: sha3: Add SHA-3 support
  lib/crypto: sha3: Move SHA3 Iota step mapping into round function
  lib/crypto: tests: Add SHA3 kunit tests

Eric Biggers (10):
  lib/crypto: tests: Add additional SHAKE tests
  lib/crypto: sha3: Add FIPS cryptographic algorithm self-test
  crypto: arm64/sha3 - Update sha3_ce_transform() to prepare for library
  lib/crypto: arm64/sha3: Migrate optimized code into library
  lib/crypto: s390/sha3: Add optimized Keccak functions
  lib/crypto: sha3: Support arch overrides of one-shot digest functions
  lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
  crypto: jitterentropy - Use default sha3 implementation
  crypto: sha3 - Reimplement using library API
  crypto: s390/sha3 - Remove superseded SHA-3 code

 Documentation/crypto/index.rst                |   1 +
 Documentation/crypto/sha3.rst                 | 130 ++++++
 arch/arm64/configs/defconfig                  |   2 +-
 arch/arm64/crypto/Kconfig                     |  11 -
 arch/arm64/crypto/Makefile                    |   3 -
 arch/arm64/crypto/sha3-ce-glue.c              | 151 -------
 arch/s390/configs/debug_defconfig             |   3 +-
 arch/s390/configs/defconfig                   |   3 +-
 arch/s390/crypto/Kconfig                      |  20 -
 arch/s390/crypto/Makefile                     |   2 -
 arch/s390/crypto/sha.h                        |  51 ---
 arch/s390/crypto/sha3_256_s390.c              | 157 -------
 arch/s390/crypto/sha3_512_s390.c              | 157 -------
 arch/s390/crypto/sha_common.c                 | 117 -----
 crypto/Kconfig                                |   1 +
 crypto/Makefile                               |   2 +-
 crypto/jitterentropy-kcapi.c                  |  12 +-
 crypto/sha3.c                                 | 166 +++++++
 crypto/sha3_generic.c                         | 290 ------------
 crypto/testmgr.c                              |   8 +
 include/crypto/sha3.h                         | 306 ++++++++++++-
 lib/crypto/Kconfig                            |  13 +
 lib/crypto/Makefile                           |  10 +
 .../crypto/arm64}/sha3-ce-core.S              |  67 +--
 lib/crypto/arm64/sha3.h                       |  62 +++
 lib/crypto/fips.h                             |   7 +
 lib/crypto/s390/sha3.h                        | 151 +++++++
 lib/crypto/sha3.c                             | 411 +++++++++++++++++
 lib/crypto/tests/Kconfig                      |  11 +
 lib/crypto/tests/Makefile                     |   1 +
 lib/crypto/tests/sha3-testvecs.h              | 249 +++++++++++
 lib/crypto/tests/sha3_kunit.c                 | 422 ++++++++++++++++++
 scripts/crypto/gen-fips-testvecs.py           |   4 +
 scripts/crypto/gen-hash-testvecs.py           |  27 +-
 34 files changed, 2012 insertions(+), 1016 deletions(-)
 create mode 100644 Documentation/crypto/sha3.rst
 delete mode 100644 arch/arm64/crypto/sha3-ce-glue.c
 delete mode 100644 arch/s390/crypto/sha.h
 delete mode 100644 arch/s390/crypto/sha3_256_s390.c
 delete mode 100644 arch/s390/crypto/sha3_512_s390.c
 delete mode 100644 arch/s390/crypto/sha_common.c
 create mode 100644 crypto/sha3.c
 delete mode 100644 crypto/sha3_generic.c
 rename {arch/arm64/crypto => lib/crypto/arm64}/sha3-ce-core.S (84%)
 create mode 100644 lib/crypto/arm64/sha3.h
 create mode 100644 lib/crypto/s390/sha3.h
 create mode 100644 lib/crypto/sha3.c
 create mode 100644 lib/crypto/tests/sha3-testvecs.h
 create mode 100644 lib/crypto/tests/sha3_kunit.c

base-commit: e3068492d0016d0ea9a1ff07dbfa624d2ec773ca
-- 
2.51.1.dirty




* [PATCH v2 01/15] crypto: s390/sha3 - Rename conflicting functions
From: Eric Biggers @ 2025-10-26  5:50 UTC
  To: linux-crypto
  Cc: David Howells, Ard Biesheuvel, Jason A . Donenfeld, Eric Biggers,
	Holger Dengler, Harald Freudenberger, Herbert Xu,
	linux-arm-kernel, linux-s390, linux-kernel

From: David Howells <dhowells@redhat.com>

Rename the s390 sha3_*_init() functions to have an "s390_" prefix to
avoid a name conflict with the upcoming SHA-3 library functions.

Note: this code will be superseded later.  This commit simply keeps the
kernel building for the initial introduction of the library.

Signed-off-by: David Howells <dhowells@redhat.com>
[EB: dropped unnecessary rename of import and export functions, and
     improved commit message]
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 arch/s390/crypto/sha3_256_s390.c | 10 +++++-----
 arch/s390/crypto/sha3_512_s390.c | 10 +++++-----
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/s390/crypto/sha3_256_s390.c b/arch/s390/crypto/sha3_256_s390.c
index 03bb4f4bab701..7415d56649a52 100644
--- a/arch/s390/crypto/sha3_256_s390.c
+++ b/arch/s390/crypto/sha3_256_s390.c
@@ -17,11 +17,11 @@
 #include <linux/module.h>
 #include <linux/string.h>
 
 #include "sha.h"
 
-static int sha3_256_init(struct shash_desc *desc)
+static int s390_sha3_256_init(struct shash_desc *desc)
 {
 	struct s390_sha_ctx *sctx = shash_desc_ctx(desc);
 
 	sctx->first_message_part = test_facility(86);
 	if (!sctx->first_message_part)
@@ -77,11 +77,11 @@ static int sha3_224_import(struct shash_desc *desc, const void *in)
 	return 0;
 }
 
 static struct shash_alg sha3_256_alg = {
 	.digestsize	=	SHA3_256_DIGEST_SIZE,	   /* = 32 */
-	.init		=	sha3_256_init,
+	.init		=	s390_sha3_256_init,
 	.update		=	s390_sha_update_blocks,
 	.finup		=	s390_sha_finup,
 	.export		=	sha3_256_export,
 	.import		=	sha3_256_import,
 	.descsize	=	S390_SHA_CTX_SIZE,
@@ -94,22 +94,22 @@ static struct shash_alg sha3_256_alg = {
 		.cra_blocksize	 =	SHA3_256_BLOCK_SIZE,
 		.cra_module	 =	THIS_MODULE,
 	}
 };
 
-static int sha3_224_init(struct shash_desc *desc)
+static int s390_sha3_224_init(struct shash_desc *desc)
 {
 	struct s390_sha_ctx *sctx = shash_desc_ctx(desc);
 
-	sha3_256_init(desc);
+	s390_sha3_256_init(desc);
 	sctx->func = CPACF_KIMD_SHA3_224;
 	return 0;
 }
 
 static struct shash_alg sha3_224_alg = {
 	.digestsize	=	SHA3_224_DIGEST_SIZE,
-	.init		=	sha3_224_init,
+	.init		=	s390_sha3_224_init,
 	.update		=	s390_sha_update_blocks,
 	.finup		=	s390_sha_finup,
 	.export		=	sha3_256_export, /* same as for 256 */
 	.import		=	sha3_224_import, /* function code different! */
 	.descsize	=	S390_SHA_CTX_SIZE,
diff --git a/arch/s390/crypto/sha3_512_s390.c b/arch/s390/crypto/sha3_512_s390.c
index a5c9690eecb19..ff6ee55844005 100644
--- a/arch/s390/crypto/sha3_512_s390.c
+++ b/arch/s390/crypto/sha3_512_s390.c
@@ -16,11 +16,11 @@
 #include <linux/module.h>
 #include <linux/string.h>
 
 #include "sha.h"
 
-static int sha3_512_init(struct shash_desc *desc)
+static int s390_sha3_512_init(struct shash_desc *desc)
 {
 	struct s390_sha_ctx *sctx = shash_desc_ctx(desc);
 
 	sctx->first_message_part = test_facility(86);
 	if (!sctx->first_message_part)
@@ -76,11 +76,11 @@ static int sha3_384_import(struct shash_desc *desc, const void *in)
 	return 0;
 }
 
 static struct shash_alg sha3_512_alg = {
 	.digestsize	=	SHA3_512_DIGEST_SIZE,
-	.init		=	sha3_512_init,
+	.init		=	s390_sha3_512_init,
 	.update		=	s390_sha_update_blocks,
 	.finup		=	s390_sha_finup,
 	.export		=	sha3_512_export,
 	.import		=	sha3_512_import,
 	.descsize	=	S390_SHA_CTX_SIZE,
@@ -95,22 +95,22 @@ static struct shash_alg sha3_512_alg = {
 	}
 };
 
 MODULE_ALIAS_CRYPTO("sha3-512");
 
-static int sha3_384_init(struct shash_desc *desc)
+static int s390_sha3_384_init(struct shash_desc *desc)
 {
 	struct s390_sha_ctx *sctx = shash_desc_ctx(desc);
 
-	sha3_512_init(desc);
+	s390_sha3_512_init(desc);
 	sctx->func = CPACF_KIMD_SHA3_384;
 	return 0;
 }
 
 static struct shash_alg sha3_384_alg = {
 	.digestsize	=	SHA3_384_DIGEST_SIZE,
-	.init		=	sha3_384_init,
+	.init		=	s390_sha3_384_init,
 	.update		=	s390_sha_update_blocks,
 	.finup		=	s390_sha_finup,
 	.export		=	sha3_512_export, /* same as for 512 */
 	.import		=	sha3_384_import, /* function code different! */
 	.descsize	=	S390_SHA_CTX_SIZE,
-- 
2.51.1.dirty




* [PATCH v2 02/15] crypto: arm64/sha3 - Rename conflicting function
From: Eric Biggers @ 2025-10-26  5:50 UTC
  To: linux-crypto
  Cc: David Howells, Ard Biesheuvel, Jason A . Donenfeld, Eric Biggers,
	Holger Dengler, Harald Freudenberger, Herbert Xu,
	linux-arm-kernel, linux-s390, linux-kernel

From: David Howells <dhowells@redhat.com>

Rename the arm64 sha3_update() function to have an "arm64_" prefix to
avoid a name conflict with the upcoming SHA-3 library.

Note: this code will be superseded later.  This commit simply keeps the
kernel building for the initial introduction of the library.

Signed-off-by: David Howells <dhowells@redhat.com>
[EB: dropped unnecessary rename of sha3_finup(), and improved commit
     message]
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 arch/arm64/crypto/sha3-ce-glue.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/crypto/sha3-ce-glue.c b/arch/arm64/crypto/sha3-ce-glue.c
index b4f1001046c9a..f5c8302349337 100644
--- a/arch/arm64/crypto/sha3-ce-glue.c
+++ b/arch/arm64/crypto/sha3-ce-glue.c
@@ -29,12 +29,12 @@ MODULE_ALIAS_CRYPTO("sha3-384");
 MODULE_ALIAS_CRYPTO("sha3-512");
 
 asmlinkage int sha3_ce_transform(u64 *st, const u8 *data, int blocks,
 				 int md_len);
 
-static int sha3_update(struct shash_desc *desc, const u8 *data,
-		       unsigned int len)
+static int arm64_sha3_update(struct shash_desc *desc, const u8 *data,
+			     unsigned int len)
 {
 	struct sha3_state *sctx = shash_desc_ctx(desc);
 	struct crypto_shash *tfm = desc->tfm;
 	unsigned int bs, ds;
 	int blocks;
@@ -88,11 +88,11 @@ static int sha3_finup(struct shash_desc *desc, const u8 *src, unsigned int len,
 }
 
 static struct shash_alg algs[] = { {
 	.digestsize		= SHA3_224_DIGEST_SIZE,
 	.init			= crypto_sha3_init,
-	.update			= sha3_update,
+	.update			= arm64_sha3_update,
 	.finup			= sha3_finup,
 	.descsize		= SHA3_STATE_SIZE,
 	.base.cra_name		= "sha3-224",
 	.base.cra_driver_name	= "sha3-224-ce",
 	.base.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
@@ -100,11 +100,11 @@ static struct shash_alg algs[] = { {
 	.base.cra_module	= THIS_MODULE,
 	.base.cra_priority	= 200,
 }, {
 	.digestsize		= SHA3_256_DIGEST_SIZE,
 	.init			= crypto_sha3_init,
-	.update			= sha3_update,
+	.update			= arm64_sha3_update,
 	.finup			= sha3_finup,
 	.descsize		= SHA3_STATE_SIZE,
 	.base.cra_name		= "sha3-256",
 	.base.cra_driver_name	= "sha3-256-ce",
 	.base.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
@@ -112,11 +112,11 @@ static struct shash_alg algs[] = { {
 	.base.cra_module	= THIS_MODULE,
 	.base.cra_priority	= 200,
 }, {
 	.digestsize		= SHA3_384_DIGEST_SIZE,
 	.init			= crypto_sha3_init,
-	.update			= sha3_update,
+	.update			= arm64_sha3_update,
 	.finup			= sha3_finup,
 	.descsize		= SHA3_STATE_SIZE,
 	.base.cra_name		= "sha3-384",
 	.base.cra_driver_name	= "sha3-384-ce",
 	.base.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
@@ -124,11 +124,11 @@ static struct shash_alg algs[] = { {
 	.base.cra_module	= THIS_MODULE,
 	.base.cra_priority	= 200,
 }, {
 	.digestsize		= SHA3_512_DIGEST_SIZE,
 	.init			= crypto_sha3_init,
-	.update			= sha3_update,
+	.update			= arm64_sha3_update,
 	.finup			= sha3_finup,
 	.descsize		= SHA3_STATE_SIZE,
 	.base.cra_name		= "sha3-512",
 	.base.cra_driver_name	= "sha3-512-ce",
 	.base.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
-- 
2.51.1.dirty




* [PATCH v2 03/15] lib/crypto: sha3: Add SHA-3 support
From: Eric Biggers @ 2025-10-26  5:50 UTC
  To: linux-crypto
  Cc: David Howells, Ard Biesheuvel, Jason A . Donenfeld, Eric Biggers,
	Holger Dengler, Harald Freudenberger, Herbert Xu,
	linux-arm-kernel, linux-s390, linux-kernel

From: David Howells <dhowells@redhat.com>

Add SHA-3 support to lib/crypto/.  All six algorithms in the SHA-3
family are supported: four digests (SHA3-224, SHA3-256, SHA3-384, and
SHA3-512) and two extendable-output functions (SHAKE128 and SHAKE256).

The SHAKE algorithms will be required for ML-DSA.

Signed-off-by: David Howells <dhowells@redhat.com>
[EB: simplified the API to use fewer types and functions, fixed bug that
     sometimes caused incorrect SHAKE output, cleaned up the
     documentation, dropped an ad-hoc test that was inconsistent with
     the rest of lib/crypto/, and many other cleanups]
Co-developed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 Documentation/crypto/index.rst |   1 +
 Documentation/crypto/sha3.rst  | 119 +++++++++++
 include/crypto/sha3.h          | 308 +++++++++++++++++++++++++++-
 lib/crypto/Kconfig             |   7 +
 lib/crypto/Makefile            |   5 +
 lib/crypto/sha3.c              | 359 +++++++++++++++++++++++++++++++++
 6 files changed, 796 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/crypto/sha3.rst
 create mode 100644 lib/crypto/sha3.c

diff --git a/Documentation/crypto/index.rst b/Documentation/crypto/index.rst
index 100b47d049c04..4ee667c446f99 100644
--- a/Documentation/crypto/index.rst
+++ b/Documentation/crypto/index.rst
@@ -25,5 +25,6 @@ for cryptographic use cases, as well as programming examples.
    api
    api-samples
    descore-readme
    device_drivers/index
    krb5
+   sha3
diff --git a/Documentation/crypto/sha3.rst b/Documentation/crypto/sha3.rst
new file mode 100644
index 0000000000000..b705e70691d7b
--- /dev/null
+++ b/Documentation/crypto/sha3.rst
@@ -0,0 +1,119 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+==========================
+SHA-3 Algorithm Collection
+==========================
+
+.. contents::
+
+Overview
+========
+
+The SHA-3 family of algorithms, as specified in NIST FIPS-202 [1]_, contains six
+algorithms based on the Keccak sponge function.  The differences between them
+are: the "rate" (how much of the state buffer gets updated with new data between
+invocations of the Keccak function and analogous to the "block size"), what
+domain separation suffix gets appended to the input data, and how much output
+data is extracted at the end.  The Keccak sponge function is designed such that
+arbitrary amounts of output can be obtained for certain algorithms.
+
+Four digest algorithms are provided:
+
+ - SHA3-224
+ - SHA3-256
+ - SHA3-384
+ - SHA3-512
+
+Additionally, two Extendable-Output Functions (XOFs) are provided:
+
+ - SHAKE128
+ - SHAKE256
+
+The SHA-3 library API supports all six of these algorithms.  The four digest
+algorithms are also supported by the crypto_shash and crypto_ahash APIs.
+
+This document describes the SHA-3 library API.
+
+
+Digests
+=======
+
+The following functions compute SHA-3 digests::
+
+	void sha3_224(const u8 *in, size_t in_len, u8 out[SHA3_224_DIGEST_SIZE]);
+	void sha3_256(const u8 *in, size_t in_len, u8 out[SHA3_256_DIGEST_SIZE]);
+	void sha3_384(const u8 *in, size_t in_len, u8 out[SHA3_384_DIGEST_SIZE]);
+	void sha3_512(const u8 *in, size_t in_len, u8 out[SHA3_512_DIGEST_SIZE]);
+
+For users that need to pass in data incrementally, an incremental API is also
+provided.  The incremental API uses the following struct::
+
+	struct sha3_ctx { ... };
+
+Initialization is done with one of::
+
+	void sha3_224_init(struct sha3_ctx *ctx);
+	void sha3_256_init(struct sha3_ctx *ctx);
+	void sha3_384_init(struct sha3_ctx *ctx);
+	void sha3_512_init(struct sha3_ctx *ctx);
+
+Input data is then added with any number of calls to::
+
+	void sha3_update(struct sha3_ctx *ctx, const u8 *in, size_t in_len);
+
+Finally, the digest is generated using::
+
+	void sha3_final(struct sha3_ctx *ctx, u8 *out);
+
+which also zeroizes the context.  The length of the digest is determined by the
+initialization function that was called.
+
+
+Extendable-Output Functions
+===========================
+
+The following functions compute the SHA-3 extendable-output functions (XOFs)::
+
+	void shake128(const u8 *in, size_t in_len, u8 *out, size_t out_len);
+	void shake256(const u8 *in, size_t in_len, u8 *out, size_t out_len);
+
+For users that need to provide the input data incrementally and/or receive the
+output data incrementally, an incremental API is also provided.  The incremental
+API uses the following struct::
+
+	struct shake_ctx { ... };
+
+Initialization is done with one of::
+
+	void shake128_init(struct shake_ctx *ctx);
+	void shake256_init(struct shake_ctx *ctx);
+
+Input data is then added with any number of calls to::
+
+	void shake_update(struct shake_ctx *ctx, const u8 *in, size_t in_len);
+
+Finally, the output data is extracted with any number of calls to::
+
+	void shake_squeeze(struct shake_ctx *ctx, u8 *out, size_t out_len);
+
+and telling it how much data should be extracted.  Note that performing multiple
+squeezes, with the output laid consecutively in a buffer, gets exactly the same
+output as doing a single squeeze for the combined amount over the same buffer.
+
+More input data cannot be added after squeezing has started.
+
+Once all the desired output has been extracted, zeroize the context::
+
+	void shake_zeroize_ctx(struct shake_ctx *ctx);
+
+
+References
+==========
+
+.. [1] https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf
+
+
+API Function Reference
+======================
+
+.. kernel-doc:: include/crypto/sha3.h
diff --git a/include/crypto/sha3.h b/include/crypto/sha3.h
index 41e1b83a6d918..a7503dfc1a044 100644
--- a/include/crypto/sha3.h
+++ b/include/crypto/sha3.h
@@ -1,13 +1,16 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /*
  * Common values for SHA-3 algorithms
+ *
+ * See also Documentation/crypto/sha3.rst
  */
 #ifndef __CRYPTO_SHA3_H__
 #define __CRYPTO_SHA3_H__
 
 #include <linux/types.h>
+#include <linux/string.h>
 
 #define SHA3_224_DIGEST_SIZE	(224 / 8)
 #define SHA3_224_BLOCK_SIZE	(200 - 2 * SHA3_224_DIGEST_SIZE)
 #define SHA3_224_EXPORT_SIZE	SHA3_STATE_SIZE + SHA3_224_BLOCK_SIZE + 1
 
@@ -21,16 +24,315 @@
 
 #define SHA3_512_DIGEST_SIZE	(512 / 8)
 #define SHA3_512_BLOCK_SIZE	(200 - 2 * SHA3_512_DIGEST_SIZE)
 #define SHA3_512_EXPORT_SIZE	SHA3_STATE_SIZE + SHA3_512_BLOCK_SIZE + 1
 
+/*
+ * SHAKE128 and SHAKE256 actually have variable output size, but this is used to
+ * calculate the block size (rate) analogously to the above.
+ */
+#define SHAKE128_DEFAULT_SIZE	(128 / 8)
+#define SHAKE128_BLOCK_SIZE	(200 - 2 * SHAKE128_DEFAULT_SIZE)
+#define SHAKE256_DEFAULT_SIZE	(256 / 8)
+#define SHAKE256_BLOCK_SIZE	(200 - 2 * SHAKE256_DEFAULT_SIZE)
+
 #define SHA3_STATE_SIZE		200
 
 struct shash_desc;
 
+int crypto_sha3_init(struct shash_desc *desc);
+
+/*
+ * State for the Keccak-f[1600] permutation: 25 64-bit words.
+ *
+ * We usually keep the state words as little-endian, to make absorbing and
+ * squeezing easier.  (It means that absorbing and squeezing can just treat the
+ * state as a byte array.)  The state words are converted to native-endian only
+ * temporarily by implementations of the permutation that need native-endian
+ * words.  Of course, that conversion is a no-op on little-endian machines.
+ */
 struct sha3_state {
-	u64		st[SHA3_STATE_SIZE / 8];
+	union {
+		u64 st[SHA3_STATE_SIZE / 8]; /* temporarily retained for compatibility purposes */
+
+		__le64 words[SHA3_STATE_SIZE / 8];
+		u8 bytes[SHA3_STATE_SIZE];
+
+		u64 native_words[SHA3_STATE_SIZE / 8]; /* see comment above */
+	};
 };
 
-int crypto_sha3_init(struct shash_desc *desc);
+/* Internal context, shared by the digests (SHA3-*) and the XOFs (SHAKE*) */
+struct __sha3_ctx {
+	struct sha3_state state;
+	u8 digest_size;		/* Digests only: the digest size in bytes */
+	u8 block_size;		/* Block size in bytes */
+	u8 absorb_offset;	/* Index of next state byte to absorb into */
+	u8 squeeze_offset;	/* XOFs only: index of next state byte to extract */
+};
+
+void __sha3_update(struct __sha3_ctx *ctx, const u8 *in, size_t in_len);
+
+/** Context for SHA3-224, SHA3-256, SHA3-384, or SHA3-512 */
+struct sha3_ctx {
+	struct __sha3_ctx ctx;
+};
+
+/**
+ * Zeroize a sha3_ctx.  This is already called by sha3_final().  Call this
+ * explicitly when abandoning a context without calling sha3_final().
+ */
+static inline void sha3_zeroize_ctx(struct sha3_ctx *ctx)
+{
+	memzero_explicit(ctx, sizeof(*ctx));
+}
+
+/** Context for SHAKE128 or SHAKE256 */
+struct shake_ctx {
+	struct __sha3_ctx ctx;
+};
+
+/** Zeroize a shake_ctx.  Call this after the last squeeze. */
+static inline void shake_zeroize_ctx(struct shake_ctx *ctx)
+{
+	memzero_explicit(ctx, sizeof(*ctx));
+}
+
+/**
+ * sha3_224_init() - Initialize a context for SHA3-224
+ * @ctx: The context to initialize
+ *
+ * This begins a new SHA3-224 message digest computation.
+ *
+ * Context: Any context.
+ */
+static inline void sha3_224_init(struct sha3_ctx *ctx)
+{
+	*ctx = (struct sha3_ctx){
+		.ctx.digest_size = SHA3_224_DIGEST_SIZE,
+		.ctx.block_size = SHA3_224_BLOCK_SIZE,
+	};
+}
+
+/**
+ * sha3_256_init() - Initialize a context for SHA3-256
+ * @ctx: The context to initialize
+ *
+ * This begins a new SHA3-256 message digest computation.
+ *
+ * Context: Any context.
+ */
+static inline void sha3_256_init(struct sha3_ctx *ctx)
+{
+	*ctx = (struct sha3_ctx){
+		.ctx.digest_size = SHA3_256_DIGEST_SIZE,
+		.ctx.block_size = SHA3_256_BLOCK_SIZE,
+	};
+}
+
+/**
+ * sha3_384_init() - Initialize a context for SHA3-384
+ * @ctx: The context to initialize
+ *
+ * This begins a new SHA3-384 message digest computation.
+ *
+ * Context: Any context.
+ */
+static inline void sha3_384_init(struct sha3_ctx *ctx)
+{
+	*ctx = (struct sha3_ctx){
+		.ctx.digest_size = SHA3_384_DIGEST_SIZE,
+		.ctx.block_size = SHA3_384_BLOCK_SIZE,
+	};
+}
+
+/**
+ * sha3_512_init() - Initialize a context for SHA3-512
+ * @ctx: The context to initialize
+ *
+ * This begins a new SHA3-512 message digest computation.
+ *
+ * Context: Any context.
+ */
+static inline void sha3_512_init(struct sha3_ctx *ctx)
+{
+	*ctx = (struct sha3_ctx){
+		.ctx.digest_size = SHA3_512_DIGEST_SIZE,
+		.ctx.block_size = SHA3_512_BLOCK_SIZE,
+	};
+}
+
+/**
+ * sha3_update() - Update a SHA-3 digest context with input data
+ * @ctx: The context to update; must have been initialized
+ * @in: The input data
+ * @in_len: Length of the input data in bytes
+ *
+ * This can be called any number of times to add data to a SHA3-224, SHA3-256,
+ * SHA3-384, or SHA3-512 digest (depending on which init function was called).
+ *
+ * Context: Any context.
+ */
+static inline void sha3_update(struct sha3_ctx *ctx,
+			       const u8 *in, size_t in_len)
+{
+	__sha3_update(&ctx->ctx, in, in_len);
+}
+
+/**
+ * sha3_final() - Finish computing a SHA-3 message digest
+ * @ctx: The context to finalize; must have been initialized
+ * @out: (output) The resulting SHA3-224, SHA3-256, SHA3-384, or SHA3-512
+ *	 message digest, matching the init function that was called.  Note that
+ *	 the size differs for each one; see SHA3_*_DIGEST_SIZE.
+ *
+ * After finishing, this zeroizes @ctx.  So the caller does not need to do it.
+ *
+ * Context: Any context.
+ */
+void sha3_final(struct sha3_ctx *ctx, u8 *out);
+
+/**
+ * shake128_init() - Initialize a context for SHAKE128
+ * @ctx: The context to initialize
+ *
+ * This begins a new SHAKE128 extendable-output function (XOF) computation.
+ *
+ * Context: Any context.
+ */
+static inline void shake128_init(struct shake_ctx *ctx)
+{
+	*ctx = (struct shake_ctx){
+		.ctx.block_size = SHAKE128_BLOCK_SIZE,
+	};
+}
+
+/**
+ * shake256_init() - Initialize a context for SHAKE256
+ * @ctx: The context to initialize
+ *
+ * This begins a new SHAKE256 extendable-output function (XOF) computation.
+ *
+ * Context: Any context.
+ */
+static inline void shake256_init(struct shake_ctx *ctx)
+{
+	*ctx = (struct shake_ctx){
+		.ctx.block_size = SHAKE256_BLOCK_SIZE,
+	};
+}
+
+/**
+ * shake_update() - Update a SHAKE context with input data
+ * @ctx: The context to update; must have been initialized
+ * @in: The input data
+ * @in_len: Length of the input data in bytes
+ *
+ * This can be called any number of times to add more input data to SHAKE128 or
+ * SHAKE256.  This cannot be called after squeezing has begun.
+ *
+ * Context: Any context.
+ */
+static inline void shake_update(struct shake_ctx *ctx,
+				const u8 *in, size_t in_len)
+{
+	__sha3_update(&ctx->ctx, in, in_len);
+}
+
+/**
+ * shake_squeeze() - Generate output from SHAKE128 or SHAKE256
+ * @ctx: The context to squeeze; must have been initialized
+ * @out: Where to write the resulting output data
+ * @out_len: The amount of data to extract to @out in bytes
+ *
+ * This may be called multiple times.  A number of consecutive squeezes laid
+ * end-to-end will yield the same output as one big squeeze generating the same
+ * total amount of output.  More input cannot be provided after squeezing has
+ * begun.  After the last squeeze, call shake_zeroize_ctx().
+ *
+ * Context: Any context.
+ */
+void shake_squeeze(struct shake_ctx *ctx, u8 *out, size_t out_len);
+
+/**
+ * sha3_224() - Compute SHA3-224 digest in one shot
+ * @in: The input data to be digested
+ * @in_len: Length of the input data in bytes
+ * @out: The buffer into which the digest will be stored
+ *
+ * Convenience function that computes a SHA3-224 digest.  Use this instead of
+ * the incremental API if you're able to provide all the input at once.
+ *
+ * Context: Any context.
+ */
+void sha3_224(const u8 *in, size_t in_len, u8 out[SHA3_224_DIGEST_SIZE]);
+
+/**
+ * sha3_256() - Compute SHA3-256 digest in one shot
+ * @in: The input data to be digested
+ * @in_len: Length of the input data in bytes
+ * @out: The buffer into which the digest will be stored
+ *
+ * Convenience function that computes a SHA3-256 digest.  Use this instead of
+ * the incremental API if you're able to provide all the input at once.
+ *
+ * Context: Any context.
+ */
+void sha3_256(const u8 *in, size_t in_len, u8 out[SHA3_256_DIGEST_SIZE]);
+
+/**
+ * sha3_384() - Compute SHA3-384 digest in one shot
+ * @in: The input data to be digested
+ * @in_len: Length of the input data in bytes
+ * @out: The buffer into which the digest will be stored
+ *
+ * Convenience function that computes a SHA3-384 digest.  Use this instead of
+ * the incremental API if you're able to provide all the input at once.
+ *
+ * Context: Any context.
+ */
+void sha3_384(const u8 *in, size_t in_len, u8 out[SHA3_384_DIGEST_SIZE]);
+
+/**
+ * sha3_512() - Compute SHA3-512 digest in one shot
+ * @in: The input data to be digested
+ * @in_len: Length of the input data in bytes
+ * @out: The buffer into which the digest will be stored
+ *
+ * Convenience function that computes a SHA3-512 digest.  Use this instead of
+ * the incremental API if you're able to provide all the input at once.
+ *
+ * Context: Any context.
+ */
+void sha3_512(const u8 *in, size_t in_len, u8 out[SHA3_512_DIGEST_SIZE]);
+
+/**
+ * shake128() - Compute SHAKE128 in one shot
+ * @in: The input data to be used
+ * @in_len: Length of the input data in bytes
+ * @out: The buffer into which the output will be stored
+ * @out_len: Length of the output to produce in bytes
+ *
+ * Convenience function that computes SHAKE128 in one shot.  Use this instead of
+ * the incremental API if you're able to provide all the input at once as well
+ * as receive all the output at once.  All output lengths are supported.
+ *
+ * Context: Any context.
+ */
+void shake128(const u8 *in, size_t in_len, u8 *out, size_t out_len);
+
+/**
+ * shake256() - Compute SHAKE256 in one shot
+ * @in: The input data to be used
+ * @in_len: Length of the input data in bytes
+ * @out: The buffer into which the output will be stored
+ * @out_len: Length of the output to produce in bytes
+ *
+ * Convenience function that computes SHAKE256 in one shot.  Use this instead of
+ * the incremental API if you're able to provide all the input at once as well
+ * as receive all the output at once.  All output lengths are supported.
+ *
+ * Context: Any context.
+ */
+void shake256(const u8 *in, size_t in_len, u8 *out, size_t out_len);
 
-#endif
+#endif /* __CRYPTO_SHA3_H__ */
diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig
index 280b888153bf0..a05f5a349cd8c 100644
--- a/lib/crypto/Kconfig
+++ b/lib/crypto/Kconfig
@@ -193,10 +193,17 @@ config CRYPTO_LIB_SHA512_ARCH
 	default y if RISCV && 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
 	default y if S390
 	default y if SPARC64
 	default y if X86_64
 
+config CRYPTO_LIB_SHA3
+	tristate
+	select CRYPTO_LIB_UTILS
+	help
+	  The SHA3 library functions.  Select this if your module uses any of
+	  the functions from <crypto/sha3.h>.
+
 config CRYPTO_LIB_SM3
 	tristate
 
 source "lib/crypto/tests/Kconfig"
 
diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
index bc26777d08e97..0cfdb511f32b6 100644
--- a/lib/crypto/Makefile
+++ b/lib/crypto/Makefile
@@ -276,10 +276,15 @@ libsha512-$(CONFIG_X86) += x86/sha512-ssse3-asm.o \
 			   x86/sha512-avx2-asm.o
 endif # CONFIG_CRYPTO_LIB_SHA512_ARCH
 
 ################################################################################
 
+obj-$(CONFIG_CRYPTO_LIB_SHA3) += libsha3.o
+libsha3-y := sha3.o
+
+################################################################################
+
 obj-$(CONFIG_MPILIB) += mpi/
 
 obj-$(CONFIG_CRYPTO_SELFTESTS_FULL)		+= simd.o
 
 obj-$(CONFIG_CRYPTO_LIB_SM3)			+= libsm3.o
diff --git a/lib/crypto/sha3.c b/lib/crypto/sha3.c
new file mode 100644
index 0000000000000..049be8414de26
--- /dev/null
+++ b/lib/crypto/sha3.c
@@ -0,0 +1,359 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * SHA-3, as specified in
+ * https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf
+ *
+ * SHA-3 code by Jeff Garzik <jeff@garzik.org>
+ *               Ard Biesheuvel <ard.biesheuvel@linaro.org>
+ *               David Howells <dhowells@redhat.com>
+ *
+ * See also Documentation/crypto/sha3.rst
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <crypto/sha3.h>
+#include <crypto/utils.h>
+#include <linux/export.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/unaligned.h>
+
+/*
+ * On some 32-bit architectures, such as h8300, GCC ends up using over 1 KB of
+ * stack if the round calculation gets inlined into the loop in
+ * sha3_keccakf_generic().  On the other hand, on 64-bit architectures with
+ * plenty of [64-bit wide] general purpose registers, not inlining it severely
+ * hurts performance.  So let's use 64-bitness as a heuristic to decide whether
+ * to inline or not.
+ */
+#ifdef CONFIG_64BIT
+#define SHA3_INLINE inline
+#else
+#define SHA3_INLINE noinline
+#endif
+
+#define SHA3_KECCAK_ROUNDS 24
+
+static const u64 sha3_keccakf_rndc[SHA3_KECCAK_ROUNDS] = {
+	0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
+	0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
+	0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
+	0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
+	0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
+	0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
+	0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
+	0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL
+};
+
+/*
+ * Perform a single round of Keccak mixing.
+ */
+static SHA3_INLINE void sha3_keccakf_one_round_generic(u64 st[25])
+{
+	u64 t[5], tt, bc[5];
+
+	/* Theta */
+	bc[0] = st[0] ^ st[5] ^ st[10] ^ st[15] ^ st[20];
+	bc[1] = st[1] ^ st[6] ^ st[11] ^ st[16] ^ st[21];
+	bc[2] = st[2] ^ st[7] ^ st[12] ^ st[17] ^ st[22];
+	bc[3] = st[3] ^ st[8] ^ st[13] ^ st[18] ^ st[23];
+	bc[4] = st[4] ^ st[9] ^ st[14] ^ st[19] ^ st[24];
+
+	t[0] = bc[4] ^ rol64(bc[1], 1);
+	t[1] = bc[0] ^ rol64(bc[2], 1);
+	t[2] = bc[1] ^ rol64(bc[3], 1);
+	t[3] = bc[2] ^ rol64(bc[4], 1);
+	t[4] = bc[3] ^ rol64(bc[0], 1);
+
+	st[0] ^= t[0];
+
+	/* Rho Pi */
+	tt = st[1];
+	st[ 1] = rol64(st[ 6] ^ t[1], 44);
+	st[ 6] = rol64(st[ 9] ^ t[4], 20);
+	st[ 9] = rol64(st[22] ^ t[2], 61);
+	st[22] = rol64(st[14] ^ t[4], 39);
+	st[14] = rol64(st[20] ^ t[0], 18);
+	st[20] = rol64(st[ 2] ^ t[2], 62);
+	st[ 2] = rol64(st[12] ^ t[2], 43);
+	st[12] = rol64(st[13] ^ t[3], 25);
+	st[13] = rol64(st[19] ^ t[4],  8);
+	st[19] = rol64(st[23] ^ t[3], 56);
+	st[23] = rol64(st[15] ^ t[0], 41);
+	st[15] = rol64(st[ 4] ^ t[4], 27);
+	st[ 4] = rol64(st[24] ^ t[4], 14);
+	st[24] = rol64(st[21] ^ t[1],  2);
+	st[21] = rol64(st[ 8] ^ t[3], 55);
+	st[ 8] = rol64(st[16] ^ t[1], 45);
+	st[16] = rol64(st[ 5] ^ t[0], 36);
+	st[ 5] = rol64(st[ 3] ^ t[3], 28);
+	st[ 3] = rol64(st[18] ^ t[3], 21);
+	st[18] = rol64(st[17] ^ t[2], 15);
+	st[17] = rol64(st[11] ^ t[1], 10);
+	st[11] = rol64(st[ 7] ^ t[2],  6);
+	st[ 7] = rol64(st[10] ^ t[0],  3);
+	st[10] = rol64(    tt ^ t[1],  1);
+
+	/* Chi */
+	bc[ 0] = ~st[ 1] & st[ 2];
+	bc[ 1] = ~st[ 2] & st[ 3];
+	bc[ 2] = ~st[ 3] & st[ 4];
+	bc[ 3] = ~st[ 4] & st[ 0];
+	bc[ 4] = ~st[ 0] & st[ 1];
+	st[ 0] ^= bc[ 0];
+	st[ 1] ^= bc[ 1];
+	st[ 2] ^= bc[ 2];
+	st[ 3] ^= bc[ 3];
+	st[ 4] ^= bc[ 4];
+
+	bc[ 0] = ~st[ 6] & st[ 7];
+	bc[ 1] = ~st[ 7] & st[ 8];
+	bc[ 2] = ~st[ 8] & st[ 9];
+	bc[ 3] = ~st[ 9] & st[ 5];
+	bc[ 4] = ~st[ 5] & st[ 6];
+	st[ 5] ^= bc[ 0];
+	st[ 6] ^= bc[ 1];
+	st[ 7] ^= bc[ 2];
+	st[ 8] ^= bc[ 3];
+	st[ 9] ^= bc[ 4];
+
+	bc[ 0] = ~st[11] & st[12];
+	bc[ 1] = ~st[12] & st[13];
+	bc[ 2] = ~st[13] & st[14];
+	bc[ 3] = ~st[14] & st[10];
+	bc[ 4] = ~st[10] & st[11];
+	st[10] ^= bc[ 0];
+	st[11] ^= bc[ 1];
+	st[12] ^= bc[ 2];
+	st[13] ^= bc[ 3];
+	st[14] ^= bc[ 4];
+
+	bc[ 0] = ~st[16] & st[17];
+	bc[ 1] = ~st[17] & st[18];
+	bc[ 2] = ~st[18] & st[19];
+	bc[ 3] = ~st[19] & st[15];
+	bc[ 4] = ~st[15] & st[16];
+	st[15] ^= bc[ 0];
+	st[16] ^= bc[ 1];
+	st[17] ^= bc[ 2];
+	st[18] ^= bc[ 3];
+	st[19] ^= bc[ 4];
+
+	bc[ 0] = ~st[21] & st[22];
+	bc[ 1] = ~st[22] & st[23];
+	bc[ 2] = ~st[23] & st[24];
+	bc[ 3] = ~st[24] & st[20];
+	bc[ 4] = ~st[20] & st[21];
+	st[20] ^= bc[ 0];
+	st[21] ^= bc[ 1];
+	st[22] ^= bc[ 2];
+	st[23] ^= bc[ 3];
+	st[24] ^= bc[ 4];
+}
+
+/* Generic implementation of the Keccak-f[1600] permutation */
+static void sha3_keccakf_generic(struct sha3_state *state)
+{
+	/*
+	 * Temporarily convert the state words from little-endian to native-
+	 * endian so that they can be operated on.  Note that on little-endian
+	 * machines this conversion is a no-op and is optimized out.
+	 */
+
+	for (int i = 0; i < ARRAY_SIZE(state->words); i++)
+		state->native_words[i] = le64_to_cpu(state->words[i]);
+
+	for (int round = 0; round < SHA3_KECCAK_ROUNDS; round++) {
+		sha3_keccakf_one_round_generic(state->native_words);
+		/* Iota */
+		state->native_words[0] ^= sha3_keccakf_rndc[round];
+	}
+
+	for (int i = 0; i < ARRAY_SIZE(state->words); i++)
+		state->words[i] = cpu_to_le64(state->native_words[i]);
+}
+
+/*
+ * Generic implementation of absorbing the given nonzero number of full blocks
+ * into the sponge function Keccak[r=8*block_size, c=1600-8*block_size].
+ */
+static void __maybe_unused
+sha3_absorb_blocks_generic(struct sha3_state *state, const u8 *data,
+			   size_t nblocks, size_t block_size)
+{
+	do {
+		for (size_t i = 0; i < block_size; i += 8)
+			state->words[i / 8] ^= get_unaligned((__le64 *)&data[i]);
+		sha3_keccakf_generic(state);
+		data += block_size;
+	} while (--nblocks);
+}
+
+#ifdef CONFIG_CRYPTO_LIB_SHA3_ARCH
+#include "sha3.h" /* $(SRCARCH)/sha3.h */
+#else
+#define sha3_keccakf		sha3_keccakf_generic
+#define sha3_absorb_blocks	sha3_absorb_blocks_generic
+#endif
+
+void __sha3_update(struct __sha3_ctx *ctx, const u8 *in, size_t in_len)
+{
+	const size_t block_size = ctx->block_size;
+	size_t absorb_offset = ctx->absorb_offset;
+
+	/* Warn if squeezing has already begun. */
+	WARN_ON_ONCE(absorb_offset >= block_size);
+
+	if (absorb_offset && absorb_offset + in_len >= block_size) {
+		crypto_xor(&ctx->state.bytes[absorb_offset], in,
+			   block_size - absorb_offset);
+		in += block_size - absorb_offset;
+		in_len -= block_size - absorb_offset;
+		sha3_keccakf(&ctx->state);
+		absorb_offset = 0;
+	}
+
+	if (in_len >= block_size) {
+		size_t nblocks = in_len / block_size;
+
+		sha3_absorb_blocks(&ctx->state, in, nblocks, block_size);
+		in += nblocks * block_size;
+		in_len -= nblocks * block_size;
+	}
+
+	if (in_len) {
+		crypto_xor(&ctx->state.bytes[absorb_offset], in, in_len);
+		absorb_offset += in_len;
+	}
+	ctx->absorb_offset = absorb_offset;
+}
+EXPORT_SYMBOL_GPL(__sha3_update);
+
+void sha3_final(struct sha3_ctx *sha3_ctx, u8 *out)
+{
+	struct __sha3_ctx *ctx = &sha3_ctx->ctx;
+
+	ctx->state.bytes[ctx->absorb_offset] ^= 0x06;
+	ctx->state.bytes[ctx->block_size - 1] ^= 0x80;
+	sha3_keccakf(&ctx->state);
+	memcpy(out, ctx->state.bytes, ctx->digest_size);
+	sha3_zeroize_ctx(sha3_ctx);
+}
+EXPORT_SYMBOL_GPL(sha3_final);
+
+void shake_squeeze(struct shake_ctx *shake_ctx, u8 *out, size_t out_len)
+{
+	struct __sha3_ctx *ctx = &shake_ctx->ctx;
+	const size_t block_size = ctx->block_size;
+	size_t squeeze_offset = ctx->squeeze_offset;
+
+	if (ctx->absorb_offset < block_size) {
+		/* First squeeze: */
+
+		/* Add the domain separation suffix and padding. */
+		ctx->state.bytes[ctx->absorb_offset] ^= 0x1f;
+		ctx->state.bytes[block_size - 1] ^= 0x80;
+
+		/* Indicate that squeezing has begun. */
+		ctx->absorb_offset = block_size;
+
+		/*
+		 * Indicate that no output is pending yet, i.e. sha3_keccakf()
+		 * will need to be called before the first copy.
+		 */
+		squeeze_offset = block_size;
+	}
+	while (out_len) {
+		if (squeeze_offset == block_size) {
+			sha3_keccakf(&ctx->state);
+			squeeze_offset = 0;
+		}
+		size_t copy = min(out_len, block_size - squeeze_offset);
+
+		memcpy(out, &ctx->state.bytes[squeeze_offset], copy);
+		out += copy;
+		out_len -= copy;
+		squeeze_offset += copy;
+	}
+	ctx->squeeze_offset = squeeze_offset;
+}
+EXPORT_SYMBOL_GPL(shake_squeeze);
+
+void sha3_224(const u8 *in, size_t in_len, u8 out[SHA3_224_DIGEST_SIZE])
+{
+	struct sha3_ctx ctx;
+
+	sha3_224_init(&ctx);
+	sha3_update(&ctx, in, in_len);
+	sha3_final(&ctx, out);
+}
+EXPORT_SYMBOL_GPL(sha3_224);
+
+void sha3_256(const u8 *in, size_t in_len, u8 out[SHA3_256_DIGEST_SIZE])
+{
+	struct sha3_ctx ctx;
+
+	sha3_256_init(&ctx);
+	sha3_update(&ctx, in, in_len);
+	sha3_final(&ctx, out);
+}
+EXPORT_SYMBOL_GPL(sha3_256);
+
+void sha3_384(const u8 *in, size_t in_len, u8 out[SHA3_384_DIGEST_SIZE])
+{
+	struct sha3_ctx ctx;
+
+	sha3_384_init(&ctx);
+	sha3_update(&ctx, in, in_len);
+	sha3_final(&ctx, out);
+}
+EXPORT_SYMBOL_GPL(sha3_384);
+
+void sha3_512(const u8 *in, size_t in_len, u8 out[SHA3_512_DIGEST_SIZE])
+{
+	struct sha3_ctx ctx;
+
+	sha3_512_init(&ctx);
+	sha3_update(&ctx, in, in_len);
+	sha3_final(&ctx, out);
+}
+EXPORT_SYMBOL_GPL(sha3_512);
+
+void shake128(const u8 *in, size_t in_len, u8 *out, size_t out_len)
+{
+	struct shake_ctx ctx;
+
+	shake128_init(&ctx);
+	shake_update(&ctx, in, in_len);
+	shake_squeeze(&ctx, out, out_len);
+	shake_zeroize_ctx(&ctx);
+}
+EXPORT_SYMBOL_GPL(shake128);
+
+void shake256(const u8 *in, size_t in_len, u8 *out, size_t out_len)
+{
+	struct shake_ctx ctx;
+
+	shake256_init(&ctx);
+	shake_update(&ctx, in, in_len);
+	shake_squeeze(&ctx, out, out_len);
+	shake_zeroize_ctx(&ctx);
+}
+EXPORT_SYMBOL_GPL(shake256);
+
+#ifdef sha3_mod_init_arch
+static int __init sha3_mod_init(void)
+{
+	sha3_mod_init_arch();
+	return 0;
+}
+subsys_initcall(sha3_mod_init);
+
+static void __exit sha3_mod_exit(void)
+{
+}
+module_exit(sha3_mod_exit);
+#endif
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("SHA-3 library functions");
-- 
2.51.1.dirty




* [PATCH v2 04/15] lib/crypto: sha3: Move SHA3 Iota step mapping into round function
  2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
                   ` (2 preceding siblings ...)
  2025-10-26  5:50 ` [PATCH v2 03/15] lib/crypto: sha3: Add SHA-3 support Eric Biggers
@ 2025-10-26  5:50 ` Eric Biggers
  2025-10-26  5:50 ` [PATCH v2 05/15] lib/crypto: tests: Add SHA3 kunit tests Eric Biggers
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Eric Biggers @ 2025-10-26  5:50 UTC (permalink / raw)
  To: linux-crypto
  Cc: David Howells, Ard Biesheuvel, Jason A . Donenfeld, Eric Biggers,
	Holger Dengler, Harald Freudenberger, Herbert Xu,
	linux-arm-kernel, linux-s390, linux-kernel

From: David Howells <dhowells@redhat.com>

In crypto/sha3_generic.c, the keccakf() function calls keccakf_round()
to do four of Keccak-f's five step mappings.  However, it does not do
the Iota step mapping - presumably because that is dependent on round
number, whereas Theta, Rho, Pi and Chi are not.

Note that keccakf_round() needs to be explicitly marked noinline on
certain architectures, since the code that gcc generates will (or used
to) use over 1KiB of stack space if the function is inlined.

This code was copied more or less verbatim into lib/crypto/sha3.c, so
the library version has the same aesthetic issue.  Fix it there by
passing the round number into sha3_keccakf_one_round_generic() and doing
the Iota step mapping in that function.

crypto/sha3_generic.c is left untouched as that will be converted to use
lib/crypto/sha3.c at some point.
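
As a rough userspace illustration of this refactoring (not the kernel
code itself), the sketch below keeps Theta, Rho, Pi and Chi in one
helper and compares the two placements of Iota.  The names four_steps(),
one_round(), permute_iota_in_loop() and permute_iota_in_round() are made
up for this example, and the compact table-driven round is an assumption
standing in for the unrolled version in lib/crypto/sha3.c:

```c
#include <assert.h>
#include <stdint.h>

#define KECCAK_ROUNDS 24

static const uint64_t rndc[KECCAK_ROUNDS] = {
	0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
	0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
	0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
	0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
	0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
	0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
	0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
	0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL
};
/* Rotation amounts and destination lanes for the combined Rho/Pi walk. */
static const int rotc[24] = { 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14,
			      27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44 };
static const int piln[24] = { 10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4,
			      15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1 };

static uint64_t rol64(uint64_t x, int n)
{
	assert(n > 0 && n < 64);	/* keeps both shifts well defined */
	return (x << n) | (x >> (64 - n));
}

/* Theta, Rho, Pi and Chi: the four steps that ignore the round number. */
static void four_steps(uint64_t st[25])
{
	uint64_t bc[5], t;
	int i, j;

	for (i = 0; i < 5; i++)		/* Theta */
		bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
	for (i = 0; i < 5; i++) {
		t = bc[(i + 4) % 5] ^ rol64(bc[(i + 1) % 5], 1);
		for (j = 0; j < 25; j += 5)
			st[j + i] ^= t;
	}
	t = st[1];			/* Rho and Pi */
	for (i = 0; i < 24; i++) {
		j = piln[i];
		bc[0] = st[j];
		st[j] = rol64(t, rotc[i]);
		t = bc[0];
	}
	for (j = 0; j < 25; j += 5) {	/* Chi */
		for (i = 0; i < 5; i++)
			bc[i] = st[j + i];
		for (i = 0; i < 5; i++)
			st[j + i] ^= ~bc[(i + 1) % 5] & bc[(i + 2) % 5];
	}
}

/* Old shape: the caller applies Iota after each round call. */
static void permute_iota_in_loop(uint64_t st[25])
{
	for (int round = 0; round < KECCAK_ROUNDS; round++) {
		four_steps(st);
		st[0] ^= rndc[round];
	}
}

/* New shape: the round function takes the round number and applies Iota. */
static void one_round(uint64_t st[25], int round)
{
	four_steps(st);
	st[0] ^= rndc[round];
}

static void permute_iota_in_round(uint64_t st[25])
{
	for (int round = 0; round < KECCAK_ROUNDS; round++)
		one_round(st, round);
}
```

Running both permutations on identical input states should leave them
equal lane for lane, which is why the change is purely cosmetic.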

Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 lib/crypto/sha3.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/crypto/sha3.c b/lib/crypto/sha3.c
index 049be8414de26..ee7a2ca92b2c5 100644
--- a/lib/crypto/sha3.c
+++ b/lib/crypto/sha3.c
@@ -46,11 +46,11 @@ static const u64 sha3_keccakf_rndc[SHA3_KECCAK_ROUNDS] = {
 };
 
 /*
  * Perform a single round of Keccak mixing.
  */
-static SHA3_INLINE void sha3_keccakf_one_round_generic(u64 st[25])
+static SHA3_INLINE void sha3_keccakf_one_round_generic(u64 st[25], int round)
 {
 	u64 t[5], tt, bc[5];
 
 	/* Theta */
 	bc[0] = st[0] ^ st[5] ^ st[10] ^ st[15] ^ st[20];
@@ -147,10 +147,13 @@ static SHA3_INLINE void sha3_keccakf_one_round_generic(u64 st[25])
 	st[20] ^= bc[ 0];
 	st[21] ^= bc[ 1];
 	st[22] ^= bc[ 2];
 	st[23] ^= bc[ 3];
 	st[24] ^= bc[ 4];
+
+	/* Iota */
+	st[0] ^= sha3_keccakf_rndc[round];
 }
 
 /* Generic implementation of the Keccak-f[1600] permutation */
 static void sha3_keccakf_generic(struct sha3_state *state)
 {
@@ -161,15 +164,12 @@ static void sha3_keccakf_generic(struct sha3_state *state)
 	 */
 
 	for (int i = 0; i < ARRAY_SIZE(state->words); i++)
 		state->native_words[i] = le64_to_cpu(state->words[i]);
 
-	for (int round = 0; round < SHA3_KECCAK_ROUNDS; round++) {
-		sha3_keccakf_one_round_generic(state->native_words);
-		/* Iota */
-		state->native_words[0] ^= sha3_keccakf_rndc[round];
-	}
+	for (int round = 0; round < SHA3_KECCAK_ROUNDS; round++)
+		sha3_keccakf_one_round_generic(state->native_words, round);
 
 	for (int i = 0; i < ARRAY_SIZE(state->words); i++)
 		state->words[i] = cpu_to_le64(state->native_words[i]);
 }
 
-- 
2.51.1.dirty




* [PATCH v2 05/15] lib/crypto: tests: Add SHA3 kunit tests
  2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
                   ` (3 preceding siblings ...)
  2025-10-26  5:50 ` [PATCH v2 04/15] lib/crypto: sha3: Move SHA3 Iota step mapping into round function Eric Biggers
@ 2025-10-26  5:50 ` Eric Biggers
  2025-10-26  5:50 ` [PATCH v2 06/15] lib/crypto: tests: Add additional SHAKE tests Eric Biggers
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Eric Biggers @ 2025-10-26  5:50 UTC (permalink / raw)
  To: linux-crypto
  Cc: David Howells, Ard Biesheuvel, Jason A . Donenfeld, Eric Biggers,
	Holger Dengler, Harald Freudenberger, Herbert Xu,
	linux-arm-kernel, linux-s390, linux-kernel

From: David Howells <dhowells@redhat.com>

Add a SHA3 kunit test suite, providing the following:

 (*) A simple test of each of SHA3-224, SHA3-256, SHA3-384, SHA3-512,
     SHAKE128 and SHAKE256.

 (*) NIST 0- and 1600-bit test vectors for SHAKE128 and SHAKE256.

 (*) Output tiling (multiple squeezing) tests for SHAKE256.

 (*) Standard hash template test for SHA3-256.  To make this possible,
     gen-hash-testvecs.py is modified to support sha3-256.

 (*) Standard benchmark test for SHA3-256.
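
To make the output tiling item concrete, here is a minimal userspace
sketch of a SHAKE sponge, assuming a byte-at-a-time absorb and squeeze
rather than the block-oriented kernel code.  struct xof and the xof_*()
helpers are hypothetical names for this example only; they are not the
<crypto/sha3.h> API, and unlike shake_squeeze() the padding is applied
by a separate xof_finish() call:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace XOF state; not the kernel's struct shake_ctx. */
struct xof {
	uint64_t st[25];
	size_t rate;	/* block size in bytes: 168 = SHAKE128, 136 = SHAKE256 */
	size_t pos;	/* byte offset within the current block */
};

static uint64_t rol64(uint64_t x, int n)
{
	assert(n > 0 && n < 64);	/* keeps both shifts well defined */
	return (x << n) | (x >> (64 - n));
}

static void keccakf(uint64_t st[25])
{
	static const uint64_t rndc[24] = {
		0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
		0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
		0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
		0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
		0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
		0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
		0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
		0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL
	};
	static const int rotc[24] = { 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14,
				      27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44 };
	static const int piln[24] = { 10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4,
				      15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1 };
	uint64_t bc[5], t;
	int i, j;

	for (int round = 0; round < 24; round++) {
		for (i = 0; i < 5; i++)		/* Theta */
			bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
		for (i = 0; i < 5; i++) {
			t = bc[(i + 4) % 5] ^ rol64(bc[(i + 1) % 5], 1);
			for (j = 0; j < 25; j += 5)
				st[j + i] ^= t;
		}
		t = st[1];			/* Rho and Pi */
		for (i = 0; i < 24; i++) {
			j = piln[i];
			bc[0] = st[j];
			st[j] = rol64(t, rotc[i]);
			t = bc[0];
		}
		for (j = 0; j < 25; j += 5) {	/* Chi */
			for (i = 0; i < 5; i++)
				bc[i] = st[j + i];
			for (i = 0; i < 5; i++)
				st[j + i] ^= ~bc[(i + 1) % 5] & bc[(i + 2) % 5];
		}
		st[0] ^= rndc[round];		/* Iota */
	}
}

static void xof_init(struct xof *x, size_t rate)
{
	memset(x, 0, sizeof(*x));
	x->rate = rate;
}

static void xof_absorb(struct xof *x, const uint8_t *in, size_t len)
{
	while (len--) {
		x->st[x->pos / 8] ^= (uint64_t)*in++ << (8 * (x->pos % 8));
		if (++x->pos == x->rate) {
			keccakf(x->st);
			x->pos = 0;
		}
	}
}

/* Add the SHAKE suffix (0x1f) and final pad bit; call once before squeezing. */
static void xof_finish(struct xof *x)
{
	x->st[x->pos / 8] ^= (uint64_t)0x1f << (8 * (x->pos % 8));
	x->st[(x->rate - 1) / 8] ^= (uint64_t)0x80 << (8 * ((x->rate - 1) % 8));
	x->pos = x->rate;	/* no output pending: permute before first copy */
}

static void xof_squeeze(struct xof *x, uint8_t *out, size_t len)
{
	while (len--) {
		if (x->pos == x->rate) {
			keccakf(x->st);
			x->pos = 0;
		}
		*out++ = (uint8_t)(x->st[x->pos / 8] >> (8 * (x->pos % 8)));
		x->pos++;
	}
}
```

Feeding two such states the same input and then squeezing one in a
single call and the other in small pieces should yield the same bytes,
which is the property the tiling tests check.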

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
[EB: dropped some unnecessary changes to gen-hash-testvecs.py, moved
     addition of Testing section in doc file into this commit, and
     other small cleanups]
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 Documentation/crypto/sha3.rst       |   6 +
 lib/crypto/tests/Kconfig            |  11 +
 lib/crypto/tests/Makefile           |   1 +
 lib/crypto/tests/sha3-testvecs.h    | 231 +++++++++++++++++++
 lib/crypto/tests/sha3_kunit.c       | 344 ++++++++++++++++++++++++++++
 scripts/crypto/gen-hash-testvecs.py |   4 +-
 6 files changed, 596 insertions(+), 1 deletion(-)
 create mode 100644 lib/crypto/tests/sha3-testvecs.h
 create mode 100644 lib/crypto/tests/sha3_kunit.c

diff --git a/Documentation/crypto/sha3.rst b/Documentation/crypto/sha3.rst
index b705e70691d7b..f8c484feaa291 100644
--- a/Documentation/crypto/sha3.rst
+++ b/Documentation/crypto/sha3.rst
@@ -105,10 +105,16 @@ More input data cannot be added after squeezing has started.
 Once all the desired output has been extracted, zeroize the context::
 
 	void shake_zeroize_ctx(struct shake_ctx *ctx);
 
 
+Testing
+=======
+
+To test the SHA-3 code, use sha3_kunit (CONFIG_CRYPTO_LIB_SHA3_KUNIT_TEST).
+
+
 References
 ==========
 
 .. [1] https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf
 
diff --git a/lib/crypto/tests/Kconfig b/lib/crypto/tests/Kconfig
index 2ebfd681bae4d..140afd1714bab 100644
--- a/lib/crypto/tests/Kconfig
+++ b/lib/crypto/tests/Kconfig
@@ -79,10 +79,21 @@ config CRYPTO_LIB_SHA512_KUNIT_TEST
 	select CRYPTO_LIB_SHA512
 	help
 	  KUnit tests for the SHA-384 and SHA-512 cryptographic hash functions
 	  and their corresponding HMACs.
 
+config CRYPTO_LIB_SHA3_KUNIT_TEST
+	tristate "KUnit tests for SHA-3" if !KUNIT_ALL_TESTS
+	depends on KUNIT
+	default KUNIT_ALL_TESTS || CRYPTO_SELFTESTS
+	select CRYPTO_LIB_BENCHMARK_VISIBLE
+	select CRYPTO_LIB_SHA3
+	help
+	  KUnit tests for the SHA3 cryptographic hash and XOF functions,
+	  including SHA3-224, SHA3-256, SHA3-384, SHA3-512, SHAKE128 and
+	  SHAKE256.
+
 config CRYPTO_LIB_BENCHMARK_VISIBLE
 	bool
 
 config CRYPTO_LIB_BENCHMARK
 	bool "Include benchmarks in KUnit tests for cryptographic functions"
diff --git a/lib/crypto/tests/Makefile b/lib/crypto/tests/Makefile
index f21a48a4415d0..f7d1392dc8475 100644
--- a/lib/crypto/tests/Makefile
+++ b/lib/crypto/tests/Makefile
@@ -6,5 +6,6 @@ obj-$(CONFIG_CRYPTO_LIB_CURVE25519_KUNIT_TEST) += curve25519_kunit.o
 obj-$(CONFIG_CRYPTO_LIB_MD5_KUNIT_TEST) += md5_kunit.o
 obj-$(CONFIG_CRYPTO_LIB_POLY1305_KUNIT_TEST) += poly1305_kunit.o
 obj-$(CONFIG_CRYPTO_LIB_SHA1_KUNIT_TEST) += sha1_kunit.o
 obj-$(CONFIG_CRYPTO_LIB_SHA256_KUNIT_TEST) += sha224_kunit.o sha256_kunit.o
 obj-$(CONFIG_CRYPTO_LIB_SHA512_KUNIT_TEST) += sha384_kunit.o sha512_kunit.o
+obj-$(CONFIG_CRYPTO_LIB_SHA3_KUNIT_TEST) += sha3_kunit.o
diff --git a/lib/crypto/tests/sha3-testvecs.h b/lib/crypto/tests/sha3-testvecs.h
new file mode 100644
index 0000000000000..9c4c403cc6e06
--- /dev/null
+++ b/lib/crypto/tests/sha3-testvecs.h
@@ -0,0 +1,231 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* This file was generated by: ./scripts/crypto/gen-hash-testvecs.py sha3-256 */
+
+static const struct {
+	size_t data_len;
+	u8 digest[SHA3_256_DIGEST_SIZE];
+} hash_testvecs[] = {
+	{
+		.data_len = 0,
+		.digest = {
+			0xa7, 0xff, 0xc6, 0xf8, 0xbf, 0x1e, 0xd7, 0x66,
+			0x51, 0xc1, 0x47, 0x56, 0xa0, 0x61, 0xd6, 0x62,
+			0xf5, 0x80, 0xff, 0x4d, 0xe4, 0x3b, 0x49, 0xfa,
+			0x82, 0xd8, 0x0a, 0x4b, 0x80, 0xf8, 0x43, 0x4a,
+		},
+	},
+	{
+		.data_len = 1,
+		.digest = {
+			0x11, 0x03, 0xe7, 0x84, 0x51, 0x50, 0x86, 0x35,
+			0x71, 0x8a, 0x70, 0xe3, 0xc4, 0x26, 0x7b, 0x21,
+			0x02, 0x13, 0xa0, 0x81, 0xe8, 0xe6, 0x14, 0x25,
+			0x07, 0x34, 0xe5, 0xc5, 0x40, 0x06, 0xf2, 0x8b,
+		},
+	},
+	{
+		.data_len = 2,
+		.digest = {
+			0x2f, 0x6f, 0x6d, 0x47, 0x48, 0x52, 0x11, 0xb9,
+			0xe4, 0x3d, 0xc8, 0x71, 0xcf, 0xb2, 0xee, 0xae,
+			0x5b, 0xf4, 0x12, 0x84, 0x5b, 0x1c, 0xec, 0x6c,
+			0xc1, 0x66, 0x88, 0xaa, 0xc3, 0x40, 0xbd, 0x7e,
+		},
+	},
+	{
+		.data_len = 3,
+		.digest = {
+			0xec, 0x02, 0xe8, 0x81, 0x4f, 0x84, 0x41, 0x69,
+			0x06, 0xd8, 0xdc, 0x1d, 0x01, 0x78, 0xd7, 0xcb,
+			0x39, 0xdf, 0xd3, 0x12, 0x1c, 0x99, 0xfd, 0xf3,
+			0x5c, 0x83, 0xc9, 0xc2, 0x7a, 0x7b, 0x6a, 0x05,
+		},
+	},
+	{
+		.data_len = 16,
+		.digest = {
+			0xff, 0x6f, 0xc3, 0x41, 0xc3, 0x5f, 0x34, 0x6d,
+			0xa7, 0xdf, 0x3e, 0xc2, 0x8b, 0x29, 0xb6, 0xf1,
+			0xf8, 0x67, 0xfd, 0xcd, 0xb1, 0x9f, 0x38, 0x08,
+			0x1d, 0x8d, 0xd9, 0xc2, 0x43, 0x66, 0x18, 0x6c,
+		},
+	},
+	{
+		.data_len = 32,
+		.digest = {
+			0xe4, 0xb1, 0x06, 0x17, 0xf8, 0x8b, 0x91, 0x95,
+			0xe7, 0x57, 0x66, 0xac, 0x08, 0xb2, 0x03, 0x3e,
+			0xf7, 0x84, 0x1f, 0xe3, 0x25, 0xa3, 0x11, 0xd2,
+			0x11, 0xa4, 0x78, 0x74, 0x2a, 0x43, 0x20, 0xa5,
+		},
+	},
+	{
+		.data_len = 48,
+		.digest = {
+			0xeb, 0x57, 0x5f, 0x20, 0xa3, 0x6b, 0xc7, 0xb4,
+			0x66, 0x2a, 0xa0, 0x30, 0x3b, 0x52, 0x00, 0xc9,
+			0xce, 0x6a, 0xd8, 0x1e, 0xbe, 0xed, 0xa1, 0xd1,
+			0xbe, 0x63, 0xc7, 0xe1, 0xe2, 0x66, 0x67, 0x0c,
+		},
+	},
+	{
+		.data_len = 49,
+		.digest = {
+			0xf0, 0x67, 0xad, 0x66, 0xbe, 0xec, 0x5a, 0xfd,
+			0x29, 0xd2, 0x4f, 0x1d, 0xb2, 0x24, 0xb8, 0x90,
+			0x05, 0x28, 0x0e, 0x66, 0x67, 0x74, 0x2d, 0xee,
+			0x66, 0x25, 0x11, 0xd1, 0x76, 0xa2, 0xfc, 0x3a,
+		},
+	},
+	{
+		.data_len = 63,
+		.digest = {
+			0x57, 0x56, 0x21, 0xb3, 0x2d, 0x2d, 0xe1, 0x9d,
+			0xbf, 0x2c, 0x82, 0xa8, 0xad, 0x7e, 0x6c, 0x46,
+			0xfb, 0x30, 0xeb, 0xce, 0xcf, 0xed, 0x2d, 0x65,
+			0xe7, 0xe4, 0x96, 0x69, 0xe0, 0x48, 0xd2, 0xb6,
+		},
+	},
+	{
+		.data_len = 64,
+		.digest = {
+			0x7b, 0xba, 0x67, 0x15, 0xe5, 0x21, 0xc4, 0x69,
+			0xd3, 0xef, 0x5c, 0x97, 0x9f, 0x5b, 0xba, 0x9c,
+			0xfa, 0x55, 0x64, 0xec, 0xb5, 0x37, 0x53, 0x1b,
+			0x3f, 0x4c, 0x0a, 0xed, 0x51, 0x98, 0x2b, 0x52,
+		},
+	},
+	{
+		.data_len = 65,
+		.digest = {
+			0x44, 0xb6, 0x6b, 0x83, 0x09, 0x83, 0x55, 0x83,
+			0xde, 0x1f, 0xcc, 0x33, 0xef, 0xdc, 0x05, 0xbb,
+			0x3b, 0x63, 0x76, 0x45, 0xe4, 0x8e, 0x14, 0x7a,
+			0x2d, 0xae, 0x90, 0xce, 0x68, 0xc3, 0xa4, 0xf2,
+		},
+	},
+	{
+		.data_len = 127,
+		.digest = {
+			0x50, 0x3e, 0x99, 0x4e, 0x28, 0x2b, 0xc9, 0xf4,
+			0xf5, 0xeb, 0x2b, 0x16, 0x04, 0x2d, 0xf5, 0xbe,
+			0xc0, 0x91, 0x41, 0x2a, 0x8e, 0x69, 0x5e, 0x39,
+			0x53, 0x2c, 0xc1, 0x18, 0xa5, 0xeb, 0xd8, 0xda,
+		},
+	},
+	{
+		.data_len = 128,
+		.digest = {
+			0x90, 0x0b, 0xa6, 0x92, 0x84, 0x30, 0xaf, 0xee,
+			0x38, 0x59, 0x83, 0x83, 0xe9, 0xfe, 0xab, 0x86,
+			0x79, 0x1b, 0xcd, 0xe7, 0x0a, 0x0f, 0x58, 0x53,
+			0x36, 0xab, 0x12, 0xe1, 0x5c, 0x97, 0xc1, 0xfb,
+		},
+	},
+	{
+		.data_len = 129,
+		.digest = {
+			0x2b, 0x52, 0x1e, 0x54, 0xbe, 0x38, 0x4c, 0x3e,
+			0x73, 0x37, 0x18, 0xf5, 0x25, 0x2c, 0xc8, 0xc7,
+			0xda, 0x7e, 0xb6, 0x47, 0x9d, 0xf4, 0x46, 0xce,
+			0xfa, 0x80, 0x20, 0x6b, 0xbd, 0xfd, 0x2a, 0xd8,
+		},
+	},
+	{
+		.data_len = 256,
+		.digest = {
+			0x45, 0xf0, 0xf5, 0x9b, 0xd9, 0x91, 0x26, 0xd5,
+			0x91, 0x3b, 0xf8, 0x87, 0x8b, 0x34, 0x02, 0x31,
+			0x64, 0xab, 0xf4, 0x1c, 0x6e, 0x34, 0x72, 0xdf,
+			0x32, 0x6d, 0xe5, 0xd2, 0x67, 0x5e, 0x86, 0x93,
+		},
+	},
+	{
+		.data_len = 511,
+		.digest = {
+			0xb3, 0xaf, 0x71, 0x64, 0xfa, 0xd4, 0xf1, 0x07,
+			0x38, 0xef, 0x04, 0x8e, 0x89, 0xf4, 0x02, 0xd2,
+			0xa5, 0xaf, 0x3b, 0xf5, 0x67, 0x56, 0xcf, 0xa9,
+			0x8e, 0x43, 0xf5, 0xb5, 0xe3, 0x91, 0x8e, 0xe7,
+		},
+	},
+	{
+		.data_len = 513,
+		.digest = {
+			0x51, 0xac, 0x0a, 0x65, 0xb7, 0x96, 0x20, 0xcf,
+			0x88, 0xf6, 0x97, 0x35, 0x89, 0x0d, 0x31, 0x0f,
+			0xbe, 0x17, 0xbe, 0x62, 0x03, 0x67, 0xc0, 0xee,
+			0x4f, 0xc1, 0xe3, 0x7f, 0x6f, 0xab, 0xac, 0xb4,
+		},
+	},
+	{
+		.data_len = 1000,
+		.digest = {
+			0x7e, 0xea, 0xa8, 0xd7, 0xde, 0x20, 0x1b, 0x58,
+			0x24, 0xd8, 0x26, 0x40, 0x36, 0x5f, 0x3f, 0xaa,
+			0xe5, 0x5a, 0xea, 0x98, 0x58, 0xd4, 0xd6, 0xfc,
+			0x20, 0x4c, 0x5c, 0x4f, 0xaf, 0x56, 0xc7, 0xc3,
+		},
+	},
+	{
+		.data_len = 3333,
+		.digest = {
+			0x61, 0xb1, 0xb1, 0x3e, 0x0e, 0x7e, 0x90, 0x3d,
+			0x31, 0x54, 0xbd, 0xc9, 0x0d, 0x53, 0x62, 0xf1,
+			0xcd, 0x18, 0x80, 0xf9, 0x91, 0x75, 0x41, 0xb3,
+			0x51, 0x39, 0x57, 0xa7, 0xa8, 0x1e, 0xfb, 0xc9,
+		},
+	},
+	{
+		.data_len = 4096,
+		.digest = {
+			0xab, 0x29, 0xda, 0x10, 0xc4, 0x11, 0x2d, 0x5c,
+			0xd1, 0xce, 0x1c, 0x95, 0xfa, 0xc6, 0xc7, 0xb0,
+			0x1b, 0xd1, 0xdc, 0x6f, 0xa0, 0x9d, 0x1b, 0x23,
+			0xfb, 0x6e, 0x90, 0x97, 0xd0, 0x75, 0x44, 0x7a,
+		},
+	},
+	{
+		.data_len = 4128,
+		.digest = {
+			0x02, 0x45, 0x95, 0xf4, 0x19, 0xb5, 0x93, 0x29,
+			0x90, 0xf2, 0x63, 0x3f, 0x89, 0xe8, 0xa5, 0x31,
+			0x76, 0xf2, 0x89, 0x79, 0x66, 0xd3, 0x96, 0xdf,
+			0x33, 0xd1, 0xa6, 0x17, 0x73, 0xb1, 0xd0, 0x45,
+		},
+	},
+	{
+		.data_len = 4160,
+		.digest = {
+			0xd1, 0x8e, 0x22, 0xea, 0x44, 0x87, 0x6e, 0x9d,
+			0xfb, 0x36, 0x02, 0x20, 0x63, 0xb7, 0x69, 0x45,
+			0x25, 0x41, 0x69, 0xe0, 0x9b, 0x87, 0xcf, 0xa3,
+			0x51, 0xbb, 0xfc, 0x8d, 0xf7, 0x29, 0xa7, 0xea,
+		},
+	},
+	{
+		.data_len = 4224,
+		.digest = {
+			0x11, 0x86, 0x7d, 0x84, 0xf9, 0x8c, 0x6e, 0xc4,
+			0x64, 0x36, 0xc6, 0xf3, 0x42, 0x92, 0x31, 0x2b,
+			0x1e, 0x12, 0xe6, 0x4d, 0xbe, 0xfa, 0x77, 0x3f,
+			0x89, 0x41, 0x33, 0x58, 0x1c, 0x98, 0x16, 0x0a,
+		},
+	},
+	{
+		.data_len = 16384,
+		.digest = {
+			0xb2, 0xba, 0x0c, 0x8c, 0x9d, 0xbb, 0x1e, 0xb0,
+			0x03, 0xb5, 0xdf, 0x4f, 0xf5, 0x35, 0xdb, 0xec,
+			0x60, 0xf2, 0x5b, 0xb6, 0xd0, 0x49, 0xd3, 0xed,
+			0x55, 0xc0, 0x7a, 0xd7, 0xaf, 0xa1, 0xea, 0x53,
+		},
+	},
+};
+
+static const u8 hash_testvec_consolidated[SHA3_256_DIGEST_SIZE] = {
+	0x3b, 0x33, 0x67, 0xf8, 0xea, 0x92, 0x78, 0x62,
+	0xdd, 0xbe, 0x72, 0x15, 0xbd, 0x6f, 0xfa, 0xe5,
+	0x5e, 0xab, 0x9f, 0xb1, 0xe4, 0x23, 0x7c, 0x2c,
+	0x80, 0xcf, 0x09, 0x75, 0xf8, 0xe2, 0xfa, 0x30,
+};
diff --git a/lib/crypto/tests/sha3_kunit.c b/lib/crypto/tests/sha3_kunit.c
new file mode 100644
index 0000000000000..c267984c4aff1
--- /dev/null
+++ b/lib/crypto/tests/sha3_kunit.c
@@ -0,0 +1,344 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2025 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+#include <crypto/sha3.h>
+#include "sha3-testvecs.h"
+
+#define HASH		sha3_256
+#define HASH_CTX	sha3_ctx
+#define HASH_SIZE	SHA3_256_DIGEST_SIZE
+#define HASH_INIT	sha3_256_init
+#define HASH_UPDATE	sha3_update
+#define HASH_FINAL	sha3_final
+#include "hash-test-template.h"
+
+/*
+ * Sample message and the output generated for various algorithms by passing it
+ * into "openssl sha3-224" etc.
+ */
+static const u8 test_sha3_sample[] =
+	"The quick red fox jumped over the lazy brown dog!\n"
+	"The quick red fox jumped over the lazy brown dog!\n"
+	"The quick red fox jumped over the lazy brown dog!\n"
+	"The quick red fox jumped over the lazy brown dog!\n";
+
+static const u8 test_sha3_224[8 + SHA3_224_DIGEST_SIZE + 8] = {
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* Write-before guard */
+	0xd6, 0xe8, 0xd8, 0x80, 0xfa, 0x42, 0x80, 0x70,
+	0x7e, 0x7f, 0xd7, 0xd2, 0xd7, 0x7a, 0x35, 0x65,
+	0xf0, 0x0b, 0x4f, 0x9f, 0x2a, 0x33, 0xca, 0x0a,
+	0xef, 0xa6, 0x4c, 0xb8,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* Write-after guard */
+};
+
+static const u8 test_sha3_256[8 + SHA3_256_DIGEST_SIZE + 8] = {
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* Write-before guard */
+	0xdb, 0x3b, 0xb0, 0xb8, 0x8d, 0x15, 0x78, 0xe5,
+	0x78, 0x76, 0x8e, 0x39, 0x7e, 0x89, 0x86, 0xb9,
+	0x14, 0x3a, 0x1e, 0xe7, 0x96, 0x7c, 0xf3, 0x25,
+	0x70, 0xbd, 0xc3, 0xa9, 0xae, 0x63, 0x71, 0x1d,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* Write-after guard */
+};
+
+static const u8 test_sha3_384[8 + SHA3_384_DIGEST_SIZE + 8] = {
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* Write-before guard */
+	0x2d, 0x4b, 0x29, 0x85, 0x19, 0x94, 0xaa, 0x31,
+	0x9b, 0x04, 0x9d, 0x6e, 0x79, 0x66, 0xc7, 0x56,
+	0x8a, 0x2e, 0x99, 0x84, 0x06, 0xcf, 0x10, 0x2d,
+	0xec, 0xf0, 0x03, 0x04, 0x1f, 0xd5, 0x99, 0x63,
+	0x2f, 0xc3, 0x2b, 0x0d, 0xd9, 0x45, 0xf7, 0xbb,
+	0x0a, 0xc3, 0x46, 0xab, 0xfe, 0x4d, 0x94, 0xc2,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* Write-after guard */
+};
+
+static const u8 test_sha3_512[8 + SHA3_512_DIGEST_SIZE + 8] = {
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* Write-before guard */
+	0xdd, 0x71, 0x3b, 0x44, 0xb6, 0x6c, 0xd7, 0x78,
+	0xe7, 0x93, 0xa1, 0x4c, 0xd7, 0x24, 0x16, 0xf1,
+	0xfd, 0xa2, 0x82, 0x4e, 0xed, 0x59, 0xe9, 0x83,
+	0x15, 0x38, 0x89, 0x7d, 0x39, 0x17, 0x0c, 0xb2,
+	0xcf, 0x12, 0x80, 0x78, 0xa1, 0x78, 0x41, 0xeb,
+	0xed, 0x21, 0x4c, 0xa4, 0x4a, 0x5f, 0x30, 0x1a,
+	0x70, 0x98, 0x4f, 0x14, 0xa2, 0xd1, 0x64, 0x1b,
+	0xc2, 0x0a, 0xff, 0x3b, 0xe8, 0x26, 0x41, 0x8f,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* Write-after guard */
+};
+
+static const u8 test_shake128[8 + SHAKE128_DEFAULT_SIZE + 8] = {
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* Write-before guard */
+	0x41, 0xd6, 0xb8, 0x9c, 0xf8, 0xe8, 0x54, 0xf2,
+	0x5c, 0xde, 0x51, 0x12, 0xaf, 0x9e, 0x0d, 0x91,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* Write-after guard */
+};
+
+static const u8 test_shake256[8 + SHAKE256_DEFAULT_SIZE + 8] = {
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* Write-before guard */
+	0xab, 0x06, 0xd4, 0xf9, 0x8b, 0xfd, 0xb2, 0xc4,
+	0xfe, 0xf1, 0xcc, 0xe2, 0x40, 0x45, 0xdd, 0x15,
+	0xcb, 0xdd, 0x02, 0x8d, 0xb7, 0x9f, 0x1e, 0x67,
+	0xd6, 0x7f, 0x98, 0x5e, 0x1b, 0x19, 0xf8, 0x01,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* Write-after guard */
+};
+
+static void test_sha3_224_basic(struct kunit *test)
+{
+	u8 out[8 + SHA3_224_DIGEST_SIZE + 8];
+
+	BUILD_BUG_ON(sizeof(out) != sizeof(test_sha3_224));
+
+	memset(out, 0, sizeof(out));
+	sha3_224(test_sha3_sample, sizeof(test_sha3_sample) - 1, out + 8);
+
+	KUNIT_ASSERT_MEMEQ_MSG(test, out, test_sha3_224, sizeof(test_sha3_224),
+			       "SHA3-224 gives wrong output");
+}
+
+static void test_sha3_256_basic(struct kunit *test)
+{
+	u8 out[8 + SHA3_256_DIGEST_SIZE + 8];
+
+	BUILD_BUG_ON(sizeof(out) != sizeof(test_sha3_256));
+
+	memset(out, 0, sizeof(out));
+	sha3_256(test_sha3_sample, sizeof(test_sha3_sample) - 1, out + 8);
+
+	KUNIT_ASSERT_MEMEQ_MSG(test, out, test_sha3_256, sizeof(test_sha3_256),
+			       "SHA3-256 gives wrong output");
+}
+
+static void test_sha3_384_basic(struct kunit *test)
+{
+	u8 out[8 + SHA3_384_DIGEST_SIZE + 8];
+
+	BUILD_BUG_ON(sizeof(out) != sizeof(test_sha3_384));
+
+	memset(out, 0, sizeof(out));
+	sha3_384(test_sha3_sample, sizeof(test_sha3_sample) - 1, out + 8);
+
+	KUNIT_ASSERT_MEMEQ_MSG(test, out, test_sha3_384, sizeof(test_sha3_384),
+			       "SHA3-384 gives wrong output");
+}
+
+static void test_sha3_512_basic(struct kunit *test)
+{
+	u8 out[8 + SHA3_512_DIGEST_SIZE + 8];
+
+	BUILD_BUG_ON(sizeof(out) != sizeof(test_sha3_512));
+
+	memset(out, 0, sizeof(out));
+	sha3_512(test_sha3_sample, sizeof(test_sha3_sample) - 1, out + 8);
+
+	KUNIT_ASSERT_MEMEQ_MSG(test, out, test_sha3_512, sizeof(test_sha3_512),
+			       "SHA3-512 gives wrong output");
+}
+
+static void test_shake128_basic(struct kunit *test)
+{
+	u8 out[8 + SHAKE128_DEFAULT_SIZE + 8];
+
+	BUILD_BUG_ON(sizeof(out) != sizeof(test_shake128));
+
+	memset(out, 0, sizeof(out));
+	shake128(test_sha3_sample, sizeof(test_sha3_sample) - 1,
+		 out + 8, sizeof(out) - 16);
+
+	KUNIT_ASSERT_MEMEQ_MSG(test, out, test_shake128, sizeof(test_shake128),
+			       "SHAKE128 gives wrong output");
+}
+
+static void test_shake256_basic(struct kunit *test)
+{
+	u8 out[8 + SHAKE256_DEFAULT_SIZE + 8];
+
+	BUILD_BUG_ON(sizeof(out) != sizeof(test_shake256));
+
+	memset(out, 0, sizeof(out));
+	shake256(test_sha3_sample, sizeof(test_sha3_sample) - 1,
+		 out + 8, sizeof(out) - 16);
+
+	KUNIT_ASSERT_MEMEQ_MSG(test, out, test_shake256, sizeof(test_shake256),
+			       "SHAKE256 gives wrong output");
+}
+
+/*
+ * Usable NIST tests.
+ *
+ * From: https://csrc.nist.gov/projects/cryptographic-standards-and-guidelines/example-values
+ */
+static const u8 test_nist_1600_sample[] = {
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3,
+	0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3, 0xa3
+};
+
+static const u8 test_shake128_nist_0[] = {
+	0x7f, 0x9c, 0x2b, 0xa4, 0xe8, 0x8f, 0x82, 0x7d,
+	0x61, 0x60, 0x45, 0x50, 0x76, 0x05, 0x85, 0x3e
+};
+
+static const u8 test_shake128_nist_1600[] = {
+	0x13, 0x1a, 0xb8, 0xd2, 0xb5, 0x94, 0x94, 0x6b,
+	0x9c, 0x81, 0x33, 0x3f, 0x9b, 0xb6, 0xe0, 0xce,
+};
+
+static const u8 test_shake256_nist_0[] = {
+	0x46, 0xb9, 0xdd, 0x2b, 0x0b, 0xa8, 0x8d, 0x13,
+	0x23, 0x3b, 0x3f, 0xeb, 0x74, 0x3e, 0xeb, 0x24,
+	0x3f, 0xcd, 0x52, 0xea, 0x62, 0xb8, 0x1b, 0x82,
+	0xb5, 0x0c, 0x27, 0x64, 0x6e, 0xd5, 0x76, 0x2f
+};
+
+static const u8 test_shake256_nist_1600[] = {
+	0xcd, 0x8a, 0x92, 0x0e, 0xd1, 0x41, 0xaa, 0x04,
+	0x07, 0xa2, 0x2d, 0x59, 0x28, 0x86, 0x52, 0xe9,
+	0xd9, 0xf1, 0xa7, 0xee, 0x0c, 0x1e, 0x7c, 0x1c,
+	0xa6, 0x99, 0x42, 0x4d, 0xa8, 0x4a, 0x90, 0x4d,
+};
+
+static void test_shake128_nist(struct kunit *test)
+{
+	u8 out[SHAKE128_DEFAULT_SIZE];
+
+	shake128("", 0, out, sizeof(out));
+	KUNIT_ASSERT_MEMEQ_MSG(test, out, test_shake128_nist_0, sizeof(out),
+			       "SHAKE128 gives wrong output for NIST.0");
+
+	shake128(test_nist_1600_sample, sizeof(test_nist_1600_sample),
+		 out, sizeof(out));
+	KUNIT_ASSERT_MEMEQ_MSG(test, out, test_shake128_nist_1600, sizeof(out),
+			       "SHAKE128 gives wrong output for NIST.1600");
+}
+
+static void test_shake256_nist(struct kunit *test)
+{
+	u8 out[SHAKE256_DEFAULT_SIZE];
+
+	shake256("", 0, out, sizeof(out));
+	KUNIT_ASSERT_MEMEQ_MSG(test, out, test_shake256_nist_0, sizeof(out),
+			       "SHAKE256 gives wrong output for NIST.0");
+
+	shake256(test_nist_1600_sample, sizeof(test_nist_1600_sample),
+		 out, sizeof(out));
+	KUNIT_ASSERT_MEMEQ_MSG(test, out, test_shake256_nist_1600, sizeof(out),
+			       "SHAKE256 gives wrong output for NIST.1600");
+}
+
+/*
+ * Output tiling test of SHAKE256; equal output tiles barring the last.  A
+ * series of squeezings of the same context should, if laid end-to-end, match a
+ * single squeezing of the combined size.
+ */
+static void test_shake256_tiling(struct kunit *test)
+{
+	struct shake_ctx ctx;
+	u8 out[8 + SHA3_512_DIGEST_SIZE + 8];
+
+	for (int tile_size = 1; tile_size < SHAKE256_DEFAULT_SIZE; tile_size++) {
+		int left = SHAKE256_DEFAULT_SIZE;
+		u8 *p = out + 8;
+
+		memset(out, 0, sizeof(out));
+		shake256_init(&ctx);
+		shake_update(&ctx, test_sha3_sample,
+			     sizeof(test_sha3_sample) - 1);
+		while (left > 0) {
+			int part = umin(tile_size, left);
+
+			shake_squeeze(&ctx, p, part);
+			p += part;
+			left -= part;
+		}
+
+		KUNIT_ASSERT_MEMEQ_MSG(test, out, test_shake256, sizeof(test_shake256),
+				       "SHAKE tile %u gives wrong output", tile_size);
+	}
+}
+
+/*
+ * Output tiling test of SHAKE256; output tiles getting gradually smaller and
+ * then cycling round to medium sized ones.  A series of squeezings of the same
+ * context should, if laid end-to-end, match a single squeezing of the combined
+ * size.
+ */
+static void test_shake256_tiling2(struct kunit *test)
+{
+	struct shake_ctx ctx;
+	u8 out[8 + SHA3_512_DIGEST_SIZE + 8];
+
+	for (int first_tile_size = 3;
+	     first_tile_size < SHAKE256_DEFAULT_SIZE;
+	     first_tile_size++) {
+		int tile_size = first_tile_size;
+		int left = SHAKE256_DEFAULT_SIZE;
+		u8 *p = out + 8;
+
+		memset(out, 0, sizeof(out));
+		shake256_init(&ctx);
+		shake_update(&ctx, test_sha3_sample,
+			     sizeof(test_sha3_sample) - 1);
+		while (left > 0) {
+			int part = umin(tile_size, left);
+
+			shake_squeeze(&ctx, p, part);
+			p += part;
+			left -= part;
+			tile_size--;
+			if (tile_size < 1)
+				tile_size = 5;
+		}
+
+		KUNIT_ASSERT_MEMEQ_MSG(test, out, test_shake256, sizeof(test_shake256),
+				       "SHAKE tile %u gives wrong output", tile_size);
+	}
+}
+
+static struct kunit_case sha3_test_cases[] = {
+	HASH_KUNIT_CASES,
+	KUNIT_CASE(test_sha3_224_basic),
+	KUNIT_CASE(test_sha3_256_basic),
+	KUNIT_CASE(test_sha3_384_basic),
+	KUNIT_CASE(test_sha3_512_basic),
+	KUNIT_CASE(test_shake128_basic),
+	KUNIT_CASE(test_shake256_basic),
+	KUNIT_CASE(test_shake128_nist),
+	KUNIT_CASE(test_shake256_nist),
+	KUNIT_CASE(test_shake256_tiling),
+	KUNIT_CASE(test_shake256_tiling2),
+	KUNIT_CASE(benchmark_hash),
+	{},
+};
+
+static struct kunit_suite sha3_test_suite = {
+	.name = "sha3",
+	.test_cases = sha3_test_cases,
+	.suite_init = hash_suite_init,
+	.suite_exit = hash_suite_exit,
+};
+kunit_test_suite(sha3_test_suite);
+
+MODULE_DESCRIPTION("KUnit tests and benchmark for SHA3");
+MODULE_LICENSE("GPL");
diff --git a/scripts/crypto/gen-hash-testvecs.py b/scripts/crypto/gen-hash-testvecs.py
index c5b7985fe7280..47f79602e2903 100755
--- a/scripts/crypto/gen-hash-testvecs.py
+++ b/scripts/crypto/gen-hash-testvecs.py
@@ -85,11 +85,11 @@ def print_c_struct_u8_array_field(name, value):
     print('\t\t},')
 
 def alg_digest_size_const(alg):
     if alg.startswith('blake2'):
         return f'{alg.upper()}_HASH_SIZE'
-    return f'{alg.upper()}_DIGEST_SIZE'
+    return f"{alg.upper().replace('-', '_')}_DIGEST_SIZE"
 
 def gen_unkeyed_testvecs(alg):
     print('')
     print('static const struct {')
     print('\tsize_t data_len;')
@@ -165,7 +165,9 @@ print(f'/* This file was generated by: {sys.argv[0]} {" ".join(sys.argv[1:])} */
 gen_unkeyed_testvecs(alg)
 if alg.startswith('blake2'):
     gen_additional_blake2_testvecs(alg)
 elif alg == 'poly1305':
     gen_additional_poly1305_testvecs()
+elif alg.startswith('sha3-'):
+    pass # no HMAC
 else:
     gen_hmac_testvecs(alg)
-- 
2.51.1.dirty




* [PATCH v2 06/15] lib/crypto: tests: Add additional SHAKE tests
  2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
                   ` (4 preceding siblings ...)
  2025-10-26  5:50 ` [PATCH v2 05/15] lib/crypto: tests: Add SHA3 kunit tests Eric Biggers
@ 2025-10-26  5:50 ` Eric Biggers
  2025-10-26  5:50 ` [PATCH v2 07/15] lib/crypto: sha3: Add FIPS cryptographic algorithm self-test Eric Biggers
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Eric Biggers @ 2025-10-26  5:50 UTC (permalink / raw)
  To: linux-crypto
  Cc: David Howells, Ard Biesheuvel, Jason A . Donenfeld, Eric Biggers,
	Holger Dengler, Harald Freudenberger, Herbert Xu,
	linux-arm-kernel, linux-s390, linux-kernel

Add the following test cases to cover gaps in the SHAKE testing:

    - test_shake_all_lens_up_to_4096()
    - test_shake_multiple_squeezes()
    - test_shake_with_guarded_bufs()
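
In test_shake_all_lens_up_to_4096(), the output length is derived from
the input length as (in_len * 293) % 4097.  The standalone check below
(plain Python, not part of this patch) sketches why that pairing should
visit every output length 0..4096 exactly once: 293 and 4097 share no
common factor, so the mapping is a permutation.  The names MAX_LEN and
out_len_for() are made up for this illustration.

```python
from math import gcd

MAX_LEN = 4096  # same bound as max_len in the test


def out_len_for(in_len: int) -> int:
    """Output length paired with a given input length, as in the test."""
    return (in_len * 293) % (MAX_LEN + 1)


# 4097 = 17 * 241 and 293 is prime, so the two share no factor.
assert gcd(293, MAX_LEN + 1) == 1

out_lens = [out_len_for(n) for n in range(MAX_LEN + 1)]

# Every length 0..4096 appears exactly once among the 4097 outputs.
assert sorted(out_lens) == list(range(MAX_LEN + 1))
print(len(out_lens), "input/output length pairs per algorithm")
```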

Remove test_shake256_tiling() and test_shake256_tiling2() since they are
superseded by test_shake_multiple_squeezes().  It provides better test
coverage by using randomized testing.  E.g., it's able to generate a
zero-length squeeze followed by a nonzero-length squeeze, which the
first 7 versions of the SHA-3 patchset handled incorrectly.
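
The property checked by test_shake_multiple_squeezes() can be modelled
outside the kernel.  The sketch below is an illustration only: Python's
hashlib has no incremental squeeze call, so the hypothetical
ShakeSqueezer class emulates one by recomputing a longer one-shot
digest and returning just the new bytes.  It is not the kernel's
shake_squeeze().

```python
import hashlib


class ShakeSqueezer:
    """Emulates incremental squeezing on top of hashlib's one-shot XOF."""

    def __init__(self, alg: str, data: bytes):
        self._alg = alg    # 'shake_128' or 'shake_256'
        self._data = data
        self._pos = 0      # number of output bytes already returned

    def squeeze(self, n: int) -> bytes:
        if n == 0:
            # A zero-length squeeze must not disturb later output.
            return b""
        full = hashlib.new(self._alg, self._data).digest(self._pos + n)
        part = full[self._pos:]
        self._pos += n
        return part


data = b"example input"
one_shot = hashlib.shake_256(data).digest(100)

sq = ShakeSqueezer("shake_256", data)
# A zero-length squeeze first, then uneven parts, laid end to end.
parts = [sq.squeeze(n) for n in (0, 1, 31, 0, 68)]
assert b"".join(parts) == one_shot
```

The sequence deliberately starts with a zero-length squeeze followed by
a nonzero one, which is the case called out above.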

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 lib/crypto/tests/sha3-testvecs.h    |  20 ++-
 lib/crypto/tests/sha3_kunit.c       | 186 ++++++++++++++++++++--------
 scripts/crypto/gen-hash-testvecs.py |  27 +++-
 3 files changed, 174 insertions(+), 59 deletions(-)

diff --git a/lib/crypto/tests/sha3-testvecs.h b/lib/crypto/tests/sha3-testvecs.h
index 9c4c403cc6e06..8d614a5fa0c37 100644
--- a/lib/crypto/tests/sha3-testvecs.h
+++ b/lib/crypto/tests/sha3-testvecs.h
@@ -1,7 +1,9 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
-/* This file was generated by: ./scripts/crypto/gen-hash-testvecs.py sha3-256 */
+/* This file was generated by: ./scripts/crypto/gen-hash-testvecs.py sha3 */
+
+/* SHA3-256 test vectors */
 
 static const struct {
 	size_t data_len;
 	u8 digest[SHA3_256_DIGEST_SIZE];
 } hash_testvecs[] = {
@@ -227,5 +229,21 @@ static const u8 hash_testvec_consolidated[SHA3_256_DIGEST_SIZE] = {
 	0x3b, 0x33, 0x67, 0xf8, 0xea, 0x92, 0x78, 0x62,
 	0xdd, 0xbe, 0x72, 0x15, 0xbd, 0x6f, 0xfa, 0xe5,
 	0x5e, 0xab, 0x9f, 0xb1, 0xe4, 0x23, 0x7c, 0x2c,
 	0x80, 0xcf, 0x09, 0x75, 0xf8, 0xe2, 0xfa, 0x30,
 };
+
+/* SHAKE test vectors */
+
+static const u8 shake128_testvec_consolidated[SHA3_256_DIGEST_SIZE] = {
+	0x89, 0x88, 0x3a, 0x44, 0xec, 0xfe, 0x3c, 0xeb,
+	0x2f, 0x1c, 0x1d, 0xda, 0x9e, 0x36, 0x64, 0xf0,
+	0x85, 0x4c, 0x49, 0x12, 0x76, 0x5a, 0x4d, 0xe7,
+	0xa8, 0xfd, 0xcd, 0xbe, 0x45, 0xb4, 0x6f, 0xb0,
+};
+
+static const u8 shake256_testvec_consolidated[SHA3_256_DIGEST_SIZE] = {
+	0x5a, 0xfd, 0x66, 0x62, 0x5c, 0x37, 0x2b, 0x41,
+	0x77, 0x1c, 0x01, 0x5d, 0x64, 0x7c, 0x63, 0x7a,
+	0x7c, 0x76, 0x9e, 0xa8, 0xd1, 0xb0, 0x8e, 0x02,
+	0x16, 0x9b, 0xfe, 0x0e, 0xb5, 0xd8, 0x6a, 0xb5,
+};
diff --git a/lib/crypto/tests/sha3_kunit.c b/lib/crypto/tests/sha3_kunit.c
index c267984c4aff1..ed5fbe80337fe 100644
--- a/lib/crypto/tests/sha3_kunit.c
+++ b/lib/crypto/tests/sha3_kunit.c
@@ -245,76 +245,153 @@ static void test_shake256_nist(struct kunit *test)
 		 out, sizeof(out));
 	KUNIT_ASSERT_MEMEQ_MSG(test, out, test_shake256_nist_1600, sizeof(out),
 			       "SHAKE256 gives wrong output for NIST.1600");
 }
 
-/*
- * Output tiling test of SHAKE256; equal output tiles barring the last.  A
- * series of squeezings of the same context should, if laid end-to-end, match a
- * single squeezing of the combined size.
- */
-static void test_shake256_tiling(struct kunit *test)
+static void shake(int alg, const u8 *in, size_t in_len, u8 *out, size_t out_len)
 {
-	struct shake_ctx ctx;
-	u8 out[8 + SHA3_512_DIGEST_SIZE + 8];
+	if (alg == 0)
+		shake128(in, in_len, out, out_len);
+	else
+		shake256(in, in_len, out, out_len);
+}
 
-	for (int tile_size = 1; tile_size < SHAKE256_DEFAULT_SIZE; tile_size++) {
-		int left = SHAKE256_DEFAULT_SIZE;
-		u8 *p = out + 8;
+static void shake_init(struct shake_ctx *ctx, int alg)
+{
+	if (alg == 0)
+		shake128_init(ctx);
+	else
+		shake256_init(ctx);
+}
 
-		memset(out, 0, sizeof(out));
-		shake256_init(&ctx);
-		shake_update(&ctx, test_sha3_sample,
-			     sizeof(test_sha3_sample) - 1);
-		while (left > 0) {
-			int part = umin(tile_size, left);
+/*
+ * Test each of SHAKE128 and SHAKE256 with all input lengths 0 through 4096, for
+ * both input and output.  The input and output lengths cycle through the values
+ * together, so we do 4097 tests total.  To verify all the SHAKE outputs,
+ * compute and verify the SHA3-256 digest of all of them concatenated together.
+ */
+static void test_shake_all_lens_up_to_4096(struct kunit *test)
+{
+	struct sha3_ctx main_ctx;
+	const size_t max_len = 4096;
+	u8 *const in = test_buf;
+	u8 *const out = &test_buf[TEST_BUF_LEN - max_len];
+	u8 main_hash[SHA3_256_DIGEST_SIZE];
+
+	KUNIT_ASSERT_LE(test, 2 * max_len, TEST_BUF_LEN);
+
+	rand_bytes_seeded_from_len(in, max_len);
+	for (int alg = 0; alg < 2; alg++) {
+		sha3_256_init(&main_ctx);
+		for (size_t in_len = 0; in_len <= max_len; in_len++) {
+			size_t out_len = (in_len * 293) % (max_len + 1);
+
+			shake(alg, in, in_len, out, out_len);
+			sha3_update(&main_ctx, out, out_len);
+		}
+		sha3_final(&main_ctx, main_hash);
+		if (alg == 0)
+			KUNIT_ASSERT_MEMEQ_MSG(test, main_hash,
+					       shake128_testvec_consolidated,
+					       sizeof(main_hash),
+					       "shake128() gives wrong output");
+		else
+			KUNIT_ASSERT_MEMEQ_MSG(test, main_hash,
+					       shake256_testvec_consolidated,
+					       sizeof(main_hash),
+					       "shake256() gives wrong output");
+	}
+}
 
-			shake_squeeze(&ctx, p, part);
-			p += part;
-			left -= part;
+/*
+ * Test that a sequence of SHAKE squeezes gives the same output as a single
+ * squeeze of the same total length.
+ */
+static void test_shake_multiple_squeezes(struct kunit *test)
+{
+	const size_t max_len = 512;
+	u8 *ref_out;
+
+	KUNIT_ASSERT_GE(test, TEST_BUF_LEN, 2 * max_len);
+
+	ref_out = kunit_kzalloc(test, max_len, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, ref_out);
+
+	for (int i = 0; i < 2000; i++) {
+		const int alg = rand32() % 2;
+		const size_t in_len = rand_length(max_len);
+		const size_t out_len = rand_length(max_len);
+		const size_t in_offs = rand_offset(max_len - in_len);
+		const size_t out_offs = rand_offset(max_len - out_len);
+		u8 *const in = &test_buf[in_offs];
+		u8 *const out = &test_buf[out_offs];
+		struct shake_ctx ctx;
+		size_t remaining_len, j, num_parts;
+
+		rand_bytes(in, in_len);
+		rand_bytes(out, out_len);
+
+		/* Compute the output using the one-shot function. */
+		shake(alg, in, in_len, ref_out, out_len);
+
+		/* Compute the output using a random sequence of squeezes. */
+		shake_init(&ctx, alg);
+		shake_update(&ctx, in, in_len);
+		remaining_len = out_len;
+		j = 0;
+		num_parts = 0;
+		while (rand_bool()) {
+			size_t part_len = rand_length(remaining_len);
+
+			shake_squeeze(&ctx, &out[j], part_len);
+			num_parts++;
+			j += part_len;
+			remaining_len -= part_len;
+		}
+		if (remaining_len != 0 || rand_bool()) {
+			shake_squeeze(&ctx, &out[j], remaining_len);
+			num_parts++;
 		}
 
-		KUNIT_ASSERT_MEMEQ_MSG(test, out, test_shake256, sizeof(test_shake256),
-				       "SHAKE tile %u gives wrong output", tile_size);
+		/* Verify that the outputs are the same. */
+		KUNIT_ASSERT_MEMEQ_MSG(
+			test, out, ref_out, out_len,
+			"Multi-squeeze test failed with in_len=%zu in_offs=%zu out_len=%zu out_offs=%zu num_parts=%zu alg=%d",
+			in_len, in_offs, out_len, out_offs, num_parts, alg);
 	}
 }
 
 /*
- * Output tiling test of SHAKE256; output tiles getting gradually smaller and
- * then cycling round to medium sized ones.  A series of squeezings of the same
- * context should, if laid end-to-end, match a single squeezing of the combined
- * size.
+ * Test that SHAKE operations on buffers immediately followed by an unmapped
+ * page work as expected.  This catches out-of-bounds memory accesses even if
+ * they occur in assembly code.
  */
-static void test_shake256_tiling2(struct kunit *test)
+static void test_shake_with_guarded_bufs(struct kunit *test)
 {
-	struct shake_ctx ctx;
-	u8 out[8 + SHA3_512_DIGEST_SIZE + 8];
+	const size_t max_len = 512;
+	u8 *reg_buf;
 
-	for (int first_tile_size = 3;
-	     first_tile_size < SHAKE256_DEFAULT_SIZE;
-	     first_tile_size++) {
-		int tile_size = first_tile_size;
-		int left = SHAKE256_DEFAULT_SIZE;
-		u8 *p = out + 8;
-
-		memset(out, 0, sizeof(out));
-		shake256_init(&ctx);
-		shake_update(&ctx, test_sha3_sample,
-			     sizeof(test_sha3_sample) - 1);
-		while (left > 0) {
-			int part = umin(tile_size, left);
-
-			shake_squeeze(&ctx, p, part);
-			p += part;
-			left -= part;
-			tile_size--;
-			if (tile_size < 1)
-				tile_size = 5;
-		}
+	KUNIT_ASSERT_GE(test, TEST_BUF_LEN, max_len);
 
-		KUNIT_ASSERT_MEMEQ_MSG(test, out, test_shake256, sizeof(test_shake256),
-				       "SHAKE tile %u gives wrong output", tile_size);
+	reg_buf = kunit_kzalloc(test, max_len, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, reg_buf);
+
+	for (int alg = 0; alg < 2; alg++) {
+		for (size_t len = 0; len <= max_len; len++) {
+			u8 *guarded_buf = &test_buf[TEST_BUF_LEN - len];
+
+			rand_bytes(reg_buf, len);
+			memcpy(guarded_buf, reg_buf, len);
+
+			shake(alg, reg_buf, len, reg_buf, len);
+			shake(alg, guarded_buf, len, guarded_buf, len);
+
+			KUNIT_ASSERT_MEMEQ_MSG(
+				test, reg_buf, guarded_buf, len,
+				"Guard page test failed with len=%zu alg=%d",
+				len, alg);
+		}
 	}
 }
 
 static struct kunit_case sha3_test_cases[] = {
 	HASH_KUNIT_CASES,
@@ -324,12 +401,13 @@ static struct kunit_case sha3_test_cases[] = {
 	KUNIT_CASE(test_sha3_512_basic),
 	KUNIT_CASE(test_shake128_basic),
 	KUNIT_CASE(test_shake256_basic),
 	KUNIT_CASE(test_shake128_nist),
 	KUNIT_CASE(test_shake256_nist),
-	KUNIT_CASE(test_shake256_tiling),
-	KUNIT_CASE(test_shake256_tiling2),
+	KUNIT_CASE(test_shake_all_lens_up_to_4096),
+	KUNIT_CASE(test_shake_multiple_squeezes),
+	KUNIT_CASE(test_shake_with_guarded_bufs),
 	KUNIT_CASE(benchmark_hash),
 	{},
 };
 
 static struct kunit_suite sha3_test_suite = {
diff --git a/scripts/crypto/gen-hash-testvecs.py b/scripts/crypto/gen-hash-testvecs.py
index 47f79602e2903..ae2682882cd18 100755
--- a/scripts/crypto/gen-hash-testvecs.py
+++ b/scripts/crypto/gen-hash-testvecs.py
@@ -109,10 +109,22 @@ def gen_unkeyed_testvecs(alg):
         hash_update(ctx, compute_hash(alg, data[:data_len]))
     print_static_u8_array_definition(
             f'hash_testvec_consolidated[{alg_digest_size_const(alg)}]',
             hash_final(ctx))
 
+def gen_additional_sha3_testvecs():
+    max_len = 4096
+    in_data = rand_bytes(max_len)
+    for alg in ['shake128', 'shake256']:
+        ctx = hashlib.new('sha3-256')
+        for in_len in range(max_len + 1):
+            out_len = (in_len * 293) % (max_len + 1)
+            out = hashlib.new(alg, data=in_data[:in_len]).digest(out_len)
+            ctx.update(out)
+        print_static_u8_array_definition(f'{alg}_testvec_consolidated[SHA3_256_DIGEST_SIZE]',
+                                         ctx.digest())
+
 def gen_hmac_testvecs(alg):
     ctx = hmac.new(rand_bytes(32), digestmod=alg)
     data = rand_bytes(4096)
     for data_len in range(len(data) + 1):
         ctx.update(data[:data_len])
@@ -153,21 +165,28 @@ def gen_additional_poly1305_testvecs():
             'poly1305_allones_macofmacs[POLY1305_DIGEST_SIZE]',
             Poly1305(key).update(data).digest())
 
 if len(sys.argv) != 2:
     sys.stderr.write('Usage: gen-hash-testvecs.py ALGORITHM\n')
-    sys.stderr.write('ALGORITHM may be any supported by Python hashlib, or poly1305.\n')
+    sys.stderr.write('ALGORITHM may be any supported by Python hashlib, or poly1305 or sha3.\n')
     sys.stderr.write('Example: gen-hash-testvecs.py sha512\n')
     sys.exit(1)
 
 alg = sys.argv[1]
 print('/* SPDX-License-Identifier: GPL-2.0-or-later */')
 print(f'/* This file was generated by: {sys.argv[0]} {" ".join(sys.argv[1:])} */')
-gen_unkeyed_testvecs(alg)
 if alg.startswith('blake2'):
+    gen_unkeyed_testvecs(alg)
     gen_additional_blake2_testvecs(alg)
 elif alg == 'poly1305':
+    gen_unkeyed_testvecs(alg)
     gen_additional_poly1305_testvecs()
-elif alg.startswith('sha3-'):
-    pass # no HMAC
+elif alg == 'sha3':
+    print()
+    print('/* SHA3-256 test vectors */')
+    gen_unkeyed_testvecs('sha3-256')
+    print()
+    print('/* SHAKE test vectors */')
+    gen_additional_sha3_testvecs()
 else:
+    gen_unkeyed_testvecs(alg)
     gen_hmac_testvecs(alg)
-- 
2.51.1.dirty




* [PATCH v2 07/15] lib/crypto: sha3: Add FIPS cryptographic algorithm self-test
  2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
                   ` (5 preceding siblings ...)
  2025-10-26  5:50 ` [PATCH v2 06/15] lib/crypto: tests: Add additional SHAKE tests Eric Biggers
@ 2025-10-26  5:50 ` Eric Biggers
  2025-10-26  5:50 ` [PATCH v2 08/15] crypto: arm64/sha3 - Update sha3_ce_transform() to prepare for library Eric Biggers
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Eric Biggers @ 2025-10-26  5:50 UTC (permalink / raw)
  To: linux-crypto
  Cc: David Howells, Ard Biesheuvel, Jason A . Donenfeld, Eric Biggers,
	Holger Dengler, Harald Freudenberger, Herbert Xu,
	linux-arm-kernel, linux-s390, linux-kernel

Since the SHA-3 algorithms are FIPS-approved, add the boot-time
self-test which is apparently required.  This closely follows the
corresponding SHA-1, SHA-256, and SHA-512 tests.
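
The self-test is a known-answer test: hash a fixed message and compare
the result with a precomputed digest.  A minimal sketch of that pattern
in Python follows.  The helper sha3_256_kat() is hypothetical, and the
empty-message digest is used only to exercise it; the value the kernel
actually compares against is the one in lib/crypto/fips.h.

```python
import hashlib

# Same test message as fips_test_data in scripts/crypto/gen-fips-testvecs.py.
FIPS_TEST_DATA = b"fips test data\0\0"


def sha3_256_kat(data: bytes, expected: bytes) -> bool:
    """Return True if SHA3-256(data) matches the precomputed digest."""
    return hashlib.sha3_256(data).digest() == expected


# Well-known SHA3-256 digest of the empty message.
EMPTY_DIGEST = bytes.fromhex(
    "a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a"
)
assert sha3_256_kat(b"", EMPTY_DIGEST)
assert not sha3_256_kat(FIPS_TEST_DATA, EMPTY_DIGEST)

# The digest that a generator script would embed as the expected value.
print(hashlib.sha3_256(FIPS_TEST_DATA).hexdigest())
```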

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 Documentation/crypto/sha3.rst       |  5 +++++
 lib/crypto/fips.h                   |  7 +++++++
 lib/crypto/sha3.c                   | 17 ++++++++++++++++-
 scripts/crypto/gen-fips-testvecs.py |  4 ++++
 4 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/Documentation/crypto/sha3.rst b/Documentation/crypto/sha3.rst
index f8c484feaa291..37640f295118b 100644
--- a/Documentation/crypto/sha3.rst
+++ b/Documentation/crypto/sha3.rst
@@ -110,10 +110,15 @@ Once all the desired output has been extracted, zeroize the context::
 Testing
 =======
 
 To test the SHA-3 code, use sha3_kunit (CONFIG_CRYPTO_LIB_SHA3_KUNIT_TEST).
 
+Since the SHA-3 algorithms are FIPS-approved, when the kernel is booted in FIPS
+mode the SHA-3 library also performs a simple self-test.  This is purely to meet
+a FIPS requirement.  Normal testing done by kernel developers and integrators
+should use the much more comprehensive KUnit test suite instead.
+
 
 References
 ==========
 
 .. [1] https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf
diff --git a/lib/crypto/fips.h b/lib/crypto/fips.h
index 78a1bdd33a151..023410c2e0dbc 100644
--- a/lib/crypto/fips.h
+++ b/lib/crypto/fips.h
@@ -34,5 +34,12 @@ static const u8 fips_test_hmac_sha512_value[] __initconst __maybe_unused = {
 	0x92, 0x7e, 0x3c, 0xbd, 0xb1, 0x3c, 0x49, 0x98,
 	0x44, 0x9c, 0x8f, 0xee, 0x3f, 0x02, 0x71, 0x51,
 	0x57, 0x0b, 0x15, 0x38, 0x95, 0xd8, 0xa3, 0x81,
 	0xba, 0xb3, 0x15, 0x37, 0x5c, 0x6d, 0x57, 0x2b,
 };
+
+static const u8 fips_test_sha3_256_value[] __initconst __maybe_unused = {
+	0x77, 0xc4, 0x8b, 0x69, 0x70, 0x5f, 0x0a, 0xb1,
+	0xb1, 0xa5, 0x82, 0x0a, 0x22, 0x2b, 0x49, 0x31,
+	0xba, 0x9b, 0xb6, 0xaa, 0x32, 0xa7, 0x97, 0x00,
+	0x98, 0xdb, 0xff, 0xe7, 0xc6, 0xde, 0xb5, 0x82,
+};
diff --git a/lib/crypto/sha3.c b/lib/crypto/sha3.c
index ee7a2ca92b2c5..6c94b4ebd0fd1 100644
--- a/lib/crypto/sha3.c
+++ b/lib/crypto/sha3.c
@@ -15,10 +15,11 @@
 #include <crypto/utils.h>
 #include <linux/export.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/unaligned.h>
+#include "fips.h"
 
 /*
  * On some 32-bit architectures, such as h8300, GCC ends up using over 1 KB of
  * stack if the round calculation gets inlined into the loop in
  * sha3_keccakf_generic().  On the other hand, on 64-bit architectures with
@@ -339,14 +340,28 @@ void shake256(const u8 *in, size_t in_len, u8 *out, size_t out_len)
 	shake_squeeze(&ctx, out, out_len);
 	shake_zeroize_ctx(&ctx);
 }
 EXPORT_SYMBOL_GPL(shake256);
 
-#ifdef sha3_mod_init_arch
+#if defined(sha3_mod_init_arch) || defined(CONFIG_CRYPTO_FIPS)
 static int __init sha3_mod_init(void)
 {
+#ifdef sha3_mod_init_arch
 	sha3_mod_init_arch();
+#endif
+	if (fips_enabled) {
+		/*
+		 * FIPS cryptographic algorithm self-test.  As per the FIPS
+		 * Implementation Guidance, testing any SHA-3 algorithm
+		 * satisfies the test requirement for all of them.
+		 */
+		u8 hash[SHA3_256_DIGEST_SIZE];
+
+		sha3_256(fips_test_data, sizeof(fips_test_data), hash);
+		if (memcmp(fips_test_sha3_256_value, hash, sizeof(hash)) != 0)
+			panic("sha3: FIPS self-test failed\n");
+	}
 	return 0;
 }
 subsys_initcall(sha3_mod_init);
 
 static void __exit sha3_mod_exit(void)
diff --git a/scripts/crypto/gen-fips-testvecs.py b/scripts/crypto/gen-fips-testvecs.py
index 2956f88b764ae..db873f88619ac 100755
--- a/scripts/crypto/gen-fips-testvecs.py
+++ b/scripts/crypto/gen-fips-testvecs.py
@@ -3,10 +3,11 @@
 #
 # Script that generates lib/crypto/fips.h
 #
 # Copyright 2025 Google LLC
 
+import hashlib
 import hmac
 
 fips_test_data = b"fips test data\0\0"
 fips_test_key = b"fips test key\0\0\0"
 
@@ -28,5 +29,8 @@ print_static_u8_array_definition("fips_test_key", fips_test_key)
 
 for alg in 'sha1', 'sha256', 'sha512':
     ctx = hmac.new(fips_test_key, digestmod=alg)
     ctx.update(fips_test_data)
     print_static_u8_array_definition(f'fips_test_hmac_{alg}_value', ctx.digest())
+
+print_static_u8_array_definition(f'fips_test_sha3_256_value',
+                                 hashlib.sha3_256(fips_test_data).digest())
-- 
2.51.1.dirty




* [PATCH v2 08/15] crypto: arm64/sha3 - Update sha3_ce_transform() to prepare for library
  2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
                   ` (6 preceding siblings ...)
  2025-10-26  5:50 ` [PATCH v2 07/15] lib/crypto: sha3: Add FIPS cryptographic algorithm self-test Eric Biggers
@ 2025-10-26  5:50 ` Eric Biggers
  2025-10-26  5:50 ` [PATCH v2 09/15] lib/crypto: arm64/sha3: Migrate optimized code into library Eric Biggers
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Eric Biggers @ 2025-10-26  5:50 UTC (permalink / raw)
  To: linux-crypto
  Cc: David Howells, Ard Biesheuvel, Jason A . Donenfeld, Eric Biggers,
	Holger Dengler, Harald Freudenberger, Herbert Xu,
	linux-arm-kernel, linux-s390, linux-kernel

- Use size_t lengths, to match the library.

- Pass the block size instead of digest size, and add support for the
  block size that SHAKE128 uses.  This allows the code to be used with
  SHAKE128 and SHAKE256, which don't have the concept of a digest size.
  SHAKE256 has the same block size as SHA3-256, but SHAKE128 has a
  unique block size.  Thus, there are now 5 supported block sizes.
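
As a cross-check of the block-size count, the sizes can be derived from
the 200-byte Keccak state and the capacity that FIPS 202 assigns to each
algorithm.  This is a standalone Python sketch, not kernel code; the
names STATE_BYTES and CAPACITY_BITS are chosen for this illustration.

```python
STATE_BYTES = 200  # Keccak-f[1600] state: 1600 bits

# Capacity in bits per FIPS 202: twice the digest size for SHA3-n,
# twice the security strength for the SHAKE functions.
CAPACITY_BITS = {
    "SHA3-224": 448,
    "SHA3-256": 512,
    "SHA3-384": 768,
    "SHA3-512": 1024,
    "SHAKE128": 256,
    "SHAKE256": 512,
}

# Block size (rate) in bytes = state size minus capacity.
block_size = {alg: STATE_BYTES - c // 8 for alg, c in CAPACITY_BITS.items()}

for alg, bs in sorted(block_size.items(), key=lambda kv: kv[1]):
    print(f"{alg}: {bs} bytes")

# SHAKE256 shares its block size with SHA3-256; SHAKE128 adds a fifth value.
assert block_size["SHAKE256"] == block_size["SHA3-256"] == 136
assert sorted(set(block_size.values())) == [72, 104, 136, 144, 168]
```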

Don't bother changing the "glue" code arm64_sha3_update() too much, as
it gets deleted when the SHA-3 code is migrated into lib/crypto/ anyway.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 arch/arm64/crypto/sha3-ce-core.S | 67 ++++++++++++++++----------------
 arch/arm64/crypto/sha3-ce-glue.c | 11 +++---
 2 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/arch/arm64/crypto/sha3-ce-core.S b/arch/arm64/crypto/sha3-ce-core.S
index 9c77313f5a608..b62bd714839b1 100644
--- a/arch/arm64/crypto/sha3-ce-core.S
+++ b/arch/arm64/crypto/sha3-ce-core.S
@@ -35,11 +35,15 @@
 	.macro	xar, rd, rn, rm, imm6
 	.inst	0xce800000 | .L\rd | (.L\rn << 5) | ((\imm6) << 10) | (.L\rm << 16)
 	.endm
 
 	/*
-	 * int sha3_ce_transform(u64 *st, const u8 *data, int blocks, int dg_size)
+	 * size_t sha3_ce_transform(struct sha3_state *state, const u8 *data,
+	 *			    size_t nblocks, size_t block_size)
+	 *
+	 * block_size is assumed to be one of 72 (SHA3-512), 104 (SHA3-384), 136
+	 * (SHA3-256 and SHAKE256), 144 (SHA3-224), or 168 (SHAKE128).
 	 */
 	.text
 SYM_FUNC_START(sha3_ce_transform)
 	/* load state */
 	add	x8, x0, #32
@@ -49,62 +53,59 @@ SYM_FUNC_START(sha3_ce_transform)
 	ld1	{v12.1d-v15.1d}, [x8], #32
 	ld1	{v16.1d-v19.1d}, [x8], #32
 	ld1	{v20.1d-v23.1d}, [x8], #32
 	ld1	{v24.1d}, [x8]
 
-0:	sub	w2, w2, #1
+0:	sub	x2, x2, #1
 	mov	w8, #24
 	adr_l	x9, .Lsha3_rcon
 
 	/* load input */
 	ld1	{v25.8b-v28.8b}, [x1], #32
-	ld1	{v29.8b-v31.8b}, [x1], #24
+	ld1	{v29.8b}, [x1], #8
 	eor	v0.8b, v0.8b, v25.8b
 	eor	v1.8b, v1.8b, v26.8b
 	eor	v2.8b, v2.8b, v27.8b
 	eor	v3.8b, v3.8b, v28.8b
 	eor	v4.8b, v4.8b, v29.8b
-	eor	v5.8b, v5.8b, v30.8b
-	eor	v6.8b, v6.8b, v31.8b
-
-	tbnz	x3, #6, 2f		// SHA3-512
 
 	ld1	{v25.8b-v28.8b}, [x1], #32
-	ld1	{v29.8b-v30.8b}, [x1], #16
-	eor	 v7.8b,  v7.8b, v25.8b
-	eor	 v8.8b,  v8.8b, v26.8b
-	eor	 v9.8b,  v9.8b, v27.8b
-	eor	v10.8b, v10.8b, v28.8b
-	eor	v11.8b, v11.8b, v29.8b
-	eor	v12.8b, v12.8b, v30.8b
+	eor	v5.8b, v5.8b, v25.8b
+	eor	v6.8b, v6.8b, v26.8b
+	eor	v7.8b, v7.8b, v27.8b
+	eor	v8.8b, v8.8b, v28.8b
+	cmp	x3, #72
+	b.eq	3f	/* SHA3-512 (block_size=72)? */
 
-	tbnz	x3, #4, 1f		// SHA3-384 or SHA3-224
+	ld1	{v25.8b-v28.8b}, [x1], #32
+	eor	v9.8b, v9.8b, v25.8b
+	eor	v10.8b, v10.8b, v26.8b
+	eor	v11.8b, v11.8b, v27.8b
+	eor	v12.8b, v12.8b, v28.8b
+	cmp	x3, #104
+	b.eq	3f	/* SHA3-384 (block_size=104)? */
 
-	// SHA3-256
 	ld1	{v25.8b-v28.8b}, [x1], #32
 	eor	v13.8b, v13.8b, v25.8b
 	eor	v14.8b, v14.8b, v26.8b
 	eor	v15.8b, v15.8b, v27.8b
 	eor	v16.8b, v16.8b, v28.8b
-	b	3f
-
-1:	tbz	x3, #2, 3f		// bit 2 cleared? SHA-384
+	cmp	x3, #144
+	b.lt	3f	/* SHA3-256 or SHAKE256 (block_size=136)? */
+	b.eq	2f	/* SHA3-224 (block_size=144)? */
 
-	// SHA3-224
+	/* SHAKE128 (block_size=168) */
 	ld1	{v25.8b-v28.8b}, [x1], #32
-	ld1	{v29.8b}, [x1], #8
-	eor	v13.8b, v13.8b, v25.8b
-	eor	v14.8b, v14.8b, v26.8b
-	eor	v15.8b, v15.8b, v27.8b
-	eor	v16.8b, v16.8b, v28.8b
-	eor	v17.8b, v17.8b, v29.8b
+	eor	v17.8b, v17.8b, v25.8b
+	eor	v18.8b, v18.8b, v26.8b
+	eor	v19.8b, v19.8b, v27.8b
+	eor	v20.8b, v20.8b, v28.8b
 	b	3f
-
-	// SHA3-512
-2:	ld1	{v25.8b-v26.8b}, [x1], #16
-	eor	 v7.8b,  v7.8b, v25.8b
-	eor	 v8.8b,  v8.8b, v26.8b
+2:
+	/* SHA3-224 (block_size=144) */
+	ld1	{v25.8b}, [x1], #8
+	eor	v17.8b, v17.8b, v25.8b
 
 3:	sub	w8, w8, #1
 
 	eor3	v29.16b,  v4.16b,  v9.16b, v14.16b
 	eor3	v26.16b,  v1.16b,  v6.16b, v11.16b
@@ -183,21 +184,21 @@ SYM_FUNC_START(sha3_ce_transform)
 
 	eor	 v0.16b,  v0.16b, v31.16b
 
 	cbnz	w8, 3b
 	cond_yield 4f, x8, x9
-	cbnz	w2, 0b
+	cbnz	x2, 0b
 
 	/* save state */
 4:	st1	{ v0.1d- v3.1d}, [x0], #32
 	st1	{ v4.1d- v7.1d}, [x0], #32
 	st1	{ v8.1d-v11.1d}, [x0], #32
 	st1	{v12.1d-v15.1d}, [x0], #32
 	st1	{v16.1d-v19.1d}, [x0], #32
 	st1	{v20.1d-v23.1d}, [x0], #32
 	st1	{v24.1d}, [x0]
-	mov	w0, w2
+	mov	x0, x2
 	ret
 SYM_FUNC_END(sha3_ce_transform)
 
 	.section	".rodata", "a"
 	.align		8
diff --git a/arch/arm64/crypto/sha3-ce-glue.c b/arch/arm64/crypto/sha3-ce-glue.c
index f5c8302349337..250f4fb76b472 100644
--- a/arch/arm64/crypto/sha3-ce-glue.c
+++ b/arch/arm64/crypto/sha3-ce-glue.c
@@ -26,30 +26,29 @@ MODULE_LICENSE("GPL v2");
 MODULE_ALIAS_CRYPTO("sha3-224");
 MODULE_ALIAS_CRYPTO("sha3-256");
 MODULE_ALIAS_CRYPTO("sha3-384");
 MODULE_ALIAS_CRYPTO("sha3-512");
 
-asmlinkage int sha3_ce_transform(u64 *st, const u8 *data, int blocks,
-				 int md_len);
+asmlinkage size_t sha3_ce_transform(struct sha3_state *state, const u8 *data,
+				    size_t nblocks, size_t block_size);
 
 static int arm64_sha3_update(struct shash_desc *desc, const u8 *data,
 			     unsigned int len)
 {
 	struct sha3_state *sctx = shash_desc_ctx(desc);
 	struct crypto_shash *tfm = desc->tfm;
-	unsigned int bs, ds;
+	unsigned int bs;
 	int blocks;
 
-	ds = crypto_shash_digestsize(tfm);
 	bs = crypto_shash_blocksize(tfm);
 	blocks = len / bs;
 	len -= blocks * bs;
 	do {
 		int rem;
 
 		kernel_neon_begin();
-		rem = sha3_ce_transform(sctx->st, data, blocks, ds);
+		rem = sha3_ce_transform(sctx, data, blocks, bs);
 		kernel_neon_end();
 		data += (blocks - rem) * bs;
 		blocks = rem;
 	} while (blocks);
 	return len;
@@ -72,11 +71,11 @@ static int sha3_finup(struct shash_desc *desc, const u8 *src, unsigned int len,
 	block[len++] = 0x06;
 	memset(block + len, 0, bs - len);
 	block[bs - 1] |= 0x80;
 
 	kernel_neon_begin();
-	sha3_ce_transform(sctx->st, block, 1, ds);
+	sha3_ce_transform(sctx, block, 1, bs);
 	kernel_neon_end();
 	memzero_explicit(block , sizeof(block));
 
 	for (i = 0; i < ds / 8; i++)
 		put_unaligned_le64(sctx->st[i], digest++);
-- 
2.51.1.dirty




* [PATCH v2 09/15] lib/crypto: arm64/sha3: Migrate optimized code into library
  2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
                   ` (7 preceding siblings ...)
  2025-10-26  5:50 ` [PATCH v2 08/15] crypto: arm64/sha3 - Update sha3_ce_transform() to prepare for library Eric Biggers
@ 2025-10-26  5:50 ` Eric Biggers
  2025-10-26  5:50 ` [PATCH v2 10/15] lib/crypto: s390/sha3: Add optimized Keccak functions Eric Biggers
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Eric Biggers @ 2025-10-26  5:50 UTC (permalink / raw)
  To: linux-crypto
  Cc: David Howells, Ard Biesheuvel, Jason A . Donenfeld, Eric Biggers,
	Holger Dengler, Harald Freudenberger, Herbert Xu,
	linux-arm-kernel, linux-s390, linux-kernel

Instead of exposing the arm64-optimized SHA-3 code via arm64-specific
crypto_shash algorithms, just implement the sha3_absorb_blocks() and
sha3_keccakf() library functions.  This is much simpler, it makes the
SHA-3 library functions arm64-optimized, and it fixes the longstanding
issue where the arm64-optimized SHA-3 code was disabled by default.
SHA-3 remains available through crypto_shash, but individual
architectures no longer need to handle it.

Note: to see the diff from arch/arm64/crypto/sha3-ce-glue.c to
lib/crypto/arm64/sha3.h, view this commit with 'git show -M10'.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 arch/arm64/configs/defconfig                  |   2 +-
 arch/arm64/crypto/Kconfig                     |  11 --
 arch/arm64/crypto/Makefile                    |   3 -
 arch/arm64/crypto/sha3-ce-glue.c              | 150 ------------------
 lib/crypto/Kconfig                            |   5 +
 lib/crypto/Makefile                           |   5 +
 .../crypto/arm64}/sha3-ce-core.S              |   0
 lib/crypto/arm64/sha3.h                       |  62 ++++++++
 8 files changed, 73 insertions(+), 165 deletions(-)
 delete mode 100644 arch/arm64/crypto/sha3-ce-glue.c
 rename {arch/arm64/crypto => lib/crypto/arm64}/sha3-ce-core.S (100%)
 create mode 100644 lib/crypto/arm64/sha3.h

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index e3a2d37bd1042..20dd3a39faead 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -1781,14 +1781,14 @@ CONFIG_SECURITY=y
 CONFIG_CRYPTO_USER=y
 CONFIG_CRYPTO_CHACHA20=m
 CONFIG_CRYPTO_BENCHMARK=m
 CONFIG_CRYPTO_ECHAINIV=y
 CONFIG_CRYPTO_MICHAEL_MIC=m
+CONFIG_CRYPTO_SHA3=m
 CONFIG_CRYPTO_ANSI_CPRNG=y
 CONFIG_CRYPTO_USER_API_RNG=m
 CONFIG_CRYPTO_GHASH_ARM64_CE=y
-CONFIG_CRYPTO_SHA3_ARM64=m
 CONFIG_CRYPTO_SM3_ARM64_CE=m
 CONFIG_CRYPTO_AES_ARM64_CE_BLK=y
 CONFIG_CRYPTO_AES_ARM64_BS=m
 CONFIG_CRYPTO_AES_ARM64_CE_CCM=y
 CONFIG_CRYPTO_DEV_SUN8I_CE=m
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 91f3093eee6ab..376d6b50743ff 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -23,21 +23,10 @@ config CRYPTO_NHPOLY1305_NEON
 	  NHPoly1305 hash function (Adiantum)
 
 	  Architecture: arm64 using:
 	  - NEON (Advanced SIMD) extensions
 
-config CRYPTO_SHA3_ARM64
-	tristate "Hash functions: SHA-3 (ARMv8.2 Crypto Extensions)"
-	depends on KERNEL_MODE_NEON
-	select CRYPTO_HASH
-	select CRYPTO_SHA3
-	help
-	  SHA-3 secure hash algorithms (FIPS 202)
-
-	  Architecture: arm64 using:
-	  - ARMv8.2 Crypto Extensions
-
 config CRYPTO_SM3_NEON
 	tristate "Hash functions: SM3 (NEON)"
 	depends on KERNEL_MODE_NEON
 	select CRYPTO_HASH
 	select CRYPTO_LIB_SM3
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index a8b2cdbe202c1..fd3d590fa1137 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -3,13 +3,10 @@
 # linux/arch/arm64/crypto/Makefile
 #
 # Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
 #
 
-obj-$(CONFIG_CRYPTO_SHA3_ARM64) += sha3-ce.o
-sha3-ce-y := sha3-ce-glue.o sha3-ce-core.o
-
 obj-$(CONFIG_CRYPTO_SM3_NEON) += sm3-neon.o
 sm3-neon-y := sm3-neon-glue.o sm3-neon-core.o
 
 obj-$(CONFIG_CRYPTO_SM3_ARM64_CE) += sm3-ce.o
 sm3-ce-y := sm3-ce-glue.o sm3-ce-core.o
diff --git a/arch/arm64/crypto/sha3-ce-glue.c b/arch/arm64/crypto/sha3-ce-glue.c
deleted file mode 100644
index 250f4fb76b472..0000000000000
--- a/arch/arm64/crypto/sha3-ce-glue.c
+++ /dev/null
@@ -1,150 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * sha3-ce-glue.c - core SHA-3 transform using v8.2 Crypto Extensions
- *
- * Copyright (C) 2018 Linaro Ltd <ard.biesheuvel@linaro.org>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-#include <asm/hwcap.h>
-#include <asm/neon.h>
-#include <asm/simd.h>
-#include <crypto/internal/hash.h>
-#include <crypto/sha3.h>
-#include <linux/cpufeature.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/string.h>
-#include <linux/unaligned.h>
-
-MODULE_DESCRIPTION("SHA3 secure hash using ARMv8 Crypto Extensions");
-MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
-MODULE_LICENSE("GPL v2");
-MODULE_ALIAS_CRYPTO("sha3-224");
-MODULE_ALIAS_CRYPTO("sha3-256");
-MODULE_ALIAS_CRYPTO("sha3-384");
-MODULE_ALIAS_CRYPTO("sha3-512");
-
-asmlinkage size_t sha3_ce_transform(struct sha3_state *state, const u8 *data,
-				    size_t nblocks, size_t block_size);
-
-static int arm64_sha3_update(struct shash_desc *desc, const u8 *data,
-			     unsigned int len)
-{
-	struct sha3_state *sctx = shash_desc_ctx(desc);
-	struct crypto_shash *tfm = desc->tfm;
-	unsigned int bs;
-	int blocks;
-
-	bs = crypto_shash_blocksize(tfm);
-	blocks = len / bs;
-	len -= blocks * bs;
-	do {
-		int rem;
-
-		kernel_neon_begin();
-		rem = sha3_ce_transform(sctx, data, blocks, bs);
-		kernel_neon_end();
-		data += (blocks - rem) * bs;
-		blocks = rem;
-	} while (blocks);
-	return len;
-}
-
-static int sha3_finup(struct shash_desc *desc, const u8 *src, unsigned int len,
-		      u8 *out)
-{
-	struct sha3_state *sctx = shash_desc_ctx(desc);
-	struct crypto_shash *tfm = desc->tfm;
-	__le64 *digest = (__le64 *)out;
-	u8 block[SHA3_224_BLOCK_SIZE];
-	unsigned int bs, ds;
-	int i;
-
-	ds = crypto_shash_digestsize(tfm);
-	bs = crypto_shash_blocksize(tfm);
-	memcpy(block, src, len);
-
-	block[len++] = 0x06;
-	memset(block + len, 0, bs - len);
-	block[bs - 1] |= 0x80;
-
-	kernel_neon_begin();
-	sha3_ce_transform(sctx, block, 1, bs);
-	kernel_neon_end();
-	memzero_explicit(block , sizeof(block));
-
-	for (i = 0; i < ds / 8; i++)
-		put_unaligned_le64(sctx->st[i], digest++);
-
-	if (ds & 4)
-		put_unaligned_le32(sctx->st[i], (__le32 *)digest);
-
-	return 0;
-}
-
-static struct shash_alg algs[] = { {
-	.digestsize		= SHA3_224_DIGEST_SIZE,
-	.init			= crypto_sha3_init,
-	.update			= arm64_sha3_update,
-	.finup			= sha3_finup,
-	.descsize		= SHA3_STATE_SIZE,
-	.base.cra_name		= "sha3-224",
-	.base.cra_driver_name	= "sha3-224-ce",
-	.base.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
-	.base.cra_blocksize	= SHA3_224_BLOCK_SIZE,
-	.base.cra_module	= THIS_MODULE,
-	.base.cra_priority	= 200,
-}, {
-	.digestsize		= SHA3_256_DIGEST_SIZE,
-	.init			= crypto_sha3_init,
-	.update			= arm64_sha3_update,
-	.finup			= sha3_finup,
-	.descsize		= SHA3_STATE_SIZE,
-	.base.cra_name		= "sha3-256",
-	.base.cra_driver_name	= "sha3-256-ce",
-	.base.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
-	.base.cra_blocksize	= SHA3_256_BLOCK_SIZE,
-	.base.cra_module	= THIS_MODULE,
-	.base.cra_priority	= 200,
-}, {
-	.digestsize		= SHA3_384_DIGEST_SIZE,
-	.init			= crypto_sha3_init,
-	.update			= arm64_sha3_update,
-	.finup			= sha3_finup,
-	.descsize		= SHA3_STATE_SIZE,
-	.base.cra_name		= "sha3-384",
-	.base.cra_driver_name	= "sha3-384-ce",
-	.base.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
-	.base.cra_blocksize	= SHA3_384_BLOCK_SIZE,
-	.base.cra_module	= THIS_MODULE,
-	.base.cra_priority	= 200,
-}, {
-	.digestsize		= SHA3_512_DIGEST_SIZE,
-	.init			= crypto_sha3_init,
-	.update			= arm64_sha3_update,
-	.finup			= sha3_finup,
-	.descsize		= SHA3_STATE_SIZE,
-	.base.cra_name		= "sha3-512",
-	.base.cra_driver_name	= "sha3-512-ce",
-	.base.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
-	.base.cra_blocksize	= SHA3_512_BLOCK_SIZE,
-	.base.cra_module	= THIS_MODULE,
-	.base.cra_priority	= 200,
-} };
-
-static int __init sha3_neon_mod_init(void)
-{
-	return crypto_register_shashes(algs, ARRAY_SIZE(algs));
-}
-
-static void __exit sha3_neon_mod_fini(void)
-{
-	crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
-}
-
-module_cpu_feature_match(SHA3, sha3_neon_mod_init);
-module_exit(sha3_neon_mod_fini);
diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig
index a05f5a349cd8c..587490ca65654 100644
--- a/lib/crypto/Kconfig
+++ b/lib/crypto/Kconfig
@@ -200,10 +200,15 @@ config CRYPTO_LIB_SHA3
 	select CRYPTO_LIB_UTILS
 	help
 	  The SHA3 library functions.  Select this if your module uses any of
 	  the functions from <crypto/sha3.h>.
 
+config CRYPTO_LIB_SHA3_ARCH
+	bool
+	depends on CRYPTO_LIB_SHA3 && !UML
+	default y if ARM64 && KERNEL_MODE_NEON
+
 config CRYPTO_LIB_SM3
 	tristate
 
 source "lib/crypto/tests/Kconfig"
 
diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
index 0cfdb511f32b6..5515e73bfd5e3 100644
--- a/lib/crypto/Makefile
+++ b/lib/crypto/Makefile
@@ -279,10 +279,15 @@ endif # CONFIG_CRYPTO_LIB_SHA512_ARCH
 ################################################################################
 
 obj-$(CONFIG_CRYPTO_LIB_SHA3) += libsha3.o
 libsha3-y := sha3.o
 
+ifeq ($(CONFIG_CRYPTO_LIB_SHA3_ARCH),y)
+CFLAGS_sha3.o += -I$(src)/$(SRCARCH)
+libsha3-$(CONFIG_ARM64) += arm64/sha3-ce-core.o
+endif # CONFIG_CRYPTO_LIB_SHA3_ARCH
+
 ################################################################################
 
 obj-$(CONFIG_MPILIB) += mpi/
 
 obj-$(CONFIG_CRYPTO_SELFTESTS_FULL)		+= simd.o
diff --git a/arch/arm64/crypto/sha3-ce-core.S b/lib/crypto/arm64/sha3-ce-core.S
similarity index 100%
rename from arch/arm64/crypto/sha3-ce-core.S
rename to lib/crypto/arm64/sha3-ce-core.S
diff --git a/lib/crypto/arm64/sha3.h b/lib/crypto/arm64/sha3.h
new file mode 100644
index 0000000000000..6dd5183056da4
--- /dev/null
+++ b/lib/crypto/arm64/sha3.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2018 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <asm/simd.h>
+#include <linux/cpufeature.h>
+
+static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_sha3);
+
+asmlinkage size_t sha3_ce_transform(struct sha3_state *state, const u8 *data,
+				    size_t nblocks, size_t block_size);
+
+static void sha3_absorb_blocks(struct sha3_state *state, const u8 *data,
+			       size_t nblocks, size_t block_size)
+{
+	if (static_branch_likely(&have_sha3) && likely(may_use_simd())) {
+		do {
+			size_t rem;
+
+			kernel_neon_begin();
+			rem = sha3_ce_transform(state, data, nblocks,
+						block_size);
+			kernel_neon_end();
+			data += (nblocks - rem) * block_size;
+			nblocks = rem;
+		} while (nblocks);
+	} else {
+		sha3_absorb_blocks_generic(state, data, nblocks, block_size);
+	}
+}
+
+static void sha3_keccakf(struct sha3_state *state)
+{
+	if (static_branch_likely(&have_sha3) && likely(may_use_simd())) {
+		/*
+		 * Passing zeroes into sha3_ce_transform() gives the plain
+		 * Keccak-f permutation, which is what we want here.  Any
+		 * supported block size may be used.  Use SHA3_512_BLOCK_SIZE
+		 * since it's the shortest.
+		 */
+		static const u8 zeroes[SHA3_512_BLOCK_SIZE];
+
+		kernel_neon_begin();
+		sha3_ce_transform(state, zeroes, 1, sizeof(zeroes));
+		kernel_neon_end();
+	} else {
+		sha3_keccakf_generic(state);
+	}
+}
+
+#define sha3_mod_init_arch sha3_mod_init_arch
+static void sha3_mod_init_arch(void)
+{
+	if (cpu_have_named_feature(SHA3))
+		static_branch_enable(&have_sha3);
+}
-- 
2.51.1.dirty



^ permalink raw reply related	[flat|nested] 34+ messages in thread
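
The retry loop in sha3_absorb_blocks() from patch 09 relies on
sha3_ce_transform() returning the number of blocks it has not yet
consumed, so the caller can leave the NEON section and resume.  A
minimal userspace sketch of that calling convention follows.  It is an
illustration only: mock_transform(), MAX_BLOCKS_PER_CALL and the two
counters are hypothetical stand-ins, and the "absorb" step is a toy sum
rather than Keccak.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit: the mock gives up after this many blocks per call. */
#define MAX_BLOCKS_PER_CALL 4

/* Demo-only bookkeeping, not part of the real code. */
static size_t blocks_absorbed;
static size_t calls_made;

/*
 * Stand-in for sha3_ce_transform(): processes some of the blocks and
 * returns how many are still pending.  The "absorb" is just a byte sum.
 */
static size_t mock_transform(uint64_t *state, const uint8_t *data,
			     size_t nblocks, size_t block_size)
{
	size_t todo = nblocks < MAX_BLOCKS_PER_CALL ?
		      nblocks : MAX_BLOCKS_PER_CALL;
	size_t i, j;

	for (i = 0; i < todo; i++)
		for (j = 0; j < block_size; j++)
			state[0] += data[i * block_size + j];
	blocks_absorbed += todo;
	calls_made++;
	return nblocks - todo;
}

/* Same loop shape as the arm64 sha3_absorb_blocks() fast path. */
static void absorb_blocks(uint64_t *state, const uint8_t *data,
			  size_t nblocks, size_t block_size)
{
	do {
		size_t rem;

		/* kernel_neon_begin() would be called here */
		rem = mock_transform(state, data, nblocks, block_size);
		/* kernel_neon_end() would be called here */
		data += (nblocks - rem) * block_size;	/* skip what was consumed */
		nblocks = rem;
	} while (nblocks);
}
```

With ten blocks and a per-call limit of four, the loop makes three
calls and every block is absorbed exactly once.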

* [PATCH v2 10/15] lib/crypto: s390/sha3: Add optimized Keccak functions
  2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
                   ` (8 preceding siblings ...)
  2025-10-26  5:50 ` [PATCH v2 09/15] lib/crypto: arm64/sha3: Migrate optimized code into library Eric Biggers
@ 2025-10-26  5:50 ` Eric Biggers
  2025-10-26  5:50 ` [PATCH v2 11/15] lib/crypto: sha3: Support arch overrides of one-shot digest functions Eric Biggers
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Eric Biggers @ 2025-10-26  5:50 UTC (permalink / raw)
  To: linux-crypto
  Cc: David Howells, Ard Biesheuvel, Jason A . Donenfeld, Eric Biggers,
	Holger Dengler, Harald Freudenberger, Herbert Xu,
	linux-arm-kernel, linux-s390, linux-kernel

Implement sha3_absorb_blocks() and sha3_keccakf() using the hardware-
accelerated SHA-3 support in Message-Security-Assist Extension 6.

This accelerates the SHA3-224, SHA3-256, SHA3-384, SHA3-512, and
SHAKE256 library functions.

Note that arch/s390/crypto/ already has SHA-3 code that uses this
extension, but it is exposed only via crypto_shash.  This commit brings
the same acceleration to the SHA-3 library.  The arch/s390/crypto/
version will become redundant and be removed in later changes.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 lib/crypto/Kconfig     |  1 +
 lib/crypto/s390/sha3.h | 88 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+)
 create mode 100644 lib/crypto/s390/sha3.h

diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig
index 587490ca65654..7445054fc0ad4 100644
--- a/lib/crypto/Kconfig
+++ b/lib/crypto/Kconfig
@@ -204,10 +204,11 @@ config CRYPTO_LIB_SHA3
 
 config CRYPTO_LIB_SHA3_ARCH
 	bool
 	depends on CRYPTO_LIB_SHA3 && !UML
 	default y if ARM64 && KERNEL_MODE_NEON
+	default y if S390
 
 config CRYPTO_LIB_SM3
 	tristate
 
 source "lib/crypto/tests/Kconfig"
diff --git a/lib/crypto/s390/sha3.h b/lib/crypto/s390/sha3.h
new file mode 100644
index 0000000000000..668e53da93d2c
--- /dev/null
+++ b/lib/crypto/s390/sha3.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * SHA-3 optimized using the CP Assist for Cryptographic Functions (CPACF)
+ *
+ * Copyright 2025 Google LLC
+ */
+#include <asm/cpacf.h>
+#include <linux/cpufeature.h>
+
+static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_sha3);
+
+static void sha3_absorb_blocks(struct sha3_state *state, const u8 *data,
+			       size_t nblocks, size_t block_size)
+{
+	if (static_branch_likely(&have_sha3)) {
+		/*
+		 * Note that KIMD assumes little-endian order of the state
+		 * words.  sha3_state already uses that order, though, so
+		 * there's no need for a byteswap.
+		 */
+		switch (block_size) {
+		case SHA3_224_BLOCK_SIZE:
+			cpacf_kimd(CPACF_KIMD_SHA3_224, state,
+				   data, nblocks * block_size);
+			return;
+		case SHA3_256_BLOCK_SIZE:
+			/*
+			 * This case handles both SHA3-256 and SHAKE256, since
+			 * they have the same block size.
+			 */
+			cpacf_kimd(CPACF_KIMD_SHA3_256, state,
+				   data, nblocks * block_size);
+			return;
+		case SHA3_384_BLOCK_SIZE:
+			cpacf_kimd(CPACF_KIMD_SHA3_384, state,
+				   data, nblocks * block_size);
+			return;
+		case SHA3_512_BLOCK_SIZE:
+			cpacf_kimd(CPACF_KIMD_SHA3_512, state,
+				   data, nblocks * block_size);
+			return;
+		}
+	}
+	sha3_absorb_blocks_generic(state, data, nblocks, block_size);
+}
+
+static void sha3_keccakf(struct sha3_state *state)
+{
+	if (static_branch_likely(&have_sha3)) {
+		/*
+		 * Passing zeroes into any of CPACF_KIMD_SHA3_* gives the plain
+		 * Keccak-f permutation, which is what we want here.  Use
+		 * SHA3-512 since it has the smallest block size.
+		 */
+		static const u8 zeroes[SHA3_512_BLOCK_SIZE];
+
+		cpacf_kimd(CPACF_KIMD_SHA3_512, state, zeroes, sizeof(zeroes));
+	} else {
+		sha3_keccakf_generic(state);
+	}
+}
+
+#define sha3_mod_init_arch sha3_mod_init_arch
+static void sha3_mod_init_arch(void)
+{
+	int num_present = 0;
+	int num_possible = 0;
+
+	if (!cpu_have_feature(S390_CPU_FEATURE_MSA))
+		return;
+	/*
+	 * Since all the SHA-3 functions are in Message-Security-Assist
+	 * Extension 6, just treat them as all or nothing.  This way we need
+	 * only one static_key.
+	 */
+#define QUERY(opcode, func) \
+	({ num_present += !!cpacf_query_func(opcode, func); num_possible++; })
+	QUERY(CPACF_KIMD, CPACF_KIMD_SHA3_224);
+	QUERY(CPACF_KIMD, CPACF_KIMD_SHA3_256);
+	QUERY(CPACF_KIMD, CPACF_KIMD_SHA3_384);
+	QUERY(CPACF_KIMD, CPACF_KIMD_SHA3_512);
+#undef QUERY
+
+	if (num_present == num_possible)
+		static_branch_enable(&have_sha3);
+	else if (num_present != 0)
+		pr_warn("Unsupported combination of SHA-3 facilities\n");
+}
-- 
2.51.1.dirty



^ permalink raw reply related	[flat|nested] 34+ messages in thread
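
Both the arm64 and s390 versions of sha3_keccakf() obtain the plain
permutation by absorbing one all-zero block: the absorb step XORs the
block into the state and then permutes, and XORing zeroes changes
nothing.  The sketch below checks that identity in userspace.  Note the
assumptions: toy_permute() is a hypothetical stand-in and is NOT
Keccak-f, and BLOCK_SIZE is set to 72 bytes to mirror the SHA3-512 rate
that the patches pick.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STATE_WORDS 25
#define BLOCK_SIZE  72	/* SHA3-512 rate in bytes, the smallest of the four */

/* Stand-in permutation: invertible, but cryptographically meaningless. */
static void toy_permute(uint64_t st[STATE_WORDS])
{
	int i;

	for (i = 0; i < STATE_WORDS; i++) {
		uint64_t x = st[i] + 0x9e3779b97f4a7c15ULL * (uint64_t)(i + 1);

		st[i] = (x << 13) | (x >> 51);	/* rotate left by 13 */
	}
}

/* Load a 64-bit little-endian word regardless of host byte order. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Sponge absorb of one block: XOR into the state, then permute. */
static void absorb_block(uint64_t st[STATE_WORDS], const uint8_t *block)
{
	size_t i;

	for (i = 0; i < BLOCK_SIZE / 8; i++)
		st[i] ^= load_le64(block + 8 * i);
	toy_permute(st);
}

/* Returns 1 if absorbing zeroes matches calling the permutation directly. */
static int zero_block_equals_permute(void)
{
	static const uint8_t zeroes[BLOCK_SIZE];
	uint64_t a[STATE_WORDS], b[STATE_WORDS];
	int i;

	for (i = 0; i < STATE_WORDS; i++)
		a[i] = b[i] = 0x0123456789abcdefULL ^ (uint64_t)i;

	absorb_block(a, zeroes);
	toy_permute(b);
	return memcmp(a, b, sizeof(a)) == 0;
}
```

The identity holds for any permutation and any block size, which is why
the choice of the shortest block only affects how many zero bytes are
read, not the result.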

* [PATCH v2 11/15] lib/crypto: sha3: Support arch overrides of one-shot digest functions
  2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
                   ` (9 preceding siblings ...)
  2025-10-26  5:50 ` [PATCH v2 10/15] lib/crypto: s390/sha3: Add optimized Keccak functions Eric Biggers
@ 2025-10-26  5:50 ` Eric Biggers
  2025-10-26  5:50 ` [PATCH v2 12/15] lib/crypto: s390/sha3: Add optimized one-shot SHA-3 " Eric Biggers
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Eric Biggers @ 2025-10-26  5:50 UTC (permalink / raw)
  To: linux-crypto
  Cc: David Howells, Ard Biesheuvel, Jason A . Donenfeld, Eric Biggers,
	Holger Dengler, Harald Freudenberger, Herbert Xu,
	linux-arm-kernel, linux-s390, linux-kernel

Add support for architecture-specific overrides of sha3_224(),
sha3_256(), sha3_384(), and sha3_512().  This will be used to implement
these functions more efficiently on s390 than is possible via the usual
init + update + final flow.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 lib/crypto/sha3.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/lib/crypto/sha3.c b/lib/crypto/sha3.c
index 6c94b4ebd0fd1..a598138d47a59 100644
--- a/lib/crypto/sha3.c
+++ b/lib/crypto/sha3.c
@@ -278,44 +278,81 @@ void shake_squeeze(struct shake_ctx *shake_ctx, u8 *out, size_t out_len)
 	}
 	ctx->squeeze_offset = squeeze_offset;
 }
 EXPORT_SYMBOL_GPL(shake_squeeze);
 
+#ifndef sha3_224_arch
+static inline bool sha3_224_arch(const u8 *in, size_t in_len,
+				 u8 out[SHA3_224_DIGEST_SIZE])
+{
+	return false;
+}
+#endif
+#ifndef sha3_256_arch
+static inline bool sha3_256_arch(const u8 *in, size_t in_len,
+				 u8 out[SHA3_256_DIGEST_SIZE])
+{
+	return false;
+}
+#endif
+#ifndef sha3_384_arch
+static inline bool sha3_384_arch(const u8 *in, size_t in_len,
+				 u8 out[SHA3_384_DIGEST_SIZE])
+{
+	return false;
+}
+#endif
+#ifndef sha3_512_arch
+static inline bool sha3_512_arch(const u8 *in, size_t in_len,
+				 u8 out[SHA3_512_DIGEST_SIZE])
+{
+	return false;
+}
+#endif
+
 void sha3_224(const u8 *in, size_t in_len, u8 out[SHA3_224_DIGEST_SIZE])
 {
 	struct sha3_ctx ctx;
 
+	if (sha3_224_arch(in, in_len, out))
+		return;
 	sha3_224_init(&ctx);
 	sha3_update(&ctx, in, in_len);
 	sha3_final(&ctx, out);
 }
 EXPORT_SYMBOL_GPL(sha3_224);
 
 void sha3_256(const u8 *in, size_t in_len, u8 out[SHA3_256_DIGEST_SIZE])
 {
 	struct sha3_ctx ctx;
 
+	if (sha3_256_arch(in, in_len, out))
+		return;
 	sha3_256_init(&ctx);
 	sha3_update(&ctx, in, in_len);
 	sha3_final(&ctx, out);
 }
 EXPORT_SYMBOL_GPL(sha3_256);
 
 void sha3_384(const u8 *in, size_t in_len, u8 out[SHA3_384_DIGEST_SIZE])
 {
 	struct sha3_ctx ctx;
 
+	if (sha3_384_arch(in, in_len, out))
+		return;
 	sha3_384_init(&ctx);
 	sha3_update(&ctx, in, in_len);
 	sha3_final(&ctx, out);
 }
 EXPORT_SYMBOL_GPL(sha3_384);
 
 void sha3_512(const u8 *in, size_t in_len, u8 out[SHA3_512_DIGEST_SIZE])
 {
 	struct sha3_ctx ctx;
 
+	if (sha3_512_arch(in, in_len, out))
+		return;
 	sha3_512_init(&ctx);
 	sha3_update(&ctx, in, in_len);
 	sha3_final(&ctx, out);
 }
 EXPORT_SYMBOL_GPL(sha3_512);
-- 
2.51.1.dirty



^ permalink raw reply related	[flat|nested] 34+ messages in thread
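
The override mechanism in patch 11 is the usual "#define name name"
idiom: an arch header defines both the function and a macro of the same
name, and the generic file supplies a stub returning false only when
that macro is absent.  A self-contained sketch of the pattern, using
hypothetical names (demo_digest(), demo_digest_arch(), toy_sum(),
demo_have_accel) and a toy checksum in place of SHA-3:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_SIZE 4

static bool demo_have_accel;		/* plays the role of the static key */
static unsigned int arch_hits, generic_hits;

/* Toy checksum shared by both paths so that they agree on the output. */
static void toy_sum(const uint8_t *in, size_t in_len, uint8_t out[DIGEST_SIZE])
{
	uint32_t acc = 0;
	size_t i;

	for (i = 0; i < in_len; i++)
		acc = acc * 31 + in[i];
	for (i = 0; i < DIGEST_SIZE; i++)
		out[i] = (uint8_t)(acc >> (8 * i));
}

/* ---- what an arch header would contribute ---- */
#define demo_digest_arch demo_digest_arch
static bool demo_digest_arch(const uint8_t *in, size_t in_len,
			     uint8_t out[DIGEST_SIZE])
{
	if (!demo_have_accel)
		return false;	/* tell the caller to use the generic path */
	toy_sum(in, in_len, out);
	arch_hits++;
	return true;
}

/* ---- generic file: fallback stub, compiled only without an override ---- */
#ifndef demo_digest_arch
static inline bool demo_digest_arch(const uint8_t *in, size_t in_len,
				    uint8_t out[DIGEST_SIZE])
{
	return false;
}
#endif

static void demo_digest(const uint8_t *in, size_t in_len,
			uint8_t out[DIGEST_SIZE])
{
	if (demo_digest_arch(in, in_len, out))
		return;
	toy_sum(in, in_len, out);	/* stands in for init + update + final */
	generic_hits++;
}
```

Because the stub is a static inline returning a constant, an
architecture without an override should see the check compiled away,
leaving only the init + update + final sequence.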

* [PATCH v2 12/15] lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
  2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
                   ` (10 preceding siblings ...)
  2025-10-26  5:50 ` [PATCH v2 11/15] lib/crypto: sha3: Support arch overrides of one-shot digest functions Eric Biggers
@ 2025-10-26  5:50 ` Eric Biggers
  2025-10-26  5:50 ` [PATCH v2 13/15] crypto: jitterentropy - Use default sha3 implementation Eric Biggers
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Eric Biggers @ 2025-10-26  5:50 UTC (permalink / raw)
  To: linux-crypto
  Cc: David Howells, Ard Biesheuvel, Jason A . Donenfeld, Eric Biggers,
	Holger Dengler, Harald Freudenberger, Herbert Xu,
	linux-arm-kernel, linux-s390, linux-kernel

Some z/Architecture processors can compute a SHA-3 digest in a single
instruction.  Use this capability to implement the sha3_224(),
sha3_256(), sha3_384(), and sha3_512() library functions.

Note that the performance improvement is likely to be relatively small
and noticeable primarily on short messages, as the actual Keccak
permutation is already accelerated via the implementations of
sha3_absorb_blocks() and sha3_keccakf().  Nevertheless,
arch/s390/crypto/ takes advantage of the "do the full SHA-3" capability,
and it was requested that lib/crypto/ do so as well for parity with it.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 lib/crypto/s390/sha3.h | 67 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 65 insertions(+), 2 deletions(-)

diff --git a/lib/crypto/s390/sha3.h b/lib/crypto/s390/sha3.h
index 668e53da93d2c..85471404775a3 100644
--- a/lib/crypto/s390/sha3.h
+++ b/lib/crypto/s390/sha3.h
@@ -6,10 +6,11 @@
  */
 #include <asm/cpacf.h>
 #include <linux/cpufeature.h>
 
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_sha3);
+static __ro_after_init DEFINE_STATIC_KEY_FALSE(have_sha3_init_optim);
 
 static void sha3_absorb_blocks(struct sha3_state *state, const u8 *data,
 			       size_t nblocks, size_t block_size)
 {
 	if (static_branch_likely(&have_sha3)) {
@@ -58,10 +59,65 @@ static void sha3_keccakf(struct sha3_state *state)
 	} else {
 		sha3_keccakf_generic(state);
 	}
 }
 
+static inline bool s390_sha3(int func, const u8 *in, size_t in_len,
+			     u8 *out, size_t out_len)
+{
+	struct sha3_state state;
+
+	if (!static_branch_likely(&have_sha3))
+		return false;
+
+	if (static_branch_likely(&have_sha3_init_optim))
+		func |= CPACF_KLMD_NIP | CPACF_KLMD_DUFOP;
+	else
+		memset(&state, 0, sizeof(state));
+
+	cpacf_klmd(func, &state, in, in_len);
+
+	if (static_branch_likely(&have_sha3_init_optim))
+		kmsan_unpoison_memory(&state, out_len);
+
+	memcpy(out, &state, out_len);
+	memzero_explicit(&state, sizeof(state));
+	return true;
+}
+
+#define sha3_224_arch sha3_224_arch
+static bool sha3_224_arch(const u8 *in, size_t in_len,
+			  u8 out[SHA3_224_DIGEST_SIZE])
+{
+	return s390_sha3(CPACF_KLMD_SHA3_224, in, in_len,
+			 out, SHA3_224_DIGEST_SIZE);
+}
+
+#define sha3_256_arch sha3_256_arch
+static bool sha3_256_arch(const u8 *in, size_t in_len,
+			  u8 out[SHA3_256_DIGEST_SIZE])
+{
+	return s390_sha3(CPACF_KLMD_SHA3_256, in, in_len,
+			 out, SHA3_256_DIGEST_SIZE);
+}
+
+#define sha3_384_arch sha3_384_arch
+static bool sha3_384_arch(const u8 *in, size_t in_len,
+			  u8 out[SHA3_384_DIGEST_SIZE])
+{
+	return s390_sha3(CPACF_KLMD_SHA3_384, in, in_len,
+			 out, SHA3_384_DIGEST_SIZE);
+}
+
+#define sha3_512_arch sha3_512_arch
+static bool sha3_512_arch(const u8 *in, size_t in_len,
+			  u8 out[SHA3_512_DIGEST_SIZE])
+{
+	return s390_sha3(CPACF_KLMD_SHA3_512, in, in_len,
+			 out, SHA3_512_DIGEST_SIZE);
+}
+
 #define sha3_mod_init_arch sha3_mod_init_arch
 static void sha3_mod_init_arch(void)
 {
 	int num_present = 0;
 	int num_possible = 0;
@@ -77,12 +133,19 @@ static void sha3_mod_init_arch(void)
 	({ num_present += !!cpacf_query_func(opcode, func); num_possible++; })
 	QUERY(CPACF_KIMD, CPACF_KIMD_SHA3_224);
 	QUERY(CPACF_KIMD, CPACF_KIMD_SHA3_256);
 	QUERY(CPACF_KIMD, CPACF_KIMD_SHA3_384);
 	QUERY(CPACF_KIMD, CPACF_KIMD_SHA3_512);
+	QUERY(CPACF_KLMD, CPACF_KLMD_SHA3_224);
+	QUERY(CPACF_KLMD, CPACF_KLMD_SHA3_256);
+	QUERY(CPACF_KLMD, CPACF_KLMD_SHA3_384);
+	QUERY(CPACF_KLMD, CPACF_KLMD_SHA3_512);
 #undef QUERY
 
-	if (num_present == num_possible)
+	if (num_present == num_possible) {
 		static_branch_enable(&have_sha3);
-	else if (num_present != 0)
+		if (test_facility(86))
+			static_branch_enable(&have_sha3_init_optim);
+	} else if (num_present != 0) {
 		pr_warn("Unsupported combination of SHA-3 facilities\n");
+	}
 }
-- 
2.51.1.dirty



^ permalink raw reply related	[flat|nested] 34+ messages in thread
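
The s390 sha3_mod_init_arch() treats the SHA-3 facilities as all or
nothing: it counts how many of the queried functions are present,
enables the static key only when every one is, and warns on a partial
set.  The decision logic can be sketched in isolation as below.  The
enum and classify_facilities() are hypothetical names for this
illustration; the real code uses the QUERY() macro and
cpacf_query_func() instead of a bool array.

```c
#include <stdbool.h>
#include <stddef.h>

enum probe_result {
	PROBE_NONE,	/* nothing present: stay on the generic code */
	PROBE_ALL,	/* everything present: enable the fast path */
	PROBE_PARTIAL,	/* unexpected mix: warn and stay generic */
};

/* present[i] says whether the i-th queried function is available. */
static enum probe_result classify_facilities(const bool *present, size_t n)
{
	size_t num_present = 0;
	size_t i;

	for (i = 0; i < n; i++)
		num_present += present[i] ? 1 : 0;

	if (num_present == n)
		return PROBE_ALL;
	if (num_present != 0)
		return PROBE_PARTIAL;
	return PROBE_NONE;
}
```

After patch 12 the list has eight entries (four KIMD and four KLMD
function codes), so a machine offering only the KIMD half would land in
the PROBE_PARTIAL case and keep using the generic code.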

* [PATCH v2 13/15] crypto: jitterentropy - Use default sha3 implementation
  2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
                   ` (11 preceding siblings ...)
  2025-10-26  5:50 ` [PATCH v2 12/15] lib/crypto: s390/sha3: Add optimized one-shot SHA-3 " Eric Biggers
@ 2025-10-26  5:50 ` Eric Biggers
  2025-10-26  5:50 ` [PATCH v2 14/15] crypto: sha3 - Reimplement using library API Eric Biggers
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Eric Biggers @ 2025-10-26  5:50 UTC (permalink / raw)
  To: linux-crypto
  Cc: David Howells, Ard Biesheuvel, Jason A . Donenfeld, Eric Biggers,
	Holger Dengler, Harald Freudenberger, Herbert Xu,
	linux-arm-kernel, linux-s390, linux-kernel

Make jitterentropy use "sha3-256" instead of "sha3-256-generic", as the
ability to explicitly request the generic code is going away.  It's not
worth providing a special generic API just for jitterentropy.  There are
many other solutions available to it, such as doing more iterations or
using a more effective jitter collection method.

Moreover, the status quo is that SHA-3 is quite slow anyway.  Currently
only arm64 and s390 have architecture-optimized SHA-3 code.  I'm not
familiar with the performance of the s390 one, but the arm64 one isn't
actually that much faster than the generic code.

Note that jitterentropy should just use the library API instead of
crypto_shash.  But that belongs in a separate change later.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 crypto/jitterentropy-kcapi.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/crypto/jitterentropy-kcapi.c b/crypto/jitterentropy-kcapi.c
index a53de7affe8d1..7c880cf34c523 100644
--- a/crypto/jitterentropy-kcapi.c
+++ b/crypto/jitterentropy-kcapi.c
@@ -46,11 +46,11 @@
 #include <linux/time.h>
 #include <crypto/internal/rng.h>
 
 #include "jitterentropy.h"
 
-#define JENT_CONDITIONING_HASH	"sha3-256-generic"
+#define JENT_CONDITIONING_HASH	"sha3-256"
 
 /***************************************************************************
  * Helper function
  ***************************************************************************/
 
@@ -228,19 +228,11 @@ static int jent_kcapi_init(struct crypto_tfm *tfm)
 	struct shash_desc *sdesc;
 	int size, ret = 0;
 
 	spin_lock_init(&rng->jent_lock);
 
-	/*
-	 * Use SHA3-256 as conditioner. We allocate only the generic
-	 * implementation as we are not interested in high-performance. The
-	 * execution time of the SHA3 operation is measured and adds to the
-	 * Jitter RNG's unpredictable behavior. If we have a slower hash
-	 * implementation, the execution timing variations are larger. When
-	 * using a fast implementation, we would need to call it more often
-	 * as its variations are lower.
-	 */
+	/* Use SHA3-256 as conditioner */
 	hash = crypto_alloc_shash(JENT_CONDITIONING_HASH, 0, 0);
 	if (IS_ERR(hash)) {
 		pr_err("Cannot allocate conditioning digest\n");
 		return PTR_ERR(hash);
 	}
-- 
2.51.1.dirty



^ permalink raw reply related	[flat|nested] 34+ messages in thread
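
Patch 13 switches jitterentropy from a driver name ("sha3-256-generic")
to an algorithm name ("sha3-256").  The difference matters because a
driver name selects one specific implementation, while an algorithm
name selects whichever registered implementation has the highest
priority.  The following is a simplified userspace model of that
lookup, not the kernel's actual code: struct demo_alg, demo_lookup()
and both tables are hypothetical, and the priority values are made up
for the example.

```c
#include <stddef.h>
#include <string.h>

struct demo_alg {
	const char *name;		/* like cra_name */
	const char *driver_name;	/* like cra_driver_name */
	int priority;			/* like cra_priority */
};

/* Hypothetical registry before the series (priorities are illustrative). */
static const struct demo_alg demo_before[] = {
	{ "sha3-256", "sha3-256-generic", 100 },
	{ "sha3-256", "sha3-256-ce",      200 },
};

/* Hypothetical registry after patch 14 renames the driver to "-lib". */
static const struct demo_alg demo_after[] = {
	{ "sha3-256", "sha3-256-lib", 100 },
};

/*
 * An exact driver-name match wins outright; otherwise the highest
 * priority entry with a matching algorithm name is returned.
 */
static const struct demo_alg *demo_lookup(const struct demo_alg *algs,
					  size_t n, const char *requested)
{
	const struct demo_alg *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(algs[i].driver_name, requested) == 0)
			return &algs[i];
		if (strcmp(algs[i].name, requested) == 0 &&
		    (!best || algs[i].priority > best->priority))
			best = &algs[i];
	}
	return best;	/* NULL if nothing matched */
}
```

In this model a request for "sha3-256-generic" against demo_after
returns NULL, which is the failure the patch avoids by asking for
"sha3-256" instead.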

* [PATCH v2 14/15] crypto: sha3 - Reimplement using library API
  2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
                   ` (12 preceding siblings ...)
  2025-10-26  5:50 ` [PATCH v2 13/15] crypto: jitterentropy - Use default sha3 implementation Eric Biggers
@ 2025-10-26  5:50 ` Eric Biggers
  2025-10-26  5:50 ` [PATCH v2 15/15] crypto: s390/sha3 - Remove superseded SHA-3 code Eric Biggers
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Eric Biggers @ 2025-10-26  5:50 UTC (permalink / raw)
  To: linux-crypto
  Cc: David Howells, Ard Biesheuvel, Jason A . Donenfeld, Eric Biggers,
	Holger Dengler, Harald Freudenberger, Herbert Xu,
	linux-arm-kernel, linux-s390, linux-kernel

Replace sha3_generic.c with a new file sha3.c which implements the SHA-3
crypto_shash algorithms on top of the SHA-3 library API.

Change the driver name suffix from "-generic" to "-lib" to reflect that
these algorithms now just use the (possibly arch-optimized) library.

This closely mirrors crypto/{md5,sha1,sha256,sha512,blake2b}.c.

Implement export_core and import_core, since crypto/hmac.c expects these
to be present.  (Note that there is no security purpose in wrapping
SHA-3 with HMAC.  HMAC was designed for older algorithms that don't
resist length extension attacks.  But since someone could be using
"hmac(sha3-*)" via crypto_shash anyway, keep supporting it for now.)

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 crypto/Kconfig        |   1 +
 crypto/Makefile       |   2 +-
 crypto/sha3.c         | 166 ++++++++++++++++++++++++
 crypto/sha3_generic.c | 290 ------------------------------------------
 crypto/testmgr.c      |   8 ++
 include/crypto/sha3.h |   6 -
 6 files changed, 176 insertions(+), 297 deletions(-)
 create mode 100644 crypto/sha3.c
 delete mode 100644 crypto/sha3_generic.c

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 0a7e74ac870b0..57b85e903cf0b 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1004,10 +1004,11 @@ config CRYPTO_SHA512
 	  10118-3), including HMAC support.
 
 config CRYPTO_SHA3
 	tristate "SHA-3"
 	select CRYPTO_HASH
+	select CRYPTO_LIB_SHA3
 	help
 	  SHA-3 secure hash algorithms (FIPS 202, ISO/IEC 10118-3)
 
 config CRYPTO_SM3_GENERIC
 	tristate "SM3 (ShangMi 3)"
diff --git a/crypto/Makefile b/crypto/Makefile
index 5b02ca2cb04e0..0388ff8d219d1 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -76,11 +76,11 @@ obj-$(CONFIG_CRYPTO_MD4) += md4.o
 obj-$(CONFIG_CRYPTO_MD5) += md5.o
 obj-$(CONFIG_CRYPTO_RMD160) += rmd160.o
 obj-$(CONFIG_CRYPTO_SHA1) += sha1.o
 obj-$(CONFIG_CRYPTO_SHA256) += sha256.o
 obj-$(CONFIG_CRYPTO_SHA512) += sha512.o
-obj-$(CONFIG_CRYPTO_SHA3) += sha3_generic.o
+obj-$(CONFIG_CRYPTO_SHA3) += sha3.o
 obj-$(CONFIG_CRYPTO_SM3_GENERIC) += sm3_generic.o
 obj-$(CONFIG_CRYPTO_STREEBOG) += streebog_generic.o
 obj-$(CONFIG_CRYPTO_WP512) += wp512.o
 CFLAGS_wp512.o := $(call cc-option,-fno-schedule-insns)  # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
 obj-$(CONFIG_CRYPTO_BLAKE2B) += blake2b.o
diff --git a/crypto/sha3.c b/crypto/sha3.c
new file mode 100644
index 0000000000000..8f364979ec890
--- /dev/null
+++ b/crypto/sha3.c
@@ -0,0 +1,166 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Crypto API support for SHA-3
+ * (https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf)
+ */
+#include <crypto/internal/hash.h>
+#include <crypto/sha3.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+#define SHA3_CTX(desc) ((struct sha3_ctx *)shash_desc_ctx(desc))
+
+static int crypto_sha3_224_init(struct shash_desc *desc)
+{
+	sha3_224_init(SHA3_CTX(desc));
+	return 0;
+}
+
+static int crypto_sha3_256_init(struct shash_desc *desc)
+{
+	sha3_256_init(SHA3_CTX(desc));
+	return 0;
+}
+
+static int crypto_sha3_384_init(struct shash_desc *desc)
+{
+	sha3_384_init(SHA3_CTX(desc));
+	return 0;
+}
+
+static int crypto_sha3_512_init(struct shash_desc *desc)
+{
+	sha3_512_init(SHA3_CTX(desc));
+	return 0;
+}
+
+static int crypto_sha3_update(struct shash_desc *desc, const u8 *data,
+			      unsigned int len)
+{
+	sha3_update(SHA3_CTX(desc), data, len);
+	return 0;
+}
+
+static int crypto_sha3_final(struct shash_desc *desc, u8 *out)
+{
+	sha3_final(SHA3_CTX(desc), out);
+	return 0;
+}
+
+static int crypto_sha3_224_digest(struct shash_desc *desc,
+				  const u8 *data, unsigned int len, u8 *out)
+{
+	sha3_224(data, len, out);
+	return 0;
+}
+
+static int crypto_sha3_256_digest(struct shash_desc *desc,
+				  const u8 *data, unsigned int len, u8 *out)
+{
+	sha3_256(data, len, out);
+	return 0;
+}
+
+static int crypto_sha3_384_digest(struct shash_desc *desc,
+				  const u8 *data, unsigned int len, u8 *out)
+{
+	sha3_384(data, len, out);
+	return 0;
+}
+
+static int crypto_sha3_512_digest(struct shash_desc *desc,
+				  const u8 *data, unsigned int len, u8 *out)
+{
+	sha3_512(data, len, out);
+	return 0;
+}
+
+static int crypto_sha3_export_core(struct shash_desc *desc, void *out)
+{
+	memcpy(out, SHA3_CTX(desc), sizeof(struct sha3_ctx));
+	return 0;
+}
+
+static int crypto_sha3_import_core(struct shash_desc *desc, const void *in)
+{
+	memcpy(SHA3_CTX(desc), in, sizeof(struct sha3_ctx));
+	return 0;
+}
+
+static struct shash_alg algs[] = { {
+	.digestsize		= SHA3_224_DIGEST_SIZE,
+	.init			= crypto_sha3_224_init,
+	.update			= crypto_sha3_update,
+	.final			= crypto_sha3_final,
+	.digest			= crypto_sha3_224_digest,
+	.export_core		= crypto_sha3_export_core,
+	.import_core		= crypto_sha3_import_core,
+	.descsize		= sizeof(struct sha3_ctx),
+	.base.cra_name		= "sha3-224",
+	.base.cra_driver_name	= "sha3-224-lib",
+	.base.cra_blocksize	= SHA3_224_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+}, {
+	.digestsize		= SHA3_256_DIGEST_SIZE,
+	.init			= crypto_sha3_256_init,
+	.update			= crypto_sha3_update,
+	.final			= crypto_sha3_final,
+	.digest			= crypto_sha3_256_digest,
+	.export_core		= crypto_sha3_export_core,
+	.import_core		= crypto_sha3_import_core,
+	.descsize		= sizeof(struct sha3_ctx),
+	.base.cra_name		= "sha3-256",
+	.base.cra_driver_name	= "sha3-256-lib",
+	.base.cra_blocksize	= SHA3_256_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+}, {
+	.digestsize		= SHA3_384_DIGEST_SIZE,
+	.init			= crypto_sha3_384_init,
+	.update			= crypto_sha3_update,
+	.final			= crypto_sha3_final,
+	.digest			= crypto_sha3_384_digest,
+	.export_core		= crypto_sha3_export_core,
+	.import_core		= crypto_sha3_import_core,
+	.descsize		= sizeof(struct sha3_ctx),
+	.base.cra_name		= "sha3-384",
+	.base.cra_driver_name	= "sha3-384-lib",
+	.base.cra_blocksize	= SHA3_384_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+}, {
+	.digestsize		= SHA3_512_DIGEST_SIZE,
+	.init			= crypto_sha3_512_init,
+	.update			= crypto_sha3_update,
+	.final			= crypto_sha3_final,
+	.digest			= crypto_sha3_512_digest,
+	.export_core		= crypto_sha3_export_core,
+	.import_core		= crypto_sha3_import_core,
+	.descsize		= sizeof(struct sha3_ctx),
+	.base.cra_name		= "sha3-512",
+	.base.cra_driver_name	= "sha3-512-lib",
+	.base.cra_blocksize	= SHA3_512_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+} };
+
+static int __init crypto_sha3_mod_init(void)
+{
+	return crypto_register_shashes(algs, ARRAY_SIZE(algs));
+}
+module_init(crypto_sha3_mod_init);
+
+static void __exit crypto_sha3_mod_exit(void)
+{
+	crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
+}
+module_exit(crypto_sha3_mod_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Crypto API support for SHA-3");
+
+MODULE_ALIAS_CRYPTO("sha3-224");
+MODULE_ALIAS_CRYPTO("sha3-224-lib");
+MODULE_ALIAS_CRYPTO("sha3-256");
+MODULE_ALIAS_CRYPTO("sha3-256-lib");
+MODULE_ALIAS_CRYPTO("sha3-384");
+MODULE_ALIAS_CRYPTO("sha3-384-lib");
+MODULE_ALIAS_CRYPTO("sha3-512");
+MODULE_ALIAS_CRYPTO("sha3-512-lib");
diff --git a/crypto/sha3_generic.c b/crypto/sha3_generic.c
deleted file mode 100644
index 41d1e506e6dea..0000000000000
--- a/crypto/sha3_generic.c
+++ /dev/null
@@ -1,290 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * Cryptographic API.
- *
- * SHA-3, as specified in
- * https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf
- *
- * SHA-3 code by Jeff Garzik <jeff@garzik.org>
- *               Ard Biesheuvel <ard.biesheuvel@linaro.org>
- */
-#include <crypto/internal/hash.h>
-#include <crypto/sha3.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/string.h>
-#include <linux/unaligned.h>
-
-/*
- * On some 32-bit architectures (h8300), GCC ends up using
- * over 1 KB of stack if we inline the round calculation into the loop
- * in keccakf(). On the other hand, on 64-bit architectures with plenty
- * of [64-bit wide] general purpose registers, not inlining it severely
- * hurts performance. So let's use 64-bitness as a heuristic to decide
- * whether to inline or not.
- */
-#ifdef CONFIG_64BIT
-#define SHA3_INLINE	inline
-#else
-#define SHA3_INLINE	noinline
-#endif
-
-#define KECCAK_ROUNDS 24
-
-static const u64 keccakf_rndc[24] = {
-	0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
-	0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
-	0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
-	0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
-	0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
-	0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
-	0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
-	0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL
-};
-
-/* update the state with given number of rounds */
-
-static SHA3_INLINE void keccakf_round(u64 st[25])
-{
-	u64 t[5], tt, bc[5];
-
-	/* Theta */
-	bc[0] = st[0] ^ st[5] ^ st[10] ^ st[15] ^ st[20];
-	bc[1] = st[1] ^ st[6] ^ st[11] ^ st[16] ^ st[21];
-	bc[2] = st[2] ^ st[7] ^ st[12] ^ st[17] ^ st[22];
-	bc[3] = st[3] ^ st[8] ^ st[13] ^ st[18] ^ st[23];
-	bc[4] = st[4] ^ st[9] ^ st[14] ^ st[19] ^ st[24];
-
-	t[0] = bc[4] ^ rol64(bc[1], 1);
-	t[1] = bc[0] ^ rol64(bc[2], 1);
-	t[2] = bc[1] ^ rol64(bc[3], 1);
-	t[3] = bc[2] ^ rol64(bc[4], 1);
-	t[4] = bc[3] ^ rol64(bc[0], 1);
-
-	st[0] ^= t[0];
-
-	/* Rho Pi */
-	tt = st[1];
-	st[ 1] = rol64(st[ 6] ^ t[1], 44);
-	st[ 6] = rol64(st[ 9] ^ t[4], 20);
-	st[ 9] = rol64(st[22] ^ t[2], 61);
-	st[22] = rol64(st[14] ^ t[4], 39);
-	st[14] = rol64(st[20] ^ t[0], 18);
-	st[20] = rol64(st[ 2] ^ t[2], 62);
-	st[ 2] = rol64(st[12] ^ t[2], 43);
-	st[12] = rol64(st[13] ^ t[3], 25);
-	st[13] = rol64(st[19] ^ t[4],  8);
-	st[19] = rol64(st[23] ^ t[3], 56);
-	st[23] = rol64(st[15] ^ t[0], 41);
-	st[15] = rol64(st[ 4] ^ t[4], 27);
-	st[ 4] = rol64(st[24] ^ t[4], 14);
-	st[24] = rol64(st[21] ^ t[1],  2);
-	st[21] = rol64(st[ 8] ^ t[3], 55);
-	st[ 8] = rol64(st[16] ^ t[1], 45);
-	st[16] = rol64(st[ 5] ^ t[0], 36);
-	st[ 5] = rol64(st[ 3] ^ t[3], 28);
-	st[ 3] = rol64(st[18] ^ t[3], 21);
-	st[18] = rol64(st[17] ^ t[2], 15);
-	st[17] = rol64(st[11] ^ t[1], 10);
-	st[11] = rol64(st[ 7] ^ t[2],  6);
-	st[ 7] = rol64(st[10] ^ t[0],  3);
-	st[10] = rol64(    tt ^ t[1],  1);
-
-	/* Chi */
-	bc[ 0] = ~st[ 1] & st[ 2];
-	bc[ 1] = ~st[ 2] & st[ 3];
-	bc[ 2] = ~st[ 3] & st[ 4];
-	bc[ 3] = ~st[ 4] & st[ 0];
-	bc[ 4] = ~st[ 0] & st[ 1];
-	st[ 0] ^= bc[ 0];
-	st[ 1] ^= bc[ 1];
-	st[ 2] ^= bc[ 2];
-	st[ 3] ^= bc[ 3];
-	st[ 4] ^= bc[ 4];
-
-	bc[ 0] = ~st[ 6] & st[ 7];
-	bc[ 1] = ~st[ 7] & st[ 8];
-	bc[ 2] = ~st[ 8] & st[ 9];
-	bc[ 3] = ~st[ 9] & st[ 5];
-	bc[ 4] = ~st[ 5] & st[ 6];
-	st[ 5] ^= bc[ 0];
-	st[ 6] ^= bc[ 1];
-	st[ 7] ^= bc[ 2];
-	st[ 8] ^= bc[ 3];
-	st[ 9] ^= bc[ 4];
-
-	bc[ 0] = ~st[11] & st[12];
-	bc[ 1] = ~st[12] & st[13];
-	bc[ 2] = ~st[13] & st[14];
-	bc[ 3] = ~st[14] & st[10];
-	bc[ 4] = ~st[10] & st[11];
-	st[10] ^= bc[ 0];
-	st[11] ^= bc[ 1];
-	st[12] ^= bc[ 2];
-	st[13] ^= bc[ 3];
-	st[14] ^= bc[ 4];
-
-	bc[ 0] = ~st[16] & st[17];
-	bc[ 1] = ~st[17] & st[18];
-	bc[ 2] = ~st[18] & st[19];
-	bc[ 3] = ~st[19] & st[15];
-	bc[ 4] = ~st[15] & st[16];
-	st[15] ^= bc[ 0];
-	st[16] ^= bc[ 1];
-	st[17] ^= bc[ 2];
-	st[18] ^= bc[ 3];
-	st[19] ^= bc[ 4];
-
-	bc[ 0] = ~st[21] & st[22];
-	bc[ 1] = ~st[22] & st[23];
-	bc[ 2] = ~st[23] & st[24];
-	bc[ 3] = ~st[24] & st[20];
-	bc[ 4] = ~st[20] & st[21];
-	st[20] ^= bc[ 0];
-	st[21] ^= bc[ 1];
-	st[22] ^= bc[ 2];
-	st[23] ^= bc[ 3];
-	st[24] ^= bc[ 4];
-}
-
-static void keccakf(u64 st[25])
-{
-	int round;
-
-	for (round = 0; round < KECCAK_ROUNDS; round++) {
-		keccakf_round(st);
-		/* Iota */
-		st[0] ^= keccakf_rndc[round];
-	}
-}
-
-int crypto_sha3_init(struct shash_desc *desc)
-{
-	struct sha3_state *sctx = shash_desc_ctx(desc);
-
-	memset(sctx->st, 0, sizeof(sctx->st));
-	return 0;
-}
-EXPORT_SYMBOL(crypto_sha3_init);
-
-static int crypto_sha3_update(struct shash_desc *desc, const u8 *data,
-			      unsigned int len)
-{
-	unsigned int rsiz = crypto_shash_blocksize(desc->tfm);
-	struct sha3_state *sctx = shash_desc_ctx(desc);
-	unsigned int rsizw = rsiz / 8;
-
-	do {
-		int i;
-
-		for (i = 0; i < rsizw; i++)
-			sctx->st[i] ^= get_unaligned_le64(data + 8 * i);
-		keccakf(sctx->st);
-
-		data += rsiz;
-		len -= rsiz;
-	} while (len >= rsiz);
-	return len;
-}
-
-static int crypto_sha3_finup(struct shash_desc *desc, const u8 *src,
-			     unsigned int len, u8 *out)
-{
-	unsigned int digest_size = crypto_shash_digestsize(desc->tfm);
-	unsigned int rsiz = crypto_shash_blocksize(desc->tfm);
-	struct sha3_state *sctx = shash_desc_ctx(desc);
-	__le64 block[SHA3_224_BLOCK_SIZE / 8] = {};
-	__le64 *digest = (__le64 *)out;
-	unsigned int rsizw = rsiz / 8;
-	u8 *p;
-	int i;
-
-	p = memcpy(block, src, len);
-	p[len++] = 0x06;
-	p[rsiz - 1] |= 0x80;
-
-	for (i = 0; i < rsizw; i++)
-		sctx->st[i] ^= le64_to_cpu(block[i]);
-	memzero_explicit(block, sizeof(block));
-
-	keccakf(sctx->st);
-
-	for (i = 0; i < digest_size / 8; i++)
-		put_unaligned_le64(sctx->st[i], digest++);
-
-	if (digest_size & 4)
-		put_unaligned_le32(sctx->st[i], (__le32 *)digest);
-
-	return 0;
-}
-
-static struct shash_alg algs[] = { {
-	.digestsize		= SHA3_224_DIGEST_SIZE,
-	.init			= crypto_sha3_init,
-	.update			= crypto_sha3_update,
-	.finup			= crypto_sha3_finup,
-	.descsize		= SHA3_STATE_SIZE,
-	.base.cra_name		= "sha3-224",
-	.base.cra_driver_name	= "sha3-224-generic",
-	.base.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
-	.base.cra_blocksize	= SHA3_224_BLOCK_SIZE,
-	.base.cra_module	= THIS_MODULE,
-}, {
-	.digestsize		= SHA3_256_DIGEST_SIZE,
-	.init			= crypto_sha3_init,
-	.update			= crypto_sha3_update,
-	.finup			= crypto_sha3_finup,
-	.descsize		= SHA3_STATE_SIZE,
-	.base.cra_name		= "sha3-256",
-	.base.cra_driver_name	= "sha3-256-generic",
-	.base.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
-	.base.cra_blocksize	= SHA3_256_BLOCK_SIZE,
-	.base.cra_module	= THIS_MODULE,
-}, {
-	.digestsize		= SHA3_384_DIGEST_SIZE,
-	.init			= crypto_sha3_init,
-	.update			= crypto_sha3_update,
-	.finup			= crypto_sha3_finup,
-	.descsize		= SHA3_STATE_SIZE,
-	.base.cra_name		= "sha3-384",
-	.base.cra_driver_name	= "sha3-384-generic",
-	.base.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
-	.base.cra_blocksize	= SHA3_384_BLOCK_SIZE,
-	.base.cra_module	= THIS_MODULE,
-}, {
-	.digestsize		= SHA3_512_DIGEST_SIZE,
-	.init			= crypto_sha3_init,
-	.update			= crypto_sha3_update,
-	.finup			= crypto_sha3_finup,
-	.descsize		= SHA3_STATE_SIZE,
-	.base.cra_name		= "sha3-512",
-	.base.cra_driver_name	= "sha3-512-generic",
-	.base.cra_flags		= CRYPTO_AHASH_ALG_BLOCK_ONLY,
-	.base.cra_blocksize	= SHA3_512_BLOCK_SIZE,
-	.base.cra_module	= THIS_MODULE,
-} };
-
-static int __init sha3_generic_mod_init(void)
-{
-	return crypto_register_shashes(algs, ARRAY_SIZE(algs));
-}
-
-static void __exit sha3_generic_mod_fini(void)
-{
-	crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
-}
-
-module_init(sha3_generic_mod_init);
-module_exit(sha3_generic_mod_fini);
-
-MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("SHA-3 Secure Hash Algorithm");
-
-MODULE_ALIAS_CRYPTO("sha3-224");
-MODULE_ALIAS_CRYPTO("sha3-224-generic");
-MODULE_ALIAS_CRYPTO("sha3-256");
-MODULE_ALIAS_CRYPTO("sha3-256-generic");
-MODULE_ALIAS_CRYPTO("sha3-384");
-MODULE_ALIAS_CRYPTO("sha3-384-generic");
-MODULE_ALIAS_CRYPTO("sha3-512");
-MODULE_ALIAS_CRYPTO("sha3-512-generic");
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 3ab7adc1cdce5..90d06c3ec9679 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -5102,31 +5102,35 @@ static const struct alg_test_desc alg_test_descs[] = {
 		.suite = {
 			.hash = __VECS(hmac_sha256_tv_template)
 		}
 	}, {
 		.alg = "hmac(sha3-224)",
+		.generic_driver = "hmac(sha3-224-lib)",
 		.test = alg_test_hash,
 		.fips_allowed = 1,
 		.suite = {
 			.hash = __VECS(hmac_sha3_224_tv_template)
 		}
 	}, {
 		.alg = "hmac(sha3-256)",
+		.generic_driver = "hmac(sha3-256-lib)",
 		.test = alg_test_hash,
 		.fips_allowed = 1,
 		.suite = {
 			.hash = __VECS(hmac_sha3_256_tv_template)
 		}
 	}, {
 		.alg = "hmac(sha3-384)",
+		.generic_driver = "hmac(sha3-384-lib)",
 		.test = alg_test_hash,
 		.fips_allowed = 1,
 		.suite = {
 			.hash = __VECS(hmac_sha3_384_tv_template)
 		}
 	}, {
 		.alg = "hmac(sha3-512)",
+		.generic_driver = "hmac(sha3-512-lib)",
 		.test = alg_test_hash,
 		.fips_allowed = 1,
 		.suite = {
 			.hash = __VECS(hmac_sha3_512_tv_template)
 		}
@@ -5476,31 +5480,35 @@ static const struct alg_test_desc alg_test_descs[] = {
 		.suite = {
 			.hash = __VECS(sha256_tv_template)
 		}
 	}, {
 		.alg = "sha3-224",
+		.generic_driver = "sha3-224-lib",
 		.test = alg_test_hash,
 		.fips_allowed = 1,
 		.suite = {
 			.hash = __VECS(sha3_224_tv_template)
 		}
 	}, {
 		.alg = "sha3-256",
+		.generic_driver = "sha3-256-lib",
 		.test = alg_test_hash,
 		.fips_allowed = 1,
 		.suite = {
 			.hash = __VECS(sha3_256_tv_template)
 		}
 	}, {
 		.alg = "sha3-384",
+		.generic_driver = "sha3-384-lib",
 		.test = alg_test_hash,
 		.fips_allowed = 1,
 		.suite = {
 			.hash = __VECS(sha3_384_tv_template)
 		}
 	}, {
 		.alg = "sha3-512",
+		.generic_driver = "sha3-512-lib",
 		.test = alg_test_hash,
 		.fips_allowed = 1,
 		.suite = {
 			.hash = __VECS(sha3_512_tv_template)
 		}
diff --git a/include/crypto/sha3.h b/include/crypto/sha3.h
index a7503dfc1a044..d713b5e3d6956 100644
--- a/include/crypto/sha3.h
+++ b/include/crypto/sha3.h
@@ -35,14 +35,10 @@
 #define SHAKE256_DEFAULT_SIZE	(256 / 8)
 #define SHAKE256_BLOCK_SIZE	(200 - 2 * SHAKE256_DEFAULT_SIZE)
 
 #define SHA3_STATE_SIZE		200
 
-struct shash_desc;
-
-int crypto_sha3_init(struct shash_desc *desc);
-
 /*
  * State for the Keccak-f[1600] permutation: 25 64-bit words.
  *
  * We usually keep the state words as little-endian, to make absorbing and
  * squeezing easier.  (It means that absorbing and squeezing can just treat the
@@ -50,12 +46,10 @@ int crypto_sha3_init(struct shash_desc *desc);
  * temporarily by implementations of the permutation that need native-endian
  * words.  Of course, that conversion is a no-op on little-endian machines.
  */
 struct sha3_state {
 	union {
-		u64 st[SHA3_STATE_SIZE / 8]; /* temporarily retained for compatibility purposes */
-
 		__le64 words[SHA3_STATE_SIZE / 8];
 		u8 bytes[SHA3_STATE_SIZE];
 
 		u64 native_words[SHA3_STATE_SIZE / 8]; /* see comment above */
 	};
-- 
2.51.1.dirty




* [PATCH v2 15/15] crypto: s390/sha3 - Remove superseded SHA-3 code
  2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
                   ` (13 preceding siblings ...)
  2025-10-26  5:50 ` [PATCH v2 14/15] crypto: sha3 - Reimplement using library API Eric Biggers
@ 2025-10-26  5:50 ` Eric Biggers
  2025-10-29  9:30 ` [PATCH v2 00/15] SHA-3 library Harald Freudenberger
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Eric Biggers @ 2025-10-26  5:50 UTC (permalink / raw)
  To: linux-crypto
  Cc: David Howells, Ard Biesheuvel, Jason A . Donenfeld, Eric Biggers,
	Holger Dengler, Harald Freudenberger, Herbert Xu,
	linux-arm-kernel, linux-s390, linux-kernel

The SHA-3 library now utilizes the same s390 SHA-3 acceleration
capabilities as the arch/s390/crypto/ SHA-3 crypto_shash algorithms.
Moreover, crypto/sha3.c now uses the SHA-3 library.  The result is that
all SHA-3 APIs are now s390-accelerated without any need for the old
SHA-3 code in arch/s390/crypto/.  Remove this superseded code.

Also update the s390 defconfig and debug_defconfig files to enable
CONFIG_CRYPTO_SHA3 instead of CONFIG_CRYPTO_SHA3_256_S390 and
CONFIG_CRYPTO_SHA3_512_S390.  This ensures that the s390-optimized
SHA-3 code continues to be built when either of these defconfigs is used.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 arch/s390/configs/debug_defconfig |   3 +-
 arch/s390/configs/defconfig       |   3 +-
 arch/s390/crypto/Kconfig          |  20 ----
 arch/s390/crypto/Makefile         |   2 -
 arch/s390/crypto/sha.h            |  51 ----------
 arch/s390/crypto/sha3_256_s390.c  | 157 ------------------------------
 arch/s390/crypto/sha3_512_s390.c  | 157 ------------------------------
 arch/s390/crypto/sha_common.c     | 117 ----------------------
 8 files changed, 2 insertions(+), 508 deletions(-)
 delete mode 100644 arch/s390/crypto/sha.h
 delete mode 100644 arch/s390/crypto/sha3_256_s390.c
 delete mode 100644 arch/s390/crypto/sha3_512_s390.c
 delete mode 100644 arch/s390/crypto/sha_common.c

diff --git a/arch/s390/configs/debug_defconfig b/arch/s390/configs/debug_defconfig
index b31c1df902577..5fdfebcfd50f2 100644
--- a/arch/s390/configs/debug_defconfig
+++ b/arch/s390/configs/debug_defconfig
@@ -790,10 +790,11 @@ CONFIG_CRYPTO_GCM=y
 CONFIG_CRYPTO_SEQIV=y
 CONFIG_CRYPTO_MD4=m
 CONFIG_CRYPTO_MD5=y
 CONFIG_CRYPTO_MICHAEL_MIC=m
 CONFIG_CRYPTO_RMD160=m
+CONFIG_CRYPTO_SHA3=m
 CONFIG_CRYPTO_SM3_GENERIC=m
 CONFIG_CRYPTO_WP512=m
 CONFIG_CRYPTO_XCBC=m
 CONFIG_CRYPTO_CRC32=m
 CONFIG_CRYPTO_842=m
@@ -803,12 +804,10 @@ CONFIG_CRYPTO_ZSTD=m
 CONFIG_CRYPTO_ANSI_CPRNG=m
 CONFIG_CRYPTO_USER_API_HASH=m
 CONFIG_CRYPTO_USER_API_SKCIPHER=m
 CONFIG_CRYPTO_USER_API_RNG=m
 CONFIG_CRYPTO_USER_API_AEAD=m
-CONFIG_CRYPTO_SHA3_256_S390=m
-CONFIG_CRYPTO_SHA3_512_S390=m
 CONFIG_CRYPTO_GHASH_S390=m
 CONFIG_CRYPTO_AES_S390=m
 CONFIG_CRYPTO_DES_S390=m
 CONFIG_CRYPTO_HMAC_S390=m
 CONFIG_ZCRYPT=m
diff --git a/arch/s390/configs/defconfig b/arch/s390/configs/defconfig
index 161dad7ef211a..7bac3f53a95b0 100644
--- a/arch/s390/configs/defconfig
+++ b/arch/s390/configs/defconfig
@@ -774,10 +774,11 @@ CONFIG_CRYPTO_GCM=y
 CONFIG_CRYPTO_SEQIV=y
 CONFIG_CRYPTO_MD4=m
 CONFIG_CRYPTO_MD5=y
 CONFIG_CRYPTO_MICHAEL_MIC=m
 CONFIG_CRYPTO_RMD160=m
+CONFIG_CRYPTO_SHA3=m
 CONFIG_CRYPTO_SM3_GENERIC=m
 CONFIG_CRYPTO_WP512=m
 CONFIG_CRYPTO_XCBC=m
 CONFIG_CRYPTO_CRC32=m
 CONFIG_CRYPTO_842=m
@@ -788,12 +789,10 @@ CONFIG_CRYPTO_ANSI_CPRNG=m
 CONFIG_CRYPTO_JITTERENTROPY_OSR=1
 CONFIG_CRYPTO_USER_API_HASH=m
 CONFIG_CRYPTO_USER_API_SKCIPHER=m
 CONFIG_CRYPTO_USER_API_RNG=m
 CONFIG_CRYPTO_USER_API_AEAD=m
-CONFIG_CRYPTO_SHA3_256_S390=m
-CONFIG_CRYPTO_SHA3_512_S390=m
 CONFIG_CRYPTO_GHASH_S390=m
 CONFIG_CRYPTO_AES_S390=m
 CONFIG_CRYPTO_DES_S390=m
 CONFIG_CRYPTO_HMAC_S390=m
 CONFIG_ZCRYPT=m
diff --git a/arch/s390/crypto/Kconfig b/arch/s390/crypto/Kconfig
index 03f73fbd38b62..f838ca055f6d7 100644
--- a/arch/s390/crypto/Kconfig
+++ b/arch/s390/crypto/Kconfig
@@ -1,29 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0
 
 menu "Accelerated Cryptographic Algorithms for CPU (s390)"
 
-config CRYPTO_SHA3_256_S390
-	tristate "Hash functions: SHA3-224 and SHA3-256"
-	select CRYPTO_HASH
-	help
-	  SHA3-224 and SHA3-256 secure hash algorithms (FIPS 202)
-
-	  Architecture: s390
-
-	  It is available as of z14.
-
-config CRYPTO_SHA3_512_S390
-	tristate "Hash functions: SHA3-384 and SHA3-512"
-	select CRYPTO_HASH
-	help
-	  SHA3-384 and SHA3-512 secure hash algorithms (FIPS 202)
-
-	  Architecture: s390
-
-	  It is available as of z14.
-
 config CRYPTO_GHASH_S390
 	tristate "Hash functions: GHASH"
 	select CRYPTO_HASH
 	help
 	  GCM GHASH hash function (NIST SP800-38D)
diff --git a/arch/s390/crypto/Makefile b/arch/s390/crypto/Makefile
index 998f4b656b18e..387a229e10381 100644
--- a/arch/s390/crypto/Makefile
+++ b/arch/s390/crypto/Makefile
@@ -1,12 +1,10 @@
 # SPDX-License-Identifier: GPL-2.0
 #
 # Cryptographic API
 #
 
-obj-$(CONFIG_CRYPTO_SHA3_256_S390) += sha3_256_s390.o sha_common.o
-obj-$(CONFIG_CRYPTO_SHA3_512_S390) += sha3_512_s390.o sha_common.o
 obj-$(CONFIG_CRYPTO_DES_S390) += des_s390.o
 obj-$(CONFIG_CRYPTO_AES_S390) += aes_s390.o
 obj-$(CONFIG_CRYPTO_PAES_S390) += paes_s390.o
 obj-$(CONFIG_S390_PRNG) += prng.o
 obj-$(CONFIG_CRYPTO_GHASH_S390) += ghash_s390.o
diff --git a/arch/s390/crypto/sha.h b/arch/s390/crypto/sha.h
deleted file mode 100644
index b9cd9572dd35c..0000000000000
--- a/arch/s390/crypto/sha.h
+++ /dev/null
@@ -1,51 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0+ */
-/*
- * Cryptographic API.
- *
- * s390 generic implementation of the SHA Secure Hash Algorithms.
- *
- * Copyright IBM Corp. 2007
- * Author(s): Jan Glauber (jang@de.ibm.com)
- */
-#ifndef _CRYPTO_ARCH_S390_SHA_H
-#define _CRYPTO_ARCH_S390_SHA_H
-
-#include <crypto/hash.h>
-#include <crypto/sha2.h>
-#include <crypto/sha3.h>
-#include <linux/build_bug.h>
-#include <linux/types.h>
-
-/* must be big enough for the largest SHA variant */
-#define CPACF_MAX_PARMBLOCK_SIZE	SHA3_STATE_SIZE
-#define SHA_MAX_BLOCK_SIZE		SHA3_224_BLOCK_SIZE
-
-struct s390_sha_ctx {
-	u64 count;		/* message length in bytes */
-	union {
-		u32 state[CPACF_MAX_PARMBLOCK_SIZE / sizeof(u32)];
-		struct {
-			u64 state[SHA512_DIGEST_SIZE / sizeof(u64)];
-			u64 count_hi;
-		} sha512;
-		struct {
-			__le64 state[SHA3_STATE_SIZE / sizeof(u64)];
-		} sha3;
-	};
-	int func;		/* KIMD function to use */
-	bool first_message_part;
-};
-
-struct shash_desc;
-
-int s390_sha_update_blocks(struct shash_desc *desc, const u8 *data,
-			   unsigned int len);
-int s390_sha_finup(struct shash_desc *desc, const u8 *src, unsigned int len,
-		   u8 *out);
-
-static inline void __check_s390_sha_ctx_size(void)
-{
-	BUILD_BUG_ON(S390_SHA_CTX_SIZE != sizeof(struct s390_sha_ctx));
-}
-
-#endif
diff --git a/arch/s390/crypto/sha3_256_s390.c b/arch/s390/crypto/sha3_256_s390.c
deleted file mode 100644
index 7415d56649a52..0000000000000
--- a/arch/s390/crypto/sha3_256_s390.c
+++ /dev/null
@@ -1,157 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0+
-/*
- * Cryptographic API.
- *
- * s390 implementation of the SHA256 and SHA224 Secure Hash Algorithm.
- *
- * s390 Version:
- *   Copyright IBM Corp. 2019
- *   Author(s): Joerg Schmidbauer (jschmidb@de.ibm.com)
- */
-#include <asm/cpacf.h>
-#include <crypto/internal/hash.h>
-#include <crypto/sha3.h>
-#include <linux/cpufeature.h>
-#include <linux/errno.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/string.h>
-
-#include "sha.h"
-
-static int s390_sha3_256_init(struct shash_desc *desc)
-{
-	struct s390_sha_ctx *sctx = shash_desc_ctx(desc);
-
-	sctx->first_message_part = test_facility(86);
-	if (!sctx->first_message_part)
-		memset(sctx->state, 0, sizeof(sctx->state));
-	sctx->count = 0;
-	sctx->func = CPACF_KIMD_SHA3_256;
-
-	return 0;
-}
-
-static int sha3_256_export(struct shash_desc *desc, void *out)
-{
-	struct s390_sha_ctx *sctx = shash_desc_ctx(desc);
-	union {
-		u8 *u8;
-		u64 *u64;
-	} p = { .u8 = out };
-	int i;
-
-	if (sctx->first_message_part) {
-		memset(out, 0, SHA3_STATE_SIZE);
-		return 0;
-	}
-	for (i = 0; i < SHA3_STATE_SIZE / 8; i++)
-		put_unaligned(le64_to_cpu(sctx->sha3.state[i]), p.u64++);
-	return 0;
-}
-
-static int sha3_256_import(struct shash_desc *desc, const void *in)
-{
-	struct s390_sha_ctx *sctx = shash_desc_ctx(desc);
-	union {
-		const u8 *u8;
-		const u64 *u64;
-	} p = { .u8 = in };
-	int i;
-
-	for (i = 0; i < SHA3_STATE_SIZE / 8; i++)
-		sctx->sha3.state[i] = cpu_to_le64(get_unaligned(p.u64++));
-	sctx->count = 0;
-	sctx->first_message_part = 0;
-	sctx->func = CPACF_KIMD_SHA3_256;
-
-	return 0;
-}
-
-static int sha3_224_import(struct shash_desc *desc, const void *in)
-{
-	struct s390_sha_ctx *sctx = shash_desc_ctx(desc);
-
-	sha3_256_import(desc, in);
-	sctx->func = CPACF_KIMD_SHA3_224;
-	return 0;
-}
-
-static struct shash_alg sha3_256_alg = {
-	.digestsize	=	SHA3_256_DIGEST_SIZE,	   /* = 32 */
-	.init		=	s390_sha3_256_init,
-	.update		=	s390_sha_update_blocks,
-	.finup		=	s390_sha_finup,
-	.export		=	sha3_256_export,
-	.import		=	sha3_256_import,
-	.descsize	=	S390_SHA_CTX_SIZE,
-	.statesize	=	SHA3_STATE_SIZE,
-	.base		=	{
-		.cra_name	 =	"sha3-256",
-		.cra_driver_name =	"sha3-256-s390",
-		.cra_priority	 =	300,
-		.cra_flags	 =	CRYPTO_AHASH_ALG_BLOCK_ONLY,
-		.cra_blocksize	 =	SHA3_256_BLOCK_SIZE,
-		.cra_module	 =	THIS_MODULE,
-	}
-};
-
-static int s390_sha3_224_init(struct shash_desc *desc)
-{
-	struct s390_sha_ctx *sctx = shash_desc_ctx(desc);
-
-	s390_sha3_256_init(desc);
-	sctx->func = CPACF_KIMD_SHA3_224;
-	return 0;
-}
-
-static struct shash_alg sha3_224_alg = {
-	.digestsize	=	SHA3_224_DIGEST_SIZE,
-	.init		=	s390_sha3_224_init,
-	.update		=	s390_sha_update_blocks,
-	.finup		=	s390_sha_finup,
-	.export		=	sha3_256_export, /* same as for 256 */
-	.import		=	sha3_224_import, /* function code different! */
-	.descsize	=	S390_SHA_CTX_SIZE,
-	.statesize	=	SHA3_STATE_SIZE,
-	.base		=	{
-		.cra_name	 =	"sha3-224",
-		.cra_driver_name =	"sha3-224-s390",
-		.cra_priority	 =	300,
-		.cra_flags	 =	CRYPTO_AHASH_ALG_BLOCK_ONLY,
-		.cra_blocksize	 =	SHA3_224_BLOCK_SIZE,
-		.cra_module	 =	THIS_MODULE,
-	}
-};
-
-static int __init sha3_256_s390_init(void)
-{
-	int ret;
-
-	if (!cpacf_query_func(CPACF_KIMD, CPACF_KIMD_SHA3_256))
-		return -ENODEV;
-
-	ret = crypto_register_shash(&sha3_256_alg);
-	if (ret < 0)
-		goto out;
-
-	ret = crypto_register_shash(&sha3_224_alg);
-	if (ret < 0)
-		crypto_unregister_shash(&sha3_256_alg);
-out:
-	return ret;
-}
-
-static void __exit sha3_256_s390_fini(void)
-{
-	crypto_unregister_shash(&sha3_224_alg);
-	crypto_unregister_shash(&sha3_256_alg);
-}
-
-module_cpu_feature_match(S390_CPU_FEATURE_MSA, sha3_256_s390_init);
-module_exit(sha3_256_s390_fini);
-
-MODULE_ALIAS_CRYPTO("sha3-256");
-MODULE_ALIAS_CRYPTO("sha3-224");
-MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("SHA3-256 and SHA3-224 Secure Hash Algorithm");
diff --git a/arch/s390/crypto/sha3_512_s390.c b/arch/s390/crypto/sha3_512_s390.c
deleted file mode 100644
index ff6ee55844005..0000000000000
--- a/arch/s390/crypto/sha3_512_s390.c
+++ /dev/null
@@ -1,157 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0+
-/*
- * Cryptographic API.
- *
- * s390 implementation of the SHA512 and SHA384 Secure Hash Algorithm.
- *
- * Copyright IBM Corp. 2019
- * Author(s): Joerg Schmidbauer (jschmidb@de.ibm.com)
- */
-#include <asm/cpacf.h>
-#include <crypto/internal/hash.h>
-#include <crypto/sha3.h>
-#include <linux/cpufeature.h>
-#include <linux/errno.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/string.h>
-
-#include "sha.h"
-
-static int s390_sha3_512_init(struct shash_desc *desc)
-{
-	struct s390_sha_ctx *sctx = shash_desc_ctx(desc);
-
-	sctx->first_message_part = test_facility(86);
-	if (!sctx->first_message_part)
-		memset(sctx->state, 0, sizeof(sctx->state));
-	sctx->count = 0;
-	sctx->func = CPACF_KIMD_SHA3_512;
-
-	return 0;
-}
-
-static int sha3_512_export(struct shash_desc *desc, void *out)
-{
-	struct s390_sha_ctx *sctx = shash_desc_ctx(desc);
-	union {
-		u8 *u8;
-		u64 *u64;
-	} p = { .u8 = out };
-	int i;
-
-	if (sctx->first_message_part) {
-		memset(out, 0, SHA3_STATE_SIZE);
-		return 0;
-	}
-	for (i = 0; i < SHA3_STATE_SIZE / 8; i++)
-		put_unaligned(le64_to_cpu(sctx->sha3.state[i]), p.u64++);
-	return 0;
-}
-
-static int sha3_512_import(struct shash_desc *desc, const void *in)
-{
-	struct s390_sha_ctx *sctx = shash_desc_ctx(desc);
-	union {
-		const u8 *u8;
-		const u64 *u64;
-	} p = { .u8 = in };
-	int i;
-
-	for (i = 0; i < SHA3_STATE_SIZE / 8; i++)
-		sctx->sha3.state[i] = cpu_to_le64(get_unaligned(p.u64++));
-	sctx->count = 0;
-	sctx->first_message_part = 0;
-	sctx->func = CPACF_KIMD_SHA3_512;
-
-	return 0;
-}
-
-static int sha3_384_import(struct shash_desc *desc, const void *in)
-{
-	struct s390_sha_ctx *sctx = shash_desc_ctx(desc);
-
-	sha3_512_import(desc, in);
-	sctx->func = CPACF_KIMD_SHA3_384;
-	return 0;
-}
-
-static struct shash_alg sha3_512_alg = {
-	.digestsize	=	SHA3_512_DIGEST_SIZE,
-	.init		=	s390_sha3_512_init,
-	.update		=	s390_sha_update_blocks,
-	.finup		=	s390_sha_finup,
-	.export		=	sha3_512_export,
-	.import		=	sha3_512_import,
-	.descsize	=	S390_SHA_CTX_SIZE,
-	.statesize	=	SHA3_STATE_SIZE,
-	.base		=	{
-		.cra_name	 =	"sha3-512",
-		.cra_driver_name =	"sha3-512-s390",
-		.cra_priority	 =	300,
-		.cra_flags	 =	CRYPTO_AHASH_ALG_BLOCK_ONLY,
-		.cra_blocksize	 =	SHA3_512_BLOCK_SIZE,
-		.cra_module	 =	THIS_MODULE,
-	}
-};
-
-MODULE_ALIAS_CRYPTO("sha3-512");
-
-static int s390_sha3_384_init(struct shash_desc *desc)
-{
-	struct s390_sha_ctx *sctx = shash_desc_ctx(desc);
-
-	s390_sha3_512_init(desc);
-	sctx->func = CPACF_KIMD_SHA3_384;
-	return 0;
-}
-
-static struct shash_alg sha3_384_alg = {
-	.digestsize	=	SHA3_384_DIGEST_SIZE,
-	.init		=	s390_sha3_384_init,
-	.update		=	s390_sha_update_blocks,
-	.finup		=	s390_sha_finup,
-	.export		=	sha3_512_export, /* same as for 512 */
-	.import		=	sha3_384_import, /* function code different! */
-	.descsize	=	S390_SHA_CTX_SIZE,
-	.statesize	=	SHA3_STATE_SIZE,
-	.base		=	{
-		.cra_name	 =	"sha3-384",
-		.cra_driver_name =	"sha3-384-s390",
-		.cra_priority	 =	300,
-		.cra_flags	 =	CRYPTO_AHASH_ALG_BLOCK_ONLY,
-		.cra_blocksize	 =	SHA3_384_BLOCK_SIZE,
-		.cra_ctxsize	 =	sizeof(struct s390_sha_ctx),
-		.cra_module	 =	THIS_MODULE,
-	}
-};
-
-MODULE_ALIAS_CRYPTO("sha3-384");
-
-static int __init init(void)
-{
-	int ret;
-
-	if (!cpacf_query_func(CPACF_KIMD, CPACF_KIMD_SHA3_512))
-		return -ENODEV;
-	ret = crypto_register_shash(&sha3_512_alg);
-	if (ret < 0)
-		goto out;
-	ret = crypto_register_shash(&sha3_384_alg);
-	if (ret < 0)
-		crypto_unregister_shash(&sha3_512_alg);
-out:
-	return ret;
-}
-
-static void __exit fini(void)
-{
-	crypto_unregister_shash(&sha3_512_alg);
-	crypto_unregister_shash(&sha3_384_alg);
-}
-
-module_cpu_feature_match(S390_CPU_FEATURE_MSA, init);
-module_exit(fini);
-
-MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("SHA3-512 and SHA3-384 Secure Hash Algorithm");
diff --git a/arch/s390/crypto/sha_common.c b/arch/s390/crypto/sha_common.c
deleted file mode 100644
index d6f8396187946..0000000000000
--- a/arch/s390/crypto/sha_common.c
+++ /dev/null
@@ -1,117 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0+
-/*
- * Cryptographic API.
- *
- * s390 generic implementation of the SHA Secure Hash Algorithms.
- *
- * Copyright IBM Corp. 2007
- * Author(s): Jan Glauber (jang@de.ibm.com)
- */
-
-#include <crypto/internal/hash.h>
-#include <linux/export.h>
-#include <linux/module.h>
-#include <asm/cpacf.h>
-#include "sha.h"
-
-int s390_sha_update_blocks(struct shash_desc *desc, const u8 *data,
-			   unsigned int len)
-{
-	unsigned int bsize = crypto_shash_blocksize(desc->tfm);
-	struct s390_sha_ctx *ctx = shash_desc_ctx(desc);
-	unsigned int n;
-	int fc;
-
-	fc = ctx->func;
-	if (ctx->first_message_part)
-		fc |= CPACF_KIMD_NIP;
-
-	/* process as many blocks as possible */
-	n = (len / bsize) * bsize;
-	ctx->count += n;
-	switch (ctx->func) {
-	case CPACF_KLMD_SHA_512:
-	case CPACF_KLMD_SHA3_384:
-		if (ctx->count < n)
-			ctx->sha512.count_hi++;
-		break;
-	}
-	cpacf_kimd(fc, ctx->state, data, n);
-	ctx->first_message_part = 0;
-	return len - n;
-}
-EXPORT_SYMBOL_GPL(s390_sha_update_blocks);
-
-static int s390_crypto_shash_parmsize(int func)
-{
-	switch (func) {
-	case CPACF_KLMD_SHA_1:
-		return 20;
-	case CPACF_KLMD_SHA_256:
-		return 32;
-	case CPACF_KLMD_SHA_512:
-		return 64;
-	case CPACF_KLMD_SHA3_224:
-	case CPACF_KLMD_SHA3_256:
-	case CPACF_KLMD_SHA3_384:
-	case CPACF_KLMD_SHA3_512:
-		return 200;
-	default:
-		return -EINVAL;
-	}
-}
-
-int s390_sha_finup(struct shash_desc *desc, const u8 *src, unsigned int len,
-		   u8 *out)
-{
-	struct s390_sha_ctx *ctx = shash_desc_ctx(desc);
-	int mbl_offset, fc;
-	u64 bits;
-
-	ctx->count += len;
-
-	bits = ctx->count * 8;
-	mbl_offset = s390_crypto_shash_parmsize(ctx->func);
-	if (mbl_offset < 0)
-		return -EINVAL;
-
-	mbl_offset = mbl_offset / sizeof(u32);
-
-	/* set total msg bit length (mbl) in CPACF parmblock */
-	switch (ctx->func) {
-	case CPACF_KLMD_SHA_512:
-		/* The SHA512 parmblock has a 128-bit mbl field. */
-		if (ctx->count < len)
-			ctx->sha512.count_hi++;
-		ctx->sha512.count_hi <<= 3;
-		ctx->sha512.count_hi |= ctx->count >> 61;
-		mbl_offset += sizeof(u64) / sizeof(u32);
-		fallthrough;
-	case CPACF_KLMD_SHA_1:
-	case CPACF_KLMD_SHA_256:
-		memcpy(ctx->state + mbl_offset, &bits, sizeof(bits));
-		break;
-	case CPACF_KLMD_SHA3_224:
-	case CPACF_KLMD_SHA3_256:
-	case CPACF_KLMD_SHA3_384:
-	case CPACF_KLMD_SHA3_512:
-		break;
-	default:
-		return -EINVAL;
-	}
-
-	fc = ctx->func;
-	fc |= test_facility(86) ? CPACF_KLMD_DUFOP : 0;
-	if (ctx->first_message_part)
-		fc |= CPACF_KLMD_NIP;
-	cpacf_klmd(fc, ctx->state, src, len);
-
-	/* copy digest to out */
-	memcpy(out, ctx->state, crypto_shash_digestsize(desc->tfm));
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(s390_sha_finup);
-
-MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("s390 SHA cipher common functions");
-- 
2.51.1.dirty



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/15] SHA-3 library
  2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
                   ` (14 preceding siblings ...)
  2025-10-26  5:50 ` [PATCH v2 15/15] crypto: s390/sha3 - Remove superseded SHA-3 code Eric Biggers
@ 2025-10-29  9:30 ` Harald Freudenberger
  2025-10-29 16:32   ` Eric Biggers
  2025-10-30 14:08 ` Ard Biesheuvel
  2025-11-03 17:34 ` Eric Biggers
  17 siblings, 1 reply; 34+ messages in thread
From: Harald Freudenberger @ 2025-10-29  9:30 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-crypto, David Howells, Ard Biesheuvel, Jason A . Donenfeld,
	Holger Dengler, Herbert Xu, linux-arm-kernel, linux-s390,
	linux-kernel

On 2025-10-26 06:50, Eric Biggers wrote:
> This series is targeting libcrypto-next.  It can also be retrieved from:
> 
>     git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git sha3-lib-v2
> 
> This series adds SHA-3 support to lib/crypto/.  This includes support
> for the digest algorithms SHA3-224, SHA3-256, SHA3-384, and SHA3-512,
> and also support for the extendable-output functions SHAKE128 and
> SHAKE256.  The SHAKE128 and SHAKE256 support will be needed by ML-DSA.
> 
> The architecture-optimized SHA-3 code for arm64 and s390 is migrated
> into lib/crypto/.  (The existing s390 code couldn't really be reused, so
> really I rewrote it from scratch.)  This makes the SHA-3 library
> functions be accelerated on these architectures.
> 
> Finally, the sha3-224, sha3-256, sha3-384, and sha3-512 crypto_shash
> algorithms are reimplemented on top of the library API.
> 
> If the s390 folks could re-test the s390 optimized SHA-3 code (by
> enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
> would be helpful.  QEMU doesn't support the instructions it uses.  Also,
> it would be helpful to provide the benchmark output from just before
> "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
> and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
> functions".  Then we can verify that each change is useful.
> 
> Changed in v2:
>   - Added missing selection of CRYPTO_LIB_SHA3 from CRYPTO_SHA3.
>   - Fixed a bug where incorrect SHAKE output was produced if a
>     zero-length squeeze was followed by a nonzero-length squeeze.
>   - Improved the SHAKE tests.
>   - Utilized the one-shot SHA-3 digest instructions on s390.
>   - Split the s390 changes into several patches.
>   - Folded some of my patches into David's.
>   - Dropped some unnecessary changes from the first 2 patches.
>   - Lots more cleanups, mainly to "lib/crypto: sha3: Add SHA-3 support".
> 
> Changed in v1 (vs. first 5 patches of David's v6 patchset):
>   - Migrated the arm64 and s390 code into lib/crypto/
>   - Simplified the library API
>   - Added FIPS test
>   - Many other fixes and improvements
> 
> The first 5 patches are derived from David's v6 patchset
> (https://lore.kernel.org/linux-crypto/20251017144311.817771-1-dhowells@redhat.com/).
> Earlier changelogs can be found there.
> 
> David Howells (5):
>   crypto: s390/sha3 - Rename conflicting functions
>   crypto: arm64/sha3 - Rename conflicting function
>   lib/crypto: sha3: Add SHA-3 support
>   lib/crypto: sha3: Move SHA3 Iota step mapping into round function
>   lib/crypto: tests: Add SHA3 kunit tests
> 
> Eric Biggers (10):
>   lib/crypto: tests: Add additional SHAKE tests
>   lib/crypto: sha3: Add FIPS cryptographic algorithm self-test
>   crypto: arm64/sha3 - Update sha3_ce_transform() to prepare for library
>   lib/crypto: arm64/sha3: Migrate optimized code into library
>   lib/crypto: s390/sha3: Add optimized Keccak functions
>   lib/crypto: sha3: Support arch overrides of one-shot digest functions
>   lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
>   crypto: jitterentropy - Use default sha3 implementation
>   crypto: sha3 - Reimplement using library API
>   crypto: s390/sha3 - Remove superseded SHA-3 code
> 
>  Documentation/crypto/index.rst                |   1 +
>  Documentation/crypto/sha3.rst                 | 130 ++++++
>  arch/arm64/configs/defconfig                  |   2 +-
>  arch/arm64/crypto/Kconfig                     |  11 -
>  arch/arm64/crypto/Makefile                    |   3 -
>  arch/arm64/crypto/sha3-ce-glue.c              | 151 -------
>  arch/s390/configs/debug_defconfig             |   3 +-
>  arch/s390/configs/defconfig                   |   3 +-
>  arch/s390/crypto/Kconfig                      |  20 -
>  arch/s390/crypto/Makefile                     |   2 -
>  arch/s390/crypto/sha.h                        |  51 ---
>  arch/s390/crypto/sha3_256_s390.c              | 157 -------
>  arch/s390/crypto/sha3_512_s390.c              | 157 -------
>  arch/s390/crypto/sha_common.c                 | 117 -----
>  crypto/Kconfig                                |   1 +
>  crypto/Makefile                               |   2 +-
>  crypto/jitterentropy-kcapi.c                  |  12 +-
>  crypto/sha3.c                                 | 166 +++++++
>  crypto/sha3_generic.c                         | 290 ------------
>  crypto/testmgr.c                              |   8 +
>  include/crypto/sha3.h                         | 306 ++++++++++++-
>  lib/crypto/Kconfig                            |  13 +
>  lib/crypto/Makefile                           |  10 +
>  .../crypto/arm64}/sha3-ce-core.S              |  67 +--
>  lib/crypto/arm64/sha3.h                       |  62 +++
>  lib/crypto/fips.h                             |   7 +
>  lib/crypto/s390/sha3.h                        | 151 +++++++
>  lib/crypto/sha3.c                             | 411 +++++++++++++++++
>  lib/crypto/tests/Kconfig                      |  11 +
>  lib/crypto/tests/Makefile                     |   1 +
>  lib/crypto/tests/sha3-testvecs.h              | 249 +++++++++++
>  lib/crypto/tests/sha3_kunit.c                 | 422 ++++++++++++++++++
>  scripts/crypto/gen-fips-testvecs.py           |   4 +
>  scripts/crypto/gen-hash-testvecs.py           |  27 +-
>  34 files changed, 2012 insertions(+), 1016 deletions(-)
>  create mode 100644 Documentation/crypto/sha3.rst
>  delete mode 100644 arch/arm64/crypto/sha3-ce-glue.c
>  delete mode 100644 arch/s390/crypto/sha.h
>  delete mode 100644 arch/s390/crypto/sha3_256_s390.c
>  delete mode 100644 arch/s390/crypto/sha3_512_s390.c
>  delete mode 100644 arch/s390/crypto/sha_common.c
>  create mode 100644 crypto/sha3.c
>  delete mode 100644 crypto/sha3_generic.c
>  rename {arch/arm64/crypto => lib/crypto/arm64}/sha3-ce-core.S (84%)
>  create mode 100644 lib/crypto/arm64/sha3.h
>  create mode 100644 lib/crypto/s390/sha3.h
>  create mode 100644 lib/crypto/sha3.c
>  create mode 100644 lib/crypto/tests/sha3-testvecs.h
>  create mode 100644 lib/crypto/tests/sha3_kunit.c
> 
> base-commit: e3068492d0016d0ea9a1ff07dbfa624d2ec773ca

Picked this series from your ebiggers repo branch sha3-lib-v2.
Build on s390 runs without any complaints, no warnings.
As recommended I enabled the KUNIT option and also CRYPTO_SELFTESTS_FULL.
With a "modprobe tcrypt" I forced the selftests to run
and in parallel I checked that the s390 specific CPACF instructions
are really used (can be done with the pai command and check for
the KIMD_SHA3_* counters). Also ran some AF_ALG tests to verify
all the sha3 hashes and check for thread safety.
All this ran without any findings. However there are NO performance
related tests involved.

What's a little bit tricky here is that the sha3 lib is statically
built into the kernel. So no chance to unload/load this as a module.
For sha1 and the sha2 stuff I can understand the need to have this
statically enabled in the kernel. Sha3 is only supposed to be available
as backup in case of sha2 deficiencies. So I can't see why this is
really statically needed.

Tested-by: Harald Freudenberger <freude@linux.ibm.com>




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/15] SHA-3 library
  2025-10-29  9:30 ` [PATCH v2 00/15] SHA-3 library Harald Freudenberger
@ 2025-10-29 16:32   ` Eric Biggers
  2025-10-29 20:33     ` Eric Biggers
  2025-10-30 10:10     ` Harald Freudenberger
  0 siblings, 2 replies; 34+ messages in thread
From: Eric Biggers @ 2025-10-29 16:32 UTC (permalink / raw)
  To: Harald Freudenberger
  Cc: linux-crypto, David Howells, Ard Biesheuvel, Jason A . Donenfeld,
	Holger Dengler, Herbert Xu, linux-arm-kernel, linux-s390,
	linux-kernel

On Wed, Oct 29, 2025 at 10:30:40AM +0100, Harald Freudenberger wrote:
> > If the s390 folks could re-test the s390 optimized SHA-3 code (by
> > enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
> > would be helpful.  QEMU doesn't support the instructions it uses.  Also,
> > it would be helpful to provide the benchmark output from just before
> > "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
> > and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
> > functions".  Then we can verify that each change is useful.
[...]
> 
> Picked this series from your ebiggers repo branch sha3-lib-v2.
> Build on s390 runs without any complaints, no warnings.
> As recommended I enabled the KUNIT option and also CRYPTO_SELFTESTS_FULL.
> With a "modprobe tcrypt" I forced the selftests to run
> and in parallel I checked that the s390 specific CPACF instructions
> are really used (can be done with the pai command and check for
> the KIMD_SHA3_* counters). Also ran some AF_ALG tests to verify
> all the sha3 hashes and check for thread safety.
> All this ran without any findings. However there are NO performance
> related tests involved.

Thanks!  Just to confirm, did you actually run the sha3 KUnit test and
verify that all its test cases passed?  That's the most important one.
It also includes a benchmark, if CONFIG_CRYPTO_LIB_BENCHMARK=y is
enabled, and I was hoping to see your results from that after each
change.  The results get printed to the kernel log when the test runs.

> What's a little bit tricky here is that the sha3 lib is statically
> built into the kernel. So no chance to unload/load this as a module.
> For sha1 and the sha2 stuff I can understand the need to have this
> statically enabled in the kernel. Sha3 is only supposed to be available
> as backup in case of sha2 deficiencies. So I can't see why this is
> really statically needed.

CONFIG_CRYPTO_LIB_SHA3 is a tristate option.  It can be either built-in
or a loadable module, depending on what other kconfig options select it.
Same as all the other crypto library modules.

- Eric


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/15] SHA-3 library
  2025-10-29 16:32   ` Eric Biggers
@ 2025-10-29 20:33     ` Eric Biggers
  2025-10-30  8:11       ` Heiko Carstens
  2025-10-30 10:16       ` Harald Freudenberger
  2025-10-30 10:10     ` Harald Freudenberger
  1 sibling, 2 replies; 34+ messages in thread
From: Eric Biggers @ 2025-10-29 20:33 UTC (permalink / raw)
  To: Harald Freudenberger
  Cc: linux-crypto, David Howells, Ard Biesheuvel, Jason A . Donenfeld,
	Holger Dengler, Herbert Xu, linux-arm-kernel, linux-s390,
	linux-kernel

On Wed, Oct 29, 2025 at 09:32:16AM -0700, Eric Biggers wrote:
> On Wed, Oct 29, 2025 at 10:30:40AM +0100, Harald Freudenberger wrote:
> > > If the s390 folks could re-test the s390 optimized SHA-3 code (by
> > > enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
> > > would be helpful.  QEMU doesn't support the instructions it uses.  Also,
> > > it would be helpful to provide the benchmark output from just before
> > > "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
> > > and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
> > > functions".  Then we can verify that each change is useful.
> [...]
> > 
> > Picked this series from your ebiggers repo branch sha3-lib-v2.
> > Build on s390 runs without any complaints, no warnings.
> > As recommended I enabled the KUNIT option and also CRYPTO_SELFTESTS_FULL.
> > With a "modprobe tcrypt" I forced the selftests to run
> > and in parallel I checked that the s390 specific CPACF instructions
> > are really used (can be done with the pai command and check for
> > the KIMD_SHA3_* counters). Also ran some AF_ALG tests to verify
> > all the sha3 hashes and check for thread safety.
> > All this ran without any findings. However there are NO performance
> > related tests involved.
> 
> Thanks!  Just to confirm, did you actually run the sha3 KUnit test and
> verify that all its test cases passed?  That's the most important one.
> It also includes a benchmark, if CONFIG_CRYPTO_LIB_BENCHMARK=y is
> enabled, and I was hoping to see your results from that after each
> change.  The results get printed to the kernel log when the test runs.
> 

Also, can you confirm that you ran the test on a CPU that has
"facility 86", so that the one-shot digest functions get exercised?

(By the way, I recommend defining named constants somewhere in
arch/s390/ for the different facilities.  I borrowed the
"test_facility(86)" from the existing code, which does not say what 86
means.  After doing some research, it looks like it means MSA12.)

- Eric


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/15] SHA-3 library
  2025-10-29 20:33     ` Eric Biggers
@ 2025-10-30  8:11       ` Heiko Carstens
  2025-10-30 10:16       ` Harald Freudenberger
  1 sibling, 0 replies; 34+ messages in thread
From: Heiko Carstens @ 2025-10-30  8:11 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Harald Freudenberger, linux-crypto, David Howells, Ard Biesheuvel,
	Jason A . Donenfeld, Holger Dengler, Herbert Xu, linux-arm-kernel,
	linux-s390, linux-kernel

On Wed, Oct 29, 2025 at 08:33:45PM +0000, Eric Biggers wrote:
> (By the way, I recommend defining named constants somewhere in
> arch/s390/ for the different facilities.  I borrowed the
> "test_facility(86)" from the existing code, which does not say what 86
> means.  After doing some research, it looks like it means MSA12.)

Not so surprisingly this has been discussed several times in the
past. It would have been easy if each of those bits represented
exactly one facility, but then there is e.g. bit 46 which means:

the distinct-operands, fast-BCR-serialization, high-word, and
population-count facilities, the interlocked-access facility 1, and
the load/store-on-condition facility 1 are installed in the
z/Architecture architectural mode

Some proposed to add defines like "FACILITY_MULTI_46", which is of
course pointless, since there is no added benefit over just using plain
46. Alternatively it would be possible to have a define for each of
them. But if you need two or three of them for your code, then
several tests would be needed - all for the same bit, and each one
generating a static branch - which also doesn't make too much sense.

So in the end we came to the conclusion to stick with the plain
numbers.

That said, users are still free to add aliases like e.g. cpu_has_vx(),
see arch/s390/include/asm/cpufeature.h. It is just not an all or
nothing approach.
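
As a rough illustration of the alias approach mentioned above, here is a
user-space sketch (not kernel code).  The names demo_facilities,
demo_test_facility(), demo_set_facility() and demo_cpu_has_msa12() are
all hypothetical, and the byte array is only a stand-in for the real
facility list.  It assumes facility bits are numbered starting from the
most significant bit of the first byte, and that bit 86 corresponds to
MSA12 as suggested in the quoted text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the facility list; NOT the kernel's data structure. */
#define DEMO_FACILITY_BITS 256
static uint8_t demo_facilities[DEMO_FACILITY_BITS / 8];

/* Test one facility bit.  Bit 0 is the MSB of byte 0 (assumption). */
static bool demo_test_facility(unsigned int nr)
{
	assert(nr < DEMO_FACILITY_BITS);
	return (demo_facilities[nr >> 3] & (0x80u >> (nr & 7))) != 0;
}

/* Mark one facility bit as installed (only needed for this demo). */
static void demo_set_facility(unsigned int nr)
{
	assert(nr < DEMO_FACILITY_BITS);
	demo_facilities[nr >> 3] |= (uint8_t)(0x80u >> (nr & 7));
}

/*
 * Hypothetical named alias in the style of cpu_has_vx(): the bare
 * number 86 appears in exactly one place.
 */
static bool demo_cpu_has_msa12(void)
{
	return demo_test_facility(86);
}
```

A caller would then write "if (demo_cpu_has_msa12())" instead of
repeating the bare number; a bit that covers several facilities at once,
such as bit 46, could still be tested once by its plain number.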


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/15] SHA-3 library
  2025-10-29 16:32   ` Eric Biggers
  2025-10-29 20:33     ` Eric Biggers
@ 2025-10-30 10:10     ` Harald Freudenberger
  2025-10-30 17:14       ` Eric Biggers
  1 sibling, 1 reply; 34+ messages in thread
From: Harald Freudenberger @ 2025-10-30 10:10 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-crypto, David Howells, Ard Biesheuvel, Jason A . Donenfeld,
	Holger Dengler, Herbert Xu, linux-arm-kernel, linux-s390,
	linux-kernel

On 2025-10-29 17:32, Eric Biggers wrote:
> On Wed, Oct 29, 2025 at 10:30:40AM +0100, Harald Freudenberger wrote:
>> > If the s390 folks could re-test the s390 optimized SHA-3 code (by
>> > enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
>> > would be helpful.  QEMU doesn't support the instructions it uses.  Also,
>> > it would be helpful to provide the benchmark output from just before
>> > "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
>> > and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
>> > functions".  Then we can verify that each change is useful.
> [...]
>> 
>> Picked this series from your ebiggers repo branch sha3-lib-v2.
>> Build on s390 runs without any complaints, no warnings.
>> As recommended I enabled the KUNIT option and also CRYPTO_SELFTESTS_FULL.
>> With a "modprobe tcrypt" I forced the selftests to run
>> and in parallel I checked that the s390 specific CPACF instructions
>> are really used (can be done with the pai command and check for
>> the KIMD_SHA3_* counters). Also ran some AF_ALG tests to verify
>> all the sha3 hashes and check for thread safety.
>> All this ran without any findings. However there are NO performance
>> related tests involved.
> 
> Thanks!  Just to confirm, did you actually run the sha3 KUnit test and
> verify that all its test cases passed?  That's the most important one.
> It also includes a benchmark, if CONFIG_CRYPTO_LIB_BENCHMARK=y is
> enabled, and I was hoping to see your results from that after each
> change.  The results get printed to the kernel log when the test runs.
> 

Here it is - as this is a z/VM system the benchmark values may show poor
performance.

Oct 30 10:46:44 b3545008.lnxne.boe kernel: KTAP version 1
Oct 30 10:46:44 b3545008.lnxne.boe kernel: 1..1
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     KTAP version 1
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # Subtest: sha3
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # module: sha3_kunit
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     1..21
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 2 test_hash_all_lens_up_to_4096
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 3 test_hash_incremental_updates
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 4 test_hash_buffer_overruns
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 6 test_hash_alignment_consistency
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 7 test_hash_ctx_zeroization
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 8 test_hash_interrupt_context_1
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 9 test_hash_interrupt_context_2
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 18 test_shake_all_lens_up_to_4096
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 19 test_shake_multiple_squeezes
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 20 test_shake_with_guarded_bufs
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 14 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 109 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 911 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127: 1849 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128: 1872 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200: 2647 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256: 3338 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511: 5484 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512: 5562 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 8297 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173: 12625 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096: 11242 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 12853 MB/s
Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
Oct 30 10:46:44 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0 total:21
Oct 30 10:46:44 b3545008.lnxne.boe kernel: # Totals: pass:21 fail:0 skip:0 total:21
Oct 30 10:46:44 b3545008.lnxne.boe kernel: ok 1 sha3

>> What's a little bit tricky here is that the sha3 lib is statically
>> built into the kernel. So no chance to unload/load this as a module.
>> For sha1 and the sha2 stuff I can understand the need to have this
>> statically enabled in the kernel. Sha3 is only supposed to be available
>> as backup in case of sha2 deficiencies. So I can't see why this is
>> really statically needed.
> 
> CONFIG_CRYPTO_LIB_SHA3 is a tristate option.  It can be either built-in
> or a loadable module, depending on what other kconfig options select it.
> Same as all the other crypto library modules.

I know and see this. However, I am unable to switch this to 'm'. It seems
like the root cause is that CRYPTO_SHA3='y' and I can't change this to 'm'.
And honestly I am unable to read these dependencies (forgive my ignorance):

CONFIG_CRYPTO_SHA3:
SHA-3 secure hash algorithms (FIPS 202, ISO/IEC 10118-3)
  Symbol: CRYPTO_SHA3 [=y]
   Type  : tristate
   Defined at crypto/Kconfig:1006
     Prompt: SHA-3
     Depends on: CRYPTO [=y]
     Location:
       -> Cryptographic API (CRYPTO [=y])
         -> Hashes, digests, and MACs
           -> SHA-3 (CRYPTO_SHA3 [=y])
   Selects: CRYPTO_HASH [=y] && CRYPTO_LIB_SHA3 [=y]
   Selected by [y]:
     - CRYPTO_JITTERENTROPY [=y] && CRYPTO [=y]
   Selected by [n]:
     - MODULE_SIG_SHA3_256 [=n] && MODULES [=y] && (MODULE_SIG [=y] || IMA_APPRAISE_MODSIG [=n])
     - MODULE_SIG_SHA3_384 [=n] && MODULES [=y] && (MODULE_SIG [=y] || IMA_APPRAISE_MODSIG [=n])
     - MODULE_SIG_SHA3_512 [=n] && MODULES [=y] && (MODULE_SIG [=y] || IMA_APPRAISE_MODSIG [=n])
     - CRYPTO_DEV_ZYNQMP_SHA3 [=n] && CRYPTO [=y] && CRYPTO_HW [=y] && (ZYNQMP_FIRMWARE [=n] || COMPILE_TEST [=n])
     - CRYPTO_DEV_STM32_HASH [=n] && CRYPTO [=y] && CRYPTO_HW [=y] && (ARCH_STM32 || ARCH_U8500) && HAS_DMA [=y]
     - CRYPTO_DEV_SAFEXCEL [=n] && CRYPTO [=y] && CRYPTO_HW [=y] && (OF [=n] || PCI [=y] || COMPILE_TEST [=n]) && HAS_IOMEM [=y]
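
One possible reading of the output above, sketched as a small model: in
Kconfig a "select" acts as a lower bound on the selected tristate, so a
symbol selected by something that is =y cannot be lowered to =m.  The
sketch below is a simplification under that assumption (it ignores the
selectors' own dependencies and prompt visibility), and the names
enum tristate, select_floor() and effective_value() are hypothetical,
not part of the kconfig tooling.

```c
#include <assert.h>

/* Tristate values, ordered so that n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Lower bound imposed by "select": the maximum over all selectors. */
static enum tristate select_floor(const enum tristate *selectors, int count)
{
	enum tristate floor = TRI_N;

	assert(count >= 0);
	for (int i = 0; i < count; i++)
		if (selectors[i] > floor)
			floor = selectors[i];
	return floor;
}

/* The user's request is raised to the floor, never lowered below it. */
static enum tristate effective_value(enum tristate requested,
				     enum tristate floor)
{
	return requested > floor ? requested : floor;
}
```

With one selector at y (as CRYPTO_JITTERENTROPY is listed above) and
the others at n, a request for m would still resolve to y in this model.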

> 
> - Eric


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/15] SHA-3 library
  2025-10-29 20:33     ` Eric Biggers
  2025-10-30  8:11       ` Heiko Carstens
@ 2025-10-30 10:16       ` Harald Freudenberger
  1 sibling, 0 replies; 34+ messages in thread
From: Harald Freudenberger @ 2025-10-30 10:16 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-crypto, David Howells, Ard Biesheuvel, Jason A . Donenfeld,
	Holger Dengler, Herbert Xu, linux-arm-kernel, linux-s390,
	linux-kernel

On 2025-10-29 21:33, Eric Biggers wrote:
> On Wed, Oct 29, 2025 at 09:32:16AM -0700, Eric Biggers wrote:
>> On Wed, Oct 29, 2025 at 10:30:40AM +0100, Harald Freudenberger wrote:
>> > > If the s390 folks could re-test the s390 optimized SHA-3 code (by
>> > > enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
>> > > would be helpful.  QEMU doesn't support the instructions it uses.  Also,
>> > > it would be helpful to provide the benchmark output from just before
>> > > "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
>> > > and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
>> > > functions".  Then we can verify that each change is useful.
>> [...]
>> >
>> > Picked this series from your ebiggers repo branch sha3-lib-v2.
>> > Build on s390 runs without any complaints, no warnings.
>> > As recommended I enabled the KUNIT option and also CRYPTO_SELFTESTS_FULL.
>> > With a "modprobe tcrypt" I forced the selftests to run
>> > and in parallel I checked that the s390 specific CPACF instructions
>> > are really used (can be done with the pai command and check for
>> > the KIMD_SHA3_* counters). Also ran some AF_ALG tests to verify
>> > all the sha3 hashes and check for thread safety.
>> > All this ran without any findings. However there are NO performance
>> > related tests involved.
>> 
>> Thanks!  Just to confirm, did you actually run the sha3 KUnit test and
>> verify that all its test cases passed?  That's the most important one.
>> It also includes a benchmark, if CONFIG_CRYPTO_LIB_BENCHMARK=y is
>> enabled, and I was hoping to see your results from that after each
>> change.  The results get printed to the kernel log when the test runs.
>> 
> 
> Also, can you confirm that you ran the test on a CPU that has
> "facility 86", so that the one-shot digest functions get exercised?
> 
> (By the way, I recommend defining named constants somewhere in
> arch/s390/ for the different facilities.  I borrowed the
> "test_facility(86)" from the existing code, which does not say what 86
> means.  After doing some research, it looks like it means MSA12.)
> 

Of course, the machine I used has MSA level 12 (stfle bit 86).

> - Eric


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/15] SHA-3 library
  2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
                   ` (15 preceding siblings ...)
  2025-10-29  9:30 ` [PATCH v2 00/15] SHA-3 library Harald Freudenberger
@ 2025-10-30 14:08 ` Ard Biesheuvel
  2025-11-03 17:34 ` Eric Biggers
  17 siblings, 0 replies; 34+ messages in thread
From: Ard Biesheuvel @ 2025-10-30 14:08 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-crypto, David Howells, Jason A . Donenfeld, Holger Dengler,
	Harald Freudenberger, Herbert Xu, linux-arm-kernel, linux-s390,
	linux-kernel

On Sun, 26 Oct 2025 at 06:53, Eric Biggers <ebiggers@kernel.org> wrote:
>
> This series is targeting libcrypto-next.  It can also be retrieved from:
>
>     git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git sha3-lib-v2
>
> This series adds SHA-3 support to lib/crypto/.  This includes support
> for the digest algorithms SHA3-224, SHA3-256, SHA3-384, and SHA3-512,
> and also support for the extendable-output functions SHAKE128 and
> SHAKE256.  The SHAKE128 and SHAKE256 support will be needed by ML-DSA.
>
> The architecture-optimized SHA-3 code for arm64 and s390 is migrated
> into lib/crypto/.  (The existing s390 code couldn't really be reused, so
> really I rewrote it from scratch.)  This makes the SHA-3 library
> functions be accelerated on these architectures.
>
> Finally, the sha3-224, sha3-256, sha3-384, and sha3-512 crypto_shash
> algorithms are reimplemented on top of the library API.
>
> If the s390 folks could re-test the s390 optimized SHA-3 code (by
> enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
> would be helpful.  QEMU doesn't support the instructions it uses.  Also,
> it would be helpful to provide the benchmark output from just before
> "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
> and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
> functions".  Then we can verify that each change is useful.
>
> Changed in v2:
>   - Added missing selection of CRYPTO_LIB_SHA3 from CRYPTO_SHA3.
>   - Fixed a bug where incorrect SHAKE output was produced if a
>     zero-length squeeze was followed by a nonzero-length squeeze.
>   - Improved the SHAKE tests.
>   - Utilized the one-shot SHA-3 digest instructions on s390.
>   - Split the s390 changes into several patches.
>   - Folded some of my patches into David's.
>   - Dropped some unnecessary changes from the first 2 patches.
>   - Lots more cleanups, mainly to "lib/crypto: sha3: Add SHA-3 support".
>
> Changed in v1 (vs. first 5 patches of David's v6 patchset):
>   - Migrated the arm64 and s390 code into lib/crypto/
>   - Simplified the library API
>   - Added FIPS test
>   - Many other fixes and improvements
>
> The first 5 patches are derived from David's v6 patchset
> (https://lore.kernel.org/linux-crypto/20251017144311.817771-1-dhowells@redhat.com/).
> Earlier changelogs can be found there.
>
> David Howells (5):
>   crypto: s390/sha3 - Rename conflicting functions
>   crypto: arm64/sha3 - Rename conflicting function
>   lib/crypto: sha3: Add SHA-3 support
>   lib/crypto: sha3: Move SHA3 Iota step mapping into round function
>   lib/crypto: tests: Add SHA3 kunit tests
>
> Eric Biggers (10):
>   lib/crypto: tests: Add additional SHAKE tests
>   lib/crypto: sha3: Add FIPS cryptographic algorithm self-test
>   crypto: arm64/sha3 - Update sha3_ce_transform() to prepare for library
>   lib/crypto: arm64/sha3: Migrate optimized code into library
>   lib/crypto: s390/sha3: Add optimized Keccak functions
>   lib/crypto: sha3: Support arch overrides of one-shot digest functions
>   lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
>   crypto: jitterentropy - Use default sha3 implementation
>   crypto: sha3 - Reimplement using library API
>   crypto: s390/sha3 - Remove superseded SHA-3 code
>

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/15] SHA-3 library
  2025-10-30 10:10     ` Harald Freudenberger
@ 2025-10-30 17:14       ` Eric Biggers
  2025-10-31 14:29         ` Harald Freudenberger
                           ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Eric Biggers @ 2025-10-30 17:14 UTC (permalink / raw)
  To: Harald Freudenberger
  Cc: linux-crypto, David Howells, Ard Biesheuvel, Jason A . Donenfeld,
	Holger Dengler, Herbert Xu, linux-arm-kernel, linux-s390,
	linux-kernel

On Thu, Oct 30, 2025 at 11:10:22AM +0100, Harald Freudenberger wrote:
> On 2025-10-29 17:32, Eric Biggers wrote:
> > On Wed, Oct 29, 2025 at 10:30:40AM +0100, Harald Freudenberger wrote:
> > > > If the s390 folks could re-test the s390 optimized SHA-3 code (by
> > > > enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
> > > > would be helpful.  QEMU doesn't support the instructions it uses.  Also,
> > > > it would be helpful to provide the benchmark output from just before
> > > > "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
> > > > and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
> > > > functions".  Then we can verify that each change is useful.
> > [...]
> > > 
> > > Picked this series from your ebiggers repo branch sha3-lib-v2.
> > > Build on s390 runs without any complaints, no warnings.
> > > As recommended I enabled the KUNIT option and also
> > > CRYPTO_SELFTESTS_FULL.
> > > With a "modprobe tcrypt" I forced the selftests to run
> > > and in parallel I checked that the s390 specific CPACF instructions
> > > are really used (can be done with the pai command and check for
> > > the KIMD_SHA3_* counters). Also ran some AF-alg tests to verify
> > > all the sha3 hashes and check for thread safety.
> > > All this ran without any findings. However there are NO performance
> > > related tests involved.
> > 
> > Thanks!  Just to confirm, did you actually run the sha3 KUnit test and
> > verify that all its test cases passed?  That's the most important one.
> > It also includes a benchmark, if CONFIG_CRYPTO_LIB_BENCHMARK=y is
> > enabled, and I was hoping to see your results from that after each
> > change.  The results get printed to the kernel log when the test runs.
> > 
> 
> Here it is - as this is a zVM system the benchmark values may show poor
> performance.
> 
> Oct 30 10:46:44 b3545008.lnxne.boe kernel: KTAP version 1
> Oct 30 10:46:44 b3545008.lnxne.boe kernel: 1..1
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     KTAP version 1
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # Subtest: sha3
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # module: sha3_kunit
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     1..21
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 2 test_hash_all_lens_up_to_4096
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 3 test_hash_incremental_updates
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 4 test_hash_buffer_overruns
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 6 test_hash_alignment_consistency
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 7 test_hash_ctx_zeroization
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 8 test_hash_interrupt_context_1
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 9 test_hash_interrupt_context_2
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 18 test_shake_all_lens_up_to_4096
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 19 test_shake_multiple_squeezes
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 20 test_shake_with_guarded_bufs
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 14 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 109 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 911 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127: 1849 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128: 1872 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200: 2647 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256: 3338 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511: 5484 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512: 5562 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 8297 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173: 12625 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096: 11242 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 12853 MB/s
> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
> Oct 30 10:46:44 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0 total:21
> Oct 30 10:46:44 b3545008.lnxne.boe kernel: # Totals: pass:21 fail:0 skip:0 total:21
> Oct 30 10:46:44 b3545008.lnxne.boe kernel: ok 1 sha3

Thanks!  Is this with the whole series applied?  Those numbers are
pretty fast, so probably at least the Keccak acceleration part is
worthwhile.  But just to reiterate what I asked for:

    Also, it would be helpful to provide the benchmark output from just
    before "lib/crypto: s390/sha3: Add optimized Keccak function", just
    after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
    SHA-3 digest functions".

So I'd like to see how much each change helped, which isn't clear if you
show only the result at the end.

If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
one-shot SHA-3 digest functions" actually helps significantly vs. simply
doing the Keccak acceleration, then we should drop it for simplicity.

> > > What's a little bit tricky here is that the sha3 lib is statically
> > > built into the kernel. So no chance to unload/load this as a module.
> > > For sha1 and the sha2 stuff I can understand the need to have this
> > > statically enabled in the kernel. Sha3 is only supposed to be available
> > > as backup in case of sha2 deficiencies. So I can't see why this is
> > > really statically needed.
> > 
> > CONFIG_CRYPTO_LIB_SHA3 is a tristate option.  It can be either built-in
> > or a loadable module, depending on what other kconfig options select it.
> > Same as all the other crypto library modules.
> 
> I know and see this. However, I am unable to switch this to 'm'. It seems
> like the root cause is that CRYPTO_SHA3='y' and I can't change this to 'm'.
> And honestly I am unable to read these dependencies (forgive my ignorance):
> 
> CONFIG_CRYPTO_SHA3:
> SHA-3 secure hash algorithms (FIPS 202, ISO/IEC 10118-3)
>  Symbol: CRYPTO_SHA3 [=y]
>   Type  : tristate
>   Defined at crypto/Kconfig:1006
>     Prompt: SHA-3
>     Depends on: CRYPTO [=y]
>     Location:
>       -> Cryptographic API (CRYPTO [=y])
>         -> Hashes, digests, and MACs
>           -> SHA-3 (CRYPTO_SHA3 [=y])
>   Selects: CRYPTO_HASH [=y] && CRYPTO_LIB_SHA3 [=y]
>   Selected by [y]:
>     - CRYPTO_JITTERENTROPY [=y] && CRYPTO [=y]

Well, all that is saying is that there is a built-in option that selects
SHA-3, which causes it to be built-in.  So SHA-3 being built-in is
working as intended in that case.  (And it's also intended that we no
longer allow the architecture-optimized code to be built as a module
when the generic code is built-in.  That was always a huge footgun.)  If
you want to know why something that needs SHA-3 is being built-in, you'd
need to follow the chain of dependencies up to see how it gets selected.
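
As a rough illustration of the rule described above: in Kconfig, a `select` forces the selected tristate symbol to at least the value of the symbol that selects it, with n < m < y. The sketch below models only that rule. The names `ORDER` and `effective_value` are made up for this example and are not part of any kernel tool, and `depends on` and conditional selects are deliberately ignored.

```python
# Tristate values ordered n < m < y, as in Kconfig.
ORDER = {"n": 0, "m": 1, "y": 2}


def effective_value(user_choice, selectors):
    """Return the value a selected tristate symbol ends up with.

    Simplified model (assumption): the result is the maximum of the
    user's own choice and the values of all symbols that select it.
    """
    return max([user_choice, *selectors], key=ORDER.__getitem__)


# From the help output quoted above: CRYPTO_JITTERENTROPY=y selects
# CRYPTO_SHA3, and CRYPTO_SHA3 in turn selects CRYPTO_LIB_SHA3.
crypto_sha3 = effective_value("m", ["y"])
crypto_lib_sha3 = effective_value("m", [crypto_sha3])
print(crypto_sha3, crypto_lib_sha3)
```

Under this model, asking for 'm' has no effect as long as any selector up the chain is 'y', which matches what the menuconfig output shows.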

- Eric


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/15] SHA-3 library
  2025-10-30 17:14       ` Eric Biggers
@ 2025-10-31 14:29         ` Harald Freudenberger
  2025-11-04 11:07         ` Harald Freudenberger
  2025-11-04 11:55         ` Harald Freudenberger
  2 siblings, 0 replies; 34+ messages in thread
From: Harald Freudenberger @ 2025-10-31 14:29 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-crypto, David Howells, Ard Biesheuvel, Jason A . Donenfeld,
	Holger Dengler, Herbert Xu, linux-arm-kernel, linux-s390,
	linux-kernel

On 2025-10-30 18:14, Eric Biggers wrote:
> On Thu, Oct 30, 2025 at 11:10:22AM +0100, Harald Freudenberger wrote:
>> On 2025-10-29 17:32, Eric Biggers wrote:
>> > On Wed, Oct 29, 2025 at 10:30:40AM +0100, Harald Freudenberger wrote:
>> > > > If the s390 folks could re-test the s390 optimized SHA-3 code (by
>> > > > enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
>> > > > would be helpful.  QEMU doesn't support the instructions it uses.  Also,
>> > > > it would be helpful to provide the benchmark output from just before
>> > > > "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
>> > > > and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
>> > > > functions".  Then we can verify that each change is useful.
>> > [...]
>> > >
>> > > Picked this series from your ebiggers repo branch sha3-lib-v2.
>> > > Build on s390 runs without any complaints, no warnings.
>> > > As recommended I enabled the KUNIT option and also
>> > > CRYPTO_SELFTESTS_FULL.
>> > > With a "modprobe tcrypt" I forced the selftests to run
>> > > and in parallel I checked that the s390 specific CPACF instructions
>> > > are really used (can be done with the pai command and check for
>> > > the KIMD_SHA3_* counters). Also ran some AF-alg tests to verify
>> > > all the sha3 hashes and check for thread safety.
>> > > All this ran without any findings. However there are NO performance
>> > > related tests involved.
>> >
>> > Thanks!  Just to confirm, did you actually run the sha3 KUnit test and
>> > verify that all its test cases passed?  That's the most important one.
>> > It also includes a benchmark, if CONFIG_CRYPTO_LIB_BENCHMARK=y is
>> > enabled, and I was hoping to see your results from that after each
>> > change.  The results get printed to the kernel log when the test runs.
>> >
>> 
>> Here it is - as this is a zVM system the benchmark values may show poor
>> performance.
>> 
>> [...]
> 
> Thanks!  Is this with the whole series applied?  Those numbers are
> pretty fast, so probably at least the Keccak acceleration part is
> worthwhile.  But just to reiterate what I asked for:
> 
>     Also, it would be helpful to provide the benchmark output from just
>     before "lib/crypto: s390/sha3: Add optimized Keccak function", just
>     after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
>     SHA-3 digest functions".
> 
> So I'd like to see how much each change helped, which isn't clear if you
> show only the result at the end.

Yea, let's see ... Monday maybe ...

> 
> If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
> one-shot SHA-3 digest functions" actually helps significantly vs. simply
> doing the Keccak acceleration, then we should drop it for simplicity.
> 
>> > > What's a little bit tricky here is that the sha3 lib is statically
>> > > built into the kernel. So no chance to unload/load this as a module.
>> > > For sha1 and the sha2 stuff I can understand the need to have this
>> > > statically enabled in the kernel. Sha3 is only supposed to be available
>> > > as backup in case of sha2 deficiencies. So I can't see why this is
>> > > really statically needed.
>> >
>> > CONFIG_CRYPTO_LIB_SHA3 is a tristate option.  It can be either built-in
>> > or a loadable module, depending on what other kconfig options select it.
>> > Same as all the other crypto library modules.
>> 
>> I know and see this. However, I am unable to switch this to 'm'. It seems
>> like the root cause is that CRYPTO_SHA3='y' and I can't change this to 'm'.
>> And honestly I am unable to read these dependencies (forgive my ignorance):
>> 
>> CONFIG_CRYPTO_SHA3:
>> SHA-3 secure hash algorithms (FIPS 202, ISO/IEC 10118-3)
>>  Symbol: CRYPTO_SHA3 [=y]
>>   Type  : tristate
>>   Defined at crypto/Kconfig:1006
>>     Prompt: SHA-3
>>     Depends on: CRYPTO [=y]
>>     Location:
>>       -> Cryptographic API (CRYPTO [=y])
>>         -> Hashes, digests, and MACs
>>           -> SHA-3 (CRYPTO_SHA3 [=y])
>>   Selects: CRYPTO_HASH [=y] && CRYPTO_LIB_SHA3 [=y]
>>   Selected by [y]:
>>     - CRYPTO_JITTERENTROPY [=y] && CRYPTO [=y]
> 
> Well, all that is saying is that there is a built-in option that selects
> SHA-3, which causes it to be built-in.  So SHA-3 being built-in is
> working as intended in that case.  (And it's also intended that we no
> longer allow the architecture-optimized code to be built as a module
> when the generic code is built-in.  That was always a huge footgun.)  If
> you want to know why something that needs SHA-3 is being built-in, you'd
> need to follow the chain of dependencies up to see how it gets selected.
> 
> - Eric

Thanks


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/15] SHA-3 library
  2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
                   ` (16 preceding siblings ...)
  2025-10-30 14:08 ` Ard Biesheuvel
@ 2025-11-03 17:34 ` Eric Biggers
       [not found]   ` <4188d18bfcc8a64941c5ebd8de10ede2@linux.ibm.com>
  17 siblings, 1 reply; 34+ messages in thread
From: Eric Biggers @ 2025-11-03 17:34 UTC (permalink / raw)
  To: linux-crypto
  Cc: David Howells, Ard Biesheuvel, Jason A . Donenfeld,
	Holger Dengler, Harald Freudenberger, Herbert Xu,
	linux-arm-kernel, linux-s390, linux-kernel

On Sat, Oct 25, 2025 at 10:50:17PM -0700, Eric Biggers wrote:
> This series is targeting libcrypto-next.  It can also be retrieved from:
> 
>     git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git sha3-lib-v2
> 
> This series adds SHA-3 support to lib/crypto/.  This includes support
> for the digest algorithms SHA3-224, SHA3-256, SHA3-384, and SHA3-512,
> and also support for the extendable-output functions SHAKE128 and
> SHAKE256.  The SHAKE128 and SHAKE256 support will be needed by ML-DSA.
> 
> The architecture-optimized SHA-3 code for arm64 and s390 is migrated
> into lib/crypto/.  (The existing s390 code couldn't really be reused, so
> really I rewrote it from scratch.)  This makes the SHA-3 library
> functions be accelerated on these architectures.
> 
> Finally, the sha3-224, sha3-256, sha3-384, and sha3-512 crypto_shash
> algorithms are reimplemented on top of the library API.

I've applied this series to
https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=libcrypto-next,
excluding the following 2 patches which are waiting on benchmark results
from the s390 folks:

    lib/crypto: sha3: Support arch overrides of one-shot digest functions
    lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions

I'd be glad to apply those too if they're shown to be worthwhile.

Note: I also reordered the commits in libcrypto-next to put the new
KUnit test suites (blake2b and sha3) last, and to put the AES-GCM
improvements on a separate branch that's merged in.  This will allow
making separate pull requests for the tests and the AES-GCM
improvements, which I think aligns with what Linus had requested before
(https://lore.kernel.org/linux-crypto/CAHk-=wi5d4K+sF2L=tuRW6AopVxO1DDXzstMQaECmU2QHN13KA@mail.gmail.com/).

- Eric


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/15] SHA-3 library
  2025-10-30 17:14       ` Eric Biggers
  2025-10-31 14:29         ` Harald Freudenberger
@ 2025-11-04 11:07         ` Harald Freudenberger
  2025-11-04 18:27           ` Eric Biggers
  2025-11-04 11:55         ` Harald Freudenberger
  2 siblings, 1 reply; 34+ messages in thread
From: Harald Freudenberger @ 2025-11-04 11:07 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-crypto, David Howells, Ard Biesheuvel, Jason A . Donenfeld,
	Holger Dengler, Herbert Xu, linux-arm-kernel, linux-s390,
	linux-kernel

On 2025-10-30 18:14, Eric Biggers wrote:
> On Thu, Oct 30, 2025 at 11:10:22AM +0100, Harald Freudenberger wrote:
>> On 2025-10-29 17:32, Eric Biggers wrote:
>> > On Wed, Oct 29, 2025 at 10:30:40AM +0100, Harald Freudenberger wrote:
>> > > > If the s390 folks could re-test the s390 optimized SHA-3 code (by
>> > > > enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
>> > > > would be helpful.  QEMU doesn't support the instructions it uses.  Also,
>> > > > it would be helpful to provide the benchmark output from just before
>> > > > "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
>> > > > and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
>> > > > functions".  Then we can verify that each change is useful.
>> > [...]
>> > >
>> > > Picked this series from your ebiggers repo branch sha3-lib-v2.
>> > > Build on s390 runs without any complaints, no warnings.
>> > > As recommended I enabled the KUNIT option and also
>> > > CRYPTO_SELFTESTS_FULL.
>> > > With a "modprobe tcrypt" I forced the selftests to run
>> > > and in parallel I checked that the s390 specific CPACF instructions
>> > > are really used (can be done with the pai command and check for
>> > > the KIMD_SHA3_* counters). Also ran some AF-alg tests to verify
>> > > all the sha3 hashes and check for thread safety.
>> > > All this ran without any findings. However there are NO performance
>> > > related tests involved.
>> >
>> > Thanks!  Just to confirm, did you actually run the sha3 KUnit test and
>> > verify that all its test cases passed?  That's the most important one.
>> > It also includes a benchmark, if CONFIG_CRYPTO_LIB_BENCHMARK=y is
>> > enabled, and I was hoping to see your results from that after each
>> > change.  The results get printed to the kernel log when the test runs.
>> >
>> 
>> Here it is - as this is a zVM system the benchmark values may show poor
>> performance.
>> 
>> [...]
> 
> Thanks!  Is this with the whole series applied?  Those numbers are
> pretty fast, so probably at least the Keccak acceleration part is
> worthwhile.  But just to reiterate what I asked for:
> 
>     Also, it would be helpful to provide the benchmark output from just
>     before "lib/crypto: s390/sha3: Add optimized Keccak function", just
>     after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
>     SHA-3 digest functions".
> 
> So I'd like to see how much each change helped, which isn't clear if you
> show only the result at the end.
> 
> If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
> one-shot SHA-3 digest functions" actually helps significantly vs. simply
> doing the Keccak acceleration, then we should drop it for simplicity.
> 
>> > > What's a little bit tricky here is that the sha3 lib is statically
>> > > built into the kernel. So no chance to unload/load this as a module.
>> > > For sha1 and the sha2 stuff I can understand the need to have this
>> > > statically enabled in the kernel. Sha3 is only supposed to be available
>> > > as backup in case of sha2 deficiencies. So I can't see why this is
>> > > really statically needed.
>> >
>> > CONFIG_CRYPTO_LIB_SHA3 is a tristate option.  It can be either built-in
>> > or a loadable module, depending on what other kconfig options select it.
>> > Same as all the other crypto library modules.
>> 
>> I know and see this. However, I am unable to switch this to 'm'. It seems
>> like the root cause is that CRYPTO_SHA3='y' and I can't change this to 'm'.
>> And honestly I am unable to read these dependencies (forgive my ignorance):
>> 
>> CONFIG_CRYPTO_SHA3:
>> SHA-3 secure hash algorithms (FIPS 202, ISO/IEC 10118-3)
>>  Symbol: CRYPTO_SHA3 [=y]
>>   Type  : tristate
>>   Defined at crypto/Kconfig:1006
>>     Prompt: SHA-3
>>     Depends on: CRYPTO [=y]
>>     Location:
>>       -> Cryptographic API (CRYPTO [=y])
>>         -> Hashes, digests, and MACs
>>           -> SHA-3 (CRYPTO_SHA3 [=y])
>>   Selects: CRYPTO_HASH [=y] && CRYPTO_LIB_SHA3 [=y]
>>   Selected by [y]:
>>     - CRYPTO_JITTERENTROPY [=y] && CRYPTO [=y]
> 
> Well, all that is saying is that there is a built-in option that selects
> SHA-3, which causes it to be built-in.  So SHA-3 being built-in is
> working as intended in that case.  (And it's also intended that we no
> longer allow the architecture-optimized code to be built as a module
> when the generic code is built-in.  That was always a huge footgun.)  If
> you want to know why something that needs SHA-3 is being built-in, you'd
> need to follow the chain of dependencies up to see how it gets selected.
> 
> - Eric

commit b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions:

Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # module: sha3_kunit
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     1..21
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 2 test_hash_all_lens_up_to_4096
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 3 test_hash_incremental_updates
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 4 test_hash_buffer_overruns
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 6 test_hash_alignment_consistency
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 7 test_hash_ctx_zeroization
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 8 test_hash_interrupt_context_1
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 9 test_hash_interrupt_context_2
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 18 test_shake_all_lens_up_to_4096
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 19 test_shake_multiple_squeezes
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 20 test_shake_with_guarded_bufs
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 80 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 785 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127: 812 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128: 1619 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200: 2319 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256: 2176 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511: 4881 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512: 4968 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 7565 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173: 11909 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096: 10378 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 12273 MB/s
Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
Nov 04 10:50:50 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0 total:21

commit 02266b8a383e lib/crypto: s390/sha3: Add optimized Keccak functions:

Nov 04 10:55:37 b3545008.lnxne.boe kernel:     # module: sha3_kunit
Nov 04 10:55:37 b3545008.lnxne.boe kernel:     1..21
Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 2 test_hash_all_lens_up_to_4096
Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 3 test_hash_incremental_updates
Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 4 test_hash_buffer_overruns
Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 6 test_hash_alignment_consistency
Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 7 test_hash_ctx_zeroization
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 8 test_hash_interrupt_context_1
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 9 test_hash_interrupt_context_2
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 18 test_shake_all_lens_up_to_4096
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 19 test_shake_multiple_squeezes
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 20 test_shake_with_guarded_bufs
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 211 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 835 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127: 1557 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128: 1617 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200: 1457 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256: 1830 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511: 3035 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512: 3245 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 5319 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173: 9969 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096: 11123 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 12767 MB/s
Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
Nov 04 10:55:38 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0 total:21

commit aaca0ebc0717 lib/crypto: arm64/sha3: Migrate optimized code into library:

Nov 04 12:02:31 b3545008.lnxne.boe kernel:     # module: sha3_kunit
Nov 04 12:02:31 b3545008.lnxne.boe kernel:     1..21
Nov 04 12:02:31 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
Nov 04 12:02:31 b3545008.lnxne.boe kernel:     ok 2 test_hash_all_lens_up_to_4096
Nov 04 12:02:31 b3545008.lnxne.boe kernel:     ok 3 test_hash_incremental_updates
Nov 04 12:02:31 b3545008.lnxne.boe kernel:     ok 4 test_hash_buffer_overruns
Nov 04 12:02:31 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
Nov 04 12:02:31 b3545008.lnxne.boe kernel:     ok 6 test_hash_alignment_consistency
Nov 04 12:02:31 b3545008.lnxne.boe kernel:     ok 7 test_hash_ctx_zeroization
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 8 test_hash_interrupt_context_1
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 9 test_hash_interrupt_context_2
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 18 test_shake_all_lens_up_to_4096
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 19 test_shake_multiple_squeezes
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 20 test_shake_with_guarded_bufs
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 1 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 29 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 120 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127: 236 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128: 238 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200: 185 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256: 237 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511: 240 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512: 239 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 246 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173: 251 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096: 253 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 259 MB/s
Nov 04 12:02:32 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
Nov 04 12:02:32 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0 total:21

Obviously, this is without s390-specific acceleration.



* Re: [PATCH v2 00/15] SHA-3 library
  2025-10-30 17:14       ` Eric Biggers
  2025-10-31 14:29         ` Harald Freudenberger
  2025-11-04 11:07         ` Harald Freudenberger
@ 2025-11-04 11:55         ` Harald Freudenberger
  2 siblings, 0 replies; 34+ messages in thread
From: Harald Freudenberger @ 2025-11-04 11:55 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-crypto, David Howells, Ard Biesheuvel, Jason A . Donenfeld,
	Holger Dengler, Herbert Xu, linux-arm-kernel, linux-s390,
	linux-kernel

On 2025-10-30 18:14, Eric Biggers wrote:
> On Thu, Oct 30, 2025 at 11:10:22AM +0100, Harald Freudenberger wrote:
>> On 2025-10-29 17:32, Eric Biggers wrote:
>> > On Wed, Oct 29, 2025 at 10:30:40AM +0100, Harald Freudenberger wrote:
>> > > > If the s390 folks could re-test the s390 optimized SHA-3 code (by
>> > > > enabling CRYPTO_LIB_SHA3_KUNIT_TEST and CRYPTO_LIB_BENCHMARK), that
>> > > > would be helpful.  QEMU doesn't support the instructions it uses.  Also,
>> > > > it would be helpful to provide the benchmark output from just before
>> > > > "lib/crypto: s390/sha3: Add optimized Keccak function", just after it,
>> > > > and after "lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
>> > > > functions".  Then we can verify that each change is useful.
>> > [...]
>> > >
>> > > Picked this series from your ebiggers repo branch sha3-lib-v2.
>> > > Build on s390 runs without any complains, no warnings.
>> > > As recommended I enabled the KUNIT option and also
>> > > CRYPTO_SELFTESTS_FULL.
>> > > With an "modprobe tcrypt" I enforced to run the selftests
>> > > and in parallel I checked that the s390 specific CPACF instructions
>> > > are really used (can be done with the pai command and check for
>> > > the KIMD_SHA3_* counters). Also ran some AF-alg tests to verify
>> > > all the sha3 hashes and check for thread safety.
>> > > All this ran without any findings. However there are NO performance
>> > > related tests involved.
>> >
>> > Thanks!  Just to confirm, did you actually run the sha3 KUnit test and
>> > verify that all its test cases passed?  That's the most important one.
>> > It also includes a benchmark, if CONFIG_CRYPTO_LIB_BENCHMARK=y is
>> > enabled, and I was hoping to see your results from that after each
>> > change.  The results get printed to the kernel log when the test runs.
>> >
>> 
>> Here it is - as this is a zVM system the benchmark values may show 
>> poor
>> performance.
>> 
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel: KTAP version 1
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel: 1..1
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     KTAP version 1
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # Subtest: sha3
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # module: sha3_kunit
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     1..21
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 1 
>> test_hash_test_vectors
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 2
>> test_hash_all_lens_up_to_4096
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 3
>> test_hash_incremental_updates
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 4
>> test_hash_buffer_overruns
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 6
>> test_hash_alignment_consistency
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 7
>> test_hash_ctx_zeroization
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 8
>> test_hash_interrupt_context_1
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 9
>> test_hash_interrupt_context_2
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 10 
>> test_sha3_224_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 11 
>> test_sha3_256_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 12 
>> test_sha3_384_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 13 
>> test_sha3_512_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 14 
>> test_shake128_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 15 
>> test_shake256_basic
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 16 
>> test_shake128_nist
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 17 
>> test_shake256_nist
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 18
>> test_shake_all_lens_up_to_4096
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 19
>> test_shake_multiple_squeezes
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 20
>> test_shake_with_guarded_bufs
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=1: 14
>> MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=16: 109
>> MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=64: 911
>> MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=127:
>> 1849 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=128:
>> 1872 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=200:
>> 2647 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=256:
>> 3338 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=511:
>> 5484 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=512:
>> 5562 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=1024:
>> 8297 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=3173:
>> 12625 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=4096:
>> 11242 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=16384:
>> 12853 MB/s
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 
>> skip:0
>> total:21
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel: # Totals: pass:21 fail:0 
>> skip:0
>> total:21
>> Oct 30 10:46:44 b3545008.lnxne.boe kernel: ok 1 sha3
> 
> Thanks!  Is this with the whole series applied?  Those numbers are
> pretty fast, so probably at least the Keccak acceleration part is
> worthwhile.  But just to reiterate what I asked for:
> 
>     Also, it would be helpful to provide the benchmark output from just
>     before "lib/crypto: s390/sha3: Add optimized Keccak function", just
>     after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
>     SHA-3 digest functions".
> 
> So I'd like to see how much each change helped, which isn't clear if 
> you
> show only the result at the end.
> 
> If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
> one-shot SHA-3 digest functions" actually helps significantly vs. 
> simply
> doing the Keccak acceleration, then we should drop it for simplicity.
> 
>> > > What's a little bit tricky here is that the sha3 lib is statically
>> > > build into the kernel. So no chance to unload/load this as a module.
>> > > For sha1 and the sha2 stuff I can understand the need to have this
>> > > statically enabled in the kernel. Sha3 is only supposed to be
>> > > available
>> > > as backup in case of sha2 deficiencies. So I can't see why this is
>> > > really statically needed.
>> >
>> > CONFIG_CRYPTO_LIB_SHA3 is a tristate option.  It can be either built-in
>> > or a loadable module, depending on what other kconfig options select it.
>> > Same as all the other crypto library modules.
>> 
>> I know and see this. However, I am unable to switch this to 'm'. It 
>> seems
>> like the root cause is that CRYPTO_SHA3='y' and I can't change this to 
>> 'm'.
>> And honestly I am unable to read these dependencies (forgive my 
>> ignorance):
>> 
>> CONFIG_CRYPTO_SHA3:
>> SHA-3 secure hash algorithms (FIPS 202, ISO/IEC 10118-3)
>>  Symbol: CRYPTO_SHA3 [=y]
>>   Type  : tristate
>>   Defined at crypto/Kconfig:1006
>>     Prompt: SHA-3
>>     Depends on: CRYPTO [=y]
>>     Location:
>>       -> Cryptographic API (CRYPTO [=y])
>>         -> Hashes, digests, and MACs
>>           -> SHA-3 (CRYPTO_SHA3 [=y])
>>   Selects: CRYPTO_HASH [=y] && CRYPTO_LIB_SHA3 [=y]
>>   Selected by [y]:
>>     - CRYPTO_JITTERENTROPY [=y] && CRYPTO [=y]
> 
> Well, all that is saying is that there is a built-in option that 
> selects
> SHA-3, which causes it to be built-in.  So SHA-3 being built-in is
> working as intended in that case.  (And it's also intended that we no
> longer allow the architecture-optimized code to be built as a module
> when the generic code is built-in.  That was always a huge footgun.)  
> If
> you want to know why something that needs SHA-3 is being built-in, 
> you'd
> need to follow the chain of dependencies up to see how it gets 
> selected.
> 
> - Eric

And here is a benchmark where I used commit
151fbe15a6cb lib/crypto: s390/sha3: Migrate optimized code into library
from your branch sha3-lib-v1. As far as I remember, this lacks the
code optimizations:

Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 196 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 648 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127: 1011 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128: 1014 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200: 1281 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256: 1396 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511: 2593 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512: 2624 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 4637 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173: 8931 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096: 10636 MB/s
Nov 04 12:47:32 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 12339 MB/s



* Re: [PATCH v2 00/15] SHA-3 library
  2025-11-04 11:07         ` Harald Freudenberger
@ 2025-11-04 18:27           ` Eric Biggers
  2025-11-05  8:16             ` Harald Freudenberger
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Biggers @ 2025-11-04 18:27 UTC (permalink / raw)
  To: Harald Freudenberger
  Cc: linux-crypto, David Howells, Ard Biesheuvel, Jason A . Donenfeld,
	Holger Dengler, Herbert Xu, linux-arm-kernel, linux-s390,
	linux-kernel

On Tue, Nov 04, 2025 at 12:07:40PM +0100, Harald Freudenberger wrote:
> > Thanks!  Is this with the whole series applied?  Those numbers are
> > pretty fast, so probably at least the Keccak acceleration part is
> > worthwhile.  But just to reiterate what I asked for:
> > 
> >     Also, it would be helpful to provide the benchmark output from just
> >     before "lib/crypto: s390/sha3: Add optimized Keccak function", just
> >     after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
> >     SHA-3 digest functions".
> > 
> > So I'd like to see how much each change helped, which isn't clear if you
> > show only the result at the end.
> > 
> > If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
> > one-shot SHA-3 digest functions" actually helps significantly vs. simply
> > doing the Keccak acceleration, then we should drop it for simplicity.
[...]
> commit b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot SHA-3
> digest functions:
> 
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # module: sha3_kunit
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     1..21
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 2
> test_hash_all_lens_up_to_4096
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 3
> test_hash_incremental_updates
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 4
> test_hash_buffer_overruns
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 6
> test_hash_alignment_consistency
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 7
> test_hash_ctx_zeroization
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 8
> test_hash_interrupt_context_1
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 9
> test_hash_interrupt_context_2
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 18
> test_shake_all_lens_up_to_4096
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 19
> test_shake_multiple_squeezes
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 20
> test_shake_with_guarded_bufs
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 80
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 785
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127:
> 812 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128:
> 1619 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200:
> 2319 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256:
> 2176 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511:
> 4881 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512:
> 4968 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024:
> 7565 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173:
> 11909 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096:
> 10378 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384:
> 12273 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0
> total:21
> 
> commit 02266b8a383e lib/crypto: s390/sha3: Add optimized Keccak functions:
> 
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     # module: sha3_kunit
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     1..21
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 2
> test_hash_all_lens_up_to_4096
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 3
> test_hash_incremental_updates
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 4
> test_hash_buffer_overruns
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 6
> test_hash_alignment_consistency
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 7
> test_hash_ctx_zeroization
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 8
> test_hash_interrupt_context_1
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 9
> test_hash_interrupt_context_2
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 18
> test_shake_all_lens_up_to_4096
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 19
> test_shake_multiple_squeezes
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 20
> test_shake_with_guarded_bufs
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 211
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 835
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127:
> 1557 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128:
> 1617 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200:
> 1457 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256:
> 1830 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511:
> 3035 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512:
> 3245 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024:
> 5319 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173:
> 9969 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096:
> 11123 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384:
> 12767 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0
> total:21

Thanks.  So the results before and after "lib/crypto: s390/sha3: Add
optimized one-shot SHA-3 digest functions" are:

    Length (bytes)      Before            After
    ==============    ==========        ==========
         1               12 MB/s           12 MB/s
        16              211 MB/s           80 MB/s
        64              835 MB/s          785 MB/s
       127             1557 MB/s          812 MB/s
       128             1617 MB/s         1619 MB/s
       200             1457 MB/s         2319 MB/s
       256             1830 MB/s         2176 MB/s
       511             3035 MB/s         4881 MB/s
       512             3245 MB/s         4968 MB/s
      1024             5319 MB/s         7565 MB/s
      3173             9969 MB/s        11909 MB/s
      4096            11123 MB/s        10378 MB/s
     16384            12767 MB/s        12273 MB/s

Unfortunately that seems inconclusive.  len=200, 256, 511, 512, 1024,
3173 improved.  But len=16, 64, 127, 4096, 16384 regressed.
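
For anyone who wants to re-derive that split from the table, here is a
minimal Python sketch.  It is not part of the series; the names RESULTS and
classify and the 1% noise threshold are assumptions made purely for
illustration.  The throughput values are copied from the table above.

```python
# Throughput in MB/s from the table above: length -> (before, after).
RESULTS = {
    1: (12, 12),
    16: (211, 80),
    64: (835, 785),
    127: (1557, 812),
    128: (1617, 1619),
    200: (1457, 2319),
    256: (1830, 2176),
    511: (3035, 4881),
    512: (3245, 4968),
    1024: (5319, 7565),
    3173: (9969, 11909),
    4096: (11123, 10378),
    16384: (12767, 12273),
}

# Assumption: relative changes smaller than 1% are treated as noise.
NOISE_THRESHOLD = 0.01


def classify(results, threshold=NOISE_THRESHOLD):
    """Split lengths into improved / regressed / flat by relative change."""
    improved, regressed, flat = [], [], []
    for length, (before, after) in sorted(results.items()):
        change = (after - before) / before
        if change > threshold:
            improved.append(length)
        elif change < -threshold:
            regressed.append(length)
        else:
            flat.append(length)
    return improved, regressed, flat


if __name__ == "__main__":
    improved, regressed, flat = classify(RESULTS)
    print("improved: ", improved)
    print("regressed:", regressed)
    print("flat:     ", flat)
```

With that threshold, len=1 and len=128 land in the "flat" bucket, which is
why they appear in neither list.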

I expected the most improvement on short lengths.  The fact that some of
the short lengths actually regressed is concerning.

It's also clear that the Keccak acceleration itself matters far more than
this additional one-shot optimization, as expected.  The generic code
maxed out at only 259 MB/s for you.
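
To put a number on that, here is a small Python sketch that computes the
per-length speedup of the Keccak-accelerated run (commit 02266b8a383e) over
the generic run (commit aaca0ebc0717).  The values are copied from the two
logs posted earlier in this thread; the names GENERIC, KECCAK and speedups
are illustrative only.

```python
# Throughput in MB/s, copied from the two benchmark logs in this thread.
GENERIC = {   # no s390-specific acceleration
    1: 1, 16: 29, 64: 120, 127: 236, 128: 238, 200: 185, 256: 237,
    511: 240, 512: 239, 1024: 246, 3173: 251, 4096: 253, 16384: 259,
}
KECCAK = {    # with the optimized Keccak functions, no one-shot functions
    1: 12, 16: 211, 64: 835, 127: 1557, 128: 1617, 200: 1457, 256: 1830,
    511: 3035, 512: 3245, 1024: 5319, 3173: 9969, 4096: 11123, 16384: 12767,
}


def speedups(baseline, accelerated):
    """Return {length: accelerated / baseline} for lengths present in both."""
    return {
        length: accelerated[length] / baseline[length]
        for length in sorted(baseline)
        if length in accelerated
    }


if __name__ == "__main__":
    for length, factor in speedups(GENERIC, KECCAK).items():
        print(f"len={length:>5}: {factor:5.1f}x")
```

On these single-run numbers the factor is above 6x at every length and
reaches roughly 49x at len=16384, whereas the one-shot patch moves results by
well under 2x in either direction.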

I suggest we hold off on "lib/crypto: s390/sha3: Add optimized one-shot
SHA-3 digest functions" for now, to avoid the extra maintenance cost
and opportunity for bugs.

If you can provide more accurate numbers that show it's worthwhile, we
can reconsider.  Maybe set the CPU to a fixed frequency, and run
sha3_kunit multiple times (triggered via KUnit's debugfs interface)?
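
A rough Python sketch of what such a repeated run could look like follows.
It makes several assumptions: CONFIG_KUNIT_DEBUGFS is enabled, debugfs is
mounted at /sys/kernel/debug, the suite directory is named "sha3", writing
anything to its "run" file re-runs the suite synchronously, and its
"results" file contains the "# benchmark_hash:" lines.  The helper names are
hypothetical.

```python
import re
from pathlib import Path

# Assumed location of KUnit's debugfs directory (needs CONFIG_KUNIT_DEBUGFS).
KUNIT_DEBUGFS = Path("/sys/kernel/debug/kunit")

# Matches e.g. "# benchmark_hash: len=512: 3245 MB/s".  \s* also spans a
# plain line break between the fields.
BENCH_RE = re.compile(r"benchmark_hash:\s*len=(\d+):\s*(\d+)\s*MB/s")


def parse_benchmark(log_text):
    """Return {length: MB/s} for every benchmark_hash line in log_text."""
    return {int(m.group(1)): int(m.group(2))
            for m in BENCH_RE.finditer(log_text)}


def run_suite_once(suite="sha3", root=KUNIT_DEBUGFS):
    """Trigger one run of the suite and return its parsed benchmark results."""
    suite_dir = Path(root) / suite
    # Assumption: any write to "run" starts the suite and returns when done.
    (suite_dir / "run").write_text("1\n")
    return parse_benchmark((suite_dir / "results").read_text())


def collect(runs, suite="sha3", root=KUNIT_DEBUGFS):
    """Run the suite several times; return {length: [MB/s for each run]}."""
    samples = {}
    for _ in range(runs):
        for length, mbps in run_suite_once(suite, root).items():
            samples.setdefault(length, []).append(mbps)
    return samples
```

Calling collect(20) as root would then give 20 samples per length, which is
enough to look at the spread rather than at a single reading.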

- Eric



* Re: [PATCH v2 00/15] SHA-3 library
  2025-11-04 18:27           ` Eric Biggers
@ 2025-11-05  8:16             ` Harald Freudenberger
  0 siblings, 0 replies; 34+ messages in thread
From: Harald Freudenberger @ 2025-11-05  8:16 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-crypto, David Howells, Ard Biesheuvel, Jason A . Donenfeld,
	Holger Dengler, Herbert Xu, linux-arm-kernel, linux-s390,
	linux-kernel

On 2025-11-04 19:27, Eric Biggers wrote:
> On Tue, Nov 04, 2025 at 12:07:40PM +0100, Harald Freudenberger wrote:
>> > Thanks!  Is this with the whole series applied?  Those numbers are
>> > pretty fast, so probably at least the Keccak acceleration part is
>> > worthwhile.  But just to reiterate what I asked for:
>> >
>> >     Also, it would be helpful to provide the benchmark output from just
>> >     before "lib/crypto: s390/sha3: Add optimized Keccak function", just
>> >     after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
>> >     SHA-3 digest functions".
>> >
>> > So I'd like to see how much each change helped, which isn't clear if you
>> > show only the result at the end.
>> >
>> > If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
>> > one-shot SHA-3 digest functions" actually helps significantly vs. simply
>> > doing the Keccak acceleration, then we should drop it for simplicity.
> [...]
>> commit b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot 
>> SHA-3
>> digest functions:
>> 
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # module: sha3_kunit
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     1..21
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 1 
>> test_hash_test_vectors
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 2
>> test_hash_all_lens_up_to_4096
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 3
>> test_hash_incremental_updates
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 4
>> test_hash_buffer_overruns
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 6
>> test_hash_alignment_consistency
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 7
>> test_hash_ctx_zeroization
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 8
>> test_hash_interrupt_context_1
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 9
>> test_hash_interrupt_context_2
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 10 
>> test_sha3_224_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 11 
>> test_sha3_256_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 12 
>> test_sha3_384_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 13 
>> test_sha3_512_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 14 
>> test_shake128_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 15 
>> test_shake256_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 16 
>> test_shake128_nist
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 17 
>> test_shake256_nist
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 18
>> test_shake_all_lens_up_to_4096
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 19
>> test_shake_multiple_squeezes
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 20
>> test_shake_with_guarded_bufs
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=1: 12
>> MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=16: 80
>> MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=64: 785
>> MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=127:
>> 812 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=128:
>> 1619 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=200:
>> 2319 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=256:
>> 2176 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=511:
>> 4881 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=512:
>> 4968 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=1024:
>> 7565 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=3173:
>> 11909 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=4096:
>> 10378 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=16384:
>> 12273 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 
>> skip:0
>> total:21
>> 
>> commit 02266b8a383e lib/crypto: s390/sha3: Add optimized Keccak 
>> functions:
>> 
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     # module: sha3_kunit
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     1..21
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 1 
>> test_hash_test_vectors
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 2
>> test_hash_all_lens_up_to_4096
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 3
>> test_hash_incremental_updates
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 4
>> test_hash_buffer_overruns
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 6
>> test_hash_alignment_consistency
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 7
>> test_hash_ctx_zeroization
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 8
>> test_hash_interrupt_context_1
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 9
>> test_hash_interrupt_context_2
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 10 
>> test_sha3_224_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 11 
>> test_sha3_256_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 12 
>> test_sha3_384_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 13 
>> test_sha3_512_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 14 
>> test_shake128_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 15 
>> test_shake256_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 16 
>> test_shake128_nist
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 17 
>> test_shake256_nist
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 18
>> test_shake_all_lens_up_to_4096
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 19
>> test_shake_multiple_squeezes
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 20
>> test_shake_with_guarded_bufs
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=1: 12
>> MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=16: 211
>> MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=64: 835
>> MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=127:
>> 1557 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=128:
>> 1617 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=200:
>> 1457 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=256:
>> 1830 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=511:
>> 3035 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=512:
>> 3245 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=1024:
>> 5319 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=3173:
>> 9969 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=4096:
>> 11123 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: 
>> len=16384:
>> 12767 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 
>> skip:0
>> total:21
> 
> Thanks.  So the results before and after "lib/crypto: s390/sha3: Add
> optimized one-shot SHA-3 digest functions" are:
> 
>     Length (bytes)      Before            After
>     ==============    ==========        ==========
>          1               12 MB/s           12 MB/s
>         16              211 MB/s           80 MB/s
>         64              835 MB/s          785 MB/s
>        127             1557 MB/s          812 MB/s
>        128             1617 MB/s         1619 MB/s
>        200             1457 MB/s         2319 MB/s
>        256             1830 MB/s         2176 MB/s
>        511             3035 MB/s         4881 MB/s
>        512             3245 MB/s         4968 MB/s
>       1024             5319 MB/s         7565 MB/s
>       3173             9969 MB/s        11909 MB/s
>       4096            11123 MB/s        10378 MB/s
>      16384            12767 MB/s        12273 MB/s
> 
> Unfortunately that seems inconclusive.  len=200, 256, 511, 512, 1024,
> 3173 improved.  But len=16, 64, 127, 4096, 16384 regressed.
> 
> I expected the most improvement on short lengths.  The fact that some of
> the short lengths actually regressed is concerning.
> 
> It's also clear that the Keccak acceleration itself matters far more than
> this additional one-shot optimization, as expected.  The generic code
> maxed out at only 259 MB/s for you.
> 
> I suggest we hold off on "lib/crypto: s390/sha3: Add optimized one-shot
> SHA-3 digest functions" for now, to avoid the extra maintenance cost
> and opportunity for bugs.
> 
> If you can provide more accurate numbers that show it's worthwhile, we
> can reconsider.  Maybe set the CPU to a fixed frequency, and run
> sha3_kunit multiple times (triggered via KUnit's debugfs interface)?
> 
> - Eric
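
The improved/regressed split in the quoted table can be reproduced mechanically. The following is a minimal sketch, not part of the original exchange. It assumes a 1% noise threshold (an illustrative choice, not something stated in the thread) below which a change is treated as unchanged. The `RESULTS` dictionary and the `classify` helper are hypothetical names; the numbers are copied from the table above.

```python
# Before/after throughput in MB/s, copied from the quoted table.
RESULTS = {
    1: (12, 12),
    16: (211, 80),
    64: (835, 785),
    127: (1557, 812),
    128: (1617, 1619),
    200: (1457, 2319),
    256: (1830, 2176),
    511: (3035, 4881),
    512: (3245, 4968),
    1024: (5319, 7565),
    3173: (9969, 11909),
    4096: (11123, 10378),
    16384: (12767, 12273),
}

NOISE_THRESHOLD = 0.01  # assumption: relative changes under 1% count as noise


def classify(results, threshold=NOISE_THRESHOLD):
    """Split message lengths into improved, regressed, and unchanged lists."""
    improved, regressed, unchanged = [], [], []
    for length, (before, after) in sorted(results.items()):
        change = (after - before) / before  # relative change in throughput
        if change > threshold:
            improved.append(length)
        elif change < -threshold:
            regressed.append(length)
        else:
            unchanged.append(length)
    return improved, regressed, unchanged


improved, regressed, unchanged = classify(RESULTS)
print("improved: ", improved)
print("regressed:", regressed)
print("unchanged:", unchanged)
```

With that threshold, the improved and regressed lists match the ones given in the quoted message, and len=1 and len=128 fall into the unchanged bucket. A single run per configuration still says nothing about variance, which is why repeated runs were requested.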

The focus should be on the small data. Let me see what I can do ...
I used a z/VM guest for this. Using an LPAR instead, with some CPU
pinning, may be an option. I would also run more tests to be able to
calculate a Gaussian distribution. However, not within the next few days.
So I agree with you: let's hold back the one-shot optimization.

Harald Freudenberger
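
Summarizing many benchmark runs, as proposed above, could look roughly like the following. This is a minimal sketch only: the sample values are hypothetical and are NOT measurements from this thread, and the `summarize` helper and its `bin_width` parameter are illustrative names. It reports the mean, sample standard deviation, median, and the fullest histogram bin as a crude estimate of the distribution's peak.

```python
import statistics
from collections import Counter


def summarize(samples_mbps, bin_width=50):
    """Return mean, sample stdev, median, and the fullest histogram bin."""
    # Map each sample to the lower edge of its bin, then count per bin.
    bins = Counter((int(s) // bin_width) * bin_width for s in samples_mbps)
    # Pick the bin with the most samples; on ties, prefer the lower bin.
    peak_bin, _ = max(bins.items(), key=lambda kv: (kv[1], -kv[0]))
    return {
        "mean": statistics.mean(samples_mbps),
        "stdev": statistics.stdev(samples_mbps),
        "median": statistics.median(samples_mbps),
        "peak_bin": (peak_bin, peak_bin + bin_width),
    }


# Hypothetical throughput samples in MB/s, for illustration only.
samples = [1810, 1835, 1842, 1850, 1851, 1858, 1862, 1870, 1905, 2100]
summary = summarize(samples)
for name, value in summary.items():
    print(name, value)
```

The median and the histogram peak are less sensitive to occasional outliers (such as the 2100 MB/s sample here) than the mean, which matters when runs are disturbed by other guests or frequency changes.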



* Re: [PATCH v2 00/15] SHA-3 library
       [not found]   ` <4188d18bfcc8a64941c5ebd8de10ede2@linux.ibm.com>
@ 2025-11-06  4:33     ` Eric Biggers
  2025-11-06  7:22       ` Eric Biggers
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Biggers @ 2025-11-06  4:33 UTC (permalink / raw)
  To: Harald Freudenberger
  Cc: linux-crypto, David Howells, Ard Biesheuvel, Jason A . Donenfeld,
	Holger Dengler, Herbert Xu, linux-arm-kernel, linux-s390,
	linux-kernel

On Wed, Nov 05, 2025 at 04:39:01PM +0100, Harald Freudenberger wrote:
> On 2025-11-03 18:34, Eric Biggers wrote:
> > On Sat, Oct 25, 2025 at 10:50:17PM -0700, Eric Biggers wrote:
> > > This series is targeting libcrypto-next.  It can also be retrieved
> > > from:
> > > 
> > >     git fetch
> > > https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git
> > > sha3-lib-v2
> > > 
> > > This series adds SHA-3 support to lib/crypto/.  This includes support
> > > for the digest algorithms SHA3-224, SHA3-256, SHA3-384, and SHA3-512,
> > > and also support for the extendable-output functions SHAKE128 and
> > > SHAKE256.  The SHAKE128 and SHAKE256 support will be needed by ML-DSA.
> > > 
> > > The architecture-optimized SHA-3 code for arm64 and s390 is migrated
> > > into lib/crypto/.  (The existing s390 code couldn't really be
> > > reused, so
> > > really I rewrote it from scratch.)  This makes the SHA-3 library
> > > functions be accelerated on these architectures.
> > > 
> > > Finally, the sha3-224, sha3-256, sha3-384, and sha3-512 crypto_shash
> > > algorithms are reimplemented on top of the library API.
> > 
> > I've applied this series to
> > https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=libcrypto-next,
> > excluding the following 2 patches which are waiting on benchmark results
> > from the s390 folks:
> > 
> >     lib/crypto: sha3: Support arch overrides of one-shot digest
> > functions
> >     lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
> > 
> > I'd be glad to apply those too if they're shown to be worthwhile.
> > 
> > Note: I also reordered the commits in libcrypto-next to put the new
> > KUnit test suites (blake2b and sha3) last, and to put the AES-GCM
> > improvements on a separate branch that's merged in.  This will allow
> > making separate pull requests for the tests and the AES-GCM
> > improvements, which I think aligns with what Linus had requested before
> > (https://lore.kernel.org/linux-crypto/CAHk-=wi5d4K+sF2L=tuRW6AopVxO1DDXzstMQaECmU2QHN13KA@mail.gmail.com/).
> > 
> > - Eric
> 
> Here are now some measurements on a LPAR with 500 runs once with
> sha3-lib-v2 branch full ("with") and once with reverting only the
> b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest
> functions
> patch ("without"). With the help of gnuplot I generated distribution
> charts over the results of the len=16, 64, 256, 1024 and 4096 benchmark.
> See attached pictures - Sorry but I see no other way to provide this data
> than using an attachment.
> 
> Clearly the patch brings a boost - especially for the 256 byte case.
> 
> Harald Freudenberger

Thanks.  I applied "lib/crypto: sha3: Support arch overrides of one-shot
digest functions" and "lib/crypto: s390/sha3: Add optimized one-shot
SHA-3 digest functions" to libcrypto-next.  For the latter, I improved
the commit message to mention your benchmark results:

commit 862445d3b9e74f58360a7a89787da4dca783e6dd
Author: Eric Biggers <ebiggers@kernel.org>
Date:   Sat Oct 25 22:50:29 2025 -0700

    lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
    
    Some z/Architecture processors can compute a SHA-3 digest in a single
    instruction.  arch/s390/crypto/ already uses this capability to optimize
    the SHA-3 crypto_shash algorithms.
    
    Use this capability to implement the sha3_224(), sha3_256(), sha3_384(),
    and sha3_512() library functions too.
    
    SHA3-256 benchmark results provided by Harald Freudenberger
    (https://lore.kernel.org/r/4188d18bfcc8a64941c5ebd8de10ede2@linux.ibm.com/)
    on a z/Architecture machine with "facility 86" (MSA level 12):
    
        Length (bytes)    Before (MB/s)   After (MB/s)
        ==============    =============   ============
              16                212             225
              64                820             915
             256               1850            3350
            1024               5400            8300
            4096              11200           11300
    
    Note: the original data from Harald was given in the form of a graph for
    each length, showing the distribution of throughputs from 500 runs.  I
    guesstimated the peak of each one.
    
    Harald also reported that the generic SHA-3 code was at most 259 MB/s
    (https://lore.kernel.org/r/c39f6b6c110def0095e5da5becc12085@linux.ibm.com/).
    So as expected, the earlier commit that optimized sha3_absorb_blocks()
    and sha3_keccakf() is the more important one; it optimized the Keccak
    permutation which is the most performance-critical part of SHA-3.
    Still, this additional commit does notably improve performance further
    on some lengths.
    
    Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
    Tested-by: Harald Freudenberger <freude@linux.ibm.com>
    Link: https://lore.kernel.org/r/20251026055032.1413733-13-ebiggers@kernel.org
    Signed-off-by: Eric Biggers <ebiggers@kernel.org>
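
For readers unfamiliar with the distinction between one-shot and incremental digest functions mentioned in the commit message, here is a minimal userspace sketch. It uses Python's `hashlib` purely as a stand-in; it is not the kernel library API, and the input and the chunk size of 17 bytes are arbitrary. It shows that hashing a message in one call and hashing it in pieces give the same SHA3-256 digest, which is what allows a one-shot path to be optimized separately without changing results.

```python
import hashlib

message = b"abc" * 100  # arbitrary example input

# One-shot: hash the whole message in a single call.
one_shot = hashlib.sha3_256(message).digest()

# Incremental: feed the same bytes in arbitrary chunks, then finalize.
ctx = hashlib.sha3_256()
for offset in range(0, len(message), 17):
    ctx.update(message[offset:offset + 17])
incremental = ctx.digest()

# Both paths must agree; SHA3-256 digests are 32 bytes long.
print(one_shot == incremental, len(one_shot))
```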



* Re: [PATCH v2 00/15] SHA-3 library
  2025-11-06  4:33     ` Eric Biggers
@ 2025-11-06  7:22       ` Eric Biggers
  2025-11-06  8:54         ` Harald Freudenberger
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Biggers @ 2025-11-06  7:22 UTC (permalink / raw)
  To: Harald Freudenberger
  Cc: linux-crypto, David Howells, Ard Biesheuvel, Jason A . Donenfeld,
	Holger Dengler, Herbert Xu, linux-arm-kernel, linux-s390,
	linux-kernel

On Wed, Nov 05, 2025 at 08:33:40PM -0800, Eric Biggers wrote:
> Thanks.  I applied "lib/crypto: sha3: Support arch overrides of one-shot
> digest functions" and "lib/crypto: s390/sha3: Add optimized one-shot
> SHA-3 digest functions" to libcrypto-next.  For the latter, I improved
> the commit message to mention your benchmark results:

Also, I'm wondering what your plan to add support for these instructions
to QEMU is?  The status quo, where only people with an s390 mainframe
can test this code, isn't sustainable.

I already have s390 in my testing matrix; I run the crypto and CRC tests
on all architectures with optimized crypto or CRC code.  But most of the
s390 optimized crypto code isn't actually being executed.

- Eric



* Re: [PATCH v2 00/15] SHA-3 library
  2025-11-06  7:22       ` Eric Biggers
@ 2025-11-06  8:54         ` Harald Freudenberger
  2025-11-06 19:51           ` Eric Biggers
  0 siblings, 1 reply; 34+ messages in thread
From: Harald Freudenberger @ 2025-11-06  8:54 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-crypto, David Howells, Ard Biesheuvel, Jason A . Donenfeld,
	Holger Dengler, Herbert Xu, linux-arm-kernel, linux-s390,
	linux-kernel

On 2025-11-06 08:22, Eric Biggers wrote:
> On Wed, Nov 05, 2025 at 08:33:40PM -0800, Eric Biggers wrote:
>> Thanks.  I applied "lib/crypto: sha3: Support arch overrides of one-shot
>> digest functions" and "lib/crypto: s390/sha3: Add optimized one-shot
>> SHA-3 digest functions" to libcrypto-next.  For the latter, I improved
>> the commit message to mention your benchmark results:
> 
> Also, I'm wondering what your plan to add support for these instructions
> to QEMU is?  The status quo, where only people with an s390 mainframe
> can test this code, isn't sustainable.
> 
> I already have s390 in my testing matrix; I run the crypto and CRC tests
> on all architectures with optimized crypto or CRC code.  But most of the
> s390 optimized crypto code isn't actually being executed.
> 
> - Eric

Well, there are no plans. However, there has been a decision some while ago
that "we" may support this in the future. But as there are currently no
human resources available and working there I suspect a qemu CPACF support
in general will not come soon. Please note also that this is really an
implementation of crypto algorithms then and as such it needs to apply
to some regulations with regards of the EAR of the US Bureau of Industry
and Security...

Harald Freudenberger





* Re: [PATCH v2 00/15] SHA-3 library
  2025-11-06  8:54         ` Harald Freudenberger
@ 2025-11-06 19:51           ` Eric Biggers
  0 siblings, 0 replies; 34+ messages in thread
From: Eric Biggers @ 2025-11-06 19:51 UTC (permalink / raw)
  To: Harald Freudenberger
  Cc: linux-crypto, David Howells, Ard Biesheuvel, Jason A . Donenfeld,
	Holger Dengler, Herbert Xu, linux-arm-kernel, linux-s390,
	linux-kernel

On Thu, Nov 06, 2025 at 09:54:59AM +0100, Harald Freudenberger wrote:
> > Also, I'm wondering what your plan to add support for these instructions
> > to QEMU is?  The status quo, where only people with an s390 mainframe
> > can test this code, isn't sustainable.
> > 
> > I already have s390 in my testing matrix; I run the crypto and CRC tests
> > on all architectures with optimized crypto or CRC code.  But most of the
> > s390 optimized crypto code isn't actually being executed.
> > 
> > - Eric
> 
> Well, there are no plans. However, there has been a decision some while ago
> that "we" may support this in the future. But as there are currently no
> human resources available and working there I suspect a qemu CPACF support
> in general will not come soon.

Great to hear that you might have someone work on this in the future.
It should be noted that this is a significant gap that puts s390 behind
all the major architectures (x86_64, arm64, arm32, riscv, etc.) and
makes it much more likely that s390-specific bugs will be introduced.

We need to have higher standards for cryptography code.

As I've mentioned before, I don't plan to accept code that uses new
instructions without QEMU support.  The SHA-{1,2,3} code is allowed only
because the instructions were already being used by arch/s390/crypto/.

I see that Jason actually added support for CPACF_*_SHA_512 to QEMU a
few years ago
(https://github.com/qemu/qemu/commit/9f17bfdab422887807cbd5260ed6b0b6e54ddb33).
So clearly it's possible to support these instructions in QEMU.
Someone just needs to add support for the other SHA algorithms.

> Please note also that this is really an implementation of crypto
> algorithms then and as such it needs to apply to some regulations with
> regards of the EAR of the US Bureau of Industry and Security...

Like Linux, QEMU is an open source project, published publicly, and
which already contains many cryptographic algorithms.  Check out
https://www.linuxfoundation.org/resources/publications/understanding-us-export-controls-with-open-source-projects

- Eric



Thread overview: 34+ messages
2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
2025-10-26  5:50 ` [PATCH v2 01/15] crypto: s390/sha3 - Rename conflicting functions Eric Biggers
2025-10-26  5:50 ` [PATCH v2 02/15] crypto: arm64/sha3 - Rename conflicting function Eric Biggers
2025-10-26  5:50 ` [PATCH v2 03/15] lib/crypto: sha3: Add SHA-3 support Eric Biggers
2025-10-26  5:50 ` [PATCH v2 04/15] lib/crypto: sha3: Move SHA3 Iota step mapping into round function Eric Biggers
2025-10-26  5:50 ` [PATCH v2 05/15] lib/crypto: tests: Add SHA3 kunit tests Eric Biggers
2025-10-26  5:50 ` [PATCH v2 06/15] lib/crypto: tests: Add additional SHAKE tests Eric Biggers
2025-10-26  5:50 ` [PATCH v2 07/15] lib/crypto: sha3: Add FIPS cryptographic algorithm self-test Eric Biggers
2025-10-26  5:50 ` [PATCH v2 08/15] crypto: arm64/sha3 - Update sha3_ce_transform() to prepare for library Eric Biggers
2025-10-26  5:50 ` [PATCH v2 09/15] lib/crypto: arm64/sha3: Migrate optimized code into library Eric Biggers
2025-10-26  5:50 ` [PATCH v2 10/15] lib/crypto: s390/sha3: Add optimized Keccak functions Eric Biggers
2025-10-26  5:50 ` [PATCH v2 11/15] lib/crypto: sha3: Support arch overrides of one-shot digest functions Eric Biggers
2025-10-26  5:50 ` [PATCH v2 12/15] lib/crypto: s390/sha3: Add optimized one-shot SHA-3 " Eric Biggers
2025-10-26  5:50 ` [PATCH v2 13/15] crypto: jitterentropy - Use default sha3 implementation Eric Biggers
2025-10-26  5:50 ` [PATCH v2 14/15] crypto: sha3 - Reimplement using library API Eric Biggers
2025-10-26  5:50 ` [PATCH v2 15/15] crypto: s390/sha3 - Remove superseded SHA-3 code Eric Biggers
2025-10-29  9:30 ` [PATCH v2 00/15] SHA-3 library Harald Freudenberger
2025-10-29 16:32   ` Eric Biggers
2025-10-29 20:33     ` Eric Biggers
2025-10-30  8:11       ` Heiko Carstens
2025-10-30 10:16       ` Harald Freudenberger
2025-10-30 10:10     ` Harald Freudenberger
2025-10-30 17:14       ` Eric Biggers
2025-10-31 14:29         ` Harald Freudenberger
2025-11-04 11:07         ` Harald Freudenberger
2025-11-04 18:27           ` Eric Biggers
2025-11-05  8:16             ` Harald Freudenberger
2025-11-04 11:55         ` Harald Freudenberger
2025-10-30 14:08 ` Ard Biesheuvel
2025-11-03 17:34 ` Eric Biggers
     [not found]   ` <4188d18bfcc8a64941c5ebd8de10ede2@linux.ibm.com>
2025-11-06  4:33     ` Eric Biggers
2025-11-06  7:22       ` Eric Biggers
2025-11-06  8:54         ` Harald Freudenberger
2025-11-06 19:51           ` Eric Biggers
