From: Eric Biggers <ebiggers@kernel.org>
To: Harald Freudenberger <freude@linux.ibm.com>
Cc: linux-crypto@vger.kernel.org, David Howells <dhowells@redhat.com>,
Ard Biesheuvel <ardb@kernel.org>,
"Jason A . Donenfeld" <Jason@zx2c4.com>,
Holger Dengler <dengler@linux.ibm.com>,
Herbert Xu <herbert@gondor.apana.org.au>,
linux-arm-kernel@lists.infradead.org, linux-s390@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 00/15] SHA-3 library
Date: Tue, 4 Nov 2025 10:27:38 -0800
Message-ID: <20251104182738.GA2419@sol>
In-Reply-To: <c39f6b6c110def0095e5da5becc12085@linux.ibm.com>
On Tue, Nov 04, 2025 at 12:07:40PM +0100, Harald Freudenberger wrote:
> > Thanks! Is this with the whole series applied? Those numbers are
> > pretty fast, so probably at least the Keccak acceleration part is
> > worthwhile. But just to reiterate what I asked for:
> >
> > Also, it would be helpful to provide the benchmark output from just
> > before "lib/crypto: s390/sha3: Add optimized Keccak function", just
> > after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
> > SHA-3 digest functions".
> >
> > So I'd like to see how much each change helped, which isn't clear if you
> > show only the result at the end.
> >
> > If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
> > one-shot SHA-3 digest functions" actually helps significantly vs. simply
> > doing the Keccak acceleration, then we should drop it for simplicity.
[...]
> commit b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot SHA-3
> digest functions:
>
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # module: sha3_kunit
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: 1..21
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 1 test_hash_test_vectors
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 2
> test_hash_all_lens_up_to_4096
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 3
> test_hash_incremental_updates
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 4
> test_hash_buffer_overruns
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 5 test_hash_overlaps
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 6
> test_hash_alignment_consistency
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 7
> test_hash_ctx_zeroization
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 8
> test_hash_interrupt_context_1
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 9
> test_hash_interrupt_context_2
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 10 test_sha3_224_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 11 test_sha3_256_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 12 test_sha3_384_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 13 test_sha3_512_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 14 test_shake128_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 15 test_shake256_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 16 test_shake128_nist
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 17 test_shake256_nist
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 18
> test_shake_all_lens_up_to_4096
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 19
> test_shake_multiple_squeezes
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 20
> test_shake_with_guarded_bufs
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=1: 12
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=16: 80
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=64: 785
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=127:
> 812 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=128:
> 1619 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=200:
> 2319 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=256:
> 2176 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=511:
> 4881 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=512:
> 4968 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=1024:
> 7565 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=3173:
> 11909 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=4096:
> 10378 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # benchmark_hash: len=16384:
> 12273 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: ok 21 benchmark_hash
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0
> total:21
>
> commit 02266b8a383e lib/crypto: s390/sha3: Add optimized Keccak functions:
>
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: # module: sha3_kunit
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: 1..21
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 1 test_hash_test_vectors
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 2
> test_hash_all_lens_up_to_4096
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 3
> test_hash_incremental_updates
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 4
> test_hash_buffer_overruns
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 5 test_hash_overlaps
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 6
> test_hash_alignment_consistency
> Nov 04 10:55:37 b3545008.lnxne.boe kernel: ok 7
> test_hash_ctx_zeroization
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 8
> test_hash_interrupt_context_1
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 9
> test_hash_interrupt_context_2
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 10 test_sha3_224_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 11 test_sha3_256_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 12 test_sha3_384_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 13 test_sha3_512_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 14 test_shake128_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 15 test_shake256_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 16 test_shake128_nist
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 17 test_shake256_nist
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 18
> test_shake_all_lens_up_to_4096
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 19
> test_shake_multiple_squeezes
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 20
> test_shake_with_guarded_bufs
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=1: 12
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=16: 211
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=64: 835
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=127:
> 1557 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=128:
> 1617 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=200:
> 1457 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=256:
> 1830 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=511:
> 3035 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=512:
> 3245 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=1024:
> 5319 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=3173:
> 9969 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=4096:
> 11123 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # benchmark_hash: len=16384:
> 12767 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: ok 21 benchmark_hash
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0
> total:21
Thanks. So the results before and after "lib/crypto: s390/sha3: Add
optimized one-shot SHA-3 digest functions" are:

    Length (bytes)        Before         After
    ==============    ==========    ==========
                 1       12 MB/s       12 MB/s
                16      211 MB/s       80 MB/s
                64      835 MB/s      785 MB/s
               127     1557 MB/s      812 MB/s
               128     1617 MB/s     1619 MB/s
               200     1457 MB/s     2319 MB/s
               256     1830 MB/s     2176 MB/s
               511     3035 MB/s     4881 MB/s
               512     3245 MB/s     4968 MB/s
              1024     5319 MB/s     7565 MB/s
              3173     9969 MB/s    11909 MB/s
              4096    11123 MB/s    10378 MB/s
             16384    12767 MB/s    12273 MB/s

Unfortunately that seems inconclusive. len=200, 256, 511, 512, 1024,
3173 improved. But len=16, 64, 127, 4096, 16384 regressed.
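
To make the size of each change explicit, here is a minimal Python sketch that computes the relative change per length from the numbers in the table above. The function names and the 1% "unchanged" threshold are illustrative assumptions, not a measured noise floor.

```python
# Throughput in MB/s copied from the table above: length -> (before, after).
RESULTS = {
    1: (12, 12),
    16: (211, 80),
    64: (835, 785),
    127: (1557, 812),
    128: (1617, 1619),
    200: (1457, 2319),
    256: (1830, 2176),
    511: (3035, 4881),
    512: (3245, 4968),
    1024: (5319, 7565),
    3173: (9969, 11909),
    4096: (11123, 10378),
    16384: (12767, 12273),
}


def percent_change(before, after):
    """Relative change of `after` versus `before`, in percent."""
    return (after - before) / before * 100.0


def classify(results, noise_pct=1.0):
    """Group lengths into improved / regressed / unchanged.

    noise_pct is an arbitrary illustrative threshold (an assumption),
    below which a difference is treated as no change.
    """
    groups = {"improved": [], "regressed": [], "unchanged": []}
    for length, (before, after) in sorted(results.items()):
        delta = percent_change(before, after)
        if delta > noise_pct:
            groups["improved"].append(length)
        elif delta < -noise_pct:
            groups["regressed"].append(length)
        else:
            groups["unchanged"].append(length)
    return groups


if __name__ == "__main__":
    for length, (before, after) in sorted(RESULTS.items()):
        print(f"{length:>6} {before:>6} {after:>6} "
              f"{percent_change(before, after):+7.1f}%")
    print(classify(RESULTS))
```

With that threshold, len=1 and len=128 count as unchanged, and the improved and regressed groups match the lists above.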
I expected the most improvement on short lengths. The fact that some of
the short lengths actually regressed is concerning.
It's also clear that the Keccak acceleration itself matters far more than
this additional one-shot optimization, as expected. The generic code
maxed out at only 259 MB/s for you.
I suggest we hold off on "lib/crypto: s390/sha3: Add optimized one-shot
SHA-3 digest functions" for now, to avoid the extra maintenance cost
and opportunity for bugs.
If you can provide more accurate numbers that show it's worthwhile, we
can reconsider. Maybe set the CPU to a fixed frequency, and run
sha3_kunit multiple times (triggered via KUnit's debugfs interface)?
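
If it helps with collecting repeated runs, here is a rough Python sketch for combining the benchmark_hash lines from several runs into a median and range per length. It assumes each run's log has been captured as text with one unwrapped benchmark line per log line, in the same format as the output quoted above. The function names are hypothetical, and it does not trigger the tests itself.

```python
import re
import statistics
from collections import defaultdict

# Matches lines such as "# benchmark_hash: len=16: 211 MB/s".
BENCH_RE = re.compile(r"benchmark_hash: len=(\d+):\s*(\d+) MB/s")


def parse_run(log_text):
    """Return {length: MB/s} for the benchmark lines found in one run's log."""
    return {int(m.group(1)): int(m.group(2))
            for m in BENCH_RE.finditer(log_text)}


def summarize(runs):
    """Combine several parsed runs into {length: (median, min, max)}."""
    samples = defaultdict(list)
    for run in runs:
        for length, speed in run.items():
            samples[length].append(speed)
    return {length: (statistics.median(values), min(values), max(values))
            for length, values in sorted(samples.items())}
```

Comparing the medians before and after the patch, together with the min/max spread, should show whether a difference like the one at len=16 is real or just run-to-run noise.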
- Eric
Thread overview: 35+ messages
2025-10-26 5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
2025-10-26 5:50 ` [PATCH v2 01/15] crypto: s390/sha3 - Rename conflicting functions Eric Biggers
2025-10-26 5:50 ` [PATCH v2 02/15] crypto: arm64/sha3 - Rename conflicting function Eric Biggers
2025-10-26 5:50 ` [PATCH v2 03/15] lib/crypto: sha3: Add SHA-3 support Eric Biggers
2025-10-26 5:50 ` [PATCH v2 04/15] lib/crypto: sha3: Move SHA3 Iota step mapping into round function Eric Biggers
2025-10-26 5:50 ` [PATCH v2 05/15] lib/crypto: tests: Add SHA3 kunit tests Eric Biggers
2025-10-26 5:50 ` [PATCH v2 06/15] lib/crypto: tests: Add additional SHAKE tests Eric Biggers
2025-10-26 5:50 ` [PATCH v2 07/15] lib/crypto: sha3: Add FIPS cryptographic algorithm self-test Eric Biggers
2025-10-26 5:50 ` [PATCH v2 08/15] crypto: arm64/sha3 - Update sha3_ce_transform() to prepare for library Eric Biggers
2025-10-26 5:50 ` [PATCH v2 09/15] lib/crypto: arm64/sha3: Migrate optimized code into library Eric Biggers
2025-10-26 5:50 ` [PATCH v2 10/15] lib/crypto: s390/sha3: Add optimized Keccak functions Eric Biggers
2025-10-26 5:50 ` [PATCH v2 11/15] lib/crypto: sha3: Support arch overrides of one-shot digest functions Eric Biggers
2025-10-26 5:50 ` [PATCH v2 12/15] lib/crypto: s390/sha3: Add optimized one-shot SHA-3 " Eric Biggers
2025-10-26 5:50 ` [PATCH v2 13/15] crypto: jitterentropy - Use default sha3 implementation Eric Biggers
2025-10-26 5:50 ` [PATCH v2 14/15] crypto: sha3 - Reimplement using library API Eric Biggers
2025-10-26 5:50 ` [PATCH v2 15/15] crypto: s390/sha3 - Remove superseded SHA-3 code Eric Biggers
2025-10-29 9:30 ` [PATCH v2 00/15] SHA-3 library Harald Freudenberger
2025-10-29 16:32 ` Eric Biggers
2025-10-29 20:33 ` Eric Biggers
2025-10-30 8:11 ` Heiko Carstens
2025-10-30 10:16 ` Harald Freudenberger
2025-10-30 10:10 ` Harald Freudenberger
2025-10-30 17:14 ` Eric Biggers
2025-10-31 14:29 ` Harald Freudenberger
2025-11-04 11:07 ` Harald Freudenberger
2025-11-04 18:27 ` Eric Biggers [this message]
2025-11-05 8:16 ` Harald Freudenberger
2025-11-04 11:55 ` Harald Freudenberger
2025-10-30 14:08 ` Ard Biesheuvel
2025-11-03 17:34 ` Eric Biggers
2025-11-05 15:39 ` Harald Freudenberger
2025-11-06 4:33 ` Eric Biggers
2025-11-06 7:22 ` Eric Biggers
2025-11-06 8:54 ` Harald Freudenberger
2025-11-06 19:51 ` Eric Biggers