linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Biggers <ebiggers@kernel.org>
To: Harald Freudenberger <freude@linux.ibm.com>
Cc: linux-crypto@vger.kernel.org, David Howells <dhowells@redhat.com>,
	Ard Biesheuvel <ardb@kernel.org>,
	"Jason A . Donenfeld" <Jason@zx2c4.com>,
	Holger Dengler <dengler@linux.ibm.com>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	linux-arm-kernel@lists.infradead.org, linux-s390@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 00/15] SHA-3 library
Date: Tue, 4 Nov 2025 10:27:38 -0800	[thread overview]
Message-ID: <20251104182738.GA2419@sol> (raw)
In-Reply-To: <c39f6b6c110def0095e5da5becc12085@linux.ibm.com>

On Tue, Nov 04, 2025 at 12:07:40PM +0100, Harald Freudenberger wrote:
> > Thanks!  Is this with the whole series applied?  Those numbers are
> > pretty fast, so probably at least the Keccak acceleration part is
> > worthwhile.  But just to reiterate what I asked for:
> > 
> >     Also, it would be helpful to provide the benchmark output from just
> >     before "lib/crypto: s390/sha3: Add optimized Keccak function", just
> >     after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
> >     SHA-3 digest functions".
> > 
> > So I'd like to see how much each change helped, which isn't clear if you
> > show only the result at the end.
> > 
> > If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
> > one-shot SHA-3 digest functions" actually helps significantly vs. simply
> > doing the Keccak acceleration, then we should drop it for simplicity.
[...]
> commit b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot SHA-3
> digest functions:
> 
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # module: sha3_kunit
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     1..21
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 2
> test_hash_all_lens_up_to_4096
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 3
> test_hash_incremental_updates
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 4
> test_hash_buffer_overruns
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 6
> test_hash_alignment_consistency
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 7
> test_hash_ctx_zeroization
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 8
> test_hash_interrupt_context_1
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 9
> test_hash_interrupt_context_2
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 18
> test_shake_all_lens_up_to_4096
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 19
> test_shake_multiple_squeezes
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 20
> test_shake_with_guarded_bufs
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 80
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 785
> MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127:
> 812 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128:
> 1619 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200:
> 2319 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256:
> 2176 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511:
> 4881 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512:
> 4968 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024:
> 7565 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173:
> 11909 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096:
> 10378 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384:
> 12273 MB/s
> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0
> total:21
> 
> commit 02266b8a383e lib/crypto: s390/sha3: Add optimized Keccak functions:
> 
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     # module: sha3_kunit
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     1..21
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 2
> test_hash_all_lens_up_to_4096
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 3
> test_hash_incremental_updates
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 4
> test_hash_buffer_overruns
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 6
> test_hash_alignment_consistency
> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 7
> test_hash_ctx_zeroization
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 8
> test_hash_interrupt_context_1
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 9
> test_hash_interrupt_context_2
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 18
> test_shake_all_lens_up_to_4096
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 19
> test_shake_multiple_squeezes
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 20
> test_shake_with_guarded_bufs
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 211
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 835
> MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127:
> 1557 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128:
> 1617 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200:
> 1457 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256:
> 1830 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511:
> 3035 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512:
> 3245 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024:
> 5319 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173:
> 9969 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096:
> 11123 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384:
> 12767 MB/s
> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0
> total:21

Thanks.  So the results before and after "lib/crypto: s390/sha3: Add
optimized one-shot SHA-3 digest functions" are:

    Length (bytes)      Before            After
    ==============    ==========        ==========
         1               12 MB/s           12 MB/s
        16              211 MB/s           80 MB/s
        64              835 MB/s          785 MB/s
       127             1557 MB/s          812 MB/s
       128             1617 MB/s         1619 MB/s
       200             1457 MB/s         2319 MB/s
       256             1830 MB/s         2176 MB/s
       511             3035 MB/s         4881 MB/s
       512             3245 MB/s         4968 MB/s
      1024             5319 MB/s         7565 MB/s
      3173             9969 MB/s        11909 MB/s
      4096            11123 MB/s        10378 MB/s
     16384            12767 MB/s        12273 MB/s

Unfortunately that seems inconclusive.  len=200, 256, 511, 512, 1024,
3173 improved.  But len=16, 64, 127, 4096, 16384 regressed.

I expected the most improvement on short lengths.  The fact that some of
the short lengths actually regressed is concerning.

It's also clear the the Keccak acceleration itself matters far more than
this additional one-shot optimization, as expected.  The generic code
maxed out at only 259 MB/s for you.

I suggest we hold off on "lib/crypto: s390/sha3: Add optimized one-shot
SHA-3 digest functions" for now, to avoid the extra maintainence cost
and opportunity for bugs.

If you can provide more accurate numbers that show it's worthwhile, we
can reconsider.  Maybe set the CPU to a fixed frequency, and run
sha3_kunit multiple times (triggered via KUnit's debugfs interface)?

- Eric

  reply	other threads:[~2025-11-04 18:29 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
2025-10-26  5:50 ` [PATCH v2 01/15] crypto: s390/sha3 - Rename conflicting functions Eric Biggers
2025-10-26  5:50 ` [PATCH v2 02/15] crypto: arm64/sha3 - Rename conflicting function Eric Biggers
2025-10-26  5:50 ` [PATCH v2 03/15] lib/crypto: sha3: Add SHA-3 support Eric Biggers
2025-10-26  5:50 ` [PATCH v2 04/15] lib/crypto: sha3: Move SHA3 Iota step mapping into round function Eric Biggers
2025-10-26  5:50 ` [PATCH v2 05/15] lib/crypto: tests: Add SHA3 kunit tests Eric Biggers
2025-10-26  5:50 ` [PATCH v2 06/15] lib/crypto: tests: Add additional SHAKE tests Eric Biggers
2025-10-26  5:50 ` [PATCH v2 07/15] lib/crypto: sha3: Add FIPS cryptographic algorithm self-test Eric Biggers
2025-10-26  5:50 ` [PATCH v2 08/15] crypto: arm64/sha3 - Update sha3_ce_transform() to prepare for library Eric Biggers
2025-10-26  5:50 ` [PATCH v2 09/15] lib/crypto: arm64/sha3: Migrate optimized code into library Eric Biggers
2025-10-26  5:50 ` [PATCH v2 10/15] lib/crypto: s390/sha3: Add optimized Keccak functions Eric Biggers
2025-10-26  5:50 ` [PATCH v2 11/15] lib/crypto: sha3: Support arch overrides of one-shot digest functions Eric Biggers
2025-10-26  5:50 ` [PATCH v2 12/15] lib/crypto: s390/sha3: Add optimized one-shot SHA-3 " Eric Biggers
2025-10-26  5:50 ` [PATCH v2 13/15] crypto: jitterentropy - Use default sha3 implementation Eric Biggers
2025-10-26  5:50 ` [PATCH v2 14/15] crypto: sha3 - Reimplement using library API Eric Biggers
2025-10-26  5:50 ` [PATCH v2 15/15] crypto: s390/sha3 - Remove superseded SHA-3 code Eric Biggers
2025-10-29  9:30 ` [PATCH v2 00/15] SHA-3 library Harald Freudenberger
2025-10-29 16:32   ` Eric Biggers
2025-10-29 20:33     ` Eric Biggers
2025-10-30  8:11       ` Heiko Carstens
2025-10-30 10:16       ` Harald Freudenberger
2025-10-30 10:10     ` Harald Freudenberger
2025-10-30 17:14       ` Eric Biggers
2025-10-31 14:29         ` Harald Freudenberger
2025-11-04 11:07         ` Harald Freudenberger
2025-11-04 18:27           ` Eric Biggers [this message]
2025-11-05  8:16             ` Harald Freudenberger
2025-11-04 11:55         ` Harald Freudenberger
2025-10-30 14:08 ` Ard Biesheuvel
2025-11-03 17:34 ` Eric Biggers
2025-11-05 15:39   ` Harald Freudenberger
2025-11-06  4:33     ` Eric Biggers
2025-11-06  7:22       ` Eric Biggers
2025-11-06  8:54         ` Harald Freudenberger
2025-11-06 19:51           ` Eric Biggers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251104182738.GA2419@sol \
    --to=ebiggers@kernel.org \
    --cc=Jason@zx2c4.com \
    --cc=ardb@kernel.org \
    --cc=dengler@linux.ibm.com \
    --cc=dhowells@redhat.com \
    --cc=freude@linux.ibm.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).