linux-arm-kernel.lists.infradead.org archive mirror
From: Harald Freudenberger <freude@linux.ibm.com>
To: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org, David Howells <dhowells@redhat.com>,
	Ard Biesheuvel <ardb@kernel.org>,
	"Jason A . Donenfeld" <Jason@zx2c4.com>,
	Holger Dengler <dengler@linux.ibm.com>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	linux-arm-kernel@lists.infradead.org, linux-s390@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 00/15] SHA-3 library
Date: Wed, 05 Nov 2025 09:16:56 +0100
Message-ID: <70461134f12796b1166978c8628b5cf3@linux.ibm.com>
In-Reply-To: <20251104182738.GA2419@sol>

On 2025-11-04 19:27, Eric Biggers wrote:
> On Tue, Nov 04, 2025 at 12:07:40PM +0100, Harald Freudenberger wrote:
>> > Thanks!  Is this with the whole series applied?  Those numbers are
>> > pretty fast, so probably at least the Keccak acceleration part is
>> > worthwhile.  But just to reiterate what I asked for:
>> >
>> >     Also, it would be helpful to provide the benchmark output from just
>> >     before "lib/crypto: s390/sha3: Add optimized Keccak function", just
>> >     after it, and after "lib/crypto: s390/sha3: Add optimized one-shot
>> >     SHA-3 digest functions".
>> >
>> > So I'd like to see how much each change helped, which isn't clear if you
>> > show only the result at the end.
>> >
>> > If there's still no evidence that "lib/crypto: s390/sha3: Add optimized
>> > one-shot SHA-3 digest functions" actually helps significantly vs. simply
>> > doing the Keccak acceleration, then we should drop it for simplicity.
> [...]
>> commit b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions:
>> 
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # module: sha3_kunit
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     1..21
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 2 test_hash_all_lens_up_to_4096
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 3 test_hash_incremental_updates
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 4 test_hash_buffer_overruns
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 6 test_hash_alignment_consistency
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 7 test_hash_ctx_zeroization
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 8 test_hash_interrupt_context_1
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 9 test_hash_interrupt_context_2
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 18 test_shake_all_lens_up_to_4096
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 19 test_shake_multiple_squeezes
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 20 test_shake_with_guarded_bufs
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 80 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 785 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127: 812 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128: 1619 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200: 2319 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256: 2176 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511: 4881 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512: 4968 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 7565 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173: 11909 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096: 10378 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 12273 MB/s
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
>> Nov 04 10:50:50 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0 total:21
>> 
>> commit 02266b8a383e lib/crypto: s390/sha3: Add optimized Keccak functions:
>> 
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     # module: sha3_kunit
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     1..21
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 1 test_hash_test_vectors
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 2 test_hash_all_lens_up_to_4096
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 3 test_hash_incremental_updates
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 4 test_hash_buffer_overruns
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 5 test_hash_overlaps
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 6 test_hash_alignment_consistency
>> Nov 04 10:55:37 b3545008.lnxne.boe kernel:     ok 7 test_hash_ctx_zeroization
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 8 test_hash_interrupt_context_1
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 9 test_hash_interrupt_context_2
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 10 test_sha3_224_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 11 test_sha3_256_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 12 test_sha3_384_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 13 test_sha3_512_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 14 test_shake128_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 15 test_shake256_basic
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 16 test_shake128_nist
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 17 test_shake256_nist
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 18 test_shake_all_lens_up_to_4096
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 19 test_shake_multiple_squeezes
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 20 test_shake_with_guarded_bufs
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1: 12 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16: 211 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=64: 835 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=127: 1557 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=128: 1617 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=200: 1457 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=256: 1830 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=511: 3035 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=512: 3245 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=1024: 5319 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=3173: 9969 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=4096: 11123 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     # benchmark_hash: len=16384: 12767 MB/s
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel:     ok 21 benchmark_hash
>> Nov 04 10:55:38 b3545008.lnxne.boe kernel: # sha3: pass:21 fail:0 skip:0 total:21
> 
> Thanks.  So the results before and after "lib/crypto: s390/sha3: Add
> optimized one-shot SHA-3 digest functions" are:
> 
>     Length (bytes)      Before            After
>     ==============    ==========        ==========
>          1               12 MB/s           12 MB/s
>         16              211 MB/s           80 MB/s
>         64              835 MB/s          785 MB/s
>        127             1557 MB/s          812 MB/s
>        128             1617 MB/s         1619 MB/s
>        200             1457 MB/s         2319 MB/s
>        256             1830 MB/s         2176 MB/s
>        511             3035 MB/s         4881 MB/s
>        512             3245 MB/s         4968 MB/s
>       1024             5319 MB/s         7565 MB/s
>       3173             9969 MB/s        11909 MB/s
>       4096            11123 MB/s        10378 MB/s
>      16384            12767 MB/s        12273 MB/s
> 
> Unfortunately that seems inconclusive.  len=200, 256, 511, 512, 1024,
> 3173 improved.  But len=16, 64, 127, 4096, 16384 regressed.
> 
> I expected the most improvement on short lengths.  The fact that some of
> the short lengths actually regressed is concerning.
> 
> It's also clear that the Keccak acceleration itself matters far more than
> this additional one-shot optimization, as expected.  The generic code
> maxed out at only 259 MB/s for you.
> 
> I suggest we hold off on "lib/crypto: s390/sha3: Add optimized one-shot
> SHA-3 digest functions" for now, to avoid the extra maintenance cost
> and opportunity for bugs.
> 
> If you can provide more accurate numbers that show it's worthwhile, we
> can reconsider.  Maybe set the CPU to a fixed frequency, and run
> sha3_kunit multiple times (triggered via KUnit's debugfs interface)?
> 
> - Eric
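The before/after comparison in Eric's table can be recomputed with a few lines of Python. This is only a sketch: the throughput figures are copied from the table, `ns_per_hash()` assumes benchmark_hash reports MB/s with 1 MB = 10^6 bytes, and the final line merely divides the best Keccak-accelerated figure by the 259 MB/s generic peak Eric quotes, so it is a rough ratio of peaks rather than a per-length speedup.

```python
"""Summarize the before/after table for the one-shot SHA-3 patch (sketch)."""

# MB/s figures copied from the table above. BEFORE = Keccak patch only,
# AFTER = Keccak patch plus the one-shot digest patch.
BEFORE = {1: 12, 16: 211, 64: 835, 127: 1557, 128: 1617, 200: 1457,
          256: 1830, 511: 3035, 512: 3245, 1024: 5319, 3173: 9969,
          4096: 11123, 16384: 12767}
AFTER = {1: 12, 16: 80, 64: 785, 127: 812, 128: 1619, 200: 2319,
         256: 2176, 511: 4881, 512: 4968, 1024: 7565, 3173: 11909,
         4096: 10378, 16384: 12273}
GENERIC_MAX_MB_S = 259  # peak of the generic (unaccelerated) code, per Eric


def relative_change(before: float, after: float) -> float:
    """Throughput change in percent; negative values are regressions."""
    return (after - before) / before * 100.0


def ns_per_hash(length: int, mb_per_s: float) -> float:
    """Average time per digest in ns, assuming 1 MB/s means 10**6 bytes/s."""
    return length * 1e3 / mb_per_s


def regressions(before: dict, after: dict) -> list:
    """Lengths whose throughput dropped once the one-shot patch was added."""
    return [n for n in sorted(before) if after[n] < before[n]]


if __name__ == "__main__":
    print(f"{'len':>6} {'before':>7} {'after':>7} {'change':>8}  ns/hash")
    for n in sorted(BEFORE):
        b, a = BEFORE[n], AFTER[n]
        print(f"{n:>6} {b:>7} {a:>7} {relative_change(b, a):>+7.1f}%  "
              f"{ns_per_hash(n, b):.0f} -> {ns_per_hash(n, a):.0f}")
    print("regressed:", regressions(BEFORE, AFTER))
    ratio = max(BEFORE.values()) / GENERIC_MAX_MB_S
    print(f"peak Keccak-accelerated vs. peak generic: {ratio:.1f}x")
```

Converting throughput into time per digest makes the short-length regressions easier to read: under the 10^6-byte assumption, a len=16 digest goes from about 76 ns to about 200 ns.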

The focus should be on small inputs. Let me see what I can do...
I ran these tests in a z/VM guest. Using an LPAR instead, together with
some CPU pinning, may be an option, as may running more iterations so
that I can compute a Gaussian distribution of the results. However, that
won't happen within the next few days.
So I agree with you: let's hold back the one-shot optimization.
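Repeating the runs and summarizing them as a Gaussian (mean and standard deviation per message length) could look like the Python sketch below. It assumes each sha3_kunit run's output has been saved to its own text file, for example from dmesg after re-triggering the suite through KUnit's debugfs interface (the KUnit documentation describes writing to a `run` file under `/sys/kernel/debug/kunit/<suite>/`; treat that path as an assumption to verify on the target kernel), and that benchmark lines have the `# benchmark_hash: len=N: X MB/s` form seen in the logs above. The `welch_t()` helper is an addition of this sketch, not something proposed in the thread: Welch's t statistic is one conventional way to compare two sets of runs whose variances may differ.

```python
"""Aggregate benchmark_hash results over repeated sha3_kunit runs (sketch)."""
import re
import statistics
import sys
from collections import defaultdict

# Matches lines such as "# benchmark_hash: len=16: 211 MB/s".
BENCH_RE = re.compile(r"benchmark_hash:\s+len=(\d+):\s+(\d+)\s+MB/s")


def parse_run(log_text: str) -> dict:
    """Map message length -> MB/s for a single run's log."""
    return {int(n): int(mbs) for n, mbs in BENCH_RE.findall(log_text)}


def summarize(logs: list) -> dict:
    """Map length -> (mean MB/s, sample stdev, number of runs)."""
    samples = defaultdict(list)
    for text in logs:
        for n, mbs in parse_run(text).items():
            samples[n].append(mbs)
    summary = {}
    for n, values in sorted(samples.items()):
        stdev = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[n] = (statistics.mean(values), stdev, len(values))
    return summary


def welch_t(a: list, b: list) -> float:
    """Welch's t statistic for runs a (before) and b (after); needs >= 2 each."""
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    diff = statistics.mean(b) - statistics.mean(a)
    if se2 == 0:
        # No run-to-run spread at all, so any difference is exact.
        if diff == 0:
            return 0.0
        return float("inf") if diff > 0 else float("-inf")
    return diff / se2 ** 0.5


def read_logs(paths: list) -> list:
    """Read one saved log file per sha3_kunit run."""
    logs = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            logs.append(f.read())
    return logs


if __name__ == "__main__":
    # Usage (hypothetical file names): python3 bench_stats.py run1.log run2.log
    for n, (mean, stdev, runs) in summarize(read_logs(sys.argv[1:])).items():
        print(f"len={n:>5}: {mean:9.1f} MB/s  stdev {stdev:7.1f}  (runs={runs})")
```

Treating |t| of roughly 3 or more as a real change is only a rule of thumb for a handful of runs per kernel; a proper test would compare t against the t distribution with Welch's degrees of freedom.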

Harald Freudenberger



Thread overview: 34+ messages
2025-10-26  5:50 [PATCH v2 00/15] SHA-3 library Eric Biggers
2025-10-26  5:50 ` [PATCH v2 01/15] crypto: s390/sha3 - Rename conflicting functions Eric Biggers
2025-10-26  5:50 ` [PATCH v2 02/15] crypto: arm64/sha3 - Rename conflicting function Eric Biggers
2025-10-26  5:50 ` [PATCH v2 03/15] lib/crypto: sha3: Add SHA-3 support Eric Biggers
2025-10-26  5:50 ` [PATCH v2 04/15] lib/crypto: sha3: Move SHA3 Iota step mapping into round function Eric Biggers
2025-10-26  5:50 ` [PATCH v2 05/15] lib/crypto: tests: Add SHA3 kunit tests Eric Biggers
2025-10-26  5:50 ` [PATCH v2 06/15] lib/crypto: tests: Add additional SHAKE tests Eric Biggers
2025-10-26  5:50 ` [PATCH v2 07/15] lib/crypto: sha3: Add FIPS cryptographic algorithm self-test Eric Biggers
2025-10-26  5:50 ` [PATCH v2 08/15] crypto: arm64/sha3 - Update sha3_ce_transform() to prepare for library Eric Biggers
2025-10-26  5:50 ` [PATCH v2 09/15] lib/crypto: arm64/sha3: Migrate optimized code into library Eric Biggers
2025-10-26  5:50 ` [PATCH v2 10/15] lib/crypto: s390/sha3: Add optimized Keccak functions Eric Biggers
2025-10-26  5:50 ` [PATCH v2 11/15] lib/crypto: sha3: Support arch overrides of one-shot digest functions Eric Biggers
2025-10-26  5:50 ` [PATCH v2 12/15] lib/crypto: s390/sha3: Add optimized one-shot SHA-3 " Eric Biggers
2025-10-26  5:50 ` [PATCH v2 13/15] crypto: jitterentropy - Use default sha3 implementation Eric Biggers
2025-10-26  5:50 ` [PATCH v2 14/15] crypto: sha3 - Reimplement using library API Eric Biggers
2025-10-26  5:50 ` [PATCH v2 15/15] crypto: s390/sha3 - Remove superseded SHA-3 code Eric Biggers
2025-10-29  9:30 ` [PATCH v2 00/15] SHA-3 library Harald Freudenberger
2025-10-29 16:32   ` Eric Biggers
2025-10-29 20:33     ` Eric Biggers
2025-10-30  8:11       ` Heiko Carstens
2025-10-30 10:16       ` Harald Freudenberger
2025-10-30 10:10     ` Harald Freudenberger
2025-10-30 17:14       ` Eric Biggers
2025-10-31 14:29         ` Harald Freudenberger
2025-11-04 11:07         ` Harald Freudenberger
2025-11-04 18:27           ` Eric Biggers
2025-11-05  8:16             ` Harald Freudenberger [this message]
2025-11-04 11:55         ` Harald Freudenberger
2025-10-30 14:08 ` Ard Biesheuvel
2025-11-03 17:34 ` Eric Biggers
     [not found]   ` <4188d18bfcc8a64941c5ebd8de10ede2@linux.ibm.com>
2025-11-06  4:33     ` Eric Biggers
2025-11-06  7:22       ` Eric Biggers
2025-11-06  8:54         ` Harald Freudenberger
2025-11-06 19:51           ` Eric Biggers
