linuxppc-dev.lists.ozlabs.org archive mirror
* Re: [PATCH 00/12] crypto: sha256 - Use partial block API
       [not found] <cover.1745992998.git.herbert@gondor.apana.org.au>
@ 2025-04-30 17:45 ` Eric Biggers
  2025-05-01  1:21   ` Herbert Xu
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Biggers @ 2025-04-30 17:45 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Linux Crypto Mailing List, linux-kernel, linux-arch,
	linux-arm-kernel, linux-mips, linuxppc-dev, linux-riscv,
	sparclinux, linux-s390, x86, Ard Biesheuvel, Jason A . Donenfeld,
	Linus Torvalds

[Added back Cc's that were dropped]

On Wed, Apr 30, 2025 at 02:06:15PM +0800, Herbert Xu wrote:
> This is based on
> 
> 	https://patchwork.kernel.org/project/linux-crypto/list/?series=957785

I'm assuming you mean that series with your diff
https://lore.kernel.org/r/aBGdiv17ztQnhAps@gondor.apana.org.au folded into my
first patch, since otherwise your patch series doesn't apply.  But even with
that done, the series still doesn't build:

    In file included from ./include/crypto/hash_info.h:12,
                     from crypto/hash_info.c:9:
    ./include/crypto/sha2.h: In function ‘sha256_init’:
    ./include/crypto/sha2.h:101:32: error: ‘struct sha256_state’ has no member named ‘ctx’
      101 |         sha256_block_init(&sctx->ctx);
          |                                ^~

> Rather than going through the lib/sha256 partial block handling,
> use the native shash partial block API.  Add two extra shash
> algorithms to provide testing coverage for lib/sha256.
> 
> Herbert Xu (12):
>   crypto: lib/sha256 - Restore lib_sha256 finup code
>   crypto: sha256 - Use the partial block API for generic
>   crypto: arm/sha256 - Add simd block function
>   crypto: arm64/sha256 - Add simd block function
>   crypto: mips/sha256 - Export block functions as GPL only
>   crypto: powerpc/sha256 - Export block functions as GPL only
>   crypto: riscv/sha256 - Add simd block function
>   crypto: s390/sha256 - Export block functions as GPL only
>   crypto: sparc/sha256 - Export block functions as GPL only
>   crypto: x86/sha256 - Add simd block function
>   crypto: lib/sha256 - Use generic block helper
>   crypto: sha256 - Use the partial block API
>
>  arch/arm/lib/crypto/Kconfig                   |   1 +
>  arch/arm/lib/crypto/sha256-armv4.pl           |  20 +--
>  arch/arm/lib/crypto/sha256.c                  |  16 +--
>  arch/arm64/crypto/sha512-glue.c               |   6 +-
>  arch/arm64/lib/crypto/Kconfig                 |   1 +
>  arch/arm64/lib/crypto/sha2-armv8.pl           |   2 +-
>  arch/arm64/lib/crypto/sha256.c                |  16 +--
>  .../mips/cavium-octeon/crypto/octeon-sha256.c |   4 +-
>  arch/powerpc/lib/crypto/sha256.c              |   4 +-
>  arch/riscv/lib/crypto/Kconfig                 |   1 +
>  arch/riscv/lib/crypto/sha256.c                |  17 ++-
>  arch/s390/lib/crypto/sha256.c                 |   4 +-
>  arch/sparc/lib/crypto/sha256.c                |   4 +-
>  arch/x86/lib/crypto/Kconfig                   |   1 +
>  arch/x86/lib/crypto/sha256.c                  |  16 ++-
>  crypto/sha256.c                               | 134 +++++++++++-------
>  include/crypto/internal/sha2.h                |  46 ++++++
>  include/crypto/sha2.h                         |  14 +-
>  lib/crypto/Kconfig                            |   8 ++
>  lib/crypto/sha256.c                           | 100 +++----------
>  20 files changed, 232 insertions(+), 183 deletions(-)

The EXPORT_SYMBOL => EXPORT_SYMBOL_GPL changes are fine and should just be one
patch.  I was just trying to be consistent with lib/crypto/sha256.c, which uses
EXPORT_SYMBOL, but EXPORT_SYMBOL_GPL is fine too.
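
To be concrete, the distinction is only which export macro the arch block
function carries; a minimal sketch, with the symbol name and signature assumed
rather than taken from the series:

    #include <linux/export.h>
    #include <linux/types.h>

    /* Assumed symbol name and signature, for illustration only. */
    void sha256_blocks_arch(u32 state[8], const u8 *data, size_t nblocks)
    {
            /* arch-optimized compression of nblocks 64-byte blocks */
    }
    /* EXPORT_SYMBOL(sha256_blocks_arch);      usable by any module */
    EXPORT_SYMBOL_GPL(sha256_blocks_arch);     /* GPL-only, as in this series */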

Everything else in this series is harmful, IMO.

I already covered why crypto_shash should simply use the library and not do
anything special.

As for your sha256_finup "optimization", it's an interesting idea, but
unfortunately it slightly slows down the common case which is count % 64 < 56,
due to the unnecessary copy to the stack and the following zeroization.  In the
uncommon case where count % 64 >= 56 you do get to pass nblocks=2 to
sha256_blocks_*(), but ultimately SHA-256 is serialized block-by-block anyway,
so it ends up being only slightly faster in that case, which again is the
uncommon case.  So while it's an interesting idea, it doesn't seem to actually
be better.  And the fact that that patch is also being used to submit unrelated,
more dubious changes isn't very helpful, of course.
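
To make the block arithmetic concrete, here is a minimal sketch of the
conventional finalization path, assuming a hypothetical
sha256_blocks(state, data, nblocks) compression helper and a context that
buffers the unprocessed tail of the message; it is illustrative only, not
lib/crypto/sha256.c as it stands:

    #include <stdint.h>
    #include <string.h>

    struct sha256_sketch_ctx {
            uint32_t state[8];    /* working hash state */
            uint64_t count;       /* total bytes fed in so far */
            uint8_t  buf[64];     /* unprocessed tail of the message */
    };

    /* Assumed helper: compresses nblocks 64-byte blocks into state. */
    void sha256_blocks(uint32_t state[8], const uint8_t *data, size_t nblocks);

    static void sha256_finish_sketch(struct sha256_sketch_ctx *ctx)
    {
            size_t partial = ctx->count % 64;
            uint64_t bitlen = ctx->count << 3;
            int i;

            ctx->buf[partial++] = 0x80;             /* mandatory padding bit */
            if (partial > 56) {
                    /* count % 64 >= 56: no room left for the 8-byte length,
                     * so pad out this block and spill into a second one */
                    memset(&ctx->buf[partial], 0, 64 - partial);
                    sha256_blocks(ctx->state, ctx->buf, 1);
                    partial = 0;
            }
            memset(&ctx->buf[partial], 0, 56 - partial);
            for (i = 0; i < 8; i++)                 /* big-endian bit length */
                    ctx->buf[56 + i] = (uint8_t)(bitlen >> (56 - 8 * i));
            sha256_blocks(ctx->state, ctx->buf, 1);
    }

In the common case only the second memset and a single block call run; always
assembling the padded tail in a separate on-stack buffer instead, as described
above, adds a copy plus a memzero of that buffer on every call, whichever case
is hit.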

- Eric



* Re: [PATCH 00/12] crypto: sha256 - Use partial block API
  2025-04-30 17:45 ` [PATCH 00/12] crypto: sha256 - Use partial block API Eric Biggers
@ 2025-05-01  1:21   ` Herbert Xu
  2025-05-01  2:26     ` Eric Biggers
  0 siblings, 1 reply; 4+ messages in thread
From: Herbert Xu @ 2025-05-01  1:21 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Linux Crypto Mailing List, linux-kernel, linux-arch,
	linux-arm-kernel, linux-mips, linuxppc-dev, linux-riscv,
	sparclinux, linux-s390, x86, Ard Biesheuvel, Jason A . Donenfeld,
	Linus Torvalds

On Wed, Apr 30, 2025 at 10:45:43AM -0700, Eric Biggers wrote:
>
> As for your sha256_finup "optimization", it's an interesting idea, but
> unfortunately it slightly slows down the common case which is count % 64 < 56,
> due to the unnecessary copy to the stack and the following zeroization.  In the
> uncommon case where count % 64 >= 56 you do get to pass nblocks=2 to
> sha256_blocks_*(), but ultimately SHA-256 is serialized block-by-block anyway,
> so it ends up being only slightly faster in that case, which again is the
> uncommon case.  So while it's an interesting idea, it doesn't seem to actually
> be better.  And the fact that that patch is also being used to submit unrelated,
> more dubious changes isn't very helpful, of course.

I'm more than willing to change sha256_finup if you can prove it
with real numbers that it is worse than the single-block version.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



* Re: [PATCH 00/12] crypto: sha256 - Use partial block API
  2025-05-01  1:21   ` Herbert Xu
@ 2025-05-01  2:26     ` Eric Biggers
  2025-05-01  5:19       ` Herbert Xu
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Biggers @ 2025-05-01  2:26 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Linux Crypto Mailing List, linux-kernel, linux-arch,
	linux-arm-kernel, linux-mips, linuxppc-dev, linux-riscv,
	sparclinux, linux-s390, x86, Ard Biesheuvel, Jason A . Donenfeld,
	Linus Torvalds

On Thu, May 01, 2025 at 09:21:15AM +0800, Herbert Xu wrote:
> On Wed, Apr 30, 2025 at 10:45:43AM -0700, Eric Biggers wrote:
> >
> > As for your sha256_finup "optimization", it's an interesting idea, but
> > unfortunately it slightly slows down the common case which is count % 64 < 56,
> > due to the unnecessary copy to the stack and the following zeroization.  In the
> > uncommon case where count % 64 >= 56 you do get to pass nblocks=2 to
> > sha256_blocks_*(), but ultimately SHA-256 is serialized block-by-block anyway,
> > so it ends up being only slightly faster in that case, which again is the
> > uncommon case.  So while it's an interesting idea, it doesn't seem to actually
> > be better.  And the fact that that patch is also being used to submit unrelated,
> > more dubious changes isn't very helpful, of course.
> 
> I'm more than willing to change sha256_finup if you can prove it
> with real numbers that it is worse than the single-block version.

Interesting approach -- pushing out misguided optimizations without data, then
demanding data for them to be reverted.  It's obviously worse for
len % 64 < 56 for the reason I gave, so this is a waste of time IMO.  But since
you're insisting on data anyway, here are some quick benchmarks on AMD Zen 5
(not going to bother formatting into a table):

Before your finup "optimization":

 sha256(len=0): 145 cycles
 sha256(len=1): 146 cycles
 sha256(len=2): 146 cycles
 sha256(len=3): 146 cycles
 sha256(len=4): 146 cycles
 sha256(len=5): 146 cycles
 sha256(len=6): 146 cycles
 sha256(len=7): 146 cycles
 sha256(len=8): 151 cycles
 sha256(len=9): 148 cycles
 sha256(len=10): 148 cycles
 sha256(len=11): 148 cycles
 sha256(len=12): 148 cycles
 sha256(len=13): 148 cycles
 sha256(len=14): 148 cycles
 sha256(len=15): 149 cycles
 sha256(len=16): 149 cycles
 sha256(len=17): 148 cycles
 sha256(len=18): 148 cycles
 sha256(len=19): 148 cycles
 sha256(len=20): 148 cycles
 sha256(len=21): 148 cycles
 sha256(len=22): 148 cycles
 sha256(len=23): 148 cycles
 sha256(len=24): 148 cycles
 sha256(len=25): 148 cycles
 sha256(len=26): 148 cycles
 sha256(len=27): 148 cycles
 sha256(len=28): 148 cycles
 sha256(len=29): 148 cycles
 sha256(len=30): 148 cycles
 sha256(len=31): 148 cycles
 sha256(len=32): 151 cycles
 sha256(len=33): 148 cycles
 sha256(len=34): 148 cycles
 sha256(len=35): 148 cycles
 sha256(len=36): 148 cycles
 sha256(len=37): 148 cycles
 sha256(len=38): 148 cycles
 sha256(len=39): 148 cycles
 sha256(len=40): 148 cycles
 sha256(len=41): 148 cycles
 sha256(len=42): 148 cycles
 sha256(len=43): 148 cycles
 sha256(len=44): 148 cycles
 sha256(len=45): 148 cycles
 sha256(len=46): 150 cycles
 sha256(len=47): 149 cycles
 sha256(len=48): 147 cycles
 sha256(len=49): 147 cycles
 sha256(len=50): 147 cycles
 sha256(len=51): 147 cycles
 sha256(len=52): 147 cycles
 sha256(len=53): 147 cycles
 sha256(len=54): 147 cycles
 sha256(len=55): 148 cycles
 sha256(len=56): 278 cycles
 sha256(len=57): 278 cycles
 sha256(len=58): 278 cycles
 sha256(len=59): 278 cycles
 sha256(len=60): 277 cycles
 sha256(len=61): 277 cycles
 sha256(len=62): 277 cycles
 sha256(len=63): 276 cycles
 sha256(len=64): 276 cycles

After your finup "optimization":

 sha256(len=0): 188 cycles
 sha256(len=1): 190 cycles
 sha256(len=2): 190 cycles
 sha256(len=3): 190 cycles
 sha256(len=4): 189 cycles
 sha256(len=5): 189 cycles
 sha256(len=6): 189 cycles
 sha256(len=7): 190 cycles
 sha256(len=8): 187 cycles
 sha256(len=9): 188 cycles
 sha256(len=10): 188 cycles
 sha256(len=11): 188 cycles
 sha256(len=12): 189 cycles
 sha256(len=13): 189 cycles
 sha256(len=14): 188 cycles
 sha256(len=15): 189 cycles
 sha256(len=16): 189 cycles
 sha256(len=17): 190 cycles
 sha256(len=18): 190 cycles
 sha256(len=19): 190 cycles
 sha256(len=20): 190 cycles
 sha256(len=21): 190 cycles
 sha256(len=22): 190 cycles
 sha256(len=23): 190 cycles
 sha256(len=24): 191 cycles
 sha256(len=25): 191 cycles
 sha256(len=26): 191 cycles
 sha256(len=27): 191 cycles
 sha256(len=28): 191 cycles
 sha256(len=29): 192 cycles
 sha256(len=30): 191 cycles
 sha256(len=31): 191 cycles
 sha256(len=32): 191 cycles
 sha256(len=33): 191 cycles
 sha256(len=34): 191 cycles
 sha256(len=35): 191 cycles
 sha256(len=36): 192 cycles
 sha256(len=37): 192 cycles
 sha256(len=38): 192 cycles
 sha256(len=39): 191 cycles
 sha256(len=40): 191 cycles
 sha256(len=41): 194 cycles
 sha256(len=42): 193 cycles
 sha256(len=43): 193 cycles
 sha256(len=44): 193 cycles
 sha256(len=45): 193 cycles
 sha256(len=46): 194 cycles
 sha256(len=47): 194 cycles
 sha256(len=48): 193 cycles
 sha256(len=49): 195 cycles
 sha256(len=50): 195 cycles
 sha256(len=51): 196 cycles
 sha256(len=52): 196 cycles
 sha256(len=53): 195 cycles
 sha256(len=54): 195 cycles
 sha256(len=55): 195 cycles
 sha256(len=56): 297 cycles
 sha256(len=57): 297 cycles
 sha256(len=58): 297 cycles
 sha256(len=59): 297 cycles
 sha256(len=60): 297 cycles
 sha256(len=61): 297 cycles
 sha256(len=62): 297 cycles
 sha256(len=63): 297 cycles
 sha256(len=64): 292 cycles

So your "optimization" made it ~43 cycles slower for len % 64 < 56, or ~19
cycles slower for len % 64 >= 56.

As I said, it's from the overhead of unnecessarily copying the data onto the
stack and then having to zeroize it at the end.
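
For reference, per-length numbers of this sort can be collected with a small
cycle-counting harness along the following lines; this is only a sketch of the
methodology (userspace, TSC-based, using OpenSSL's one-shot SHA256() as a
stand-in for the library function), not the harness behind the numbers above:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <openssl/sha.h>   /* one-shot SHA256(), stand-in for the lib call */
    #include <x86intrin.h>     /* __rdtsc() */

    int main(void)
    {
            unsigned char data[64], out[32];

            memset(data, 0xab, sizeof(data));
            for (size_t len = 0; len <= 64; len++) {
                    uint64_t best = UINT64_MAX;

                    /* Take the fastest of many runs to filter out noise. */
                    for (int i = 0; i < 100000; i++) {
                            uint64_t t0 = __rdtsc();

                            SHA256(data, len, out);

                            uint64_t t1 = __rdtsc();
                            if (t1 - t0 < best)
                                    best = t1 - t0;
                    }
                    printf("sha256(len=%zu): %llu cycles\n",
                           len, (unsigned long long)best);
            }
            return 0;
    }

Build with something like "gcc -O2 bench.c -lcrypto".  The TSC ticks at a
constant rate that need not match the core clock, so numbers from a harness
like this are only comparable against each other, not across machines.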

- Eric



* Re: [PATCH 00/12] crypto: sha256 - Use partial block API
  2025-05-01  2:26     ` Eric Biggers
@ 2025-05-01  5:19       ` Herbert Xu
  0 siblings, 0 replies; 4+ messages in thread
From: Herbert Xu @ 2025-05-01  5:19 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Linux Crypto Mailing List, linux-kernel, linux-arch,
	linux-arm-kernel, linux-mips, linuxppc-dev, linux-riscv,
	sparclinux, linux-s390, x86, Ard Biesheuvel, Jason A . Donenfeld,
	Linus Torvalds

On Wed, Apr 30, 2025 at 07:26:17PM -0700, Eric Biggers wrote:
>
> Interesting approach -- pushing out misguided optimizations without data, then
> demanding data for them to be reverted.  It's obviously worse for
> len % 64 < 56 for the reason I gave, so this is a waste of time IMO.  But since
> you're insisting on data anyway, here are some quick benchmarks on AMD Zen 5
> (not going to bother formatting into a table):
> 
> Before your finup "optimization":

Thanks, I'll revert to the single-block version.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

