* Re: [PATCH 00/12] crypto: sha256 - Use partial block API
       [not found] <cover.1745992998.git.herbert@gondor.apana.org.au>
@ 2025-04-30 17:45 ` Eric Biggers
  2025-05-01  1:21   ` Herbert Xu
  0 siblings, 1 reply; 4+ messages in thread

From: Eric Biggers @ 2025-04-30 17:45 UTC (permalink / raw)
To: Herbert Xu
Cc: Linux Crypto Mailing List, linux-kernel, linux-arch, linux-arm-kernel,
    linux-mips, linuxppc-dev, linux-riscv, sparclinux, linux-s390, x86,
    Ard Biesheuvel, Jason A . Donenfeld, Linus Torvalds

[Added back Cc's that were dropped]

On Wed, Apr 30, 2025 at 02:06:15PM +0800, Herbert Xu wrote:
> This is based on
>
> https://patchwork.kernel.org/project/linux-crypto/list/?series=957785

I'm assuming you mean with your diff
https://lore.kernel.org/r/aBGdiv17ztQnhAps@gondor.apana.org.au
folded into my first patch, since otherwise your patch series doesn't apply.
But even with that done, your patch series doesn't build:

In file included from ./include/crypto/hash_info.h:12,
                 from crypto/hash_info.c:9:
./include/crypto/sha2.h: In function ‘sha256_init’:
./include/crypto/sha2.h:101:32: error: ‘struct sha256_state’ has no member named ‘ctx’
  101 |         sha256_block_init(&sctx->ctx);
      |                                ^~

> Rather than going through the lib/sha256 partial block handling,
> use the native shash partial block API.  Add two extra shash
> algorithms to provide testing coverage for lib/sha256.
>
> Herbert Xu (12):
>   crypto: lib/sha256 - Restore lib_sha256 finup code
>   crypto: sha256 - Use the partial block API for generic
>   crypto: arm/sha256 - Add simd block function
>   crypto: arm64/sha256 - Add simd block function
>   crypto: mips/sha256 - Export block functions as GPL only
>   crypto: powerpc/sha256 - Export block functions as GPL only
>   crypto: riscv/sha256 - Add simd block function
>   crypto: s390/sha256 - Export block functions as GPL only
>   crypto: sparc/sha256 - Export block functions as GPL only
>   crypto: x86/sha256 - Add simd block function
>   crypto: lib/sha256 - Use generic block helper
>   crypto: sha256 - Use the partial block API
>
>  arch/arm/lib/crypto/Kconfig                   |   1 +
>  arch/arm/lib/crypto/sha256-armv4.pl           |  20 +--
>  arch/arm/lib/crypto/sha256.c                  |  16 +--
>  arch/arm64/crypto/sha512-glue.c               |   6 +-
>  arch/arm64/lib/crypto/Kconfig                 |   1 +
>  arch/arm64/lib/crypto/sha2-armv8.pl           |   2 +-
>  arch/arm64/lib/crypto/sha256.c                |  16 +--
>  .../mips/cavium-octeon/crypto/octeon-sha256.c |   4 +-
>  arch/powerpc/lib/crypto/sha256.c              |   4 +-
>  arch/riscv/lib/crypto/Kconfig                 |   1 +
>  arch/riscv/lib/crypto/sha256.c                |  17 ++-
>  arch/s390/lib/crypto/sha256.c                 |   4 +-
>  arch/sparc/lib/crypto/sha256.c                |   4 +-
>  arch/x86/lib/crypto/Kconfig                   |   1 +
>  arch/x86/lib/crypto/sha256.c                  |  16 ++-
>  crypto/sha256.c                               | 134 +++++++++++-------
>  include/crypto/internal/sha2.h                |  46 ++++++
>  include/crypto/sha2.h                         |  14 +-
>  lib/crypto/Kconfig                            |   8 ++
>  lib/crypto/sha256.c                           | 100 +++----------
>  20 files changed, 232 insertions(+), 183 deletions(-)

The EXPORT_SYMBOL => EXPORT_SYMBOL_GPL changes are fine and should just be
one patch.  I was just trying to be consistent with lib/crypto/sha256.c,
which uses EXPORT_SYMBOL, but EXPORT_SYMBOL_GPL is fine too.

Everything else in this series is harmful, IMO.  I already covered why
crypto_shash should simply use the library and not do anything special.

As for your sha256_finup "optimization", it's an interesting idea, but
unfortunately it slightly slows down the common case which is count % 64 < 56,
due to the unnecessary copy to the stack and the following zeroization.  In the
uncommon case where count % 64 >= 56 you do get to pass nblocks=2 to
sha256_blocks_*(), but ultimately SHA-256 is serialized block-by-block anyway,
so it ends up being only slightly faster in that case, which again is the
uncommon case.  So while it's an interesting idea, it doesn't seem to actually
be better.  And the fact that that patch is also being used to submit
unrelated, more dubious changes isn't very helpful, of course.

- Eric

^ permalink raw reply	[flat|nested] 4+ messages in thread
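[For readers following along, the padding arithmetic behind the
"count % 64 < 56" distinction is sketched below.  This is a minimal
illustration, not the kernel code: struct sha256_sketch and
sha256_blocks_sketch() are hypothetical stand-ins for the real sha256_state
and the arch sha256_blocks_*() functions.  The point is that the 0x80
terminator plus the 8-byte bit length fit in the current block only when
fewer than 56 message bytes remain buffered, which makes that the common,
single-block case; a finup-style path that always builds the tail in a
stack buffer and zeroizes it afterwards pays pure overhead exactly there.]

#include <stdint.h>
#include <string.h>

/* Hypothetical context layout, for illustration only. */
struct sha256_sketch {
	uint32_t state[8];
	uint64_t count;       /* total message bytes hashed so far */
	uint8_t  buf[64];     /* count % 64 message bytes are buffered here */
};

/* Stand-in for the arch block function, e.g. sha256_blocks_arch(). */
void sha256_blocks_sketch(uint32_t state[8], const uint8_t *data,
			  size_t nblocks);

static void sha256_finish_sketch(struct sha256_sketch *ctx)
{
	size_t partial = ctx->count % 64;
	/* Finup-style path: build the final block(s) in a stack buffer. */
	uint8_t block[128] = { 0 };
	/* partial < 56 leaves room for 0x80 plus the 8-byte length: one block. */
	size_t nblocks = (partial < 56) ? 1 : 2;
	uint64_t bitlen = ctx->count * 8;
	int i;

	memcpy(block, ctx->buf, partial);
	block[partial] = 0x80;              /* mandatory '1' terminator bit */
	for (i = 0; i < 8; i++)             /* big-endian bit count at the end */
		block[nblocks * 64 - 1 - i] = (uint8_t)(bitlen >> (8 * i));

	sha256_blocks_sketch(ctx->state, block, nblocks);
	/*
	 * The stack copy above and this zeroization (the kernel would use
	 * memzero_explicit()) are the overhead Eric describes; a conventional
	 * final() pads in ctx->buf in place and avoids both in the common
	 * partial < 56 case.
	 */
	memset(block, 0, sizeof(block));
}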
* Re: [PATCH 00/12] crypto: sha256 - Use partial block API
  2025-04-30 17:45 ` [PATCH 00/12] crypto: sha256 - Use partial block API Eric Biggers
@ 2025-05-01  1:21   ` Herbert Xu
  2025-05-01  2:26     ` Eric Biggers
  0 siblings, 1 reply; 4+ messages in thread

From: Herbert Xu @ 2025-05-01 1:21 UTC (permalink / raw)
To: Eric Biggers
Cc: Linux Crypto Mailing List, linux-kernel, linux-arch, linux-arm-kernel,
    linux-mips, linuxppc-dev, linux-riscv, sparclinux, linux-s390, x86,
    Ard Biesheuvel, Jason A . Donenfeld, Linus Torvalds

On Wed, Apr 30, 2025 at 10:45:43AM -0700, Eric Biggers wrote:
>
> As for your sha256_finup "optimization", it's an interesting idea, but
> unfortunately it slightly slows down the common case which is count % 64 < 56,
> due to the unnecessary copy to the stack and the following zeroization.  In the
> uncommon case where count % 64 >= 56 you do get to pass nblocks=2 to
> sha256_blocks_*(), but ultimately SHA-256 is serialized block-by-block anyway,
> so it ends up being only slightly faster in that case, which again is the
> uncommon case.  So while it's an interesting idea, it doesn't seem to actually
> be better.  And the fact that that patch is also being used to submit
> unrelated, more dubious changes isn't very helpful, of course.

I'm more than willing to change sha256_finup if you can prove it
with real numbers that it is worse than the single-block version.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: [PATCH 00/12] crypto: sha256 - Use partial block API
  2025-05-01  1:21 ` Herbert Xu
@ 2025-05-01  2:26   ` Eric Biggers
  2025-05-01  5:19     ` Herbert Xu
  0 siblings, 1 reply; 4+ messages in thread

From: Eric Biggers @ 2025-05-01 2:26 UTC (permalink / raw)
To: Herbert Xu
Cc: Linux Crypto Mailing List, linux-kernel, linux-arch, linux-arm-kernel,
    linux-mips, linuxppc-dev, linux-riscv, sparclinux, linux-s390, x86,
    Ard Biesheuvel, Jason A . Donenfeld, Linus Torvalds

On Thu, May 01, 2025 at 09:21:15AM +0800, Herbert Xu wrote:
> On Wed, Apr 30, 2025 at 10:45:43AM -0700, Eric Biggers wrote:
> >
> > As for your sha256_finup "optimization", it's an interesting idea, but
> > unfortunately it slightly slows down the common case which is count % 64 < 56,
> > due to the unnecessary copy to the stack and the following zeroization.  In the
> > uncommon case where count % 64 >= 56 you do get to pass nblocks=2 to
> > sha256_blocks_*(), but ultimately SHA-256 is serialized block-by-block anyway,
> > so it ends up being only slightly faster in that case, which again is the
> > uncommon case.  So while it's an interesting idea, it doesn't seem to actually
> > be better.  And the fact that that patch is also being used to submit unrelated,
> > more dubious changes isn't very helpful, of course.
>
> I'm more than willing to change sha256_finup if you can prove it
> with real numbers that it is worse than the single-block version.

Interesting approach -- pushing out misguided optimizations without data,
then demanding data for them to be reverted.  It's obviously worse for
len % 64 < 56 for the reason I gave, so this is a waste of time IMO.  But
since you're insisting on data anyway, here are some quick benchmarks on
AMD Zen 5 (not going to bother formatting into a table):

Before your finup "optimization":

sha256(len=0): 145 cycles
sha256(len=1): 146 cycles
sha256(len=2): 146 cycles
sha256(len=3): 146 cycles
sha256(len=4): 146 cycles
sha256(len=5): 146 cycles
sha256(len=6): 146 cycles
sha256(len=7): 146 cycles
sha256(len=8): 151 cycles
sha256(len=9): 148 cycles
sha256(len=10): 148 cycles
sha256(len=11): 148 cycles
sha256(len=12): 148 cycles
sha256(len=13): 148 cycles
sha256(len=14): 148 cycles
sha256(len=15): 149 cycles
sha256(len=16): 149 cycles
sha256(len=17): 148 cycles
sha256(len=18): 148 cycles
sha256(len=19): 148 cycles
sha256(len=20): 148 cycles
sha256(len=21): 148 cycles
sha256(len=22): 148 cycles
sha256(len=23): 148 cycles
sha256(len=24): 148 cycles
sha256(len=25): 148 cycles
sha256(len=26): 148 cycles
sha256(len=27): 148 cycles
sha256(len=28): 148 cycles
sha256(len=29): 148 cycles
sha256(len=30): 148 cycles
sha256(len=31): 148 cycles
sha256(len=32): 151 cycles
sha256(len=33): 148 cycles
sha256(len=34): 148 cycles
sha256(len=35): 148 cycles
sha256(len=36): 148 cycles
sha256(len=37): 148 cycles
sha256(len=38): 148 cycles
sha256(len=39): 148 cycles
sha256(len=40): 148 cycles
sha256(len=41): 148 cycles
sha256(len=42): 148 cycles
sha256(len=43): 148 cycles
sha256(len=44): 148 cycles
sha256(len=45): 148 cycles
sha256(len=46): 150 cycles
sha256(len=47): 149 cycles
sha256(len=48): 147 cycles
sha256(len=49): 147 cycles
sha256(len=50): 147 cycles
sha256(len=51): 147 cycles
sha256(len=52): 147 cycles
sha256(len=53): 147 cycles
sha256(len=54): 147 cycles
sha256(len=55): 148 cycles
sha256(len=56): 278 cycles
sha256(len=57): 278 cycles
sha256(len=58): 278 cycles
sha256(len=59): 278 cycles
sha256(len=60): 277 cycles
sha256(len=61): 277 cycles
sha256(len=62): 277 cycles
sha256(len=63): 276 cycles
sha256(len=64): 276 cycles

After your finup "optimization":

sha256(len=0): 188 cycles
sha256(len=1): 190 cycles
sha256(len=2): 190 cycles
sha256(len=3): 190 cycles
sha256(len=4): 189 cycles
sha256(len=5): 189 cycles
sha256(len=6): 189 cycles
sha256(len=7): 190 cycles
sha256(len=8): 187 cycles
sha256(len=9): 188 cycles
sha256(len=10): 188 cycles
sha256(len=11): 188 cycles
sha256(len=12): 189 cycles
sha256(len=13): 189 cycles
sha256(len=14): 188 cycles
sha256(len=15): 189 cycles
sha256(len=16): 189 cycles
sha256(len=17): 190 cycles
sha256(len=18): 190 cycles
sha256(len=19): 190 cycles
sha256(len=20): 190 cycles
sha256(len=21): 190 cycles
sha256(len=22): 190 cycles
sha256(len=23): 190 cycles
sha256(len=24): 191 cycles
sha256(len=25): 191 cycles
sha256(len=26): 191 cycles
sha256(len=27): 191 cycles
sha256(len=28): 191 cycles
sha256(len=29): 192 cycles
sha256(len=30): 191 cycles
sha256(len=31): 191 cycles
sha256(len=32): 191 cycles
sha256(len=33): 191 cycles
sha256(len=34): 191 cycles
sha256(len=35): 191 cycles
sha256(len=36): 192 cycles
sha256(len=37): 192 cycles
sha256(len=38): 192 cycles
sha256(len=39): 191 cycles
sha256(len=40): 191 cycles
sha256(len=41): 194 cycles
sha256(len=42): 193 cycles
sha256(len=43): 193 cycles
sha256(len=44): 193 cycles
sha256(len=45): 193 cycles
sha256(len=46): 194 cycles
sha256(len=47): 194 cycles
sha256(len=48): 193 cycles
sha256(len=49): 195 cycles
sha256(len=50): 195 cycles
sha256(len=51): 196 cycles
sha256(len=52): 196 cycles
sha256(len=53): 195 cycles
sha256(len=54): 195 cycles
sha256(len=55): 195 cycles
sha256(len=56): 297 cycles
sha256(len=57): 297 cycles
sha256(len=58): 297 cycles
sha256(len=59): 297 cycles
sha256(len=60): 297 cycles
sha256(len=61): 297 cycles
sha256(len=62): 297 cycles
sha256(len=63): 297 cycles
sha256(len=64): 292 cycles

So your "optimization" made it ~43 cycles slower for len % 64 < 56, or
~19 cycles slower for len % 64 >= 56.  As I said, it's from the overhead
of unnecessarily copying the data onto the stack and then having to
zeroize it at the end.

- Eric

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: [PATCH 00/12] crypto: sha256 - Use partial block API
  2025-05-01  2:26 ` Eric Biggers
@ 2025-05-01  5:19   ` Herbert Xu
  0 siblings, 0 replies; 4+ messages in thread

From: Herbert Xu @ 2025-05-01 5:19 UTC (permalink / raw)
To: Eric Biggers
Cc: Linux Crypto Mailing List, linux-kernel, linux-arch, linux-arm-kernel,
    linux-mips, linuxppc-dev, linux-riscv, sparclinux, linux-s390, x86,
    Ard Biesheuvel, Jason A . Donenfeld, Linus Torvalds

On Wed, Apr 30, 2025 at 07:26:17PM -0700, Eric Biggers wrote:
>
> Interesting approach -- pushing out misguided optimizations without data,
> then demanding data for them to be reverted.  It's obviously worse for
> len % 64 < 56 for the reason I gave, so this is a waste of time IMO.  But
> since you're insisting on data anyway, here are some quick benchmarks on
> AMD Zen 5 (not going to bother formatting into a table):
>
> Before your finup "optimization":

Thanks, I'll revert to the single-block version.

-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 4+ messages in thread