From: Eric Biggers <ebiggers@kernel.org>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: david laight <david.laight@runbox.com>,
Thorsten Blum <thorsten.blum@linux.dev>,
Ard Biesheuvel <ardb@kernel.org>,
linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] lib/crypto: blake2b: Limit frame size workaround to GCC < 12.2 on i386
Date: Mon, 24 Nov 2025 14:40:28 -0800
Message-ID: <20251124224028.GA1827@quark>
In-Reply-To: <CAHmME9o7rw=Hi9ykfU4GD6Jxzo6Q404FVGVkUDh+RCjr_-DadQ@mail.gmail.com>
On Mon, Nov 24, 2025 at 06:14:31PM +0100, Jason A. Donenfeld wrote:
> On Mon, Nov 24, 2025 at 10:08 AM david laight <david.laight@runbox.com> wrote:
> > > How about we roll up the BLAKE2b rounds loop if !CONFIG_64BIT?
> >
> > I do wonder about the real benefit of some of the massive loop unrolling
> > that happens in a lot of these algorithms (not just blake2b).
>
> I remember looking at this in the context of blake2s, with two paths,
> depending on CONFIG_CC_OPTIMIZE_FOR_SIZE, but the savings didn't seem
> enough for the performance hit. It might be platform specific though.
> I guess try it and post numbers, and that'll either be a compelling
> reason to adjust it or still "meh"?

Earlier I did some quick microbenchmarks with blake2b_kunit. The
existing unrolling does increase throughput, by as much as 50%. That is
probably mostly due to inlining the blake2b_sigma constants.

However, the increased code size is a real issue that doesn't show up in
that microbenchmark. Naturally, it will be especially bad on 32-bit
CPUs, given that BLAKE2b works with 64-bit words: the 32-bit code gets
the code size blow-up from emulating the 64-bit arithmetic with 32-bit
instructions, in addition to the blow-up from the unrolling. Rolling up
the rounds loop when !CONFIG_64BIT seems like a reasonable first step.

We could consider rolling up the rounds loop even when CONFIG_64BIT. If
optimal BLAKE2b throughput were actually important on x86_64, we should
have an AVX-optimized implementation anyway, but no one has ever cared
to add one. I think btrfs is currently the only user, and btrfs's use
case is non-cryptographic; it already supports much faster
non-cryptographic checksums (crc32c and xxhash64).

- Eric