[PATCH] lib/crypto: blake2b: Roll up BLAKE2b round loop on 32-bit

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Eric Biggers <ebiggers@kernel.org>
To: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Ard Biesheuvel <ardb@kernel.org>,
	"Jason A . Donenfeld" <Jason@zx2c4.com>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Thorsten Blum <thorsten.blum@linux.dev>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <nick.desaulniers+lkml@gmail.com>,
	Bill Wendling <morbo@google.com>,
	Justin Stitt <justinstitt@google.com>,
	David Laight <david.laight@runbox.com>,
	David Sterba <dsterba@suse.com>,
	llvm@lists.linux.dev, linux-btrfs@vger.kernel.org,
	Eric Biggers <ebiggers@kernel.org>
Subject: [PATCH] lib/crypto: blake2b: Roll up BLAKE2b round loop on 32-bit
Date: Wed,  3 Dec 2025 11:06:52 -0800	[thread overview]
Message-ID: <20251203190652.144076-1-ebiggers@kernel.org> (raw)

BLAKE2b has a state of 16 64-bit words.  Add the message data in and
there are 32 64-bit words.  With the current code where all the rounds
are unrolled to enable constant-folding of the blake2b_sigma values,
this results in a very large code size on 32-bit kernels, including a
recurring issue where gcc uses a large amount of stack.

There's just not much benefit to this unrolling when the code is already
so large.  Let's roll up the rounds when !CONFIG_64BIT.  Then, remove
the now-unnecessary override of the stack frame size warning.

Code size improvements for blake2b_compress_generic():

                  Size before (bytes)    Size after (bytes)
                  -------------------    ------------------
    i386, gcc           27584                 3632
    i386, clang         18208                 3248
    arm32, gcc          19912                 2860
    arm32, clang        21336                 3344

Running the BLAKE2b benchmark on a !CONFIG_64BIT kernel on an x86_64
processor shows a 16384B throughput change of 351 => 340 MB/s (gcc) or
442 MB/s => 375 MB/s (clang).  So clearly not much of a slowdown either.
But also that microbenchmark also effectively disregards cache usage,
which is important in practice and is far better in the smaller code.

Note: If we rolled up the loop on x86_64 too, the change would be
7024 bytes => 1584 bytes and 1960 MB/s => 1396 MB/s (gcc), or
6848 bytes => 1696 bytes and 1920 MB/s => 1263 MB/s (clang).
Maybe still worth it, though not quite as clearly beneficial.

Fixes: 91d689337fe8 ("crypto: blake2b - add blake2b generic implementation")
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 lib/crypto/Makefile  |  1 -
 lib/crypto/blake2b.c | 25 +++++++++++++------------
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile
index b5346cebbb55..330ab65b29c4 100644
--- a/lib/crypto/Makefile
+++ b/lib/crypto/Makefile
@@ -31,11 +31,10 @@ obj-$(CONFIG_CRYPTO_LIB_GF128MUL)		+= gf128mul.o
 
 ################################################################################
 
 obj-$(CONFIG_CRYPTO_LIB_BLAKE2B) += libblake2b.o
 libblake2b-y := blake2b.o
-CFLAGS_blake2b.o := -Wframe-larger-than=4096 #  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
 ifeq ($(CONFIG_CRYPTO_LIB_BLAKE2B_ARCH),y)
 CFLAGS_blake2b.o += -I$(src)/$(SRCARCH)
 libblake2b-$(CONFIG_ARM) += arm/blake2b-neon-core.o
 endif # CONFIG_CRYPTO_LIB_BLAKE2B_ARCH
 
diff --git a/lib/crypto/blake2b.c b/lib/crypto/blake2b.c
index 09c6d65d8a6e..923cda5c7603 100644
--- a/lib/crypto/blake2b.c
+++ b/lib/crypto/blake2b.c
@@ -12,10 +12,11 @@
 #include <linux/bug.h>
 #include <linux/export.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/string.h>
+#include <linux/unroll.h>
 #include <linux/types.h>
 
 static const u8 blake2b_sigma[12][16] = {
 	{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
 	{ 14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3 },
@@ -81,22 +82,22 @@ blake2b_compress_generic(struct blake2b_ctx *ctx,
 	G(r, 4, v[0], v[ 5], v[10], v[15]); \
 	G(r, 5, v[1], v[ 6], v[11], v[12]); \
 	G(r, 6, v[2], v[ 7], v[ 8], v[13]); \
 	G(r, 7, v[3], v[ 4], v[ 9], v[14]); \
 } while (0)
-		ROUND(0);
-		ROUND(1);
-		ROUND(2);
-		ROUND(3);
-		ROUND(4);
-		ROUND(5);
-		ROUND(6);
-		ROUND(7);
-		ROUND(8);
-		ROUND(9);
-		ROUND(10);
-		ROUND(11);
+
+#ifdef CONFIG_64BIT
+		/*
+		 * Unroll the rounds loop to enable constant-folding of the
+		 * blake2b_sigma values.  Seems worthwhile on 64-bit kernels.
+		 * Not worthwhile on 32-bit kernels because the code size is
+		 * already so large there due to BLAKE2b using 64-bit words.
+		 */
+		unrolled_full
+#endif
+		for (int r = 0; r < 12; r++)
+			ROUND(r);
 
 #undef G
 #undef ROUND
 
 		for (i = 0; i < 8; ++i)

base-commit: 3f9f0252130e7dd60d41be0802bf58f6471c691d
-- 
2.52.0

next             reply	other threads:[~2025-12-03 19:09 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-03 19:06 Eric Biggers [this message]
2025-12-04  9:05 ` [PATCH] lib/crypto: blake2b: Roll up BLAKE2b round loop on 32-bit Ard Biesheuvel
2025-12-04 17:56 ` Jason A. Donenfeld
2025-12-05  4:58   ` Eric Biggers
2025-12-05 14:16 ` david laight
2025-12-05 20:14   ` Eric Biggers
2025-12-05 22:04     ` david laight
2025-12-26 20:24     ` david laight

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:b5346cebbb5 dfblob:330ab65b29c dfblob:09c6d65d8a6
dfblob:923cda5c760 )
 OR (
bs:"[PATCH] lib/crypto: blake2b: Roll up BLAKE2b round loop on 32-bit" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251203190652.144076-1-ebiggers@kernel.org \
    --to=ebiggers@kernel.org \
    --cc=Jason@zx2c4.com \
    --cc=ardb@kernel.org \
    --cc=david.laight@runbox.com \
    --cc=dsterba@suse.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=justinstitt@google.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=llvm@lists.linux.dev \
    --cc=morbo@google.com \
    --cc=nathan@kernel.org \
    --cc=nick.desaulniers+lkml@gmail.com \
    --cc=thorsten.blum@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.