* [PATCH] lib/crypto: gf128hash: mark clmul32() as noinline_for_stack
@ 2026-06-11 12:59 Arnd Bergmann
2026-06-11 20:06 ` Eric Biggers
From: Arnd Bergmann @ 2026-06-11 12:59 UTC
To: Eric Biggers, Jason A. Donenfeld, Ard Biesheuvel,
Nathan Chancellor
Cc: Arnd Bergmann, Nick Desaulniers, Bill Wendling, Justin Stitt,
linux-crypto, linux-kernel, llvm
From: Arnd Bergmann <arnd@arndb.de>
During randconfig testing, I came across a lot of warnings about the newly
added carry-less multiplication function causing excessive stack usage by
spilling temporary variables to the stack:
lib/crypto/gf128hash.c:166:1: error: stack frame size (1192) exceeds limit (1024) in 'polyval_mul_generic' [-Werror,-Wframe-larger-than]
In addition to the risk of overflowing the kernel stack, the generated
object code likely performs very poorly.
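For readers unfamiliar with the operation: carry-less multiplication works
like schoolbook binary multiplication, except that the partial products are
combined with XOR instead of addition, so no carries propagate between bit
positions. The following is a minimal, hypothetical reference sketch in
portable C; the name clmul32_ref() is illustrative, and it is not meant to
mirror the clmul32() in lib/crypto/gf128hash.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical reference 32 x 32 => 64 bit carry-less multiplication.
 * Each set bit i of b contributes (a << i); the contributions are XORed,
 * so there is no carry propagation between bit positions.
 */
static uint64_t clmul32_ref(uint32_t a, uint32_t b)
{
	uint64_t result = 0;

	for (int i = 0; i < 32; i++) {
		/* All ones if bit i of b is set, all zeroes otherwise (no branch). */
		uint64_t mask = -(uint64_t)((b >> i) & 1);

		result ^= ((uint64_t)a << i) & mask;
	}
	return result;
}
```

For example, clmul32_ref(3, 3) is 5 (0b11 XOR 0b110 = 0b101), whereas
ordinary multiplication gives 9.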
This only happens on architectures that don't provide a 128-bit integer
type (the !CONFIG_ARCH_SUPPORTS_INT128 case, which should be all 32-bit
architectures on modern compilers). Although I tested random x86 and arm
configs, I only saw it with arm's CONFIG_THUMB2_KERNEL, which puts more
pressure on the register allocator.
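To illustrate why only the !CONFIG_ARCH_SUPPORTS_INT128 case is affected:
with a native 128-bit type, a 64 x 64 => 128 bit product fits in a single
variable, whereas without one it must be assembled from 32 x 32 => 64 bit
partial products, and on a 32-bit target each u64 temporary also occupies
two registers. The sketch below is a hypothetical user-space analogue: it
uses the compiler's __SIZEOF_INT128__ macro in place of
CONFIG_ARCH_SUPPORTS_INT128, and the names clmul32_ref(), clmul64_split()
and clmul64() as well as the four-product schoolbook split are assumptions,
not the code in lib/crypto/gf128hash.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference 32 x 32 => 64 bit carry-less multiply (shift and XOR). */
static uint64_t clmul32_ref(uint32_t a, uint32_t b)
{
	uint64_t r = 0;

	for (int i = 0; i < 32; i++)
		r ^= ((uint64_t)a << i) & -(uint64_t)((b >> i) & 1);
	return r;
}

/*
 * 64 x 64 => 128 bit carry-less multiply built only from 32-bit pieces,
 * as a target without a 128-bit integer type would need (schoolbook split).
 */
static void clmul64_split(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
	uint32_t a0 = (uint32_t)a, a1 = (uint32_t)(a >> 32);
	uint32_t b0 = (uint32_t)b, b1 = (uint32_t)(b >> 32);
	uint64_t p_lo = clmul32_ref(a0, b0);
	uint64_t p_hi = clmul32_ref(a1, b1);
	uint64_t p_mid = clmul32_ref(a0, b1) ^ clmul32_ref(a1, b0);

	*lo = p_lo ^ (p_mid << 32);
	*hi = p_hi ^ (p_mid >> 32);
}

#ifdef __SIZEOF_INT128__
/* With a native 128-bit type, the whole product lives in one variable. */
static void clmul64(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
	unsigned __int128 r = 0;

	for (int i = 0; i < 64; i++)
		r ^= ((unsigned __int128)a << i) &
		     -(unsigned __int128)((b >> i) & 1);
	*lo = (uint64_t)r;
	*hi = (uint64_t)(r >> 64);
}
#else
/* No 128-bit type: fall back to the 32-bit split. */
static void clmul64(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
	clmul64_split(a, b, lo, hi);
}
#endif
```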
The testing was done with clang-22; I don't know whether gcc has the same
problem. Experimentally, marking clmul32() as noinline_for_stack completely
solves the problem in all of the affected builds, reducing the stack usage
to a few bytes as expected.
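noinline_for_stack is a documentation-flavored alias: in the kernel it
expands to plain noinline, i.e. __attribute__((__noinline__)), and the name
records that the function is kept out of line to limit the caller's stack
frame. The hypothetical user-space sketch below redefines both macros
locally (no kernel headers) and applies the attribute to a 32 x 32 => 64
bit helper, analogous to the one-line change in this patch. The
Karatsuba-style caller, which needs three helper calls instead of four, and
all function names are assumptions for illustration; the actual callers in
lib/crypto/gf128hash.c may be structured differently.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's macros (GCC/clang attribute syntax). */
#define noinline		__attribute__((__noinline__))
#define noinline_for_stack	noinline

/*
 * Hypothetical 32 x 32 => 64 bit carry-less multiply. Because it is never
 * inlined, its loop temporaries live in its own short-lived frame instead
 * of adding to the register pressure (and spills) of the caller.
 */
static noinline_for_stack uint64_t clmul32_ool(uint32_t a, uint32_t b)
{
	uint64_t r = 0;

	for (int i = 0; i < 32; i++)
		r ^= ((uint64_t)a << i) & -(uint64_t)((b >> i) & 1);
	return r;
}

/*
 * Hypothetical caller: Karatsuba-style 64 x 64 => 128 bit carry-less
 * multiply using three helper calls. In GF(2)[x],
 * (a0 + a1)(b0 + b1) = a0b0 + a0b1 + a1b0 + a1b1, so the middle term is
 * that product XORed with the low and high products.
 */
static void clmul64_karatsuba(uint64_t a, uint64_t b,
			      uint64_t *lo, uint64_t *hi)
{
	uint32_t a0 = (uint32_t)a, a1 = (uint32_t)(a >> 32);
	uint32_t b0 = (uint32_t)b, b1 = (uint32_t)(b >> 32);
	uint64_t p_lo = clmul32_ool(a0, b0);
	uint64_t p_hi = clmul32_ool(a1, b1);
	uint64_t p_mid = clmul32_ool(a0 ^ a1, b0 ^ b1) ^ p_lo ^ p_hi;

	*lo = p_lo ^ (p_mid << 32);
	*hi = p_hi ^ (p_mid >> 32);
}
```

Measuring the effect on a real configuration would mean comparing the
-Wframe-larger-than= diagnostics of a build with and without the attribute.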
Since compilers frequently optimize u64 arithmetic poorly on 32-bit
targets, keeping clmul32() out of line is likely to help other 32-bit
configurations that run into this problem as well, though it may also
cause a small performance degradation in configurations that would
benefit from inlining.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
lib/crypto/gf128hash.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/crypto/gf128hash.c b/lib/crypto/gf128hash.c
index 2650603d8ba8..8dcdf5ec98be 100644
--- a/lib/crypto/gf128hash.c
+++ b/lib/crypto/gf128hash.c
@@ -109,7 +109,7 @@ static void clmul64(u64 a, u64 b, u64 *out_lo, u64 *out_hi)
#else /* CONFIG_ARCH_SUPPORTS_INT128 */
/* Do a 32 x 32 => 64 bit carryless multiplication. */
-static u64 clmul32(u32 a, u32 b)
+static noinline_for_stack u64 clmul32(u32 a, u32 b)
{
/*
* With 32-bit multiplicands and one term every 4 bits, there are up to
--
2.39.5
* Re: [PATCH] lib/crypto: gf128hash: mark clmul32() as noinline_for_stack
From: Eric Biggers @ 2026-06-11 20:06 UTC
To: Arnd Bergmann
Cc: Jason A. Donenfeld, Ard Biesheuvel, Nathan Chancellor,
Arnd Bergmann, Nick Desaulniers, Bill Wendling, Justin Stitt,
linux-crypto, linux-kernel, llvm
On Thu, Jun 11, 2026 at 02:59:39PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> [...]
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
Applied to https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=libcrypto-next
- Eric