From: David Laight <david.laight.linux@gmail.com>
To: "Arnd Bergmann" <arnd@arndb.de>
Cc: "Lukas Wunner" <lukas@wunner.de>,
"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>,
"Herbert Xu" <herbert@gondor.apana.org.au>,
"David S . Miller" <davem@davemloft.net>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Andrey Ryabinin" <ryabinin.a.a@gmail.com>,
"Ignat Korchagin" <ignat@linux.win>,
"Stefan Berger" <stefanb@linux.ibm.com>,
linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org,
kasan-dev@googlegroups.com,
"Alexander Potapenko" <glider@google.com>,
"Andrey Konovalov" <andreyknvl@gmail.com>,
"Dmitry Vyukov" <dvyukov@google.com>,
"Vincenzo Frascino" <vincenzo.frascino@arm.com>
Subject: Re: [PATCH] crypto: ecc - Unbreak the build on arm with CONFIG_KASAN_STACK=y
Date: Tue, 14 Apr 2026 11:26:00 +0100
Message-ID: <20260414112600.553e7c44@pumpkin>
In-Reply-To: <d82181fe-a70d-4c64-a411-4bf80c51f58f@app.fastmail.com>
On Mon, 13 Apr 2026 22:32:24 +0200
"Arnd Bergmann" <arnd@arndb.de> wrote:
> On Mon, Apr 13, 2026, at 21:46, Lukas Wunner wrote:
> > On Mon, Apr 13, 2026 at 05:42:39PM +0200, Arnd Bergmann wrote:
> >> On Wed, Apr 8, 2026, at 15:36, Lukas Wunner wrote:
> >
> > Attached please find the Assembler output created by gcc -save-temps,
> > both the original version and the one with limited inlining.
> >
> > The former requires a 1360 bytes stack frame, the latter 1232 bytes.
> > E.g. xycz_initial_double() is not inlined into ecc_point_mult(),
> > together with all its recursive baggage, so the latter version
> > contains two branch instructions to that function which the former
> > (original) version does not contain.
>
> Thanks!
>
> So it indeed appears that the problem does not go away but only
> stays below the arbitrary threshold of 1280 bytes (which was
> recently raised). I would not trust that to actually be the
> case across all architectures then, as there are some targets
> like mips or parisc tend to use even more stack space than
> arm. With your current patch, that means there is a good chance
> the problem will come back later.
Not only that, the 'stack frame size' is just a proxy for total
stack use - which is a lot harder to calculate.

I've a cunning plan to use clang's function prototype hashing
to do a static stack calculation that includes indirect calls.
(I did one many years ago for some embedded code that had none.)
I suspect it will find all sorts of code paths that 'blow' the
kernel stack out of the water.
A good bet will be snprintf() calls in unusual error paths
(even after ignoring recursive snprintf() calls and all the %px
modifiers).
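
For what it's worth, the core of such a static calculation is just a
longest-path walk over the call graph. A minimal sketch, assuming the
per-function frame sizes and the call edges have already been extracted
by some other tool, with each indirect call expanded to every function
whose prototype hash matches. struct func_info, MAX_CALLEES and
max_stack_depth() are made-up names for illustration, not anything
that exists in the kernel or in clang:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLEES 4	/* hypothetical limit, keeps the sketch small */

struct func_info {
	const char *name;
	unsigned int frame_size;	/* bytes, e.g. as reported by gcc -fstack-usage */
	int callees[MAX_CALLEES];	/* indices into the table, -1 terminates */
	int state;			/* 0 = unvisited, 1 = on current path, 2 = done */
	long depth;			/* memoized worst case, valid when state == 2 */
};

/*
 * Worst-case stack use in bytes for a call that enters tab[fn],
 * or -1 if the walk runs into recursion (no static bound exists).
 */
static long max_stack_depth(struct func_info *tab, size_t n, int fn)
{
	struct func_info *f;
	long worst = 0;
	int i;

	assert(fn >= 0 && (size_t)fn < n);
	f = &tab[fn];

	if (f->state == 2)
		return f->depth;
	if (f->state == 1)
		return -1;		/* fn is already on the current path */

	f->state = 1;
	for (i = 0; i < MAX_CALLEES && f->callees[i] >= 0; i++) {
		long d = max_stack_depth(tab, n, f->callees[i]);

		if (d < 0) {
			f->state = 0;	/* unwind, leave no stale path marks */
			return -1;
		}
		if (d > worst)
			worst = d;
	}

	f->state = 2;
	f->depth = (long)f->frame_size + worst;
	return f->depth;
}
```

The frame size of any one function only bounds its own contribution;
the number that matters is the sum along the deepest path, which is
what the walk returns.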
> > At the beginning of the function, it looks like the same register values
> > are stored to multiple locations on the stack. I assume that's what you
> > mean by awful code generation? This odd behavior seems more subdued in
> > the version with limited inlining.
>
> Right. As far as I can tell, the source code is heavily optimized
> for performance, but with the sanitizer active this would likely
> be several times slower, both from the actual sanitizing and
> from the register spilling. I can see how the use of 'u64'
> arrays makes this harder for a 32-bit target with limited
> available registers.
gcc makes a right 'pig's breakfast' of handling u64 items on 32-bit.
It gets really horrid on 32-bit x86 (which has only 8 registers,
including %esp and %ebp).
I got the impression it sometimes treats a u64 as being two 32-bit
values, and other times as a 64-bit value held in two registers.
The former tends to generate better code, but the latter happens
if an asm() block (or probably anything else) ends up with an 'A'
constraint for a value in %edx:%eax.
It will spill constant zero words to stack, and do multiplies by
values that are constant zero.
(I think the code generated for a single call to mul_64_64()
will show it all.)
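
To illustrate the 'two 32-bit values' form: a minimal sketch of a
64x64->128 multiply where the halves are held in 32-bit variables, so
each partial product is a plain 32x32->64 multiply and there are no
zero high words for the compiler to carry around. This is not the
mul_64_64() from the ecc source; struct u128 and mul_64_64_sketch()
are names invented here, and whether a given gcc generates better
code for it is an assumption that needs checking against the
assembler output.

```c
#include <assert.h>
#include <stdint.h>

struct u128 {
	uint64_t lo;
	uint64_t hi;
};

static struct u128 mul_64_64_sketch(uint64_t left, uint64_t right)
{
	/* Split each operand into 32-bit halves held in 32-bit variables. */
	uint32_t a0 = (uint32_t)left, a1 = (uint32_t)(left >> 32);
	uint32_t b0 = (uint32_t)right, b1 = (uint32_t)(right >> 32);

	/* Four partial products, each a 32x32->64 multiply. */
	uint64_t m0 = (uint64_t)a0 * b0;
	uint64_t m1 = (uint64_t)a0 * b1;
	uint64_t m2 = (uint64_t)a1 * b0;
	uint64_t m3 = (uint64_t)a1 * b1;

	/* Sum of the column at bit 32; three values below 2^32 cannot overflow. */
	uint64_t mid = (m0 >> 32) + (uint32_t)m1 + (uint32_t)m2;
	struct u128 r;

	assert((mid >> 32) <= 2);

	r.lo = (mid << 32) | (uint32_t)m0;
	r.hi = m3 + (m1 >> 32) + (m2 >> 32) + (mid >> 32);
	return r;
}
```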

I've just looked at that source.
It seems to be doing 'very wide' arithmetic using u64[].
That will be really horrid on 32-bit - it needs to use u32[].

Stopping some of those functions from being inlined will help.
Even on 64-bit I doubt it'll make that much difference to
overall performance.
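
Roughly what I mean by using u32[]: a sketch of a schoolbook multiply
over little-endian 32-bit limbs, where the only wide value is one
64-bit accumulator per step. vli32_mult_sketch() is a hypothetical
name, the limb layout is an assumption, and this is untested against
the real ecc code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * result = left * right.
 * left and right have ndigits 32-bit limbs, least significant first;
 * result has 2 * ndigits limbs and must not alias either input.
 */
static void vli32_mult_sketch(uint32_t *result, const uint32_t *left,
			      const uint32_t *right, size_t ndigits)
{
	size_t i, j;

	assert(result != left && result != right);

	for (i = 0; i < 2 * ndigits; i++)
		result[i] = 0;

	for (i = 0; i < ndigits; i++) {
		uint64_t carry = 0;

		for (j = 0; j < ndigits; j++) {
			/*
			 * (2^32-1)^2 + 2*(2^32-1) == 2^64-1, so the sum
			 * always fits in 64 bits.
			 */
			uint64_t t = (uint64_t)left[i] * right[j] +
				     result[i + j] + carry;

			result[i + j] = (uint32_t)t;
			carry = t >> 32;
		}
		result[i + ndigits] = (uint32_t)carry;
	}
}
```

Every multiply there is 32x32->64, which a 32-bit target can do
without splitting anything itself.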
David
>
> Arnd
>