linux-mips.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Biggers <ebiggers@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-riscv@lists.infradead.org, sparclinux@vger.kernel.org,
	linux-s390@vger.kernel.org, x86@kernel.org,
	Ard Biesheuvel <ardb@kernel.org>,
	"Jason A . Donenfeld" <Jason@zx2c4.com>
Subject: Re: [PATCH v4 08/13] crypto: s390/sha256 - implement library instead of shash
Date: Thu, 29 May 2025 14:16:39 -0700	[thread overview]
Message-ID: <20250529211639.GD23614@sol> (raw)
In-Reply-To: <CAHk-=whCp-nMWyLxAot4e6yVMCGANTUCWErGfvmwqNkEfTQ=Sw@mail.gmail.com>

On Thu, May 29, 2025 at 01:14:31PM -0700, Linus Torvalds wrote:
> On Thu, 29 May 2025 at 10:37, Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > Long-term, I'd like to find a clean way to consolidate the library code for each
> > algorithm into a single module.
> 
> No, while I think the current situation isn't great, I think the "make
> it one single module" is even worse.
> 
> For most architectures - including s390 - you end up being in the
> situation that these kinds of hw accelerated crypto things depend on
> some CPU capability, and aren't necessarily statically always
> available.
> 
> So these things end up having stupid extra overhead due to having some
> conditional.
> 
> That extra overhead is then in turn minimized with tricks like static
> branches, but that's all just just piling more ugly hacks on top
> because it picked a bad choice to begin with.
> 
> So what's the *right* thing to do?
> 
> The right thing to do is to just link the right routine in the first
> place, and *not* have static branch hackery at all. Because you didn't
> need it.
> 
> And we already do runtime linking at module loading time. So if it's a
> module, if the hardware acceleration doesn't exist, the module load
> should just fail, and the loader should go on to load the next option.

So using crc32c() + ext4 + x86 as an example (but SHA-256 would be very
similar), the current behavior is that ext4.ko depends on the crc32c_arch()
symbol.  That causes crc32-x86.ko to be loaded, which then depends on the
crc32c_base() symbol as a fallback, which causes crc32.ko to be loaded too.  My
idea is to consolidate the two crc32 modules into one (they always go together,
after all), keeping the same symbols.  The main challenge is just the current
directory structure.

Your suggestion sounds like: ext4.ko would depend on the crc32c() symbol, which
would be defined in *both* crc32-x86.ko and crc32.ko.  The module loader would
try to load crc32-x86.ko first.  If the CPU does not support any of the x86
accelerated CRC32 code, then loading that module would fail.  The module loader
would then load crc32.ko instead.

Does any of the infrastructure to handle "this symbol is in multiple modules and
they must be loaded in this particular order" actually exist, though?

And how do we avoid the issues the crypto API often has where the accelerated
modules don't get loaded, causing slow generic code to unnecessarily be used?

IMO this sounds questionable compared to just using static keys and/or branches,
which we'd need anyway to support the non-modular case.

> Not any silly "one module to rule them all" hackery that only results
> in worse code. Just a simple "if this module loads successfully,
> you'll link the optimal hw acceleration".
> 
> Now, the problem with this all is the *non*modular case.
> 
> For modules, we already have the optimal solution in the form of
> init-module error handling and runtime linking.
> 
> So I think the module case is "solved" (except the solution is not
> what we actually do).
> 
> For the non-module case, the problem is that "I linked this
> unconditionally, and now it turns out I run on hardware that doesn't
> have the capability to run this".
> 
> And that's when you need to do things like static_call_update() to
> basically do runtime re-linking of a static decision.
> 
> And currently we very much do this wrong. See how s390 and x86-64 (and
> presumably others) basically have the *exact* same problems, but they
> then mix static branches and static calls (in the case of x86-64) and
> just have non-optimal code in general.
> 
> What I think the generic code should do (for the built-in case) is just have
> 
>         DEFINE_STATIC_CALL(sha256_blocks_fn, sha256_blocks_generic);
> 
> and do
> 
>         static_call(sha256_blocks_fn)(args..);
> 
> and then architecture code can do the static_call_update() to set
> their optimal version.
> 
> And yeah, we'd presumably need multiple versions, since there's the
> whole "is simd usable" thing. Although maybe that's going away?

Moving the static_call into the generic code might make sense.  I don't think
it's a win in all cases currently, though.  Only x86 and PPC32 actually have a
real static_call implementation; everywhere else it's an indirect call which is
slower than a static branch.  Also, some arch code is just usable
unconditionally without any CPU feature check, e.g. the MIPS ChaCha code.  That
doesn't use (or need to use) a static call or branch at all.

Also, while the centralized static_call would *allow* for the generic code to be
loaded while the arch code is not, in the vast majority of cases that would be a
bug, not a feature.  The generic crypto infrastructure has that bug, and this
has caused a huge amount of pain over the years.  People have to go out of the
way to ensure that the arch-optimized crypto code gets loaded.  And they often
forget, resulting in the slow generic code being used unnecessarily...

Making the arch-optimized code be loaded through a direct symbol dependency
solves that problem.

- Eric

  reply	other threads:[~2025-05-29 21:16 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-28 17:00 [PATCH v4 00/13] Architecture-optimized SHA-256 library API Eric Biggers
2025-04-28 17:00 ` [PATCH v4 01/13] crypto: sha256 - support arch-optimized lib and expose through shash Eric Biggers
2025-04-30  3:48   ` Herbert Xu
2025-04-28 17:00 ` [PATCH v4 02/13] crypto: arm/sha256 - implement library instead of shash Eric Biggers
2025-04-28 17:00 ` [PATCH v4 03/13] crypto: arm64/sha256 - remove obsolete chunking logic Eric Biggers
2025-04-28 17:00 ` [PATCH v4 04/13] crypto: arm64/sha256 - implement library instead of shash Eric Biggers
2025-04-28 17:00 ` [PATCH v4 05/13] crypto: mips/sha256 " Eric Biggers
2025-04-28 17:00 ` [PATCH v4 06/13] crypto: powerpc/sha256 " Eric Biggers
2025-04-28 17:00 ` [PATCH v4 07/13] crypto: riscv/sha256 " Eric Biggers
2025-05-08 17:45   ` Palmer Dabbelt
2025-05-08 18:06     ` Eric Biggers
2025-04-28 17:00 ` [PATCH v4 08/13] crypto: s390/sha256 " Eric Biggers
2025-05-29 17:05   ` Alex Williamson
2025-05-29 17:37     ` Eric Biggers
2025-05-29 19:00       ` Eric Biggers
2025-05-29 20:14       ` Linus Torvalds
2025-05-29 21:16         ` Eric Biggers [this message]
2025-05-29 23:54           ` Linus Torvalds
2025-05-30  0:18             ` Eric Biggers
2025-06-01 23:00               ` Eric Biggers
2025-06-02 14:45                 ` Linus Torvalds
2025-04-28 17:00 ` [PATCH v4 09/13] crypto: sparc - move opcodes.h into asm directory Eric Biggers
2025-04-28 17:00 ` [PATCH v4 10/13] crypto: sparc/sha256 - implement library instead of shash Eric Biggers
2025-04-28 17:00 ` [PATCH v4 11/13] crypto: x86/sha256 " Eric Biggers
2025-04-28 17:00 ` [PATCH v4 12/13] crypto: sha256 - remove sha256_base.h Eric Biggers
2025-04-28 17:00 ` [PATCH v4 13/13] crypto: lib/sha256 - improve function prototypes Eric Biggers
2025-05-05 12:24 ` [PATCH v4 00/13] Architecture-optimized SHA-256 library API Herbert Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250529211639.GD23614@sol \
    --to=ebiggers@kernel.org \
    --cc=Jason@zx2c4.com \
    --cc=alex.williamson@redhat.com \
    --cc=ardb@kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=sparclinux@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).