* Which, if any, of the async crypto drivers are ever useful in the real world?
@ 2026-05-18 10:11 Demi Marie Obenour
2026-05-19 20:36 ` Simon Richter
0 siblings, 1 reply; 3+ messages in thread
From: Demi Marie Obenour @ 2026-05-18 10:11 UTC (permalink / raw)
To: linux-crypto; +Cc: Eric Biggers, Herbert Xu
Looking at mailing list archives, I've yet to see real-world benchmarks
showing that the async crypto drivers actually help, outside of:
1. Hashing or encrypting a big file.
2. Situations where the kernel doesn't have the keys.
The first is not something I expect to be performance-critical.
The second is something I have yet to see outside of phmac on IBM
mainframes.
When WireGuard was being merged, there was a strong argument that
the Inside Secure driver (now Rambus) really does help bulk network
encryption. But I didn't see specific benchmarks, only rough numbers.
I plan to (someday) send a patch that makes drivers inaccessible via
AF_ALG. This means that the drivers will only be usable in-kernel.
And more and more in-kernel uses are moving to lib/crypto/ instead.
Is it really *always* better to do the cryptography inline or on the
CPU? If so, then the async drivers are pointless and can be removed.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
* Re: Which, if any, of the async crypto drivers are ever useful in the real world?
2026-05-18 10:11 Which, if any, of the async crypto drivers are ever useful in the real world? Demi Marie Obenour
@ 2026-05-19 20:36 ` Simon Richter
2026-05-19 21:07 ` Eric Biggers
0 siblings, 1 reply; 3+ messages in thread
From: Simon Richter @ 2026-05-19 20:36 UTC (permalink / raw)
To: Demi Marie Obenour, linux-crypto; +Cc: Eric Biggers, Herbert Xu
Hi,
On 5/18/26 19:11, Demi Marie Obenour wrote:
> Is it really *always* better to do the cryptography inline or on the
> CPU?
If there is an inline crypto engine, that is preferable, because we can
submit a single async request and have the hardware mediate the async
requests to the lower layers for us, reducing overhead.
The CPU is a good choice if there is some acceleration built into it
(like AES-NI or NEON), request sizes are small, there is no batching,
the CPU is otherwise idle and total throughput per stream is manageable
with a single core.
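
The request-size condition can be made concrete with a rough latency
model. This is only a sketch: the function names and all numbers are
hypothetical, and it assumes a fixed per-request submission overhead
plus a constant bandwidth for both the CPU and the engine. It looks at
per-request latency only and ignores batching, freed CPU time and
energy use.

```python
def cpu_time(nbytes, cpu_bw):
    """Seconds to process nbytes on the CPU at cpu_bw bytes/second."""
    return nbytes / cpu_bw


def offload_time(nbytes, engine_bw, overhead):
    """Seconds to process nbytes on an engine, including a fixed
    per-request overhead (submission, DMA setup, completion interrupt)."""
    return overhead + nbytes / engine_bw


def break_even_bytes(cpu_bw, engine_bw, overhead):
    """Request size at which both paths take equally long.

    Returns None if the engine is not faster than the CPU, because then
    offloading never wins on latency in this model.
    """
    if engine_bw <= cpu_bw:
        return None
    # overhead + n / engine_bw == n / cpu_bw, solved for n
    return overhead / (1.0 / cpu_bw - 1.0 / engine_bw)


if __name__ == "__main__":
    # Hypothetical figures: 20 MB/s CPU, 100 MB/s engine, 50 us overhead.
    n = break_even_bytes(20e6, 100e6, 50e-6)
    print(f"break-even request size: about {n:.0f} bytes")
```

With these made-up figures the break-even point is roughly 1250 bytes;
requests smaller than that finish sooner on the CPU.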
That's a lot of conditions, but they happen to be fulfilled in a typical
desktop PC use case, and usually there is no async offload option there
anyway, so we end up on a CPU.
For other platforms, it's different. For example, my always-on machine
is a passively cooled RISC-V board. Transferring data through an SSH
tunnel maxes out at about 20 MB/s. At that point the sshd process uses
100% CPU, and it only gets that much CPU time if nothing else is
running on the machine.
One of my plans for the next few weeks is to finally set up offloading
through AF_ALG to see if that improves things. I'm fairly confident it will.
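
For reference, reaching the kernel crypto API from userspace through
AF_ALG looks roughly like the sketch below. It uses Python's standard
socket module on Linux and a hash rather than a cipher to keep it
short. The helper name sha256_via_af_alg is made up for illustration.
Note that the kernel picks the highest-priority implementation
registered for "sha256", so whether an offload engine or the CPU ends
up doing the work depends on the system.

```python
import hashlib
import socket


def sha256_via_af_alg(data):
    """Hash data through the kernel crypto API via an AF_ALG socket.

    Returns the 32-byte digest, or None if AF_ALG or the algorithm is
    not available (non-Linux system, sandbox, kernel config).
    """
    if not hasattr(socket, "AF_ALG"):
        return None
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
            # Select the algorithm type and name.
            alg.bind(("hash", "sha256"))
            # accept() returns the operation socket used for the data.
            op, _ = alg.accept()
            with op:
                op.sendall(data)
                return op.recv(32)
    except OSError:
        return None


if __name__ == "__main__":
    digest = sha256_via_af_alg(b"abc")
    if digest is None:
        print("AF_ALG not available here")
    else:
        # Cross-check against the userspace implementation.
        print(digest == hashlib.sha256(b"abc").digest())
```

Every request here costs several system calls, which is part of why
small requests tend to be cheaper on the CPU.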
There is also a lot of potential for improvement inside the offload
stack. The dm-crypt batching changes should help quite a bit to keep
large requests together and have the hardware sequence them without help
from the CPU where possible.
fscrypt went the other direction, splitting requests from upper layers
into individual data objects, submitting each separately and waiting for
completion, which I can understand from a software complexity
perspective, but it maximizes overhead for offloading.
> If so, then the async drivers are pointless and can be removed.
In general, if an offload engine with an async driver exists, I would
expect that it provides a benefit over the CPU, in the worst case it
frees up a CPU core even if there is no significant performance
difference, and it uses less energy than a general-purpose core would.
What probably makes sense is changing the defaults. Desktops and servers
generally do not have offload hardware, and therefore do not require a
userspace interface to offload hardware either.
We can also expect people who have a need for offload hardware to do
some configuration work to enable it, and even more configuration work
if they need it to be accessible from userspace. The users are going to
be skilled sysadmins and system integrators anyway.
Simon
* Re: Which, if any, of the async crypto drivers are ever useful in the real world?
2026-05-19 20:36 ` Simon Richter
@ 2026-05-19 21:07 ` Eric Biggers
0 siblings, 0 replies; 3+ messages in thread
From: Eric Biggers @ 2026-05-19 21:07 UTC (permalink / raw)
To: Simon Richter; +Cc: Demi Marie Obenour, linux-crypto, Herbert Xu
On Wed, May 20, 2026 at 05:36:04AM +0900, Simon Richter wrote:
> Hi,
>
> On 5/18/26 19:11, Demi Marie Obenour wrote:
>
> > Is it really *always* better to do the cryptography inline or on the
> > CPU?
>
> If there is an inline crypto engine, that is preferable, because we can
> submit a single async request and have the hardware mediate the async
> requests to the lower layers for us, reducing overhead.
>
> The CPU is a good choice if there is some acceleration built into it (like
> AES-NI or NEON), request sizes are small, there is no batching, the CPU is
> otherwise idle and total throughput per stream is manageable with a single
> core.
>
> That's a lot of conditions, but they happen to be fulfilled in a typical
> desktop PC use case, and usually there is no async offload option there
> anyway, so we end up on a CPU.
CPU is often preferable even when those conditions aren't met.
It's really the other way around. There's a long list of things that
would have to go right for a standalone symmetric crypto engine to be
worthwhile.
Here are the results of some real world tests:
- https://lore.kernel.org/linux-crypto/20250615184638.GA1480@sol/
- https://lore.kernel.org/linux-crypto/20250616164752.GB1373@sol/
- https://lore.kernel.org/linux-fscrypt/20250704070322.20692-1-ebiggers@kernel.org/
> fscrypt went the other direction, splitting requests from upper layers into
> individual data objects, submitting each separately and waiting for
> completion, which I can understand from a software complexity perspective,
> but it maximizes overhead for offloading.
Most kernel code that uses cryptography is synchronous. So this is the
norm, not the exception. Using the async callbacks is the exception,
and history has shown that it's very hard to implement correctly: it
typically results in lots of bug fixes being needed. It's also very
common for the async drivers themselves to have bugs, so anyone
prioritizing correctness can't really use them anyway.
> In general, if an offload engine with an async driver exists, I would expect
> that it provides a benefit over the CPU, in the worst case it frees up a CPU
> core even if there is no significant performance difference, and it uses
> less energy than a general-purpose core would.
For standalone symmetric crypto engines, real-world tests show
otherwise.
- Eric