Linux cryptographic layer development
* Which, if any, of the async crypto drivers are ever useful in the real world?
From: Demi Marie Obenour @ 2026-05-18 10:11 UTC
  To: linux-crypto; +Cc: Eric Biggers, Herbert Xu

Looking at mailing list archives, I've yet to see real-world benchmarks
showing that the async crypto drivers actually help, outside of:

1. Hashing or encrypting a big file.
2. Situations where the kernel doesn't have the keys.

The first is not something I expect to be performance-critical.
The second is something I have yet to see outside of phmac on IBM
mainframes.

When WireGuard was being merged, there was a strong argument that
the Inside Secure driver (now Rambus) really does help bulk network
encryption.  But I didn't see specific benchmarks, only rough numbers.

I plan to (someday) send a patch that makes drivers inaccessible via
AF_ALG.  This means that the drivers will only be usable in-kernel.
And more and more in-kernel uses are moving to lib/crypto/ instead.

Is it really *always* better to do the cryptography inline or on the
CPU?  If so, then the async drivers are pointless and can be removed.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)

* Re: Which, if any, of the async crypto drivers are ever useful in the real world?
From: Simon Richter @ 2026-05-19 20:36 UTC
  To: Demi Marie Obenour, linux-crypto; +Cc: Eric Biggers, Herbert Xu

Hi,

On 5/18/26 19:11, Demi Marie Obenour wrote:

> Is it really *always* better to do the cryptography inline or on the
> CPU?

If there is an inline crypto engine, that is preferable, because we can 
submit a single async request and have the hardware mediate the async 
requests to the lower layers for us, reducing overhead.

The CPU is a good choice if there is some acceleration built into it 
(like AES-NI or NEON), request sizes are small, there is no batching, 
the CPU is otherwise idle and total throughput per stream is manageable 
with a single core.

That's a lot of conditions, but they happen to be fulfilled in a typical 
desktop PC use case, and usually there is no async offload option there 
anyway, so we end up on a CPU.

For other platforms, it's different. For example, my always-on machine 
is a passively cooled RISC-V board. Transferring data through an SSH 
tunnel maxes out at about 20 MB/s; at that point the sshd process uses 
100% CPU, and it only gets that much CPU time if nothing else is running 
on the machine.

One of my plans for the coming weeks is to finally set up offloading 
through AF_ALG to see if that improves things. I'm fairly confident it will.
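For readers unfamiliar with AF_ALG, a minimal sketch of what hashing through it looks like from userspace follows. It assumes a Linux kernel built with `CONFIG_CRYPTO_USER_API_HASH`; the function name `afalg_sha256` is hypothetical. Binding to the generic algorithm name "sha256" lets the kernel pick the highest-priority registered implementation, which may be a CPU-accelerated one or an async offload driver. Either way, each request costs at least a `write()` and a `read()` system call, and that is the per-request overhead an offload engine would have to amortize.

```c
#include <linux/if_alg.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef AF_ALG
#define AF_ALG 38
#endif

/* Hypothetical helper: hash `len` bytes of `data` with whichever "sha256"
 * implementation the kernel selects (highest priority; may be a CPU
 * implementation or an offload driver).
 * Returns 0 on success, -1 on any failure (e.g. AF_ALG or the algorithm
 * is not available). */
int afalg_sha256(const void *data, size_t len, unsigned char out[32])
{
    struct sockaddr_alg sa;
    memset(&sa, 0, sizeof(sa));
    sa.salg_family = AF_ALG;
    strcpy((char *)sa.salg_type, "hash");
    strcpy((char *)sa.salg_name, "sha256");

    /* Transform socket: selects the algorithm. */
    int tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
    if (tfm < 0)
        return -1;
    if (bind(tfm, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(tfm);
        return -1;
    }

    /* Operation socket: one per hashing request. */
    int op = accept(tfm, NULL, NULL);
    if (op < 0) {
        close(tfm);
        return -1;
    }

    int rc = -1;
    /* write() without MSG_MORE submits the final data; read() returns
     * the digest. Each is a separate syscall round trip. */
    if (write(op, data, len) == (ssize_t)len &&
        read(op, out, 32) == 32)
        rc = 0;

    close(op);
    close(tfm);
    return rc;
}
```

On a system where AF_ALG is available, calling it on the 3-byte input "abc" should produce the standard SHA-256 test vector ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad; where it is not, the function returns -1.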

There is also a lot of potential for improvement inside the offload 
stack. The dm-crypt batching changes should help quite a bit to keep 
large requests together and have the hardware sequence them without help 
from the CPU where possible.

fscrypt went the other direction, splitting requests from upper layers 
into individual data objects, submitting each separately and waiting for 
completion, which I can understand from a software complexity 
perspective, but it maximizes overhead for offloading.

> If so, then the async drivers are pointless and can be removed.

In general, if an offload engine with an async driver exists, I would 
expect it to provide a benefit over the CPU: in the worst case, it 
frees up a CPU core even if there is no significant performance 
difference, and it uses less energy than a general-purpose core would.

What probably makes sense is changing the defaults. Desktops and servers 
generally do not have offload hardware, and therefore do not require a 
userspace interface to offload hardware either.

We can also expect people who have a need for offload hardware to do 
some configuration work to enable it, and even more configuration work 
if they need it to be accessible from userspace. The users are going to 
be skilled sysadmins and system integrators anyway.

    Simon

* Re: Which, if any, of the async crypto drivers are ever useful in the real world?
From: Eric Biggers @ 2026-05-19 21:07 UTC
  To: Simon Richter; +Cc: Demi Marie Obenour, linux-crypto, Herbert Xu

On Wed, May 20, 2026 at 05:36:04AM +0900, Simon Richter wrote:
> Hi,
> 
> On 5/18/26 19:11, Demi Marie Obenour wrote:
> 
> > Is it really *always* better to do the cryptography inline or on the
> > CPU?
> 
> If there is an inline crypto engine, that is preferable, because we can
> submit a single async request and have the hardware mediate the async
> requests to the lower layers for us, reducing overhead.
> 
> The CPU is a good choice if there is some acceleration built into it (like
> AES-NI or NEON), request sizes are small, there is no batching, the CPU is
> otherwise idle and total throughput per stream is manageable with a single
> core.
> 
> That's a lot of conditions, but they happen to be fulfilled in a typical
> desktop PC use case, and usually there is no async offload option there
> anyway, so we end up on a CPU.

CPU is often preferable even when those conditions aren't met.

It's really the other way around.  There's a long list of things that
would have to go right for a standalone symmetric crypto engine to be
worthwhile.

Here are the results of some real world tests:

    - https://lore.kernel.org/linux-crypto/20250615184638.GA1480@sol/
    - https://lore.kernel.org/linux-crypto/20250616164752.GB1373@sol/
    - https://lore.kernel.org/linux-fscrypt/20250704070322.20692-1-ebiggers@kernel.org/

> fscrypt went the other direction, splitting requests from upper layers into
> individual data objects, submitting each separately and waiting for
> completion, which I can understand from a software complexity perspective,
> but it maximizes overhead for offloading.

Most kernel code that uses cryptography is synchronous.  So this is the
norm, not the exception.  Using the async callbacks is the exception,
and history has shown that it's very hard to implement correctly: it
typically results in lots of bug fixes being needed.  It's also very
common for the async drivers themselves to have bugs, so anyone
prioritizing correctness can't really use them anyway.
> In general, if an offload engine with an async driver exists, I would expect
> it to provide a benefit over the CPU: in the worst case, it frees up a CPU
> core even if there is no significant performance difference, and it uses
> less energy than a general-purpose core would.

For standalone symmetric crypto engines, real-world tests show
otherwise.

- Eric