* Which, if any, of the async crypto drivers are ever useful in the real world?
From: Demi Marie Obenour @ 2026-05-18 10:11 UTC
To: linux-crypto; +Cc: Eric Biggers, Herbert Xu

Looking at mailing list archives, I've yet to see real-world benchmarks
showing that the async crypto drivers actually help, outside of:

1. Hashing or encrypting a big file.
2. Situations where the kernel doesn't have the keys.

The first is not something I expect to be performance-critical.
The second is something I have yet to see outside of phmac on
IBM mainframes.

When WireGuard was being merged, there was a strong argument that
the Inside Secure driver (now Rambus) really does help bulk network
encryption. But I didn't see specific benchmarks, only rough numbers.

I plan to (someday) send a patch that makes drivers inaccessible via
AF_ALG. This means that the drivers will only be useable in-kernel.
And more and more in-kernel uses are moving to lib/crypto/ instead.

Is it really *always* better to do the cryptography inline or on the
CPU? If so, then the async drivers are pointless and can be removed.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
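For readers who have not used AF_ALG, the following is a minimal sketch of what userspace access to the kernel crypto API looks like, written with Python's `socket` module on Linux. It is an illustration only, not part of the original message: the helper names `sha256_via_af_alg` and `sha256` are hypothetical, and which implementation the kernel picks for "sha256" depends on the system (it may well be a plain CPU implementation rather than an offload driver). The sketch assumes small inputs that fit in a single send, and it falls back to `hashlib` when AF_ALG is unavailable.

```python
import hashlib
import socket
from typing import Tuple


def sha256_via_af_alg(data: bytes) -> bytes:
    """Hash a small buffer with the kernel's "sha256" through an AF_ALG socket.

    Assumption: `data` is non-empty and small enough to go out in one send,
    because a send without MSG_MORE finalizes the hash.
    """
    # socket.AF_ALG only exists on Linux builds of Python.
    with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
        # Select the algorithm type and name; the kernel chooses the driver.
        alg.bind(("hash", "sha256"))
        # accept() returns an operation socket used to feed data.
        op, _ = alg.accept()
        with op:
            op.sendall(data)
            return op.recv(32)  # SHA-256 digests are 32 bytes


def sha256(data: bytes) -> Tuple[bytes, str]:
    """Try AF_ALG first; fall back to hashlib if it is not available."""
    try:
        return sha256_via_af_alg(data), "af_alg"
    except (AttributeError, OSError):
        # No AF_ALG support in this Python build or kernel configuration.
        return hashlib.sha256(data).digest(), "hashlib"


if __name__ == "__main__":
    digest, backend = sha256(b"abc")
    print(backend, digest.hex())
```

If access to drivers through AF_ALG were restricted as proposed, a program like this would presumably see only whatever implementations remain exposed, or take the fallback path.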
* Re: Which, if any, of the async crypto drivers are ever useful in the real world?
From: Simon Richter @ 2026-05-19 20:36 UTC
To: Demi Marie Obenour, linux-crypto; +Cc: Eric Biggers, Herbert Xu

Hi,

On 5/18/26 19:11, Demi Marie Obenour wrote:

> Is it really *always* better to do the cryptography inline or on the
> CPU?

If there is an inline crypto engine, that is preferable, because we can
submit a single async request and have the hardware mediate the async
requests to the lower layers for us, reducing overhead.

The CPU is a good choice if there is some acceleration built into it (like
AES-NI or NEON), request sizes are small, there is no batching, the CPU is
otherwise idle and total throughput per stream is manageable with a single
core.

That's a lot of conditions, but they happen to be fulfilled in a typical
desktop PC use case, and usually there is no async offload option there
anyway, so we end up on a CPU.

For other platforms, it's different. For example, my always-on machine is
a passively cooled RISC-V board. Transferring data through an SSH tunnel
maxes out at about 20 MB/s. At that point the sshd process uses 100% CPU,
and it only gets that much CPU time if nothing else is running on the
machine.

One of my plans for the next weeks is to finally set up offloading through
AF_ALG to see if that improves things. I'm fairly confident it will.

There is also a lot of potential for improvement inside the offload stack.
The dm-crypt batching changes should help quite a bit to keep large
requests together and have the hardware sequence them without help from
the CPU where possible.

fscrypt went the other direction, splitting requests from upper layers into
individual data objects, submitting each separately and waiting for
completion, which I can understand from a software complexity perspective,
but it maximizes overhead for offloading.

> If so, then the async drivers are pointless and can be removed.

In general, if an offload engine with an async driver exists, I would expect
that it provides a benefit over the CPU, in the worst case it frees up a CPU
core even if there is no significant performance difference, and it uses
less energy than a general-purpose core would.

What probably makes sense is changing the defaults. Desktops and servers
generally do not have offload hardware, and therefore do not require a
userspace interface to offload hardware either.

We can also expect people who have a need for offload hardware to do some
configuration work to enable it, and even more configuration work if they
need it to be accessible from userspace. The users are going to be skilled
sysadmins and system integrators anyway.

   Simon
* Re: Which, if any, of the async crypto drivers are ever useful in the real world?
From: Eric Biggers @ 2026-05-19 21:07 UTC
To: Simon Richter; +Cc: Demi Marie Obenour, linux-crypto, Herbert Xu

On Wed, May 20, 2026 at 05:36:04AM +0900, Simon Richter wrote:
> Hi,
>
> On 5/18/26 19:11, Demi Marie Obenour wrote:
>
> > Is it really *always* better to do the cryptography inline or on the
> > CPU?
>
> If there is an inline crypto engine, that is preferable, because we can
> submit a single async request and have the hardware mediate the async
> requests to the lower layers for us, reducing overhead.
>
> The CPU is a good choice if there is some acceleration built into it (like
> AES-NI or NEON), request sizes are small, there is no batching, the CPU is
> otherwise idle and total throughput per stream is manageable with a single
> core.
>
> That's a lot of conditions, but they happen to be fulfilled in a typical
> desktop PC use case, and usually there is no async offload option there
> anyway, so we end up on a CPU.

CPU is often preferable even when those conditions aren't met.

It's really the other way around. There's a long list of things that
would have to go right for a standalone symmetric crypto engine to be
worthwhile.

Here are the results of some real world tests:

- https://lore.kernel.org/linux-crypto/20250615184638.GA1480@sol/
- https://lore.kernel.org/linux-crypto/20250616164752.GB1373@sol/
- https://lore.kernel.org/linux-fscrypt/20250704070322.20692-1-ebiggers@kernel.org/

> fscrypt went the other direction, splitting requests from upper layers into
> individual data objects, submitting each separately and waiting for
> completion, which I can understand from a software complexity perspective,
> but it maximizes overhead for offloading.

Most kernel code that uses cryptography is synchronous. So this is the
norm, not the exception.

Using the async callbacks is the exception, and history has shown that
it's very hard to implement correctly: it typically results in lots of
bug fixes being needed. It's also very common for the async drivers
themselves to have bugs, so anyone prioritizing correctness can't
really use them anyway.

> In general, if an offload engine with an async driver exists, I would expect
> that it provides a benefit over the CPU, in the worst case it frees up a CPU
> core even if there is no significant performance difference, and it uses
> less energy than a general-purpose core would.

For standalone symmetric crypto engines, real-world tests show otherwise.

- Eric
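The disagreement in this thread about request size, batching, and per-request overhead can be made concrete with a toy latency model. The sketch below is an editorial illustration, not something any participant posted: it assumes a fixed per-request offload overhead plus a constant throughput for each path, and every number in it is hypothetical rather than measured. Function and parameter names are invented for the example.

```python
import math


def request_time(size_bytes: float, bytes_per_s: float,
                 overhead_s: float = 0.0) -> float:
    """Time to process one request: fixed overhead plus size / throughput."""
    return overhead_s + size_bytes / bytes_per_s


def break_even_bytes(cpu_bps: float, engine_bps: float,
                     overhead_s: float, batch: int = 1) -> float:
    """Request size at which the offload path matches the CPU path.

    `batch` models amortizing the fixed overhead over several requests.
    Returns math.inf when the engine is not faster than the CPU, because
    then no request size ever makes up for the overhead.
    """
    if engine_bps <= cpu_bps:
        return math.inf
    per_request_overhead = overhead_s / batch
    return per_request_overhead / (1.0 / cpu_bps - 1.0 / engine_bps)


if __name__ == "__main__":
    # Hypothetical figures: 1 GB/s CPU, 2 GB/s engine, 20 us per submission.
    cpu, engine, overhead = 1e9, 2e9, 20e-6
    print("break-even, unbatched:", break_even_bytes(cpu, engine, overhead))
    print("break-even, batch of 8:", break_even_bytes(cpu, engine, overhead, 8))
    print("4 KiB on CPU:", request_time(4096, cpu))
    print("4 KiB offloaded:", request_time(4096, engine, overhead))
```

Under these made-up figures the unbatched break-even is roughly 40,000 bytes, so a 4 KiB request is faster on the CPU. The model ignores energy use, CPU contention, and driver correctness, all of which the thread raises, and only measurements such as those linked above can settle the question for a given platform.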