* [RFC] crypto: skcipher multi-data-unit requests for dm-crypt
@ 2026-04-27 9:56 Leonid Ravich
From: Leonid Ravich @ 2026-04-27 9:56 UTC
To: Herbert Xu, David S. Miller
Cc: Mike Snitzer, Mikulas Patocka, Alasdair Kergon, Ard Biesheuvel,
Eric Biggers, Jens Axboe, Horia Geanta, Gilad Ben-Yossef,
linux-crypto, dm-devel, linux-block
dm-crypt submits one skcipher request per sector. For XTS mode with
512-byte sectors, a large bio can contain hundreds of sectors, each
requiring a separate crypto_skcipher_encrypt() call with its own
request allocation, IV generation, and async callback.

On systems with asynchronous hardware crypto accelerators, the
actual encryption is fast and this per-request setup overhead
becomes the bottleneck. Reducing the number of crypto API calls
per bio significantly improves throughput for these configurations.

This problem was discussed in December 2016 by Binoy Jayan, Milan
Broz, and Herbert Xu:

  https://lkml.indiana.edu/hypermail/linux/kernel/1612.2/01912.html

Herbert suggested moving IV generation into the crypto API so that
dm-crypt could submit larger blocks. The essiv template (by Ard
Biesheuvel, merged in 5.4) was the first step. The multi-data-unit
request concept was never implemented.

Existing hardware support
=========================

Several upstream crypto drivers already have hardware support for
per-data-unit tweak management in XTS mode but cannot use it
because the crypto API has no way to communicate the data unit
size:

  NXP CAAM (drivers/crypto/caam/) has a hardware sector_size
  register in its XTS shared descriptor. The driver currently
  hardcodes it to 32768 bytes with this comment:

    "Set sector size to a big value, practically disabling
     sector size segmentation in xts implementation. We cannot
     take full advantage of this HW feature with existing
     crypto API / dm-crypt SW architecture."

  Arm CryptoCell (drivers/crypto/ccree/) programs the hardware
  data unit size via set_xex_data_unit_size() in its HW descriptor.
  The driver template structure already has a data_unit field,
  currently unused.

  HiSilicon SEC2 (drivers/crypto/hisilicon/sec2/) submits the full
  cryptlen to hardware with internal tweak management and SG DMA
  via hardware scatter-gather lists.

  Intel QAT (drivers/crypto/intel/qat/) submits full cryptlen to
  hardware XTS mode in one operation with SG buffer lists.

Proposal
========

Add a data_unit_size field to struct skcipher_request:

  struct skcipher_request {
          unsigned int cryptlen;
          u8 *iv;
          struct scatterlist *src;
          struct scatterlist *dst;
  +       unsigned int data_unit_size;
          struct crypto_async_request base;
          void *__ctx[] CRYPTO_MINALIGN_ATTR;
  };

When data_unit_size is 0, behavior is unchanged (cryptlen is one
data unit). When data_unit_size is nonzero, cryptlen must be a
multiple of data_unit_size. The IV applies to the first data unit.
The crypto driver is responsible for incrementing the tweak per
data unit according to the mode.

This mirrors the data_unit_size concept already present in struct
blk_crypto_config for inline encryption. In blk-crypto the size
is a property of the key configuration. Here it is per-request
because dm-crypt may use different sector sizes across different
device-mapper tables sharing the same tfm.
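
As a rough illustration of the request semantics described above,
here is a userspace sketch (not kernel code). struct mdu_request and
mdu_count_units() are made-up names for this example only; the real
check would live wherever the skcipher layer validates requests.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the two skcipher_request fields involved. */
struct mdu_request {
	unsigned int cryptlen;       /* total bytes to process */
	unsigned int data_unit_size; /* 0 = legacy: cryptlen is one data unit */
};

/*
 * Apply the rule from the proposal and report how many data units the
 * request covers. Returns 0 on success, -EINVAL if cryptlen is not a
 * multiple of a nonzero data_unit_size.
 */
static int mdu_count_units(const struct mdu_request *req, unsigned int *nunits)
{
	assert(req != NULL && nunits != NULL);

	if (req->data_unit_size == 0) {
		/* Unchanged behavior: the whole request is a single unit. */
		*nunits = 1;
		return 0;
	}
	if (req->cryptlen % req->data_unit_size != 0)
		return -EINVAL;

	*nunits = req->cryptlen / req->data_unit_size;
	return 0;
}
```

With cryptlen = 4096 and data_unit_size = 512 this reports 8 data
units; with data_unit_size = 0 the same request is treated as one
4096-byte unit, as today.
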

Required changes
================

1. crypto: skcipher - add data_unit_size to skcipher_request
   as described above. The skcipher layer validates that
   cryptlen is a multiple of data_unit_size before dispatching
   to the driver.

2. crypto: xts - handle data_unit_size > 0 in the generic
   software XTS template by looping internally per data unit.
   This provides a universal fallback so every xts(...)
   instantiation supports multi-data-unit. The tweak increment
   (gf128mul_x_ble) is already implemented in the template.
   Hardware drivers override this with native support.

3. crypto: testmgr - add multi-data-unit XTS test vectors
   that cross-validate against individual per-unit encryption.

4. crypto: drivers - CAAM, CryptoCell, and other drivers with
   hardware data-unit support can read req->data_unit_size and
   program their hardware registers accordingly. For CAAM this
   means setting the sector_size register to the actual value
   instead of the current 32768 workaround.

5. dm-crypt - build a multi-entry scatterlist from the bio's
   bio_vecs, generate the IV for the first sector, set
   data_unit_size to the sector size, and submit one request.
   The existing per-sector path remains for IV modes that
   require post-processing (lmk, tcw, elephant) and for AEAD
   integrity modes.
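
The shape of the software fallback (item 2) and of the cross-check
against per-unit processing (item 3) can be sketched in userspace as
follows. This is only an illustration under stated assumptions:
mdu_fallback(), unit_fn, toy_unit() and iv_next_plain64() are
hypothetical names, toy_unit() is a placeholder XOR transform and NOT
XTS, and the IV is assumed to be a plain64-style little-endian 64-bit
data unit number. How the tweak actually advances between data units
depends on the IV mode.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IV_LEN 16

/* Hypothetical per-unit primitive: processes exactly one data unit. */
typedef void (*unit_fn)(uint8_t *buf, size_t len, const uint8_t iv[IV_LEN]);

/* Assumption: bytes 0..7 of the IV hold a little-endian unit number. */
static void iv_next_plain64(uint8_t iv[IV_LEN])
{
	for (size_t i = 0; i < 8; i++)
		if (++iv[i] != 0)
			break; /* no carry into the next byte */
}

/* Toy transform, NOT a cipher: only makes the data flow visible. */
static void toy_unit(uint8_t *buf, size_t len, const uint8_t iv[IV_LEN])
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= (uint8_t)(iv[i % IV_LEN] ^ (uint8_t)i);
}

/* Loop per data unit; iv0 applies to the first unit only. */
static int mdu_fallback(uint8_t *buf, size_t cryptlen, size_t data_unit_size,
			const uint8_t iv0[IV_LEN], unit_fn fn)
{
	uint8_t iv[IV_LEN];

	assert(fn != NULL);

	if (data_unit_size == 0) {
		/* Legacy case: the whole buffer is one data unit. */
		fn(buf, cryptlen, iv0);
		return 0;
	}
	if (cryptlen % data_unit_size != 0)
		return -EINVAL;

	memcpy(iv, iv0, IV_LEN);
	for (size_t off = 0; off < cryptlen; off += data_unit_size) {
		fn(buf + off, data_unit_size, iv);
		iv_next_plain64(iv);
	}
	return 0;
}
```

A testmgr-style cross-check would run the same input once through
mdu_fallback() and once through one fn() call per data unit with
explicitly advanced IVs, then compare the two outputs byte for byte.
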
Thanks,
Leonid Ravich
* Re: [RFC] crypto: skcipher multi-data-unit requests for dm-crypt
@ 2026-04-27 11:28 ` Herbert Xu
From: Herbert Xu @ 2026-04-27 11:28 UTC
To: Leonid Ravich
Cc: David S. Miller, Mike Snitzer, Mikulas Patocka, Alasdair Kergon,
Ard Biesheuvel, Eric Biggers, Jens Axboe, Horia Geanta,
Gilad Ben-Yossef, linux-crypto, dm-devel, linux-block
On Mon, Apr 27, 2026 at 09:56:22AM +0000, Leonid Ravich wrote:
>
> Proposal
> ========
>
> Add a data_unit_size field to struct skcipher_request:
>
>   struct skcipher_request {
>           unsigned int cryptlen;
>           u8 *iv;
>           struct scatterlist *src;
>           struct scatterlist *dst;
>   +       unsigned int data_unit_size;
>           struct crypto_async_request base;
>           void *__ctx[] CRYPTO_MINALIGN_ATTR;
>   };
>
> When data_unit_size is 0, behavior is unchanged (cryptlen is one
> data unit). When data_unit_size is nonzero, cryptlen must be a
> multiple of data_unit_size. The IV applies to the first data unit.
> The crypto driver is responsible for incrementing the tweak per
> data unit according to the mode.
>
> This mirrors the data_unit_size concept already present in struct
> blk_crypto_config for inline encryption. In blk-crypto the size
> is a property of the key configuration. Here it is per-request
> because dm-crypt may use different sector sizes across different
> device-mapper tables sharing the same tfm.

Yes, I'm happy with this since it could also work for IPsec.

But before you invest too much energy in it, it would be helpful
if you can get some proof-of-concept performance numbers so that
your effort is not wasted down the track.

Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt