* [PATCH 0/4] crypto: skcipher - per-tfm multi-data-unit batching
From: Leonid Ravich @ 2026-05-19 11:59 UTC (permalink / raw)
To: Herbert Xu
Cc: David S . Miller, Mike Snitzer, Mikulas Patocka, Alasdair Kergon,
Ard Biesheuvel, Eric Biggers, Jens Axboe, Horia Geanta,
Gilad Ben-Yossef, linux-crypto, dm-devel, linux-block
This implements the multi-data-unit skcipher request flow proposed in
the RFC thread [1], following Herbert's ack of the IPsec-friendly
shape and the proof-of-concept performance numbers I posted in [2]
(+19% throughput / -40% CPU on a single-core arm64 system with a
hardware XTS-AES-256 accelerator running fio 4 KiB sequential writes
through dm-crypt).
The series adds a per-tfm "data unit size" to the skcipher API so a
caller can submit several data units in one crypto request, mirroring
the data_unit_size concept already exposed by struct blk_crypto_config
for inline encryption hardware.
The first user is dm-crypt, which today issues one skcipher request
per sector and so pays a per-sector cost in request allocation,
callback dispatch, completion handling, and scatterlist setup.
Allowing the cipher to consume a whole bio per request removes that
overhead. As shown in [2], the per-sector cost dominates the profile
(~25% of CPU cycles) on a hardware accelerator where AES rounds
themselves are nearly free.
[1] https://lore.kernel.org/linux-crypto/... (RFC: crypto: skcipher
multi-data-unit requests for dm-crypt)
[2] Message-Id: 20260428101225.24316-1-lravich@amazon.com
Design overview
---------------
* Patch 1 adds an `unsigned int data_unit_size` field to
`struct crypto_skcipher` (per-tfm: invariant for the consumer's
lifetime, set once via `crypto_skcipher_set_data_unit_size()`),
plus a capability flag CRYPTO_ALG_SKCIPHER_MULTI_DATA_UNIT in
`cra_flags` (type-specific high-byte range, mirroring the
CRYPTO_AHASH_ALG_BLOCK_ONLY precedent). `crypto_skcipher_encrypt()`
and `crypto_skcipher_decrypt()` validate that `cryptlen` is a
multiple of `data_unit_size` when it is non-zero. The setter rejects
values that are not a positive multiple of the block size; algorithm
registration rejects the flag for algorithms with `ivsize != 16`.
Also exposes `skcipher_walk_data_units()` in
<crypto/internal/skcipher.h> as a default per-DU dispatcher for
drivers that don't want to roll their own.
* Patch 2 lets the generic `xts(...)` template advertise the flag
when the inner cipher is synchronous. This is the in-tree
software producer of the new capability.
* Patch 3 extends `testmgr` with a self-comparison test that fires
automatically for every alg advertising the flag. The test
encrypts random plaintext two ways - one batched request vs N
back-to-back single-DU requests with derived IVs - and rejects
the algorithm if the ciphertexts differ.
* Patch 4 enables the multi-DU path in dm-crypt automatically when all
of the following hold at table load: skcipher (not aead),
`tfms_count == 1`, IV
mode is plain or plain64, no per-sector `iv_gen_ops->post()`, no
dm-integrity stacking, and the underlying cipher advertises the
capability. Heap-allocated scatterlists are stashed in
`dm_crypt_request` and freed in `crypt_free_req_skcipher()`,
initialised to NULL on every request alloc to keep the free path
safe on the per-sector code path that does not use them.
This series intentionally does NOT add the capability flag to any
arch crypto driver. Arch maintainers can opt in independently by
wrapping their xts(aes) entry points with skcipher_walk_data_units()
or, for hardware engines, by submitting one HW command for the whole
multi-DU request. The contract documented in
crypto_skcipher_set_data_unit_size() is the only obligation.
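The patch 1 checks described in this overview can be modelled in
userspace roughly as follows. This is only a sketch: `struct model_tfm`
and the `model_*` helpers are hypothetical stand-ins invented for
illustration, not kernel symbols. They mirror the validation order of
crypto_skcipher_set_data_unit_size() and the cryptlen check on the
encrypt/decrypt entry points.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the tfm state touched by patch 1. */
struct model_tfm {
	unsigned int blocksize;      /* cipher block size, e.g. 16 for AES */
	bool multi_du_capable;       /* models CRYPTO_ALG_SKCIPHER_MULTI_DATA_UNIT */
	unsigned int data_unit_size; /* 0 = single data unit per request */
};

/* Mirrors the check order of crypto_skcipher_set_data_unit_size(). */
static int model_set_data_unit_size(struct model_tfm *tfm, unsigned int du)
{
	if (!du) {
		/* Disabling is always allowed, even without the flag. */
		tfm->data_unit_size = 0;
		return 0;
	}
	if (!tfm->multi_du_capable)
		return -EOPNOTSUPP;
	if (du < tfm->blocksize || du % tfm->blocksize)
		return -EINVAL;
	tfm->data_unit_size = du;
	return 0;
}

/* Mirrors the cryptlen check performed before encrypt/decrypt dispatch. */
static int model_check_cryptlen(const struct model_tfm *tfm,
				unsigned int cryptlen)
{
	if (!tfm->data_unit_size)
		return 0;
	return (cryptlen % tfm->data_unit_size) ? -EINVAL : 0;
}
```

With a 16-byte block size and the capability set, a data unit size of
4096 is accepted and 24 is rejected with -EINVAL. A 16384-byte request
then passes the cryptlen check and a 6000-byte request does not.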
Why per-tfm and why cra_flags
-----------------------------
`data_unit_size` is invariant for the tfm's lifetime in every
plausible consumer. dm-crypt picks one sector size per mapped
target at table load. fscrypt would pick one per master key.
IPsec would pick one per SA. Putting the field on
`crypto_skcipher` (rather than on every `skcipher_request`) avoids
growing a hot per-request struct used by fscrypt, IPsec ESP,
AF_ALG, etc. It also lets the value be validated once, in
`crypto_skcipher_set_data_unit_size()`, and keeps the encrypt/decrypt
fast path single-branch (`likely(!data_unit_size)`).
The capability lives in `cra_flags` for consistency with existing
skcipher capabilities, so it surfaces in `/proc/crypto` and templates
can OR it into derived algorithms.
IV semantics
------------
The contract documented in `crypto_skcipher_set_data_unit_size()`:
the algorithm treats the caller-supplied IV as a 128-bit
little-endian counter and adds the data-unit index for each
subsequent data unit. This is what dm-crypt's plain and plain64
generators already produce, so no IV translation is needed at the
boundary. For modes that don't fit (essiv, lmk, tcw, eboiv,
plain64be, random, null, benbi, elephant) dm-crypt falls back to the
existing per-sector path.
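As an illustration of the IV convention above, the IV for data unit n
can be computed as the caller's IV plus n, with both treated as 128-bit
little-endian integers. The sketch below is a userspace model.
`load_le64()`, `store_le64()` and `derive_du_iv()` are hypothetical
helper names, not functions from this series.

```c
#include <stdint.h>

/* Read a 64-bit little-endian value, independent of host endianness. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Write a 64-bit value in little-endian byte order. */
static void store_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++) {
		p[i] = (uint8_t)v;
		v >>= 8;
	}
}

/*
 * out = base + index, where base and out are 16-byte IVs interpreted as
 * 128-bit little-endian integers (hypothetical helper).
 */
static void derive_du_iv(uint8_t out[16], const uint8_t base[16],
			 uint64_t index)
{
	uint64_t lo = load_le64(base);
	uint64_t hi = load_le64(base + 8);
	uint64_t sum = lo + index;

	if (sum < lo)	/* unsigned wrap: carry into the high half */
		hi++;
	store_le64(out, sum);
	store_le64(out + 8, hi);
}
```

For a plain64 IV holding sector 100, data unit 3 of the same request
gets an IV whose low byte is 103. If the low 64 bits are all ones,
adding 1 clears them and carries into byte 8.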
Verification
------------
* checkpatch.pl --strict: clean on all 4 patches.
* Builds clean on x86_64 and arm64.
* QEMU boots; existing xts-aes-aesni / xts-aes-ce / xts-aes-neon
crypto self-tests pass.
* In-kernel testmgr self-comparison passes for any algorithm
advertising the flag.
* dm-crypt round-trip with plain64: encrypt+decrypt produces correct
data through both the existing per-sector path and the multi-DU
path (the latter exercised against an out-of-tree arm64 / x86 xts
enablement during development).
* dm-crypt activation gating: plain -> enabled, plain64 -> enabled,
essiv:sha256 -> fallback (correctly rejected), plain64be ->
fallback.
* Byte-equivalence: 256 MB of ciphertext written through the
multi-DU path is bit-identical to ciphertext written through the
per-sector path (sha256
4913910b1aa6f8859fcb8f4adec20230274993a3ade8f4dd0140a323dc43efc0
on plain64+xts-aes). The on-disk format is unchanged.
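The activation gating exercised above can be expressed as one
predicate over the conditions listed for patch 4. The sketch below is a
userspace model under stated assumptions: `struct gate_inputs` and
`multi_du_allowed()` are hypothetical names chosen for illustration and
do not reflect how dm-crypt stores this state.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical snapshot of what dm-crypt knows at table load. */
struct gate_inputs {
	bool is_aead;            /* aead cipher rather than skcipher */
	unsigned int tfms_count; /* number of tfms (keys) in use */
	const char *iv_mode;     /* e.g. "plain64", "essiv" */
	bool has_iv_post_hook;   /* iv_gen_ops->post() is set */
	bool integrity_stacked;  /* dm-integrity tag or integrity IV in use */
	bool cipher_multi_du;    /* cipher advertises the capability flag */
};

/* Returns true only when every listed condition holds. */
static bool multi_du_allowed(const struct gate_inputs *in)
{
	if (in->is_aead || in->tfms_count != 1)
		return false;
	if (strcmp(in->iv_mode, "plain") != 0 &&
	    strcmp(in->iv_mode, "plain64") != 0)
		return false;
	if (in->has_iv_post_hook || in->integrity_stacked)
		return false;
	return in->cipher_multi_du;
}
```

Under this model plain and plain64 enable batching when the cipher
advertises the flag, while essiv and plain64be fall back to the
per-sector path regardless of cipher capability.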
Leonid Ravich (4):
crypto: skcipher - add per-tfm data_unit_size for batched requests
crypto: xts - support multiple data units per request in template
crypto: testmgr - exercise multi-data-unit path for skcipher
dm crypt: batch all sectors of a bio per crypto request
crypto/skcipher.c | 120 ++++++++++++++
crypto/testmgr.c | 129 +++++++++++++++
crypto/xts.c | 25 ++-
drivers/md/dm-crypt.c | 248 ++++++++++++++++++++++++++++-
include/crypto/internal/skcipher.h | 34 ++++
include/crypto/skcipher.h | 85 ++++++++++
6 files changed, 632 insertions(+), 9 deletions(-)
--
2.47.3
* [PATCH 1/4] crypto: skcipher - add per-tfm data_unit_size for batched requests
From: Leonid Ravich @ 2026-05-19 11:59 UTC (permalink / raw)
To: Herbert Xu
Cc: David S . Miller, Mike Snitzer, Mikulas Patocka, Alasdair Kergon,
Ard Biesheuvel, Eric Biggers, Jens Axboe, Horia Geanta,
Gilad Ben-Yossef, linux-crypto, dm-devel, linux-block
Add a per-tfm data_unit_size and an algorithm capability flag that
together allow a caller to submit several data units in a single
skcipher request. The IV passed in the request applies to the first
data unit; for each subsequent data unit the algorithm derives the IV
by treating the caller-supplied IV as a 128-bit little-endian counter
and adding the data-unit index.
This mirrors the data_unit_size concept already exposed by
struct blk_crypto_config for inline encryption hardware, but at the
software skcipher layer. The first user is dm-crypt, which today
issues one request per sector and so pays a per-sector cost in
request allocation, IV generation, callback dispatch, and completion
handling. Allowing the cipher to consume a whole bio per request
removes that overhead for drivers that can chain across data units
internally.
The data_unit_size lives on struct crypto_skcipher rather than on
struct skcipher_request because it does not change between requests
for any plausible consumer: dm-crypt picks one sector size per
mapped target at table load time; fscrypt would pick one per master
key. Anchoring it to the tfm also lets it be validated once, in
crypto_skcipher_set_data_unit_size(), and avoids per-request
initialisation hazards on mempool-recycled requests.
Capability is advertised with CRYPTO_ALG_SKCIPHER_MULTI_DATA_UNIT
in cra_flags (type-specific high-byte range, mirroring the
CRYPTO_AHASH_ALG_* convention). This makes the capability visible
in /proc/crypto and lets templates OR it into their derived
algorithms.
crypto_skcipher_set_data_unit_size() returns -EOPNOTSUPP if the
algorithm does not advertise the flag, and accepts 0 (the default)
unconditionally so callers can re-disable batching cheaply.
crypto_skcipher_encrypt()/decrypt() reject requests whose cryptlen
is not a multiple of the configured data_unit_size with -EINVAL.
The check is gated on data_unit_size != 0 so it costs nothing for
the common single-data-unit case.
No in-tree algorithm advertises the flag yet; subsequent patches
add the generic xts() template producer, a testmgr self-test, and
the dm-crypt consumer.
Signed-off-by: Leonid Ravich <lravich@amazon.com>
---
crypto/skcipher.c | 120 +++++++++++++++++++++++++++++
include/crypto/internal/skcipher.h | 34 ++++++++
include/crypto/skcipher.h | 85 ++++++++++++++++++++
3 files changed, 239 insertions(+)
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 8fa5d9686d08..9155a4d9ea6d 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -183,13 +183,119 @@ int crypto_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
}
EXPORT_SYMBOL_GPL(crypto_skcipher_setkey);
+int crypto_skcipher_set_data_unit_size(struct crypto_skcipher *tfm,
+ unsigned int data_unit_size)
+{
+ unsigned int blocksize;
+
+ if (!data_unit_size) {
+ tfm->data_unit_size = 0;
+ return 0;
+ }
+
+ if (!crypto_skcipher_supports_multi_data_unit(tfm))
+ return -EOPNOTSUPP;
+
+ blocksize = crypto_skcipher_blocksize(tfm);
+ if (data_unit_size < blocksize || data_unit_size % blocksize)
+ return -EINVAL;
+
+ tfm->data_unit_size = data_unit_size;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(crypto_skcipher_set_data_unit_size);
+
+static int crypto_skcipher_check_data_unit_size(struct crypto_skcipher *tfm,
+ struct skcipher_request *req)
+{
+ unsigned int du = tfm->data_unit_size;
+
+ if (likely(!du))
+ return 0;
+ if (req->cryptlen % du)
+ return -EINVAL;
+ return 0;
+}
+
+/*
+ * Increment a 16-byte little-endian counter held in @iv. See
+ * crypto_skcipher_set_data_unit_size() for the convention.
+ */
+static inline void skcipher_iv_inc_le128(u8 *iv)
+{
+ __le64 lo_le, hi_le;
+ u64 lo;
+
+ memcpy(&lo_le, iv, 8);
+ memcpy(&hi_le, iv + 8, 8);
+ lo = le64_to_cpu(lo_le) + 1;
+ lo_le = cpu_to_le64(lo);
+ memcpy(iv, &lo_le, 8);
+ if (unlikely(lo == 0)) {
+ hi_le = cpu_to_le64(le64_to_cpu(hi_le) + 1);
+ memcpy(iv + 8, &hi_le, 8);
+ }
+}
+
+int skcipher_walk_data_units(struct skcipher_request *req,
+ int (*body)(struct skcipher_request *))
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const unsigned int du = tfm->data_unit_size;
+ const unsigned int total = req->cryptlen;
+ struct scatterlist *orig_src = req->src;
+ struct scatterlist *orig_dst = req->dst;
+ struct scatterlist src_sg[2], dst_sg[2];
+ u8 iv_save[16];
+ unsigned int off;
+ int err = 0;
+
+ if (likely(!du))
+ return body(req);
+
+ /*
+ * Registration of an algorithm advertising
+ * CRYPTO_ALG_SKCIPHER_MULTI_DATA_UNIT enforces ivsize == 16
+ * (see skcipher_prepare_alg_common()), so this is purely
+ * defensive against algorithm-registration bugs.
+ */
+ if (WARN_ON_ONCE(crypto_skcipher_ivsize(tfm) != 16))
+ return -EINVAL;
+
+ memcpy(iv_save, req->iv, 16);
+
+ for (off = 0; off < total; off += du) {
+ req->cryptlen = du;
+ req->src = scatterwalk_ffwd(src_sg, orig_src, off);
+ req->dst = (orig_src == orig_dst) ? req->src :
+ scatterwalk_ffwd(dst_sg, orig_dst, off);
+
+ err = body(req);
+ if (err)
+ break;
+
+ skcipher_iv_inc_le128(iv_save);
+ memcpy(req->iv, iv_save, 16);
+ }
+
+ req->src = orig_src;
+ req->dst = orig_dst;
+ req->cryptlen = total;
+ return err;
+}
+EXPORT_SYMBOL_GPL(skcipher_walk_data_units);
+
int crypto_skcipher_encrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct skcipher_alg *alg = crypto_skcipher_alg(tfm);
+ int err;
if (crypto_skcipher_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
return -ENOKEY;
+ err = crypto_skcipher_check_data_unit_size(tfm, req);
+ if (err)
+ return err;
if (alg->co.base.cra_type != &crypto_skcipher_type)
return crypto_lskcipher_encrypt_sg(req);
return alg->encrypt(req);
@@ -200,9 +306,13 @@ int crypto_skcipher_decrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct skcipher_alg *alg = crypto_skcipher_alg(tfm);
+ int err;
if (crypto_skcipher_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
return -ENOKEY;
+ err = crypto_skcipher_check_data_unit_size(tfm, req);
+ if (err)
+ return err;
if (alg->co.base.cra_type != &crypto_skcipher_type)
return crypto_lskcipher_decrypt_sg(req);
return alg->decrypt(req);
@@ -432,6 +542,16 @@ int skcipher_prepare_alg_common(struct skcipher_alg_common *alg)
(alg->ivsize + alg->statesize) > PAGE_SIZE / 2)
return -EINVAL;
+ /*
+ * Algorithms advertising multi-data-unit support must use the
+ * 16-byte little-endian counter convention documented in
+ * crypto_skcipher_set_data_unit_size(); see also
+ * skcipher_walk_data_units().
+ */
+ if ((base->cra_flags & CRYPTO_ALG_SKCIPHER_MULTI_DATA_UNIT) &&
+ alg->ivsize != 16)
+ return -EINVAL;
+
if (!alg->chunksize)
alg->chunksize = base->cra_blocksize;
diff --git a/include/crypto/internal/skcipher.h b/include/crypto/internal/skcipher.h
index d5aa535263f6..bfabc97f34ef 100644
--- a/include/crypto/internal/skcipher.h
+++ b/include/crypto/internal/skcipher.h
@@ -22,6 +22,40 @@
*/
#define CRYPTO_ALG_SKCIPHER_REQSIZE_LARGE CRYPTO_ALG_OPTIONAL_KEY
+/**
+ * skcipher_walk_data_units - dispatch a request as one body call per data unit
+ * @req: the caller's skcipher request
+ * @body: the algorithm's single-data-unit encrypt or decrypt function
+ *
+ * When tfm->data_unit_size is zero this is a tail call into @body with
+ * @req unchanged. Otherwise the request is split into
+ * cryptlen / data_unit_size sub-ranges and @body is called once per
+ * sub-range with req->cryptlen, req->src, req->dst, and req->iv adjusted
+ * for that sub-range. The IV passed to data unit n is the caller-
+ * supplied IV plus n, where + is a 128-bit little-endian add — this
+ * matches the convention documented in
+ * crypto_skcipher_set_data_unit_size().
+ *
+ * Many single-data-unit XTS bodies modify the IV buffer in place during
+ * processing (the tweak is walked block by block). This helper keeps a
+ * private copy of the IV, increments it after each call, and writes it
+ * back to req->iv, so the body always sees a fresh per-DU IV
+ * regardless of any in-place mutation it performs.
+ *
+ * The body MUST run to completion synchronously. Drivers that use this
+ * helper therefore advertise CRYPTO_ALG_SKCIPHER_MULTI_DATA_UNIT only
+ * for synchronous configurations.
+ *
+ * After the call returns, the contents of req->iv are unspecified per
+ * the documented contract. src/dst/cryptlen are restored to the
+ * caller's values to keep skcipher request post-conditions intact.
+ *
+ * Return: 0 on success, or the body's negative errno on the first
+ * data unit that returned non-zero.
+ */
+int skcipher_walk_data_units(struct skcipher_request *req,
+ int (*body)(struct skcipher_request *));
+
struct aead_request;
struct rtattr;
diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
index 9e5853464345..c4112c57f6a2 100644
--- a/include/crypto/skcipher.h
+++ b/include/crypto/skcipher.h
@@ -26,6 +26,15 @@
/* Set this bit if the skcipher operation is not final. */
#define CRYPTO_SKCIPHER_REQ_NOTFINAL 0x00000002
+/*
+ * Set in cra_flags by an skcipher algorithm that supports processing
+ * multiple data units in a single request. See
+ * crypto_skcipher_set_data_unit_size().
+ *
+ * Type-specific flag in the 0xff000000 reserved range.
+ */
+#define CRYPTO_ALG_SKCIPHER_MULTI_DATA_UNIT 0x01000000
+
struct scatterlist;
/**
@@ -53,6 +62,22 @@ struct skcipher_request {
struct crypto_skcipher {
unsigned int reqsize;
+ /*
+ * Number of bytes in one data unit when batching multiple data units
+ * per request. 0 means "single data unit per request" (legacy
+ * behaviour). Set via crypto_skcipher_set_data_unit_size().
+ *
+ * When non-zero, cryptlen must be a multiple of data_unit_size. The
+ * IV passed in skcipher_request::iv applies to the first data unit;
+ * for each subsequent data unit the algorithm treats that IV as a
+ * 128-bit little-endian counter and adds the data-unit index (see
+ * crypto_skcipher_set_data_unit_size()).
+ *
+ * Only algorithms that advertise CRYPTO_ALG_SKCIPHER_MULTI_DATA_UNIT
+ * in cra_flags accept a non-zero value.
+ */
+ unsigned int data_unit_size;
+
struct crypto_tfm base;
};
@@ -491,6 +516,66 @@ static inline unsigned int crypto_lskcipher_chunksize(
return crypto_lskcipher_alg(tfm)->co.chunksize;
}
+/**
+ * crypto_skcipher_supports_multi_data_unit() - test multi-data-unit support
+ * @tfm: cipher handle
+ *
+ * Return: true if the algorithm advertises that it can process multiple
+ * data units in a single skcipher_request.
+ */
+static inline bool
+crypto_skcipher_supports_multi_data_unit(struct crypto_skcipher *tfm)
+{
+ return crypto_skcipher_alg_common(tfm)->base.cra_flags &
+ CRYPTO_ALG_SKCIPHER_MULTI_DATA_UNIT;
+}
+
+/**
+ * crypto_skcipher_set_data_unit_size() - set data unit size for the tfm
+ * @tfm: cipher handle
+ * @data_unit_size: data unit size in bytes; 0 disables multi-data-unit mode
+ *
+ * Configure the tfm to process multiple data units per request. When set
+ * to a non-zero value, every subsequent encrypt/decrypt request must have
+ * cryptlen that is a multiple of @data_unit_size. Each data unit is
+ * processed as if it were a separate request whose IV is derived from the
+ * preceding data unit's IV by the algorithm-specific tweak update rule:
+ * the implementation treats the caller-supplied IV as a 128-bit
+ * little-endian counter and adds the data-unit index for each subsequent
+ * data unit.
+ *
+ * The contents of req->iv after a multi-data-unit request returns are
+ * unspecified — callers MUST NOT rely on it being either the original
+ * value or the final-data-unit value. Set a fresh IV before every
+ * request.
+ *
+ * The algorithm must advertise CRYPTO_ALG_SKCIPHER_MULTI_DATA_UNIT in its
+ * cra_flags. @data_unit_size must be a positive multiple of the
+ * algorithm's cra_blocksize, otherwise -EINVAL is returned.
+ *
+ * Setting @data_unit_size to 0 reverts the tfm to single-data-unit
+ * behaviour and is always permitted.
+ *
+ * Return: 0 on success; -EOPNOTSUPP if the algorithm does not advertise
+ * multi-data-unit support; -EINVAL if @data_unit_size is not a
+ * positive multiple of the cipher block size.
+ */
+int crypto_skcipher_set_data_unit_size(struct crypto_skcipher *tfm,
+ unsigned int data_unit_size);
+
+/**
+ * crypto_skcipher_data_unit_size() - obtain data unit size
+ * @tfm: cipher handle
+ *
+ * Return: configured data unit size in bytes; 0 if multi-data-unit mode
+ * is disabled.
+ */
+static inline unsigned int
+crypto_skcipher_data_unit_size(struct crypto_skcipher *tfm)
+{
+ return tfm->data_unit_size;
+}
+
/**
* crypto_skcipher_statesize() - obtain state size
* @tfm: cipher handle
--
2.47.3
* [PATCH 2/4] crypto: xts - support multiple data units per request in template
From: Leonid Ravich @ 2026-05-19 11:59 UTC (permalink / raw)
To: Herbert Xu
Cc: David S . Miller, Mike Snitzer, Mikulas Patocka, Alasdair Kergon,
Ard Biesheuvel, Eric Biggers, Jens Axboe, Horia Geanta,
Gilad Ben-Yossef, linux-crypto, dm-devel, linux-block
Teach the generic xts() template to consume cryptlen larger than one
data unit when the caller has configured a non-zero data_unit_size on
the tfm. Each data unit is processed with its own IV, derived from
the caller-supplied IV by treating it as a 128-bit little-endian
counter and adding the data-unit index. This matches the
sector-indexed XTS used by dm-crypt's plain64 IV mode and by typical
inline-encryption hardware.
The single-data-unit body is unchanged and is now reached via the
skcipher_walk_data_units() dispatcher, which calls straight into the
body when data_unit_size is zero (the legacy default), so existing
users see no extra cost.
Advertise CRYPTO_ALG_SKCIPHER_MULTI_DATA_UNIT in cra_flags only when
the inner cipher is synchronous. An async inner cipher would require
a per-DU completion chain which is out of scope for the slow software
template; consumers that need multi-DU on async hardware will need a
driver that implements it natively, and this series adds none.
Signed-off-by: Leonid Ravich <lravich@amazon.com>
---
crypto/xts.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/crypto/xts.c b/crypto/xts.c
index 3da8f5e053d6..2b7233311dad 100644
--- a/crypto/xts.c
+++ b/crypto/xts.c
@@ -258,7 +258,7 @@ static int xts_init_crypt(struct skcipher_request *req,
return 0;
}
-static int xts_encrypt(struct skcipher_request *req)
+static int xts_encrypt_one(struct skcipher_request *req)
{
struct xts_request_ctx *rctx = skcipher_request_ctx(req);
struct skcipher_request *subreq = &rctx->subreq;
@@ -275,7 +275,7 @@ static int xts_encrypt(struct skcipher_request *req)
return xts_cts_final(req, crypto_skcipher_encrypt);
}
-static int xts_decrypt(struct skcipher_request *req)
+static int xts_decrypt_one(struct skcipher_request *req)
{
struct xts_request_ctx *rctx = skcipher_request_ctx(req);
struct skcipher_request *subreq = &rctx->subreq;
@@ -292,6 +292,16 @@ static int xts_decrypt(struct skcipher_request *req)
return xts_cts_final(req, crypto_skcipher_decrypt);
}
+static int xts_encrypt(struct skcipher_request *req)
+{
+ return skcipher_walk_data_units(req, xts_encrypt_one);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+ return skcipher_walk_data_units(req, xts_decrypt_one);
+}
+
static int xts_init_tfm(struct crypto_skcipher *tfm)
{
struct skcipher_instance *inst = skcipher_alg_instance(tfm);
@@ -427,6 +437,17 @@ static int xts_create(struct crypto_template *tmpl, struct rtattr **tb)
inst->alg.base.cra_alignmask = alg->base.cra_alignmask |
(__alignof__(u64) - 1);
+ /*
+ * Advertise multi-data-unit support only when the inner cipher is
+ * synchronous. The dispatcher in skcipher_walk_data_units() calls
+ * the single-DU body in a loop and assumes synchronous completion;
+ * supporting async would require a per-DU callback chain, which
+ * the slow software template does not need.
+ */
+ if (!(alg->base.cra_flags & CRYPTO_ALG_ASYNC))
+ inst->alg.base.cra_flags |=
+ CRYPTO_ALG_SKCIPHER_MULTI_DATA_UNIT;
+
inst->alg.ivsize = XTS_BLOCK_SIZE;
inst->alg.min_keysize = alg->min_keysize * 2;
inst->alg.max_keysize = alg->max_keysize * 2;
--
2.47.3
* [PATCH 3/4] crypto: testmgr - exercise multi-data-unit path for skcipher
From: Leonid Ravich @ 2026-05-19 11:59 UTC (permalink / raw)
To: Herbert Xu
Cc: David S . Miller, Mike Snitzer, Mikulas Patocka, Alasdair Kergon,
Ard Biesheuvel, Eric Biggers, Jens Axboe, Horia Geanta,
Gilad Ben-Yossef, linux-crypto, dm-devel, linux-block
Add a self-comparison test that runs whenever an skcipher algorithm
advertises CRYPTO_ALG_SKCIPHER_MULTI_DATA_UNIT in cra_flags. The test
encrypts the same random plaintext two ways:
1. as one batched request with data_unit_size set, and
2. as N back-to-back single-data-unit requests with IVs derived from
the original IV by adding the data-unit index (treated as a
128-bit little-endian counter, matching the convention documented
in crypto_skcipher_set_data_unit_size()).
Both passes must produce byte-identical ciphertext; otherwise the
algorithm's multi-DU implementation is inconsistent with its single-DU
behaviour. The test iterates over a fixed set of typical data unit
sizes (512, 1024, 2048, 4096), which cover the dm-crypt sector-size
range. The test is gated on ivsize == 16 (XTS, the only multi-DU
producer in the kernel today) and on the algorithm advertising the
capability, so it costs nothing for the existing fleet of skcipher
drivers.
Signed-off-by: Leonid Ravich <lravich@amazon.com>
---
crypto/testmgr.c | 129 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 129 insertions(+)
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 6a490aaa71b9..45cc7acc85ee 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3217,6 +3217,123 @@ static int test_skcipher(int enc, const struct cipher_test_suite *suite,
return 0;
}
+/*
+ * For algorithms that advertise CRYPTO_ALG_SKCIPHER_MULTI_DATA_UNIT,
+ * verify that one request batching N data units produces the same
+ * ciphertext as N back-to-back single-data-unit requests with IVs
+ * derived from the original IV by adding the data-unit index (treated
+ * as a 128-bit little-endian counter).
+ *
+ * This is a self-comparison: it does not depend on test-vector
+ * authoritativeness, only on the algorithm being internally consistent
+ * between its single-DU and multi-DU paths.
+ */
+#define TEST_MDU_NR_UNITS 4
+static int test_skcipher_multi_du(struct crypto_skcipher *tfm,
+ unsigned int du_size)
+{
+ const char *driver = crypto_skcipher_driver_name(tfm);
+ const unsigned int ivsize = crypto_skcipher_ivsize(tfm);
+ const unsigned int total = du_size * TEST_MDU_NR_UNITS;
+ struct skcipher_request *req = NULL;
+ struct scatterlist sg_in, sg_out;
+ DECLARE_CRYPTO_WAIT(wait);
+ u8 iv_orig[16] = {0};
+ u8 iv_work[16];
+ u8 *plain = NULL, *batched = NULL, *unit = NULL;
+ unsigned int i;
+ int err;
+
+ if (ivsize != 16)
+ return 0;
+
+ plain = kmalloc(total, GFP_KERNEL);
+ batched = kmalloc(total, GFP_KERNEL);
+ unit = kmalloc(total, GFP_KERNEL);
+ req = skcipher_request_alloc(tfm, GFP_KERNEL);
+ if (!plain || !batched || !unit || !req) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ get_random_bytes(plain, total);
+ get_random_bytes(iv_orig, ivsize);
+
+ /* Pass 1: one batched encrypt with data_unit_size set. */
+ err = crypto_skcipher_set_data_unit_size(tfm, du_size);
+ if (err) {
+ pr_err("alg: skcipher: %s set_data_unit_size(%u) failed: %d\n",
+ driver, du_size, err);
+ goto out;
+ }
+ memcpy(batched, plain, total);
+ memcpy(iv_work, iv_orig, ivsize);
+ sg_init_one(&sg_in, batched, total);
+ sg_out = sg_in;
+ skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
+ CRYPTO_TFM_REQ_MAY_SLEEP,
+ crypto_req_done, &wait);
+ skcipher_request_set_crypt(req, &sg_in, &sg_out, total, iv_work);
+ err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
+ if (err) {
+ pr_err("alg: skcipher: %s multi-DU batched encrypt failed: %d\n",
+ driver, err);
+ goto out_clear_du;
+ }
+
+ /* Pass 2: TEST_MDU_NR_UNITS single-DU encrypts with derived IVs. */
+ err = crypto_skcipher_set_data_unit_size(tfm, 0);
+ if (err)
+ goto out;
+ memcpy(unit, plain, total);
+ memcpy(iv_work, iv_orig, ivsize);
+ for (i = 0; i < TEST_MDU_NR_UNITS; i++) {
+ sg_init_one(&sg_in, unit + i * du_size, du_size);
+ sg_out = sg_in;
+ skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
+ CRYPTO_TFM_REQ_MAY_SLEEP,
+ crypto_req_done, &wait);
+ skcipher_request_set_crypt(req, &sg_in, &sg_out, du_size,
+ iv_work);
+ err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
+ if (err) {
+ pr_err("alg: skcipher: %s single-DU[%u] encrypt failed: %d\n",
+ driver, i, err);
+ goto out;
+ }
+ /* Increment iv_work as a 128-bit little-endian counter. */
+ {
+ __le64 lo_le, hi_le;
+ u64 lo;
+
+ memcpy(&lo_le, iv_work, 8);
+ memcpy(&hi_le, iv_work + 8, 8);
+ lo = le64_to_cpu(lo_le) + 1;
+ lo_le = cpu_to_le64(lo);
+ memcpy(iv_work, &lo_le, 8);
+ if (lo == 0) {
+ hi_le = cpu_to_le64(le64_to_cpu(hi_le) + 1);
+ memcpy(iv_work + 8, &hi_le, 8);
+ }
+ }
+ }
+
+ if (memcmp(batched, unit, total) != 0) {
+ pr_err("alg: skcipher: %s multi-DU mismatch (du=%u, n=%u)\n",
+ driver, du_size, TEST_MDU_NR_UNITS);
+ err = -EINVAL;
+ }
+
+out_clear_du:
+ (void)crypto_skcipher_set_data_unit_size(tfm, 0);
+out:
+ skcipher_request_free(req);
+ kfree(unit);
+ kfree(batched);
+ kfree(plain);
+ return err;
+}
+
static int alg_test_skcipher(const struct alg_test_desc *desc,
const char *driver, u32 type, u32 mask)
{
@@ -3265,6 +3382,18 @@ static int alg_test_skcipher(const struct alg_test_desc *desc,
if (err)
goto out;
+ if (crypto_skcipher_supports_multi_data_unit(tfm)) {
+ static const unsigned int du_sizes[] = { 512, 1024, 2048, 4096 };
+ unsigned int j;
+
+ for (j = 0; j < ARRAY_SIZE(du_sizes); j++) {
+ err = test_skcipher_multi_du(tfm, du_sizes[j]);
+ if (err)
+ goto out;
+ cond_resched();
+ }
+ }
+
err = test_skcipher_vs_generic_impl(desc->generic_driver, req, tsgls);
out:
free_cipher_test_sglists(tsgls);
--
2.47.3
* [PATCH 4/4] dm crypt: batch all sectors of a bio per crypto request
From: Leonid Ravich @ 2026-05-19 12:00 UTC (permalink / raw)
To: Herbert Xu
Cc: David S . Miller, Mike Snitzer, Mikulas Patocka, Alasdair Kergon,
Ard Biesheuvel, Eric Biggers, Jens Axboe, Horia Geanta,
Gilad Ben-Yossef, linux-crypto, dm-devel, linux-block
When the underlying skcipher driver advertises support for multiple
data units in a single request (CRYPTO_ALG_SKCIPHER_MULTI_DATA_UNIT),
configure the cipher with cc->sector_size as data_unit_size and
submit one request per bio instead of one request per sector. This
removes per-sector overhead in the crypto API hot path: request
allocation, callback dispatch, completion handling, and SG setup.
The optimisation is enabled automatically at table load when all
of the following hold:
- the cipher is non-aead (i.e. skcipher);
- tfms_count is 1 (interleaved per-sector keys would break batching);
- the IV mode is plain or plain64 (the only modes whose generator
produces a zero-padded little-endian sector counter, 32-bit for plain
and 64-bit for plain64, that the cipher can extend by adding the
data-unit index, matching the convention documented in
crypto_skcipher_set_data_unit_size());
- the iv_gen_ops->post() hook is unset (lmk and tcw use it; both are
already excluded by the IV-mode test, but the explicit check makes
the assumption durable against future IV modes);
- dm-integrity is not stacked (no integrity tag or integrity IV);
- the cipher driver advertises multi-data-unit support.

A new CRYPT_MULTI_DATA_UNIT cipher_flag, set once at construction
time, gates the multi-data-unit path. The existing per-sector path
in crypt_convert_block_skcipher() is unchanged; the new
crypt_convert_block_skcipher_multi() is reached from a small dispatch
in crypt_convert() and shares the same backlog/-EBUSY/-EINPROGRESS
flow control with the per-sector path.
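
The arithmetic behind one batched request can be sketched in userspace
as follows. The helper names batch_bytes() and advance_sector() are
made up for this illustration; the rounding and the shift by
SECTOR_SHIFT follow what the patch does with total and processed.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the block layer */

/*
 * Bytes one batched request would cover: the smaller of the two
 * remaining lengths, rounded down to a whole number of data units.
 * Returns 0 when not even one data unit is left (the patch returns
 * -EIO in that case).
 */
static unsigned int batch_bytes(unsigned int in_left, unsigned int out_left,
				unsigned int sector_size)
{
	unsigned int total = in_left < out_left ? in_left : out_left;

	/* sector_size is assumed to be a power of two >= 512 */
	assert(sector_size >= 512 && (sector_size & (sector_size - 1)) == 0);
	return (total / sector_size) * sector_size;
}

/* Position in 512-byte sectors after a batch of `processed` bytes. */
static uint64_t advance_sector(uint64_t cc_sector, unsigned int processed)
{
	return cc_sector + (processed >> SECTOR_SHIFT);
}
```

For example, with 4096-byte data units and 10000 input bytes against
8192 output bytes, one request covers 8192 bytes and the sector
position moves forward by 16.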

Heap-allocated scatterlists are stashed in dm_crypt_request and freed
in crypt_free_req_skcipher() to avoid races between the synchronous-
success free path and async-completion reuse from the request pool.
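
The ownership rule can be shown with a small userspace model. The
types struct sg_entry and struct req_state and both helpers are
stand-ins invented for this sketch; only the aliasing rule (store a
single pointer when input and output share one list) and the single
release point are taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for struct scatterlist. */
struct sg_entry {
	void *page;
	unsigned int len;
	unsigned int offset;
};

/* Stand-in for the two pointers added to dm_crypt_request. */
struct req_state {
	struct sg_entry *sg_in_ext;
	struct sg_entry *sg_out_ext;
};

/*
 * Record ownership of the arrays. For in-place conversion both lists
 * alias, so only one pointer is stored and nothing is freed twice.
 */
static void req_stash_sg(struct req_state *rs, struct sg_entry *sg_in,
			 struct sg_entry *sg_out)
{
	assert(rs != NULL);
	rs->sg_in_ext = sg_in;
	rs->sg_out_ext = (sg_out == sg_in) ? NULL : sg_out;
}

/* Single release point; repeat calls are harmless since free(NULL) is a no-op. */
static void req_release_sg(struct req_state *rs)
{
	assert(rs != NULL);
	free(rs->sg_in_ext);
	rs->sg_in_ext = NULL;
	free(rs->sg_out_ext);
	rs->sg_out_ext = NULL;
}
```

Because the pointers are reset after freeing, a request that is
released on one path and later reused from a pool starts from a known
state.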

On -ENOMEM during scatterlist allocation, the bio is requeued via
BLK_STS_DEV_RESOURCE rather than failed, matching the behaviour of
the existing -ENOMEM path for crypto request allocation.

Verified end-to-end with a byte-equivalence test: encrypted output of
plain64 dm-crypt with the multi-data-unit path matches output of the
single-data-unit path bit-for-bit over a 256 MB device.
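
The equivalence rests on the per-unit IVs being identical on both
paths. A minimal userspace sketch of that property for plain64 follows.
It assumes consecutive data units carry consecutive IV numbers, and it
models the cipher-side derivation as "starting IV plus data-unit
index" on a little-endian buffer; iv_add_index() and ivs_match() are
hypothetical helpers, not the crypto API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IV_SIZE 16

/* plain64 layout: 64-bit little-endian unit number, zero padded. */
static void plain64_iv(uint8_t iv[IV_SIZE], uint64_t unit)
{
	memset(iv, 0, IV_SIZE);
	for (int i = 0; i < 8; i++)
		iv[i] = (uint8_t)(unit >> (8 * i));
}

/* Hypothetical model of the cipher-side step: out = base + index. */
static void iv_add_index(uint8_t out[IV_SIZE], const uint8_t base[IV_SIZE],
			 uint64_t index)
{
	unsigned int carry = 0;

	for (int i = 0; i < IV_SIZE; i++) {
		unsigned int add = (i < 8) ? (unsigned int)((index >> (8 * i)) & 0xff) : 0;
		unsigned int sum = base[i] + add + carry;

		out[i] = (uint8_t)sum;
		carry = sum >> 8;
	}
}

/* Returns 1 if the derived IVs equal the per-unit plain64 IVs. */
static int ivs_match(uint64_t first_unit, unsigned int n_units)
{
	uint8_t base[IV_SIZE], derived[IV_SIZE], expected[IV_SIZE];

	assert(n_units > 0);
	plain64_iv(base, first_unit);
	for (unsigned int k = 0; k < n_units; k++) {
		iv_add_index(derived, base, k);
		plain64_iv(expected, first_unit + k);
		if (memcmp(derived, expected, IV_SIZE) != 0)
			return 0;
	}
	return 1;
}
```

Note that the plain generator keeps only the low 32 bits of the sector
number, so for plain the same equivalence additionally needs a batch
not to cross a 2^32 boundary; the sketch covers plain64 only.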

Signed-off-by: Leonid Ravich <lravich@amazon.com>
---
 drivers/md/dm-crypt.c | 248 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 241 insertions(+), 7 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 5ef43231fe77..b35831d43f0e 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -98,6 +98,14 @@ struct dm_crypt_request {
 	struct scatterlist sg_in[4];
 	struct scatterlist sg_out[4];
 	u64 iv_sector;
+	/*
+	 * Heap-allocated scatterlists used by the multi-data-unit path
+	 * when one bio is processed in a single skcipher request. NULL
+	 * when the inline sg_in[]/sg_out[] arrays above are sufficient
+	 * (single-data-unit path). Freed in crypt_free_req_skcipher().
+	 */
+	struct scatterlist *sg_in_ext;
+	struct scatterlist *sg_out_ext;
 };

 struct crypt_config;
@@ -149,6 +157,7 @@ enum cipher_flags {
 	CRYPT_IV_LARGE_SECTORS,		/* Calculate IV from sector_size, not 512B sectors */
 	CRYPT_ENCRYPT_PREPROCESS,	/* Must preprocess data for encryption (elephant) */
 	CRYPT_KEY_MAC_SIZE_SET,		/* The integrity_key_size option was used */
+	CRYPT_MULTI_DATA_UNIT,		/* Batch all sectors of a bio per crypto request */
 };

 /*
@@ -1501,12 +1510,139 @@ static int crypt_convert_block_skcipher(struct crypt_config *cc,
 	return r;
 }

+/*
+ * Multi-data-unit variant of crypt_convert_block_skcipher. Submits all
+ * remaining sectors of the current bio in one skcipher request whose
+ * data_unit_size is cc->sector_size. The cipher walks the IV between
+ * data units (see crypto_skcipher_set_data_unit_size()).
+ *
+ * Returns the same set of values as crypt_convert_block_skcipher:
+ *   0 on synchronous success (full chunk processed),
+ *   -EINPROGRESS / -EBUSY on asynchronous dispatch,
+ *   -ENOMEM if scatterlist allocation fails (caller maps to
+ *     BLK_STS_DEV_RESOURCE so the bio is requeued, not failed),
+ *   negative errno otherwise.
+ *
+ * The iterators are advanced by the caller, using *out_processed.
+ */
+static int crypt_convert_block_skcipher_multi(struct crypt_config *cc,
+					      struct convert_context *ctx,
+					      struct skcipher_request *req,
+					      unsigned int *out_processed)
+{
+	const unsigned int sector_size = cc->sector_size;
+	unsigned int total_in = ctx->iter_in.bi_size;
+	unsigned int total_out = ctx->iter_out.bi_size;
+	unsigned int total = min(total_in, total_out);
+	unsigned int n_sectors;
+	unsigned int n_sg_in = 0, n_sg_out = 0;
+	struct dm_crypt_request *dmreq = dmreq_of_req(cc, req);
+	struct scatterlist *sg_in = NULL, *sg_out = NULL;
+	struct bvec_iter iter_in, iter_out;
+	struct bio_vec bv;
+	u8 *iv, *org_iv;
+	int r;
+
+	if (unlikely(total < sector_size))
+		return -EIO;
+	n_sectors = total / sector_size;
+	total = n_sectors * sector_size;
+
+	/*
+	 * Walk the bio_vec iterators to count how many SG entries we need
+	 * for exactly @total bytes. bi_size of the iterators is at least
+	 * @total by construction above.
+	 */
+	iter_in = ctx->iter_in;
+	iter_in.bi_size = total;
+	__bio_for_each_segment(bv, ctx->bio_in, iter_in, iter_in)
+		n_sg_in++;
+
+	iter_out = ctx->iter_out;
+	iter_out.bi_size = total;
+	__bio_for_each_segment(bv, ctx->bio_out, iter_out, iter_out)
+		n_sg_out++;
+
+	sg_in = kmalloc_array(n_sg_in, sizeof(*sg_in), GFP_NOIO);
+	sg_out = (ctx->bio_in == ctx->bio_out) ? sg_in :
+		 kmalloc_array(n_sg_out, sizeof(*sg_out), GFP_NOIO);
+	if (!sg_in || !sg_out) {
+		kfree(sg_in);
+		if (sg_out != sg_in)
+			kfree(sg_out);
+		return -ENOMEM;
+	}
+
+	sg_init_table(sg_in, n_sg_in);
+	{
+		unsigned int i = 0;
+
+		iter_in = ctx->iter_in;
+		iter_in.bi_size = total;
+		__bio_for_each_segment(bv, ctx->bio_in, iter_in, iter_in)
+			sg_set_page(&sg_in[i++], bv.bv_page, bv.bv_len,
+				    bv.bv_offset);
+	}
+
+	if (sg_out != sg_in) {
+		unsigned int i = 0;
+
+		sg_init_table(sg_out, n_sg_out);
+		iter_out = ctx->iter_out;
+		iter_out.bi_size = total;
+		__bio_for_each_segment(bv, ctx->bio_out, iter_out, iter_out)
+			sg_set_page(&sg_out[i++], bv.bv_page, bv.bv_len,
+				    bv.bv_offset);
+	}
+
+	/*
+	 * Compute the IV for the first data unit. The cipher will derive
+	 * IVs for subsequent data units by treating this one as a 128-bit
+	 * little-endian counter and adding the data-unit index, which
+	 * matches the layout produced by plain and plain64.
+	 */
+	dmreq->iv_sector = ctx->cc_sector;
+	if (test_bit(CRYPT_IV_LARGE_SECTORS, &cc->cipher_flags))
+		dmreq->iv_sector >>= cc->sector_shift;
+	dmreq->ctx = ctx;
+
+	iv = iv_of_dmreq(cc, dmreq);
+	org_iv = org_iv_of_dmreq(cc, dmreq);
+	r = cc->iv_gen_ops->generator(cc, org_iv, dmreq);
+	if (r < 0)
+		goto out_free_sg;
+	memcpy(iv, org_iv, cc->iv_size);
+
+	/* Stash the SG arrays for cleanup on completion / free. */
+	dmreq->sg_in_ext = sg_in;
+	dmreq->sg_out_ext = (sg_out == sg_in) ? NULL : sg_out;
+
+	skcipher_request_set_crypt(req, sg_in, sg_out, total, iv);
+
+	if (bio_data_dir(ctx->bio_in) == WRITE)
+		r = crypto_skcipher_encrypt(req);
+	else
+		r = crypto_skcipher_decrypt(req);
+
+	*out_processed = total;
+	return r;
+
+out_free_sg:
+	kfree(sg_in);
+	if (sg_out != sg_in)
+		kfree(sg_out);
+	dmreq->sg_in_ext = NULL;
+	dmreq->sg_out_ext = NULL;
+	return r;
+}
+
 static void kcryptd_async_done(void *async_req, int error);

 static int crypt_alloc_req_skcipher(struct crypt_config *cc,
 				    struct convert_context *ctx)
 {
 	unsigned int key_index = ctx->cc_sector & (cc->tfms_count - 1);
+	struct dm_crypt_request *dmreq;

 	if (!ctx->r.req) {
 		ctx->r.req = mempool_alloc(&cc->req_pool, in_interrupt() ? GFP_ATOMIC : GFP_NOIO);
@@ -1516,6 +1652,18 @@ static int crypt_alloc_req_skcipher(struct crypt_config *cc,

 	skcipher_request_set_tfm(ctx->r.req, cc->cipher_tfm.tfms[key_index]);

+	/*
+	 * Initialise the heap-allocated scatterlist pointers so that
+	 * crypt_free_req_skcipher() does not read uninitialised memory
+	 * for paths that don't take the multi-data-unit branch. The
+	 * dmreq trailer lives in the per-bio data area which is not
+	 * zeroed by the dm core, and the request is reused from the
+	 * mempool across many bios.
+	 */
+	dmreq = dmreq_of_req(cc, ctx->r.req);
+	dmreq->sg_in_ext = NULL;
+	dmreq->sg_out_ext = NULL;
+
 	/*
 	 * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
 	 * requests if driver request queue is full.
@@ -1562,6 +1710,12 @@ static void crypt_free_req_skcipher(struct crypt_config *cc,
 				    struct skcipher_request *req, struct bio *base_bio)
 {
 	struct dm_crypt_io *io = dm_per_bio_data(base_bio, cc->per_bio_data_size);
+	struct dm_crypt_request *dmreq = dmreq_of_req(cc, req);
+
+	kfree(dmreq->sg_in_ext);
+	dmreq->sg_in_ext = NULL;
+	kfree(dmreq->sg_out_ext);
+	dmreq->sg_out_ext = NULL;

 	if ((struct skcipher_request *)(io + 1) != req)
 		mempool_free(req, &cc->req_pool);
@@ -1590,7 +1744,9 @@ static void crypt_free_req(struct crypt_config *cc, void *req, struct bio *base_
 static blk_status_t crypt_convert(struct crypt_config *cc,
 			 struct convert_context *ctx, bool atomic, bool reset_pending)
 {
-	unsigned int sector_step = cc->sector_size >> SECTOR_SHIFT;
+	const unsigned int sector_step = cc->sector_size >> SECTOR_SHIFT;
+	const bool multi_du = test_bit(CRYPT_MULTI_DATA_UNIT, &cc->cipher_flags);
+	unsigned int processed;
 	int r;

 	/*
@@ -1611,8 +1767,13 @@ static blk_status_t crypt_convert(struct crypt_config *cc,

 		atomic_inc(&ctx->cc_pending);

+		processed = cc->sector_size;
 		if (crypt_integrity_aead(cc))
 			r = crypt_convert_block_aead(cc, ctx, ctx->r.req_aead, ctx->tag_offset);
+		else if (multi_du)
+			r = crypt_convert_block_skcipher_multi(cc, ctx,
+							       ctx->r.req,
+							       &processed);
 		else
 			r = crypt_convert_block_skcipher(cc, ctx, ctx->r.req, ctx->tag_offset);

@@ -1634,8 +1795,19 @@ static blk_status_t crypt_convert(struct crypt_config *cc,
 					 * exit and continue processing in a workqueue
 					 */
 					ctx->r.req = NULL;
-					ctx->tag_offset++;
-					ctx->cc_sector += sector_step;
+					if (!multi_du) {
+						ctx->tag_offset++;
+						ctx->cc_sector += sector_step;
+					} else {
+						bio_advance_iter(ctx->bio_in,
+								 &ctx->iter_in,
+								 processed);
+						bio_advance_iter(ctx->bio_out,
+								 &ctx->iter_out,
+								 processed);
+						ctx->cc_sector +=
+							processed >> SECTOR_SHIFT;
+					}
 					return BLK_STS_DEV_RESOURCE;
 				}
 			} else {
@@ -1649,19 +1821,42 @@ static blk_status_t crypt_convert(struct crypt_config *cc,
 		 */
 		case -EINPROGRESS:
 			ctx->r.req = NULL;
-			ctx->tag_offset++;
-			ctx->cc_sector += sector_step;
+			if (!multi_du) {
+				ctx->tag_offset++;
+				ctx->cc_sector += sector_step;
+			} else {
+				bio_advance_iter(ctx->bio_in, &ctx->iter_in,
+						 processed);
+				bio_advance_iter(ctx->bio_out, &ctx->iter_out,
+						 processed);
+				ctx->cc_sector += processed >> SECTOR_SHIFT;
+			}
 			continue;
 		/*
 		 * The request was already processed (synchronously).
 		 */
 		case 0:
 			atomic_dec(&ctx->cc_pending);
-			ctx->cc_sector += sector_step;
-			ctx->tag_offset++;
+			if (!multi_du) {
+				ctx->cc_sector += sector_step;
+				ctx->tag_offset++;
+			} else {
+				bio_advance_iter(ctx->bio_in, &ctx->iter_in,
+						 processed);
+				bio_advance_iter(ctx->bio_out, &ctx->iter_out,
+						 processed);
+				ctx->cc_sector += processed >> SECTOR_SHIFT;
+			}
 			if (!atomic)
 				cond_resched();
 			continue;
+		/*
+		 * Out of memory for the multi-DU SG arrays; bounce back
+		 * to the caller for requeue rather than failing the bio.
+		 */
+		case -ENOMEM:
+			atomic_dec(&ctx->cc_pending);
+			return BLK_STS_DEV_RESOURCE;
 		/*
 		 * There was a data integrity error.
 		 */
@@ -3142,6 +3337,45 @@ static int crypt_ctr_cipher(struct dm_target *ti, char *cipher_in, char *key)
 		}
 	}

+	/*
+	 * Enable multi-data-unit batching when the cipher supports it and
+	 * the IV layout is one we can derive per-DU from a single starting
+	 * IV: plain or plain64 produce a sequential 64-bit little-endian
+	 * counter, which matches the convention of
+	 * crypto_skcipher_set_data_unit_size(). Restrict to the simple
+	 * case (single tfm, no integrity, no per-sector post() callback)
+	 * to keep the consumer path small; modes like essiv, lmk, tcw,
+	 * eboiv, plain64be, random, null, benbi, and elephant are
+	 * deliberately excluded because their generators or post-IV hooks
+	 * cannot be re-derived by the cipher between data units.
+	 */
+	if (!crypt_integrity_aead(cc) && cc->tfms_count == 1 &&
+	    cc->iv_gen_ops &&
+	    (cc->iv_gen_ops == &crypt_iv_plain_ops ||
+	     cc->iv_gen_ops == &crypt_iv_plain64_ops) &&
+	    !cc->iv_gen_ops->post &&
+	    !cc->integrity_tag_size && !cc->integrity_iv_size &&
+	    crypto_skcipher_supports_multi_data_unit(cc->cipher_tfm.tfms[0])) {
+		ret = crypto_skcipher_set_data_unit_size(cc->cipher_tfm.tfms[0],
+							 cc->sector_size);
+		if (!ret) {
+			set_bit(CRYPT_MULTI_DATA_UNIT, &cc->cipher_flags);
+			DMINFO("Using multi-data-unit crypto offload (du=%u)",
+			       cc->sector_size);
+		} else {
+			/*
+			 * The driver advertised the capability via cra_flags
+			 * but rejected the requested data unit size. This is
+			 * a driver bug worth seeing in dmesg; fall back to
+			 * the per-sector path so the device still activates.
+			 */
+			DMWARN_LIMIT("multi-DU offload disabled: %s rejected du=%u (%d)",
+				     crypto_skcipher_driver_name(cc->cipher_tfm.tfms[0]),
+				     cc->sector_size, ret);
+			ret = 0;
+		}
+	}
+
 	/* wipe the kernel key payload copy */
 	if (cc->key_string)
 		memset(cc->key, 0, cc->key_size * sizeof(u8));
--
2.47.3
* Re: [PATCH 4/4] dm crypt: batch all sectors of a bio per crypto request
2026-05-19 12:00 ` [PATCH 4/4] dm crypt: batch all sectors of a bio per crypto request Leonid Ravich
@ 2026-05-25 12:02 ` Mikulas Patocka
0 siblings, 0 replies; 9+ messages in thread
From: Mikulas Patocka @ 2026-05-25 12:02 UTC (permalink / raw)
To: Leonid Ravich
Cc: Herbert Xu, David S . Miller, Mike Snitzer, Alasdair Kergon,
Ard Biesheuvel, Eric Biggers, Jens Axboe, Horia Geanta,
Gilad Ben-Yossef, linux-crypto, dm-devel, linux-block
Hi
On Tue, 19 May 2026, Leonid Ravich wrote:
> When the underlying skcipher driver advertises support for multiple
> data units in a single request (CRYPTO_ALG_SKCIPHER_MULTI_DATA_UNIT),
> configure the cipher with cc->sector_size as data_unit_size and
> submit one request per bio instead of one request per sector. This
> removes per-sector overhead in the crypto API hot path: request
> allocation, callback dispatch, completion handling, and SG setup.
>
> The optimisation is enabled automatically at table load when all
> of the following hold:
>
> - the cipher is non-aead (i.e. skcipher);
> - tfms_count is 1 (interleaved per-sector keys would break batching);
> - the IV mode is plain or plain64 (the only modes whose generator
> produces a sequential 64-bit little-endian counter that the cipher
> can extend by adding the data-unit index, matching the convention
> documented in crypto_skcipher_set_data_unit_size());
> - the iv_gen_ops->post() hook is unset (lmk and tcw use it; both are
> already excluded by the IV-mode test, but the explicit check makes
> the assumption durable against future IV modes);
> - dm-integrity is not stacked (no integrity tag or integrity IV);
> - the cipher driver advertises multi-data-unit support.
>
> A new CRYPT_MULTI_DATA_UNIT cipher_flag, set once at construction
> time, gates the multi-data-unit path. The existing per-sector path
> in crypt_convert_block_skcipher() is unchanged; the new
> crypt_convert_block_skcipher_multi() is reached from a small dispatch
> in crypt_convert() and shares the same backlog/-EBUSY/-EINPROGRESS
> flow control with the per-sector path.
>
> Heap-allocated scatterlists are stashed in dm_crypt_request and freed
> in crypt_free_req_skcipher() to avoid races between the synchronous-
> success free path and async-completion reuse from the request pool.
>
> On -ENOMEM during scatterlist allocation, the bio is requeued via
> BLK_STS_DEV_RESOURCE rather than failed, matching the behaviour of
> the existing -ENOMEM path for crypto request allocation.

You should make sure that you do not attempt to use the multi-data-unit
mode when you retry the bio, otherwise it could loop indefinitely. Note
that there are people who swap to dm-crypt - and so, it must work even if
the memory is totally exhausted.
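
One possible shape of such a fallback, as a userspace model only: try
the batched path once, and if the scatterlist array cannot be
allocated, convert the data units one at a time with fixed-size state.
The names alloc_fn, convert_units() and struct convert_stats are
invented for this sketch and do not exist in dm-crypt; the entry size
is a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocator signature, so that a failing allocator can be injected. */
typedef void *(*alloc_fn)(size_t n, size_t size);

static void *alloc_ok(size_t n, size_t size)
{
	return calloc(n, size);
}

/* Models totally exhausted memory. */
static void *alloc_fail(size_t n, size_t size)
{
	(void)n;
	(void)size;
	return NULL;
}

struct convert_stats {
	unsigned int batched;	/* requests covering many data units */
	unsigned int per_unit;	/* requests covering one data unit */
};

/*
 * Convert n_units data units spread over n_segments segments. The
 * batched attempt is made once; on allocation failure the units are
 * handled individually instead of retrying the allocation.
 */
static void convert_units(unsigned int n_units, unsigned int n_segments,
			  alloc_fn alloc, struct convert_stats *st)
{
	void *sg;

	assert(n_units > 0 && n_segments > 0 && st != NULL);

	sg = alloc(n_segments, 32 /* placeholder entry size */);
	if (sg) {
		st->batched++;
		free(sg);
		return;
	}

	/* Fallback: needs no heap, so forward progress does not depend on it. */
	for (unsigned int i = 0; i < n_units; i++)
		st->per_unit++;
}
```

The point of the model is that the failure branch never re-enters the
allocating path, so it cannot loop when memory stays exhausted.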

You should also use GFP_NOIO | __GFP_NORETRY instead of GFP_NOIO, so that
the code doesn't loop in the allocator forever.

Perhaps __bio_for_each_bvec would be better than __bio_for_each_segment,
so that it works faster with folios.

Mikulas