Linux block layer
* [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching
@ 2026-06-15 11:14 Leonid Ravich
  2026-06-15 11:14 ` [PATCH v4 1/3] crypto: skcipher - add per-request data_unit_size with auto-splitting Leonid Ravich
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Leonid Ravich @ 2026-06-15 11:14 UTC
  To: Herbert Xu
  Cc: Alasdair Kergon, Ard Biesheuvel, Eric Biggers, Jens Axboe,
	Horia Geanta, Gilad Ben-Yossef, linux-crypto, dm-devel,
	linux-block

This is v4, addressing Herbert's review of v3.  Two architectural
changes:

  - data_unit_size is now per-request (on struct skcipher_request)
    rather than per-tfm.  Reverts to the v1 placement.

  - The crypto API auto-splits multi-data-unit requests when the
    underlying algorithm does not advertise
    CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU.  Consumers no longer test
    for multi-DU support before submitting; setting data_unit_size
    "just works" on any skcipher request whose algorithm has a 16-byte
    IV, uses the 128-bit LE counter IV convention, and is either
    synchronous or natively multi-DU.  Other targets get -EOPNOTSUPP,
    so the caller can fall back.

These two changes shrink the series from 4 patches to 3 (the
generic xts(...) template needs no special handling - the
auto-splitter calls its single-DU encrypt/decrypt once per data
unit) and simplify the dm-crypt consumer (no advertise-flag check,
no per-tfm setup).

v3: https://lore.kernel.org/linux-crypto/20260601085641.16028-1-lravich@amazon.com/
v2: https://lore.kernel.org/linux-crypto/20260527065021.19525-1-lravich@amazon.com/
v1: https://lore.kernel.org/linux-crypto/20260519115955.27267-1-lravich@amazon.com/

The series adds a per-request "data unit size" to the skcipher API
so a caller can submit several data units (typically 512..4096-byte
sectors) sharing one starting IV in a single request.  Algorithms
derive each data unit's IV from the caller-supplied IV by treating
it as a 128-bit little-endian counter and adding the data-unit
index, matching the layout produced by dm-crypt's plain64 IV mode
and by typical inline-encryption hardware.
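The IV rule can be illustrated with a minimal userspace sketch, assuming a 16-byte IV and one IV step per data unit.  mdu_unit_iv() and plain64_iv() are hypothetical names, not part of the series; plain64_iv() mimics the layout dm-crypt's plain64 mode builds (sector number as le64, upper half zero) to show why base + index lines up with consecutive sectors.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: IV of data unit @index = @base + @index, where the
 * 16-byte IV is read as a 128-bit little-endian integer.
 */
static void mdu_unit_iv(const uint8_t base[16], uint64_t index, uint8_t out[16])
{
	unsigned int carry = 0;

	for (int i = 0; i < 16; i++) {
		/* @index only contributes to the low 8 bytes of the addend. */
		unsigned int add = i < 8 ? (unsigned int)((index >> (8 * i)) & 0xff) : 0;
		unsigned int sum = base[i] + add + carry;

		out[i] = (uint8_t)sum;
		carry = sum >> 8;
	}
}

/* plain64-style IV (as dm-crypt builds it): sector as le64, upper 8 bytes zero. */
static void plain64_iv(uint64_t sector, uint8_t iv[16])
{
	memset(iv, 0, 16);
	for (int i = 0; i < 8; i++)
		iv[i] = (uint8_t)(sector >> (8 * i));
}
```

As long as the sector number does not overflow 64 bits, unit i of a request that starts at sector s gets exactly plain64(s + i).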

This mirrors the data_unit_size concept already exposed by
struct blk_crypto_config for inline encryption.

The first user is dm-crypt, which today issues one skcipher request
per sector and so pays a per-sector cost in request allocation,
callback dispatch, completion handling, and scatterlist setup.

Proof-of-concept performance numbers from the RFC reply [1]: +19%
throughput / -40% CPU on a single-core arm64 system with a hardware
XTS-AES-256 accelerator running fio 4 KiB sequential writes through
dm-crypt, when an out-of-tree arm64 xts driver advertises
CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU.  This series itself does not
include arch enablement; the fast path is opt-in per driver, the
slow path is universal via the auto-splitter.

The native fast path amortises both per-sector dispatch and per-sector
crypto setup across a bio; that is the win measured above, on an
engine that offloads the AES compute.  The auto-splitter is for
correctness and reach: any eligible consumer can set data_unit_size and
get correct output without the per-request allocation/callback/
completion cost, but it still issues one alg->encrypt per data unit, so
on a software cipher it saves only dispatch overhead (no throughput
figure is claimed; that is hardware- and workload-dependent).  What it
guarantees unconditionally is byte-identical output (see Verification
below) in O(entries + units) time, walking the scatterlists with a pair
of struct scatter_walk cursors rather than rescanning from the head per
unit.
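The cost argument can be illustrated with a userspace model in which a scatterlist is just an array of entry lengths.  struct frag_cursor and unit_pieces() are hypothetical stand-ins for struct scatter_walk and the scatterwalk_get_sglist()/scatterwalk_skip() pair; this is not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor over a scatterlist modelled as an array of entry lengths. */
struct frag_cursor {
	size_t idx;		/* current entry */
	unsigned int off;	/* bytes already consumed within that entry */
};

/*
 * Consume @du bytes at the cursor and return how many entries the unit
 * touches.  The cursor only moves forward, so walking every unit costs
 * O(entries + units) overall.  Re-deriving each unit's start by skipping
 * from entry 0 instead costs O(units * entries), which is roughly
 * quadratic when the number of entries grows with the number of units.
 */
static unsigned int unit_pieces(const unsigned int *len, size_t nents,
				struct frag_cursor *c, unsigned int du)
{
	unsigned int pieces = 0;

	while (du && c->idx < nents) {
		unsigned int avail, take;

		assert(c->off <= len[c->idx]);
		avail = len[c->idx] - c->off;
		take = du < avail ? du : avail;
		pieces++;
		du -= take;
		c->off += take;
		if (c->off == len[c->idx]) {	/* entry exhausted: step to the next */
			c->idx++;
			c->off = 0;
		}
	}
	return pieces;
}
```

With the entry lengths the testmgr patch builds for du = 512 (409, 409, 409, 409, 412), each of the four units spans two entries and the cursor finishes exactly at the end of the list.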

[1] https://lore.kernel.org/linux-crypto/20260428101225.24316-1-lravich@amazon.com/

Changes since v3
----------------

- data_unit_size moved from struct crypto_skcipher (per-tfm) to
  struct skcipher_request (per-request).  (Herbert)

- Crypto API auto-splits multi-data-unit requests when the algorithm
  does not advertise CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU.  Drops the
  per-tfm setter/probe in favour of a single
  skcipher_request_set_data_unit_size() usable by every consumer.
  (Herbert)

- CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU is a type-specific cra_flags
  bit (0x01000000) in crypto/internal/skcipher.h, not a generic bit
  in the public header; drivers set it to opt OUT of auto-splitting.

- The auto-splitter advances through src/dst with a pair of struct
  scatter_walk cursors (scatterwalk_start / scatterwalk_get_sglist /
  scatterwalk_skip) instead of scatterwalk_ffwd() per unit, which
  rescans from the head and is O(units^2) under fragmentation; the
  cursors give a single linear pass.  (Eric)

- crypto_skcipher_validate_multi_du() reports -EINVAL for a malformed
  geometry (du not a power of two or smaller than 16, cryptlen not a
  positive multiple of du) and -EOPNOTSUPP for a target that cannot do
  multi-DU (ivsize != 16, lskcipher, or async without the native
  flag), so a caller can fall back.  It gates the native path too, not
  just the auto-splitter.  (Eric)

- testmgr cross-checks the batched dispatch against an independent
  N x single-DU reference with LE128-walked IVs over a fragmented
  scatterlist (pins the IV convention and exercises the cursor),
  round-trips, and checks IV preservation.  Ineligible algorithms
  skip via -EOPNOTSUPP; a real mismatch returns -EBADMSG.

- dm-crypt enables batching only for IV modes flagged sector_iv_le128
  (a new bool on struct crypt_iv_operations, set on plain64 only),
  plus ivsize 16, sync, single-tfm, no integrity, no post() hook.  The
  flag replaces a hardcoded plain64 pointer-compare, so eligibility is
  a self-documenting property of the IV mode rather than a special
  case.  plain stays excluded (its 32-bit counter wraps differently
  past 2^32 sectors).  Sets req->data_unit_size = sector_size and
  submits; -EOPNOTSUPP/-EAGAIN fall back to the per-sector path.
  Mikulas's v2 Reviewed-by is dropped as the dm-crypt patch was
  substantially rewritten.

- The generic xts(...) template needs no separate handling, dropping
  the v3 crypto/xts.c patch (4 -> 3 patches).

Design overview
---------------

* Patch 1 adds the data_unit_size field, the setter, the
  CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU flag, and the auto-splitter in
  crypto_skcipher_encrypt()/decrypt().  skcipher_request_set_tfm()
  resets the field so a reused request defaults to single-DU.

* Patch 2 adds the testmgr multi-DU test (every ivsize == 16
  skcipher).

* Patch 3 turns dm-crypt batching on automatically under the
  conditions above and sets req->data_unit_size = cc->sector_size.

This series does NOT add the capability flag to any arch driver; the
auto-splitter ensures correctness without that opt-in.

Verification
------------

A regression protocol is included in the project tree
(.claude/regression-protocol.md, .claude/run-regression.sh).  The
reference run reports 12/12 PASS:

  - x86 + arm64 build clean; checkpatch.pl --strict clean.
  - testmgr multi-DU: PASS for every ivsize == 16 skcipher in-tree.
  - dm-crypt activation gating: plain64 enabled; essiv:sha256 /
    plain64be / plain fall back.
  - dm-crypt round-trip plain64 with multi-DU via the auto-splitter
    (xts-aes-aesni, no native flag): PASS.
  - dm-crypt round-trip essiv:sha256 (per-sector path): PASS.
  - dm-crypt low-memory (mem=128M): PASS, no OOM kill.
  - Byte-equivalence: 256 MB of ciphertext through the auto-splitter
    is bit-identical to an unpatched axboe/for-next baseline (sha256
    4913910b1aa6f8859fcb8f4adec20230274993a3ade8f4dd0140a323dc43efc0).
  - arm64 functional under qemu-aarch64: PASS.



Leonid Ravich (3):
  crypto: skcipher - add per-request data_unit_size with auto-splitting
  crypto: testmgr - test for multi-data-unit dispatch
  dm crypt: batch all sectors of a bio per crypto request

 crypto/skcipher.c                  | 132 +++++++++++++++++++
 crypto/testmgr.c                   | 192 +++++++++++++++++++++++++
 drivers/md/dm-crypt.c              | 215 +++++++++++++++++++++++++++--
 include/crypto/internal/skcipher.h |  10 ++
 include/crypto/skcipher.h          |  28 ++++
 5 files changed, 569 insertions(+), 8 deletions(-)


base-commit: a8cafdf8c949f17c92eca0045532e88ac0dac30d
--
2.47.3



* [PATCH v4 1/3] crypto: skcipher - add per-request data_unit_size with auto-splitting
From: Leonid Ravich @ 2026-06-15 11:14 UTC
  To: Herbert Xu
  Cc: Alasdair Kergon, Ard Biesheuvel, Eric Biggers, Jens Axboe,
	Horia Geanta, Gilad Ben-Yossef, linux-crypto, dm-devel,
	linux-block

Add a data_unit_size field to struct skcipher_request that lets a
caller submit several data units (typically 512..4096-byte sectors)
sharing one starting IV in a single request.  Algorithms derive each
data unit's IV from the caller-supplied IV by treating it as a
128-bit little-endian counter and adding the data-unit index, which
matches the layout produced by dm-crypt's plain64 IV mode and by
typical inline-encryption hardware.

This mirrors the data_unit_size concept already exposed by
struct blk_crypto_config for inline encryption.

The crypto API auto-splits a multi-data-unit request into per-DU
sub-requests when the underlying algorithm does not advertise
CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU (a type-specific cra_flags bit,
defined in crypto/internal/skcipher.h).  A consumer sets
data_unit_size and submits: a native driver handles all units in one
pass, otherwise the core splits transparently.  The split derives
per-DU IVs as a 128-bit LE counter, so this is correct only for
algorithms using that IV convention (e.g. XTS with plain64-style
IVs); callers are responsible for that match, as they already are for
the IV itself.
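What the splitter does with the IV can be modelled in userspace.  This sketch assumes a flat buffer instead of scatterlists; du_body_fn, split_units() and toy_xor_body() are hypothetical, and toy_xor_body() is a placeholder transform, not a cipher.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical single-DU operation: transforms @len bytes in place using @iv. */
typedef int (*du_body_fn)(uint8_t *buf, size_t len, const uint8_t iv[16]);

/* Add one to a 16-byte IV taken as a 128-bit little-endian counter. */
static void iv_inc_le128(uint8_t iv[16])
{
	for (int i = 0; i < 16; i++)
		if (++iv[i])
			break;
}

/*
 * Run @body once per @du-sized unit, handing unit k the IV (@iv + k), and
 * leave @iv at its starting value whether or not a unit fails.
 */
static int split_units(uint8_t *buf, size_t total, size_t du,
		       uint8_t iv[16], du_body_fn body)
{
	uint8_t iv_orig[16], iv_work[16];
	int err = 0;

	assert(du && total % du == 0);
	memcpy(iv_orig, iv, 16);
	memcpy(iv_work, iv, 16);
	for (size_t off = 0; off < total; off += du) {
		memcpy(iv, iv_work, 16);	/* this unit's IV */
		err = body(buf + off, du, iv);
		if (err)
			break;
		iv_inc_le128(iv_work);
	}
	memcpy(iv, iv_orig, 16);		/* caller-visible IV is unchanged */
	return err;
}

/* Placeholder body for demonstration only: XOR each byte with IV byte 0. */
static int toy_xor_body(uint8_t *buf, size_t len, const uint8_t iv[16])
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= iv[0];
	return 0;
}
```

Run over a zeroed 64-byte buffer with du = 16 and a starting IV byte of 7, the four units come out filled with 7, 8, 9 and 10, and the caller's IV still starts with 7.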

skcipher_request_set_tfm() resets the field to 0 so a request reused
from a pool or stack defaults to single-data-unit semantics; callers
that want batching set it explicitly via
skcipher_request_set_data_unit_size() after configuring the tfm.

crypto_skcipher_encrypt()/decrypt() call
crypto_skcipher_validate_multi_du() before any algorithm dispatch.
When non-zero, data_unit_size must be a power of two no smaller than
SKCIPHER_MDU_IVSIZE (realistic sizes are 512..4096; a power of two
lets the cryptlen alignment check use a mask instead of a divide), and
cryptlen must be a positive multiple of it; a malformed geometry is
rejected with -EINVAL.  A
target that cannot do multi-DU - ivsize != SKCIPHER_MDU_IVSIZE (16),
an lskcipher, or an async algorithm without the native flag - is
rejected with -EOPNOTSUPP so a caller can fall back.  Async is
excluded because the splitter dispatches synchronously: an
-EINPROGRESS return would leave later units unsubmitted while the
driver still owned the request's scatterlists and IV.  The check
gates the native path too, so algorithms never see a malformed
multi-DU request.

No in-tree algorithm sets CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU yet;
subsequent patches add the testmgr coverage and the dm-crypt
consumer.

Signed-off-by: Leonid Ravich <lravich@amazon.com>
---
 crypto/skcipher.c                  | 132 +++++++++++++++++++++++++++++
 include/crypto/internal/skcipher.h |  10 +++
 include/crypto/skcipher.h          |  28 ++++++
 3 files changed, 170 insertions(+)

diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 2b31d1d5d268..9262b47acfb9 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -17,6 +17,7 @@
 #include <linux/cryptouser.h>
 #include <linux/err.h>
 #include <linux/kernel.h>
+#include <linux/log2.h>
 #include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/seq_file.h>
@@ -432,15 +433,139 @@ int crypto_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
 }
 EXPORT_SYMBOL_GPL(crypto_skcipher_setkey);
 
+/* IV size for the 128-bit LE-counter multi-data-unit convention. */
+#define SKCIPHER_MDU_IVSIZE	16
+
+static inline void skcipher_iv_inc_le128(u8 *iv)
+{
+	__le64 lo_le, hi_le;
+	u64 lo;
+
+	memcpy(&lo_le, iv, 8);
+	memcpy(&hi_le, iv + 8, 8);
+	lo = le64_to_cpu(lo_le) + 1;
+	lo_le = cpu_to_le64(lo);
+	memcpy(iv, &lo_le, 8);
+	if (unlikely(lo == 0)) {
+		hi_le = cpu_to_le64(le64_to_cpu(hi_le) + 1);
+		memcpy(iv + 8, &hi_le, 8);
+	}
+}
+
+/*
+ * Dispatch a multi-data-unit request as one single-DU sub-request per
+ * unit.  Each unit's IV is the caller's IV plus the unit index, taken
+ * as a 128-bit little-endian counter.  A pair of scatter_walks advances
+ * through src/dst in a single linear pass (O(entries + units)); building
+ * each sub-request's view with scatterwalk_ffwd() would instead rescan
+ * from the head every unit, i.e. O(units^2).
+ */
+static int skcipher_split_data_units(struct skcipher_request *req,
+				     int (*body)(struct skcipher_request *))
+{
+	const unsigned int du = req->data_unit_size;
+	const unsigned int total = req->cryptlen;
+	struct scatterlist *orig_src = req->src;
+	struct scatterlist *orig_dst = req->dst;
+	bool inplace = orig_src == orig_dst;
+	struct scatter_walk src_walk, dst_walk;
+	struct scatterlist src_sg[2], dst_sg[2];
+	u8 iv_orig[SKCIPHER_MDU_IVSIZE];
+	u8 iv_work[SKCIPHER_MDU_IVSIZE];
+	unsigned int off;
+	int err = 0;
+
+	memcpy(iv_orig, req->iv, sizeof(iv_orig));
+	memcpy(iv_work, iv_orig, sizeof(iv_orig));
+
+	sg_init_table(src_sg, 2);
+	scatterwalk_start(&src_walk, orig_src);
+	if (!inplace) {
+		sg_init_table(dst_sg, 2);
+		scatterwalk_start(&dst_walk, orig_dst);
+	}
+
+	/* Stop the per-DU body from re-entering the splitter. */
+	req->data_unit_size = 0;
+	req->src = src_sg;
+	req->dst = inplace ? src_sg : dst_sg;
+
+	for (off = 0; off < total; off += du) {
+		req->cryptlen = du;
+		scatterwalk_get_sglist(&src_walk, src_sg);
+		scatterwalk_skip(&src_walk, du);
+		if (!inplace) {
+			scatterwalk_get_sglist(&dst_walk, dst_sg);
+			scatterwalk_skip(&dst_walk, du);
+		}
+
+		err = body(req);
+		if (err)
+			break;
+
+		skcipher_iv_inc_le128(iv_work);
+		memcpy(req->iv, iv_work, sizeof(iv_work));
+	}
+
+	/* Caller-visible IV is the starting IV regardless of outcome. */
+	memcpy(req->iv, iv_orig, sizeof(iv_orig));
+	req->src = orig_src;
+	req->dst = orig_dst;
+	req->cryptlen = total;
+	req->data_unit_size = du;
+	return err;
+}
+
+static int crypto_skcipher_validate_multi_du(struct skcipher_request *req)
+{
+	const unsigned int du = req->data_unit_size;
+	struct crypto_skcipher *tfm;
+	struct skcipher_alg *alg;
+	u32 cra_flags;
+
+	if (likely(!du))
+		return 0;
+	if (!is_power_of_2(du) || du < SKCIPHER_MDU_IVSIZE)
+		return -EINVAL;
+	if (!req->cryptlen || (req->cryptlen & (du - 1)))
+		return -EINVAL;
+
+	tfm = crypto_skcipher_reqtfm(req);
+	alg = crypto_skcipher_alg(tfm);
+
+	/* lskcipher's *_sg path doesn't honour data_unit_size. */
+	if (alg->co.base.cra_type != &crypto_skcipher_type)
+		return -EOPNOTSUPP;
+
+	/* Capability mismatch, not a malformed request: report -EOPNOTSUPP. */
+	if (crypto_skcipher_ivsize(tfm) != SKCIPHER_MDU_IVSIZE)
+		return -EOPNOTSUPP;
+
+	/* The auto-splitter is sync-only; native drivers own async dispatch. */
+	cra_flags = alg->co.base.cra_flags;
+	if ((cra_flags & CRYPTO_ALG_ASYNC) &&
+	    !(cra_flags & CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU))
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
 int crypto_skcipher_encrypt(struct skcipher_request *req)
 {
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 	struct skcipher_alg *alg = crypto_skcipher_alg(tfm);
+	int err;
 
 	if (crypto_skcipher_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
 		return -ENOKEY;
+	err = crypto_skcipher_validate_multi_du(req);
+	if (err)
+		return err;
 	if (alg->co.base.cra_type != &crypto_skcipher_type)
 		return crypto_lskcipher_encrypt_sg(req);
+	if (req->data_unit_size &&
+	    !(alg->co.base.cra_flags & CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU))
+		return skcipher_split_data_units(req, alg->encrypt);
 	return alg->encrypt(req);
 }
 EXPORT_SYMBOL_GPL(crypto_skcipher_encrypt);
@@ -449,11 +574,18 @@ int crypto_skcipher_decrypt(struct skcipher_request *req)
 {
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 	struct skcipher_alg *alg = crypto_skcipher_alg(tfm);
+	int err;
 
 	if (crypto_skcipher_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
 		return -ENOKEY;
+	err = crypto_skcipher_validate_multi_du(req);
+	if (err)
+		return err;
 	if (alg->co.base.cra_type != &crypto_skcipher_type)
 		return crypto_lskcipher_decrypt_sg(req);
+	if (req->data_unit_size &&
+	    !(alg->co.base.cra_flags & CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU))
+		return skcipher_split_data_units(req, alg->decrypt);
 	return alg->decrypt(req);
 }
 EXPORT_SYMBOL_GPL(crypto_skcipher_decrypt);
diff --git a/include/crypto/internal/skcipher.h b/include/crypto/internal/skcipher.h
index a965b6aabf61..4c826f3bc715 100644
--- a/include/crypto/internal/skcipher.h
+++ b/include/crypto/internal/skcipher.h
@@ -21,6 +21,16 @@
  */
 #define CRYPTO_ALG_SKCIPHER_REQSIZE_LARGE CRYPTO_ALG_OPTIONAL_KEY
 
+/*
+ * Set by an skcipher that handles skcipher_request::data_unit_size > 0
+ * natively in one pass; otherwise the API splits the request.  Lives in
+ * the type-specific 0xff000000 cra_flags range.  A native driver must
+ * derive per-DU IVs as a 128-bit LE counter and leave @iv at the
+ * caller-supplied starting value on return, success or error, matching
+ * the auto-splitter so the two paths are observably identical.
+ */
+#define CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU	0x01000000
+
 struct aead_request;
 struct rtattr;
 
diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
index 4efe2ca8c4d1..ced1fae08147 100644
--- a/include/crypto/skcipher.h
+++ b/include/crypto/skcipher.h
@@ -31,6 +31,11 @@ struct scatterlist;
 /**
  *	struct skcipher_request - Symmetric key cipher request
  *	@cryptlen: Number of bytes to encrypt or decrypt
+ *	@data_unit_size: Size in bytes of each data unit, or 0 for a
+ *		single-data-unit request (the default).  When non-zero,
+ *		must be a power of two, @cryptlen must be a positive
+ *		multiple of it, and per-DU IVs are derived from @iv as a
+ *		128-bit little-endian counter.
  *	@iv: Initialisation Vector
  *	@src: Source SG list
  *	@dst: Destination SG list
@@ -39,6 +44,7 @@ struct scatterlist;
  */
 struct skcipher_request {
 	unsigned int cryptlen;
+	unsigned int data_unit_size;
 
 	u8 *iv;
 
@@ -225,6 +231,7 @@ struct lskcipher_alg {
 	struct skcipher_request *name = \
 		(((struct skcipher_request *)__##name##_desc)->base.tfm = \
 			crypto_sync_skcipher_tfm((_tfm)), \
+		 ((struct skcipher_request *)__##name##_desc)->data_unit_size = 0, \
 		 (void *)__##name##_desc)
 
 /**
@@ -819,6 +826,8 @@ static inline void skcipher_request_set_tfm(struct skcipher_request *req,
 					    struct crypto_skcipher *tfm)
 {
 	req->base.tfm = crypto_skcipher_tfm(tfm);
+	/* Reused requests default to single-data-unit. */
+	req->data_unit_size = 0;
 }
 
 static inline void skcipher_request_set_sync_tfm(struct skcipher_request *req,
@@ -937,5 +946,24 @@ static inline void skcipher_request_set_crypt(
 	req->iv = iv;
 }
 
+/**
+ * skcipher_request_set_data_unit_size() - submit as multiple data units
+ * @req: request handle
+ * @data_unit_size: data-unit size in bytes (power of two), or 0 to disable
+ *
+ * Process @req as @cryptlen / @data_unit_size data units sharing one starting
+ * @iv, with per-DU IVs derived as a 128-bit little-endian counter.  @cryptlen
+ * must be a positive multiple of @data_unit_size, else the encrypt/decrypt
+ * call returns -EINVAL; a target that cannot do multi-DU (ivsize != 16, an
+ * lskcipher, or async without native support) returns -EOPNOTSUPP.  Unlike
+ * the single-DU path, @iv is preserved across the call regardless of outcome.
+ */
+static inline void
+skcipher_request_set_data_unit_size(struct skcipher_request *req,
+				    unsigned int data_unit_size)
+{
+	req->data_unit_size = data_unit_size;
+}
+
 #endif	/* _CRYPTO_SKCIPHER_H */
 

base-commit: a8cafdf8c949f17c92eca0045532e88ac0dac30d
-- 
2.47.3



* [PATCH v4 2/3] crypto: testmgr - test for multi-data-unit dispatch
From: Leonid Ravich @ 2026-06-15 11:14 UTC
  To: Herbert Xu
  Cc: Alasdair Kergon, Ard Biesheuvel, Eric Biggers, Jens Axboe,
	Horia Geanta, Gilad Ben-Yossef, linux-crypto, dm-devel,
	linux-block

Add a test that runs on every skcipher with ivsize == 16.  It
encrypts random plaintext two ways and compares:

  1. one batched request with skcipher_request_set_data_unit_size()
     set, over a deliberately fragmented scatterlist whose entries do
     not align to the data-unit size (so per-DU views cross SG entries
     and exercise the scatter_walk cursor), and
  2. an independent reference of N single-DU requests with IVs walked
     as a 128-bit LE counter, matching the convention documented in
     skcipher_request_set_data_unit_size().

The two must produce byte-identical ciphertext; this pins the IV
convention rather than only checking encrypt/decrypt symmetry.  The
batched ciphertext is then round-tripped back to plaintext, and the
caller IV is checked unchanged.  Iterates over typical data unit
sizes (512, 1024, 2048, 4096).

Algorithms the validator rejects for multi-DU return -EOPNOTSUPP on
the first call and skip cleanly; a genuine mismatch returns -EBADMSG
so it cannot be confused with a skip.
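The reference in (2) walks IVs with a byte-at-a-time carry, while the core's splitter increments two 64-bit little-endian halves.  This userspace sketch (helper names are hypothetical; the two increments mirror test_mdu_iv_inc() and skcipher_iv_inc_le128() from the series) checks that the independent implementations agree, including across the 64-bit and 128-bit carry boundaries.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* testmgr-style reference: byte-by-byte carry from the least significant byte. */
static void iv_inc_bytewise(uint8_t iv[16])
{
	for (int i = 0; i < 16; i++)
		if (++iv[i])
			break;
}

/* Read/write a little-endian u64 without relying on host byte order. */
static uint64_t get_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

static void put_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* Splitter-style: two LE u64 halves, carrying into the high half on wrap. */
static void iv_inc_halves(uint8_t iv[16])
{
	uint64_t lo = get_le64(iv) + 1;

	put_le64(iv, lo);
	if (lo == 0)
		put_le64(iv + 8, get_le64(iv + 8) + 1);
}

/* Returns 1 if both increments produce the same IV from @start. */
static int iv_inc_agree(const uint8_t start[16])
{
	uint8_t a[16], b[16];

	memcpy(a, start, 16);
	memcpy(b, start, 16);
	iv_inc_bytewise(a);
	iv_inc_halves(b);
	return memcmp(a, b, 16) == 0;
}
```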

Signed-off-by: Leonid Ravich <lravich@amazon.com>
---
 crypto/testmgr.c | 192 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 192 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 4d86efae65b2..5cbd0f4b070e 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3211,6 +3211,194 @@ static int test_skcipher(int enc, const struct cipher_test_suite *suite,
 	return 0;
 }
 
+/* Increment a 16-byte IV as a little-endian 128-bit counter. */
+static void test_mdu_iv_inc(u8 iv[16])
+{
+	int i;
+
+	for (i = 0; i < 16; i++)
+		if (++iv[i])
+			break;
+}
+
+/*
+ * Encrypt one du_size block with a plain single-DU request; used to
+ * build an independent reference for the batched dispatch.
+ */
+static int test_mdu_ref_encrypt(struct crypto_skcipher *tfm, const u8 *in,
+				u8 *out, unsigned int du_size, const u8 iv[16])
+{
+	struct skcipher_request *req;
+	struct scatterlist sg;
+	DECLARE_CRYPTO_WAIT(wait);
+	u8 ivbuf[16];
+	int err;
+
+	req = skcipher_request_alloc(tfm, GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+	memcpy(ivbuf, iv, 16);
+	memcpy(out, in, du_size);
+	sg_init_one(&sg, out, du_size);
+	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
+				      CRYPTO_TFM_REQ_MAY_SLEEP,
+				      crypto_req_done, &wait);
+	skcipher_request_set_crypt(req, &sg, &sg, du_size, ivbuf);
+	err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
+	skcipher_request_free(req);
+	return err;
+}
+
+/*
+ * Build a deliberately fragmented SG over @buf: entries that do not
+ * align to du_size, so the splitter's per-DU views cross SG entries
+ * and exercise the scatter_walk cursor.
+ */
+static void test_mdu_sg_fragment(struct scatterlist *sg, unsigned int nents,
+				 u8 *buf, unsigned int total)
+{
+	unsigned int chunk = total / nents;
+	unsigned int off = 0, i;
+
+	sg_init_table(sg, nents);
+	for (i = 0; i < nents; i++) {
+		unsigned int len = (i == nents - 1) ? total - off : chunk;
+
+		sg_set_buf(&sg[i], buf + off, len);
+		off += len;
+	}
+}
+
+/*
+ * Multi-DU test: verify the batched dispatch produces byte-identical
+ * ciphertext to an independent N x single-DU reference with per-DU IVs
+ * walked as a 128-bit LE counter (pins the IV convention, not just
+ * enc/dec symmetry), over a fragmented SG, then round-trips.  Real
+ * mismatches return -EBADMSG; ineligible algorithms skip via the
+ * validator's -EOPNOTSUPP.
+ */
+#define TEST_MDU_NR_UNITS	4
+#define TEST_MDU_NR_FRAGS	5
+static int test_skcipher_multi_du_one(struct crypto_skcipher *tfm,
+				      unsigned int du_size)
+{
+	const char *driver = crypto_skcipher_driver_name(tfm);
+	const unsigned int total = du_size * TEST_MDU_NR_UNITS;
+	struct skcipher_request *req = NULL;
+	struct scatterlist sg[TEST_MDU_NR_FRAGS];
+	DECLARE_CRYPTO_WAIT(wait);
+	u8 iv_orig[16], iv_work[16], iv_ref[16];
+	u8 *plain = NULL, *buf = NULL, *ref = NULL;
+	unsigned int u;
+	int err;
+
+	plain = kmalloc(total, GFP_KERNEL);
+	buf = kmalloc(total, GFP_KERNEL);
+	ref = kmalloc(total, GFP_KERNEL);
+	req = skcipher_request_alloc(tfm, GFP_KERNEL);
+	if (!plain || !buf || !ref || !req) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	get_random_bytes(plain, total);
+	get_random_bytes(iv_orig, sizeof(iv_orig));
+
+	/* Reference: per-DU single requests with LE128-walked IVs. */
+	memcpy(iv_ref, iv_orig, sizeof(iv_orig));
+	for (u = 0; u < TEST_MDU_NR_UNITS; u++) {
+		err = test_mdu_ref_encrypt(tfm, plain + u * du_size,
+					   ref + u * du_size, du_size, iv_ref);
+		/* First single-DU call reveals an ineligible algorithm. */
+		if (err == -EOPNOTSUPP && u == 0)
+			goto out;
+		if (err) {
+			pr_err("alg: skcipher: %s multi-DU ref encrypt failed (du=%u): %d\n",
+			       driver, du_size, err);
+			goto out;
+		}
+		test_mdu_iv_inc(iv_ref);
+	}
+
+	/* Batched: one request over a fragmented SG. */
+	memcpy(buf, plain, total);
+	memcpy(iv_work, iv_orig, sizeof(iv_orig));
+	test_mdu_sg_fragment(sg, TEST_MDU_NR_FRAGS, buf, total);
+	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
+				      CRYPTO_TFM_REQ_MAY_SLEEP,
+				      crypto_req_done, &wait);
+	skcipher_request_set_crypt(req, sg, sg, total, iv_work);
+	skcipher_request_set_data_unit_size(req, du_size);
+	err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
+	if (err == -EOPNOTSUPP)
+		goto out;
+	if (err) {
+		pr_err("alg: skcipher: %s multi-DU encrypt failed (du=%u): %d\n",
+		       driver, du_size, err);
+		goto out;
+	}
+	if (memcmp(buf, ref, total) != 0) {
+		pr_err("alg: skcipher: %s multi-DU ciphertext differs from single-DU reference (du=%u)\n",
+		       driver, du_size);
+		err = -EBADMSG;
+		goto out;
+	}
+	/* req->iv must be unchanged after multi-DU dispatch. */
+	if (memcmp(iv_work, iv_orig, sizeof(iv_orig)) != 0) {
+		pr_err("alg: skcipher: %s multi-DU encrypt mutated caller IV (du=%u)\n",
+		       driver, du_size);
+		err = -EBADMSG;
+		goto out;
+	}
+
+	/* Round-trip the batched ciphertext back to plaintext. */
+	test_mdu_sg_fragment(sg, TEST_MDU_NR_FRAGS, buf, total);
+	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
+				      CRYPTO_TFM_REQ_MAY_SLEEP,
+				      crypto_req_done, &wait);
+	skcipher_request_set_crypt(req, sg, sg, total, iv_work);
+	skcipher_request_set_data_unit_size(req, du_size);
+	err = crypto_wait_req(crypto_skcipher_decrypt(req), &wait);
+	if (err) {
+		pr_err("alg: skcipher: %s multi-DU decrypt failed (du=%u): %d\n",
+		       driver, du_size, err);
+		goto out;
+	}
+	if (memcmp(buf, plain, total) != 0) {
+		pr_err("alg: skcipher: %s multi-DU round-trip mismatch (du=%u)\n",
+		       driver, du_size);
+		err = -EBADMSG;
+	}
+
+out:
+	skcipher_request_free(req);
+	kfree(ref);
+	kfree(buf);
+	kfree(plain);
+	return err;
+}
+
+static int test_skcipher_multi_du(struct crypto_skcipher *tfm)
+{
+	static const unsigned int du_sizes[] = { 512, 1024, 2048, 4096 };
+	unsigned int j;
+	int err;
+
+	if (crypto_skcipher_ivsize(tfm) != 16)
+		return 0;
+
+	for (j = 0; j < ARRAY_SIZE(du_sizes); j++) {
+		err = test_skcipher_multi_du_one(tfm, du_sizes[j]);
+		/* Ineligible algorithms skip; real failures propagate. */
+		if (err == -EOPNOTSUPP)
+			return 0;
+		if (err)
+			return err;
+		cond_resched();
+	}
+	return 0;
+}
+
 static int alg_test_skcipher(const struct alg_test_desc *desc,
 			     const char *driver, u32 type, u32 mask)
 {
@@ -3259,6 +3447,10 @@ static int alg_test_skcipher(const struct alg_test_desc *desc,
 	if (err)
 		goto out;
 
+	err = test_skcipher_multi_du(tfm);
+	if (err)
+		goto out;
+
 	err = test_skcipher_vs_generic_impl(desc->generic_driver, req, tsgls);
 out:
 	free_cipher_test_sglists(tsgls);
-- 
2.47.3



* [PATCH v4 3/3] dm crypt: batch all sectors of a bio per crypto request
From: Leonid Ravich @ 2026-06-15 11:14 UTC
  To: Herbert Xu
  Cc: Alasdair Kergon, Ard Biesheuvel, Eric Biggers, Jens Axboe,
	Horia Geanta, Gilad Ben-Yossef, linux-crypto, dm-devel,
	linux-block

Submit one skcipher request per bio with
skcipher_request_set_data_unit_size(req, cc->sector_size) instead of
issuing one request per sector.  This removes per-sector overhead in
the crypto API hot path: request allocation, callback dispatch,
completion handling, and SG setup.

The optimisation is enabled automatically at table load when all
of the following hold:

 - the cipher is non-aead (i.e. skcipher), sync, tfms_count 1;
 - the IV mode advertises sector_iv_le128, i.e. its per-sector IV
   advances as a 128-bit LE counter, matching the convention
   documented in skcipher_request_set_data_unit_size().  Only plain64
   sets it today (its 64-bit LE counter extends correctly); plain is
   excluded as its 32-bit counter wraps differently across a
   2^32-sector boundary;
 - ivsize is 16 (the core rejects other sizes with -EOPNOTSUPP);
 - the iv_gen_ops->post() hook is unset;
 - dm-integrity is not stacked (no integrity tag or integrity IV).
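Why plain is excluded can be checked directly.  This userspace sketch rebuilds the plain and plain64 IV layouts as dm-crypt generates them (low 32 bits of the sector as le32, and the full sector as le64, respectively, rest zero) and compares each against the "+1 as a 128-bit LE counter" step that multi-DU applies.  The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* dm-crypt "plain" layout: low 32 bits of the sector as le32, rest zero. */
static void iv_plain(uint64_t sector, uint8_t iv[16])
{
	memset(iv, 0, 16);
	for (int i = 0; i < 4; i++)
		iv[i] = (uint8_t)(sector >> (8 * i));
}

/* dm-crypt "plain64" layout: the full sector as le64, rest zero. */
static void iv_plain64(uint64_t sector, uint8_t iv[16])
{
	memset(iv, 0, 16);
	for (int i = 0; i < 8; i++)
		iv[i] = (uint8_t)(sector >> (8 * i));
}

/* The step multi-DU applies between units: previous IV + 1 as LE128. */
static void iv_next_le128(uint8_t iv[16])
{
	for (int i = 0; i < 16; i++)
		if (++iv[i])
			break;
}
```

Going from sector 0xffffffff to 0x100000000, plain64 matches the counter step while plain does not: plain's IV wraps to all zeros, whereas the counter carries into byte 4.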

The cipher driver does not need to advertise anything: the crypto
API auto-splits multi-data-unit requests for drivers that cannot
handle them natively, so dm-crypt uses the same batched submission
contract regardless of the underlying driver.  Only drivers that set
CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU avoid the per-unit dispatch inside
the core.

A new CRYPT_MULTI_DATA_UNIT cipher_flag, set once at construction
time, gates the multi-data-unit dispatch.  The existing per-sector
path in crypt_convert_block_skcipher() is unchanged; the new
crypt_convert_block_skcipher_multi() is reached from a small
dispatch in crypt_convert() and shares the same backlog/-EBUSY/
-EINPROGRESS flow control with the per-sector path.

Heap-allocated scatterlists are stashed in dm_crypt_request and
freed in crypt_free_req_skcipher(), avoiding races between the
synchronous-success free path and async-completion reuse from the
request pool.  On scatterlist allocation failure the helper returns
-EAGAIN, and the core returns -EOPNOTSUPP if a driver turns out to be
unable to do multi-DU.  crypt_convert() handles both by clearing its
local multi_du flag and falling back to the per-sector path for the
rest of the current crypt_convert() invocation.  This ensures forward
progress on the swap-out-to-dm-crypt path even under total memory
exhaustion: the per-sector path uses only cc->req_pool (a mempool with
a reservoir set up at table-load time) and the inline
dmreq->sg_in[]/sg_out[] arrays, so it performs no allocation that
could fail.
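The fallback rule can be sketched as a userspace model.  struct convert_ops, convert_sectors() and the toy hooks are hypothetical and only approximate crypt_convert(); in particular, the -EBUSY/-EINPROGRESS backlog flow is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical hooks standing in for the batched and per-sector paths. */
struct convert_ops {
	int (*batched)(void *ctx, unsigned int first, unsigned int count);
	int (*one_sector)(void *ctx, unsigned int sector);
};

/*
 * Try the batched path while @multi_du is set; on -EOPNOTSUPP or -EAGAIN
 * clear the local flag and finish the remaining sectors one at a time.
 */
static int convert_sectors(const struct convert_ops *ops, void *ctx,
			   unsigned int nsectors, bool multi_du)
{
	unsigned int s = 0;

	assert(ops && ops->batched && ops->one_sector);
	while (s < nsectors) {
		int err;

		if (multi_du) {
			err = ops->batched(ctx, s, nsectors - s);
			if (err == -EOPNOTSUPP || err == -EAGAIN) {
				multi_du = false;	/* per-sector for the rest of this call */
				continue;
			}
			if (err)
				return err;
			s = nsectors;		/* every remaining sector handled */
			continue;
		}
		err = ops->one_sector(ctx, s);
		if (err)
			return err;
		s++;
	}
	return 0;
}

/* Toy hooks for a quick check: batching always "fails to allocate". */
static int toy_batched_eagain(void *ctx, unsigned int first, unsigned int count)
{
	(void)ctx;
	(void)first;
	(void)count;
	return -EAGAIN;
}

static int toy_count_sector(void *ctx, unsigned int sector)
{
	(void)sector;
	++*(unsigned int *)ctx;
	return 0;
}
```

With toy_batched_eagain() the first batched attempt fails, the flag is cleared, and all eight sectors of the example go through the per-sector hook.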

Verified end-to-end with a byte-equivalence test: encrypted output
of plain64 dm-crypt with the multi-data-unit path matches output of
the single-data-unit path bit-for-bit over a 256 MB device, with
xts-aes-aesni driving the auto-split path.

Signed-off-by: Leonid Ravich <lravich@amazon.com>
---
 drivers/md/dm-crypt.c | 215 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 207 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 608b617fb817..bfb98dd876d7 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -101,6 +101,9 @@ struct dm_crypt_request {
 	struct scatterlist sg_in[4];
 	struct scatterlist sg_out[4];
 	u64 iv_sector;
+	/* Multi-data-unit SG arrays, NULL when sg_in[]/sg_out[] suffice. */
+	struct scatterlist *sg_in_ext;
+	struct scatterlist *sg_out_ext;
 };
 
 struct crypt_config;
@@ -115,6 +118,12 @@ struct crypt_iv_operations {
 			 struct dm_crypt_request *dmreq);
 	void (*post)(struct crypt_config *cc, u8 *iv,
 		     struct dm_crypt_request *dmreq);
+	/*
+	 * The per-sector IV advances as a 128-bit LE counter, so a bio's
+	 * consecutive sectors share one starting IV and can be batched into
+	 * a single skcipher request via data_unit_size.
+	 */
+	bool sector_iv_le128;
 };
 
 struct iv_benbi_private {
@@ -151,6 +160,7 @@ enum cipher_flags {
 	CRYPT_IV_LARGE_SECTORS,		/* Calculate IV from sector_size, not 512B sectors */
 	CRYPT_ENCRYPT_PREPROCESS,	/* Must preprocess data for encryption (elephant) */
 	CRYPT_KEY_MAC_SIZE_SET,		/* The integrity_key_size option was used */
+	CRYPT_MULTI_DATA_UNIT,		/* Batch all sectors of a bio per crypto request */
 };
 
 /*
@@ -1018,7 +1028,8 @@ static const struct crypt_iv_operations crypt_iv_plain_ops = {
 };
 
 static const struct crypt_iv_operations crypt_iv_plain64_ops = {
-	.generator = crypt_iv_plain64_gen
+	.generator = crypt_iv_plain64_gen,
+	.sector_iv_le128 = true,
 };
 
 static const struct crypt_iv_operations crypt_iv_plain64be_ops = {
@@ -1426,12 +1437,126 @@ static int crypt_convert_block_skcipher(struct crypt_config *cc,
 	return r;
 }
 
+/*
+ * Submit all remaining sectors of the current bio in one skcipher request.
+ * Same return convention as crypt_convert_block_skcipher() except for
+ * -EAGAIN, which the caller must treat as "disable multi-DU and re-enter
+ * the per-sector path" so swap-out-to-dm-crypt always makes forward
+ * progress on the mempool reserve.
+ */
+static int crypt_convert_block_skcipher_multi(struct crypt_config *cc,
+					      struct convert_context *ctx,
+					      struct skcipher_request *req,
+					      unsigned int *out_processed)
+{
+	const unsigned int sector_size = cc->sector_size;
+	const gfp_t gfp = GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN;
+	unsigned int total = ctx->iter_in.bi_size;
+	unsigned int n_sg_in = 0, n_sg_out = 0;
+	struct dm_crypt_request *dmreq = dmreq_of_req(cc, req);
+	struct scatterlist *sg_in = NULL, *sg_out = NULL;
+	struct bvec_iter iter_in, iter_out;
+	struct bio_vec bv;
+	u8 *iv, *org_iv;
+	int r;
+
+	if (WARN_ON_ONCE(ctx->iter_in.bi_size != ctx->iter_out.bi_size))
+		return -EIO;
+	if (unlikely(total & (sector_size - 1)))
+		return -EIO;
+
+	iter_in = ctx->iter_in;
+	iter_in.bi_size = total;
+	__bio_for_each_bvec(bv, ctx->bio_in, iter_in, iter_in)
+		n_sg_in++;
+
+	iter_out = ctx->iter_out;
+	iter_out.bi_size = total;
+	__bio_for_each_bvec(bv, ctx->bio_out, iter_out, iter_out)
+		n_sg_out++;
+
+	sg_in = kmalloc_array(n_sg_in, sizeof(*sg_in), gfp);
+	sg_out = (ctx->bio_in == ctx->bio_out) ? sg_in :
+		 kmalloc_array(n_sg_out, sizeof(*sg_out), gfp);
+	if (!sg_in || !sg_out) {
+		kfree(sg_in);
+		if (sg_out != sg_in)
+			kfree(sg_out);
+		return -EAGAIN;
+	}
+
+	sg_init_table(sg_in, n_sg_in);
+	{
+		unsigned int i = 0;
+
+		iter_in = ctx->iter_in;
+		iter_in.bi_size = total;
+		__bio_for_each_bvec(bv, ctx->bio_in, iter_in, iter_in)
+			sg_set_page(&sg_in[i++], bv.bv_page, bv.bv_len,
+				    bv.bv_offset);
+	}
+
+	if (sg_out != sg_in) {
+		unsigned int i = 0;
+
+		sg_init_table(sg_out, n_sg_out);
+		iter_out = ctx->iter_out;
+		iter_out.bi_size = total;
+		__bio_for_each_bvec(bv, ctx->bio_out, iter_out, iter_out)
+			sg_set_page(&sg_out[i++], bv.bv_page, bv.bv_len,
+				    bv.bv_offset);
+	}
+
+	dmreq->iv_sector = ctx->cc_sector;
+	if (test_bit(CRYPT_IV_LARGE_SECTORS, &cc->cipher_flags))
+		dmreq->iv_sector >>= cc->sector_shift;
+	dmreq->ctx = ctx;
+
+	iv = iv_of_dmreq(cc, dmreq);
+	org_iv = org_iv_of_dmreq(cc, dmreq);
+	r = cc->iv_gen_ops->generator(cc, org_iv, dmreq);
+	if (r < 0)
+		goto out_free_sg;
+	memcpy(iv, org_iv, cc->iv_size);
+
+	dmreq->sg_in_ext = sg_in;
+	dmreq->sg_out_ext = (sg_out == sg_in) ? NULL : sg_out;
+
+	skcipher_request_set_crypt(req, sg_in, sg_out, total, iv);
+	skcipher_request_set_data_unit_size(req, sector_size);
+
+	if (bio_data_dir(ctx->bio_in) == WRITE)
+		r = crypto_skcipher_encrypt(req);
+	else
+		r = crypto_skcipher_decrypt(req);
+
+	/*
+	 * Sync error: kcryptd_async_done won't run, so free the SG
+	 * arrays here.  Async returns (-EINPROGRESS, -EBUSY) hand
+	 * ownership to the completion callback.
+	 */
+	if (r && r != -EINPROGRESS && r != -EBUSY)
+		goto out_free_sg;
+
+	*out_processed = total;
+	return r;
+
+out_free_sg:
+	kfree(sg_in);
+	if (sg_out != sg_in)
+		kfree(sg_out);
+	dmreq->sg_in_ext = NULL;
+	dmreq->sg_out_ext = NULL;
+	return r;
+}
+
 static void kcryptd_async_done(void *async_req, int error);
 
 static int crypt_alloc_req_skcipher(struct crypt_config *cc,
 				     struct convert_context *ctx)
 {
 	unsigned int key_index = ctx->cc_sector & (cc->tfms_count - 1);
+	struct dm_crypt_request *dmreq;
 
 	if (!ctx->r.req) {
 		ctx->r.req = mempool_alloc(&cc->req_pool, in_interrupt() ? GFP_ATOMIC : GFP_NOIO);
@@ -1441,6 +1566,11 @@ static int crypt_alloc_req_skcipher(struct crypt_config *cc,
 
 	skcipher_request_set_tfm(ctx->r.req, cc->cipher_tfm.tfms[key_index]);
 
+	/* Multi-DU SG arrays are owned by the helper that allocates them. */
+	dmreq = dmreq_of_req(cc, ctx->r.req);
+	dmreq->sg_in_ext = NULL;
+	dmreq->sg_out_ext = NULL;
+
 	/*
 	 * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
 	 * requests if driver request queue is full.
@@ -1487,6 +1617,12 @@ static void crypt_free_req_skcipher(struct crypt_config *cc,
 				    struct skcipher_request *req, struct bio *base_bio)
 {
 	struct dm_crypt_io *io = dm_per_bio_data(base_bio, cc->per_bio_data_size);
+	struct dm_crypt_request *dmreq = dmreq_of_req(cc, req);
+
+	kfree(dmreq->sg_in_ext);
+	dmreq->sg_in_ext = NULL;
+	kfree(dmreq->sg_out_ext);
+	dmreq->sg_out_ext = NULL;
 
 	if ((struct skcipher_request *)(io + 1) != req)
 		mempool_free(req, &cc->req_pool);
@@ -1515,7 +1651,9 @@ static void crypt_free_req(struct crypt_config *cc, void *req, struct bio *base_
 static blk_status_t crypt_convert(struct crypt_config *cc,
 			 struct convert_context *ctx, bool atomic, bool reset_pending)
 {
-	unsigned int sector_step = cc->sector_size >> SECTOR_SHIFT;
+	const unsigned int sector_step = cc->sector_size >> SECTOR_SHIFT;
+	bool multi_du = test_bit(CRYPT_MULTI_DATA_UNIT, &cc->cipher_flags);
+	unsigned int processed;
 	int r;
 
 	/*
@@ -1536,8 +1674,13 @@ static blk_status_t crypt_convert(struct crypt_config *cc,
 
 		atomic_inc(&ctx->cc_pending);
 
+		processed = cc->sector_size;
 		if (crypt_integrity_aead(cc))
 			r = crypt_convert_block_aead(cc, ctx, ctx->r.req_aead, ctx->tag_offset);
+		else if (multi_du)
+			r = crypt_convert_block_skcipher_multi(cc, ctx,
+							       ctx->r.req,
+							       &processed);
 		else
 			r = crypt_convert_block_skcipher(cc, ctx, ctx->r.req, ctx->tag_offset);
 
@@ -1559,8 +1702,19 @@ static blk_status_t crypt_convert(struct crypt_config *cc,
 					 * exit and continue processing in a workqueue
 					 */
 					ctx->r.req = NULL;
-					ctx->tag_offset++;
-					ctx->cc_sector += sector_step;
+					if (!multi_du) {
+						ctx->tag_offset++;
+						ctx->cc_sector += sector_step;
+					} else {
+						bio_advance_iter(ctx->bio_in,
+								 &ctx->iter_in,
+								 processed);
+						bio_advance_iter(ctx->bio_out,
+								 &ctx->iter_out,
+								 processed);
+						ctx->cc_sector +=
+							processed >> SECTOR_SHIFT;
+					}
 					return BLK_STS_DEV_RESOURCE;
 				}
 			} else {
@@ -1574,19 +1728,41 @@ static blk_status_t crypt_convert(struct crypt_config *cc,
 		 */
 		case -EINPROGRESS:
 			ctx->r.req = NULL;
-			ctx->tag_offset++;
-			ctx->cc_sector += sector_step;
+			if (!multi_du) {
+				ctx->tag_offset++;
+				ctx->cc_sector += sector_step;
+			} else {
+				bio_advance_iter(ctx->bio_in, &ctx->iter_in,
+						 processed);
+				bio_advance_iter(ctx->bio_out, &ctx->iter_out,
+						 processed);
+				ctx->cc_sector += processed >> SECTOR_SHIFT;
+			}
 			continue;
 		/*
 		 * The request was already processed (synchronously).
 		 */
 		case 0:
 			atomic_dec(&ctx->cc_pending);
-			ctx->cc_sector += sector_step;
-			ctx->tag_offset++;
+			if (!multi_du) {
+				ctx->cc_sector += sector_step;
+				ctx->tag_offset++;
+			} else {
+				bio_advance_iter(ctx->bio_in, &ctx->iter_in,
+						 processed);
+				bio_advance_iter(ctx->bio_out, &ctx->iter_out,
+						 processed);
+				ctx->cc_sector += processed >> SECTOR_SHIFT;
+			}
 			if (!atomic)
 				cond_resched();
 			continue;
+		/* Multi-DU unavailable (no SG memory, or core refused): fall back. */
+		case -EAGAIN:
+		case -EOPNOTSUPP:
+			atomic_dec(&ctx->cc_pending);
+			multi_du = false;
+			continue;
 		/*
 		 * There was a data integrity error.
 		 */
@@ -3063,6 +3239,29 @@ static int crypt_ctr_cipher(struct dm_target *ti, char *cipher_in, char *key)
 		}
 	}
 
+	/*
+	 * Enable multi-data-unit batching only when per-DU IVs can be
+	 * derived from one starting IV as a 128-bit LE counter, matching
+	 * skcipher_request_set_data_unit_size().  Only IV modes flagged
+	 * sector_iv_le128 qualify (plain64; not plain, whose 32-bit counter
+	 * wraps differently across a 2^32-sector boundary).  ivsize must be
+	 * 16 (the core rejects otherwise) and the cipher must be sync,
+	 * single-tfm, no integrity, no per-sector post() hook.  The driver
+	 * advertises nothing: the core auto-splits for drivers that lack
+	 * native support.
+	 */
+	if (!crypt_integrity_aead(cc) && cc->tfms_count == 1 &&
+	    cc->iv_gen_ops && cc->iv_gen_ops->sector_iv_le128 &&
+	    !cc->iv_gen_ops->post &&
+	    !cc->integrity_tag_size && !cc->integrity_iv_size &&
+	    crypto_skcipher_ivsize(any_tfm(cc)) == 16 &&
+	    !(crypto_skcipher_alg(any_tfm(cc))->base.cra_flags &
+	      CRYPTO_ALG_ASYNC)) {
+		set_bit(CRYPT_MULTI_DATA_UNIT, &cc->cipher_flags);
+		DMINFO("Using multi-data-unit crypto offload (du=%u)",
+		       cc->sector_size);
+	}
+
 	/* wipe the kernel key payload copy */
 	if (cc->key_string)
 		memset(cc->key, 0, cc->key_size * sizeof(u8));
-- 
2.47.3


Thread overview: 4+ messages
2026-06-15 11:14 [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching Leonid Ravich
2026-06-15 11:14 ` [PATCH v4 1/3] crypto: skcipher - add per-request data_unit_size with auto-splitting Leonid Ravich
2026-06-15 11:14 ` [PATCH v4 2/3] crypto: testmgr - test for multi-data-unit dispatch Leonid Ravich
2026-06-15 11:14 ` [PATCH v4 3/3] dm crypt: batch all sectors of a bio per crypto request Leonid Ravich
