Linux cryptographic layer development
* [PATCH v5 0/5] crypto: skcipher - multi-data-unit dispatch as a template
@ 2026-06-30  8:34 Leonid Ravich
  2026-06-30  8:34 ` [PATCH v5 1/5] crypto: skcipher - add per-request data_unit_size Leonid Ravich
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Leonid Ravich @ 2026-06-30  8:34 UTC
  To: linux-crypto, dm-devel
  Cc: linux-block, linux-kernel, herbert, davem, ebiggers, snitzer,
	mpatocka, axboe

This is v5. It reworks the multi-data-unit support from the in-core
auto-splitter of v4 into a crypto template, dun(...), addressing the v4
review: there is now no added cost on the core skcipher path, no
per-algorithm capability flag, and the per-data-unit split lives in an
algorithm rather than in crypto_skcipher_encrypt/decrypt -- the shape
Herbert suggested, which removes the "overhead for everyone" Eric
objected to.

v4: https://lore.kernel.org/linux-crypto/20260615105022.8025-1-lravich@amazon.com/

Model
-----

A skcipher_request gains a data_unit_size field (patch 1). When set,
the request covers cryptlen / data_unit_size data units sharing one
starting IV; per-unit IVs are derived from the IV as a wide
data-unit-number (DUN) counter -- the convention blk-crypto already
uses for inline encryption.

dun(...) (patch 2) is a template that wraps an inner skcipher whose IV
is that counter (e.g. dun(xts(aes),le)). Its ->encrypt/->decrypt split
the request into one inner call per data unit, walking the IV +1 each
unit; each inner call is direct, so only the outer dispatch into the API
is indirect. A plain skcipher is unchanged and ignores data_unit_size,
so existing callers pay nothing -- the field is inert and the core
en/decrypt path is untouched.
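
The per-unit walk described above can be sketched in userspace as
follows. This is an illustration only, not the code in crypto/dun.c:
split_units(), unit_fn, struct unit_log and record_unit() are
hypothetical names, the callback stands in for one inner encrypt call,
and the counter is a simple byte-wise little-endian increment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_IVSIZE 32

/* Hypothetical per-unit callback: stands in for one inner cipher call. */
typedef int (*unit_fn)(uint8_t *data, size_t len, const uint8_t *iv,
		       size_t ivsize, void *priv);

/* Little-endian +1: byte 0 is least significant, carry moves upward. */
static void iv_next_le(uint8_t *iv, size_t ivsize)
{
	size_t i;

	for (i = 0; i < ivsize; i++)
		if (++iv[i])
			break;	/* no carry into the next byte */
}

/*
 * Split @cryptlen bytes into cryptlen / du units and call @fn once per
 * unit.  The caller's @iv is never modified; each unit sees a fresh copy
 * of the running counter.  Returns -1 for a malformed request.
 */
static int split_units(uint8_t *data, size_t cryptlen, size_t du,
		       const uint8_t *iv, size_t ivsize, unit_fn fn,
		       void *priv)
{
	uint8_t ctr[MAX_IVSIZE], unit_iv[MAX_IVSIZE];
	size_t off;
	int err;

	if (!cryptlen || !du || cryptlen % du || !ivsize ||
	    ivsize > MAX_IVSIZE)
		return -1;

	memcpy(ctr, iv, ivsize);
	for (off = 0; off < cryptlen; off += du) {
		memcpy(unit_iv, ctr, ivsize);
		err = fn(data + off, du, unit_iv, ivsize, priv);
		if (err)
			return err;	/* any failure is terminal */
		iv_next_le(ctr, ivsize);
	}
	return 0;
}

/* Example callback: count the units and remember the last IV's low byte. */
struct unit_log {
	unsigned int calls;
	uint8_t last_iv0;
};

static int record_unit(uint8_t *data, size_t len, const uint8_t *iv,
		       size_t ivsize, void *priv)
{
	struct unit_log *log = priv;

	(void)data;
	(void)len;
	(void)ivsize;
	log->calls++;
	log->last_iv0 = iv[0];
	return 0;
}
```

With a 4096-byte buffer and a 512-byte data unit, the callback runs
eight times and the last unit sees the starting IV plus seven.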

The second template parameter selects how the per-unit IV advances. A
unit's IV differs from its neighbour's by a +1 step in exactly one of
two ways, little- or big-endian, so dun(...,le) / dun(...,be) is a
closed parameter space, not an open-ended set of "IV types". Internally
each is one row of a small struct dun_mode op table (an iv_next walk
plus an ivsize predicate); adding a future convention -- e.g. a
width-bounded counter, or an affine sector<<shift+k step -- is one row,
with the dispatch loop unchanged. IV constructions that are not such a
counter are simply not wrapped (the consumer keeps its per-unit path);
an IV that is encrypted (essiv) composes as the inner algorithm,
dun(essiv(...),le), since the encryption already lives in that inner
template.
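
To make the "one row" claim concrete, here is a userspace sketch of
such an op table. struct mode_row, find_mode() and the helper names are
hypothetical stand-ins for the shape described above. The third row,
"le64", is purely an invented illustration of a width-bounded counter
and is not part of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One row per counter convention: a name, a walk, and a size predicate. */
struct mode_row {
	const char *name;
	void (*iv_next)(uint8_t *iv, size_t ivsize);
	int (*ivsize_ok)(size_t ivsize);
};

/* Full-width little-endian +1 (byte 0 least significant). */
static void next_le(uint8_t *iv, size_t ivsize)
{
	size_t i;

	for (i = 0; i < ivsize; i++)
		if (++iv[i])
			break;
}

/* Full-width big-endian +1 (last byte least significant). */
static void next_be(uint8_t *iv, size_t ivsize)
{
	size_t i = ivsize;

	while (i--)
		if (++iv[i])
			break;
}

/* Hypothetical: only the low 8 bytes count; a wrap does not carry out. */
static void next_le64_bounded(uint8_t *iv, size_t ivsize)
{
	size_t i;

	(void)ivsize;
	for (i = 0; i < 8; i++)
		if (++iv[i])
			break;
}

static int whole_limbs(size_t ivsize)
{
	return ivsize % 8 == 0;
}

static int at_least_one_limb(size_t ivsize)
{
	return ivsize >= 8;
}

static const struct mode_row mode_rows[] = {
	{ "le",   next_le,           whole_limbs },
	{ "be",   next_be,           whole_limbs },
	{ "le64", next_le64_bounded, at_least_one_limb }, /* invented row */
};

/* Look a mode up by the template's second parameter. */
static const struct mode_row *find_mode(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(mode_rows) / sizeof(mode_rows[0]); i++)
		if (!strcmp(name, mode_rows[i].name))
			return &mode_rows[i];
	return NULL;
}
```

The dispatch loop only ever calls row->iv_next(), so adding the third
row required no change outside the table.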

Why a template
--------------

  - No core cost for anyone. crypto_skcipher_encrypt/decrypt are stock;
    only a dun() tfm reads data_unit_size. (addresses Eric's "adds
    checks/overhead for everyone")

  - No capability flag. A hardware engine that handles a whole multi-DU
    request in one pass registers its own dun(xts(aes),le) at a higher
    cra_priority and is picked automatically -- exactly how
    xts-aes-aesni already beats generic xts. No CRYPTO_ALG_* bit, no
    core branch choosing native-vs-split. Such a native driver may also
    be async (it owns its dispatch); only the generic template is
    sync-only.

  - The split is in the algorithm. (the direction Herbert described)

  - It is the same kind of wrapper crypto/ already has. Like cryptd()
    (async dispatch) and pcrypt() (parallel dispatch), dun() wraps an
    inner skcipher and changes only how the request is dispatched --
    here, split across data units -- performing no cipher transform of
    its own.

  - It is a reusable primitive, not a dm-crypt feature. Two in-tree
    consumers are included: dm-crypt (patch 4) and blk-crypto-fallback
    (patch 5), which both open-code the per-DUN loop today; fscrypt's
    direct (non-inline) path open-codes the same loop and could follow.
    A HW engine is a provider via cra_priority. Consumers and providers
    are decoupled through one named algorithm.
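
The priority-based selection mentioned in the list above can be
modelled as below. This is a simplified userspace sketch, not the
crypto API's lookup code: struct alg_entry and pick_alg() are
hypothetical, the driver names and priority values are invented, and
the real lookup also considers type/mask flags and driver-name matches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct alg_entry {
	const char *cra_name;		/* what consumers ask for */
	const char *cra_driver_name;	/* unique per implementation */
	int cra_priority;
};

/* Return the highest-priority entry whose cra_name matches, or NULL. */
static const struct alg_entry *pick_alg(const struct alg_entry *algs,
					size_t n, const char *name)
{
	const struct alg_entry *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(algs[i].cra_name, name))
			continue;
		if (!best || algs[i].cra_priority > best->cra_priority)
			best = &algs[i];
	}
	return best;
}

/* Invented registrations: a generic template instance and a HW driver. */
static const struct alg_entry registered[] = {
	{ "dun(xts(aes),le)", "dun(xts-aes-generic,le)", 100 },
	{ "dun(xts(aes),le)", "dun-xts-aes-examplehw",   400 },
	{ "dun(xts(aes),be)", "dun(xts-aes-generic,be)", 100 },
};
```

A consumer asking for "dun(xts(aes),le)" gets the hypothetical hardware
driver when it is registered and the generic template instance
otherwise, without any capability flag.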

What it does and does not buy
-----------------------------

On a software cipher this is not a throughput win: the generic template
still issues one inner encrypt per data unit, so the AES compute is
unchanged. It removes per-request overhead and the consumer's
open-coded per-unit loop, and its output is byte-for-byte identical to
the per-sector path (see Verification). The win is for a one-pass
provider; no software throughput gain is claimed.

dm-crypt consumer (patch 4)
---------------------------

dm-crypt submits one request per contiguous bio segment with
data_unit_size = cc->sector_size (e.g. the default 512-byte sector with
a 4 KiB bio_vec -> one request of 8 data units), using only its existing
inline single-entry scatterlist -- no per-bio allocation, no regression.
It allocates dun(<cipher>,<endian>) instead of the bare cipher when the
config can form the DUN counter: a counter IV mode (plain64 -> le,
plain64be -> be; essiv/lmk/tcw etc. are not plain counters and stay
per-sector), single-tfm, non-aead, sector_size 512 or iv_large_sectors.
DM_CRYPT selects CRYPTO_DUN and the template resolves against a sync
inner, so there is no acceptable wrap failure the bare cipher would
survive; an integrity config keeps an inert dun() wrapper but never
batches (one inner call per request == the per-sector path).
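
The eligibility rule above can be sketched as a small userspace
predicate. struct dm_cfg and its fields are a hypothetical summary of a
dm-crypt configuration, not the real struct crypt_config, and
dun_endian(), dun_name() and units_per_request() are names made up for
this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of a dm-crypt mapping's crypto configuration. */
struct dm_cfg {
	const char *cipher;		/* e.g. "xts(aes)" */
	const char *ivmode;		/* e.g. "plain64" */
	unsigned int tfms_count;	/* 1 unless multikey */
	bool aead;
	unsigned int sector_size;
	bool iv_large_sectors;
};

/* Map a counter IV mode to the dun() endianness, or NULL if not a counter. */
static const char *dun_endian(const char *ivmode)
{
	if (!strcmp(ivmode, "plain64"))
		return "le";
	if (!strcmp(ivmode, "plain64be"))
		return "be";
	return NULL;	/* essiv, lmk, tcw, ... stay per-sector */
}

/* Write "dun(<cipher>,<endian>)" to @out if eligible; 0 on success, else -1. */
static int dun_name(const struct dm_cfg *cfg, char *out, size_t outlen)
{
	const char *endian = dun_endian(cfg->ivmode);
	int n;

	if (!endian || cfg->tfms_count != 1 || cfg->aead)
		return -1;
	if (cfg->sector_size != 512 && !cfg->iv_large_sectors)
		return -1;

	n = snprintf(out, outlen, "dun(%s,%s)", cfg->cipher, endian);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Data units covered by one request over a contiguous segment. */
static unsigned int units_per_request(unsigned int seg_len,
				      unsigned int sector_size)
{
	return seg_len / sector_size;
}
```

For the default case in the text, units_per_request(4096, 512) gives
the 8 data units carried by a single request.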

blk-crypto-fallback consumer (patch 5)
--------------------------------------

Every blk-crypto inline-encryption mode feeds the DUN as a little-endian
counter, so the fallback wraps its cipher as dun(<cipher>,le)
unconditionally (BLK_INLINE_ENCRYPTION_FALLBACK selects CRYPTO_DUN).
Because the template handles any counter width up to 32 bytes, this
covers all four modes -- AES-256-XTS, AES-128-CBC-ESSIV, Adiantum
(32-byte IV) and SM4-XTS -- and the open-coded per-unit loop is removed
from both the encrypt and decrypt paths.
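
As a rough illustration of why one little-endian rule covers every
mode, the sketch below serialises a DUN held as 64-bit words into IV
bytes and checks the width rule. It rests on assumptions not stated in
this letter: that the DUN is carried as four 64-bit words, that the
per-mode IV sizes are 16, 16, 32 and 16 bytes, and that the SM4 mode is
named xts(sm4). All function and table names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DUN_WORDS	4	/* assumed: DUN carried as 4 x 64-bit words */
#define MAX_IVSIZE	32

/* Serialise the DUN words as consecutive little-endian 64-bit limbs. */
static void dun_to_iv(const uint64_t dun[DUN_WORDS], uint8_t iv[MAX_IVSIZE])
{
	size_t w, b;

	for (w = 0; w < DUN_WORDS; w++)
		for (b = 0; b < 8; b++)
			iv[w * 8 + b] = (uint8_t)(dun[w] >> (8 * b));
}

/* Width rule from this series: whole 64-bit limbs, at most 32 bytes. */
static int ivsize_accepted(size_t ivsize)
{
	return ivsize && ivsize <= MAX_IVSIZE && ivsize % 8 == 0;
}

/* Illustrative table; the IV sizes are assumptions, see the text above. */
struct mode_info {
	const char *wrapped;
	size_t ivsize;
};

static const struct mode_info fallback_modes[] = {
	{ "dun(xts(aes),le)",                16 },
	{ "dun(essiv(cbc(aes),sha256),le)",  16 },
	{ "dun(adiantum(xchacha12,aes),le)", 32 },
	{ "dun(xts(sm4),le)",                16 },
};

/* Return 1 if every listed mode passes the width rule. */
static int all_modes_accepted(void)
{
	size_t i;

	for (i = 0; i < sizeof(fallback_modes) / sizeof(fallback_modes[0]); i++)
		if (!ivsize_accepted(fallback_modes[i].ivsize))
			return 0;
	return 1;
}
```

A mode with a shorter IV would simply use the leading bytes of the
serialised counter in this model.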

Verification
------------

Regression protocol in the tree, on x86 + arm64 under qemu:

  - Build clean and checkpatch strict clean (the lone warning is the
    new-file MAINTAINERS reminder; crypto/ is an F: catch-all).

  - testmgr dun() cross-check (batched == N x single-DU reference over a
    fragmented scatterlist, plus a boundary-seeded IV that forces a
    carry across a 64-bit limb / byte run) for every accepted ivsize
    including 32 (Adiantum) in BOTH dun(...,le) and dun(...,be), so the
    big-endian counter path is exercised independently of any consumer.

  - An AF_ALG probe forces the dun() cross-check to run for each
    blk-crypto inner cipher (dun(essiv(cbc(aes),sha256),le),
    dun(adiantum(xchacha12,aes),le), ...).

  - dm-crypt plain64/plain64be activate dun() (le/be); essiv / plain
    fall back.

  - Negative gates: multikey and integrity are not batched.

  - plain64 and plain64be round-trips and a 4096-byte iv_large_sectors
    round-trip.

  - Low-memory; arm64 functional.

  - An end-to-end blk-crypto-fallback test (ext4 + fscrypt
    -o inlinecrypt with no inline HW, driving dun(xts,le) and verifying
    a post-cache-drop round-trip).

  - Byte-equivalence: ciphertext is bit-identical to an unpatched
    axboe/for-next baseline (sha256 4913910b...43efc0 le,
    da0869a9...63004 be).
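
The idea behind the boundary-seeded IV check can be shown in userspace:
a limb-wise little-endian walk and an independent byte-wise walk only
agree if the carry between 64-bit limbs is handled correctly. The
sketch below is not the testmgr code; inc_le_limbs(), inc_le_bytes(),
seed_boundary_le() and walks_agree() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Limb-wise walk: +1 on each 64-bit little-endian limb, low limb first. */
static void inc_le_limbs(uint8_t *iv, size_t ivsize)
{
	size_t i;
	int b;

	for (i = 0; i < ivsize; i += 8) {
		uint64_t v = 0;

		for (b = 7; b >= 0; b--)	/* decode the LE limb portably */
			v = (v << 8) | iv[i + b];
		v++;
		for (b = 0; b < 8; b++)		/* store it back as LE */
			iv[i + b] = (uint8_t)(v >> (8 * b));
		if (v != 0)
			break;			/* no carry into the next limb */
	}
}

/* Independent reference: byte-wise +1 with carry, byte 0 least significant. */
static void inc_le_bytes(uint8_t *iv, size_t ivsize)
{
	size_t i;

	for (i = 0; i < ivsize; i++)
		if (++iv[i])
			break;
}

/* Seed the low limb to ff..fe so that the second increment carries out. */
static void seed_boundary_le(uint8_t *iv)
{
	memset(iv, 0xff, 8);
	iv[0] = 0xfe;
}

/* Walk both implementations @steps times; return 1 if they always agree. */
static int walks_agree(const uint8_t *start, size_t ivsize,
		       unsigned int steps)
{
	uint8_t a[32], b[32];
	unsigned int s;

	if (ivsize > sizeof(a) || ivsize % 8)
		return 0;
	memcpy(a, start, ivsize);
	memcpy(b, start, ivsize);
	for (s = 0; s < steps; s++) {
		inc_le_limbs(a, ivsize);
		inc_le_bytes(b, ivsize);
		if (memcmp(a, b, ivsize))
			return 0;
	}
	return 1;
}
```

Starting from the seeded value, the first increment fills the low limb
with ones and the second wraps it to zero and bumps the next limb, which
is the transition a random IV would almost never hit.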

Changes since v4
----------------

- The in-core auto-splitter and validator are gone; multi-DU dispatch is
  the dun(...) template. crypto_skcipher_encrypt/decrypt revert to
  stock, so there is no added cost on the core path.
- The CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU capability flag is dropped; a
  native one-pass driver is selected by cra_priority instead.
- The template is dun(<inner>,<endian>) in the cryptd()/pcrypt() family
  of dispatch-only wrappers; the counter endianness (le/be) is its
  second parameter, backed by a struct dun_mode op table so a future
  counter convention is one table row. It handles any counter width up
  to 32 bytes (covering Adiantum) and rejects a data_unit_size 0 /
  cryptlen 0 request.
- dm-crypt allocates dun(<cipher>,le|be) when eligible (selecting the IV
  mode before tfm allocation); plain64 -> le, plain64be -> be. An
  integrity config keeps an inert dun() wrapper but never batches.
  DM_CRYPT selects CRYPTO_DUN.
- blk-crypto-fallback is a second consumer (patch 5), demonstrating the
  template is a shared primitive, not dm-crypt-only; it wraps every mode
  as dun(<cipher>,le) and BLK_INLINE_ENCRYPTION_FALLBACK selects
  CRYPTO_DUN.
- testmgr exercises the template via dun(<inner>,le) and dun(<inner>,be),
  including ivsize 32 and a carry-boundary IV; an end-to-end fscrypt
  -o inlinecrypt test drives the blk-crypto-fallback consumer.

Leonid Ravich (5):
  crypto: skcipher - add per-request data_unit_size
  crypto: dun - data-unit-number dispatch template
  crypto: testmgr - test dun() dispatch
  dm crypt: batch a bio segment's sectors via dun()
  blk-crypto: fallback - batch a segment's data units via dun()

 block/Kconfig               |   1 +
 block/blk-crypto-fallback.c |  74 ++++----
 crypto/Kconfig              |  14 ++
 crypto/Makefile             |   1 +
 crypto/dun.c                | 359 ++++++++++++++++++++++++++++++++++++
 crypto/testmgr.c            | 289 +++++++++++++++++++++++++++++
 drivers/md/Kconfig          |   1 +
 drivers/md/dm-crypt.c       | 208 ++++++++++++++++-----
 include/crypto/skcipher.h   |  34 ++++
 9 files changed, 899 insertions(+), 82 deletions(-)
 create mode 100644 crypto/dun.c


base-commit: a8cafdf8c949f17c92eca0045532e88ac0dac30d
-- 
2.47.3



* [PATCH v5 1/5] crypto: skcipher - add per-request data_unit_size
  2026-06-30  8:34 [PATCH v5 0/5] crypto: skcipher - multi-data-unit dispatch as a template Leonid Ravich
@ 2026-06-30  8:34 ` Leonid Ravich
  2026-06-30  8:34 ` [PATCH v5 2/5] crypto: dun - data-unit-number dispatch template Leonid Ravich
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Leonid Ravich @ 2026-06-30  8:34 UTC
  To: linux-crypto, dm-devel
  Cc: linux-block, linux-kernel, herbert, davem, ebiggers, snitzer,
	mpatocka, axboe

Add a data_unit_size field to struct skcipher_request.  When non-zero,
the request covers cryptlen / data_unit_size data units that share one
starting IV; per-unit IVs are derived from the request IV as a wide
data-unit-number counter (the convention also used by blk-crypto for
inline encryption).  cryptlen must be a positive multiple of
data_unit_size.

The field is honoured by an skcipher that understands data units -- an
instance of the dun(...) template (added next), or a driver that handles
a whole multi-DU request natively.  A plain skcipher ignores it, so the
field is inert for every existing caller; the core en/decrypt path is
unchanged.  skcipher_request_set_tfm() and the on-stack request
initialiser reset it to 0 so a reused request defaults to single-DU.

Signed-off-by: Leonid Ravich <lravich@amazon.com>
---
 include/crypto/skcipher.h | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
index 4efe2ca8c4d1..1121be80cb53 100644
--- a/include/crypto/skcipher.h
+++ b/include/crypto/skcipher.h
@@ -31,6 +31,13 @@ struct scatterlist;
 /**
  *	struct skcipher_request - Symmetric key cipher request
  *	@cryptlen: Number of bytes to encrypt or decrypt
+ *	@data_unit_size: Size in bytes of each data unit, or 0 for a
+ *		single-data-unit request (the default).  When non-zero,
+ *		must be a multiple of the cipher block size, @cryptlen must
+ *		be a positive multiple of it, and per-DU IVs are derived from
+ *		@iv as a wide counter (the data-unit-number convention); the
+ *		counter width and endianness are chosen by the consumer (e.g.
+ *		the dun() template's second parameter).
  *	@iv: Initialisation Vector
  *	@src: Source SG list
  *	@dst: Destination SG list
@@ -39,6 +46,7 @@ struct scatterlist;
  */
 struct skcipher_request {
 	unsigned int cryptlen;
+	unsigned int data_unit_size;
 
 	u8 *iv;
 
@@ -225,6 +233,7 @@ struct lskcipher_alg {
 	struct skcipher_request *name = \
 		(((struct skcipher_request *)__##name##_desc)->base.tfm = \
 			crypto_sync_skcipher_tfm((_tfm)), \
+		 ((struct skcipher_request *)__##name##_desc)->data_unit_size = 0, \
 		 (void *)__##name##_desc)
 
 /**
@@ -819,6 +828,8 @@ static inline void skcipher_request_set_tfm(struct skcipher_request *req,
 					    struct crypto_skcipher *tfm)
 {
 	req->base.tfm = crypto_skcipher_tfm(tfm);
+	/* Reused requests default to single-data-unit. */
+	req->data_unit_size = 0;
 }
 
 static inline void skcipher_request_set_sync_tfm(struct skcipher_request *req,
@@ -937,5 +948,28 @@ static inline void skcipher_request_set_crypt(
 	req->iv = iv;
 }
 
+/**
+ * skcipher_request_set_data_unit_size() - submit as multiple data units
+ * @req: request handle
+ * @data_unit_size: data-unit size in bytes (a multiple of the cipher block
+ *		    size), or 0 to disable
+ *
+ * Process @req as @cryptlen / @data_unit_size data units sharing one starting
+ * @iv, with per-DU IVs derived by treating @iv as a wide counter (the data-
+ * unit-number convention).  @cryptlen must be a positive multiple of
+ * @data_unit_size.  This is honoured only by a tfm that understands data
+ * units -- an instance of the dun(...) template (which splits the request
+ * into one inner call per unit, with the counter endianness given as its
+ * second parameter), or a driver that consumes a whole multi-DU request
+ * natively, which rejects a request violating these constraints with -EINVAL.
+ * A plain skcipher ignores the field.
+ */
+static inline void
+skcipher_request_set_data_unit_size(struct skcipher_request *req,
+				    unsigned int data_unit_size)
+{
+	req->data_unit_size = data_unit_size;
+}
+
 #endif	/* _CRYPTO_SKCIPHER_H */
 
-- 
2.47.3



* [PATCH v5 2/5] crypto: dun - data-unit-number dispatch template
  2026-06-30  8:34 [PATCH v5 0/5] crypto: skcipher - multi-data-unit dispatch as a template Leonid Ravich
  2026-06-30  8:34 ` [PATCH v5 1/5] crypto: skcipher - add per-request data_unit_size Leonid Ravich
@ 2026-06-30  8:34 ` Leonid Ravich
  2026-06-30  8:34 ` [PATCH v5 3/5] crypto: testmgr - test dun() dispatch Leonid Ravich
  2026-06-30  8:34 ` [PATCH v5 4/5] dm crypt: batch a bio segment's sectors via dun() Leonid Ravich
  3 siblings, 0 replies; 5+ messages in thread
From: Leonid Ravich @ 2026-06-30  8:34 UTC
  To: linux-crypto, dm-devel
  Cc: linux-block, linux-kernel, herbert, davem, ebiggers, snitzer,
	mpatocka, axboe

Add a dun(...) skcipher template that wraps an inner skcipher whose IV
is a wide data-unit-number counter (e.g. dun(xts(aes),le)).  When the
caller sets skcipher_request::data_unit_size, the template splits the
request into cryptlen / data_unit_size sub-requests on the inner cipher,
walking the IV +1 per unit.  Each inner ->encrypt/->decrypt is a direct
call, so only the outer dispatch into the crypto API is indirect -- the
per-unit work is not.

The second template parameter selects the counter endianness: dun(...,le)
for a little-endian counter (dm-crypt plain64, blk-crypto inline
encryption) and dun(...,be) for a big-endian one (dm-crypt plain64be).
Those are the only two ways a per-unit IV relates to its neighbour by a
+1 step; IV modes that are not such a counter are simply not wrapped.
Like cryptd() and pcrypt(), dun() wraps an inner skcipher and changes
only how the request is dispatched -- here, split across data units --
performing no cipher transform of its own.

A dun() tfm exists solely for multi-DU dispatch, so a request with
data_unit_size 0 is rejected with -EINVAL; a caller wanting plain
single-DU encryption uses the inner skcipher.

A hardware engine that consumes a whole multi-DU request in one pass
registers its own dun(...) at a higher cra_priority and is selected
automatically by the existing priority mechanism; no per-algorithm
capability flag is needed.  The generic template is sync-only (the split
loop treats any non-zero inner return as terminal), so it resolves against
a sync inner cipher (mask | CRYPTO_ALG_ASYNC); async is left to such
native drivers.

The inner IV must be a whole number of 64-bit limbs and no wider than 32
bytes: 16 covers xts(...), 32 covers the widest inline-encryption mode
(Adiantum).

Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Leonid Ravich <lravich@amazon.com>
---
 crypto/Kconfig  |  14 ++
 crypto/Makefile |   1 +
 crypto/dun.c    | 359 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 374 insertions(+)
 create mode 100644 crypto/dun.c

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 103d1f58cb7c..4f90a780c4fc 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -746,6 +746,20 @@ config CRYPTO_XTS
 	  implementation currently can't handle a sectorsize which is not a
 	  multiple of 16 bytes.
 
+config CRYPTO_DUN
+	tristate "Data-unit-number (DUN) dispatch template"
+	select CRYPTO_SKCIPHER
+	select CRYPTO_MANAGER
+	help
+	  dun(...) wraps an skcipher whose IV is a wide data-unit-number
+	  counter (e.g. xts(aes)) and lets a caller submit several data units
+	  sharing one starting IV in a single request, via
+	  skcipher_request::data_unit_size.  The counter endianness is the
+	  second parameter: dun(xts(aes),le) or dun(xts(aes),be).  The template
+	  splits the request into one inner call per data unit; a hardware
+	  driver may register a higher-priority dun(...) that handles the whole
+	  request in one pass.  The first user is dm-crypt.
+
 endmenu
 
 menu "AEAD (authenticated encryption with associated data) ciphers"
diff --git a/crypto/Makefile b/crypto/Makefile
index 162242593c7c..584d9e8c4347 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -93,6 +93,7 @@ obj-$(CONFIG_CRYPTO_PCBC) += pcbc.o
 obj-$(CONFIG_CRYPTO_CTS) += cts.o
 obj-$(CONFIG_CRYPTO_LRW) += lrw.o
 obj-$(CONFIG_CRYPTO_XTS) += xts.o
+obj-$(CONFIG_CRYPTO_DUN) += dun.o
 obj-$(CONFIG_CRYPTO_CTR) += ctr.o
 obj-$(CONFIG_CRYPTO_XCTR) += xctr.o
 obj-$(CONFIG_CRYPTO_HCTR2) += hctr2.o
diff --git a/crypto/dun.c b/crypto/dun.c
new file mode 100644
index 000000000000..4fcb81a025b9
--- /dev/null
+++ b/crypto/dun.c
@@ -0,0 +1,359 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * dun: data-unit-number dispatch template for skcipher
+ *
+ * Wraps an inner skcipher (e.g. xts(aes)) and, when the caller sets
+ * skcipher_request::data_unit_size, splits the request into cryptlen /
+ * data_unit_size sub-requests, each unit's IV the previous one +1 -- the
+ * data-unit-number (DUN) convention.  The second parameter selects the IV
+ * walk (see struct dun_mode): dun(xts(aes),le) or dun(xts(aes),be).
+ *
+ * Like cryptd()/pcrypt(), dun() only changes how a request is dispatched and
+ * performs no transform of its own; a native one-pass multi-DU driver wins by
+ * cra_priority.  Callers that never set data_unit_size pay nothing.
+ */
+
+#include <crypto/algapi.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/scatterwalk.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/string.h>
+
+/* Bounds the on-stack IV buffers: 16 covers xts(...), 32 covers Adiantum. */
+#define DUN_MAX_IVSIZE		32
+
+/*
+ * A dun() mode is the rule for deriving each data unit's IV from the request's
+ * starting IV.  @name is the template's second parameter; @iv_next advances the
+ * @ivsize-byte @iv in place to the next data unit.  @ivsize_ok rejects IV sizes
+ * the walk can't handle.  Add a row to dun_modes[] to support a new convention.
+ */
+struct dun_mode {
+	const char *name;
+	void (*iv_next)(u8 *iv, unsigned int ivsize);
+	bool (*ivsize_ok)(unsigned int ivsize);
+};
+
+struct dun_tfm_ctx {
+	struct crypto_skcipher *child;
+	const struct dun_mode *mode;
+};
+
+struct dun_inst_ctx {
+	struct crypto_skcipher_spawn spawn;
+	const struct dun_mode *mode;
+};
+
+struct dun_request_ctx {
+	/* Must be last; the child request is appended with its own reqsize. */
+	struct skcipher_request subreq;
+};
+
+/* Little-endian counter: increment the IV per __le64 limb, low limb first. */
+static void dun_iv_next_le(u8 *iv, unsigned int ivsize)
+{
+	unsigned int i;
+
+	for (i = 0; i < ivsize; i += sizeof(__le64)) {
+		__le64 limb;
+		u64 v;
+
+		memcpy(&limb, iv + i, sizeof(limb));
+		v = le64_to_cpu(limb) + 1;
+		limb = cpu_to_le64(v);
+		memcpy(iv + i, &limb, sizeof(limb));
+		if (likely(v != 0))
+			break;			/* no carry into the next limb */
+	}
+}
+
+/* Big-endian counter: increment the IV byte-wise from the last byte. */
+static void dun_iv_next_be(u8 *iv, unsigned int ivsize)
+{
+	unsigned int i = ivsize;
+
+	while (i--) {
+		if (likely(++iv[i]))
+			break;			/* no carry into the next byte */
+	}
+}
+
+/*
+ * le requires this: it walks the IV in __le64 limbs, so the size must be a
+ * whole number of limbs.  be increments byte-wise and would accept any size,
+ * but reuses the same check for a uniform value-space.
+ */
+static bool dun_ivsize_whole_limbs(unsigned int ivsize)
+{
+	return IS_ALIGNED(ivsize, sizeof(__le64));
+}
+
+static const struct dun_mode dun_modes[] = {
+	{ "le", dun_iv_next_le, dun_ivsize_whole_limbs },
+	{ "be", dun_iv_next_be, dun_ivsize_whole_limbs },
+};
+
+static const struct dun_mode *dun_find_mode(const char *name)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(dun_modes); i++)
+		if (!strcmp(name, dun_modes[i].name))
+			return &dun_modes[i];
+	return NULL;
+}
+
+static int dun_setkey(struct crypto_skcipher *parent, const u8 *key,
+		      unsigned int keylen)
+{
+	struct dun_tfm_ctx *ctx = crypto_skcipher_ctx(parent);
+	struct crypto_skcipher *child = ctx->child;
+
+	crypto_skcipher_clear_flags(child, CRYPTO_TFM_REQ_MASK);
+	crypto_skcipher_set_flags(child, crypto_skcipher_get_flags(parent) &
+					 CRYPTO_TFM_REQ_MASK);
+	return crypto_skcipher_setkey(child, key, keylen);
+}
+
+/*
+ * Run one inner ->crypt per data unit, walking the IV as a wide counter.
+ * @req->iv is never modified; the inner cipher only sees the iv_unit copy.
+ */
+static int dun_split(struct skcipher_request *req,
+		     int (*crypt)(struct skcipher_request *))
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct dun_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct dun_request_ctx *rctx = skcipher_request_ctx(req);
+	struct skcipher_request *subreq = &rctx->subreq;
+	const unsigned int du = req->data_unit_size;
+	const unsigned int total = req->cryptlen;
+	const unsigned int ivsize = crypto_skcipher_ivsize(tfm);
+	const struct dun_mode *mode = ctx->mode;
+	bool inplace = req->src == req->dst;
+	struct scatter_walk src_walk, dst_walk;
+	struct scatterlist src_sg[2], dst_sg[2];
+	u8 iv_ctr[DUN_MAX_IVSIZE];
+	u8 iv_unit[DUN_MAX_IVSIZE];
+	unsigned int off;
+	int err = 0;
+
+	/* iv_ctr is the counter; iv_unit is a per-unit copy an inner may write
+	 * back in place (e.g. xts, essiv), so the counter is never mutated.
+	 */
+	memcpy(iv_ctr, req->iv, ivsize);
+
+	sg_init_table(src_sg, 2);
+	scatterwalk_start(&src_walk, req->src);
+	if (!inplace) {
+		sg_init_table(dst_sg, 2);
+		scatterwalk_start(&dst_walk, req->dst);
+	}
+
+	skcipher_request_set_tfm(subreq, ctx->child);
+	skcipher_request_set_callback(subreq, skcipher_request_flags(req),
+				      NULL, NULL);
+
+	for (off = 0; off < total; off += du) {
+		struct scatterlist *s, *d;
+
+		scatterwalk_get_sglist(&src_walk, src_sg);
+		scatterwalk_skip(&src_walk, du);
+		s = src_sg;
+		if (inplace) {
+			d = src_sg;
+		} else {
+			scatterwalk_get_sglist(&dst_walk, dst_sg);
+			scatterwalk_skip(&dst_walk, du);
+			d = dst_sg;
+		}
+
+		memcpy(iv_unit, iv_ctr, ivsize);
+		skcipher_request_set_crypt(subreq, s, d, du, iv_unit);
+		err = crypt(subreq);
+		if (err)
+			break;
+
+		mode->iv_next(iv_ctr, ivsize);
+	}
+
+	return err;
+}
+
+/*
+ * Validate a multi-DU request: non-zero cryptlen, and a data_unit_size that is
+ * set, a multiple of the block size, and divides cryptlen evenly.
+ */
+static int dun_check(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+
+	if (!req->cryptlen || !req->data_unit_size ||
+	    !IS_ALIGNED(req->data_unit_size, crypto_skcipher_blocksize(tfm)) ||
+	    (req->cryptlen % req->data_unit_size))
+		return -EINVAL;
+	return 0;
+}
+
+static int dun_encrypt(struct skcipher_request *req)
+{
+	int err = dun_check(req);
+
+	if (err)
+		return err;
+	return dun_split(req, crypto_skcipher_encrypt);
+}
+
+static int dun_decrypt(struct skcipher_request *req)
+{
+	int err = dun_check(req);
+
+	if (err)
+		return err;
+	return dun_split(req, crypto_skcipher_decrypt);
+}
+
+static int dun_init_tfm(struct crypto_skcipher *tfm)
+{
+	struct skcipher_instance *inst = skcipher_alg_instance(tfm);
+	struct dun_inst_ctx *ictx = skcipher_instance_ctx(inst);
+	struct dun_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct crypto_skcipher *child;
+
+	child = crypto_spawn_skcipher(&ictx->spawn);
+	if (IS_ERR(child))
+		return PTR_ERR(child);
+
+	ctx->child = child;
+	ctx->mode = ictx->mode;
+	crypto_skcipher_set_reqsize(tfm,
+				    sizeof(struct dun_request_ctx) +
+				    crypto_skcipher_reqsize(child));
+	return 0;
+}
+
+static void dun_exit_tfm(struct crypto_skcipher *tfm)
+{
+	struct dun_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+
+	crypto_free_skcipher(ctx->child);
+}
+
+static void dun_free_instance(struct skcipher_instance *inst)
+{
+	struct dun_inst_ctx *ictx = skcipher_instance_ctx(inst);
+
+	crypto_drop_skcipher(&ictx->spawn);
+	kfree(inst);
+}
+
+static int dun_create(struct crypto_template *tmpl, struct rtattr **tb)
+{
+	struct skcipher_alg_common *alg;
+	struct skcipher_instance *inst;
+	struct dun_inst_ctx *ictx;
+	const struct dun_mode *mode;
+	const char *cipher_name;
+	const char *mode_name;
+	u32 mask;
+	int err;
+
+	err = crypto_check_attr_type(tb, CRYPTO_ALG_TYPE_SKCIPHER, &mask);
+	if (err)
+		return err;
+
+	cipher_name = crypto_attr_alg_name(tb[1]);
+	if (IS_ERR(cipher_name))
+		return PTR_ERR(cipher_name);
+
+	/* Second parameter: the IV-walk mode (see dun_modes[]). */
+	mode_name = crypto_attr_alg_name(tb[2]);
+	if (IS_ERR(mode_name))
+		return PTR_ERR(mode_name);
+	mode = dun_find_mode(mode_name);
+	if (!mode)
+		return -EINVAL;
+
+	inst = kzalloc(sizeof(*inst) + sizeof(*ictx), GFP_KERNEL);
+	if (!inst)
+		return -ENOMEM;
+	ictx = skcipher_instance_ctx(inst);
+	ictx->mode = mode;
+
+	/*
+	 * Sync-only: the split loop can't drive an async (-EINPROGRESS) child,
+	 * so resolve against a sync inner (mask | CRYPTO_ALG_ASYNC).
+	 */
+	err = crypto_grab_skcipher(&ictx->spawn, skcipher_crypto_instance(inst),
+				   cipher_name, 0, mask | CRYPTO_ALG_ASYNC);
+	if (err)
+		goto err_free_inst;
+
+	alg = crypto_spawn_skcipher_alg_common(&ictx->spawn);
+
+	/* The mode must accept this IV size, and it must fit our buffers. */
+	err = -EINVAL;
+	if (!alg->ivsize || alg->ivsize > DUN_MAX_IVSIZE ||
+	    !mode->ivsize_ok(alg->ivsize))
+		goto err_free_inst;
+
+	err = -ENAMETOOLONG;
+	if (snprintf(inst->alg.base.cra_name, CRYPTO_MAX_ALG_NAME, "dun(%s,%s)",
+		     alg->base.cra_name, mode->name) >= CRYPTO_MAX_ALG_NAME)
+		goto err_free_inst;
+	if (snprintf(inst->alg.base.cra_driver_name, CRYPTO_MAX_ALG_NAME,
+		     "dun(%s,%s)", alg->base.cra_driver_name,
+		     mode->name) >= CRYPTO_MAX_ALG_NAME)
+		goto err_free_inst;
+
+	inst->alg.base.cra_priority = alg->base.cra_priority;
+	inst->alg.base.cra_blocksize = alg->base.cra_blocksize;
+	inst->alg.base.cra_alignmask = alg->base.cra_alignmask;
+	inst->alg.base.cra_ctxsize = sizeof(struct dun_tfm_ctx);
+
+	inst->alg.ivsize = alg->ivsize;
+	inst->alg.chunksize = alg->chunksize;
+	inst->alg.min_keysize = alg->min_keysize;
+	inst->alg.max_keysize = alg->max_keysize;
+
+	inst->alg.init = dun_init_tfm;
+	inst->alg.exit = dun_exit_tfm;
+	inst->alg.setkey = dun_setkey;
+	inst->alg.encrypt = dun_encrypt;
+	inst->alg.decrypt = dun_decrypt;
+
+	inst->free = dun_free_instance;
+
+	err = skcipher_register_instance(tmpl, inst);
+	if (err) {
+err_free_inst:
+		dun_free_instance(inst);
+	}
+	return err;
+}
+
+static struct crypto_template dun_tmpl = {
+	.name = "dun",
+	.create = dun_create,
+	.module = THIS_MODULE,
+};
+
+static int __init dun_module_init(void)
+{
+	return crypto_register_template(&dun_tmpl);
+}
+
+static void __exit dun_module_exit(void)
+{
+	crypto_unregister_template(&dun_tmpl);
+}
+
+module_init(dun_module_init);
+module_exit(dun_module_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Data-unit-number dispatch template for skcipher");
+MODULE_ALIAS_CRYPTO("dun");
-- 
2.47.3



* [PATCH v5 3/5] crypto: testmgr - test dun() dispatch
  2026-06-30  8:34 [PATCH v5 0/5] crypto: skcipher - multi-data-unit dispatch as a template Leonid Ravich
  2026-06-30  8:34 ` [PATCH v5 1/5] crypto: skcipher - add per-request data_unit_size Leonid Ravich
  2026-06-30  8:34 ` [PATCH v5 2/5] crypto: dun - data-unit-number dispatch template Leonid Ravich
@ 2026-06-30  8:34 ` Leonid Ravich
  2026-06-30  8:34 ` [PATCH v5 4/5] dm crypt: batch a bio segment's sectors via dun() Leonid Ravich
  3 siblings, 0 replies; 5+ messages in thread
From: Leonid Ravich @ 2026-06-30  8:34 UTC
  To: linux-crypto, dm-devel
  Cc: linux-block, linux-kernel, herbert, davem, ebiggers, snitzer,
	mpatocka, axboe

For every ivsize-16 skcipher, wrap it in both a dun(<inner>,le) and a
dun(<inner>,be) instance and cross-check each batched output against an
independent N x single-DU reference run directly on the inner tfm (both
keyed with one random key, the reference counter walked in the matching
endianness), over a deliberately fragmented scatterlist whose entries do
not align to the data-unit size.  The two must produce byte-identical
ciphertext; the batched ciphertext is then round-tripped and the caller
IV checked unchanged.  Testing both endiannesses exercises the be path
independently of any in-tree consumer.  Algorithms with no dun wrapper
(ivsize != 16) are skipped; a genuine mismatch returns -EBADMSG.

Signed-off-by: Leonid Ravich <lravich@amazon.com>
---
 crypto/testmgr.c | 289 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 289 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 4d86efae65b2..cd9246f432de 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3211,6 +3211,291 @@ static int test_skcipher(int enc, const struct cipher_test_suite *suite,
 	return 0;
 }
 
+/* Upper bound on the IVs the dun() template accepts (16: xts; 32: Adiantum). */
+#define TEST_MDU_MAX_IVSIZE	32
+
+/*
+ * Increment an @ivsize-byte IV as a wide counter.  Byte-wise with carry --
+ * deliberately independent of crypto/dun.c's per-limb walk, so the two only
+ * agree if the carry is right.  LE: byte 0 least significant; BE: last byte.
+ */
+static void test_mdu_iv_inc(u8 *iv, unsigned int ivsize, bool big_endian)
+{
+	int i;
+
+	if (big_endian) {
+		for (i = ivsize - 1; i >= 0; i--)
+			if (++iv[i])
+				break;
+	} else {
+		for (i = 0; i < (int)ivsize; i++)
+			if (++iv[i])
+				break;
+	}
+}
+
+/*
+ * Seed @iv so the low 64-bit limb is all-ones but its least-significant byte:
+ * the 2nd increment wraps the limb and carries into the next.  LE limb is
+ * bytes [0,8); BE limb is the last 8 bytes.  Bytes outside keep their value.
+ */
+static void test_mdu_iv_boundary(u8 *iv, unsigned int ivsize, bool big_endian)
+{
+	unsigned int i;
+
+	if (big_endian) {
+		for (i = ivsize - 8; i < ivsize; i++)
+			iv[i] = 0xff;
+		iv[ivsize - 1] = 0xfe;
+	} else {
+		for (i = 0; i < 8; i++)
+			iv[i] = 0xff;
+		iv[0] = 0xfe;
+	}
+}
+
+/* Encrypt one du_size block with a plain single-DU request (the reference). */
+static int test_mdu_ref_encrypt(struct crypto_skcipher *tfm, const u8 *in,
+				u8 *out, unsigned int du_size, const u8 *iv,
+				unsigned int ivsize)
+{
+	struct skcipher_request *req;
+	struct scatterlist sg_in;
+	DECLARE_CRYPTO_WAIT(wait);
+	u8 ivbuf[TEST_MDU_MAX_IVSIZE];
+	int err;
+
+	req = skcipher_request_alloc(tfm, GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+	memcpy(ivbuf, iv, ivsize);
+	memcpy(out, in, du_size);
+	sg_init_one(&sg_in, out, du_size);
+	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
+				      CRYPTO_TFM_REQ_MAY_SLEEP,
+				      crypto_req_done, &wait);
+	skcipher_request_set_crypt(req, &sg_in, &sg_in, du_size, ivbuf);
+	err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
+	skcipher_request_free(req);
+	return err;
+}
+
+/*
+ * Build an SG over @buf with du_size-unaligned entries, so the splitter's
+ * per-DU views cross SG entries and exercise the scatter_walk cursor.
+ */
+static void test_mdu_sg_fragment(struct scatterlist *sg, unsigned int nents,
+				 u8 *buf, unsigned int total)
+{
+	unsigned int chunk = total / nents;
+	unsigned int off = 0, i;
+
+	sg_init_table(sg, nents);
+	for (i = 0; i < nents; i++) {
+		unsigned int len = (i == nents - 1) ? total - off : chunk;
+
+		sg_set_buf(&sg[i], buf + off, len);
+		off += len;
+	}
+}
+
+#define TEST_MDU_NR_UNITS	4
+#define TEST_MDU_NR_FRAGS	5
+/*
+ * Verify batched dispatch on @mdu (a dun(<inner>,<endian>) tfm) is byte-equal
+ * to an independent N x single-DU reference on @inner with @big_endian-walked
+ * IVs, over a fragmented SG, then round-trips.  Both tfms must share a key.
+ * @iv_orig is the ivsize-byte starting IV (the caller varies it to exercise
+ * both a random IV and one seeded to cross a carry boundary).
+ */
+static int test_skcipher_multi_du_one(struct crypto_skcipher *mdu,
+				      struct crypto_skcipher *inner,
+				      unsigned int du_size, bool big_endian,
+				      const u8 *iv_orig)
+{
+	const char *driver = crypto_skcipher_driver_name(mdu);
+	const unsigned int total = du_size * TEST_MDU_NR_UNITS;
+	const unsigned int ivsize = crypto_skcipher_ivsize(mdu);
+	struct skcipher_request *req = NULL;
+	struct scatterlist sg[TEST_MDU_NR_FRAGS];
+	DECLARE_CRYPTO_WAIT(wait);
+	u8 iv_work[TEST_MDU_MAX_IVSIZE], iv_ref[TEST_MDU_MAX_IVSIZE];
+	u8 *plain = NULL, *buf = NULL, *ref = NULL;
+	unsigned int u;
+	int err;
+
+	plain = kmalloc(total, GFP_KERNEL);
+	buf = kmalloc(total, GFP_KERNEL);
+	ref = kmalloc(total, GFP_KERNEL);
+	req = skcipher_request_alloc(mdu, GFP_KERNEL);
+	if (!plain || !buf || !ref || !req) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	get_random_bytes(plain, total);
+
+	/* Reference: per-DU single requests on the inner tfm, counter-walked IVs. */
+	memcpy(iv_ref, iv_orig, ivsize);
+	for (u = 0; u < TEST_MDU_NR_UNITS; u++) {
+		err = test_mdu_ref_encrypt(inner, plain + u * du_size,
+					   ref + u * du_size, du_size, iv_ref,
+					   ivsize);
+		if (err) {
+			pr_err("alg: skcipher: %s multi-DU ref encrypt failed (du=%u): %d\n",
+			       driver, du_size, err);
+			goto out;
+		}
+		test_mdu_iv_inc(iv_ref, ivsize, big_endian);
+	}
+
+	/* Batched: one request over a fragmented SG. */
+	memcpy(buf, plain, total);
+	memcpy(iv_work, iv_orig, ivsize);
+	test_mdu_sg_fragment(sg, TEST_MDU_NR_FRAGS, buf, total);
+	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
+				      CRYPTO_TFM_REQ_MAY_SLEEP,
+				      crypto_req_done, &wait);
+	skcipher_request_set_crypt(req, sg, sg, total, iv_work);
+	skcipher_request_set_data_unit_size(req, du_size);
+	err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
+	if (err) {
+		pr_err("alg: skcipher: %s multi-DU encrypt failed (du=%u): %d\n",
+		       driver, du_size, err);
+		goto out;
+	}
+	if (memcmp(buf, ref, total) != 0) {
+		pr_err("alg: skcipher: %s multi-DU ciphertext differs from single-DU reference (du=%u)\n",
+		       driver, du_size);
+		err = -EBADMSG;
+		goto out;
+	}
+	/* req->iv must be unchanged after multi-DU dispatch. */
+	if (memcmp(iv_work, iv_orig, ivsize) != 0) {
+		pr_err("alg: skcipher: %s multi-DU encrypt mutated caller IV (du=%u)\n",
+		       driver, du_size);
+		err = -EBADMSG;
+		goto out;
+	}
+
+	/* Round-trip the batched ciphertext back to plaintext. */
+	test_mdu_sg_fragment(sg, TEST_MDU_NR_FRAGS, buf, total);
+	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
+				      CRYPTO_TFM_REQ_MAY_SLEEP,
+				      crypto_req_done, &wait);
+	skcipher_request_set_crypt(req, sg, sg, total, iv_work);
+	skcipher_request_set_data_unit_size(req, du_size);
+	err = crypto_wait_req(crypto_skcipher_decrypt(req), &wait);
+	if (err) {
+		pr_err("alg: skcipher: %s multi-DU decrypt failed (du=%u): %d\n",
+		       driver, du_size, err);
+		goto out;
+	}
+	if (memcmp(buf, plain, total) != 0) {
+		pr_err("alg: skcipher: %s multi-DU round-trip mismatch (du=%u)\n",
+		       driver, du_size);
+		err = -EBADMSG;
+	}
+
+out:
+	skcipher_request_free(req);
+	kfree(ref);
+	kfree(buf);
+	kfree(plain);
+	return err;
+}
+
+/*
+ * Cross-check the dun(<inner>,@endian) wrapper against @tfm over all du sizes.
+ * Returns 0 on success or skip (no wrapper / rejected key); -EBADMSG on a real
+ * mismatch.
+ */
+static int test_skcipher_multi_du_endian(struct crypto_skcipher *tfm,
+					 const char *alg_name,
+					 const char *endian, bool big_endian,
+					 const u8 *keybuf, unsigned int keylen)
+{
+	static const unsigned int du_sizes[] = { 512, 1024, 2048, 4096 };
+	char mdu_name[CRYPTO_MAX_ALG_NAME];
+	struct crypto_skcipher *mdu;
+	unsigned int ivsize;
+	u8 iv[TEST_MDU_MAX_IVSIZE];
+	unsigned int j;
+	int err;
+
+	if (snprintf(mdu_name, sizeof(mdu_name), "dun(%s,%s)", alg_name,
+		     endian) >= (int)sizeof(mdu_name))
+		return 0;
+
+	mdu = crypto_alloc_skcipher(mdu_name, 0, 0);
+	if (IS_ERR(mdu)) {
+		/* No dun wrapper (ivsize not a multiple of 8, or too wide): skip. */
+		if (PTR_ERR(mdu) == -ENOENT || PTR_ERR(mdu) == -EINVAL)
+			return 0;
+		return PTR_ERR(mdu);
+	}
+
+	ivsize = crypto_skcipher_ivsize(mdu);
+	if (ivsize > TEST_MDU_MAX_IVSIZE) {
+		err = 0;	/* wider than we have buffers for: skip */
+		goto out;
+	}
+
+	err = crypto_skcipher_setkey(mdu, keybuf, keylen);
+	if (err) {
+		err = 0;	/* weak/rejected key (e.g. XTS equal halves): skip */
+		goto out;
+	}
+
+	for (j = 0; j < ARRAY_SIZE(du_sizes); j++) {
+		/* A random starting IV. */
+		get_random_bytes(iv, ivsize);
+		err = test_skcipher_multi_du_one(mdu, tfm, du_sizes[j],
+						 big_endian, iv);
+		if (err)
+			break;
+		/* And one seeded to carry across a 64-bit limb / byte run. */
+		get_random_bytes(iv, ivsize);
+		test_mdu_iv_boundary(iv, ivsize, big_endian);
+		err = test_skcipher_multi_du_one(mdu, tfm, du_sizes[j],
+						 big_endian, iv);
+		if (err)
+			break;
+		cond_resched();
+	}
+out:
+	crypto_free_skcipher(mdu);
+	return err;
+}
+
+/*
+ * Cross-check dun() dispatch against a single-DU reference, in both le and be,
+ * for every ivsize the template accepts (16: xts; 32: Adiantum).
+ */
+static int test_skcipher_multi_du(struct crypto_skcipher *tfm)
+{
+	const char *alg_name = crypto_skcipher_alg(tfm)->base.cra_name;
+	u8 keybuf[128];
+	unsigned int keylen;
+	int err;
+
+	/* Key the inner tfm; each dun() wrapper is keyed identically below. */
+	keylen = crypto_skcipher_min_keysize(tfm);
+	if (keylen > sizeof(keybuf))
+		return 0;	/* unusually large key; skip rather than overflow */
+	get_random_bytes(keybuf, keylen);
+	err = crypto_skcipher_setkey(tfm, keybuf, keylen);
+	if (err)
+		return 0;	/* weak/rejected key (e.g. XTS equal halves): skip */
+
+	err = test_skcipher_multi_du_endian(tfm, alg_name, "le", false,
+					    keybuf, keylen);
+	if (err)
+		return err;
+	return test_skcipher_multi_du_endian(tfm, alg_name, "be", true,
+					     keybuf, keylen);
+}
+
 static int alg_test_skcipher(const struct alg_test_desc *desc,
 			     const char *driver, u32 type, u32 mask)
 {
@@ -3259,6 +3544,10 @@ static int alg_test_skcipher(const struct alg_test_desc *desc,
 	if (err)
 		goto out;
 
+	err = test_skcipher_multi_du(tfm);
+	if (err)
+		goto out;
+
 	err = test_skcipher_vs_generic_impl(desc->generic_driver, req, tsgls);
 out:
 	free_cipher_test_sglists(tsgls);
-- 
2.47.3


* [PATCH v5 4/5] dm crypt: batch a bio segment's sectors via dun()
  2026-06-30  8:34 [PATCH v5 0/5] crypto: skcipher - multi-data-unit dispatch as a template Leonid Ravich
                   ` (2 preceding siblings ...)
  2026-06-30  8:34 ` [PATCH v5 3/5] crypto: testmgr - test dun() dispatch Leonid Ravich
@ 2026-06-30  8:34 ` Leonid Ravich
  3 siblings, 0 replies; 5+ messages in thread
From: Leonid Ravich @ 2026-06-30  8:34 UTC (permalink / raw)
  To: linux-crypto, dm-devel
  Cc: linux-block, linux-kernel, herbert, davem, ebiggers, snitzer,
	mpatocka, axboe

Submit one skcipher request per contiguous bio segment (a single
bio_vec) with data_unit_size = cc->sector_size, instead of one request
per sector.  E.g. the default 512-byte sector with a 4 KiB bio_vec
becomes one request of 8 data units; the crypto layer (the dun()
template, or a native driver) walks the per-sector IV as a data-unit
counter.  Because a bio_vec is one contiguous segment, the request uses
only the existing inline dmreq->sg_in[0]/sg_out[0] entry -- no per-bio
scatterlist allocation, and no regression on small random I/O.
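
As a rough userspace illustration of the data-unit counter walk (a
sketch only, not the crypto/dun.c implementation; iv_inc(),
plain64_iv() and counter_walk_matches() are hypothetical names made up
for this example), the IV of each following sector is the previous IV
incremented as one wide integer.  The sketch assumes a plain64-style IV
that stores the sector number as a little-endian 64-bit value
zero-padded to a 16-byte IV:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: increment an ivsize-byte IV as one wide counter. */
static void iv_inc(uint8_t *iv, size_t ivsize, int big_endian)
{
	size_t i;

	for (i = 0; i < ivsize; i++) {
		/* LE: byte 0 is least significant; BE: the last byte is. */
		size_t pos = big_endian ? ivsize - 1 - i : i;

		if (++iv[pos])
			break;	/* no carry out of this byte: done */
	}
}

/* Hypothetical helper: plain64-style IV, le64(sector) zero-padded. */
static void plain64_iv(uint8_t *iv, size_t ivsize, uint64_t sector)
{
	size_t i;

	assert(ivsize >= 8);
	memset(iv, 0, ivsize);
	for (i = 0; i < 8; i++)
		iv[i] = (uint8_t)(sector >> (8 * i));
}

/* Returns 1 if walking IV(start) by +1 matches IV(start + i) for i < n. */
static int counter_walk_matches(uint64_t start, unsigned int n)
{
	uint8_t walked[16], direct[16];
	unsigned int i;

	plain64_iv(walked, sizeof(walked), start);
	for (i = 0; i < n; i++) {
		plain64_iv(direct, sizeof(direct), start + i);
		if (memcmp(walked, direct, sizeof(walked)))
			return 0;
		iv_inc(walked, sizeof(walked), 0);
	}
	return 1;
}
```

For the 4 KiB example above, counter_walk_matches(s, 8) checks that the
eight per-sector IVs starting at sector s are reproduced by repeated +1
steps from the first one.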

crypt_alloc_tfms() wraps the skcipher in dun(<cipher>,<endian>) when
crypt_can_batch_dun() holds.  That requires an IV mode that is a
data-unit counter (its crypt_iv_operations sets dun_endian to the
counter endianness: "le" for plain64 and essiv, "be" for plain64be;
non-counter modes such as lmk/tcw/eboiv leave it NULL and are excluded),
a single tfm, a non-aead cipher, and either sector_size 512 or
iv_large_sectors, so that the per-unit IV step is exactly one.  This is
the same kind of name rewrite as essiv(), done in the one alloc helper so
callers are unchanged.
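
A minimal userspace sketch of that name rewrite, for illustration only:
build_dun_name() is a hypothetical helper, and MAX_ALG_NAME stands in
for CRYPTO_MAX_ALG_NAME with an assumed value of 128.  It mirrors the
pattern of formatting "dun(<cipher>,<endian>)" and treating a truncated
name as an error instead of allocating a wrong algorithm:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define MAX_ALG_NAME 128	/* stand-in for CRYPTO_MAX_ALG_NAME (assumed) */

/*
 * Hypothetical userspace analogue of the rewrite in crypt_alloc_tfms():
 * build "dun(<cipher>,<endian>)" and reject names that would be truncated.
 */
static int build_dun_name(char *out, size_t outlen, const char *cipher,
			  const char *endian)
{
	int n;

	assert(out && cipher && endian);
	n = snprintf(out, outlen, "dun(%s,%s)", cipher, endian);
	if (n < 0 || (size_t)n >= outlen)
		return -ENAMETOOLONG;
	return 0;
}
```

Called with "xts(aes)" and "le" and a MAX_ALG_NAME-sized buffer, this
yields the name dun(xts(aes),le); a buffer too small for the full name
gets -ENAMETOOLONG.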

DM_CRYPT selects CRYPTO_DUN, and dun() resolves against a sync inner
cipher, so there is no expected case in which allocating the wrapper
fails where the bare cipher would have succeeded.  There is therefore no
fallback: any allocation error propagates.  (A config whose only xts
provider is async, with no generic CRYPTO_XTS, would now fail to
activate rather than silently run per-sector; generic xts is selected by
the dependency chain, so this does not arise in practice.)

crypt_convert_block_skcipher() handles both cases in one function: the
length is crypt_skcipher_len() -- a whole contiguous segment when
batching, else a single sector -- and data_unit_size is set
unconditionally (a dun() tfm reads it; a plain skcipher ignores it).  It
advances the bio iterators itself (as the aead path already does) and
reports the bytes processed, so crypt_convert() advances cc_sector /
tag_offset uniformly via one helper, crypt_convert_advance(), with no
per-case duplication.
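
The length and cursor arithmetic can be sketched in userspace as
follows.  This is an illustration under stated assumptions, not the
dm-crypt code: struct cursors, request_len() and advance() are
hypothetical stand-ins for convert_context, crypt_skcipher_len() and
crypt_convert_advance(), and sector_size is assumed to be a power of
two of at least 512 bytes:

```c
#include <assert.h>

#define SECTOR_SHIFT 9	/* 512-byte base sectors */

/* Hypothetical stand-in for the cursor state kept in convert_context. */
struct cursors {
	unsigned long long cc_sector;	/* IV sector, in 512-byte units */
	unsigned int tag_offset;	/* in units of sector_size */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Bytes for one request: the largest sector-aligned prefix common to both
 * segments when batching, else one sector (0 if the input is misaligned).
 */
static unsigned int request_len(unsigned int len_in, unsigned int len_out,
				unsigned int sector_size, int batching)
{
	/* sector_size is assumed to be a power of two, at least 512. */
	assert(sector_size >= 512 && !(sector_size & (sector_size - 1)));

	if (batching)
		return min_u(len_in, len_out) & ~(sector_size - 1);
	if (len_in & (sector_size - 1))
		return 0;
	return sector_size;
}

/* Advance both cursors by the number of bytes actually processed. */
static void advance(struct cursors *c, unsigned int processed,
		    unsigned int sector_size)
{
	c->cc_sector += processed >> SECTOR_SHIFT;
	c->tag_offset += processed / sector_size;
}
```

With 512-byte sectors and a 4 KiB segment, request_len() returns 4096
when batching and 512 otherwise; advance() then moves both cursors by 8
or by 1 respectively, which is why one helper covers both paths.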

Verified byte-equivalent to the per-sector path: plain64 and plain64be
dm-crypt with dun() produce ciphertext bit-identical to an unpatched
kernel over a 256 MB device (xts-aes driving the split).

Signed-off-by: Leonid Ravich <lravich@amazon.com>
---
 drivers/md/Kconfig    |   1 +
 drivers/md/dm-crypt.c | 208 +++++++++++++++++++++++++++++++++---------
 2 files changed, 166 insertions(+), 43 deletions(-)

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index a3fcdca7e6db..e8e299566374 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -299,6 +299,7 @@ config DM_CRYPT
 	select CRC32
 	select CRYPTO
 	select CRYPTO_CBC
+	select CRYPTO_DUN # multi-data-unit batching of contiguous sectors
 	select CRYPTO_ESSIV
 	select CRYPTO_LIB_AES
 	select CRYPTO_LIB_MD5 # needed by lmk IV mode
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 608b617fb817..44938223ad3e 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -115,6 +115,13 @@ struct crypt_iv_operations {
 			 struct dm_crypt_request *dmreq);
 	void (*post)(struct crypt_config *cc, u8 *iv,
 		     struct dm_crypt_request *dmreq);
+
+	/*
+	 * Counter endianness ("le"/"be") for IV modes whose per-sector IV is a
+	 * data-unit-number counter (IV(s+i) == IV(s)+i), batchable via
+	 * dun(<cipher>,<dun_endian>).  NULL for non-counter modes (lmk, tcw, ...).
+	 */
+	const char *dun_endian;
 };
 
 struct iv_benbi_private {
@@ -151,6 +158,7 @@ enum cipher_flags {
 	CRYPT_IV_LARGE_SECTORS,		/* Calculate IV from sector_size, not 512B sectors */
 	CRYPT_ENCRYPT_PREPROCESS,	/* Must preprocess data for encryption (elephant) */
 	CRYPT_KEY_MAC_SIZE_SET,		/* The integrity_key_size option was used */
+	CRYPT_MULTI_DATA_UNIT,		/* Batch a bio segment's sectors per crypto request */
 };
 
 /*
@@ -1018,15 +1026,19 @@ static const struct crypt_iv_operations crypt_iv_plain_ops = {
 };
 
 static const struct crypt_iv_operations crypt_iv_plain64_ops = {
-	.generator = crypt_iv_plain64_gen
+	.generator = crypt_iv_plain64_gen,
+	.dun_endian = "le",
 };
 
 static const struct crypt_iv_operations crypt_iv_plain64be_ops = {
-	.generator = crypt_iv_plain64be_gen
+	.generator = crypt_iv_plain64be_gen,
+	.dun_endian = "be",
 };
 
 static const struct crypt_iv_operations crypt_iv_essiv_ops = {
-	.generator = crypt_iv_essiv_gen
+	.generator = crypt_iv_essiv_gen,
+	/* IV input is le64(sector); the salt-encrypt lives in essiv(). */
+	.dun_endian = "le",
 };
 
 static const struct crypt_iv_operations crypt_iv_benbi_ops = {
@@ -1349,21 +1361,51 @@ static int crypt_convert_block_aead(struct crypt_config *cc,
 	return r;
 }
 
+/*
+ * Bytes to process in one skcipher request: a whole contiguous segment when
+ * batching (multi-data-unit), else one sector.  0 means an unusable
+ * (sub-sector / misaligned) segment.
+ */
+static unsigned int crypt_skcipher_len(struct crypt_config *cc,
+				       const struct bio_vec *bv_in,
+				       const struct bio_vec *bv_out)
+{
+	const unsigned int sector_size = cc->sector_size;
+
+	if (test_bit(CRYPT_MULTI_DATA_UNIT, &cc->cipher_flags))
+		return round_down(min(bv_in->bv_len, bv_out->bv_len),
+				  sector_size);
+
+	/* Reject unexpected unaligned bio. */
+	if (unlikely(bv_in->bv_len & (sector_size - 1)))
+		return 0;
+	return sector_size;
+}
+
+/*
+ * Encrypt/decrypt one bio segment (one sector, or a whole segment when
+ * batching) and report the bytes done in *out_processed.  The integrity /
+ * preprocess / post handling is inert when batching: crypt_can_batch_dun()
+ * and the post-alloc check in crypt_ctr_cipher() exclude those configs.
+ */
 static int crypt_convert_block_skcipher(struct crypt_config *cc,
 					struct convert_context *ctx,
 					struct skcipher_request *req,
-					unsigned int tag_offset)
+					unsigned int tag_offset,
+					unsigned int *out_processed)
 {
 	struct bio_vec bv_in = bio_iter_iovec(ctx->bio_in, ctx->iter_in);
 	struct bio_vec bv_out = bio_iter_iovec(ctx->bio_out, ctx->iter_out);
+	const unsigned int sector_size = cc->sector_size;
 	struct scatterlist *sg_in, *sg_out;
 	struct dm_crypt_request *dmreq;
 	u8 *iv, *org_iv, *tag_iv;
 	__le64 *sector;
+	unsigned int len;
 	int r = 0;
 
-	/* Reject unexpected unaligned bio. */
-	if (unlikely(bv_in.bv_len & (cc->sector_size - 1)))
+	len = crypt_skcipher_len(cc, &bv_in, &bv_out);
+	if (unlikely(!len))
 		return -EIO;
 
 	dmreq = dmreq_of_req(cc, req);
@@ -1386,10 +1428,10 @@ static int crypt_convert_block_skcipher(struct crypt_config *cc,
 	sg_out = &dmreq->sg_out[0];
 
 	sg_init_table(sg_in, 1);
-	sg_set_page(sg_in, bv_in.bv_page, cc->sector_size, bv_in.bv_offset);
+	sg_set_page(sg_in, bv_in.bv_page, len, bv_in.bv_offset);
 
 	sg_init_table(sg_out, 1);
-	sg_set_page(sg_out, bv_out.bv_page, cc->sector_size, bv_out.bv_offset);
+	sg_set_page(sg_out, bv_out.bv_page, len, bv_out.bv_offset);
 
 	if (cc->iv_gen_ops) {
 		/* For READs use IV stored in integrity metadata */
@@ -1410,7 +1452,9 @@ static int crypt_convert_block_skcipher(struct crypt_config *cc,
 		memcpy(iv, org_iv, cc->iv_size);
 	}
 
-	skcipher_request_set_crypt(req, sg_in, sg_out, cc->sector_size, iv);
+	skcipher_request_set_crypt(req, sg_in, sg_out, len, iv);
+	/* A dun() tfm reads this; a plain skcipher ignores it (len is one sector). */
+	skcipher_request_set_data_unit_size(req, sector_size);
 
 	if (bio_data_dir(ctx->bio_in) == WRITE)
 		r = crypto_skcipher_encrypt(req);
@@ -1420,9 +1464,10 @@ static int crypt_convert_block_skcipher(struct crypt_config *cc,
 	if (!r && cc->iv_gen_ops && cc->iv_gen_ops->post)
 		cc->iv_gen_ops->post(cc, org_iv, dmreq);
 
-	bio_advance_iter(ctx->bio_in, &ctx->iter_in, cc->sector_size);
-	bio_advance_iter(ctx->bio_out, &ctx->iter_out, cc->sector_size);
+	bio_advance_iter(ctx->bio_in, &ctx->iter_in, len);
+	bio_advance_iter(ctx->bio_out, &ctx->iter_out, len);
 
+	*out_processed = len;
 	return r;
 }
 
@@ -1509,13 +1554,25 @@ static void crypt_free_req(struct crypt_config *cc, void *req, struct bio *base_
 		crypt_free_req_skcipher(cc, req, base_bio);
 }
 
+/*
+ * Advance the IV-sector and integrity-tag cursors by @processed bytes; the
+ * bio iterators are advanced by the per-block helpers themselves.
+ */
+static void crypt_convert_advance(struct crypt_config *cc,
+				  struct convert_context *ctx,
+				  unsigned int processed)
+{
+	ctx->cc_sector += processed >> SECTOR_SHIFT;
+	ctx->tag_offset += processed / cc->sector_size;
+}
+
 /*
  * Encrypt / decrypt data from one bio to another one (can be the same one)
  */
 static blk_status_t crypt_convert(struct crypt_config *cc,
 			 struct convert_context *ctx, bool atomic, bool reset_pending)
 {
-	unsigned int sector_step = cc->sector_size >> SECTOR_SHIFT;
+	unsigned int processed;
 	int r;
 
 	/*
@@ -1536,10 +1593,12 @@ static blk_status_t crypt_convert(struct crypt_config *cc,
 
 		atomic_inc(&ctx->cc_pending);
 
+		processed = cc->sector_size;
 		if (crypt_integrity_aead(cc))
 			r = crypt_convert_block_aead(cc, ctx, ctx->r.req_aead, ctx->tag_offset);
 		else
-			r = crypt_convert_block_skcipher(cc, ctx, ctx->r.req, ctx->tag_offset);
+			r = crypt_convert_block_skcipher(cc, ctx, ctx->r.req,
+							 ctx->tag_offset, &processed);
 
 		switch (r) {
 		/*
@@ -1559,8 +1618,7 @@ static blk_status_t crypt_convert(struct crypt_config *cc,
 					 * exit and continue processing in a workqueue
 					 */
 					ctx->r.req = NULL;
-					ctx->tag_offset++;
-					ctx->cc_sector += sector_step;
+					crypt_convert_advance(cc, ctx, processed);
 					return BLK_STS_DEV_RESOURCE;
 				}
 			} else {
@@ -1574,16 +1632,14 @@ static blk_status_t crypt_convert(struct crypt_config *cc,
 		 */
 		case -EINPROGRESS:
 			ctx->r.req = NULL;
-			ctx->tag_offset++;
-			ctx->cc_sector += sector_step;
+			crypt_convert_advance(cc, ctx, processed);
 			continue;
 		/*
 		 * The request was already processed (synchronously).
 		 */
 		case 0:
 			atomic_dec(&ctx->cc_pending);
-			ctx->cc_sector += sector_step;
-			ctx->tag_offset++;
+			crypt_convert_advance(cc, ctx, processed);
 			if (!atomic)
 				cond_resched();
 			continue;
@@ -2345,12 +2401,37 @@ static int crypt_alloc_tfms_aead(struct crypt_config *cc, char *ciphermode)
 	return 0;
 }
 
+/*
+ * Whether to wrap the cipher in dun() for multi-data-unit batching: a counter
+ * IV mode (dun_endian set: plain64 "le", plain64be "be", essiv "le"),
+ * single-tfm, non-aead, and a per-unit IV step of exactly one (512B sectors
+ * or iv_large_sectors).  Integrity is configured after alloc, so it is
+ * re-checked post-alloc in crypt_ctr_cipher(); an integrity config keeps an
+ * inert dun() wrapper but never sets the batch flag.
+ */
+static bool crypt_can_batch_dun(struct crypt_config *cc)
+{
+	return !crypt_integrity_aead(cc) && cc->tfms_count == 1 &&
+		cc->iv_gen_ops && cc->iv_gen_ops->dun_endian &&
+		(cc->sector_size == (1 << SECTOR_SHIFT) ||
+		 test_bit(CRYPT_IV_LARGE_SECTORS, &cc->cipher_flags));
+}
+
 static int crypt_alloc_tfms(struct crypt_config *cc, char *ciphermode)
 {
+	char dun_api[CRYPTO_MAX_ALG_NAME];
+
 	if (crypt_integrity_aead(cc))
 		return crypt_alloc_tfms_aead(cc, ciphermode);
-	else
-		return crypt_alloc_tfms_skcipher(cc, ciphermode);
+
+	/* Wrap in dun() for batching when eligible (like the essiv() rewrite). */
+	if (crypt_can_batch_dun(cc)) {
+		if (snprintf(dun_api, sizeof(dun_api), "dun(%s,%s)", ciphermode,
+			     cc->iv_gen_ops->dun_endian) >= (int)sizeof(dun_api))
+			return -ENAMETOOLONG;
+		ciphermode = dun_api;
+	}
+	return crypt_alloc_tfms_skcipher(cc, ciphermode);
 }
 
 static unsigned int crypt_subkey_size(struct crypt_config *cc)
@@ -2747,25 +2828,15 @@ static void crypt_dtr(struct dm_target *ti)
 	dm_audit_log_dtr(DM_MSG_PREFIX, ti, 1);
 }
 
-static int crypt_ctr_ivmode(struct dm_target *ti, const char *ivmode)
+/*
+ * Select cc->iv_gen_ops from the IV mode string -- pure parsing, no tfm
+ * dependency, so it runs before alloc and lets crypt_can_batch_dun() see the
+ * mode.  The tfm-dependent IV sizing is finished later by crypt_ctr_ivmode().
+ */
+static int crypt_select_ivmode(struct dm_target *ti, const char *ivmode)
 {
 	struct crypt_config *cc = ti->private;
 
-	if (crypt_integrity_aead(cc))
-		cc->iv_size = crypto_aead_ivsize(any_tfm_aead(cc));
-	else
-		cc->iv_size = crypto_skcipher_ivsize(any_tfm(cc));
-
-	if (cc->iv_size)
-		/* at least a 64 bit sector number should fit in our buffer */
-		cc->iv_size = max(cc->iv_size,
-				  (unsigned int)(sizeof(u64) / sizeof(u8)));
-	else if (ivmode) {
-		DMWARN("Selected cipher does not support IVs");
-		ivmode = NULL;
-	}
-
-	/* Choose ivmode, see comments at iv code. */
 	if (ivmode == NULL)
 		cc->iv_gen_ops = NULL;
 	else if (strcmp(ivmode, "plain") == 0)
@@ -2803,12 +2874,8 @@ static int crypt_ctr_ivmode(struct dm_target *ti, const char *ivmode)
 		}
 	} else if (strcmp(ivmode, "tcw") == 0) {
 		cc->iv_gen_ops = &crypt_iv_tcw_ops;
-		cc->key_parts += 2; /* IV + whitening */
-		cc->key_extra_size = cc->iv_size + TCW_WHITENING_SIZE;
 	} else if (strcmp(ivmode, "random") == 0) {
 		cc->iv_gen_ops = &crypt_iv_random_ops;
-		/* Need storage space in integrity fields. */
-		cc->integrity_iv_size = cc->iv_size;
 	} else {
 		ti->error = "Invalid IV mode";
 		return -EINVAL;
@@ -2817,6 +2884,37 @@ static int crypt_ctr_ivmode(struct dm_target *ti, const char *ivmode)
 	return 0;
 }
 
+static int crypt_ctr_ivmode(struct dm_target *ti, const char *ivmode)
+{
+	struct crypt_config *cc = ti->private;
+
+	if (crypt_integrity_aead(cc))
+		cc->iv_size = crypto_aead_ivsize(any_tfm_aead(cc));
+	else
+		cc->iv_size = crypto_skcipher_ivsize(any_tfm(cc));
+
+	if (cc->iv_size)
+		/* at least a 64 bit sector number should fit in our buffer */
+		cc->iv_size = max(cc->iv_size,
+				  (unsigned int)(sizeof(u64) / sizeof(u8)));
+	else if (ivmode) {
+		DMWARN("Selected cipher does not support IVs");
+		ivmode = NULL;
+		cc->iv_gen_ops = NULL;
+	}
+
+	/* Finish the tfm-dependent IV sizing; modes are already selected. */
+	if (cc->iv_gen_ops == &crypt_iv_tcw_ops) {
+		cc->key_parts += 2; /* IV + whitening */
+		cc->key_extra_size = cc->iv_size + TCW_WHITENING_SIZE;
+	} else if (cc->iv_gen_ops == &crypt_iv_random_ops) {
+		/* Need storage space in integrity fields. */
+		cc->integrity_iv_size = cc->iv_size;
+	}
+
+	return 0;
+}
+
 /*
  * Workaround to parse HMAC algorithm from AEAD crypto API spec.
  * The HMAC is needed to calculate tag size (HMAC digest size).
@@ -2914,7 +3012,12 @@ static int crypt_ctr_cipher_new(struct dm_target *ti, char *cipher_in, char *key
 
 	cc->key_parts = cc->tfms_count;
 
-	/* Allocate cipher */
+	/* Select IV mode before alloc so dun() wrapping can be decided. */
+	ret = crypt_select_ivmode(ti, *ivmode);
+	if (ret < 0)
+		return ret;
+
+	/* Allocate cipher (skcipher may be wrapped in dun()). */
 	ret = crypt_alloc_tfms(cc, cipher_api);
 	if (ret < 0) {
 		ti->error = "Error allocating crypto tfm";
@@ -2999,7 +3102,13 @@ static int crypt_ctr_cipher_old(struct dm_target *ti, char *cipher_in, char *key
 		goto bad_mem;
 	}
 
-	/* Allocate cipher */
+	/* Select IV mode before alloc so dun() wrapping can be decided. */
+	ret = crypt_select_ivmode(ti, *ivmode);
+	if (ret < 0) {
+		kfree(cipher_api);
+		return ret;
+	}
+
 	ret = crypt_alloc_tfms(cc, cipher_api);
 	if (ret < 0) {
 		ti->error = "Error allocating crypto tfm";
@@ -3063,6 +3172,19 @@ static int crypt_ctr_cipher(struct dm_target *ti, char *cipher_in, char *key)
 		}
 	}
 
+	/*
+	 * Enable batching only if the cipher was dun()-wrapped at alloc time and
+	 * no integrity was configured (integrity is set up after cipher alloc).
+	 */
+	if (!crypt_integrity_aead(cc) && !cc->integrity_tag_size &&
+	    !cc->integrity_iv_size &&
+	    !strncmp(crypto_skcipher_alg(any_tfm(cc))->base.cra_name,
+		     "dun(", 4)) {
+		set_bit(CRYPT_MULTI_DATA_UNIT, &cc->cipher_flags);
+		DMINFO("Using multi-data-unit crypto offload (du=%u)",
+		       cc->sector_size);
+	}
+
 	/* wipe the kernel key payload copy */
 	if (cc->key_string)
 		memset(cc->key, 0, cc->key_size * sizeof(u8));
-- 
2.47.3

