* [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching
@ 2026-06-15 11:14 Leonid Ravich
2026-06-15 11:14 ` [PATCH v4 1/3] crypto: skcipher - add per-request data_unit_size with auto-splitting Leonid Ravich
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Leonid Ravich @ 2026-06-15 11:14 UTC (permalink / raw)
To: Herbert Xu
Cc: Alasdair Kergon, Ard Biesheuvel, Eric Biggers, Jens Axboe,
Horia Geanta, Gilad Ben-Yossef, linux-crypto, dm-devel,
linux-block
This is v4, addressing Herbert's review of v3. Two architectural
changes:
- data_unit_size is now per-request (on struct skcipher_request)
rather than per-tfm. Reverts to the v1 placement.
- The crypto API auto-splits multi-data-unit requests when the
underlying algorithm does not advertise
CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU. Consumers no longer test
for multi-DU support before submitting; setting data_unit_size
on any skcipher request whose algorithm uses the 128-bit LE
counter IV convention "just works".
These two changes shrink the series from 4 patches to 3 (the
generic xts(...) template needs no special handling - the
auto-splitter calls its single-DU encrypt/decrypt once per data
unit) and simplify the dm-crypt consumer (no advertise-flag check,
no per-tfm setup).
v3: https://lore.kernel.org/linux-crypto/20260601085641.16028-1-lravich@amazon.com/
v2: https://lore.kernel.org/linux-crypto/20260527065021.19525-1-lravich@amazon.com/
v1: https://lore.kernel.org/linux-crypto/20260519115955.27267-1-lravich@amazon.com/
The series adds a per-request "data unit size" to the skcipher API
so a caller can submit several data units (typically 512..4096-byte
sectors) sharing one starting IV in a single request. Algorithms
derive each data unit's IV from the caller-supplied IV by treating
it as a 128-bit little-endian counter and adding the data-unit
index, matching the layout produced by dm-crypt's plain64 IV mode
and by typical inline-encryption hardware.
This mirrors the data_unit_size concept already exposed by
struct blk_crypto_config for inline encryption.
The first user is dm-crypt, which today issues one skcipher request
per sector and so pays a per-sector cost in request allocation,
callback dispatch, completion handling, and scatterlist setup.
Proof-of-concept performance numbers from the RFC reply [1]: +19%
throughput / -40% CPU on a single-core arm64 system with a hardware
XTS-AES-256 accelerator running fio 4 KiB sequential writes through
dm-crypt, when an out-of-tree arm64 xts driver advertises
CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU. This series itself does not
include arch enablement; the fast path is opt-in per driver, the
slow path is universal via the auto-splitter.
The native fast path amortises both per-sector dispatch and per-sector
crypto setup across a bio - the measured win above, on an engine that
offloads the AES compute. The auto-splitter is for correctness and
reach: any consumer can set data_unit_size and get correct output with
the per-request allocation/callback/completion cost removed, but it
still issues one alg->encrypt per data unit, so on a software cipher it
saves only dispatch overhead (no throughput figure claimed - that is
hardware- and workload-dependent). What it guarantees unconditionally
is byte-identical output (Verification below) at O(entries + units),
walking the scatterlists with a pair of struct scatter_walk cursors
rather than rescanning from the head per unit.
[1] https://lore.kernel.org/linux-crypto/20260428101225.24316-1-lravich@amazon.com/
Changes since v3
----------------
- data_unit_size moved from struct crypto_skcipher (per-tfm) to
struct skcipher_request (per-request). (Herbert)
- Crypto API auto-splits multi-data-unit requests when the algorithm
does not advertise CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU. Drops the
per-tfm setter/probe in favour of a single
skcipher_request_set_data_unit_size() usable by every consumer.
(Herbert)
- CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU is a type-specific cra_flags
bit (0x01000000) in crypto/internal/skcipher.h, not a generic bit
in the public header; drivers set it to opt OUT of auto-splitting.
- The auto-splitter advances through src/dst with a pair of struct
scatter_walk cursors (scatterwalk_start / scatterwalk_get_sglist /
scatterwalk_skip) instead of scatterwalk_ffwd() per unit, which
rescans from the head and is O(units^2) under fragmentation; the
cursors give a single linear pass. (Eric)
- crypto_skcipher_validate_multi_du() reports -EINVAL for a malformed
geometry (du not a power of two, cryptlen not a positive multiple)
and -EOPNOTSUPP for a target that cannot do multi-DU (ivsize != 16,
lskcipher, or async without the native flag), so a caller can fall
back. Gates the native path too, not just the auto-splitter.
(Eric)
- testmgr cross-checks the batched dispatch against an independent
N x single-DU reference with LE128-walked IVs over a fragmented
scatterlist (pins the IV convention and exercises the cursor),
round-trips, and checks IV preservation. Ineligible algorithms
skip via -EOPNOTSUPP; a real mismatch returns -EBADMSG.
- dm-crypt enables batching only for IV modes flagged sector_iv_le128
(a new bool on struct crypt_iv_operations, set on plain64 only),
plus ivsize 16, sync, single-tfm, no integrity, no post() hook. The
flag replaces a hardcoded plain64 pointer-compare, so eligibility is
a self-documenting property of the IV mode rather than a special
case. plain stays excluded (its 32-bit counter wraps differently
past 2^32 sectors). Sets req->data_unit_size = sector_size and
submits; -EOPNOTSUPP/-EAGAIN fall back to the per-sector path.
Mikulas's v2 Reviewed-by is dropped as the dm-crypt patch was
substantially rewritten.
- The generic xts(...) template needs no separate handling, dropping
the v3 crypto/xts.c patch (4 -> 3 patches).
Design overview
---------------
* Patch 1 adds the data_unit_size field, the setter, the
CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU flag, and the auto-splitter in
crypto_skcipher_encrypt()/decrypt(). skcipher_request_set_tfm()
resets the field so a reused request defaults to single-DU.
* Patch 2 adds the testmgr multi-DU test (every ivsize == 16
skcipher).
* Patch 3 turns dm-crypt batching on automatically under the
conditions above and sets req->data_unit_size = cc->sector_size.
This series does NOT add the capability flag to any arch driver; the
auto-splitter ensures correctness without that opt-in.
Verification
------------
A regression protocol is included in the project tree
(.claude/regression-protocol.md, .claude/run-regression.sh). The
reference run reports 12/12 PASS:
- x86 + arm64 build clean; checkpatch.pl --strict clean.
- testmgr multi-DU: PASS for every ivsize == 16 skcipher in-tree.
- dm-crypt activation gating: plain64 enabled; essiv:sha256 /
plain64be / plain fall back.
- dm-crypt round-trip plain64 with multi-DU via the auto-splitter
(xts-aes-aesni, no native flag): PASS.
- dm-crypt round-trip essiv:sha256 (per-sector path): PASS.
- dm-crypt low-memory (mem=128M): PASS, no OOM kill.
- Byte-equivalence: 256 MB of ciphertext through the auto-splitter
is bit-identical to an unpatched axboe/for-next baseline (sha256
4913910b1aa6f8859fcb8f4adec20230274993a3ade8f4dd0140a323dc43efc0).
- arm64 functional under qemu-aarch64: PASS.
Leonid Ravich (3):
crypto: skcipher - add per-request data_unit_size with auto-splitting
crypto: testmgr - test for multi-data-unit dispatch
dm crypt: batch all sectors of a bio per crypto request
crypto/skcipher.c | 132 +++++++++++++++++++
crypto/testmgr.c | 192 +++++++++++++++++++++++++
drivers/md/dm-crypt.c | 215 +++++++++++++++++++++++++++--
include/crypto/internal/skcipher.h | 10 ++
include/crypto/skcipher.h | 28 ++++
5 files changed, 569 insertions(+), 8 deletions(-)
base-commit: a8cafdf8c949f17c92eca0045532e88ac0dac30d
--
2.47.3
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v4 1/3] crypto: skcipher - add per-request data_unit_size with auto-splitting
2026-06-15 11:14 [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching Leonid Ravich
@ 2026-06-15 11:14 ` Leonid Ravich
2026-06-15 11:14 ` [PATCH v4 2/3] crypto: testmgr - test for multi-data-unit dispatch Leonid Ravich
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Leonid Ravich @ 2026-06-15 11:14 UTC (permalink / raw)
To: Herbert Xu
Cc: Alasdair Kergon, Ard Biesheuvel, Eric Biggers, Jens Axboe,
Horia Geanta, Gilad Ben-Yossef, linux-crypto, dm-devel,
linux-block
Add a data_unit_size field to struct skcipher_request that lets a
caller submit several data units (typically 512..4096-byte sectors)
sharing one starting IV in a single request. Algorithms derive each
data unit's IV from the caller-supplied IV by treating it as a
128-bit little-endian counter and adding the data-unit index, which
matches the layout produced by dm-crypt's plain64 IV mode and by
typical inline-encryption hardware.
This mirrors the data_unit_size concept already exposed by
struct blk_crypto_config for inline encryption.
The crypto API auto-splits a multi-data-unit request into per-DU
sub-requests when the underlying algorithm does not advertise
CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU (a type-specific cra_flags bit,
defined in crypto/internal/skcipher.h). A consumer sets
data_unit_size and submits: a native driver handles all units in one
pass, otherwise the core splits transparently. The split derives
per-DU IVs as a 128-bit LE counter, so this is correct only for
algorithms using that IV convention (e.g. XTS with plain64-style
IVs); callers are responsible for that match, as they already are for
the IV itself.
skcipher_request_set_tfm() resets the field to 0 so a request reused
from a pool or stack defaults to single-data-unit semantics; callers
that want batching set it explicitly via
skcipher_request_set_data_unit_size() after configuring the tfm.
crypto_skcipher_encrypt()/decrypt() call
crypto_skcipher_validate_multi_du() before any algorithm dispatch.
data_unit_size must be a power of two when non-zero (realistic sizes
are 512..4096, letting the per-DU loop and the cryptlen alignment
check use a mask instead of a divide) and cryptlen a positive
multiple of it; a malformed geometry is rejected with -EINVAL. A
target that cannot do multi-DU - ivsize != SKCIPHER_MDU_IVSIZE (16),
an lskcipher, or an async algorithm without the native flag - is
rejected with -EOPNOTSUPP so a caller can fall back. Async is
excluded because the splitter dispatches synchronously: an
-EINPROGRESS return would leave later units unsubmitted while the
driver still owned the request's scatterlists and IV. The check
gates the native path too, so algorithms never see a malformed
multi-DU request.
No in-tree algorithm sets CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU yet;
subsequent patches add the testmgr coverage and the dm-crypt
consumer.
Signed-off-by: Leonid Ravich <lravich@amazon.com>
---
crypto/skcipher.c | 132 +++++++++++++++++++++++++++++
include/crypto/internal/skcipher.h | 10 +++
include/crypto/skcipher.h | 28 ++++++
3 files changed, 170 insertions(+)
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 2b31d1d5d268..9262b47acfb9 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -17,6 +17,7 @@
#include <linux/cryptouser.h>
#include <linux/err.h>
#include <linux/kernel.h>
+#include <linux/log2.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/seq_file.h>
@@ -432,15 +433,139 @@ int crypto_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
}
EXPORT_SYMBOL_GPL(crypto_skcipher_setkey);
+/* IV size for the 128-bit LE-counter multi-data-unit convention. */
+#define SKCIPHER_MDU_IVSIZE 16
+
+static inline void skcipher_iv_inc_le128(u8 *iv)
+{
+ __le64 lo_le, hi_le;
+ u64 lo;
+
+ memcpy(&lo_le, iv, 8);
+ memcpy(&hi_le, iv + 8, 8);
+ lo = le64_to_cpu(lo_le) + 1;
+ lo_le = cpu_to_le64(lo);
+ memcpy(iv, &lo_le, 8);
+ if (unlikely(lo == 0)) {
+ hi_le = cpu_to_le64(le64_to_cpu(hi_le) + 1);
+ memcpy(iv + 8, &hi_le, 8);
+ }
+}
+
+/*
+ * Dispatch a multi-data-unit request as one single-DU sub-request per
+ * unit. Each unit's IV is the caller's IV plus the unit index, taken
+ * as a 128-bit little-endian counter. A pair of scatter_walks advances
+ * through src/dst in a single linear pass (O(entries + units)); building
+ * each sub-request's view with scatterwalk_ffwd() would instead rescan
+ * from the head every unit, i.e. O(units^2).
+ */
+static int skcipher_split_data_units(struct skcipher_request *req,
+ int (*body)(struct skcipher_request *))
+{
+ const unsigned int du = req->data_unit_size;
+ const unsigned int total = req->cryptlen;
+ struct scatterlist *orig_src = req->src;
+ struct scatterlist *orig_dst = req->dst;
+ bool inplace = orig_src == orig_dst;
+ struct scatter_walk src_walk, dst_walk;
+ struct scatterlist src_sg[2], dst_sg[2];
+ u8 iv_orig[SKCIPHER_MDU_IVSIZE];
+ u8 iv_work[SKCIPHER_MDU_IVSIZE];
+ unsigned int off;
+ int err = 0;
+
+ memcpy(iv_orig, req->iv, sizeof(iv_orig));
+ memcpy(iv_work, iv_orig, sizeof(iv_orig));
+
+ sg_init_table(src_sg, 2);
+ scatterwalk_start(&src_walk, orig_src);
+ if (!inplace) {
+ sg_init_table(dst_sg, 2);
+ scatterwalk_start(&dst_walk, orig_dst);
+ }
+
+ /* Stop the per-DU body from re-entering the splitter. */
+ req->data_unit_size = 0;
+ req->src = src_sg;
+ req->dst = inplace ? src_sg : dst_sg;
+
+ for (off = 0; off < total; off += du) {
+ req->cryptlen = du;
+ scatterwalk_get_sglist(&src_walk, src_sg);
+ scatterwalk_skip(&src_walk, du);
+ if (!inplace) {
+ scatterwalk_get_sglist(&dst_walk, dst_sg);
+ scatterwalk_skip(&dst_walk, du);
+ }
+
+ err = body(req);
+ if (err)
+ break;
+
+ skcipher_iv_inc_le128(iv_work);
+ memcpy(req->iv, iv_work, sizeof(iv_work));
+ }
+
+ /* Caller-visible IV is the starting IV regardless of outcome. */
+ memcpy(req->iv, iv_orig, sizeof(iv_orig));
+ req->src = orig_src;
+ req->dst = orig_dst;
+ req->cryptlen = total;
+ req->data_unit_size = du;
+ return err;
+}
+
+static int crypto_skcipher_validate_multi_du(struct skcipher_request *req)
+{
+ const unsigned int du = req->data_unit_size;
+ struct crypto_skcipher *tfm;
+ struct skcipher_alg *alg;
+ u32 cra_flags;
+
+ if (likely(!du))
+ return 0;
+ if (!is_power_of_2(du) || du < SKCIPHER_MDU_IVSIZE)
+ return -EINVAL;
+ if (!req->cryptlen || (req->cryptlen & (du - 1)))
+ return -EINVAL;
+
+ tfm = crypto_skcipher_reqtfm(req);
+ alg = crypto_skcipher_alg(tfm);
+
+ /* lskcipher's *_sg path doesn't honour data_unit_size. */
+ if (alg->co.base.cra_type != &crypto_skcipher_type)
+ return -EOPNOTSUPP;
+
+ /* Capability mismatch, not a malformed request: report -EOPNOTSUPP. */
+ if (crypto_skcipher_ivsize(tfm) != SKCIPHER_MDU_IVSIZE)
+ return -EOPNOTSUPP;
+
+ /* The auto-splitter is sync-only; native drivers own async dispatch. */
+ cra_flags = alg->co.base.cra_flags;
+ if ((cra_flags & CRYPTO_ALG_ASYNC) &&
+ !(cra_flags & CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU))
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
int crypto_skcipher_encrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct skcipher_alg *alg = crypto_skcipher_alg(tfm);
+ int err;
if (crypto_skcipher_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
return -ENOKEY;
+ err = crypto_skcipher_validate_multi_du(req);
+ if (err)
+ return err;
if (alg->co.base.cra_type != &crypto_skcipher_type)
return crypto_lskcipher_encrypt_sg(req);
+ if (req->data_unit_size &&
+ !(alg->co.base.cra_flags & CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU))
+ return skcipher_split_data_units(req, alg->encrypt);
return alg->encrypt(req);
}
EXPORT_SYMBOL_GPL(crypto_skcipher_encrypt);
@@ -449,11 +574,18 @@ int crypto_skcipher_decrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct skcipher_alg *alg = crypto_skcipher_alg(tfm);
+ int err;
if (crypto_skcipher_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
return -ENOKEY;
+ err = crypto_skcipher_validate_multi_du(req);
+ if (err)
+ return err;
if (alg->co.base.cra_type != &crypto_skcipher_type)
return crypto_lskcipher_decrypt_sg(req);
+ if (req->data_unit_size &&
+ !(alg->co.base.cra_flags & CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU))
+ return skcipher_split_data_units(req, alg->decrypt);
return alg->decrypt(req);
}
EXPORT_SYMBOL_GPL(crypto_skcipher_decrypt);
diff --git a/include/crypto/internal/skcipher.h b/include/crypto/internal/skcipher.h
index a965b6aabf61..4c826f3bc715 100644
--- a/include/crypto/internal/skcipher.h
+++ b/include/crypto/internal/skcipher.h
@@ -21,6 +21,16 @@
*/
#define CRYPTO_ALG_SKCIPHER_REQSIZE_LARGE CRYPTO_ALG_OPTIONAL_KEY
+/*
+ * Set by an skcipher that handles skcipher_request::data_unit_size > 0
+ * natively in one pass; otherwise the API splits the request. Lives in
+ * the type-specific 0xff000000 cra_flags range. A native driver must
+ * derive per-DU IVs as a 128-bit LE counter and leave @iv at the
+ * caller-supplied starting value on return, success or error, matching
+ * the auto-splitter so the two paths are observably identical.
+ */
+#define CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU 0x01000000
+
struct aead_request;
struct rtattr;
diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
index 4efe2ca8c4d1..ced1fae08147 100644
--- a/include/crypto/skcipher.h
+++ b/include/crypto/skcipher.h
@@ -31,6 +31,11 @@ struct scatterlist;
/**
* struct skcipher_request - Symmetric key cipher request
* @cryptlen: Number of bytes to encrypt or decrypt
+ * @data_unit_size: Size in bytes of each data unit, or 0 for a
+ * single-data-unit request (the default). When non-zero,
+ * must be a power of two, @cryptlen must be a positive
+ * multiple of it, and per-DU IVs are derived from @iv as a
+ * 128-bit little-endian counter.
* @iv: Initialisation Vector
* @src: Source SG list
* @dst: Destination SG list
@@ -39,6 +44,7 @@ struct scatterlist;
*/
struct skcipher_request {
unsigned int cryptlen;
+ unsigned int data_unit_size;
u8 *iv;
@@ -225,6 +231,7 @@ struct lskcipher_alg {
struct skcipher_request *name = \
(((struct skcipher_request *)__##name##_desc)->base.tfm = \
crypto_sync_skcipher_tfm((_tfm)), \
+ ((struct skcipher_request *)__##name##_desc)->data_unit_size = 0, \
(void *)__##name##_desc)
/**
@@ -819,6 +826,8 @@ static inline void skcipher_request_set_tfm(struct skcipher_request *req,
struct crypto_skcipher *tfm)
{
req->base.tfm = crypto_skcipher_tfm(tfm);
+ /* Reused requests default to single-data-unit. */
+ req->data_unit_size = 0;
}
static inline void skcipher_request_set_sync_tfm(struct skcipher_request *req,
@@ -937,5 +946,24 @@ static inline void skcipher_request_set_crypt(
req->iv = iv;
}
+/**
+ * skcipher_request_set_data_unit_size() - submit as multiple data units
+ * @req: request handle
+ * @data_unit_size: data-unit size in bytes (power of two), or 0 to disable
+ *
+ * Process @req as @cryptlen / @data_unit_size data units sharing one starting
+ * @iv, with per-DU IVs derived as a 128-bit little-endian counter. @cryptlen
+ * must be a positive multiple of @data_unit_size, else the encrypt/decrypt
+ * call returns -EINVAL; a target that cannot do multi-DU (ivsize != 16, an
+ * lskcipher, or async without native support) returns -EOPNOTSUPP. Unlike
+ * the single-DU path, @iv is preserved across the call regardless of outcome.
+ */
+static inline void
+skcipher_request_set_data_unit_size(struct skcipher_request *req,
+ unsigned int data_unit_size)
+{
+ req->data_unit_size = data_unit_size;
+}
+
#endif /* _CRYPTO_SKCIPHER_H */
base-commit: a8cafdf8c949f17c92eca0045532e88ac0dac30d
--
2.47.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v4 2/3] crypto: testmgr - test for multi-data-unit dispatch
2026-06-15 11:14 [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching Leonid Ravich
2026-06-15 11:14 ` [PATCH v4 1/3] crypto: skcipher - add per-request data_unit_size with auto-splitting Leonid Ravich
@ 2026-06-15 11:14 ` Leonid Ravich
2026-06-15 11:14 ` [PATCH v4 3/3] dm crypt: batch all sectors of a bio per crypto request Leonid Ravich
2026-06-15 22:53 ` [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching Eric Biggers
3 siblings, 0 replies; 8+ messages in thread
From: Leonid Ravich @ 2026-06-15 11:14 UTC (permalink / raw)
To: Herbert Xu
Cc: Alasdair Kergon, Ard Biesheuvel, Eric Biggers, Jens Axboe,
Horia Geanta, Gilad Ben-Yossef, linux-crypto, dm-devel,
linux-block
Add a test that runs on every skcipher with ivsize == 16. It
encrypts random plaintext two ways and compares:
1. one batched request with skcipher_request_set_data_unit_size()
set, over a deliberately fragmented scatterlist whose entries do
not align to the data-unit size (so per-DU views cross SG entries
and exercise the scatter_walk cursor), and
2. an independent reference of N single-DU requests with IVs walked
as a 128-bit LE counter, matching the convention documented in
skcipher_request_set_data_unit_size().
The two must produce byte-identical ciphertext; this pins the IV
convention rather than only checking encrypt/decrypt symmetry. The
batched ciphertext is then round-tripped back to plaintext, and the
caller IV is checked unchanged. Iterates over typical data unit
sizes (512, 1024, 2048, 4096).
Algorithms the validator rejects for multi-DU return -EOPNOTSUPP on
the first call and skip cleanly; a genuine mismatch returns -EBADMSG
so it cannot be confused with a skip.
Signed-off-by: Leonid Ravich <lravich@amazon.com>
---
crypto/testmgr.c | 192 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 192 insertions(+)
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 4d86efae65b2..5cbd0f4b070e 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3211,6 +3211,194 @@ static int test_skcipher(int enc, const struct cipher_test_suite *suite,
return 0;
}
+/* Increment a 16-byte IV as a little-endian 128-bit counter. */
+static void test_mdu_iv_inc(u8 iv[16])
+{
+ int i;
+
+ for (i = 0; i < 16; i++)
+ if (++iv[i])
+ break;
+}
+
+/*
+ * Encrypt one du_size block with a plain single-DU request; used to
+ * build an independent reference for the batched dispatch.
+ */
+static int test_mdu_ref_encrypt(struct crypto_skcipher *tfm, const u8 *in,
+ u8 *out, unsigned int du_size, const u8 iv[16])
+{
+ struct skcipher_request *req;
+ struct scatterlist sg_in, sg_out;
+ DECLARE_CRYPTO_WAIT(wait);
+ u8 ivbuf[16];
+ int err;
+
+ req = skcipher_request_alloc(tfm, GFP_KERNEL);
+ if (!req)
+ return -ENOMEM;
+ memcpy(ivbuf, iv, 16);
+ memcpy(out, in, du_size);
+ sg_init_one(&sg_in, out, du_size);
+ skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
+ CRYPTO_TFM_REQ_MAY_SLEEP,
+ crypto_req_done, &wait);
+ skcipher_request_set_crypt(req, &sg_in, &sg_in, du_size, ivbuf);
+ err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
+ skcipher_request_free(req);
+ return err;
+}
+
+/*
+ * Build a deliberately fragmented SG over @buf: entries that do not
+ * align to du_size, so the splitter's per-DU views cross SG entries
+ * and exercise the scatter_walk cursor.
+ */
+static void test_mdu_sg_fragment(struct scatterlist *sg, unsigned int nents,
+ u8 *buf, unsigned int total)
+{
+ unsigned int chunk = total / nents;
+ unsigned int off = 0, i;
+
+ sg_init_table(sg, nents);
+ for (i = 0; i < nents; i++) {
+ unsigned int len = (i == nents - 1) ? total - off : chunk;
+
+ sg_set_buf(&sg[i], buf + off, len);
+ off += len;
+ }
+}
+
+/*
+ * Multi-DU test: verify the batched dispatch produces byte-identical
+ * ciphertext to an independent N x single-DU reference with per-DU IVs
+ * walked as a 128-bit LE counter (pins the IV convention, not just
+ * enc/dec symmetry), over a fragmented SG, then round-trips. Real
+ * mismatches return -EBADMSG; ineligible algorithms skip via the
+ * validator's -EOPNOTSUPP.
+ */
+#define TEST_MDU_NR_UNITS 4
+#define TEST_MDU_NR_FRAGS 5
+static int test_skcipher_multi_du_one(struct crypto_skcipher *tfm,
+ unsigned int du_size)
+{
+ const char *driver = crypto_skcipher_driver_name(tfm);
+ const unsigned int total = du_size * TEST_MDU_NR_UNITS;
+ struct skcipher_request *req = NULL;
+ struct scatterlist sg[TEST_MDU_NR_FRAGS];
+ DECLARE_CRYPTO_WAIT(wait);
+ u8 iv_orig[16], iv_work[16], iv_ref[16];
+ u8 *plain = NULL, *buf = NULL, *ref = NULL;
+ unsigned int u;
+ int err;
+
+ plain = kmalloc(total, GFP_KERNEL);
+ buf = kmalloc(total, GFP_KERNEL);
+ ref = kmalloc(total, GFP_KERNEL);
+ req = skcipher_request_alloc(tfm, GFP_KERNEL);
+ if (!plain || !buf || !ref || !req) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ get_random_bytes(plain, total);
+ get_random_bytes(iv_orig, sizeof(iv_orig));
+
+ /* Reference: per-DU single requests with LE128-walked IVs. */
+ memcpy(iv_ref, iv_orig, sizeof(iv_orig));
+ for (u = 0; u < TEST_MDU_NR_UNITS; u++) {
+ err = test_mdu_ref_encrypt(tfm, plain + u * du_size,
+ ref + u * du_size, du_size, iv_ref);
+ /* First single-DU call reveals an ineligible algorithm. */
+ if (err == -EOPNOTSUPP && u == 0)
+ goto out;
+ if (err) {
+ pr_err("alg: skcipher: %s multi-DU ref encrypt failed (du=%u): %d\n",
+ driver, du_size, err);
+ goto out;
+ }
+ test_mdu_iv_inc(iv_ref);
+ }
+
+ /* Batched: one request over a fragmented SG. */
+ memcpy(buf, plain, total);
+ memcpy(iv_work, iv_orig, sizeof(iv_orig));
+ test_mdu_sg_fragment(sg, TEST_MDU_NR_FRAGS, buf, total);
+ skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
+ CRYPTO_TFM_REQ_MAY_SLEEP,
+ crypto_req_done, &wait);
+ skcipher_request_set_crypt(req, sg, sg, total, iv_work);
+ skcipher_request_set_data_unit_size(req, du_size);
+ err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
+ if (err == -EOPNOTSUPP)
+ goto out;
+ if (err) {
+ pr_err("alg: skcipher: %s multi-DU encrypt failed (du=%u): %d\n",
+ driver, du_size, err);
+ goto out;
+ }
+ if (memcmp(buf, ref, total) != 0) {
+ pr_err("alg: skcipher: %s multi-DU ciphertext differs from single-DU reference (du=%u)\n",
+ driver, du_size);
+ err = -EBADMSG;
+ goto out;
+ }
+ /* req->iv must be unchanged after multi-DU dispatch. */
+ if (memcmp(iv_work, iv_orig, sizeof(iv_orig)) != 0) {
+ pr_err("alg: skcipher: %s multi-DU encrypt mutated caller IV (du=%u)\n",
+ driver, du_size);
+ err = -EBADMSG;
+ goto out;
+ }
+
+ /* Round-trip the batched ciphertext back to plaintext. */
+ test_mdu_sg_fragment(sg, TEST_MDU_NR_FRAGS, buf, total);
+ skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
+ CRYPTO_TFM_REQ_MAY_SLEEP,
+ crypto_req_done, &wait);
+ skcipher_request_set_crypt(req, sg, sg, total, iv_work);
+ skcipher_request_set_data_unit_size(req, du_size);
+ err = crypto_wait_req(crypto_skcipher_decrypt(req), &wait);
+ if (err) {
+ pr_err("alg: skcipher: %s multi-DU decrypt failed (du=%u): %d\n",
+ driver, du_size, err);
+ goto out;
+ }
+ if (memcmp(buf, plain, total) != 0) {
+ pr_err("alg: skcipher: %s multi-DU round-trip mismatch (du=%u)\n",
+ driver, du_size);
+ err = -EBADMSG;
+ }
+
+out:
+ skcipher_request_free(req);
+ kfree(ref);
+ kfree(buf);
+ kfree(plain);
+ return err;
+}
+
+static int test_skcipher_multi_du(struct crypto_skcipher *tfm)
+{
+ static const unsigned int du_sizes[] = { 512, 1024, 2048, 4096 };
+ unsigned int j;
+ int err;
+
+ if (crypto_skcipher_ivsize(tfm) != 16)
+ return 0;
+
+ for (j = 0; j < ARRAY_SIZE(du_sizes); j++) {
+ err = test_skcipher_multi_du_one(tfm, du_sizes[j]);
+ /* Ineligible algorithms skip; real failures propagate. */
+ if (err == -EOPNOTSUPP)
+ return 0;
+ if (err)
+ return err;
+ cond_resched();
+ }
+ return 0;
+}
+
static int alg_test_skcipher(const struct alg_test_desc *desc,
const char *driver, u32 type, u32 mask)
{
@@ -3259,6 +3447,10 @@ static int alg_test_skcipher(const struct alg_test_desc *desc,
if (err)
goto out;
+ err = test_skcipher_multi_du(tfm);
+ if (err)
+ goto out;
+
err = test_skcipher_vs_generic_impl(desc->generic_driver, req, tsgls);
out:
free_cipher_test_sglists(tsgls);
--
2.47.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v4 3/3] dm crypt: batch all sectors of a bio per crypto request
2026-06-15 11:14 [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching Leonid Ravich
2026-06-15 11:14 ` [PATCH v4 1/3] crypto: skcipher - add per-request data_unit_size with auto-splitting Leonid Ravich
2026-06-15 11:14 ` [PATCH v4 2/3] crypto: testmgr - test for multi-data-unit dispatch Leonid Ravich
@ 2026-06-15 11:14 ` Leonid Ravich
2026-06-15 22:53 ` [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching Eric Biggers
3 siblings, 0 replies; 8+ messages in thread
From: Leonid Ravich @ 2026-06-15 11:14 UTC (permalink / raw)
To: Herbert Xu
Cc: Alasdair Kergon, Ard Biesheuvel, Eric Biggers, Jens Axboe,
Horia Geanta, Gilad Ben-Yossef, linux-crypto, dm-devel,
linux-block
Submit one skcipher request per bio with
skcipher_request_set_data_unit_size(req, cc->sector_size) instead of
issuing one request per sector. This removes per-sector overhead in
the crypto API hot path: request allocation, callback dispatch,
completion handling, and SG setup.
The optimisation is enabled automatically at table load when all
of the following hold:
- the cipher is non-aead (i.e. skcipher), sync, tfms_count 1;
- the IV mode advertises sector_iv_le128, i.e. its per-sector IV
advances as a 128-bit LE counter, matching the convention
documented in skcipher_request_set_data_unit_size(). Only plain64
sets it today (its 64-bit LE counter extends correctly); plain is
excluded as its 32-bit counter wraps differently across a
2^32-sector boundary;
- ivsize is 16 (the core rejects other sizes with -EOPNOTSUPP);
- the iv_gen_ops->post() hook is unset;
- dm-integrity is not stacked (no integrity tag or integrity IV).
The cipher driver does not need to advertise anything: the crypto
API auto-splits multi-data-unit requests for drivers that cannot
handle them natively, so dm-crypt sees the same fast batched
submission contract regardless of the underlying driver.
A new CRYPT_MULTI_DATA_UNIT cipher_flag, set once at construction
time, gates the multi-data-unit dispatch. The existing per-sector
path in crypt_convert_block_skcipher() is unchanged; the new
crypt_convert_block_skcipher_multi() is reached from a small
dispatch in crypt_convert() and shares the same backlog/-EBUSY/
-EINPROGRESS flow control with the per-sector path.
Heap-allocated scatterlists are stashed in dm_crypt_request and
freed in crypt_free_req_skcipher() to avoid races between the
synchronous-success free path and async-completion reuse from the
request pool. On scatterlist allocation failure the helper returns
-EAGAIN, and the core returns -EOPNOTSUPP if a driver turns out
unable to do multi-DU; crypt_convert() handles both by clearing its
local multi_du flag and falling back to the per-sector path for the
rest of the current crypt_convert() invocation, ensuring forward progress
on the swap-out-to-dm-crypt path even under total memory exhaustion
(the per-sector path uses only cc->req_pool, a mempool with
reservoir set up at table-load time, and the inline
dmreq->sg_in[]/sg_out[] arrays — no allocation that could fail).
Verified end-to-end with a byte-equivalence test: encrypted output
of plain64 dm-crypt with the multi-data-unit path matches output of
the single-data-unit path bit-for-bit over a 256 MB device, with
xts-aes-aesni driving the auto-split path.
Signed-off-by: Leonid Ravich <lravich@amazon.com>
---
drivers/md/dm-crypt.c | 215 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 207 insertions(+), 8 deletions(-)
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 608b617fb817..bfb98dd876d7 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -101,6 +101,9 @@ struct dm_crypt_request {
struct scatterlist sg_in[4];
struct scatterlist sg_out[4];
u64 iv_sector;
+ /* Multi-data-unit SG arrays, NULL when sg_in[]/sg_out[] suffice. */
+ struct scatterlist *sg_in_ext;
+ struct scatterlist *sg_out_ext;
};
struct crypt_config;
@@ -115,6 +118,12 @@ struct crypt_iv_operations {
struct dm_crypt_request *dmreq);
void (*post)(struct crypt_config *cc, u8 *iv,
struct dm_crypt_request *dmreq);
+ /*
+ * The per-sector IV advances as a 128-bit LE counter, so a bio's
+ * consecutive sectors share one starting IV and can be batched into
+ * a single skcipher request via data_unit_size.
+ */
+ bool sector_iv_le128;
};
struct iv_benbi_private {
@@ -151,6 +160,7 @@ enum cipher_flags {
CRYPT_IV_LARGE_SECTORS, /* Calculate IV from sector_size, not 512B sectors */
CRYPT_ENCRYPT_PREPROCESS, /* Must preprocess data for encryption (elephant) */
CRYPT_KEY_MAC_SIZE_SET, /* The integrity_key_size option was used */
+ CRYPT_MULTI_DATA_UNIT, /* Batch all sectors of a bio per crypto request */
};
/*
@@ -1018,7 +1028,8 @@ static const struct crypt_iv_operations crypt_iv_plain_ops = {
};
static const struct crypt_iv_operations crypt_iv_plain64_ops = {
- .generator = crypt_iv_plain64_gen
+ .generator = crypt_iv_plain64_gen,
+ .sector_iv_le128 = true,
};
static const struct crypt_iv_operations crypt_iv_plain64be_ops = {
@@ -1426,12 +1437,126 @@ static int crypt_convert_block_skcipher(struct crypt_config *cc,
return r;
}
+/*
+ * Submit all remaining sectors of the current bio in one skcipher request.
+ * Same return convention as crypt_convert_block_skcipher() except for
+ * -EAGAIN, which the caller must treat as "disable multi-DU and re-enter
+ * the per-sector path" so swap-out-to-dm-crypt always makes forward
+ * progress on the mempool reserve.
+ */
+static int crypt_convert_block_skcipher_multi(struct crypt_config *cc,
+ struct convert_context *ctx,
+ struct skcipher_request *req,
+ unsigned int *out_processed)
+{
+ const unsigned int sector_size = cc->sector_size;
+ const gfp_t gfp = GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN;
+ unsigned int total = ctx->iter_in.bi_size;
+ unsigned int n_sg_in = 0, n_sg_out = 0;
+ struct dm_crypt_request *dmreq = dmreq_of_req(cc, req);
+ struct scatterlist *sg_in = NULL, *sg_out = NULL;
+ struct bvec_iter iter_in, iter_out;
+ struct bio_vec bv;
+ u8 *iv, *org_iv;
+ int r;
+
+ if (WARN_ON_ONCE(ctx->iter_in.bi_size != ctx->iter_out.bi_size))
+ return -EIO;
+ if (unlikely(total & (sector_size - 1)))
+ return -EIO;
+
+ iter_in = ctx->iter_in;
+ iter_in.bi_size = total;
+ __bio_for_each_bvec(bv, ctx->bio_in, iter_in, iter_in)
+ n_sg_in++;
+
+ iter_out = ctx->iter_out;
+ iter_out.bi_size = total;
+ __bio_for_each_bvec(bv, ctx->bio_out, iter_out, iter_out)
+ n_sg_out++;
+
+ sg_in = kmalloc_array(n_sg_in, sizeof(*sg_in), gfp);
+ sg_out = (ctx->bio_in == ctx->bio_out) ? sg_in :
+ kmalloc_array(n_sg_out, sizeof(*sg_out), gfp);
+ if (!sg_in || !sg_out) {
+ kfree(sg_in);
+ if (sg_out != sg_in)
+ kfree(sg_out);
+ return -EAGAIN;
+ }
+
+ sg_init_table(sg_in, n_sg_in);
+ {
+ unsigned int i = 0;
+
+ iter_in = ctx->iter_in;
+ iter_in.bi_size = total;
+ __bio_for_each_bvec(bv, ctx->bio_in, iter_in, iter_in)
+ sg_set_page(&sg_in[i++], bv.bv_page, bv.bv_len,
+ bv.bv_offset);
+ }
+
+ if (sg_out != sg_in) {
+ unsigned int i = 0;
+
+ sg_init_table(sg_out, n_sg_out);
+ iter_out = ctx->iter_out;
+ iter_out.bi_size = total;
+ __bio_for_each_bvec(bv, ctx->bio_out, iter_out, iter_out)
+ sg_set_page(&sg_out[i++], bv.bv_page, bv.bv_len,
+ bv.bv_offset);
+ }
+
+ dmreq->iv_sector = ctx->cc_sector;
+ if (test_bit(CRYPT_IV_LARGE_SECTORS, &cc->cipher_flags))
+ dmreq->iv_sector >>= cc->sector_shift;
+ dmreq->ctx = ctx;
+
+ iv = iv_of_dmreq(cc, dmreq);
+ org_iv = org_iv_of_dmreq(cc, dmreq);
+ r = cc->iv_gen_ops->generator(cc, org_iv, dmreq);
+ if (r < 0)
+ goto out_free_sg;
+ memcpy(iv, org_iv, cc->iv_size);
+
+ dmreq->sg_in_ext = sg_in;
+ dmreq->sg_out_ext = (sg_out == sg_in) ? NULL : sg_out;
+
+ skcipher_request_set_crypt(req, sg_in, sg_out, total, iv);
+ skcipher_request_set_data_unit_size(req, sector_size);
+
+ if (bio_data_dir(ctx->bio_in) == WRITE)
+ r = crypto_skcipher_encrypt(req);
+ else
+ r = crypto_skcipher_decrypt(req);
+
+ /*
+ * Sync error: kcryptd_async_done won't run, so free the SG
+ * arrays here. Async returns (-EINPROGRESS, -EBUSY) hand
+ * ownership to the completion callback.
+ */
+ if (r && r != -EINPROGRESS && r != -EBUSY)
+ goto out_free_sg;
+
+ *out_processed = total;
+ return r;
+
+out_free_sg:
+ kfree(sg_in);
+ if (sg_out != sg_in)
+ kfree(sg_out);
+ dmreq->sg_in_ext = NULL;
+ dmreq->sg_out_ext = NULL;
+ return r;
+}
+
static void kcryptd_async_done(void *async_req, int error);
static int crypt_alloc_req_skcipher(struct crypt_config *cc,
struct convert_context *ctx)
{
unsigned int key_index = ctx->cc_sector & (cc->tfms_count - 1);
+ struct dm_crypt_request *dmreq;
if (!ctx->r.req) {
ctx->r.req = mempool_alloc(&cc->req_pool, in_interrupt() ? GFP_ATOMIC : GFP_NOIO);
@@ -1441,6 +1566,11 @@ static int crypt_alloc_req_skcipher(struct crypt_config *cc,
skcipher_request_set_tfm(ctx->r.req, cc->cipher_tfm.tfms[key_index]);
+ /* Multi-DU SG arrays are owned by the helper that allocates them. */
+ dmreq = dmreq_of_req(cc, ctx->r.req);
+ dmreq->sg_in_ext = NULL;
+ dmreq->sg_out_ext = NULL;
+
/*
* Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
* requests if driver request queue is full.
@@ -1487,6 +1617,12 @@ static void crypt_free_req_skcipher(struct crypt_config *cc,
struct skcipher_request *req, struct bio *base_bio)
{
struct dm_crypt_io *io = dm_per_bio_data(base_bio, cc->per_bio_data_size);
+ struct dm_crypt_request *dmreq = dmreq_of_req(cc, req);
+
+ kfree(dmreq->sg_in_ext);
+ dmreq->sg_in_ext = NULL;
+ kfree(dmreq->sg_out_ext);
+ dmreq->sg_out_ext = NULL;
if ((struct skcipher_request *)(io + 1) != req)
mempool_free(req, &cc->req_pool);
@@ -1515,7 +1651,9 @@ static void crypt_free_req(struct crypt_config *cc, void *req, struct bio *base_
static blk_status_t crypt_convert(struct crypt_config *cc,
struct convert_context *ctx, bool atomic, bool reset_pending)
{
- unsigned int sector_step = cc->sector_size >> SECTOR_SHIFT;
+ const unsigned int sector_step = cc->sector_size >> SECTOR_SHIFT;
+ bool multi_du = test_bit(CRYPT_MULTI_DATA_UNIT, &cc->cipher_flags);
+ unsigned int processed;
int r;
/*
@@ -1536,8 +1674,13 @@ static blk_status_t crypt_convert(struct crypt_config *cc,
atomic_inc(&ctx->cc_pending);
+ processed = cc->sector_size;
if (crypt_integrity_aead(cc))
r = crypt_convert_block_aead(cc, ctx, ctx->r.req_aead, ctx->tag_offset);
+ else if (multi_du)
+ r = crypt_convert_block_skcipher_multi(cc, ctx,
+ ctx->r.req,
+ &processed);
else
r = crypt_convert_block_skcipher(cc, ctx, ctx->r.req, ctx->tag_offset);
@@ -1559,8 +1702,19 @@ static blk_status_t crypt_convert(struct crypt_config *cc,
* exit and continue processing in a workqueue
*/
ctx->r.req = NULL;
- ctx->tag_offset++;
- ctx->cc_sector += sector_step;
+ if (!multi_du) {
+ ctx->tag_offset++;
+ ctx->cc_sector += sector_step;
+ } else {
+ bio_advance_iter(ctx->bio_in,
+ &ctx->iter_in,
+ processed);
+ bio_advance_iter(ctx->bio_out,
+ &ctx->iter_out,
+ processed);
+ ctx->cc_sector +=
+ processed >> SECTOR_SHIFT;
+ }
return BLK_STS_DEV_RESOURCE;
}
} else {
@@ -1574,19 +1728,41 @@ static blk_status_t crypt_convert(struct crypt_config *cc,
*/
case -EINPROGRESS:
ctx->r.req = NULL;
- ctx->tag_offset++;
- ctx->cc_sector += sector_step;
+ if (!multi_du) {
+ ctx->tag_offset++;
+ ctx->cc_sector += sector_step;
+ } else {
+ bio_advance_iter(ctx->bio_in, &ctx->iter_in,
+ processed);
+ bio_advance_iter(ctx->bio_out, &ctx->iter_out,
+ processed);
+ ctx->cc_sector += processed >> SECTOR_SHIFT;
+ }
continue;
/*
* The request was already processed (synchronously).
*/
case 0:
atomic_dec(&ctx->cc_pending);
- ctx->cc_sector += sector_step;
- ctx->tag_offset++;
+ if (!multi_du) {
+ ctx->cc_sector += sector_step;
+ ctx->tag_offset++;
+ } else {
+ bio_advance_iter(ctx->bio_in, &ctx->iter_in,
+ processed);
+ bio_advance_iter(ctx->bio_out, &ctx->iter_out,
+ processed);
+ ctx->cc_sector += processed >> SECTOR_SHIFT;
+ }
if (!atomic)
cond_resched();
continue;
+ /* Multi-DU rejected (no memory or sync-only mismatch): fall back. */
+ case -EAGAIN:
+ case -EOPNOTSUPP:
+ atomic_dec(&ctx->cc_pending);
+ multi_du = false;
+ continue;
/*
* There was a data integrity error.
*/
@@ -3063,6 +3239,29 @@ static int crypt_ctr_cipher(struct dm_target *ti, char *cipher_in, char *key)
}
}
+ /*
+ * Enable multi-data-unit batching only when per-DU IVs can be
+ * derived from one starting IV as a 128-bit LE counter, matching
+ * skcipher_request_set_data_unit_size(). Only IV modes flagged
+ * sector_iv_le128 qualify (plain64; not plain, whose 32-bit counter
+ * wraps differently across a 2^32-sector boundary). ivsize must be
+ * 16 (the core rejects otherwise) and the cipher must be sync,
+ * single-tfm, no integrity, no per-sector post() hook. The driver
+ * advertises nothing: the core auto-splits for drivers that lack
+ * native support.
+ */
+ if (!crypt_integrity_aead(cc) && cc->tfms_count == 1 &&
+ cc->iv_gen_ops && cc->iv_gen_ops->sector_iv_le128 &&
+ !cc->iv_gen_ops->post &&
+ !cc->integrity_tag_size && !cc->integrity_iv_size &&
+ crypto_skcipher_ivsize(any_tfm(cc)) == 16 &&
+ !(crypto_skcipher_alg(any_tfm(cc))->base.cra_flags &
+ CRYPTO_ALG_ASYNC)) {
+ set_bit(CRYPT_MULTI_DATA_UNIT, &cc->cipher_flags);
+ DMINFO("Using multi-data-unit crypto offload (du=%u)",
+ cc->sector_size);
+ }
+
/* wipe the kernel key payload copy */
if (cc->key_string)
memset(cc->key, 0, cc->key_size * sizeof(u8));
--
2.47.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching
2026-06-15 11:14 [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching Leonid Ravich
` (2 preceding siblings ...)
2026-06-15 11:14 ` [PATCH v4 3/3] dm crypt: batch all sectors of a bio per crypto request Leonid Ravich
@ 2026-06-15 22:53 ` Eric Biggers
2026-06-16 4:13 ` Herbert Xu
3 siblings, 1 reply; 8+ messages in thread
From: Eric Biggers @ 2026-06-15 22:53 UTC (permalink / raw)
To: Leonid Ravich
Cc: Herbert Xu, Alasdair Kergon, Ard Biesheuvel, Jens Axboe,
Horia Geanta, Gilad Ben-Yossef, linux-crypto, dm-devel,
linux-block
On Mon, Jun 15, 2026 at 11:14:56AM +0000, Leonid Ravich wrote:
> The series adds a per-request "data unit size" to the skcipher API
> so a caller can submit several data units (typically 512..4096-byte
> sectors) sharing one starting IV in a single request. Algorithms
> derive each data unit's IV from the caller-supplied IV by treating
> it as a 128-bit little-endian counter and adding the data-unit
> index, matching the layout produced by dm-crypt's plain64 IV mode
> and by typical inline-encryption hardware.
>
> This mirrors the data_unit_size concept already exposed by
> struct blk_crypto_config for inline encryption.
>
> The first user is dm-crypt, which today issues one skcipher request
> per sector and so pays a per-sector cost in request allocation,
> callback dispatch, completion handling, and scatterlist setup.
>
> Proof-of-concept performance numbers from the RFC reply [1]: +19%
> throughput / -40% CPU on a single-core arm64 system with a hardware
> XTS-AES-256 accelerator running fio 4 KiB sequential writes through
> dm-crypt, when an out-of-tree arm64 xts driver advertises
> CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU. This series itself does not
> include arch enablement; the fast path is opt-in per driver, the
> slow path is universal via the auto-splitter.
>
> The native fast path amortises both per-sector dispatch and per-sector
> crypto setup across a bio - the measured win above, on an engine that
> offloads the AES compute. The auto-splitter is for correctness and
> reach: any consumer can set data_unit_size and get correct output with
> the per-request allocation/callback/completion cost removed, but it
> still issues one alg->encrypt per data unit, so on a software cipher it
> saves only dispatch overhead (no throughput figure claimed - that is
> hardware- and workload-dependent). What it guarantees unconditionally
> is byte-identical output (Verification below) at O(entries + units),
> walking the scatterlists with a pair of struct scatter_walk cursors
> rather than rescanning from the head per unit.
So in other words, this series slows down dm-crypt and crypto_skcipher
for everyone to optimize for an out-of-tree driver. And there's also no
benchmark showing that your driver is even worth it over just using the
CPU.
- Eric
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching
2026-06-15 22:53 ` [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching Eric Biggers
@ 2026-06-16 4:13 ` Herbert Xu
2026-06-16 4:50 ` Eric Biggers
0 siblings, 1 reply; 8+ messages in thread
From: Herbert Xu @ 2026-06-16 4:13 UTC (permalink / raw)
To: Eric Biggers
Cc: Leonid Ravich, Alasdair Kergon, Ard Biesheuvel, Jens Axboe,
Horia Geanta, Gilad Ben-Yossef, linux-crypto, dm-devel,
linux-block
On Mon, Jun 15, 2026 at 03:53:17PM -0700, Eric Biggers wrote:
>
> So in other words, this series slows down dm-crypt and crypto_skcipher
> for everyone to optimize for an out-of-tree driver. And there's also no
> benchmark showing that your driver is even worth it over just using the
> CPU.
There is no reason why the software fallback should be slower
than the status quo. Existing callers of the Crypto API will
be issuing one indirect function call per data unit. With the
new scheme, the indirect calls per unit moves from from the caller
into the Crypto API.
In fact, we could move it down further and improve upon the
status quo by splitting the data in each algorithm implemntation
so that the calls per unit become direct function calls and only
the overall call into the Crypto API remains indirect.
But yes it would be nice to provide numbers for the fallback
path to verify that we didn't get this case terribly wrong.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching
2026-06-16 4:13 ` Herbert Xu
@ 2026-06-16 4:50 ` Eric Biggers
2026-06-16 4:53 ` Herbert Xu
0 siblings, 1 reply; 8+ messages in thread
From: Eric Biggers @ 2026-06-16 4:50 UTC (permalink / raw)
To: Herbert Xu
Cc: Leonid Ravich, Alasdair Kergon, Ard Biesheuvel, Jens Axboe,
Horia Geanta, Gilad Ben-Yossef, linux-crypto, dm-devel,
linux-block
On Tue, Jun 16, 2026 at 12:13:03PM +0800, Herbert Xu wrote:
> On Mon, Jun 15, 2026 at 03:53:17PM -0700, Eric Biggers wrote:
> >
> > So in other words, this series slows down dm-crypt and crypto_skcipher
> > for everyone to optimize for an out-of-tree driver. And there's also no
> > benchmark showing that your driver is even worth it over just using the
> > CPU.
>
> There is no reason why the software fallback should be slower
> than the status quo. Existing callers of the Crypto API will
> be issuing one indirect function call per data unit. With the
> new scheme, the indirect calls per unit moves from from the caller
> into the Crypto API.
Have you checked the code? This patchset adds overhead in multiple
places. Dynamically allocating multiple scatterlists and then parsing
them, adding a new field to skcipher_request for everyone, new checks in
crypto_skcipher_en/decrypt for everyone, new checks to validate the data
unit size that the caller knew was valid in the first place, etc.
> In fact, we could move it down further and improve upon the
> status quo by splitting the data in each algorithm implemntation
> so that the calls per unit become direct function calls and only
> the overall call into the Crypto API remains indirect.
That's not what this patchset does. But also, as we know, a better way
to eliminate "Crypto API" overhead is to call the algorithms directly.
- Eric
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching
2026-06-16 4:50 ` Eric Biggers
@ 2026-06-16 4:53 ` Herbert Xu
0 siblings, 0 replies; 8+ messages in thread
From: Herbert Xu @ 2026-06-16 4:53 UTC (permalink / raw)
To: Eric Biggers
Cc: Leonid Ravich, Alasdair Kergon, Ard Biesheuvel, Jens Axboe,
Horia Geanta, Gilad Ben-Yossef, linux-crypto, dm-devel,
linux-block
On Mon, Jun 15, 2026 at 09:50:23PM -0700, Eric Biggers wrote:
>
> Have you checked the code? This patchset adds overhead in multiple
> places. Dynamically allocating multiple scatterlists and then parsing
> them, adding a new field to skcipher_request for everyone, new checks in
> crypto_skcipher_en/decrypt for everyone, new checks to validate the data
> unit size that the caller knew was valid in the first place, etc.
No I have not :)
I'm just stating the general principle.
Of course I will not apply the patch-set until I have reviewed it
properly.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-06-16 4:53 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-15 11:14 [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching Leonid Ravich
2026-06-15 11:14 ` [PATCH v4 1/3] crypto: skcipher - add per-request data_unit_size with auto-splitting Leonid Ravich
2026-06-15 11:14 ` [PATCH v4 2/3] crypto: testmgr - test for multi-data-unit dispatch Leonid Ravich
2026-06-15 11:14 ` [PATCH v4 3/3] dm crypt: batch all sectors of a bio per crypto request Leonid Ravich
2026-06-15 22:53 ` [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching Eric Biggers
2026-06-16 4:13 ` Herbert Xu
2026-06-16 4:50 ` Eric Biggers
2026-06-16 4:53 ` Herbert Xu
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.