From: Leonid Ravich <lravich@amazon.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Alasdair Kergon <agk@redhat.com>,
Ard Biesheuvel <ardb@kernel.org>,
"Eric Biggers" <ebiggers@kernel.org>,
Jens Axboe <axboe@kernel.dk>, Horia Geanta <horia.geanta@nxp.com>,
Gilad Ben-Yossef <gilad@benyossef.com>,
<linux-crypto@vger.kernel.org>, <dm-devel@lists.linux.dev>,
<linux-block@vger.kernel.org>
Subject: [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching
Date: Mon, 15 Jun 2026 11:14:56 +0000
Message-ID: <20260615111459.9452-1-lravich@amazon.com>

This is v4, addressing Herbert's review of v3. Two architectural
changes:
- data_unit_size is now per-request (on struct skcipher_request)
rather than per-tfm, reverting to the v1 placement.
- The crypto API auto-splits multi-data-unit requests when the
underlying algorithm does not advertise
CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU. Consumers no longer test
for multi-DU support before submitting; setting data_unit_size
on any skcipher request whose algorithm uses the 128-bit LE
counter IV convention "just works".
These two changes shrink the series from 4 patches to 3 (the
generic xts(...) template needs no special handling - the
auto-splitter calls its single-DU encrypt/decrypt once per data
unit) and simplify the dm-crypt consumer (no advertise-flag check,
no per-tfm setup).
v3: https://lore.kernel.org/linux-crypto/20260601085641.16028-1-lravich@amazon.com/
v2: https://lore.kernel.org/linux-crypto/20260527065021.19525-1-lravich@amazon.com/
v1: https://lore.kernel.org/linux-crypto/20260519115955.27267-1-lravich@amazon.com/
The series adds a per-request "data unit size" to the skcipher API
so a caller can submit several data units (typically 512..4096-byte
sectors) sharing one starting IV in a single request. Algorithms
derive each data unit's IV from the caller-supplied IV by treating
it as a 128-bit little-endian counter and adding the data-unit
index, matching the layout produced by dm-crypt's plain64 IV mode
and by typical inline-encryption hardware.
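The per-unit IV derivation can be illustrated in userspace C. This is a minimal sketch, not code from the series: le128_iv_for_unit() is a hypothetical helper that assumes a 16-byte IV and a 64-bit data-unit index, and adds the index to the starting IV as a 128-bit little-endian integer.

```c
#include <stdint.h>

/*
 * Hypothetical helper: derive the IV of data unit `index` from the
 * caller's starting IV by treating the 16-byte IV as a 128-bit
 * little-endian counter and adding `index` (carry runs from byte 0
 * towards byte 15, wrapping modulo 2^128). `start` is not modified.
 */
static void le128_iv_for_unit(const uint8_t start[16], uint64_t index,
                              uint8_t out[16])
{
    unsigned int carry = 0;

    for (int i = 0; i < 16; i++) {
        /* bytes 0..7 receive the index; bytes 8..15 only see the carry */
        unsigned int addend =
            i < 8 ? (unsigned int)((index >> (8 * i)) & 0xff) : 0;
        unsigned int sum = start[i] + addend + carry;

        out[i] = (uint8_t)sum;
        carry = sum >> 8;
    }
}
```

With plain64 the starting IV is the little-endian sector number zero-padded to 16 bytes, so unit i of a request starting at sector S gets exactly the IV plain64 would produce for sector S + i.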
This mirrors the data_unit_size concept already exposed by
struct blk_crypto_config for inline encryption.
The first user is dm-crypt, which today issues one skcipher request
per sector and so pays a per-sector cost in request allocation,
callback dispatch, completion handling, and scatterlist setup.
Proof-of-concept performance numbers from the RFC reply [1]: +19%
throughput / -40% CPU on a single-core arm64 system with a hardware
XTS-AES-256 accelerator running fio 4 KiB sequential writes through
dm-crypt, when an out-of-tree arm64 xts driver advertises
CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU. This series itself does not
include arch enablement: the fast path is opt-in per driver, while
the slow path is universal via the auto-splitter.
The native fast path amortises both per-sector dispatch and per-sector
crypto setup across a bio; that is the source of the win measured
above, on an engine that offloads the AES compute. The auto-splitter
is for correctness and reach: any consumer can set data_unit_size and
get correct output with the per-request allocation/callback/completion
cost removed, but it still issues one alg->encrypt (or ->decrypt) call
per data unit, so on a software cipher it saves only dispatch overhead
(no throughput figure is claimed; that is hardware- and
workload-dependent). What it guarantees unconditionally is
byte-identical output (see Verification below) in O(entries + units)
time, walking the scatterlists with a pair of struct scatter_walk
cursors rather than rescanning from the head for each unit.
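A userspace model of the auto-splitter's single linear pass may help. It is a sketch under simplifying assumptions: struct frag stands in for a scatterlist entry and struct cursor for struct scatter_walk, only the in-place (src == dst) case is modelled rather than the pair of cursors the series uses, and single_du_fn, MAX_PIECES and the other names are hypothetical. Each unit is described by resuming the cursor where the previous unit ended.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PIECES 16   /* hypothetical bound on fragments per data unit */

struct frag {           /* stand-in for one scatterlist entry */
    uint8_t *buf;
    size_t len;
};

struct cursor {         /* stand-in for struct scatter_walk */
    const struct frag *f;
    size_t nfrag, idx, off;
};

/* Step a 16-byte IV by one as a 128-bit little-endian counter. */
static void le128_inc(uint8_t iv[16])
{
    for (int i = 0; i < 16; i++)
        if (++iv[i] != 0)
            break;
}

/*
 * Describe the next `len` bytes as up to `max` pieces and advance the
 * cursor past them, never rescanning from the head. Returns the number
 * of pieces, or 0 if the list runs out or `max` pieces are not enough.
 */
static size_t cursor_take(struct cursor *c, size_t len,
                          struct frag *out, size_t max)
{
    size_t n = 0;

    while (len) {
        if (c->idx == c->nfrag || n == max)
            return 0;
        size_t avail = c->f[c->idx].len - c->off;
        size_t take = avail < len ? avail : len;

        if (take) {
            out[n].buf = c->f[c->idx].buf + c->off;
            out[n].len = take;
            n++;
            len -= take;
            c->off += take;
        }
        if (c->off == c->f[c->idx].len) {   /* entry exhausted: move on */
            c->idx++;
            c->off = 0;
        }
    }
    return n;
}

/* One single-data-unit operation, e.g. one alg->encrypt call (hypothetical). */
typedef int (*single_du_fn)(struct frag *pieces, size_t npieces,
                            const uint8_t iv[16]);

/*
 * In-place auto-splitter model: one pass over the list, one single-DU
 * call per unit, IV stepped as an LE128 counter on a private copy so
 * the caller's IV is untouched. Assumes cryptlen is a positive
 * multiple of du (validated beforehand).
 */
static int split_and_run(const struct frag *sg, size_t nfrag,
                         size_t cryptlen, size_t du,
                         const uint8_t iv[16], single_du_fn fn)
{
    struct cursor c = { sg, nfrag, 0, 0 };
    struct frag pieces[MAX_PIECES];
    uint8_t unit_iv[16];

    memcpy(unit_iv, iv, sizeof(unit_iv));
    for (size_t done = 0; done < cryptlen; done += du) {
        size_t n = cursor_take(&c, du, pieces, MAX_PIECES);
        int err;

        if (!n)
            return -1;  /* short or over-fragmented list */
        err = fn(pieces, n, unit_iv);
        if (err)
            return err;
        le128_inc(unit_iv);
    }
    return 0;
}
```

Resuming from the cursor keeps the total work proportional to entries plus units, which is what avoids the O(units^2) cost of fast-forwarding from the head for every unit on a fragmented list.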
[1] https://lore.kernel.org/linux-crypto/20260428101225.24316-1-lravich@amazon.com/
Changes since v3
----------------
- data_unit_size moved from struct crypto_skcipher (per-tfm) to
struct skcipher_request (per-request). (Herbert)
- Crypto API auto-splits multi-data-unit requests when the algorithm
does not advertise CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU. Drops the
per-tfm setter/probe in favour of a single
skcipher_request_set_data_unit_size() usable by every consumer.
(Herbert)
- CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU is a type-specific cra_flags
bit (0x01000000) in crypto/internal/skcipher.h, not a generic bit
in the public header; drivers set it to opt OUT of auto-splitting.
- The auto-splitter advances through src/dst with a pair of struct
scatter_walk cursors (scatterwalk_start / scatterwalk_get_sglist /
scatterwalk_skip) instead of scatterwalk_ffwd() per unit, which
rescans from the head and is O(units^2) under fragmentation; the
cursors give a single linear pass. (Eric)
- crypto_skcipher_validate_multi_du() reports -EINVAL for a malformed
geometry (du not a power of two, cryptlen not a positive multiple)
and -EOPNOTSUPP for a target that cannot do multi-DU (ivsize != 16,
lskcipher, or async without the native flag), so a caller can fall
back. The check gates the native path too, not just the auto-splitter.
(Eric)
- testmgr cross-checks the batched dispatch against an independent
N x single-DU reference with LE128-walked IVs over a fragmented
scatterlist (pins the IV convention and exercises the cursor),
round-trips, and checks IV preservation. Ineligible algorithms
skip via -EOPNOTSUPP; a real mismatch returns -EBADMSG.
- dm-crypt enables batching only for IV modes flagged sector_iv_le128
(a new bool on struct crypt_iv_operations, set on plain64 only),
plus ivsize 16, sync, single-tfm, no integrity, no post() hook. The
flag replaces a hardcoded plain64 pointer-compare, so eligibility is
a self-documenting property of the IV mode rather than a special
case. plain stays excluded: its IV holds only the low 32 bits of the
sector number, so it wraps to zero at 2^32 sectors where an LE128
counter would carry. dm-crypt sets req->data_unit_size = sector_size
and submits; -EOPNOTSUPP/-EAGAIN fall back to the per-sector path.
Mikulas's v2 Reviewed-by is dropped as the dm-crypt patch was
substantially rewritten.
- The generic xts(...) template needs no separate handling, dropping
the v3 crypto/xts.c patch (4 -> 3 patches).
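The return-code contract of crypto_skcipher_validate_multi_du() can be modelled as follows. This is a hypothetical userspace sketch, not the kernel function: struct du_target flattens the algorithm properties the check is described as inspecting, and the order of the checks is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattening of the properties the check looks at. */
struct du_target {
    unsigned int ivsize;
    bool is_lskcipher;
    bool is_async;
    bool native_multi_du;   /* models CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU */
};

/*
 * -EINVAL for malformed geometry, -EOPNOTSUPP when the target cannot do
 * multi-DU at all (so the caller can fall back), 0 otherwise.
 */
static int validate_multi_du(const struct du_target *t, size_t du,
                             size_t cryptlen)
{
    if (du == 0 || (du & (du - 1)) != 0)
        return -EINVAL;         /* du is not a power of two */
    if (cryptlen == 0 || cryptlen % du != 0)
        return -EINVAL;         /* not a positive multiple of du */
    if (t->ivsize != 16 || t->is_lskcipher)
        return -EOPNOTSUPP;
    if (t->is_async && !t->native_multi_du)
        return -EOPNOTSUPP;     /* auto-splitter handles sync algs only */
    return 0;
}
```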
Design overview
---------------
* Patch 1 adds the data_unit_size field, the setter, the
CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU flag, and the auto-splitter in
crypto_skcipher_encrypt()/decrypt(). skcipher_request_set_tfm()
resets the field so a reused request defaults to single-DU.
* Patch 2 adds the testmgr multi-DU test (every ivsize == 16
skcipher).
* Patch 3 turns dm-crypt batching on automatically under the
conditions above and sets req->data_unit_size = cc->sector_size.
This series does NOT add the capability flag to any arch driver; the
auto-splitter ensures correctness without that opt-in.
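The dm-crypt gate and its fallback rule could be modelled as two predicates. struct crypt_cfg and every helper here are hypothetical flattenings of dm-crypt state for illustration; the real patch reads these properties from dm-crypt's own structures. The plain/plain64 helpers return the low 64 bits of each mode's IV and show why plain is excluded: stepping from sector 0xffffffff, plain64 (like an LE128 counter) carries into bit 32, whereas plain wraps to zero.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattening of the dm-crypt state the gate inspects. */
struct crypt_cfg {
    bool iv_sector_le128;   /* models crypt_iv_operations.sector_iv_le128 */
    bool iv_has_post;       /* IV mode defines a post() hook */
    unsigned int ivsize;
    bool tfm_is_async;
    unsigned int tfms_count;
    bool has_integrity;
};

/* Batch only when every listed condition holds. */
static bool crypt_can_batch(const struct crypt_cfg *cc)
{
    return cc->iv_sector_le128 && !cc->iv_has_post &&
           cc->ivsize == 16 && !cc->tfm_is_async &&
           cc->tfms_count == 1 && !cc->has_integrity;
}

/* Errors after which the bio is redone through the per-sector path. */
static bool crypt_should_fall_back(int err)
{
    return err == -EOPNOTSUPP || err == -EAGAIN;
}

/* Low 64 bits of the IV each mode produces (high 64 bits are zero). */
static uint64_t plain_iv(uint64_t sector)   { return (uint32_t)sector; }
static uint64_t plain64_iv(uint64_t sector) { return sector; }
```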
Verification
------------
A regression protocol is included in the project tree
(.claude/regression-protocol.md, .claude/run-regression.sh). The
reference run reports 12/12 PASS:
- x86 + arm64 build clean; checkpatch.pl --strict clean.
- testmgr multi-DU: PASS for every ivsize == 16 skcipher in-tree.
- dm-crypt activation gating: plain64 enabled; essiv:sha256 /
plain64be / plain fall back.
- dm-crypt round-trip plain64 with multi-DU via the auto-splitter
(xts-aes-aesni, no native flag): PASS.
- dm-crypt round-trip essiv:sha256 (per-sector path): PASS.
- dm-crypt low-memory (mem=128M): PASS, no OOM kill.
- Byte-equivalence: 256 MB of ciphertext through the auto-splitter
is bit-identical to an unpatched axboe/for-next baseline (sha256
4913910b1aa6f8859fcb8f4adec20230274993a3ade8f4dd0140a323dc43efc0).
- arm64 functional under qemu-aarch64: PASS.
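The shape of the testmgr cross-check can be sketched with a toy XOR "cipher" standing in for a real skcipher. All names are hypothetical, the toy is not cryptography, and a contiguous buffer replaces the fragmented scatterlist the real test uses. A batched path that computes each unit's IV directly as iv + k (modelling a native multi-DU engine) is compared against N independent single-DU calls whose IV is stepped by one per unit; the caller's IV and a round trip are then checked.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LEN 4096    /* hypothetical test buffer limit */

/* Toy single-DU "cipher" (NOT real crypto): XOR with an IV-derived pad. */
static void toy_crypt_one(uint8_t *buf, size_t len, const uint8_t iv[16])
{
    for (size_t j = 0; j < len; j++)
        buf[j] ^= (uint8_t)(iv[j % 16] + j);
}

/* out = start + n, both treated as 128-bit little-endian integers. */
static void le128_add(const uint8_t start[16], uint64_t n, uint8_t out[16])
{
    unsigned int carry = 0;

    for (int i = 0; i < 16; i++) {
        unsigned int add = i < 8 ? (unsigned int)((n >> (8 * i)) & 0xff) : 0;
        unsigned int sum = start[i] + add + carry;

        out[i] = (uint8_t)sum;
        carry = sum >> 8;
    }
}

/* Step a 128-bit little-endian counter by one. */
static void le128_inc(uint8_t iv[16])
{
    for (int i = 0; i < 16; i++)
        if (++iv[i] != 0)
            break;
}

/* Batched path: IV of unit k computed directly; iv is writable, like req->iv. */
static void toy_crypt_batched(uint8_t *buf, size_t cryptlen, size_t du,
                              uint8_t iv[16])
{
    uint8_t unit_iv[16];

    for (size_t k = 0; k * du < cryptlen; k++) {
        le128_add(iv, k, unit_iv);
        toy_crypt_one(buf + k * du, du, unit_iv);
    }
}

/* 0 on success, -1 on failure (the series reports a mismatch as -EBADMSG). */
static int cross_check(const uint8_t *pt, size_t cryptlen, size_t du,
                       const uint8_t iv[16])
{
    uint8_t batched[MAX_LEN], ref[MAX_LEN], req_iv[16], walk[16];

    if (du == 0 || cryptlen == 0 || cryptlen > MAX_LEN || cryptlen % du)
        return -1;
    memcpy(batched, pt, cryptlen);
    memcpy(ref, pt, cryptlen);
    memcpy(req_iv, iv, 16);
    toy_crypt_batched(batched, cryptlen, du, req_iv);

    /* Independent reference: N single-DU calls, IV walked one step per unit. */
    memcpy(walk, iv, 16);
    for (size_t off = 0; off < cryptlen; off += du) {
        toy_crypt_one(ref + off, du, walk);
        le128_inc(walk);
    }
    if (memcmp(batched, ref, cryptlen) != 0)
        return -1;      /* batched output differs from the reference */
    if (memcmp(req_iv, iv, 16) != 0)
        return -1;      /* caller's IV was not preserved */

    /* Round trip: for this XOR toy, decryption is the same operation. */
    toy_crypt_batched(batched, cryptlen, du, req_iv);
    return memcmp(batched, pt, cryptlen) == 0 ? 0 : -1;
}
```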
Leonid Ravich (3):
  crypto: skcipher - add per-request data_unit_size with auto-splitting
  crypto: testmgr - test for multi-data-unit dispatch
  dm crypt: batch all sectors of a bio per crypto request

 crypto/skcipher.c                  | 132 +++++++++++++++++++
 crypto/testmgr.c                   | 192 +++++++++++++++++++++++++
 drivers/md/dm-crypt.c              | 215 +++++++++++++++++++++++++++--
 include/crypto/internal/skcipher.h |  10 ++
 include/crypto/skcipher.h          |  28 ++++
 5 files changed, 569 insertions(+), 8 deletions(-)
base-commit: a8cafdf8c949f17c92eca0045532e88ac0dac30d
--
2.47.3