From: Leonid Ravich <lravich@amazon.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Alasdair Kergon <agk@redhat.com>,
Ard Biesheuvel <ardb@kernel.org>,
"Eric Biggers" <ebiggers@kernel.org>,
Jens Axboe <axboe@kernel.dk>, Horia Geanta <horia.geanta@nxp.com>,
Gilad Ben-Yossef <gilad@benyossef.com>,
<linux-crypto@vger.kernel.org>, <dm-devel@lists.linux.dev>,
<linux-block@vger.kernel.org>
Subject: [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching
Date: Mon, 15 Jun 2026 11:14:56 +0000
Message-ID: <20260615111459.9452-1-lravich@amazon.com>

This is v4, addressing Herbert's review of v3. Two architectural
changes:
- data_unit_size is now per-request (on struct skcipher_request)
rather than per-tfm, reverting to the v1 placement.
- The crypto API auto-splits multi-data-unit requests when the
underlying algorithm does not advertise
CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU. Consumers no longer test
for multi-DU support before submitting; setting data_unit_size
on a request to any synchronous skcipher whose algorithm uses the
128-bit LE counter IV convention "just works". An async algorithm
must advertise the flag; otherwise the request fails with
-EOPNOTSUPP and the caller falls back.
These two changes shrink the series from 4 patches to 3 (the
generic xts(...) template needs no special handling - the
auto-splitter calls its single-DU encrypt/decrypt once per data
unit) and simplify the dm-crypt consumer (no advertise-flag check,
no per-tfm setup).
v3: https://lore.kernel.org/linux-crypto/20260601085641.16028-1-lravich@amazon.com/
v2: https://lore.kernel.org/linux-crypto/20260527065021.19525-1-lravich@amazon.com/
v1: https://lore.kernel.org/linux-crypto/20260519115955.27267-1-lravich@amazon.com/
The series adds a per-request "data unit size" to the skcipher API
so a caller can submit several data units (typically 512..4096-byte
sectors) sharing one starting IV in a single request. Algorithms
derive each data unit's IV from the caller-supplied IV by treating
it as a 128-bit little-endian counter and adding the data-unit
index, matching the layout produced by dm-crypt's plain64 IV mode
and by typical inline-encryption hardware.
This mirrors the data_unit_size concept already exposed by
struct blk_crypto_config for inline encryption.
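The IV convention can be illustrated with a minimal userspace sketch. It assumes the 16-byte IV is read as a little-endian 128-bit counter with byte 0 least significant. le128_iv_for_unit(), plain64_iv(), plain_iv() and check_plain64_walk() are hypothetical helpers, not code from the series. The two dm-crypt layouts follow the documented plain64 (64-bit LE sector number, zero padded) and plain (32-bit LE sector number, zero padded) IV modes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * IV of data unit @index: the 16-byte starting IV read as one
 * little-endian 128-bit counter (byte 0 least significant) plus @index,
 * with the carry propagated through all 16 bytes.
 */
static void le128_iv_for_unit(const uint8_t start[16], uint64_t index,
			      uint8_t out[16])
{
	unsigned int carry = 0;

	for (int i = 0; i < 16; i++) {
		unsigned int add = i < 8 ? (unsigned int)(index >> (8 * i)) & 0xff : 0;
		unsigned int sum = start[i] + add + carry;

		out[i] = (uint8_t)sum;
		carry = sum >> 8;
	}
}

/* dm-crypt plain64 layout: 64-bit LE sector number, upper 8 bytes zero. */
static void plain64_iv(uint64_t sector, uint8_t out[16])
{
	memset(out, 0, 16);
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(sector >> (8 * i));
}

/* dm-crypt plain layout: low 32 bits of the sector, LE, rest zero. */
static void plain_iv(uint64_t sector, uint8_t out[16])
{
	uint32_t s32 = (uint32_t)sector;

	memset(out, 0, 16);
	for (int i = 0; i < 4; i++)
		out[i] = (uint8_t)(s32 >> (8 * i));
}

/* Walking a plain64 IV as LE128 must reproduce plain64(sector + i). */
static void check_plain64_walk(uint64_t first_sector, unsigned int units)
{
	uint8_t start[16], walked[16], expect[16];

	plain64_iv(first_sector, start);
	for (unsigned int i = 0; i < units; i++) {
		le128_iv_for_unit(start, i, walked);
		plain64_iv(first_sector + i, expect);
		assert(memcmp(walked, expect, 16) == 0);
	}
}
```

A plain64 IV walked this way never carries past byte 7 for any real sector number, so it reproduces plain64(sector + i) exactly, even across the 2^32 boundary. Starting a plain IV at sector 0xffffffff and stepping one unit gives byte 4 == 1, while plain itself wraps to an all-zero IV. That divergence is the reason plain is not eligible.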
The first user is dm-crypt, which today issues one skcipher request
per sector and so pays a per-sector cost in request allocation,
callback dispatch, completion handling, and scatterlist setup.
Proof-of-concept performance numbers from the RFC reply [1]: +19%
throughput / -40% CPU on a single-core arm64 system with a hardware
XTS-AES-256 accelerator running fio 4 KiB sequential writes through
dm-crypt, when an out-of-tree arm64 xts driver advertises
CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU. This series itself does not
include arch enablement: the fast path is opt-in per driver, while
the slow path is available to every eligible algorithm via the
auto-splitter.
The native fast path amortises both per-sector dispatch and per-sector
crypto setup across a bio; that is where the win measured above comes
from, on an engine that offloads the AES computation. The auto-splitter
is for correctness and reach: any consumer can set data_unit_size and
get correct output with the per-request allocation, callback and
completion cost removed. It still issues one alg->encrypt per data
unit, however, so on a software cipher it saves only dispatch overhead
(no throughput figure is claimed; that is hardware- and
workload-dependent). What it does guarantee unconditionally is
byte-identical output (see Verification below) in O(entries + units)
time, walking the scatterlists with a pair of struct scatter_walk
cursors rather than rescanning from the head for each unit.
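The cursor-based walk can be modelled in userspace as a sketch, under several assumptions. The scatterlist is reduced to an array of (pointer, length) segments. The request is in place (src == dst), so one cursor replaces the series' src/dst pair. toy_unit_xor() is a placeholder transform, NOT a cipher. struct seg, struct cursor, cursor_take() and split_and_dispatch() are hypothetical names; cursor_take() only loosely plays the role the cover letter assigns to scatterwalk_get_sglist()/scatterwalk_skip().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SPANS 16	/* sketch limit on segments one data unit may touch */

struct seg {		/* stand-in for one scatterlist entry */
	uint8_t *buf;
	size_t len;
};

struct cursor {		/* forward-only position in a segment array */
	const struct seg *sg;
	size_t nents, idx, off;
};

typedef void (*unit_fn)(const uint8_t iv[16], struct seg *spans, size_t n);

/* start + index as a little-endian 128-bit counter. */
static void le128_add(const uint8_t start[16], uint64_t index, uint8_t out[16])
{
	unsigned int carry = 0;

	for (int i = 0; i < 16; i++) {
		unsigned int add = i < 8 ? (unsigned int)(index >> (8 * i)) & 0xff : 0;
		unsigned int sum = start[i] + add + carry;

		out[i] = (uint8_t)sum;
		carry = sum >> 8;
	}
}

/* Describe the next @len bytes as spans and advance; never rewinds. */
static size_t cursor_take(struct cursor *c, size_t len, struct seg *out)
{
	size_t n = 0;

	while (len) {
		assert(c->idx < c->nents);	/* list shorter than cryptlen */
		const struct seg *s = &c->sg[c->idx];
		size_t chunk = s->len - c->off;

		if (chunk > len)
			chunk = len;
		if (chunk) {
			assert(n < MAX_SPANS);
			out[n].buf = s->buf + c->off;
			out[n++].len = chunk;
			len -= chunk;
			c->off += chunk;
		}
		if (c->off == s->len) {	/* segment exhausted (or empty) */
			c->idx++;
			c->off = 0;
		}
	}
	return n;
}

/* Toy transform, NOT a cipher: XOR with bytes derived from the IV and
 * the byte's offset inside the unit. Stands in for one single-DU call. */
static void toy_unit_xor(const uint8_t iv[16], struct seg *spans, size_t n)
{
	size_t k = 0;

	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < spans[i].len; j++, k++)
			spans[i].buf[j] ^= (uint8_t)(iv[k % 16] + k);
}

/* In-place model of the auto-splitter: one @fn call per data unit. */
static int split_and_dispatch(const struct seg *sg, size_t nents,
			      size_t cryptlen, size_t du,
			      const uint8_t iv[16], unit_fn fn)
{
	struct cursor c = { sg, nents, 0, 0 };
	struct seg spans[MAX_SPANS];
	uint8_t unit_iv[16];

	if (du == 0 || (du & (du - 1)) || cryptlen == 0 || cryptlen % du)
		return -EINVAL;

	for (size_t u = 0; u < cryptlen / du; u++) {
		size_t n = cursor_take(&c, du, spans);

		le128_add(iv, u, unit_iv);
		fn(unit_iv, spans, n);
	}
	return 0;
}
```

Because the cursor only moves forward, each segment is entered once and each unit resumes where the previous one stopped, which gives the O(entries + units) bound; re-seeking from the head for every unit degrades to O(units^2) under fragmentation. Comparing a fragmented run against a per-unit pass over a contiguous copy has the same shape as the cross-check the testmgr patch is described as performing.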
[1] https://lore.kernel.org/linux-crypto/20260428101225.24316-1-lravich@amazon.com/
Changes since v3
----------------
- data_unit_size moved from struct crypto_skcipher (per-tfm) to
struct skcipher_request (per-request). (Herbert)
- Crypto API auto-splits multi-data-unit requests when the algorithm
does not advertise CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU. Drops the
per-tfm setter/probe in favour of a single
skcipher_request_set_data_unit_size() usable by every consumer.
(Herbert)
- CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU is a type-specific cra_flags
bit (0x01000000) in crypto/internal/skcipher.h, not a generic bit
in the public header; drivers set it to opt OUT of auto-splitting.
- The auto-splitter advances through src/dst with a pair of struct
scatter_walk cursors (scatterwalk_start / scatterwalk_get_sglist /
scatterwalk_skip) instead of scatterwalk_ffwd() per unit, which
rescans from the head and is O(units^2) under fragmentation; the
cursors give a single linear pass. (Eric)
- crypto_skcipher_validate_multi_du() reports -EINVAL for a malformed
geometry (du not a power of two, or cryptlen not a positive multiple
of du) and -EOPNOTSUPP for a target that cannot do multi-DU
(ivsize != 16, lskcipher, or async without the native flag), so a
caller can fall back. The check gates the native path too, not just
the auto-splitter. (Eric)
- testmgr cross-checks the batched dispatch against an independent
N x single-DU reference with LE128-walked IVs over a fragmented
scatterlist (pins the IV convention and exercises the cursor),
round-trips, and checks IV preservation. Ineligible algorithms
skip via -EOPNOTSUPP; a real mismatch returns -EBADMSG.
- dm-crypt enables batching only for IV modes flagged sector_iv_le128
(a new bool on struct crypt_iv_operations, set on plain64 only), and
additionally requires ivsize 16, a synchronous tfm, a single tfm, no
integrity, and no IV post() hook. The flag replaces a hardcoded
plain64 pointer-compare, so eligibility is a self-documenting
property of the IV mode rather than a special case. plain stays
excluded: its IV is a 32-bit sector counter that wraps to zero past
2^32 sectors, whereas the 128-bit LE walk carries into the next byte.
dm-crypt sets req->data_unit_size = sector_size and submits;
-EOPNOTSUPP/-EAGAIN fall back to the per-sector path.
Mikulas's v2 Reviewed-by is dropped as the dm-crypt patch was
substantially rewritten.
- The generic xts(...) template needs no separate handling, dropping
the v3 crypto/xts.c patch (4 -> 3 patches).
Design overview
---------------
* Patch 1 adds the data_unit_size field, the setter, the
CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU flag, and the auto-splitter in
crypto_skcipher_encrypt()/decrypt(). skcipher_request_set_tfm()
resets the field so a reused request defaults to single-DU.
* Patch 2 adds the testmgr multi-DU test (every ivsize == 16
skcipher).
* Patch 3 turns dm-crypt batching on automatically under the
conditions above and sets req->data_unit_size = cc->sector_size.
This series does NOT add the capability flag to any arch driver; the
auto-splitter ensures correctness without that opt-in.
Verification
------------
A regression protocol is included in the project tree
(.claude/regression-protocol.md, .claude/run-regression.sh). The
reference run reports 12/12 PASS:
- x86 + arm64 build clean; checkpatch.pl --strict clean.
- testmgr multi-DU: PASS for every ivsize == 16 skcipher in-tree.
- dm-crypt activation gating: plain64 enabled; essiv:sha256 /
plain64be / plain fall back.
- dm-crypt round-trip plain64 with multi-DU via the auto-splitter
(xts-aes-aesni, no native flag): PASS.
- dm-crypt round-trip essiv:sha256 (per-sector path): PASS.
- dm-crypt low-memory (mem=128M): PASS, no OOM kill.
- Byte-equivalence: 256 MB of ciphertext through the auto-splitter
is bit-identical to an unpatched axboe/for-next baseline (sha256
4913910b1aa6f8859fcb8f4adec20230274993a3ade8f4dd0140a323dc43efc0).
- arm64 functional under qemu-aarch64: PASS.
Leonid Ravich (3):
  crypto: skcipher - add per-request data_unit_size with auto-splitting
  crypto: testmgr - test for multi-data-unit dispatch
  dm crypt: batch all sectors of a bio per crypto request

 crypto/skcipher.c                  | 132 +++++++++++++++++++
 crypto/testmgr.c                   | 192 +++++++++++++++++++++++++
 drivers/md/dm-crypt.c              | 215 +++++++++++++++++++++++++++--
 include/crypto/internal/skcipher.h |  10 ++
 include/crypto/skcipher.h          |  28 ++++
 5 files changed, 569 insertions(+), 8 deletions(-)
base-commit: a8cafdf8c949f17c92eca0045532e88ac0dac30d
--
2.47.3