[PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching

Linux block layer
 help / color / mirror / Atom feed

From: Leonid Ravich <lravich@amazon.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Alasdair Kergon <agk@redhat.com>,
	Ard Biesheuvel <ardb@kernel.org>,
	"Eric Biggers" <ebiggers@kernel.org>,
	Jens Axboe <axboe@kernel.dk>, Horia Geanta <horia.geanta@nxp.com>,
	Gilad Ben-Yossef <gilad@benyossef.com>,
	<linux-crypto@vger.kernel.org>, <dm-devel@lists.linux.dev>,
	<linux-block@vger.kernel.org>
Subject: [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching
Date: Mon, 15 Jun 2026 11:14:56 +0000	[thread overview]
Message-ID: <20260615111459.9452-1-lravich@amazon.com> (raw)

This is v4, addressing Herbert's review of v3.  Two architectural
changes:

  - data_unit_size is now per-request (on struct skcipher_request)
    rather than per-tfm.  Reverts to the v1 placement.

  - The crypto API auto-splits multi-data-unit requests when the
    underlying algorithm does not advertise
    CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU.  Consumers no longer test
    for multi-DU support before submitting; setting data_unit_size
    on any skcipher request whose algorithm uses the 128-bit LE
    counter IV convention "just works".

These two changes shrink the series from 4 patches to 3 (the
generic xts(...) template needs no special handling - the
auto-splitter calls its single-DU encrypt/decrypt once per data
unit) and simplify the dm-crypt consumer (no advertise-flag check,
no per-tfm setup).

v3: https://lore.kernel.org/linux-crypto/20260601085641.16028-1-lravich@amazon.com/
v2: https://lore.kernel.org/linux-crypto/20260527065021.19525-1-lravich@amazon.com/
v1: https://lore.kernel.org/linux-crypto/20260519115955.27267-1-lravich@amazon.com/

The series adds a per-request "data unit size" to the skcipher API
so a caller can submit several data units (typically 512..4096-byte
sectors) sharing one starting IV in a single request.  Algorithms
derive each data unit's IV from the caller-supplied IV by treating
it as a 128-bit little-endian counter and adding the data-unit
index, matching the layout produced by dm-crypt's plain64 IV mode
and by typical inline-encryption hardware.

This mirrors the data_unit_size concept already exposed by
struct blk_crypto_config for inline encryption.

The first user is dm-crypt, which today issues one skcipher request
per sector and so pays a per-sector cost in request allocation,
callback dispatch, completion handling, and scatterlist setup.

Proof-of-concept performance numbers from the RFC reply [1]: +19%
throughput / -40% CPU on a single-core arm64 system with a hardware
XTS-AES-256 accelerator running fio 4 KiB sequential writes through
dm-crypt, when an out-of-tree arm64 xts driver advertises
CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU.  This series itself does not
include arch enablement; the fast path is opt-in per driver, the
slow path is universal via the auto-splitter.

The native fast path amortises both per-sector dispatch and per-sector
crypto setup across a bio - the measured win above, on an engine that
offloads the AES compute.  The auto-splitter is for correctness and
reach: any consumer can set data_unit_size and get correct output with
the per-request allocation/callback/completion cost removed, but it
still issues one alg->encrypt per data unit, so on a software cipher it
saves only dispatch overhead (no throughput figure claimed - that is
hardware- and workload-dependent).  What it guarantees unconditionally
is byte-identical output (Verification below) at O(entries + units),
walking the scatterlists with a pair of struct scatter_walk cursors
rather than rescanning from the head per unit.

[1] https://lore.kernel.org/linux-crypto/20260428101225.24316-1-lravich@amazon.com/

Changes since v3
----------------

- data_unit_size moved from struct crypto_skcipher (per-tfm) to
  struct skcipher_request (per-request).  (Herbert)

- Crypto API auto-splits multi-data-unit requests when the algorithm
  does not advertise CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU.  Drops the
  per-tfm setter/probe in favour of a single
  skcipher_request_set_data_unit_size() usable by every consumer.
  (Herbert)

- CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU is a type-specific cra_flags
  bit (0x01000000) in crypto/internal/skcipher.h, not a generic bit
  in the public header; drivers set it to opt OUT of auto-splitting.

- The auto-splitter advances through src/dst with a pair of struct
  scatter_walk cursors (scatterwalk_start / scatterwalk_get_sglist /
  scatterwalk_skip) instead of scatterwalk_ffwd() per unit, which
  rescans from the head and is O(units^2) under fragmentation; the
  cursors give a single linear pass.  (Eric)

- crypto_skcipher_validate_multi_du() reports -EINVAL for a malformed
  geometry (du not a power of two, cryptlen not a positive multiple)
  and -EOPNOTSUPP for a target that cannot do multi-DU (ivsize != 16,
  lskcipher, or async without the native flag), so a caller can fall
  back.  Gates the native path too, not just the auto-splitter.
  (Eric)

- testmgr cross-checks the batched dispatch against an independent
  N x single-DU reference with LE128-walked IVs over a fragmented
  scatterlist (pins the IV convention and exercises the cursor),
  round-trips, and checks IV preservation.  Ineligible algorithms
  skip via -EOPNOTSUPP; a real mismatch returns -EBADMSG.

- dm-crypt enables batching only for IV modes flagged sector_iv_le128
  (a new bool on struct crypt_iv_operations, set on plain64 only),
  plus ivsize 16, sync, single-tfm, no integrity, no post() hook.  The
  flag replaces a hardcoded plain64 pointer-compare, so eligibility is
  a self-documenting property of the IV mode rather than a special
  case.  plain stays excluded (its 32-bit counter wraps differently
  past 2^32 sectors).  Sets req->data_unit_size = sector_size and
  submits; -EOPNOTSUPP/-EAGAIN fall back to the per-sector path.
  Mikulas's v2 Reviewed-by is dropped as the dm-crypt patch was
  substantially rewritten.

- The generic xts(...) template needs no separate handling, dropping
  the v3 crypto/xts.c patch (4 -> 3 patches).

Design overview
---------------

* Patch 1 adds the data_unit_size field, the setter, the
  CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU flag, and the auto-splitter in
  crypto_skcipher_encrypt()/decrypt().  skcipher_request_set_tfm()
  resets the field so a reused request defaults to single-DU.

* Patch 2 adds the testmgr multi-DU test (every ivsize == 16
  skcipher).

* Patch 3 turns dm-crypt batching on automatically under the
  conditions above and sets req->data_unit_size = cc->sector_size.

This series does NOT add the capability flag to any arch driver; the
auto-splitter ensures correctness without that opt-in.

Verification
------------

A regression protocol is included in the project tree
(.claude/regression-protocol.md, .claude/run-regression.sh).  The
reference run reports 12/12 PASS:

  - x86 + arm64 build clean; checkpatch.pl --strict clean.
  - testmgr multi-DU: PASS for every ivsize == 16 skcipher in-tree.
  - dm-crypt activation gating: plain64 enabled; essiv:sha256 /
    plain64be / plain fall back.
  - dm-crypt round-trip plain64 with multi-DU via the auto-splitter
    (xts-aes-aesni, no native flag): PASS.
  - dm-crypt round-trip essiv:sha256 (per-sector path): PASS.
  - dm-crypt low-memory (mem=128M): PASS, no OOM kill.
  - Byte-equivalence: 256 MB of ciphertext through the auto-splitter
    is bit-identical to an unpatched axboe/for-next baseline (sha256
    4913910b1aa6f8859fcb8f4adec20230274993a3ade8f4dd0140a323dc43efc0).
  - arm64 functional under qemu-aarch64: PASS.

Leonid Ravich (3):
  crypto: skcipher - add per-request data_unit_size with auto-splitting
  crypto: testmgr - test for multi-data-unit dispatch
  dm crypt: batch all sectors of a bio per crypto request

 crypto/skcipher.c                  | 132 +++++++++++++++++++
 crypto/testmgr.c                   | 192 +++++++++++++++++++++++++
 drivers/md/dm-crypt.c              | 215 +++++++++++++++++++++++++++--
 include/crypto/internal/skcipher.h |  10 ++
 include/crypto/skcipher.h          |  28 ++++
 5 files changed, 569 insertions(+), 8 deletions(-)

base-commit: a8cafdf8c949f17c92eca0045532e88ac0dac30d
--
2.47.3

next             reply	other threads:[~2026-06-15 11:15 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-15 11:14 Leonid Ravich [this message]
2026-06-15 11:14 ` [PATCH v4 1/3] crypto: skcipher - add per-request data_unit_size with auto-splitting Leonid Ravich
2026-06-15 11:14 ` [PATCH v4 2/3] crypto: testmgr - test for multi-data-unit dispatch Leonid Ravich
2026-06-15 11:14 ` [PATCH v4 3/3] dm crypt: batch all sectors of a bio per crypto request Leonid Ravich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260615111459.9452-1-lravich@amazon.com \
    --to=lravich@amazon.com \
    --cc=agk@redhat.com \
    --cc=ardb@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@lists.linux.dev \
    --cc=ebiggers@kernel.org \
    --cc=gilad@benyossef.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=horia.geanta@nxp.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-crypto@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox