Linux cryptographic layer development
* [PATCH v5 0/5] crypto: skcipher - multi-data-unit dispatch as a template
@ 2026-06-30  8:34 Leonid Ravich
From: Leonid Ravich @ 2026-06-30  8:34 UTC (permalink / raw)
  To: linux-crypto, dm-devel
  Cc: linux-block, linux-kernel, herbert, davem, ebiggers, snitzer,
	mpatocka, axboe

This is v5. It reworks the multi-data-unit support from the in-core
auto-splitter of v4 into a crypto template, dun(...), addressing the v4
review: there is now no added cost on the core skcipher path, no
per-algorithm capability flag, and the per-data-unit split lives in an
algorithm rather than in crypto_skcipher_encrypt/decrypt. That is the
shape Herbert suggested, and it removes the "overhead for everyone" Eric
objected to.

v4: https://lore.kernel.org/linux-crypto/20260615105022.8025-1-lravich@amazon.com/

Model
-----

A skcipher_request gains a data_unit_size field (patch 1). When set,
the request covers cryptlen / data_unit_size data units sharing one
starting IV; per-unit IVs are derived from the IV as a wide
data-unit-number (DUN) counter, the convention blk-crypto already uses
for inline encryption.

dun(...) (patch 2) is a template that wraps an inner skcipher whose IV
is that counter (e.g. dun(xts(aes),le)). Its ->encrypt/->decrypt split
the request into one inner call per data unit, walking the IV +1 each
unit; each inner call is direct, so only the outer dispatch into the API
is indirect. A plain skcipher is unchanged and ignores data_unit_size,
so existing callers pay nothing: the field is inert and the core
en/decrypt path is untouched.
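
As a rough userspace illustration of the split described above (not the
code in crypto/dun.c), the sketch below cuts one outer request into data
units and steps the IV by +1 as a little-endian counter between inner
calls. The names toy_inner_encrypt, iv_inc_le and dun_split_encrypt are
hypothetical, the XOR "cipher" only stands in for a real inner skcipher,
and rejecting a partial trailing unit is an assumption of the sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_IVSIZE 16

/* Hypothetical stand-in for the inner skcipher: XOR each byte with the IV. */
void toy_inner_encrypt(uint8_t *buf, size_t len, const uint8_t iv[TOY_IVSIZE])
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= iv[i % TOY_IVSIZE];
}

/* Step the IV by +1, treating it as a little-endian counter. */
void iv_inc_le(uint8_t iv[TOY_IVSIZE])
{
	for (size_t i = 0; i < TOY_IVSIZE; i++)
		if (++iv[i] != 0)	/* stop once there is no carry */
			break;
}

/*
 * Model of the outer dispatch: one inner call per data unit, IV +1 per
 * unit. A zero data_unit_size or cryptlen is rejected; refusing a
 * partial trailing unit is an assumption of this sketch.
 */
int dun_split_encrypt(uint8_t *buf, size_t cryptlen, size_t data_unit_size,
		      const uint8_t start_iv[TOY_IVSIZE])
{
	uint8_t iv[TOY_IVSIZE];

	if (!data_unit_size || !cryptlen || cryptlen % data_unit_size)
		return -EINVAL;

	memcpy(iv, start_iv, TOY_IVSIZE);
	for (size_t off = 0; off < cryptlen; off += data_unit_size) {
		toy_inner_encrypt(buf + off, data_unit_size, iv);
		iv_inc_le(iv);
	}
	return 0;
}
```

The caller's starting IV is copied rather than modified, so in this model
the per-unit walk is invisible outside the dispatch function.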

The second template parameter selects how the per-unit IV advances.
Neighbouring data units' IVs differ by a +1 step in exactly one of two
ways, little- or big-endian, so dun(...,le) / dun(...,be) is a closed
parameter space, not an open-ended set of "IV types". Internally each is
one row of a small struct dun_mode op table (an iv_next walk plus an
ivsize predicate); adding a future convention (e.g. a width-bounded
counter, or an affine sector<<shift+k step) is one row, with the dispatch
loop unchanged. IV constructions that are not such a counter are simply not
wrapped (the consumer keeps its per-unit path); an IV that is encrypted
(essiv) composes as the inner algorithm, dun(essiv(...),le), since the
encryption already lives in that inner template.
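
To make the "one row per convention" idea concrete, here is a minimal
userspace sketch of such an op table. Only the names struct dun_mode and
iv_next come from the description above; the field layout, the function
signatures, the helper dun_mode_find and the ivsize rule (any width from
1 to 32 bytes) are assumptions for illustration and may not match
crypto/dun.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DUN_MAX_IVSIZE 32	/* widths up to 32 bytes, per the cover letter */

/* Hypothetical layout: the real struct dun_mode may differ. */
struct dun_mode {
	const char *name;
	void (*iv_next)(uint8_t *iv, size_t ivsize);
	bool (*ivsize_ok)(size_t ivsize);
};

/* +1 with carry, least significant byte first. */
void iv_next_le(uint8_t *iv, size_t ivsize)
{
	for (size_t i = 0; i < ivsize; i++)
		if (++iv[i] != 0)
			return;
}

/* +1 with carry, least significant byte last. */
void iv_next_be(uint8_t *iv, size_t ivsize)
{
	for (size_t i = ivsize; i-- > 0; )
		if (++iv[i] != 0)
			return;
}

/* Assumed predicate: any non-zero width up to DUN_MAX_IVSIZE. */
bool ivsize_any(size_t ivsize)
{
	return ivsize > 0 && ivsize <= DUN_MAX_IVSIZE;
}

/* One row per counter convention; a new convention would add a row. */
const struct dun_mode dun_modes[] = {
	{ "le", iv_next_le, ivsize_any },
	{ "be", iv_next_be, ivsize_any },
};

/* Look up a mode by its template parameter; NULL if unknown. */
const struct dun_mode *dun_mode_find(const char *name)
{
	for (size_t i = 0; i < sizeof(dun_modes) / sizeof(dun_modes[0]); i++)
		if (strcmp(dun_modes[i].name, name) == 0)
			return &dun_modes[i];
	return NULL;
}
```

A dispatch loop written against this table only ever calls
mode->iv_next(), which is why a further convention would not need to
touch it.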

Why a template
--------------

  - No core cost for anyone. crypto_skcipher_encrypt/decrypt are stock;
    only a dun() tfm reads data_unit_size. (addresses Eric's "adds
    checks/overhead for everyone")

  - No capability flag. A hardware engine that handles a whole multi-DU
    request in one pass registers its own dun(xts(aes),le) at a higher
    cra_priority and is picked automatically, exactly how
    xts-aes-aesni already beats generic xts. No CRYPTO_ALG_* bit, no
    core branch choosing native-vs-split. Such a native driver may also
    be async (it owns its dispatch); only the generic template is
    sync-only.

  - The split is in the algorithm. (the direction Herbert described)

  - It is the same kind of wrapper crypto/ already has. Like cryptd()
    (async dispatch) and pcrypt() (parallel dispatch), dun() wraps an
    inner skcipher and changes only how the request is dispatched
    (here, split across data units), performing no cipher transform of
    its own.

  - It is a reusable primitive, not a dm-crypt feature. Two in-tree
    consumers are included: dm-crypt (patch 4) and blk-crypto-fallback
    (patch 5), which both open-code the per-DUN loop today; fscrypt's
    direct (non-inline) path open-codes the same loop and could follow.
    A HW engine is a provider via cra_priority. Consumers and providers
    are decoupled through one named algorithm.
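
The priority-based selection mentioned in the list above can be modelled
in a few lines. This is a simplified userspace sketch, not the crypto API
lookup: struct toy_alg and toy_alg_lookup are hypothetical, and only the
field names cra_name, cra_driver_name and cra_priority mirror struct
crypto_alg.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a registered algorithm. */
struct toy_alg {
	const char *cra_name;		/* generic name consumers ask for */
	const char *cra_driver_name;	/* unique implementation name */
	int cra_priority;
};

/* Return the highest-priority implementation of @name, or NULL. */
const struct toy_alg *toy_alg_lookup(const struct toy_alg *algs, size_t n,
				     const char *name)
{
	const struct toy_alg *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (strcmp(algs[i].cra_name, name) != 0)
			continue;
		if (!best || algs[i].cra_priority > best->cra_priority)
			best = &algs[i];
	}
	return best;
}
```

With a generic entry at priority 100 and a (made-up) hardware entry at
priority 400 both named "dun(xts(aes),le)", a consumer asking for that
name gets the hardware entry without knowing it exists.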

What it does and does not buy
-----------------------------

On a software cipher this is not a throughput win: the generic template
still issues one inner encrypt per data unit, so the AES compute is
unchanged. It removes per-request overhead and the consumer's
open-coded per-unit loop, and is byte-for-byte identical to the
per-sector path (see Verification). The win is for a one-pass provider;
no software throughput gain is claimed.

dm-crypt consumer (patch 4)
---------------------------

dm-crypt submits one request per contiguous bio segment with
data_unit_size = cc->sector_size (e.g. the default 512-byte sector with
a 4 KiB bio_vec -> one request of 8 data units), using only its existing
inline single-entry scatterlist: no per-bio allocation, no regression.
It allocates dun(<cipher>,<endian>) instead of the bare cipher when the
config can form the DUN counter: a counter IV mode (plain64 -> le,
plain64be -> be; essiv/lmk/tcw etc. are not plain counters and stay
per-sector), single-tfm, non-aead, sector_size 512 or iv_large_sectors.
DM_CRYPT selects CRYPTO_DUN and the template resolves against a sync
inner, so there is no acceptable wrap failure the bare cipher would
survive; an integrity config keeps an inert dun() wrapper but never
batches (one inner call per request == the per-sector path).
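
The eligibility gate and the resulting algorithm name can be sketched as
below. This is a userspace model under stated assumptions, not the
dm-crypt code: struct toy_crypt_cfg and the toy_* helpers are
hypothetical, and the integrity case (which keeps an inert wrapper) is
deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of the dm-crypt settings the gate looks at. */
struct toy_crypt_cfg {
	const char *cipher;	/* e.g. "xts(aes)" */
	const char *ivmode;	/* e.g. "plain64", "plain64be", "essiv" */
	unsigned int tfms_count;
	bool aead;
	unsigned int sector_size;
	bool iv_large_sectors;
};

/* Map a counter IV mode to its endianness parameter, or NULL if none. */
const char *toy_dun_endian(const char *ivmode)
{
	if (strcmp(ivmode, "plain64") == 0)
		return "le";
	if (strcmp(ivmode, "plain64be") == 0)
		return "be";
	return NULL;	/* essiv, lmk, tcw, ...: stay per-sector */
}

/*
 * Write "dun(<cipher>,<endian>)" into @out when the config is eligible,
 * otherwise the bare cipher name. Returns true if dun() was chosen.
 */
bool toy_pick_alg_name(const struct toy_crypt_cfg *cfg, char *out, size_t len)
{
	const char *endian = toy_dun_endian(cfg->ivmode);
	bool eligible = endian && cfg->tfms_count == 1 && !cfg->aead &&
			(cfg->sector_size == 512 || cfg->iv_large_sectors);

	if (eligible)
		snprintf(out, len, "dun(%s,%s)", cfg->cipher, endian);
	else
		snprintf(out, len, "%s", cfg->cipher);
	return eligible;
}

/* Data units covered by one contiguous segment of @seg_len bytes. */
unsigned int toy_units_per_segment(unsigned int seg_len,
				   unsigned int sector_size)
{
	return seg_len / sector_size;
}
```

For the example in the text, toy_units_per_segment(4096, 512) gives the 8
data units carried by a single request.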

blk-crypto-fallback consumer (patch 5)
--------------------------------------

Every blk-crypto inline-encryption mode feeds the DUN as a little-endian
counter, so the fallback wraps its cipher as dun(<cipher>,le)
unconditionally (BLK_INLINE_ENCRYPTION_FALLBACK selects CRYPTO_DUN).
Because the template handles any counter width up to 32 bytes, this
covers all four modes (AES-256-XTS, AES-128-CBC-ESSIV, Adiantum with
its 32-byte IV, and SM4-XTS), and the open-coded per-unit loop is
removed from both the encrypt and decrypt paths.
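
For illustration, this is what "the DUN as a little-endian counter" looks
like once serialized into a starting IV. The sketch assumes the DUN is
held as four 64-bit words with word 0 least significant; that layout, the
TOY_DUN_WORDS constant and the function name are assumptions of this
example rather than a copy of the fallback code.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_DUN_WORDS 4	/* 4 x 64 bits = 32 bytes, the widest IV mentioned */

/*
 * Serialize a DUN held as 64-bit words (word 0 least significant) into
 * a little-endian byte string usable as the starting IV. Shifting out
 * each byte keeps the result independent of host endianness.
 */
void toy_dun_to_iv_le(const uint64_t dun[TOY_DUN_WORDS],
		      uint8_t iv[TOY_DUN_WORDS * 8])
{
	for (size_t w = 0; w < TOY_DUN_WORDS; w++)
		for (size_t b = 0; b < 8; b++)
			iv[w * 8 + b] = (uint8_t)(dun[w] >> (8 * b));
}
```

A cipher with a shorter IV would use only the leading bytes of this
buffer, which is where the low-order part of the counter lives.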

Verification
------------

Regression protocol in the tree, on x86 + arm64 under qemu:

  - Build clean and checkpatch strict clean (the lone warning is the
    new-file MAINTAINERS reminder; crypto/ is an F: catch-all).

  - testmgr dun() cross-check (batched == N x single-DU reference over a
    fragmented scatterlist, plus a boundary-seeded IV that forces a
    carry across a 64-bit limb / byte run) for every accepted ivsize
    including 32 (Adiantum) in BOTH dun(...,le) and dun(...,be), so the
    big-endian counter path is exercised independently of any consumer.

  - An AF_ALG probe forces the dun() cross-check to run for each
    blk-crypto inner cipher (dun(essiv(cbc(aes),sha256),le),
    dun(adiantum(xchacha12,aes),le), ...).

  - dm-crypt plain64/plain64be activate dun() (le/be); essiv / plain
    fall back.

  - Negative gates (multikey and integrity not batched).

  - plain64 and plain64be round-trips and a 4096-byte iv_large_sectors
    round-trip.

  - Low-memory; arm64 functional.

  - An end-to-end blk-crypto-fallback test (ext4 + fscrypt
    -o inlinecrypt with no inline HW, driving dun(xts,le) and verifying
    a post-cache-drop round-trip).

  - Byte-equivalence: ciphertext is bit-identical to an unpatched
    axboe/for-next baseline (sha256 4913910b...43efc0 le,
    da0869a9...63004 be).
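
The shape of the "batched == N x single-DU reference" cross-check can be
shown with a small userspace model. It is only a sketch: the xc_* names
and sizes are hypothetical, the XOR "cipher" stands in for a real inner
skcipher, the buffer is contiguous rather than a fragmented scatterlist,
and only the little-endian walk is covered. The reference path derives
each unit's IV independently as start + n instead of stepping it, so the
two paths do not share the increment code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XC_IVSIZE 16
#define XC_UNIT   64
#define XC_UNITS  4

/* Toy inner "cipher" (an assumption of this sketch): XOR with the IV. */
static void xc_inner(uint8_t *buf, size_t len, const uint8_t *iv)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= iv[i % XC_IVSIZE];
}

/* Batched path: walk the little-endian IV by +1 per unit. */
static void xc_batched(uint8_t *buf, const uint8_t *start_iv)
{
	uint8_t iv[XC_IVSIZE];

	memcpy(iv, start_iv, XC_IVSIZE);
	for (size_t u = 0; u < XC_UNITS; u++) {
		xc_inner(buf + u * XC_UNIT, XC_UNIT, iv);
		for (size_t i = 0; i < XC_IVSIZE; i++) {
			if (++iv[i] != 0)	/* no carry: done */
				break;
		}
	}
}

/* Reference path: derive unit u's IV independently as start_iv + u. */
static void xc_reference(uint8_t *buf, const uint8_t *start_iv)
{
	for (size_t u = 0; u < XC_UNITS; u++) {
		uint8_t iv[XC_IVSIZE];
		unsigned int carry = (unsigned int)u;

		for (size_t i = 0; i < XC_IVSIZE; i++) {
			carry += start_iv[i];
			iv[i] = (uint8_t)carry;
			carry >>= 8;
		}
		xc_inner(buf + u * XC_UNIT, XC_UNIT, iv);
	}
}

/* Returns true when both paths produce identical output. */
bool xc_cross_check(const uint8_t *start_iv)
{
	uint8_t a[XC_UNIT * XC_UNITS], b[XC_UNIT * XC_UNITS];

	for (size_t i = 0; i < sizeof(a); i++)
		a[i] = b[i] = (uint8_t)(i * 7 + 1);

	xc_batched(a, start_iv);
	xc_reference(b, start_iv);
	return memcmp(a, b, sizeof(a)) == 0;
}
```

Seeding the low eight IV bytes with 0xff makes the first +1 carry out of
the low 64-bit limb, which is the boundary case the testmgr check is
described as forcing.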

Changes since v4
----------------

- The in-core auto-splitter and validator are gone; multi-DU dispatch is
  the dun(...) template. crypto_skcipher_encrypt/decrypt revert to
  stock, so there is no added cost on the core path.
- The CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU capability flag is dropped; a
  native one-pass driver is selected by cra_priority instead.
- The template is dun(<inner>,<endian>) in the cryptd()/pcrypt() family
  of dispatch-only wrappers; the counter endianness (le/be) is its
  second parameter, backed by a struct dun_mode op table so a future
  counter convention is one table row. It handles any counter width up
  to 32 bytes (covering Adiantum) and rejects a data_unit_size 0 /
  cryptlen 0 request.
- dm-crypt allocates dun(<cipher>,le|be) when eligible (selecting the IV
  mode before tfm allocation); plain64 -> le, plain64be -> be. An
  integrity config keeps an inert dun() wrapper but never batches.
  DM_CRYPT selects CRYPTO_DUN.
- blk-crypto-fallback is a second consumer (patch 5), demonstrating the
  template is a shared primitive, not dm-crypt-only; it wraps every mode
  as dun(<cipher>,le) and BLK_INLINE_ENCRYPTION_FALLBACK selects
  CRYPTO_DUN.
- testmgr exercises the template via dun(<inner>,le) and dun(<inner>,be),
  including ivsize 32 and a carry-boundary IV; an end-to-end fscrypt
  -o inlinecrypt test drives the blk-crypto-fallback consumer.

Leonid Ravich (5):
  crypto: skcipher - add per-request data_unit_size
  crypto: dun - data-unit-number dispatch template
  crypto: testmgr - test dun() dispatch
  dm crypt: batch a bio segment's sectors via dun()
  blk-crypto: fallback - batch a segment's data units via dun()

 block/Kconfig               |   1 +
 block/blk-crypto-fallback.c |  74 ++++----
 crypto/Kconfig              |  14 ++
 crypto/Makefile             |   1 +
 crypto/dun.c                | 359 ++++++++++++++++++++++++++++++++++++
 crypto/testmgr.c            | 289 +++++++++++++++++++++++++++++
 drivers/md/Kconfig          |   1 +
 drivers/md/dm-crypt.c       | 208 ++++++++++++++++-----
 include/crypto/skcipher.h   |  34 ++++
 9 files changed, 899 insertions(+), 82 deletions(-)
 create mode 100644 crypto/dun.c


base-commit: a8cafdf8c949f17c92eca0045532e88ac0dac30d
-- 
2.47.3

