From: Leonid Ravich <lravich@amazon.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Alasdair Kergon <agk@redhat.com>,
Ard Biesheuvel <ardb@kernel.org>,
"Eric Biggers" <ebiggers@kernel.org>,
Jens Axboe <axboe@kernel.dk>, Horia Geanta <horia.geanta@nxp.com>,
Gilad Ben-Yossef <gilad@benyossef.com>,
<linux-crypto@vger.kernel.org>, <dm-devel@lists.linux.dev>,
<linux-block@vger.kernel.org>
Subject: [PATCH v4 3/3] dm crypt: batch all sectors of a bio per crypto request
Date: Mon, 15 Jun 2026 11:14:59 +0000 [thread overview]
Message-ID: <20260615111459.9452-4-lravich@amazon.com> (raw)
In-Reply-To: <20260615111459.9452-1-lravich@amazon.com>
Submit one skcipher request per bio with
skcipher_request_set_data_unit_size(req, cc->sector_size) instead of
issuing one request per sector. This removes per-sector overhead in
the crypto API hot path: request allocation, callback dispatch,
completion handling, and SG setup.
The optimisation is enabled automatically at table load when all
of the following hold:
- the cipher is non-aead (i.e. skcipher), sync, tfms_count 1;
- the IV mode advertises sector_iv_le128, i.e. its per-sector IV
advances as a 128-bit LE counter, matching the convention
documented in skcipher_request_set_data_unit_size(). Only plain64
sets it today (its 64-bit LE counter extends correctly); plain is
excluded as its 32-bit counter wraps differently across a
2^32-sector boundary;
- ivsize is 16 (the core rejects other sizes with -EOPNOTSUPP);
- the iv_gen_ops->post() hook is unset;
- dm-integrity is not stacked (no integrity tag or integrity IV).
The cipher driver does not need to advertise anything: the crypto
API auto-splits multi-data-unit requests for drivers that cannot
handle them natively, so dm-crypt sees the same fast batched
submission contract regardless of the underlying driver.
A new CRYPT_MULTI_DATA_UNIT cipher_flag, set once at construction
time, gates the multi-data-unit dispatch. The existing per-sector
path in crypt_convert_block_skcipher() is unchanged; the new
crypt_convert_block_skcipher_multi() is reached from a small
dispatch in crypt_convert() and shares the same backlog/-EBUSY/
-EINPROGRESS flow control with the per-sector path.
Heap-allocated scatterlists are stashed in dm_crypt_request and
freed in crypt_free_req_skcipher() to avoid races between the
synchronous-success free path and async-completion reuse from the
request pool. On scatterlist allocation failure the helper returns
-EAGAIN, and the core returns -EOPNOTSUPP if a driver turns out
unable to do multi-DU; crypt_convert() handles both by clearing its
local multi_du flag and falling back to the per-sector path for the
rest of the current crypt_convert() invocation, ensuring forward progress
on the swap-out-to-dm-crypt path even under total memory exhaustion
(the per-sector path uses only cc->req_pool, a mempool with
reservoir set up at table-load time, and the inline
dmreq->sg_in[]/sg_out[] arrays — no allocation that could fail).
Verified end-to-end with a byte-equivalence test: encrypted output
of plain64 dm-crypt with the multi-data-unit path matches output of
the single-data-unit path bit-for-bit over a 256 MB device, with
xts-aes-aesni driving the auto-split path.
Signed-off-by: Leonid Ravich <lravich@amazon.com>
---
drivers/md/dm-crypt.c | 215 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 207 insertions(+), 8 deletions(-)
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 608b617fb817..bfb98dd876d7 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -101,6 +101,9 @@ struct dm_crypt_request {
struct scatterlist sg_in[4];
struct scatterlist sg_out[4];
u64 iv_sector;
+ /* Multi-data-unit SG arrays, NULL when sg_in[]/sg_out[] suffice. */
+ struct scatterlist *sg_in_ext;
+ struct scatterlist *sg_out_ext;
};
struct crypt_config;
@@ -115,6 +118,12 @@ struct crypt_iv_operations {
struct dm_crypt_request *dmreq);
void (*post)(struct crypt_config *cc, u8 *iv,
struct dm_crypt_request *dmreq);
+ /*
+ * The per-sector IV advances as a 128-bit LE counter, so a bio's
+ * consecutive sectors share one starting IV and can be batched into
+ * a single skcipher request via data_unit_size.
+ */
+ bool sector_iv_le128;
};
struct iv_benbi_private {
@@ -151,6 +160,7 @@ enum cipher_flags {
CRYPT_IV_LARGE_SECTORS, /* Calculate IV from sector_size, not 512B sectors */
CRYPT_ENCRYPT_PREPROCESS, /* Must preprocess data for encryption (elephant) */
CRYPT_KEY_MAC_SIZE_SET, /* The integrity_key_size option was used */
+ CRYPT_MULTI_DATA_UNIT, /* Batch all sectors of a bio per crypto request */
};
/*
@@ -1018,7 +1028,8 @@ static const struct crypt_iv_operations crypt_iv_plain_ops = {
};
static const struct crypt_iv_operations crypt_iv_plain64_ops = {
- .generator = crypt_iv_plain64_gen
+ .generator = crypt_iv_plain64_gen,
+ .sector_iv_le128 = true,
};
static const struct crypt_iv_operations crypt_iv_plain64be_ops = {
@@ -1426,12 +1437,126 @@ static int crypt_convert_block_skcipher(struct crypt_config *cc,
return r;
}
+/*
+ * Submit all remaining sectors of the current bio in one skcipher request.
+ * Same return convention as crypt_convert_block_skcipher() except for
+ * -EAGAIN, which the caller must treat as "disable multi-DU and re-enter
+ * the per-sector path" so swap-out-to-dm-crypt always makes forward
+ * progress on the mempool reserve.
+ */
+static int crypt_convert_block_skcipher_multi(struct crypt_config *cc,
+ struct convert_context *ctx,
+ struct skcipher_request *req,
+ unsigned int *out_processed)
+{
+ const unsigned int sector_size = cc->sector_size;
+ const gfp_t gfp = GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN;
+ unsigned int total = ctx->iter_in.bi_size;
+ unsigned int n_sg_in = 0, n_sg_out = 0;
+ struct dm_crypt_request *dmreq = dmreq_of_req(cc, req);
+ struct scatterlist *sg_in = NULL, *sg_out = NULL;
+ struct bvec_iter iter_in, iter_out;
+ struct bio_vec bv;
+ u8 *iv, *org_iv;
+ int r;
+
+ if (WARN_ON_ONCE(ctx->iter_in.bi_size != ctx->iter_out.bi_size))
+ return -EIO;
+ if (unlikely(total & (sector_size - 1)))
+ return -EIO;
+
+ iter_in = ctx->iter_in;
+ iter_in.bi_size = total;
+ __bio_for_each_bvec(bv, ctx->bio_in, iter_in, iter_in)
+ n_sg_in++;
+
+ iter_out = ctx->iter_out;
+ iter_out.bi_size = total;
+ __bio_for_each_bvec(bv, ctx->bio_out, iter_out, iter_out)
+ n_sg_out++;
+
+ sg_in = kmalloc_array(n_sg_in, sizeof(*sg_in), gfp);
+ sg_out = (ctx->bio_in == ctx->bio_out) ? sg_in :
+ kmalloc_array(n_sg_out, sizeof(*sg_out), gfp);
+ if (!sg_in || !sg_out) {
+ kfree(sg_in);
+ if (sg_out != sg_in)
+ kfree(sg_out);
+ return -EAGAIN;
+ }
+
+ sg_init_table(sg_in, n_sg_in);
+ {
+ unsigned int i = 0;
+
+ iter_in = ctx->iter_in;
+ iter_in.bi_size = total;
+ __bio_for_each_bvec(bv, ctx->bio_in, iter_in, iter_in)
+ sg_set_page(&sg_in[i++], bv.bv_page, bv.bv_len,
+ bv.bv_offset);
+ }
+
+ if (sg_out != sg_in) {
+ unsigned int i = 0;
+
+ sg_init_table(sg_out, n_sg_out);
+ iter_out = ctx->iter_out;
+ iter_out.bi_size = total;
+ __bio_for_each_bvec(bv, ctx->bio_out, iter_out, iter_out)
+ sg_set_page(&sg_out[i++], bv.bv_page, bv.bv_len,
+ bv.bv_offset);
+ }
+
+ dmreq->iv_sector = ctx->cc_sector;
+ if (test_bit(CRYPT_IV_LARGE_SECTORS, &cc->cipher_flags))
+ dmreq->iv_sector >>= cc->sector_shift;
+ dmreq->ctx = ctx;
+
+ iv = iv_of_dmreq(cc, dmreq);
+ org_iv = org_iv_of_dmreq(cc, dmreq);
+ r = cc->iv_gen_ops->generator(cc, org_iv, dmreq);
+ if (r < 0)
+ goto out_free_sg;
+ memcpy(iv, org_iv, cc->iv_size);
+
+ dmreq->sg_in_ext = sg_in;
+ dmreq->sg_out_ext = (sg_out == sg_in) ? NULL : sg_out;
+
+ skcipher_request_set_crypt(req, sg_in, sg_out, total, iv);
+ skcipher_request_set_data_unit_size(req, sector_size);
+
+ if (bio_data_dir(ctx->bio_in) == WRITE)
+ r = crypto_skcipher_encrypt(req);
+ else
+ r = crypto_skcipher_decrypt(req);
+
+ /*
+ * Sync error: kcryptd_async_done won't run, so free the SG
+ * arrays here. Async returns (-EINPROGRESS, -EBUSY) hand
+ * ownership to the completion callback.
+ */
+ if (r && r != -EINPROGRESS && r != -EBUSY)
+ goto out_free_sg;
+
+ *out_processed = total;
+ return r;
+
+out_free_sg:
+ kfree(sg_in);
+ if (sg_out != sg_in)
+ kfree(sg_out);
+ dmreq->sg_in_ext = NULL;
+ dmreq->sg_out_ext = NULL;
+ return r;
+}
+
static void kcryptd_async_done(void *async_req, int error);
static int crypt_alloc_req_skcipher(struct crypt_config *cc,
struct convert_context *ctx)
{
unsigned int key_index = ctx->cc_sector & (cc->tfms_count - 1);
+ struct dm_crypt_request *dmreq;
if (!ctx->r.req) {
ctx->r.req = mempool_alloc(&cc->req_pool, in_interrupt() ? GFP_ATOMIC : GFP_NOIO);
@@ -1441,6 +1566,11 @@ static int crypt_alloc_req_skcipher(struct crypt_config *cc,
skcipher_request_set_tfm(ctx->r.req, cc->cipher_tfm.tfms[key_index]);
+ /* Multi-DU SG arrays are owned by the helper that allocates them. */
+ dmreq = dmreq_of_req(cc, ctx->r.req);
+ dmreq->sg_in_ext = NULL;
+ dmreq->sg_out_ext = NULL;
+
/*
* Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
* requests if driver request queue is full.
@@ -1487,6 +1617,12 @@ static void crypt_free_req_skcipher(struct crypt_config *cc,
struct skcipher_request *req, struct bio *base_bio)
{
struct dm_crypt_io *io = dm_per_bio_data(base_bio, cc->per_bio_data_size);
+ struct dm_crypt_request *dmreq = dmreq_of_req(cc, req);
+
+ kfree(dmreq->sg_in_ext);
+ dmreq->sg_in_ext = NULL;
+ kfree(dmreq->sg_out_ext);
+ dmreq->sg_out_ext = NULL;
if ((struct skcipher_request *)(io + 1) != req)
mempool_free(req, &cc->req_pool);
@@ -1515,7 +1651,9 @@ static void crypt_free_req(struct crypt_config *cc, void *req, struct bio *base_
static blk_status_t crypt_convert(struct crypt_config *cc,
struct convert_context *ctx, bool atomic, bool reset_pending)
{
- unsigned int sector_step = cc->sector_size >> SECTOR_SHIFT;
+ const unsigned int sector_step = cc->sector_size >> SECTOR_SHIFT;
+ bool multi_du = test_bit(CRYPT_MULTI_DATA_UNIT, &cc->cipher_flags);
+ unsigned int processed;
int r;
/*
@@ -1536,8 +1674,13 @@ static blk_status_t crypt_convert(struct crypt_config *cc,
atomic_inc(&ctx->cc_pending);
+ processed = cc->sector_size;
if (crypt_integrity_aead(cc))
r = crypt_convert_block_aead(cc, ctx, ctx->r.req_aead, ctx->tag_offset);
+ else if (multi_du)
+ r = crypt_convert_block_skcipher_multi(cc, ctx,
+ ctx->r.req,
+ &processed);
else
r = crypt_convert_block_skcipher(cc, ctx, ctx->r.req, ctx->tag_offset);
@@ -1559,8 +1702,19 @@ static blk_status_t crypt_convert(struct crypt_config *cc,
* exit and continue processing in a workqueue
*/
ctx->r.req = NULL;
- ctx->tag_offset++;
- ctx->cc_sector += sector_step;
+ if (!multi_du) {
+ ctx->tag_offset++;
+ ctx->cc_sector += sector_step;
+ } else {
+ bio_advance_iter(ctx->bio_in,
+ &ctx->iter_in,
+ processed);
+ bio_advance_iter(ctx->bio_out,
+ &ctx->iter_out,
+ processed);
+ ctx->cc_sector +=
+ processed >> SECTOR_SHIFT;
+ }
return BLK_STS_DEV_RESOURCE;
}
} else {
@@ -1574,19 +1728,41 @@ static blk_status_t crypt_convert(struct crypt_config *cc,
*/
case -EINPROGRESS:
ctx->r.req = NULL;
- ctx->tag_offset++;
- ctx->cc_sector += sector_step;
+ if (!multi_du) {
+ ctx->tag_offset++;
+ ctx->cc_sector += sector_step;
+ } else {
+ bio_advance_iter(ctx->bio_in, &ctx->iter_in,
+ processed);
+ bio_advance_iter(ctx->bio_out, &ctx->iter_out,
+ processed);
+ ctx->cc_sector += processed >> SECTOR_SHIFT;
+ }
continue;
/*
* The request was already processed (synchronously).
*/
case 0:
atomic_dec(&ctx->cc_pending);
- ctx->cc_sector += sector_step;
- ctx->tag_offset++;
+ if (!multi_du) {
+ ctx->cc_sector += sector_step;
+ ctx->tag_offset++;
+ } else {
+ bio_advance_iter(ctx->bio_in, &ctx->iter_in,
+ processed);
+ bio_advance_iter(ctx->bio_out, &ctx->iter_out,
+ processed);
+ ctx->cc_sector += processed >> SECTOR_SHIFT;
+ }
if (!atomic)
cond_resched();
continue;
+ /* Multi-DU rejected (no memory or sync-only mismatch): fall back. */
+ case -EAGAIN:
+ case -EOPNOTSUPP:
+ atomic_dec(&ctx->cc_pending);
+ multi_du = false;
+ continue;
/*
* There was a data integrity error.
*/
@@ -3063,6 +3239,29 @@ static int crypt_ctr_cipher(struct dm_target *ti, char *cipher_in, char *key)
}
}
+ /*
+ * Enable multi-data-unit batching only when per-DU IVs can be
+ * derived from one starting IV as a 128-bit LE counter, matching
+ * skcipher_request_set_data_unit_size(). Only IV modes flagged
+ * sector_iv_le128 qualify (plain64; not plain, whose 32-bit counter
+ * wraps differently across a 2^32-sector boundary). ivsize must be
+ * 16 (the core rejects otherwise) and the cipher must be sync,
+ * single-tfm, no integrity, no per-sector post() hook. The driver
+ * advertises nothing: the core auto-splits for drivers that lack
+ * native support.
+ */
+ if (!crypt_integrity_aead(cc) && cc->tfms_count == 1 &&
+ cc->iv_gen_ops && cc->iv_gen_ops->sector_iv_le128 &&
+ !cc->iv_gen_ops->post &&
+ !cc->integrity_tag_size && !cc->integrity_iv_size &&
+ crypto_skcipher_ivsize(any_tfm(cc)) == 16 &&
+ !(crypto_skcipher_alg(any_tfm(cc))->base.cra_flags &
+ CRYPTO_ALG_ASYNC)) {
+ set_bit(CRYPT_MULTI_DATA_UNIT, &cc->cipher_flags);
+ DMINFO("Using multi-data-unit crypto offload (du=%u)",
+ cc->sector_size);
+ }
+
/* wipe the kernel key payload copy */
if (cc->key_string)
memset(cc->key, 0, cc->key_size * sizeof(u8));
--
2.47.3
next prev parent reply other threads:[~2026-06-15 11:15 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-15 11:14 [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching Leonid Ravich
2026-06-15 11:14 ` [PATCH v4 1/3] crypto: skcipher - add per-request data_unit_size with auto-splitting Leonid Ravich
2026-06-15 11:14 ` [PATCH v4 2/3] crypto: testmgr - test for multi-data-unit dispatch Leonid Ravich
2026-06-15 11:14 ` Leonid Ravich [this message]
2026-06-15 22:53 ` [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching Eric Biggers
2026-06-16 4:13 ` Herbert Xu
2026-06-16 4:50 ` Eric Biggers
2026-06-16 4:53 ` Herbert Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260615111459.9452-4-lravich@amazon.com \
--to=lravich@amazon.com \
--cc=agk@redhat.com \
--cc=ardb@kernel.org \
--cc=axboe@kernel.dk \
--cc=dm-devel@lists.linux.dev \
--cc=ebiggers@kernel.org \
--cc=gilad@benyossef.com \
--cc=herbert@gondor.apana.org.au \
--cc=horia.geanta@nxp.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-crypto@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.