public inbox for linux-block@vger.kernel.org
* [RFC PATCH] dm-crypt: allow unaligned bio_vecs for direct io
@ 2025-09-18 16:16 Keith Busch
  2025-09-18 20:13 ` Keith Busch
  2025-09-18 20:27 ` Mike Snitzer
  0 siblings, 2 replies; 6+ messages in thread
From: Keith Busch @ 2025-09-18 16:16 UTC (permalink / raw)
  To: dm-devel; +Cc: snitzer, linux-block, mpatocka, ebiggers, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Most storage devices can handle DMA for data that is not aligned to the
sector block size. The block and filesystem layers have introduced
updates to allow that kind of memory alignment flexibility when
possible.

dm-crypt, however, currently constrains itself to aligned memory because
it sends a single scatterlist element for the input to the encrypt and
decrypt algorithms. This forces applications that have unaligned data to
copy through a bounce buffer, increasing CPU and memory utilization.

It appears to be a pretty straightforward thing to modify for skcipher
since there are 3 unused scatterlist elements immediately available. In
practice, that should be enough as the sector granularity of data
generally doesn't straddle more than one page, if at all.
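To put a rough bound on that claim, here is a userspace sketch (hypothetical helper, not the kernel code) of how many page-sized segments a sector-sized run of bytes can straddle at a given in-page offset:

```c
#include <assert.h>

/*
 * Sketch (userspace, hypothetical helper name): how many 4 KiB pages a
 * run of `len` bytes starting at in-page offset `off` spans. With 4 KiB
 * pages, a 4 KiB sector at a nonzero offset spans at most two pages, so
 * the 4 scatterlist slots in dm_crypt_request comfortably cover it.
 */
static unsigned int pages_spanned(unsigned int off, unsigned int len)
{
	const unsigned int page_size = 4096;
	unsigned int first = off / page_size;
	unsigned int last = (off + len - 1) / page_size;

	return last - first + 1;
}
```

A 512-byte sector likewise straddles at most two pages in the worst case, so even maximally unaligned input stays within the available entries.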

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/md/dm-crypt.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 5ef43231fe77f..f860716b7a5c1 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1429,18 +1429,14 @@ static int crypt_convert_block_skcipher(struct crypt_config *cc,
 					struct skcipher_request *req,
 					unsigned int tag_offset)
 {
-	struct bio_vec bv_in = bio_iter_iovec(ctx->bio_in, ctx->iter_in);
 	struct bio_vec bv_out = bio_iter_iovec(ctx->bio_out, ctx->iter_out);
+	unsigned int bytes = cc->sector_size;
 	struct scatterlist *sg_in, *sg_out;
 	struct dm_crypt_request *dmreq;
 	u8 *iv, *org_iv, *tag_iv;
 	__le64 *sector;
 	int r = 0;
 
-	/* Reject unexpected unaligned bio. */
-	if (unlikely(bv_in.bv_len & (cc->sector_size - 1)))
-		return -EIO;
-
 	dmreq = dmreq_of_req(cc, req);
 	dmreq->iv_sector = ctx->cc_sector;
 	if (test_bit(CRYPT_IV_LARGE_SECTORS, &cc->cipher_flags))
@@ -1457,11 +1453,24 @@ static int crypt_convert_block_skcipher(struct crypt_config *cc,
 	*sector = cpu_to_le64(ctx->cc_sector - cc->iv_offset);
 
 	/* For skcipher we use only the first sg item */
-	sg_in  = &dmreq->sg_in[0];
 	sg_out = &dmreq->sg_out[0];
 
-	sg_init_table(sg_in, 1);
-	sg_set_page(sg_in, bv_in.bv_page, cc->sector_size, bv_in.bv_offset);
+	do {
+		struct bio_vec bv_in = bio_iter_iovec(ctx->bio_in, ctx->iter_in);
+		int len = min(bytes, bv_in.bv_len);
+
+		if (r >= ARRAY_SIZE(dmreq->sg_in))
+			return -EINVAL;
+
+		sg_in = &dmreq->sg_in[r++];
+		memset(sg_in, 0, sizeof(*sg_in));
+		sg_set_page(sg_in, bv_in.bv_page, len, bv_in.bv_offset);
+		bio_advance_iter_single(ctx->bio_in, &ctx->iter_in, len);
+		bytes -= len;
+	} while (bytes);
+
+	sg_mark_end(sg_in);
+	sg_in = dmreq->sg_in[0];
 
 	sg_init_table(sg_out, 1);
 	sg_set_page(sg_out, bv_out.bv_page, cc->sector_size, bv_out.bv_offset);
@@ -1495,7 +1504,6 @@ static int crypt_convert_block_skcipher(struct crypt_config *cc,
 	if (!r && cc->iv_gen_ops && cc->iv_gen_ops->post)
 		r = cc->iv_gen_ops->post(cc, org_iv, dmreq);
 
-	bio_advance_iter(ctx->bio_in, &ctx->iter_in, cc->sector_size);
 	bio_advance_iter(ctx->bio_out, &ctx->iter_out, cc->sector_size);
 
 	return r;
@@ -3750,7 +3758,8 @@ static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
 	limits->physical_block_size =
 		max_t(unsigned int, limits->physical_block_size, cc->sector_size);
 	limits->io_min = max_t(unsigned int, limits->io_min, cc->sector_size);
-	limits->dma_alignment = limits->logical_block_size - 1;
+	if (crypt_integrity_aead(cc))
+		limits->dma_alignment = limits->logical_block_size - 1;
 
 	/*
 	 * For zoned dm-crypt targets, there will be no internal splitting of
-- 
2.47.3



* Re: [RFC PATCH] dm-crypt: allow unaligned bio_vecs for direct io
  2025-09-18 16:16 [RFC PATCH] dm-crypt: allow unaligned bio_vecs for direct io Keith Busch
@ 2025-09-18 20:13 ` Keith Busch
  2025-09-26 14:19   ` Mikulas Patocka
  2025-09-18 20:27 ` Mike Snitzer
  1 sibling, 1 reply; 6+ messages in thread
From: Keith Busch @ 2025-09-18 20:13 UTC (permalink / raw)
  To: Keith Busch; +Cc: dm-devel, snitzer, linux-block, mpatocka, ebiggers

On Thu, Sep 18, 2025 at 09:16:42AM -0700, Keith Busch wrote:
> +		bio_advance_iter_single(ctx->bio_in, &ctx->iter_in, len);
> +		bytes -= len;
> +	} while (bytes);
> +
> +	sg_mark_end(sg_in);
> +	sg_in = dmreq->sg_in[0];

Err, there should be an '&' in there, "&dmreq->sg_in[0];"

By the way, I only tested plain64 for the ivmode. That appears to work
fine, but I am aware this will not be successful with elephant, lmk, or
tcw. So just an RFC for now to see if it's worth pursuing.


* Re: [RFC PATCH] dm-crypt: allow unaligned bio_vecs for direct io
  2025-09-18 16:16 [RFC PATCH] dm-crypt: allow unaligned bio_vecs for direct io Keith Busch
  2025-09-18 20:13 ` Keith Busch
@ 2025-09-18 20:27 ` Mike Snitzer
  2025-09-18 20:52   ` Keith Busch
  1 sibling, 1 reply; 6+ messages in thread
From: Mike Snitzer @ 2025-09-18 20:27 UTC (permalink / raw)
  To: Keith Busch; +Cc: dm-devel, linux-block, mpatocka, ebiggers, Keith Busch

On Thu, Sep 18, 2025 at 09:16:42AM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Most storage devices can handle DMA for data that is not aligned to the
> sector block size. The block and filesystem layers have introduced
> updates to allow that kind of memory alignment flexibility when
> possible.

I'd love to understand what changes in filesystems you're referring
to.  Because I know for certain that DIO with memory that isn't
'dma_alignment' aligned fails with certainty on top of XFS.

Pretty certain it balks at DIO that isn't logical_block_size aligned
on disk too.

> dm-crypt, however, currently constrains itself to aligned memory because
> it sends a single scatterlist element for the input to the encrypt and
> decrypt algorithms. This forces applications that have unaligned data to
> copy through a bounce buffer, increasing CPU and memory utilization.

Even this notion that an application is somehow able to (unwittingly)
lean on "unaligned data to copy through a bounce buffer" -- has me
asking: where does Keith get these wonderful toys?

Anyway, just asking these things because if they were true I wouldn't
be needing to add specialized code to NFSD and NFS to handle
misaligned DIO.

> It appears to be a pretty straightforward thing to modify for skcipher
> since there are 3 unused scatterlist elements immediately available. In
> practice, that should be enough as the sector granularity of data
> generally doesn't straddle more than one page, if at all.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>


* Re: [RFC PATCH] dm-crypt: allow unaligned bio_vecs for direct io
  2025-09-18 20:27 ` Mike Snitzer
@ 2025-09-18 20:52   ` Keith Busch
  0 siblings, 0 replies; 6+ messages in thread
From: Keith Busch @ 2025-09-18 20:52 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Keith Busch, dm-devel, linux-block, mpatocka, ebiggers

On Thu, Sep 18, 2025 at 04:27:14PM -0400, Mike Snitzer wrote:
> On Thu, Sep 18, 2025 at 09:16:42AM -0700, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > 
> > Most storage devices can handle DMA for data that is not aligned to the
> > sector block size. The block and filesystem layers have introduced
> > updates to allow that kind of memory alignment flexibility when
> > possible.
> 
> I'd love to understand what changes in filesystems you're referring
> to.  Because I know for certain that DIO with memory that isn't
> 'dma_alignment' aligned fails with certainty on top of XFS.

I only mentioned the "sector block size" alignment, not the hardware dma
alignment. The dma alignment remains the minimum address alignment you
have to use. But xfs has been able to handle dword aligned addresses for
a while now, assuming your block_device can handle dword aligned DMA.
But the old requirement for a buffer to be aligned to a 4k address
offset for a 4k block device isn't necessary anymore.
 
> Pretty certain it balks at DIO that isn't logical_block_size aligned
> ondisk too.

We might be talking about different things. The total length of a
vector must be a multiple of the logical block size, yes. But I'm
talking about the address offset. Right now dm-crypt can't handle a
request if the address offset is not aligned to the logical block size.
But that's a purely software created limitation, there's no hard reason
that needs to be the case.
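A minimal userspace sketch of the distinction being drawn (hypothetical names, assuming the relaxed checks described above): the total transfer length must remain a multiple of the logical block size, while each buffer address only needs to clear the much smaller dma_alignment mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct vec { uintptr_t addr; size_t len; };	/* stand-in for an iovec */

/*
 * Sketch of the relaxed DIO validity check: per-buffer addresses need
 * only satisfy dma_align (a power-of-two-minus-one mask, e.g. 3 for
 * dword alignment), while the summed length must still be a multiple
 * of lbs, the logical block size.
 */
static bool dio_ok(const struct vec *v, int n, size_t lbs, uintptr_t dma_align)
{
	size_t total = 0;

	for (int i = 0; i < n; i++) {
		if (v[i].addr & dma_align)
			return false;	/* address misaligned for DMA */
		total += v[i].len;
	}
	return (total % lbs) == 0;	/* on-disk granularity still applies */
}
```

So a dword-aligned buffer at a non-4k offset passes, while a short total length or an odd address still fails.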

> > it sends a single scatterlist element for the input to the encrypt and
> > decrypt algorithms. This forces applications that have unaligned data to
> > copy through a bounce buffer, increasing CPU and memory utilization.
> 
> Even this notion that an application is somehow able to (unwittingly)
> lean on "unaligned data to copy through a bounce buffer" -- has me
> asking: where does Keith get these wonderful toys?

I'm just trying to write data to disk from buffers filled by NICs
subscribed to io_uring zero-copy receive capabilities. I guess they need
fancy features to do that, but it's not that uncommon, is it? Anyway,
the data that needs to be persisted often has offsets that are still
DMA friendly, but unlikely to be perfectly page aligned.


* Re: [RFC PATCH] dm-crypt: allow unaligned bio_vecs for direct io
  2025-09-18 20:13 ` Keith Busch
@ 2025-09-26 14:19   ` Mikulas Patocka
  2025-09-26 16:17     ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Mikulas Patocka @ 2025-09-26 14:19 UTC (permalink / raw)
  To: Keith Busch; +Cc: Keith Busch, dm-devel, snitzer, linux-block, ebiggers



On Thu, 18 Sep 2025, Keith Busch wrote:

> On Thu, Sep 18, 2025 at 09:16:42AM -0700, Keith Busch wrote:
> > +		bio_advance_iter_single(ctx->bio_in, &ctx->iter_in, len);
> > +		bytes -= len;
> > +	} while (bytes);
> > +
> > +	sg_mark_end(sg_in);
> > +	sg_in = dmreq->sg_in[0];
> 
> Err, there should be an '&' in there, "&dmreq->sg_in[0];"
> 
> By the way, I only tested plain64 for the ivmode. That appears to work
> fine, but I am aware this will not be successful with elephant, lmk, or
> tcw. So just an RFC for now to see if it's worth pursuing.

Hi

I'd like to ask - how much does it help performance? How many percent 
faster does your application run?

Another question - what if the user uses preadv or pwritev with direct I/O 
and uses more than 4 sg lists? Will this be rejected in the upper layers, 
or will it reach dm-crypt and return -EINVAL? Note that returning error 
from dm-crypt may be quite problematic, because it would kick the leg out 
of RAID, if there were RAID above dm-crypt. I think that we should return 
BLK_STS_NOTSUPP, because that would be ignored by RAID.

I am considering committing this for the kernel 6.19 (it's too late to add 
it to the 6.18 merge window).

Mikulas



* Re: [RFC PATCH] dm-crypt: allow unaligned bio_vecs for direct io
  2025-09-26 14:19   ` Mikulas Patocka
@ 2025-09-26 16:17     ` Keith Busch
  0 siblings, 0 replies; 6+ messages in thread
From: Keith Busch @ 2025-09-26 16:17 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Keith Busch, dm-devel, snitzer, linux-block, ebiggers

On Fri, Sep 26, 2025 at 04:19:58PM +0200, Mikulas Patocka wrote:
> 
> I'd like to ask - how much does it help performance? How many percent 
> faster does your application run?

The best info I have from the storage team I'm working with is this
reduces their application's memory bandwidth utilization by a little
over 10%.

> Another question - what if the user uses preadv or pwritev with direct I/O 
> and uses more than 4 sg lists? Will this be rejected in the upper layers, 
> or will it reach dm-crypt and return -EINVAL? 

I believe it would reach dm-crypt with this patch.

If you tried today, it should get rejected by the upper layers for
misalignment. But there are some changes pending in the 6.18 block tree
that defer the alignment detection to later such that dm-crypt may have
to deal with this and fail the IO (unless something higher splits the
bio to its queue limits first), so sounds like I should sort that out.

Regarding the 4 scatterlist limit, we're using 4k sized logical blocks.
The incoming data typically has an offset crossing a page boundary, so
needs 2 pages for each block. Just 2 scatterlists would do the trick.
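As a sanity check on that count, here is a userspace paraphrase of the patch's do/while loop (hypothetical types; the real code walks bio_vecs with bio_iter_iovec):

```c
#include <assert.h>

struct seg { unsigned int len; };	/* stand-in for a bio_vec's bv_len */

/*
 * Paraphrase of the patch's inner loop: consume min(bytes, segment
 * length) per scatterlist slot until one sector_size worth of data is
 * covered, and return how many slots that took.
 */
static int sg_slots_for_sector(const struct seg *segs, unsigned int sector_size)
{
	unsigned int bytes = sector_size;
	int n = 0;

	do {
		unsigned int len = segs[n].len < bytes ? segs[n].len : bytes;

		n++;			/* one sg entry per (partial) segment */
		bytes -= len;
	} while (bytes);

	return n;
}
```

A 4 KiB sector whose buffer crosses one page boundary arrives as two segments and so uses two slots; only more fragmented input would need additional entries.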

If we really want to remove all software constraints here, then we could
increase the pre-allocated scatterlist size, or dynamically allocate an
appropriate scatter-gather table per-IO. I guess we'd have to expand the
list somewhere if this needs to work with aead, too.

> Note that returning error 
> from dm-crypt may be quite problematic, because it would kick the leg out 
> of RAID, if there were RAID above dm-crypt. I think that we should return 
> BLK_STS_NOTSUPP, because that would be ignored by RAID.

Thanks, I'll try this out.

> I am considering committing this for the kernel 6.19 (it's too late to add 
> it to the 6.18 merge window).

No problem, I don't want to rush this. Your reply is at least
encouraging enough for me to sort out more than just "plain64" with
this proposal.


end of thread, other threads:[~2025-09-26 16:17 UTC | newest]
