Re: [PATCH v2 3/4] dm: dm-crypt: Do not split write operations with zoned targets

dm-devel.redhat.com archive mirror
 help / color / mirror / Atom feed

From: Damien Le Moal <dlemoal@kernel.org>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	dm-devel@lists.linux.dev, Mike Snitzer <snitzer@kernel.org>,
	Bart Van Assche <bvanassche@acm.org>
Subject: Re: [PATCH v2 3/4] dm: dm-crypt: Do not split write operations with zoned targets
Date: Wed, 25 Jun 2025 21:54:48 +0900	[thread overview]
Message-ID: <07d71ad1-a6de-474b-bee4-a64180284802@kernel.org> (raw)
In-Reply-To: <96831fd8-da1e-771f-7d19-8087d29f2af1@redhat.com>

On 6/25/25 19:19, Mikulas Patocka wrote:
> 
> 
> On Wed, 25 Jun 2025, Damien Le Moal wrote:
> 
>> Read and write operations issued to a dm-crypt target may be split
>> according to the dm-crypt internal limits defined by the max_read_size
>> and max_write_size module parameters (default is 128 KB). The intent is
>> to improve processing time of large BIOs by splitting them into smaller
>> operations that can be parallelized on different CPUs.
>>
>> For zoned dm-crypt targets, this BIO splitting is still done but without
>> the parallel execution to ensure that the issuing order of write
>> operations to the underlying devices remains sequential. However, the
>> splitting itself causes other problems:
>>
>> 1) Since dm-crypt relies on the block layer zone write plugging to
>>    handle zone append emulation using regular write operations, the
>>    reminder of a split write BIO will always be plugged into the target
>>    zone write plugged. Once the on-going write BIO finishes, this
>>    reminder BIO is unplugged and issued from the zone write plug work.
>>    If this reminder BIO itself needs to be split, the reminder will be
>>    re-issued and plugged again, but that causes a call to a
>>    blk_queue_enter(), which may block if a queue freeze operation was
>>    initiated. This results in a deadlock as DM submission still holds
>>    BIOs that the queue freeze side is waiting for.
>>
>> 2) dm-crypt relies on the emulation done by the block layer using
>>    regular write operations for processing zone append operations. This
>>    still requires to properly return the written sector as the BIO
>>    sector of the original BIO. However, this can be done correctly only
>>    and only if there is a single clone BIO used for processing the
>>    original zone append operation issued by the user. If the size of a
>>    zone append operation is larger than dm-crypt max_write_size, then
>>    the orginal BIO will be split and processed as a chain of regular
>>    write operations. Such chaining result in an incorrect written sector
>>    being returned to the zone append issuer using the original BIO
>>    sector.  This in turn results in file system data corruptions using
>>    xfs or btrfs.
>>
>> Fix this by modifying get_max_request_size() to always return the size
>> of the BIO to avoid it being split with dm_accpet_partial_bio() in
>> crypt_map(). get_max_request_size() is renamed to
>> get_max_request_sectors() to clarify the unit of the value returned
>> and its interface is changed to take a struct dm_target pointer and a
>> pointer to the struct bio being processed. In addition to this change,
>> to ensure that crypt_alloc_buffer() works correctly, set the dm-crypt
>> device max_hw_sectors limit to be at most
>> BIO_MAX_VECS << PAGE_SECTORS_SHIFT (1 MB with a 4KB page architecture).
>> This forces DM core to split write BIOs before passing them to
>> crypt_map(), and thus guaranteeing that dm-crypt can always accept an
>> entire write BIO without needing to split it.
>>
>> This change does not have any effect on the read path of dm-crypt. Read
>> operations can still be split and the BIO fragments processed in
>> parallel. There is also no impact on the performance of the write path
>> given that all zone write BIOs were already processed inline instead of
>> in parallel.
>>
>> This change also does not affect in any way regular dm-crypt block
>> devices.
>>
>> Fixes: f211268ed1f9 ("dm: Use the block layer zone append emulation")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>> ---
>>  drivers/md/dm-crypt.c | 49 ++++++++++++++++++++++++++++++++++---------
>>  1 file changed, 39 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
>> index 17157c4216a5..4e80784d1734 100644
>> --- a/drivers/md/dm-crypt.c
>> +++ b/drivers/md/dm-crypt.c
>> @@ -253,17 +253,35 @@ MODULE_PARM_DESC(max_read_size, "Maximum size of a read request");
>>  static unsigned int max_write_size = 0;
>>  module_param(max_write_size, uint, 0644);
>>  MODULE_PARM_DESC(max_write_size, "Maximum size of a write request");
>> -static unsigned get_max_request_size(struct crypt_config *cc, bool wrt)
>> +
>> +static unsigned get_max_request_sectors(struct dm_target *ti, struct bio *bio)
>>  {
>> +	struct crypt_config *cc = ti->private;
>>  	unsigned val, sector_align;
>> -	val = !wrt ? READ_ONCE(max_read_size) : READ_ONCE(max_write_size);
>> -	if (likely(!val))
>> -		val = !wrt ? DM_CRYPT_DEFAULT_MAX_READ_SIZE : DM_CRYPT_DEFAULT_MAX_WRITE_SIZE;
>> -	if (wrt || cc->used_tag_size) {
>> -		if (unlikely(val > BIO_MAX_VECS << PAGE_SHIFT))
>> -			val = BIO_MAX_VECS << PAGE_SHIFT;
>> -	}
>> -	sector_align = max(bdev_logical_block_size(cc->dev->bdev), (unsigned)cc->sector_size);
>> +	bool wrt = op_is_write(bio_op(bio));
>> +
>> +	if (wrt) {
>> +		/*
>> +		 * For zoned devices, splitting write operations creates the
>> +		 * risk of deadlocking queue freeze operations with zone write
>> +		 * plugging BIO work when the reminder of a split BIO is
>> +		 * issued. So always allow the entire BIO to proceed.
>> +		 */
>> +		if (ti->emulate_zone_append)
>> +			return bio_sectors(bio);
> 
> The overrun may still happen (if the user changes the dm table while some 
> bio is in progress) and if it happens, you should terminate the bio with 
> DM_MAPIO_KILL (like it was in my original patch).

I am confused... Overrun against what ? We are now completely ignoring the
max_write_size limit so even if the user changes it, that will not affect the
BIO processing. If you are referring to an overrun against the zoned device
max_hw_sectors limit, it is not possible since changing limits is done with the
DM device queue frozen, so we are guaranteed that there will be no BIO in-flight.

I am not sure about what kind of table change you are thinking of, but at the
very least,  dm_table_supports_size_change() ensure that there cannot be any
device size change for a zoned DM device. And given the above point about limits
changes, I do not see how a table change can affect the BIO execution.

Do you have a specific example in mind ?

Or is it maybe the if condition that is confusing ?

	if (ti->emulate_zone_append)

applies to the target, so *all* write operations (emulated zone append writes
and regular writes) will be handled by this and bio_sectors(bio) returned, thus
avoiding a split for all write operations. Maybe using:

	if (bdev_is_zoned(bio->bi_bdev))

would be clearer ?

-- 
Damien Le Moal
Western Digital Research

next prev parent reply	other threads:[~2025-06-25 12:54 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-25  5:59 [PATCH v2 0/4] Fix write operation handling for zoned DM devices Damien Le Moal
2025-06-25  5:59 ` [PATCH v2 1/4] block: Introduce bio_needs_zone_write_plugging() Damien Le Moal
2025-06-25  6:12   ` Christoph Hellwig
2025-06-25  6:14     ` Damien Le Moal
2025-06-25  6:18       ` Christoph Hellwig
2025-06-25  6:17         ` Damien Le Moal
2025-06-25  5:59 ` [PATCH v2 2/4] dm: Always split write BIOs to zoned device limits Damien Le Moal
2025-06-25  6:15   ` Christoph Hellwig
2025-06-25  6:15     ` Damien Le Moal
2025-06-25  5:59 ` [PATCH v2 3/4] dm: dm-crypt: Do not split write operations with zoned targets Damien Le Moal
2025-06-25 10:19   ` Mikulas Patocka
2025-06-25 12:54     ` Damien Le Moal [this message]
2025-06-25 14:03       ` Mikulas Patocka
2025-06-25 14:43         ` Damien Le Moal
2025-06-25 16:33           ` Mikulas Patocka
2025-06-25  5:59 ` [PATCH v2 4/4] dm: Check for forbidden splitting of zone write operations Damien Le Moal
2025-06-25  6:16   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=07d71ad1-a6de-474b-bee4-a64180284802@kernel.org \
    --to=dlemoal@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=dm-devel@lists.linux.dev \
    --cc=linux-block@vger.kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=snitzer@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).