Re: [PATCH v17 00/14] Improve write performance for zoned UFS devices

public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed

From: Damien Le Moal <dlemoal@kernel.org>
To: Bart Van Assche <bvanassche@acm.org>, Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v17 00/14] Improve write performance for zoned UFS devices
Date: Thu, 23 Jan 2025 13:16:28 +0900	[thread overview]
Message-ID: <d9ef8666-4501-4913-98da-abbd1793fa42@kernel.org> (raw)
In-Reply-To: <3a94f3e2-20d9-4bac-9198-4df4a64a5277@acm.org>

On 1/22/25 6:57 AM, Bart Van Assche wrote:
> On 1/17/25 2:47 PM, Damien Le Moal wrote:
>> Also, patch 14 puts back a lot of the code that was recently removed. Not nice,
>> and its commit message is also far too short given the size of the patch. Please
>> spend more time explaining what the patches do and how they do it to facilitate
>> review.
> 
> Regarding the following part of patch 07/14:
> 
> -		disk_zone_wplug_abort(zwplug);
> -		zwplug->flags |= BLK_ZONE_WPLUG_NEED_WP_UPDATE;
> +		if (!disk->queue->limits.driver_preserves_write_order)
> +			zwplug->flags |= BLK_ZONE_WPLUG_NEED_WP_UPDATE;
> +		zwplug->flags |= BLK_ZONE_WPLUG_ERROR;
> 
> How about setting BLK_ZONE_WPLUG_ERROR only if
> disk->queue->limits.driver_preserves_write_order has been set such that
> the restored error handling code is only used if the storage controller
> preserves the write order? This should be sufficient to preserve the
> existing behavior of the blk-zoned.c code for SMR disks.

Your patch changes BLK_ZONE_WPLUG_NEED_WP_UPDATE into BLK_ZONE_WPLUG_ERROR for
regular (no write order preserving) zoned devices. I do not see why this is
needed. The main difference between this default behavior and what you are
trying to do is that BLK_ZONE_WPLUG_NEED_WP_UPDATE is to be expected with your
changes while it is NOT expected to ever happen for a regular coned device.

> The reason I restored the error handling code is that the code in 
> blk-zoned.c does not support error handling inside block drivers if QD > 1
> and because supporting the SCSI error handler is essential for UFS
> devices.

What error handling ? The only error handling we need is to tell tell the user
that there was an error and track that the user actually does something about
it. That tracking is done with the BLK_ZONE_WPLUG_NEED_WP_UPDATE flag. That's
it, nothing more.

I think that conflating/handling the expected (even if infrequent) write requeue
events with zoned UFS devices with actual hard device write errors (e.g. the
write failed because the device is toast or you hit a bad sector, or you have a
bad cable or whatever other hardware reason for the write to fail) is making a
mess of everything. Treating the requeues like hard errors makes things
complicated, but also wrong because even for zoned UFS devices, we may still get
hard write errors, and treating these in the same way as requeue event is wrong
in my opinion. For hard errors, the write must not be retried and the error
propagated immediately to the issuer (e.g. the FS).

For a requeue event, even though that is signaled initially at the bottom of the
stack by a device unit attention error, there is no reason for us to treat these
events as failed requests in the zone write plugging code. We need to have a
special treatment for these, e.g. a "ZONE_WRITE_NEED_RETRY" request flag (and
that flag can bhe set in the scsi mid-layer before calling into the block layer
requeue.

When the zone write plugging sees this flag, it can:
1) Stop issuing write BIOs since we know they will fail
2) Wait for all requests already issued and in-flight to come back
3) Restoore the zone write plug write pointer position to the lowest sector of
all requests that were requeued
4) Re-issue the requeued writes (after somehow sorting them)
5) Re-start issuing BIOs waiting in the write plug

And any write that is failed due to a hard error will still set the zone write
plug with BLK_ZONE_WPLUG_NEED_WP_UPDATE after aborting all BIOs that are pluggeg
in the zone write plug.

Now, I think that the biggest difficulty of this work is to make sure that a
write that fails with an unaligned write error due to a write requeue event
before it can be clearly differentiated from the same failure without the
requeue event before it. For the former, we can recover. For the latter, it is a
user bug and we must NOT hide it.

Can this be done cleanly ? I am not entirely sure about it because I am still
not 100% clear about the conditions that result in a zoned UFS device failing a
write with unit attention. As far as I can tell, the first write issued when the
device is sleeping will be rejected with that error and must be requeued. But
others write behind this failed write that are already in-flight will endup
failing with an unaligned write error, no ? Or will they be failed with a unit
attention too ? This is all unclear to me.

-- 
Damien Le Moal
Western Digital Research

next prev parent reply	other threads:[~2025-01-23  4:16 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-15 22:46 [PATCH v17 00/14] Improve write performance for zoned UFS devices Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 01/14] block: Support block drivers that preserve the order of write requests Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 02/14] dm-linear: Report to the block layer that the write order is preserved Bart Van Assche
2025-01-17 18:04   ` Mikulas Patocka
2025-01-21 21:38     ` Bart Van Assche
2025-01-27 17:55       ` Mikulas Patocka
2025-01-15 22:46 ` [PATCH v17 03/14] block: Rework request allocation in blk_mq_submit_bio() Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 04/14] block: Support allocating from a specific software queue Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 05/14] blk-mq: Restore the zoned write order when requeuing Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 06/14] blk-zoned: Track the write pointer per zone Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 07/14] blk-zoned: Defer error handling Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 08/14] blk-zoned: Add an argument to blk_zone_plug_bio() Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 09/14] blk-zoned: Support pipelining of zoned writes Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 10/14] scsi: core: Retry unaligned " Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 11/14] scsi: sd: Increase retry count for " Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 12/14] scsi: scsi_debug: Add the preserves_write_order module parameter Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 13/14] scsi: scsi_debug: Support injecting unaligned write errors Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 14/14] scsi: ufs: Inform the block layer about write ordering Bart Van Assche
2025-01-16  7:43   ` Can Guo
2025-01-16 15:58   ` Bao D. Nguyen
2025-01-23  0:52     ` Bart Van Assche
2025-01-17 22:47 ` [PATCH v17 00/14] Improve write performance for zoned UFS devices Damien Le Moal
2025-01-21 21:57   ` Bart Van Assche
2025-01-23  4:16     ` Damien Le Moal [this message]
2025-01-27 23:01       ` Bart Van Assche

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d9ef8666-4501-4913-98da-abbd1793fa42@kernel.org \
    --to=dlemoal@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox