From: Bart Van Assche <bvanassche@acm.org>
To: Damien Le Moal <dlemoal@kernel.org>, Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v17 00/14] Improve write performance for zoned UFS devices
Date: Mon, 27 Jan 2025 15:01:22 -0800 [thread overview]
Message-ID: <485f7749-63c4-4d2c-9e9e-6bbedd75604b@acm.org> (raw)
In-Reply-To: <d9ef8666-4501-4913-98da-abbd1793fa42@kernel.org>
On 1/22/25 8:16 PM, Damien Le Moal wrote:
> On 1/22/25 6:57 AM, Bart Van Assche wrote:
>> The reason I restored the error handling code is that the code in
>> blk-zoned.c does not support error handling inside block drivers if QD > 1
>> and because supporting the SCSI error handler is essential for UFS
>> devices.
>
> What error handling ?
The SCSI error handler. If the SCSI error handler is invoked while only
one request is outstanding, there is no chance of reordering and no
additional measures are necessary. If multiple requests are outstanding
and the SCSI error handler is invoked, care must be taken that the
pending requests are resubmitted in LBA order per zone. This is not
possible without blocking zoned write processing until all pending
zoned writes have completed. Hence the need to set BLK_ZONE_WPLUG_ERROR
if write pipelining is enabled and a request is requeued.
> I think that conflating/handling the expected (even if infrequent) write requeue
> events with zoned UFS devices with actual hard device write errors (e.g. the
> write failed because the device is toast or you hit a bad sector, or you have a
> bad cable or whatever other hardware reason for the write to fail) is making a
> mess of everything. Treating the requeues like hard errors makes things
> complicated, but also wrong because even for zoned UFS devices, we may still get
> hard write errors, and treating these in the same way as requeue event is wrong
> in my opinion. For hard errors, the write must not be retried and the error
> propagated immediately to the issuer (e.g. the FS).
>
> For a requeue event, even though that is signaled initially at the bottom of the
> stack by a device unit attention error, there is no reason for us to treat these
> events as failed requests in the zone write plugging code. We need to have a
> special treatment for these, e.g. a "ZONE_WRITE_NEED_RETRY" request flag (and
> that flag can bhe set in the scsi mid-layer before calling into the block layer
> requeue.
>
> When the zone write plugging sees this flag, it can:
> 1) Stop issuing write BIOs since we know they will fail
> 2) Wait for all requests already issued and in-flight to come back
> 3) Restoore the zone write plug write pointer position to the lowest sector of
> all requests that were requeued
> 4) Re-issue the requeued writes (after somehow sorting them)
> 5) Re-start issuing BIOs waiting in the write plug
>
> And any write that is failed due to a hard error will still set the zone write
> plug with BLK_ZONE_WPLUG_NEED_WP_UPDATE after aborting all BIOs that are pluggeg
> in the zone write plug.
>
> Now, I think that the biggest difficulty of this work is to make sure that a
> write that fails with an unaligned write error due to a write requeue event
> before it can be clearly differentiated from the same failure without the
> requeue event before it. For the former, we can recover. For the latter, it is a
> user bug and we must NOT hide it.
>
> Can this be done cleanly ? I am not entirely sure about it because I am still
> not 100% clear about the conditions that result in a zoned UFS device failing a
> write with unit attention. As far as I can tell, the first write issued when the
> device is sleeping will be rejected with that error and must be requeued. But
> others write behind this failed write that are already in-flight will endup
> failing with an unaligned write error, no ? Or will they be failed with a unit
> attention too ? This is all unclear to me.
I propose to keep the approach of the code in blk-zoned.c independent of
the details of how UFS devices work. That will make it more likely that
the blk-zoned.c code remains generic and that it can be reused for other
storage protocols.
Although I have not yet observed reordering due to a unit attention, I
think that unit attentions should be supported because unit attentions
are such an important SCSI mechanism.
Since UFSHCI 3 controllers use a 32-bit register in the UFSHCI
controller for indicating which requests are in flight, reordering of
requests is likely to happen if a UFSHCI controller comes out of the
suspended state. When a UFSHCI controller leaves the suspended state,
it scans that 32-bit register from the lowest to the highest bit and
hence ordering information that was provided by the host is lost. It
seems to me that the easiest way to address this reordering is by
resubmitting requests once in case the controller reordered the requests
due to leaving the suspended state. Please note that UFSHCI 3
controllers are not my primary focus and that I would be fine with not
supporting write pipelining for these controllers if there would be
disagreement about this aspect. UFSHCI 4 controllers have proper
submission queues in host memory, just like NVMe controllers, and hence
request ordering information for submission queues is not lost if the
controller is suspended and resumed.
The SCSI error handler can also cause reordering. If e.g. the UFS
controller reports a CRC error then the UFS driver will reset the UFS
host controller and activate the SCSI error handler. Pending writes must
be resubmitted in LBA order per zone after the SCSI error handler has
finished to prevent that write errors propagate to the file system and
to user space.
At the time a UFS device reports an unaligned write error, I don't think
that it is possible to determine whether or not this is a retryable
error. So instead of introducing a ZONE_WRITE_NEED_RETRY flag, I propose
the following:
* Use BLK_ZONE_WPLUG_ERROR for both hard and soft errors. Hard errors
means errors that shouldn't be retried. Soft errors means errors that
are retryable.
* After all pending writes have completed and before resubmitting any
zoned writes, check whether or not these should be retried. For UFS
devices this can be done by comparing the wp_offset_compl member with
the lowest LBA of the pending zoned writes.
Do I remember correctly that residual information is not always reliable
for zoned writes to SMR disks and hence that another approach is needed
for SMR disks? How about setting BLK_ZONE_WPLUG_NEED_WP_UPDATE for SMR
disks instead of resubmitting writes after an unaligned write has been
reported by the storage device?
Thanks,
Bart.
prev parent reply other threads:[~2025-01-27 23:01 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-15 22:46 [PATCH v17 00/14] Improve write performance for zoned UFS devices Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 01/14] block: Support block drivers that preserve the order of write requests Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 02/14] dm-linear: Report to the block layer that the write order is preserved Bart Van Assche
2025-01-17 18:04 ` Mikulas Patocka
2025-01-21 21:38 ` Bart Van Assche
2025-01-27 17:55 ` Mikulas Patocka
2025-01-15 22:46 ` [PATCH v17 03/14] block: Rework request allocation in blk_mq_submit_bio() Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 04/14] block: Support allocating from a specific software queue Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 05/14] blk-mq: Restore the zoned write order when requeuing Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 06/14] blk-zoned: Track the write pointer per zone Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 07/14] blk-zoned: Defer error handling Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 08/14] blk-zoned: Add an argument to blk_zone_plug_bio() Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 09/14] blk-zoned: Support pipelining of zoned writes Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 10/14] scsi: core: Retry unaligned " Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 11/14] scsi: sd: Increase retry count for " Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 12/14] scsi: scsi_debug: Add the preserves_write_order module parameter Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 13/14] scsi: scsi_debug: Support injecting unaligned write errors Bart Van Assche
2025-01-15 22:46 ` [PATCH v17 14/14] scsi: ufs: Inform the block layer about write ordering Bart Van Assche
2025-01-16 7:43 ` Can Guo
2025-01-16 15:58 ` Bao D. Nguyen
2025-01-23 0:52 ` Bart Van Assche
2025-01-17 22:47 ` [PATCH v17 00/14] Improve write performance for zoned UFS devices Damien Le Moal
2025-01-21 21:57 ` Bart Van Assche
2025-01-23 4:16 ` Damien Le Moal
2025-01-27 23:01 ` Bart Van Assche [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=485f7749-63c4-4d2c-9e9e-6bbedd75604b@acm.org \
--to=bvanassche@acm.org \
--cc=axboe@kernel.dk \
--cc=dlemoal@kernel.org \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox