From: Damien Le Moal <dlemoal@kernel.org>
To: Bart Van Assche <bvanassche@acm.org>, Jens Axboe <axboe@kernel.dk>
Cc: Hannes Reinecke <hare@suse.de>,
linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
"Martin K . Petersen" <martin.petersen@oracle.com>,
Christoph Hellwig <hch@lst.de>, Ming Lei <ming.lei@redhat.com>,
"James E.J. Bottomley" <jejb@linux.ibm.com>
Subject: Re: [PATCH v11 04/16] scsi: core: Introduce a mechanism for reordering requests in the error handler
Date: Thu, 24 Aug 2023 08:22:59 +0900 [thread overview]
Message-ID: <741e19ae-d4fd-11f5-7faa-18b888ff769c@kernel.org> (raw)
In-Reply-To: <078d2954-f4af-6678-29ce-d8f65ff1397a@acm.org>
On 8/24/23 00:15, Bart Van Assche wrote:
> On 8/22/23 23:26, Hannes Reinecke wrote:
>> On 8/22/23 21:16, Bart Van Assche wrote:
>>> +/*
>>> + * Comparison function that allows to sort SCSI commands by ULD driver.
>>> + */
>>> +static int scsi_cmp_uld(void *priv, const struct list_head *_a,
>>> + const struct list_head *_b)
>>> +{
>>> + struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
>>> + struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
>>> +
>>> + /* See also the comment above the list_sort() definition. */
>>> + return scsi_cmd_to_driver(a) > scsi_cmd_to_driver(b);
>>
>> I have to agree with Christoph here.
>> Comparing LBA numbers at the SCSI level is really the wrong place.
>> SCSI commands might be anything, and quite some of these commands don't
>> even have LBA numbers. So trying to order them will be pointless.
>>
>> The reordering mechanism really has to go into the block layer, with
>> the driver failing the request and the block layer resubmitting in-order.
>
> Hi Hannes,
>
> Please take another look at patches 04/16 and 05/16. As one can see no
> LBA numbers are being compared in the SCSI core - comparing LBA numbers
> happens in the sd (SCSI disk) driver. The code that you replied to
> compares ULD pointers, a well-defined concept in the SCSI core.
>
> Your request to move the functionality from patches 04/16 and 05/16 into
> the block layer would involve the following:
> * Report the unaligned write errors (because a write did not happen at the
> write pointer) to the block layer (BLK_STS_WP_MISMATCH?).
> * Introduce a mechanism in the block layer for postponing error handling
> until all outstanding commands have failed. The approach from the SCSI
> core (tracking the number of failed and the number of busy commands
> and only waking up the error handler after these counters are equal)
> would be unacceptable because of the runtime overhead this mechanism
> would introduce in the block layer hot path. Additionally, I strongly
> doubt that it is possible to introduce any mechanism for postponing
> error handling in the block layer without introducing additional
> overhead in the hot path.
> * Christoph's opinion is that NVMe software should use zone append
> (REQ_OP_ZONE_APPEND) instead of regular writes (REQ_OP_WRITE) when
> writing to a zoned namespace. So the SCSI subsystem would be the only
> user of the new mechanism introduced in the block layer. The reason we
> chose REQ_OP_WRITE for zoned UFS devices is because the SCSI standard
> does not support a zone append command and introducing a zone append
> command in the SCSI standards is not something that can be realized in
> time for the first generation of zoned UFS devices.
The sd driver does have zone append emulation using regular writes. The
emulation relies on zone write locking to avoid issues with adapters that do not
have strong ordering guarantees, but that could be adapted to be removed for UFS
devices with write ordering guarantees. This solution would greatly simplify
your series since zone append requests are not subject to zone write locking at
the block layer. So no changes would be needed at that layer.
However, that implies that your preferred use case (f2fs) must be adapted to use
zone append instead of regular writes. That in itself may be a bigger-ish
change, but in the long run, likely a better one I think as that would be
compatible with NVMe ZNS and also future UFS standards defining a zone append
command.
>
> Because I assume that both Jens and Christoph disagree strongly with your
> request: I have no plans to move the code for sorting zoned writes into
> the block layer core.
>
> Jens and Christoph, please correct me if I misunderstood something.
>
> Thanks,
>
> Bart.
--
Damien Le Moal
Western Digital Research
next prev parent reply other threads:[~2023-08-23 23:24 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-22 19:16 [PATCH v11 00/16] Improve write performance for zoned UFS devices Bart Van Assche
2023-08-22 19:16 ` [PATCH v11 01/16] block: Introduce more member variables related to zone write locking Bart Van Assche
2023-08-23 6:18 ` Hannes Reinecke
2023-08-23 8:08 ` Nitesh Shetty
2023-08-22 19:16 ` [PATCH v11 02/16] block: Only use write locking if necessary Bart Van Assche
2023-08-23 6:19 ` Hannes Reinecke
2023-08-23 8:09 ` Nitesh Shetty
2023-08-22 19:16 ` [PATCH v11 03/16] block/mq-deadline: Only use zone " Bart Van Assche
2023-08-23 6:21 ` Hannes Reinecke
2023-08-23 8:23 ` Nitesh Shetty
2023-08-22 19:16 ` [PATCH v11 04/16] scsi: core: Introduce a mechanism for reordering requests in the error handler Bart Van Assche
2023-08-23 6:26 ` Hannes Reinecke
2023-08-23 15:15 ` Bart Van Assche
2023-08-23 23:22 ` Damien Le Moal [this message]
2023-08-24 14:47 ` Bart Van Assche
2023-08-24 16:44 ` Hannes Reinecke
2023-08-24 16:52 ` Bart Van Assche
2023-08-24 23:53 ` Damien Le Moal
2023-08-25 1:00 ` Bart Van Assche
2023-08-22 19:17 ` [PATCH v11 05/16] scsi: core: Add unit tests for scsi_call_prepare_resubmit() Bart Van Assche
2023-08-22 19:17 ` [PATCH v11 06/16] scsi: sd: Sort commands by LBA before resubmitting Bart Van Assche
2023-08-22 19:17 ` [PATCH v11 07/16] scsi: core: Retry unaligned zoned writes Bart Van Assche
2023-08-22 19:17 ` [PATCH v11 08/16] scsi: sd_zbc: Only require an I/O scheduler if needed Bart Van Assche
2023-08-22 19:17 ` [PATCH v11 09/16] scsi: scsi_debug: Add the preserves_write_order module parameter Bart Van Assche
2023-08-22 19:17 ` [PATCH v11 10/16] scsi: scsi_debug: Support injecting unaligned write errors Bart Van Assche
2023-08-22 19:17 ` [PATCH v11 11/16] scsi: ufs: hisi: Rework the code that disables auto-hibernation Bart Van Assche
2023-08-22 19:17 ` [PATCH v11 12/16] scsi: ufs: Rename ufshcd_auto_hibern8_enable() and make it static Bart Van Assche
2023-08-22 19:17 ` [PATCH v11 13/16] scsi: ufs: Change the return type of ufshcd_auto_hibern8_update() Bart Van Assche
2023-08-22 19:17 ` [PATCH v11 14/16] scsi: ufs: Simplify ufshcd_auto_hibern8_update() Bart Van Assche
2023-08-22 19:17 ` [PATCH v11 15/16] scsi: ufs: Forbid auto-hibernation without I/O scheduler Bart Van Assche
2023-08-22 19:17 ` [PATCH v11 16/16] scsi: ufs: Inform the block layer about write ordering Bart Van Assche
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=741e19ae-d4fd-11f5-7faa-18b888ff769c@kernel.org \
--to=dlemoal@kernel.org \
--cc=axboe@kernel.dk \
--cc=bvanassche@acm.org \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=jejb@linux.ibm.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=ming.lei@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox