From: Damien Le Moal <dlemoal@kernel.org>
To: Bart Van Assche <bvanassche@acm.org>, Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
"Martin K . Petersen" <martin.petersen@oracle.com>,
"James E.J. Bottomley" <jejb@linux.ibm.com>
Subject: Re: [PATCH v2 4/5] scsi: Retry unaligned zoned writes
Date: Wed, 19 Jul 2023 18:59:06 +0900
Message-ID: <fd64fa90-1227-6d4d-8f0b-fc67d8c42a7e@kernel.org>
In-Reply-To: <fb8b1b7e-4054-6598-8204-eb252395227d@acm.org>
On 7/19/23 07:53, Bart Van Assche wrote:
> On 7/17/23 23:47, Damien Le Moal wrote:
>> On 7/11/23 03:01, Bart Van Assche wrote:
>>> Send commands that failed with an unaligned write error to the SCSI error
>>> handler. Let the SCSI error handler sort SCSI commands per LBA before
>>> resubmitting these.
>>>
>>> Increase the number of retries for write commands sent to a sequential
>>> zone to the maximum number of outstanding commands.
>>
>> I think I mentioned this before. When we started btrfs work, we did something
>> similar (but at the IO scheduler level) to try to avoid adding a big lock in
>> btrfs to serialize (and thus order) writes. What we discovered is that it was
>> extremely easy to fall into a situation where the maximum number of possible
>> outstanding requests is already issued, but they are all behind a "hole" and
>> indefinitely delayed because the missing request cannot be issued due to the
>> max nr request limit being reached. No forward progress and deadlock.
>>
>> I do not see how your change addresses this problem. The same will happen with
>> this and I do not have any suggestion how to solve this. For btrfs, we ended up
>> using zone append emulation for scsi to avoid the big lock and avoid the FS
>> having to order writes. That solution guarantees forward progress. Delaying
>> already issued writes that are not sequential has no such guarantees.
>
> Hi Damien,
>
> Thank you for having explained in detail the scenario that you ran into.
>
> I think what has been explained above is a scenario in which the filesystem
> allocates requests per zone in another order than the LBA order. How about
> requiring that the filesystem allocates and submits zoned writes in LBA order
> per zone? I think that this is how F2FS supports zoned storage.

Sure. But what if an application uses the drive directly? You lose
guarantees of forward progress then. Given that an application has to
use direct IO for writes to sequential zones, this is unlikely to happen
in a "good" scenario, but it also would not be hard to write an
application that can deadlock the drive forever by simply missing one
write in a sequence of writes for a zone... That is my concern. While
f2fs would likely be OK, the delay approach is not solid enough for all
cases.
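
To make the concern concrete, here is a toy model of the stall. It is only a
sketch and not kernel code: the function name `simulate`, the single-zone
setup, and the one-block-per-write assumption are all made up for
illustration. Writes that do not match the zone write pointer are held back
(as in the delay/retry approach) and keep their tag, so a "hole" combined
with an exhausted queue depth, or a write that is never submitted, leaves
nothing that can complete.

```python
from collections import deque


def simulate(submit_order, queue_depth):
    """Toy model of one sequential zone whose write pointer starts at LBA 0.

    submit_order: LBAs in the order the submitter hands them to the queue.
    queue_depth:  maximum number of writes that may hold a tag at once.
    Returns ("done", wp) if every write completed, or ("stuck", wp) if no
    further progress is possible, where wp is the final write pointer.
    """
    pending = deque(submit_order)  # submitted but no free tag yet
    in_flight = set()              # writes holding a tag, possibly delayed
    wp = 0                         # zone write pointer

    while True:
        # Hand out free tags in submission order.
        while pending and len(in_flight) < queue_depth:
            in_flight.add(pending.popleft())

        if wp in in_flight:
            # Only the write at the write pointer can complete.
            in_flight.remove(wp)
            wp += 1
        elif not in_flight and not pending:
            return "done", wp
        else:
            # Every tagged write is behind a hole, and the missing write
            # either cannot get a tag or was never submitted at all.
            return "stuck", wp


print(simulate([0, 1, 2, 3], queue_depth=2))  # in-order submission
print(simulate([1, 2, 0, 3], queue_depth=2))  # hole, all tags consumed
print(simulate([1, 2, 0, 3], queue_depth=3))  # same order, one more tag
print(simulate([0, 1, 3], queue_depth=4))     # write for LBA 2 never sent
```

In this model the out-of-order case only completes when the queue depth
happens to be large enough to reach the missing write, and the
"application skipped one write" case never completes regardless of depth.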
--
Damien Le Moal
Western Digital Research