qemu-devel.nongnu.org archive mirror
From: Damien Le Moal <dlemoal@kernel.org>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Sam Li <faithilikerun@gmail.com>,
	qemu-devel@nongnu.org, Hanna Reitz <hreitz@redhat.com>,
	dmitry.fomichev@wdc.com, qemu-block@nongnu.org
Subject: Re: [PATCH v2] block/file-posix: optimize append write
Date: Tue, 22 Oct 2024 10:56:20 +0900
Message-ID: <2980c9de-af94-4dbe-abd6-8036e4b2c95c@kernel.org>
In-Reply-To: <20241021181342.GA293227@fedora.redhat.com>

On 10/22/24 03:13, Stefan Hajnoczi wrote:
> On Mon, Oct 21, 2024 at 09:32:50PM +0900, Damien Le Moal wrote:
>> On 10/21/24 20:08, Kevin Wolf wrote:
>>> Am 20.10.2024 um 03:03 hat Damien Le Moal geschrieben:
>>>> On 10/18/24 23:37, Kevin Wolf wrote:
>>>>> Am 04.10.2024 um 12:41 hat Sam Li geschrieben:
>>>>>> When the file-posix driver emulates append write, it holds the lock
>>>>>> whenever accessing wp, which limits the IO queue depth to one.
>>>>>>
>>>>>> The write IO flow can be optimized to allow concurrent writes. The lock
>>>>>> is held in two cases:
>>>>>> 1. Assuming that the write IO succeeds, update the wp before issuing the
>>>>>> write.
>>>>>> 2. If the write IO fails, report that zone and use the reported value
>>>>>> as the current wp.
>>>>>
>>>>> What happens with the concurrent writes that started later and may not
>>>>> have completed yet? Can we really just reset to the reported value
>>>>> before all other requests have completed, too?
>>>>
>>>> Yes, because if one write fails, we know that the following writes
>>>> will fail too as they will not be aligned to the write pointer. These
>>>> subsequent failed writes will again trigger the report zones and
>>>> update, but that is fine. All of them have failed and the report will
>>>> give the same wp again.
>>>>
>>>> This is a typical pattern with zoned block devices: if one write fails
>>>> in a zone, the user has to expect failures for all other writes issued
>>>> to the same zone, issue a report zones command to get the wp and
>>>> restart writing from there.
>>>
>>> Ok, that makes sense. Can we be sure that requests are handled in the
>>> order they were submitted, though? That is, if the failed request is
>>> resubmitted, could the already pending next one still succeed if it's
>>> overtaken by the resubmitted request? Not sure if this would even cause
>>> a problem, but is it a case we have to consider?
>>
>> A zoned device will always handle writes in the order they were submitted (per
>> zone) and that is true for emulated devices as well as real ones.
> 
> Is there serialization code in the kernel so that zoned devices behind
> multi-path keep requests ordered?

Yes: the kernel issues at most one write per zone at any time, to preserve
ordering. So there should be no issues at all.
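
To illustrate the ordering argument, here is a toy, single-threaded model in C. It is not kernel code, and all names (toy_zone, toy_zone_submit(), toy_zone_complete()) are hypothetical. Writes to one zone are queued in submission order and at most one is dispatched at a time, so a write reserved at the current wp cannot be overtaken by a later one. The real serialization is done by the kernel block layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not the kernel implementation. */
#define TOY_MAX_PENDING 16

struct toy_write {
    uint64_t offset; /* byte offset of the write */
    uint64_t len;    /* length in bytes */
};

struct toy_zone {
    uint64_t wp;                            /* device write pointer */
    struct toy_write fifo[TOY_MAX_PENDING]; /* writes in submission order */
    size_t head, tail;                      /* next to dispatch / next free slot */
    bool inflight;                          /* true while one write is dispatched */
    unsigned failures;                      /* writes rejected as unaligned */
};

/* Device side (executes synchronously in this model): a zoned device
 * only accepts a write that starts exactly at the wp. */
static void toy_device_exec(struct toy_zone *z, const struct toy_write *w)
{
    if (w->offset == z->wp)
        z->wp += w->len;
    else
        z->failures++;
}

/* Dispatch the oldest queued write, but only if none is in flight. */
static void toy_zone_dispatch(struct toy_zone *z)
{
    if (z->inflight || z->head == z->tail)
        return;
    z->inflight = true;
    toy_device_exec(z, &z->fifo[z->head++]);
}

/* Queue a write in submission order; returns false if the toy queue is full. */
static bool toy_zone_submit(struct toy_zone *z, uint64_t offset, uint64_t len)
{
    if (z->tail == TOY_MAX_PENDING)
        return false;
    z->fifo[z->tail++] = (struct toy_write){ .offset = offset, .len = len };
    toy_zone_dispatch(z);
    return true;
}

/* Completion of the in-flight write lets the next queued write go. */
static void toy_zone_complete(struct toy_zone *z)
{
    assert(z->inflight);
    z->inflight = false;
    toy_zone_dispatch(z);
}
```

Submitting writes at offsets 0, 4096 and 8192 dispatches only the first one. Each toy_zone_complete() call then releases the next write, so all three land exactly at the wp and none of them fails.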

> Normally I don't assume any ordering between concurrent requests to a
> block device, so I'm surprised that it's safe to submit multiple writes.

Correct, the normal case does not provide any ordering guarantees. But writes to zoned
block devices are a special case. More on this here:

https://zonedstorage.io/docs/linux/sched
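
Putting the pieces of this thread together, the append-write emulation described in the patch can be sketched as follows. This is a simplified single-threaded model, not the file-posix code. The structures and the functions toy_append_reserve(), toy_append_recover() and toy_append_complete() are hypothetical, and copying the device wp in toy_append_recover() stands in for a report zones command. The sketch assumes that writes to the zone reach the device in submission order, which is what the per-zone serialization above provides.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy types, not QEMU's file-posix structures. */
struct toy_dev_zone {
    uint64_t wp; /* the device's real write pointer */
};

struct toy_emul_zone {
    pthread_mutex_t lock;     /* protects cached_wp only, never held across I/O */
    uint64_t cached_wp;       /* emulator's view of the wp */
    struct toy_dev_zone *dev;
};

/* Case 1: assume the write will succeed and advance the wp before issuing it.
 * The lock is released before the I/O, so several appends can be in flight. */
static uint64_t toy_append_reserve(struct toy_emul_zone *z, uint64_t len)
{
    assert(len > 0);
    pthread_mutex_lock(&z->lock);
    uint64_t offset = z->cached_wp;
    z->cached_wp += len;
    pthread_mutex_unlock(&z->lock);
    return offset;
}

/* Device: a write not starting exactly at the wp fails; fail_injected
 * simulates an I/O error on an otherwise aligned write. */
static bool toy_dev_write(struct toy_dev_zone *d, uint64_t offset,
                          uint64_t len, bool fail_injected)
{
    if (fail_injected || offset != d->wp)
        return false;
    d->wp += len;
    return true;
}

/* Case 2: on failure, re-read the zone and adopt the reported wp. */
static void toy_append_recover(struct toy_emul_zone *z)
{
    pthread_mutex_lock(&z->lock);
    z->cached_wp = z->dev->wp; /* stand-in for a report zones command */
    pthread_mutex_unlock(&z->lock);
}

/* Issue one reserved append; recover the wp if it fails. */
static bool toy_append_complete(struct toy_emul_zone *z, uint64_t offset,
                                uint64_t len, bool fail_injected)
{
    if (toy_dev_write(z->dev, offset, len, fail_injected))
        return true;
    toy_append_recover(z);
    return false;
}
```

With three appends reserved at 0, 4096 and 8192, an error on the first one makes the other two fail as unaligned. Each failure re-reads the same wp (0), and the next reservation restarts from there, which matches the recovery pattern described earlier in the thread.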


-- 
Damien Le Moal
Western Digital Research


Thread overview: 9+ messages
2024-10-04 10:41 [PATCH v2] block/file-posix: optimize append write Sam Li
2024-10-18 14:37 ` Kevin Wolf
2024-10-20  1:03   ` Damien Le Moal
2024-10-21 11:08     ` Kevin Wolf
2024-10-21 12:32       ` Damien Le Moal
2024-10-21 18:13         ` Stefan Hajnoczi
2024-10-22  1:56           ` Damien Le Moal [this message]
2024-10-21 13:21   ` Sam Li
2024-10-21 22:11     ` Kevin Wolf
