qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Damien Le Moal <dlemoal@kernel.org>
To: Sam Li <faithilikerun@gmail.com>, Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-devel@nongnu.org, Hanna Reitz <hreitz@redhat.com>,
	dmitry.fomichev@wdc.com, qemu-block@nongnu.org,
	Eric Blake <eblake@redhat.com>,
	hare@suse.de, Kevin Wolf <kwolf@redhat.com>,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [PATCH v7 3/4] qcow2: add zoned emulation capability
Date: Mon, 23 Sep 2024 15:22:20 +0200	[thread overview]
Message-ID: <bc821290-2003-4795-a5fa-99a7c55e1374@kernel.org> (raw)
In-Reply-To: <CAAAx-8LyxDtZra_5TC0CLmq4F4ShYtQqVTF0OCGVZ9tYWP4QMA@mail.gmail.com>

On 2024/09/23 13:06, Sam Li wrote:

[...]

>>> @@ -2837,6 +3180,19 @@ qcow2_co_pwritev_part(BlockDriverState *bs, int64_t offset, int64_t bytes,
>>>          qiov_offset += cur_bytes;
>>>          trace_qcow2_writev_done_part(qemu_coroutine_self(), cur_bytes);
>>>      }
>>> +
>>> +    if (bs->bl.zoned == BLK_Z_HM) {
>>> +        index = start_offset / zone_size;
>>> +        wp = &bs->wps->wp[index];
>>> +        if (!QCOW2_ZT_IS_CONV(*wp)) {
>>> +            /* Advance the write pointer when the write completes */
>>
>> Updating the write pointer after I/O does not prevent other write
>> requests from beginning at the same offset as this request. Multiple
>> write request coroutines can run concurrently and only the first one
>> should succeed. The others should fail if they are using the same
>> offset.
>>
>> The comment above says "Real drives change states before it can write to
>> the zone" and I think it's appropriate to update the write pointer
>> before performing the write too. The qcow2 zone emulation code is
>> different from the file-posix.c passthrough code. We are responsible for
>> maintaining zoned metadata state and cannot wait for the result of the
>> I/O to tell us what happened.

Yes, correct. The wp MUST be updated when issuing the IO, with the assumption
that the write IO will succeed (errors are rare !).

> The problem of updating the write pointer before IO completion is the
> failure case.  It can't be predicted in advance if an IO fails or not.
> When write I/O fails, the wp should not be updated.

Correct, if an IO fails, the wp should not be updated. However, that is not
difficult to deal with:
1) under the zone lock, advance the wp position when issuing the write IO
2) When the write IO completes with success, nothing else needs to be done.
3) When *any* write IO completes with error you need to:
	- Lock the zone
	- Do a report zone for the target zone of the failed write to get the current
wp location
	- Update bs->wps->wp[index] using that current wp location
	- Unlock the zone

With that, one may get a few errors if multiple async writes are being issued,
but that behavior is consistent with the same happening with a real drive. So no
issue. And since the report zones gets you the current wp location, the user can
restart writing from that location once it has dealt with all the previous write
failures.

> The alternative way is to hold the wps lock as is also required for wp
> accessing. Therefore only one of multiple concurrent write requests
> will succeed.

That is a very simple solution that avoids the above error recovery, but that
would be very bad for performance (especially for a pure sequential write
workload as we would limit IOs to quue depth 1). So if we can avoid this simple
approach, that would be a lot better.


-- 
Damien Le Moal
Western Digital Research


  reply	other threads:[~2024-09-23 13:23 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-22 18:48 [PATCH v7 0/4] Add full zoned storage emulation to qcow2 driver Sam Li
2024-01-22 18:48 ` [PATCH v7 1/4] docs/qcow2: add the zoned format feature Sam Li
2024-01-22 18:48 ` [PATCH v7 2/4] qcow2: add configurations for zoned format extension Sam Li
2024-02-19 11:57   ` Markus Armbruster
2024-02-19 12:05     ` Markus Armbruster
2024-02-19 12:23       ` Sam Li
2024-02-19 14:39         ` Markus Armbruster
2024-02-19 14:48           ` Sam Li
2024-02-19 15:56             ` Markus Armbruster
2024-02-19 16:09               ` Sam Li
2024-02-19 20:42                 ` Markus Armbruster
2024-02-19 20:46                   ` Sam Li
2024-02-19 21:15                     ` Markus Armbruster
2024-02-20  2:25                       ` Damien Le Moal
2024-03-12 15:04   ` Stefan Hajnoczi
2024-01-22 18:48 ` [PATCH v7 3/4] qcow2: add zoned emulation capability Sam Li
2024-03-12 18:30   ` Stefan Hajnoczi
2024-09-23 11:06     ` Sam Li
2024-09-23 13:22       ` Damien Le Moal [this message]
2024-09-23 13:40         ` Sam Li
2024-09-23 13:48           ` Damien Le Moal
2024-09-23 13:57             ` Sam Li
2024-01-22 18:48 ` [PATCH v7 4/4] iotests: test the zoned format feature for qcow2 file Sam Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bc821290-2003-4795-a5fa-99a7c55e1374@kernel.org \
    --to=dlemoal@kernel.org \
    --cc=armbru@redhat.com \
    --cc=dmitry.fomichev@wdc.com \
    --cc=eblake@redhat.com \
    --cc=faithilikerun@gmail.com \
    --cc=hare@suse.de \
    --cc=hreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).