From: Sam Li <faithilikerun@gmail.com>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
qemu-devel@nongnu.org, Hanna Reitz <hreitz@redhat.com>,
dmitry.fomichev@wdc.com, qemu-block@nongnu.org,
Eric Blake <eblake@redhat.com>,
hare@suse.de, Kevin Wolf <kwolf@redhat.com>,
Markus Armbruster <armbru@redhat.com>
Subject: Re: [PATCH v7 3/4] qcow2: add zoned emulation capability
Date: Mon, 23 Sep 2024 15:40:54 +0200 [thread overview]
Message-ID: <CAAAx-8JrPFEgBPKWEjXCXi8=ReEMkCEVGe-GEPSWUnfEGcZ=XQ@mail.gmail.com> (raw)
In-Reply-To: <bc821290-2003-4795-a5fa-99a7c55e1374@kernel.org>
Hi Damien,
Damien Le Moal <dlemoal@kernel.org> 于2024年9月23日周一 15:22写道:
>
> On 2024/09/23 13:06, Sam Li wrote:
>
> [...]
>
> >>> @@ -2837,6 +3180,19 @@ qcow2_co_pwritev_part(BlockDriverState *bs, int64_t offset, int64_t bytes,
> >>> qiov_offset += cur_bytes;
> >>> trace_qcow2_writev_done_part(qemu_coroutine_self(), cur_bytes);
> >>> }
> >>> +
> >>> + if (bs->bl.zoned == BLK_Z_HM) {
> >>> + index = start_offset / zone_size;
> >>> + wp = &bs->wps->wp[index];
> >>> + if (!QCOW2_ZT_IS_CONV(*wp)) {
> >>> + /* Advance the write pointer when the write completes */
> >>
> >> Updating the write pointer after I/O does not prevent other write
> >> requests from beginning at the same offset as this request. Multiple
> >> write request coroutines can run concurrently and only the first one
> >> should succeed. The others should fail if they are using the same
> >> offset.
> >>
> >> The comment above says "Real drives change states before it can write to
> >> the zone" and I think it's appropriate to update the write pointer
> >> before performing the write too. The qcow2 zone emulation code is
> >> different from the file-posix.c passthrough code. We are responsible for
> >> maintaining zoned metadata state and cannot wait for the result of the
> >> I/O to tell us what happened.
>
> Yes, correct. The wp MUST be updated when issuing the IO, with the assumption
> that the write IO will succeed (errors are rare !).
>
> > The problem of updating the write pointer before IO completion is the
> > failure case. It can't be predicted in advance if an IO fails or not.
> > When write I/O fails, the wp should not be updated.
>
> Correct, if an IO fails, the wp should not be updated. However, that is not
> difficult to deal with:
> 1) under the zone lock, advance the wp position when issuing the write IO
> 2) When the write IO completes with success, nothing else needs to be done.
> 3) When *any* write IO completes with error you need to:
> - Lock the zone
> - Do a report zone for the target zone of the failed write to get the current
> wp location
> - Update bs->wps->wp[index] using that current wp location
> - Unlock the zone
>
> With that, one may get a few errors if multiple async writes are being issued,
> but that behavior is consistent with the same happening with a real drive. So no
> issue. And since the report zones gets you the current wp location, the user can
> restart writing from that location once it has dealt with all the previous write
> failures.
I see. To allow the concurrent writes, the lock will only be used on
the failure path while processing append writes.
>
> > The alternative way is to hold the wps lock as is also required for wp
> > accessing. Therefore only one of multiple concurrent write requests
> > will succeed.
>
> That is a very simple solution that avoids the above error recovery, but that
> would be very bad for performance (especially for a pure sequential write
> workload as we would limit IOs to quue depth 1). So if we can avoid this simple
> approach, that would be a lot better.
Yeah, I'll drop this approach. Although, it reminds me of how
file-posix driver emulates zone_append. It holds the lock whenever
accessing wps. Does that limit IOs to QD 1 too? If so, it can be
improved.
-- one zone_append starts
>> wp_lock()
>>> IO processing
>>>> wp_update
>>>>> wp_unlock()
-- ends
https://github.com/qemu/qemu/blob/master/block/file-posix.c#L2492
Sam
>
>
> --
> Damien Le Moal
> Western Digital Research
next prev parent reply other threads:[~2024-09-23 13:42 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-22 18:48 [PATCH v7 0/4] Add full zoned storage emulation to qcow2 driver Sam Li
2024-01-22 18:48 ` [PATCH v7 1/4] docs/qcow2: add the zoned format feature Sam Li
2024-01-22 18:48 ` [PATCH v7 2/4] qcow2: add configurations for zoned format extension Sam Li
2024-02-19 11:57 ` Markus Armbruster
2024-02-19 12:05 ` Markus Armbruster
2024-02-19 12:23 ` Sam Li
2024-02-19 14:39 ` Markus Armbruster
2024-02-19 14:48 ` Sam Li
2024-02-19 15:56 ` Markus Armbruster
2024-02-19 16:09 ` Sam Li
2024-02-19 20:42 ` Markus Armbruster
2024-02-19 20:46 ` Sam Li
2024-02-19 21:15 ` Markus Armbruster
2024-02-20 2:25 ` Damien Le Moal
2024-03-12 15:04 ` Stefan Hajnoczi
2024-01-22 18:48 ` [PATCH v7 3/4] qcow2: add zoned emulation capability Sam Li
2024-03-12 18:30 ` Stefan Hajnoczi
2024-09-23 11:06 ` Sam Li
2024-09-23 13:22 ` Damien Le Moal
2024-09-23 13:40 ` Sam Li [this message]
2024-09-23 13:48 ` Damien Le Moal
2024-09-23 13:57 ` Sam Li
2024-01-22 18:48 ` [PATCH v7 4/4] iotests: test the zoned format feature for qcow2 file Sam Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAAAx-8JrPFEgBPKWEjXCXi8=ReEMkCEVGe-GEPSWUnfEGcZ=XQ@mail.gmail.com' \
--to=faithilikerun@gmail.com \
--cc=armbru@redhat.com \
--cc=dlemoal@kernel.org \
--cc=dmitry.fomichev@wdc.com \
--cc=eblake@redhat.com \
--cc=hare@suse.de \
--cc=hreitz@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).