qemu-devel.nongnu.org archive mirror
From: Sam Li <faithilikerun@gmail.com>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
	qemu-devel@nongnu.org,  Hanna Reitz <hreitz@redhat.com>,
	dmitry.fomichev@wdc.com, qemu-block@nongnu.org,
	Eric Blake <eblake@redhat.com>,
	hare@suse.de, Kevin Wolf <kwolf@redhat.com>,
	 Markus Armbruster <armbru@redhat.com>
Subject: Re: [PATCH v7 3/4] qcow2: add zoned emulation capability
Date: Mon, 23 Sep 2024 15:40:54 +0200
Message-ID: <CAAAx-8JrPFEgBPKWEjXCXi8=ReEMkCEVGe-GEPSWUnfEGcZ=XQ@mail.gmail.com>
In-Reply-To: <bc821290-2003-4795-a5fa-99a7c55e1374@kernel.org>

Hi Damien,

Damien Le Moal <dlemoal@kernel.org> wrote on Mon, Sep 23, 2024 at 15:22:
>
> On 2024/09/23 13:06, Sam Li wrote:
>
> [...]
>
> >>> @@ -2837,6 +3180,19 @@ qcow2_co_pwritev_part(BlockDriverState *bs, int64_t offset, int64_t bytes,
> >>>          qiov_offset += cur_bytes;
> >>>          trace_qcow2_writev_done_part(qemu_coroutine_self(), cur_bytes);
> >>>      }
> >>> +
> >>> +    if (bs->bl.zoned == BLK_Z_HM) {
> >>> +        index = start_offset / zone_size;
> >>> +        wp = &bs->wps->wp[index];
> >>> +        if (!QCOW2_ZT_IS_CONV(*wp)) {
> >>> +            /* Advance the write pointer when the write completes */
> >>
> >> Updating the write pointer after I/O does not prevent other write
> >> requests from beginning at the same offset as this request. Multiple
> >> write request coroutines can run concurrently and only the first one
> >> should succeed. The others should fail if they are using the same
> >> offset.
> >>
> >> The comment above says "Real drives change states before it can write to
> >> the zone" and I think it's appropriate to update the write pointer
> >> before performing the write too. The qcow2 zone emulation code is
> >> different from the file-posix.c passthrough code. We are responsible for
> >> maintaining zoned metadata state and cannot wait for the result of the
> >> I/O to tell us what happened.
>
> Yes, correct. The wp MUST be updated when issuing the IO, with the assumption
> that the write IO will succeed (errors are rare !).
>
> > The problem of updating the write pointer before IO completion is the
> > failure case.  It can't be predicted in advance if an IO fails or not.
> > When write I/O fails, the wp should not be updated.
>
> Correct, if an IO fails, the wp should not be updated. However, that is not
> difficult to deal with:
> 1) under the zone lock, advance the wp position when issuing the write IO
> 2) When the write IO completes with success, nothing else needs to be done.
> 3) When *any* write IO completes with error you need to:
>         - Lock the zone
>         - Do a report zone for the target zone of the failed write to get the
>           current wp location
>         - Update bs->wps->wp[index] using that current wp location
>         - Unlock the zone
>
> With that, one may get a few errors if multiple async writes are being issued,
> but that behavior is consistent with the same happening with a real drive. So no
> issue. And since the report zones gets you the current wp location, the user can
> restart writing from that location once it has dealt with all the previous write
> failures.

I see. To allow concurrent writes, the lock will only be taken on the
failure path when processing append writes.
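
To check my understanding of the three steps above, here is a minimal
standalone sketch. It is not the qcow2 code: the Zone struct, the
function names and the reported_wp field (a stand-in for whatever a
zone report would return) are all hypothetical, and a pthread mutex
replaces the zone lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical, simplified zone state; not QEMU's actual structures. */
typedef struct Zone {
    pthread_mutex_t lock;
    uint64_t start;        /* first byte of the zone */
    uint64_t size;         /* zone size in bytes */
    uint64_t wp;           /* cached write pointer (absolute offset) */
    uint64_t reported_wp;  /* stand-in for the result of a zone report */
} Zone;

void zone_init(Zone *z, uint64_t start, uint64_t size)
{
    assert(size > 0);
    pthread_mutex_init(&z->lock, NULL);
    z->start = start;
    z->size = size;
    z->wp = start;
    z->reported_wp = start;
}

/*
 * Step 1: under the zone lock, validate the offset and advance the wp
 * when the write is issued, assuming the IO will succeed.
 * A second request at the same offset fails here.
 */
int zone_write_begin(Zone *z, uint64_t offset, uint64_t bytes)
{
    int ret = 0;

    pthread_mutex_lock(&z->lock);
    if (offset != z->wp || bytes > z->start + z->size - z->wp) {
        ret = -EINVAL;     /* not at the wp, or crosses the zone end */
    } else {
        z->wp += bytes;
    }
    pthread_mutex_unlock(&z->lock);
    return ret;
}

/*
 * Steps 2 and 3: nothing to do on success. On error, take the lock
 * and resync the cached wp from the zone report stand-in.
 */
void zone_write_complete(Zone *z, int io_ret)
{
    if (io_ret == 0) {
        return;
    }
    pthread_mutex_lock(&z->lock);
    z->wp = z->reported_wp;
    pthread_mutex_unlock(&z->lock);
}
```

The lock is only held for the wp check and update, never across the
IO itself, so several writes to one zone can be in flight at once.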

>
> > The alternative way is to hold the wps lock as is also required for wp
> > accessing. Therefore only one of multiple concurrent write requests
> > will succeed.
>
> That is a very simple solution that avoids the above error recovery, but that
> would be very bad for performance (especially for a pure sequential write
> workload as we would limit IOs to queue depth 1). So if we can avoid this simple
> approach, that would be a lot better.

Yeah, I'll drop this approach. That said, it reminds me of how the
file-posix driver emulates zone_append: it holds the lock whenever it
accesses wps. Does that limit IOs to QD 1 too? If so, it could be
improved.
-- one zone_append starts
   wp_lock()
     IO processing
     wp_update
   wp_unlock()
-- ends

https://github.com/qemu/qemu/blob/master/block/file-posix.c#L2492
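
For comparison, one possible shape for an append that does not hold
the lock across the IO is sketched below. This is only an
illustration, not what file-posix.c does: AppendZone and
zone_append_reserve() are hypothetical names, and it assumes the
backing storage accepts writes in any order, as a regular file does.
A real host-managed device would not allow that.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical zone state used only for this sketch. */
typedef struct AppendZone {
    pthread_mutex_t lock;
    uint64_t wp;   /* next free byte in the zone (absolute offset) */
    uint64_t end;  /* first byte past the end of the zone */
} AppendZone;

void append_zone_init(AppendZone *z, uint64_t start, uint64_t size)
{
    assert(size > 0);
    pthread_mutex_init(&z->lock, NULL);
    z->wp = start;
    z->end = start + size;
}

/*
 * Reserve 'bytes' at the current wp and return the offset the caller
 * should write to, or -ENOSPC if the zone cannot hold the data.
 * The lock covers only the reservation; the caller performs the IO
 * after this function returns, so appends are not serialized.
 */
int64_t zone_append_reserve(AppendZone *z, uint64_t bytes)
{
    int64_t ret;

    pthread_mutex_lock(&z->lock);
    if (bytes > z->end - z->wp) {
        ret = -ENOSPC;
    } else {
        ret = (int64_t)z->wp;
        z->wp += bytes;
    }
    pthread_mutex_unlock(&z->lock);
    return ret;
}
```

Each caller gets a distinct offset, so concurrent appends never
overlap. A failed IO would still need the wp recovery discussed
earlier in this mail.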

Sam

>
>
> --
> Damien Le Moal
> Western Digital Research

