From: Stefan Hajnoczi <stefanha@redhat.com>
To: Sam Li <faithilikerun@gmail.com>
Cc: dlemoal@kernel.org, qemu-devel@nongnu.org,
	Pierrick Bouvier <pierrick.bouvier@oss.qualcomm.com>,
	dmitry.fomichev@wdc.com, Kevin Wolf <kwolf@redhat.com>,
	cassel@kernel.org, Markus Armbruster <armbru@redhat.com>,
	qemu-block@nongnu.org, Eric Blake <eblake@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	hare@suse.de, Hanna Reitz <hreitz@redhat.com>
Subject: Re: [PATCH v10 2/4] qcow2: add configurations for zoned format extension
Date: Wed, 20 May 2026 13:59:05 -0400
Message-ID: <20260520175905.GA384978@fedora>
In-Reply-To: <CAAAx-8Jooodx88LTJvVjHQYKj-SgkCxXkMTi1=ZdR3v3cpUvUg@mail.gmail.com>

On Tue, May 19, 2026 at 11:20:18PM +0200, Sam Li wrote:
> On Tue, May 19, 2026 at 5:49 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Mon, May 18, 2026 at 12:21:55AM +0200, Sam Li wrote:
> > > On Thu, May 14, 2026 at 9:49 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > > On Sun, May 10, 2026 at 07:50:57PM +0200, Sam Li wrote:
> > > > > +         48 - 55:  zonedmeta_offset
> > > > > +                   The offset of zoned metadata structure in the contained
> > > > > +                   image, in bytes.
> > > >
> > > > Do you want to say anything about the order in which metadata is
> > > > persisted to disk when zones are used? I guess the data is written into the
> > > > image file first, then the non-zoned qcow2 L1/L2/refcount metadata is
> > > > updated, and finally the write pointer is written. Write pointers are
> > > > not guaranteed to be updated on disk until the write request followed by
> > > > a flush request are both completed.
> > >
> > > The current ordering is not like that. The write pointer is written
> > > persistently first, then the data writes and the non-zoned qcow2
> > > L1/L2/refcount metadata updates. On IO failure, the corresponding
> > > write pointer is re-read from disk. As noted in the previous comment,
> > > the wp must be updated when issuing the IO, under the assumption that
> > > the write IO will succeed.
> > >
> > > The ordering has been settled this way since v7 to deal with
> > > concurrent zone append writes. If the wp was only updated after data
> > > I/O, two concurrent appends would both have read the same wp and tried
> > > to write to the same position.
> > >
> > > >
> > > > (The idea is that the data must be visible in the qcow2 file before it
> > > > is safe to update the write pointer. Otherwise a power failure would
> > > > leave the file in an inconsistent state where the write pointer has
> > > > advanced but the data was not written.)
> > >
> > > The crash-consistency is a concern...
> >
> > Yes, I'm thinking about crash-consistency. The ordering you described
> > can result in qcow2 images where the write pointer is ahead of the
> > actually written data after a power failure or maybe a QEMU crash.
> >
> > QEMU's block layer must follow the same data integrity behavior that
> > real devices guarantee.
> 
> I may have found a solution to deal with both cases. The fix is to
> update wp in memory instead of flushing it before qcow2 metadata and
> data writes. The zone append write path would become:
> 
> On submission:
> 
> 1) wp_lock()
> 2) Check write alignment
> 3) wp_update (in memory)
> 4) wp_unlock()
> 5) Issue write
> 
> And on completion:
> 1) If no error: wp_flush with locks and return success

At completion time the data may not be visible in the qcow2 file yet,
because qcow2's L1/L2/refcount caches are not written back to the file
until a flush request. I think the write pointer updates should have a
dependency on the qcow2 metadata so that write pointers are only written
to disk after the qcow2 metadata.

See block/qcow2-cache.c and qcow2_cache_set_dependency(). The idea is
that one type of cached metadata can set a dependency on another type of
cached metadata so that ordering is guaranteed.
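
To illustrate the dependency idea, here is a toy model in C. It is only
a sketch and not the code in block/qcow2-cache.c: ToyCache,
toy_cache_set_dependency() and toy_cache_flush() are made-up names, and
"flushing" just records a sequence number instead of doing I/O. It shows
that flushing the dependent cache (write pointers) first writes back the
cache it depends on (L2/refcount metadata).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a metadata cache; NOT QEMU's Qcow2Cache. */
typedef struct ToyCache {
    const char *name;
    bool dirty;                 /* has unwritten changes */
    struct ToyCache *depends;   /* must reach disk before this cache */
    int flush_seq;              /* order of last write-back, 0 = never */
} ToyCache;

static int toy_flush_counter;

/* Record that 'c' must not be written back before 'dependency'. */
static void toy_cache_set_dependency(ToyCache *c, ToyCache *dependency)
{
    assert(c != dependency);    /* this toy does not handle cycles */
    c->depends = dependency;
}

/* Write back 'c', writing back its dependency first. */
static void toy_cache_flush(ToyCache *c)
{
    if (c->depends) {
        toy_cache_flush(c->depends);
        c->depends = NULL;      /* dependency satisfied */
    }
    if (c->dirty) {
        c->dirty = false;
        c->flush_seq = ++toy_flush_counter;
    }
}
```

With a dirty "l2" cache and a dirty "wp" cache, calling
toy_cache_set_dependency(&wp, &l2) followed by toy_cache_flush(&wp)
writes back l2 before wp, which is the ordering I am asking for.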

> 2) else, wp_lock()
> 3) read_wp (from disk) and use the read wp value as the current wp
> 4) wp_unlock()
> 5) return IO error
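
For reference, the submission and completion steps quoted above could be
modeled roughly like this. It is a simplified sketch under assumptions,
not the patch's code: ToyZone and the toy_* functions are hypothetical,
the lock is a flag rather than a real mutex, "disk" is a struct field,
and the alignment check is reduced to a length check. It does not model
the metadata dependency, and on success it persists whatever the
in-memory wp currently is, including reservations made by other appends
that are still in flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-zone state; not the structures used in the series. */
typedef struct ToyZone {
    uint64_t start;     /* first byte of the zone */
    uint64_t size;      /* zone size in bytes */
    uint64_t align;     /* required write alignment in bytes, > 0 */
    uint64_t wp_mem;    /* in-memory write pointer */
    uint64_t wp_disk;   /* write pointer last persisted to the image */
    bool locked;        /* stand-in for the wp lock */
} ToyZone;

static void toy_wp_lock(ToyZone *z)   { assert(!z->locked); z->locked = true; }
static void toy_wp_unlock(ToyZone *z) { assert(z->locked);  z->locked = false; }

/*
 * Submission: under the lock, check alignment and remaining space, then
 * reserve [*offset, *offset + len) by advancing the in-memory wp.
 * The caller issues the write after this returns true.
 */
static bool toy_append_submit(ToyZone *z, uint64_t len, uint64_t *offset)
{
    bool ok = false;

    toy_wp_lock(z);
    if (len % z->align == 0 && len <= z->start + z->size - z->wp_mem) {
        *offset = z->wp_mem;
        z->wp_mem += len;
        ok = true;
    }
    toy_wp_unlock(z);
    return ok;
}

/* Completion: persist the wp on success, re-read it from "disk" on error. */
static void toy_append_complete(ToyZone *z, bool io_ok)
{
    toy_wp_lock(z);
    if (io_ok) {
        z->wp_disk = z->wp_mem;     /* stand-in for wp_flush */
    } else {
        z->wp_mem = z->wp_disk;     /* stand-in for read_wp from disk */
    }
    toy_wp_unlock(z);
}
```

Two back-to-back toy_append_submit() calls get distinct offsets because
the in-memory wp advances under the lock, which addresses the concurrent
append case. The crash-consistency question is what happens between the
wp_disk update and the qcow2 metadata write-back.
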
> 
> Sam
> 
> >
> > Damien: Do real zoned block devices guarantee that the updated write
> > pointer is persisted only after the appended data has been
> > persisted?
> >
> > Stefan
> 

Thread overview: 18+ messages
2026-05-10 17:50 [PATCH v10 0/4] Add full zoned storage emulation to qcow2 driver Sam Li
2026-05-10 17:50 ` [PATCH v10 1/4] docs/qcow2: add the zoned format feature Sam Li
2026-05-14 18:47   ` Stefan Hajnoczi
2026-05-17 21:07     ` Sam Li
2026-05-10 17:50 ` [PATCH v10 2/4] qcow2: add configurations for zoned format extension Sam Li
2026-05-14 19:49   ` Stefan Hajnoczi
2026-05-17 22:21     ` Sam Li
2026-05-19 15:49       ` Stefan Hajnoczi
2026-05-19 15:55         ` Damien Le Moal
2026-05-19 21:20         ` Sam Li
2026-05-20 17:59           ` Stefan Hajnoczi [this message]
2026-05-20 18:23             ` Sam Li
2026-05-21 15:18               ` Stefan Hajnoczi
2026-05-18  7:57   ` Markus Armbruster
2026-05-10 17:50 ` [PATCH v10 3/4] qcow2: add zoned emulation capability Sam Li
2026-05-14 20:23   ` Stefan Hajnoczi
2026-05-10 17:50 ` [PATCH v10 4/4] iotests: test the zoned format feature for qcow2 file Sam Li
2026-05-14 18:38 ` [PATCH v10 0/4] Add full zoned storage emulation to qcow2 driver Stefan Hajnoczi