Re: [Qemu-devel] [PATCH] block: Formats don't need CONSISTENT_READ with NO_IO

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Kevin Wolf <kwolf@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org, mreitz@redhat.com
Subject: Re: [Qemu-devel] [PATCH] block: Formats don't need CONSISTENT_READ with NO_IO
Date: Fri, 1 Dec 2017 13:41:31 +0100	[thread overview]
Message-ID: <20171201124131.GD4147@localhost.localdomain> (raw)
In-Reply-To: <cb98da75-ca77-edce-6384-d612380e4e3c@redhat.com>

Am 30.11.2017 um 20:05 hat Eric Blake geschrieben:
> On 11/30/2017 12:27 PM, Kevin Wolf wrote:
> > > > @@ -1936,7 +1938,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
> > > >            /* bs->file always needs to be consistent because of the metadata. We
> > > >             * can never allow other users to resize or write to it. */
> > > > -        perm |= BLK_PERM_CONSISTENT_READ;
> > > > +        if (!(flags & BDRV_O_NO_IO)) {
> > > > +            perm |= BLK_PERM_CONSISTENT_READ;
> > > 
> > > I thought BDRV_O_NO_IO only means we aren't doing I/O on guest-visible data,
> > > but doesn't stop us from reading the metadata.  The comment is telling: if
> > > we can read metadata, then we depend on CONSISTENT_READ for the metadata to
> > > be stable (even if we don't care about guest data consistency).
> > 
> > Yes and no. The trouble is that at the file system level we have only a
> > single bit to describe the consistency of the whole image throughout the
> > whole block driver tree.
> > 
> > We forbid shared BLK_PERM_CONSISTENT_READ for mirror targets (which
> > aren't fully populated yet) and intermediate nodes for commit (which
> > expose corrupted data). Both scenarios are really about the data exposed
> > at the format layer, the metadata stays completely consistent.
> 
> I guess it is the block of writes/resizes that prevents metadata from
> getting inconsistent; CONSISTENT_READ does indeed make more sense if
> interpreted solely in light of will the guest read consistent data
> (and not will the format layer see consistent contents from the
> protocol layer).

Yes, that's what the write/resize locks are meant for.

Consistent read is more about "this image may not contain the useful
data you're expecting".

> > The question is what we do with this as we propagate permissions down to
> > the protocol layer. Strictly speaking, the file at the protocol layer is
> > perfectly consistent, so we might not forbid BLK_PERM_CONSISTENT_READ
> > there. But I think it's more useful to do it anyway so that image
> > locking can prevent the typical case of another process that uses qcow2
> > over file-posix again, where the file-posix node could in theory be
> > considered consistent, but the qcow2 one wouldn't.
> 
> Does that mean we need two separate permission bits, one for whether
> the guest-visible data is consistent, and one for whether the metadata
> is consistent? But as I mentioned above, blocking writes should mean
> that no one else is messing with metadata.

Even this works only in the simple case of format over protocol.

I think this becomes pretty clear when you think of the (admittedly
unlikely) case of nested qcow2. Then you need at least a bit per layer,
one for the guest-visible data, one for the top-level qcow2 metadata and
another one for the second qcow2 metadata.

> > In the end, this is just a pragmatic way to let 'qemu-img info' work
> > while the image is a mirror target or intermediate node for commit, but
> > to forbid cases where corrupted data is used.
> > 
> > Or would you argue that either 'qemu-img info' shouldn't be working or
> > reading corrupted data should be allowed in other processes?
> 
> Having qemu-img info work on a mirror target makes sense; as you pointed
> out, the metadata flushed to disk is consistent (may have leaked clusters,
> but not broken image) at any given point by the writer (it has to be, since
> the writer has to be able to recover the image if there is an abrupt
> restart).

Exactly, that's the reasoning.

> And a second reader can still manage to see inconsistent data
> when reading two separate clusters if the timing is just right, regardless
> of whether the first writer is always able to see consistent data.  So to
> some extent, it's a measure of how risky is the action? qemu-img info is
> read-only, and most of the time will just work; it is very hard to get to
> the rare race condition where two consecutive reads raced with a parallel
> writer combine to result in incorrect interpretation of the data by the
> reader.

This again is more about the write permission. You still need 'qemu-img
info -U' because there is a concurrent writer, so you have to
acknowledge the risk you're mentioning.

> I'm not opposed to your patch, but am trying to make sure that I'm not
> overlooking any problem before giving R-b.  Maybe it's just that the
> comment needs updating in v2.

Do you have a suggestion for the comment?

Kevin

next prev parent reply	other threads:[~2017-12-01 16:23 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-30 16:44 [Qemu-devel] [PATCH] block: Formats don't need CONSISTENT_READ with NO_IO Kevin Wolf
2017-11-30 17:09 ` Eric Blake
2017-11-30 18:27   ` Kevin Wolf
2017-11-30 19:05     ` Eric Blake
2017-12-01 12:41       ` Kevin Wolf [this message]
2017-12-01 13:42         ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171201124131.GD4147@localhost.localdomain \
    --to=kwolf@redhat.com \
    --cc=eblake@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).