All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org, mreitz@redhat.com
Subject: Re: [Qemu-devel] [PATCH] block: Formats don't need CONSISTENT_READ with NO_IO
Date: Fri, 1 Dec 2017 13:41:31 +0100	[thread overview]
Message-ID: <20171201124131.GD4147@localhost.localdomain> (raw)
In-Reply-To: <cb98da75-ca77-edce-6384-d612380e4e3c@redhat.com>

Am 30.11.2017 um 20:05 hat Eric Blake geschrieben:
> On 11/30/2017 12:27 PM, Kevin Wolf wrote:
> > > > @@ -1936,7 +1938,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
> > > >            /* bs->file always needs to be consistent because of the metadata. We
> > > >             * can never allow other users to resize or write to it. */
> > > > -        perm |= BLK_PERM_CONSISTENT_READ;
> > > > +        if (!(flags & BDRV_O_NO_IO)) {
> > > > +            perm |= BLK_PERM_CONSISTENT_READ;
> > > 
> > > I thought BDRV_O_NO_IO only means we aren't doing I/O on guest-visible data,
> > > but doesn't stop us from reading the metadata.  The comment is telling: if
> > > we can read metadata, then we depend on CONSISTENT_READ for the metadata to
> > > be stable (even if we don't care about guest data consistency).
> > 
> > Yes and no. The trouble is that at the file system level we have only a
> > single bit to describe the consistency of the whole image throughout the
> > whole block driver tree.
> > 
> > We forbid shared BLK_PERM_CONSISTENT_READ for mirror targets (which
> > aren't fully populated yet) and intermediate nodes for commit (which
> > expose corrupted data). Both scenarios are really about the data exposed
> > at the format layer, the metadata stays completely consistent.
> 
> I guess it is the block of writes/resizes that prevents metadata from
> getting inconsistent; CONSISTENT_READ does indeed make more sense if
> interpreted solely in light of will the guest read consistent data
> (and not will the format layer see consistent contents from the
> protocol layer).

Yes, that's what the write/resize locks are meant for.

Consistent read is more about "this image may not contain the useful
data you're expecting".

> > The question is what we do with this as we propagate permissions down to
> > the protocol layer. Strictly speaking, the file at the protocol layer is
> > perfectly consistent, so we might not forbid BLK_PERM_CONSISTENT_READ
> > there. But I think it's more useful to do it anyway so that image
> > locking can prevent the typical case of another process that uses qcow2
> > over file-posix again, where the file-posix node could in theory be
> > considered consistent, but the qcow2 one wouldn't.
> 
> Does that mean we need two separate permission bits, one for whether
> the guest-visible data is consistent, and one for whether the metadata
> is consistent? But as I mentioned above, blocking writes should mean
> that no one else is messing with metadata.

Even this works only in the simple case of format over protocol.

I think this becomes pretty clear when you think of the (admittedly
unlikely) case of nested qcow2. Then you need at least a bit per layer,
one for the guest-visible data, one for the top-level qcow2 metadata and
another one for the second qcow2 metadata.

> > In the end, this is just a pragmatic way to let 'qemu-img info' work
> > while the image is a mirror target or intermediate node for commit, but
> > to forbid cases where corrupted data is used.
> > 
> > Or would you argue that either 'qemu-img info' shouldn't be working or
> > reading corrupted data should be allowed in other processes?
> 
> Having qemu-img info work on a mirror target makes sense; as you pointed
> out, the metadata flushed to disk is consistent (may have leaked clusters,
> but not broken image) at any given point by the writer (it has to be, since
> the writer has to be able to recover the image if there is an abrupt
> restart).

Exactly, that's the reasoning.

> And a second reader can still manage to see inconsistent data
> when reading two separate clusters if the timing is just right, regardless
> of whether the first writer is always able to see consistent data.  So to
> some extent, it's a measure of how risky is the action? qemu-img info is
> read-only, and most of the time will just work; it is very hard to get to
> the rare race condition where two consecutive reads raced with a parallel
> writer combine to result in incorrect interpretation of the data by the
> reader.

This again is more about the write permission. You still need 'qemu-img
info -U' because there is a concurrent writer, so you have to
acknowledge the risk you're mentioning.

> I'm not opposed to your patch, but am trying to make sure that I'm not
> overlooking any problem before giving R-b.  Maybe it's just that the
> comment needs updating in v2.

Do you have a suggestion for the comment?

Kevin

  reply	other threads:[~2017-12-01 16:23 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-30 16:44 [Qemu-devel] [PATCH] block: Formats don't need CONSISTENT_READ with NO_IO Kevin Wolf
2017-11-30 17:09 ` Eric Blake
2017-11-30 18:27   ` Kevin Wolf
2017-11-30 19:05     ` Eric Blake
2017-12-01 12:41       ` Kevin Wolf [this message]
2017-12-01 13:42         ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171201124131.GD4147@localhost.localdomain \
    --to=kwolf@redhat.com \
    --cc=eblake@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.