Re: [Qemu-devel] [PATCH RFC] block: Tolerate existing writers on read only BdrvChild

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Fam Zheng <famz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>,
	qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH RFC] block: Tolerate existing writers on read only BdrvChild
Date: Wed, 1 Mar 2017 20:39:13 +0800	[thread overview]
Message-ID: <20170301123913.GA26744@lemon.lan> (raw)
In-Reply-To: <20170301094917.GA4799@noname.redhat.com>

On Wed, 03/01 10:49, Kevin Wolf wrote:
> Am 01.03.2017 um 09:15 hat Fam Zheng geschrieben:
> > In an ideal world, read-write access to an image is inherently
> > exclusive, because we cannot guarantee other readers and writers a
> > consistency view of the whole image at all point. That's what the
> > current permission system does, and it is okay as long as it is entirely
> > internal. But that would change with the coming image locking. In
> > practice, both end users and our test cases use tools like qemu-img and
> > qemu-io to peek at images while guest is running.
> > 
> > Relax a bit and accept other writers in this case.
> > 
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> 
> Hm. On the one hand I think you're right that people are using things
> like this, on other hand it's also true that the result might not be
> consistent and therefore image locking is right about forbidding these
> actions.
> 
> I think your patch is too permissive, it allows even launching a second
> long-running VM on the image, which will definitely see corrupted data
> sooner or later.
> 
> Maybe what we can do is allow shared writers for read-only images if
> CONSISTENT_READ isn't requested. It's still not 100% correct because we
> can get inconsistent metadata and cause unexpected failure, but this is
> probably tolerable. We would then have to change the allowed actions to
> not request this permission.
> 
> In qemu-img, we have these read-only users:
> 
> * qemu-img check (without -r): Let's keep this blocked, it will only
>   report lots of leaks and leads to invalid bug reports. I've had my
>   share of them.
> 
> * qemu-img compare: Not sure if this makes sense with an image that is
>   in active use?
> 
> * qemu-img convert source: Similarly to qemu-img compare, this doesn't
>   make a whole lot of sense, with one exception: People are using -s to
>   extract internal snapshots while the VM is still running, and this
>   usually works because the snapshot doesn't change. However, I don't
>   want to know what happens if you delete the snapshort from the qemu
>   process while qemu-img convert is running... Doing 'qemu-img snapshot
>   -s ...' with a running VM is more tolerated abuse than a supported
>   feature.
> 
> * qemu-img info: This one makes perfect sense even with a running VM
> 
> * qemu-img map: Results are potentially meaningless with concurrent I/O,
>   but there may be cases where it makes sense.
> 
> * qemu-img snapshot -l: Somewhat similar to qemu-img convert -s, except
>   that it's very quick and doesn't access actual data (could even be
>   BDRV_O_NOIO, I think). Allowing this makes sense, I guess.
> 
> * qemu-img rebase, safe mode, backing files: If we allowed concurrent
>   writes, it wouldn't be safe.
> 
> * qemu-img bench: Well... No. You don't need this on an image with an
>   active VM.
> 
> * qemu-img dd: Same as convert, except that there is no -s.
> 
> So the list of subcommands where we want to support it is rather short.
> We can change blk_new_open() to clear CONSISTENT_READ for BDRV_O_NOIO,
> which could cover 'info' and 'snapshot -l'.
> 
> That leaves us with qemu-io, 'convert -s' and 'map', all of which can be
> imagined to be useful even with a running VM, but all of which can also
> easily produce wrong results in this case. An explicit command line
> option to disable CONSISTENT_READ might be the right tool here.

I'm not sure about this because: 1) this is intrusive from a user PoV, many
scripts and upper layer tools will stop working; 2) CONSISTENT_READ is enforced
by qcow2 in its .bdrv_child_perm implementation even if blk_new_open() doesn't
ask for it, therefore such an option has to impact the whole graph; 3) this
isn't only about asking for "persistent read" perm, but more about granting
"write" in shared_perm, so it feels messy.

A perhaps more contained way is to add a BDRV_O_RELAXED_LOCK flag and use it in
those subcommands, then in the image locking code, the "no other writer" byte is
not locked if it's set. This has a simpler semantic and a more manageable scope.

Fam

next prev parent reply	other threads:[~2017-03-01 12:39 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-01  8:15 [Qemu-devel] [PATCH RFC] block: Tolerate existing writers on read only BdrvChild Fam Zheng
2017-03-01  9:49 ` Kevin Wolf
2017-03-01 12:39   ` Fam Zheng [this message]
2017-03-01 15:16     ` Kevin Wolf
2017-03-01 16:10       ` Fam Zheng
2017-03-01 16:22         ` Kevin Wolf
2017-03-02 11:21           ` Fam Zheng
2017-03-02 14:23             ` Kevin Wolf
2017-03-03  1:46               ` Fam Zheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170301123913.GA26744@lemon.lan \
    --to=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).