Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Eric Blake <eblake@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: stefanha@linux.vnet.ibm.com,
	Markus Armbruster <armbru@redhat.com>,
	qemu-devel@nongnu.org, lcapitulino@redhat.com,
	Federico Simoncelli <fsimonce@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
Date: Wed, 14 Mar 2012 07:11:01 -0600	[thread overview]
Message-ID: <4F6098E5.4010709@redhat.com> (raw)
In-Reply-To: <4F606610.3030809@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 4591 bytes --]

On 03/14/2012 03:34 AM, Kevin Wolf wrote:

>> That said, reopen is really hard to be implemented as a transaction
>> without breaking that rule. For example in the blkmirror case you'd
>> need to open the destination image in r/w while the mirroring is in
>> action (already having the same image in r/w mode).
>>
>> There are several solutions here but they are either really hard to
>> implement or non definitive. For example:
>>
>> * We could try to implement the reopen command for each special case,
>>   eg: blkmirror, reopening the same image, etc... and in such cases
>>   reusing the same bs that we already have. The downside is that this
>>   command will be coupled with all this special cases.
> 
> The problem we're trying to solve is that we have a graph of open
> BlockDriverStates (connected by bs->file, backing file relations etc.)
> and we want to transform it into a different graph of open BDSs
> atomically, reusing zero, one or many of the existing BDSs, and possibly
> with changed properties (cache mode, read-only, etc.)
> 
> What we can have reasonably easily (there are patches floating around,
> they just need to be completed), is a bdrv_reopen() that changes flags
> on one given BDS, without changing the file it's backed by. This is
> already broken up into prepare/commit/abort stages as we need it to
> reopen VMDK's split images safely.
> 
> In theory this should be enough to build the new graph by opening yet
> unused BDSs, preparing the reopen of reused ones and only if all of that
> was successful, committing the bdrv_reopen and changing the relations
> between the nodes. I hope that at the same time it's clear that this
> isn't exactly trivial to implement.

Agreed that 1) it looks implementable, and 2) it does not look trivial.

> 
>> * We could use the transaction APIs without actually making it
>>   transaction (if we fail in the middle we can't rollback). The only
>>   advantage of this is that we'd provide a consistent API to libvirt
>>   and we would postpone the problem to the future. Anyway I strongly
>>   discourage this as it's completely unsafe and it's going to break
>>   the transaction semantic. Moreover it's a solution that relies too
>>   much on the hope of finding something appropriate in the future.
> 
> This is not an option. Advertising transactional behaviour and not
> implementing it is just plain wrong.

Absolutely concur.  From libvirt's perspective, I will be assuming that:

if 'transaction' exists, then it supports 'blockdev-snapshot-sync' and
'drive-mirror' (assuming that these are the only two commands that are
in 'transaction' in qemu 1.1), but nothing else is safe to use without a
further probe.  Then, if down the road, we go through the pain of making
'drive-reopen' transactionable, then there will be some new
introspection command (one idea was an optional argument to
query-commands which limits output to just the commands that can also
occur in a 'transaction'), and libvirt will have to use that
introspection before using the additional features.

> 
>> * We could leave it as it is, a distinct command that is not part of
>>   the transaction and that it's closing the old image before opening
>>   the new one.
> 
> Yes, this would be the short-term preliminary solution. I would tend to
> leave it to downstreams to implement it as an extension, though.

Correct - I'm fine with libvirt using the direct, non-atomic,
'drive-reopen' for the immediate needs of oVirt while we work on a more
complete solution for down the road.

> 
>> This is not completely correct, the main intent was to not spread one
>> image chain across two storage domains (making it incomplete if one of
>> them was missing). In the next oVirt release a VM can have different
>> disks on different storage domains, so this wouldn't be a special case
>> but just a normal situation.
> 
> The problem with this kind of argument is that we're not developing only
> for oVirt, but need to look for what makes sense for any management tool
> (or even just direct users of qemu).

Indeed - that's why I still think it's dangerous to not have an atomic
'drive-reopen', even if oVirt can be coded to work in spite of an
initial implementation being non-atomic.  But there's a difference
between wish-lists (atomic reopen) and practicality (what can we
implement now), and I'm not going to insist on the impossible :)

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

next prev parent reply	other threads:[~2012-03-14 13:12 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-06 17:55 [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 01/10] use QSIMPLEQ_FOREACH_SAFE when freeing list elements Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 02/10] fix format name for backing file Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 03/10] qapi: complete implementation of unions Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 04/10] rename blockdev-group-snapshot-sync Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 05/10] add mode field to blockdev-snapshot-sync transaction item Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 06/10] qmp: convert blockdev-snapshot-sync to a wrapper around transactions Paolo Bonzini
2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 07/10] Add blkmirror block driver Paolo Bonzini
2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 08/10] add mirroring to transaction Paolo Bonzini
2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 09/10] add drive-mirror command and HMP equivalent Paolo Bonzini
2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command Paolo Bonzini
2012-03-13 20:48   ` Eric Blake
2012-03-14  0:14     ` Federico Simoncelli
2012-03-14  9:34       ` Kevin Wolf
2012-03-14 13:11         ` Eric Blake [this message]
2012-03-14 14:30           ` Kevin Wolf
2012-03-14 13:29         ` Federico Simoncelli
2012-03-14  9:17     ` Kevin Wolf
2012-03-14  9:19       ` Paolo Bonzini
2012-03-14  9:35         ` Kevin Wolf
2012-03-09 15:36 ` [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F6098E5.4010709@redhat.com \
    --to=eblake@redhat.com \
    --cc=armbru@redhat.com \
    --cc=fsimonce@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=lcapitulino@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).