From: Eric Blake <eblake@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: stefanha@linux.vnet.ibm.com,
Markus Armbruster <armbru@redhat.com>,
qemu-devel@nongnu.org, lcapitulino@redhat.com,
Federico Simoncelli <fsimonce@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
Date: Wed, 14 Mar 2012 07:11:01 -0600 [thread overview]
Message-ID: <4F6098E5.4010709@redhat.com> (raw)
In-Reply-To: <4F606610.3030809@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 4591 bytes --]
On 03/14/2012 03:34 AM, Kevin Wolf wrote:
>> That said, reopen is really hard to be implemented as a transaction
>> without breaking that rule. For example in the blkmirror case you'd
>> need to open the destination image in r/w while the mirroring is in
>> action (already having the same image in r/w mode).
>>
>> There are several solutions here but they are either really hard to
>> implement or non definitive. For example:
>>
>> * We could try to implement the reopen command for each special case,
>> eg: blkmirror, reopening the same image, etc... and in such cases
>> reusing the same bs that we already have. The downside is that this
>> command will be coupled with all this special cases.
>
> The problem we're trying to solve is that we have a graph of open
> BlockDriverStates (connected by bs->file, backing file relations etc.)
> and we want to transform it into a different graph of open BDSs
> atomically, reusing zero, one or many of the existing BDSs, and possibly
> with changed properties (cache mode, read-only, etc.)
>
> What we can have reasonably easily (there are patches floating around,
> they just need to be completed), is a bdrv_reopen() that changes flags
> on one given BDS, without changing the file it's backed by. This is
> already broken up into prepare/commit/abort stages as we need it to
> reopen VMDK's split images safely.
>
> In theory this should be enough to build the new graph by opening yet
> unused BDSs, preparing the reopen of reused ones and only if all of that
> was successful, committing the bdrv_reopen and changing the relations
> between the nodes. I hope that at the same time it's clear that this
> isn't exactly trivial to implement.
Agreed that 1) it looks implementable, and 2) it does not look trivial.
>
>> * We could use the transaction APIs without actually making it
>> transaction (if we fail in the middle we can't rollback). The only
>> advantage of this is that we'd provide a consistent API to libvirt
>> and we would postpone the problem to the future. Anyway I strongly
>> discourage this as it's completely unsafe and it's going to break
>> the transaction semantic. Moreover it's a solution that relies too
>> much on the hope of finding something appropriate in the future.
>
> This is not an option. Advertising transactional behaviour and not
> implementing it is just plain wrong.
Absolutely concur. From libvirt's perspective, I will be assuming that:
if 'transaction' exists, then it supports 'blockdev-snapshot-sync' and
'drive-mirror' (assuming that these are the only two commands that are
in 'transaction' in qemu 1.1), but nothing else is safe to use without a
further probe. Then, if down the road, we go through the pain of making
'drive-reopen' transactionable, then there will be some new
introspection command (one idea was an optional argument to
query-commands which limits output to just the commands that can also
occur in a 'transaction'), and libvirt will have to use that
introspection before using the additional features.
>
>> * We could leave it as it is, a distinct command that is not part of
>> the transaction and that it's closing the old image before opening
>> the new one.
>
> Yes, this would be the short-term preliminary solution. I would tend to
> leave it to downstreams to implement it as an extension, though.
Correct - I'm fine with libvirt using the direct, non-atomic,
'drive-reopen' for the immediate needs of oVirt while we work on a more
complete solution for down the road.
>
>> This is not completely correct, the main intent was to not spread one
>> image chain across two storage domains (making it incomplete if one of
>> them was missing). In the next oVirt release a VM can have different
>> disks on different storage domains, so this wouldn't be a special case
>> but just a normal situation.
>
> The problem with this kind of argument is that we're not developing only
> for oVirt, but need to look for what makes sense for any management tool
> (or even just direct users of qemu).
Indeed - that's why I still think it's dangerous to not have an atomic
'drive-reopen', even if oVirt can be coded to work in spite of an
initial implementation being non-atomic. But there's a difference
between wish-lists (atomic reopen) and practicality (what can we
implement now), and I'm not going to insist on the impossible :)
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]
next prev parent reply other threads:[~2012-03-14 13:12 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-03-06 17:55 [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 01/10] use QSIMPLEQ_FOREACH_SAFE when freeing list elements Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 02/10] fix format name for backing file Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 03/10] qapi: complete implementation of unions Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 04/10] rename blockdev-group-snapshot-sync Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 05/10] add mode field to blockdev-snapshot-sync transaction item Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 06/10] qmp: convert blockdev-snapshot-sync to a wrapper around transactions Paolo Bonzini
2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 07/10] Add blkmirror block driver Paolo Bonzini
2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 08/10] add mirroring to transaction Paolo Bonzini
2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 09/10] add drive-mirror command and HMP equivalent Paolo Bonzini
2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command Paolo Bonzini
2012-03-13 20:48 ` Eric Blake
2012-03-14 0:14 ` Federico Simoncelli
2012-03-14 9:34 ` Kevin Wolf
2012-03-14 13:11 ` Eric Blake [this message]
2012-03-14 14:30 ` Kevin Wolf
2012-03-14 13:29 ` Federico Simoncelli
2012-03-14 9:17 ` Kevin Wolf
2012-03-14 9:19 ` Paolo Bonzini
2012-03-14 9:35 ` Kevin Wolf
2012-03-09 15:36 ` [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F6098E5.4010709@redhat.com \
--to=eblake@redhat.com \
--cc=armbru@redhat.com \
--cc=fsimonce@redhat.com \
--cc=kwolf@redhat.com \
--cc=lcapitulino@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).