qemu-devel.nongnu.org archive mirror
From: Avi Kivity <avi@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Jes.Sorensen@redhat.com, Marcelo Tosatti <mtosatti@redhat.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy
Date: Sun, 27 Feb 2011 17:31:04 +0200
Message-ID: <4D6A6E38.4030700@redhat.com>
In-Reply-To: <4D6A58E0.9020607@codemonkey.ws>

On 02/27/2011 04:00 PM, Anthony Liguori wrote:
> On 02/27/2011 03:10 AM, Avi Kivity wrote:
>> On 02/24/2011 07:58 PM, Anthony Liguori wrote:
>>>> If you move the cdrom to a different IDE channel, you have to 
>>>> update the stateful non-config file.
>>>>
>>>> Whereas if you do
>>>>
>>>>    $ qemu-img create -f cd-tray -b ~/foo.img ~/foo-media-tray.img
>>>>    $ qemu -cdrom ~/foo-media-tray.img
>>>>
>>>> the cd-rom tray state will be tracked in the image file.
>>>
>>>
>>> Yeah, but how do you move it? 
>>
>> There is no need to move the file at all.  Simply point the new drive 
>> at the media tray.
>
> No, I was asking, how do you move the cdrom to a different IDE 
> channel.  Are you using QMP?  Are you changing the command line 
> arguments?

Yes.

If we're doing hot-move (not really relevant to ide-cd) then you'd use 
QMP.  If you're editing a virtual machine that is down, or scheduling a 
change for the next reboot, then you're using command line arguments (or 
cold-plugging into a stopped guest).

Requiring management to remember the old configuration and issue delta 
commands to move the device for the cold-plug case adds complexity, 
IMO.
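
To illustrate the bookkeeping this implies, here is a minimal Python
sketch, assuming the configuration is a plain mapping from device ID to
properties.  The function name delta_commands, the dictionary layout and
the bus names are hypothetical; "device_del" and "device_add" are used
only as labels for the remove/add steps, not as exact QMP payloads.

```python
def delta_commands(old, new):
    """Return the remove/add steps that turn config `old` into config `new`.

    Both arguments map a device ID to a dict of properties
    (a hypothetical layout chosen for this sketch).
    """
    commands = []
    # Devices that disappeared or changed have to be removed first.
    for dev_id in sorted(old):
        if dev_id not in new or old[dev_id] != new[dev_id]:
            commands.append(("device_del", dev_id))
    # Devices that are new or changed are then (re-)added.
    for dev_id in sorted(new):
        if dev_id not in old or old[dev_id] != new[dev_id]:
            commands.append(("device_add", dev_id, new[dev_id]))
    return commands


# Moving the cdrom to a different IDE channel needs both configurations.
old = {"cd0": {"driver": "ide-cd", "bus": "ide.0"}}
new = {"cd0": {"driver": "ide-cd", "bus": "ide.1"}}
print(delta_commands(old, new))
```

The point of the sketch is that the caller must still hold the old
configuration; generating a fresh command line needs only the new one.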

>
>>
>>> If you do a remove/add through QMP, then the config file will 
>>> reflect things just fine.
>>
>> If all access to the state file is through QMP then it becomes more 
>> palatable.  A bit on that later.
>
> As I think I've mentioned before, I hadn't really thought about an 
> opaque state file but I'm not necessarily opposed to it.  I don't see an 
> obvious advantage to making it opaque but I agree it should be 
> accessible via QMP.

The advantage is that we keep the management tool talking to one 
interface (I don't think we should prevent users from interpreting it, 
just make it unnecessary).

>>
>> I thought that's what I'm doing by separating the state out.  It's 
>> easy for management to assemble configuration from their database and 
>> convert it into a centralized representation (like a qemu command 
>> line).  It's a lot harder to disassemble a central state 
>> representation and move it back to the database.
>>
>> Using QMP is better than directly accessing the state file since qemu 
>> does the disassembly for you (provided the command references the 
>> device using its normal path, not some random key).  The file just 
>> becomes a way to survive a crash, and all management needs to know 
>> about is to make it available and back it up.  But it means that 
>> everything must be done via QMP, including assembly of the machine, 
>> otherwise the state file can become stale.
>>
>> Separating the state out to the device is even easier, since 
>> management is already expected to take care of disk images.  All 
>> that's needed is to create the media tray image once, then you can 
>> forget about it completely.
>
> Except that instead of having one state file, we might have a dozen 
> additional "device state" files.

That is fine.  We already have one state file per block device.

>>> QEMU.   No question about it.  At any point in time, we are the 
>>> authoritative source of what the guest's configuration is.  There's 
>>> no doubt about it.  A management tool can try to keep up with us, 
>>> but ultimately we are the only ones that know for sure.
>>>
>>> We have all of this information internally.  Just persisting it is 
>>> not a major architectural change.  It's something we should have 
>>> been doing (arguably) from the very beginning.
>>
>> That's a huge divergence from how management tools are written.
>
> This is one of the reasons why management tooling around QEMU needs 
> quite a bit of improving.
>
> There is simply no way a management tool can do a good job of being an 
> authoritative source of configuration.  The races we're discussing are 
> a good example of why.

What we're discussing is not configuration.  It is non-volatile state.  
Configuration comes from the user; state comes from the guest (the 
management tool may edit state, but the guest cannot edit the 
configuration).

I agree 100% the management tool cannot be the authoritative source of 
state.

My position is:
- the management tool should be 100% in control of configuration (how 
the guest is put together from its components)
- qemu should be 100% in control of state (memory, disk state, NVRAM in 
various components, cd-rom eject state, explosive bolts for payload 
separation, self-destruct mechanism, etc.)
- the management tool should have access to state using the same 
identifiers it used to create the devices that contain the state
- it is preferable to store state "in" the device so that when the 
configuration changes, state is maintained (like hot-unplug of a network 
card with NVRAM followed by hot-plug of the same card).
- the angular momentum of the planet we (presumably) are on won't 
change, whatever we do [1]
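
The split above can be sketched as a toy model in Python.  Everything
here (the Guest class, its method names, the "nvram.mac" key, the e1000
example) is a hypothetical illustration, not qemu's implementation: the
management side edits only the configuration, the guest side writes
only state, and state is looked up with the same device ID that was
used to create the device.

```python
class Guest:
    """Toy model: configuration and non-volatile state are kept apart."""

    def __init__(self):
        self.config = {}  # device ID -> properties, edited only by management
        self.state = {}   # device ID -> non-volatile state, written for the guest

    def plug(self, dev_id, props):
        """Management adds a device to the configuration."""
        self.config[dev_id] = dict(props)

    def unplug(self, dev_id):
        """Management removes a device; its state is deliberately kept."""
        self.config.pop(dev_id, None)

    def guest_write(self, dev_id, key, value):
        """The guest changes state of a device that is currently plugged in."""
        if dev_id not in self.config:
            raise KeyError(f"{dev_id} is not plugged in")
        self.state.setdefault(dev_id, {})[key] = value

    def query_state(self, dev_id):
        """Management reads state using the ID it created the device with."""
        return dict(self.state.get(dev_id, {}))


guest = Guest()
guest.plug("nic0", {"driver": "e1000"})
guest.guest_write("nic0", "nvram.mac", "52:54:00:12:34:56")
guest.unplug("nic0")                     # hot-unplug
guest.plug("nic0", {"driver": "e1000"})  # hot-plug of the same card
print(guest.query_state("nic0"))         # the NVRAM contents are still there
```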

>
> But beyond those races, QEMU is the only entity that knows with 
> certainty what bits of information are important to persist in order 
> to preserve a guest across shutdown/restart.  The fact that we've 
> punted this problem for so long has only ensured that management tools 
> are either intrinsically broken or only support the most minimal 
> subset of functionality we actually support.

I'm not arguing about that.  I just want to stress again the difference 
between state and configuration.  Qemu has no authority, in my mind, as 
to configuration.  Only state.

>>   Currently they contain the required guest configuration, a 
>> representation of what's the current live configuration, and they 
>> issue monitor commands to move the live configuration towards the 
>> required configuration (or just generate a qemu command line).  What 
>> you're describing is completely different; I'm not even sure what it is.
>
> Management tools shouldn't have to think about how the monitor 
> commands they issue impact the invocation options of QEMU.

They have to, when creating a guest from scratch.

But I admit, this throws a new light (for me) on things.  What are the 
implications?
- must have a qemu instance running when editing configuration, even 
when the guest is down
- cannot add additional information to configuration; must store it in 
an external database and cross-reference it with the qemu data using the 
device ID
- when editing non-hotpluggable configuration for the next boot, must 
maintain old config somewhere, so we can issue delta commands later 
(might be needed for current way of doing things)
- no transactions/queries/etc except on non-authoritative source
- issues with shared-nothing design (well, can store the configuration 
file using DRBD).

>>
>> If you look at management tools, they believe they are the 
>> authoritative source of configuration information (not guest state, 
>> which is more or less ignored).
>
> It's because we've given them no other option.

It's the natural way of doing it.  You have a web interface that talks 
to a database.  When you want to list all VMs that have network cards on 
the production subnet, you issue a database query and get a recordset.  
How do you do that when the authoritative source of information is 
spread across a cluster?
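
As a concrete picture of the database-centric approach, here is a
minimal sketch using Python's sqlite3 module with an in-memory
database.  The table layout, VM names and subnet names are invented for
the example; a real management tool would have its own schema.

```python
import sqlite3

# Hypothetical management database: one row per (VM, network card).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE nic (vm TEXT, subnet TEXT)")
db.executemany(
    "INSERT INTO nic (vm, subnet) VALUES (?, ?)",
    [
        ("web1", "production"),
        ("web1", "backup"),
        ("db1", "production"),
        ("test1", "lab"),
    ],
)

# List all VMs that have a network card on the production subnet.
rows = db.execute(
    "SELECT DISTINCT vm FROM nic WHERE subnet = ? ORDER BY vm",
    ("production",),
).fetchall()
production_vms = [vm for (vm,) in rows]
print(production_vms)
```

A single query answers the question because all configuration lives in
one place; with per-host authoritative copies, the same answer would
have to be gathered from every node.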

>
>>>>
>>>> Right, but we should make it easy, not hard.
>>>
>>> Yeah, I fail to see how this makes it hard.  We conveniently are 
>>> saying, hey, this is all the state that needs to be persisted.  
>>> We'll persist it for you if you want, otherwise, we'll expose it in 
>>> a central location.
>>
>> The state-in-a-file is just a blob.  Don't expect the tool to parse 
>> it and reassociate the various bits to its own representation.  
>> Exposing it via QMP commands is a lot better though.
>
> I don't really see this as being a major issue.  There's no such thing 
> as a "blob".  If someone wants to manipulate the state, they will.   
> We need to keep compatibility to support migrating from 
> version-to-version.
>
> I agree that we want to provide QMP interfaces to work with the state 
> file.  But I don't think we should be hostile to manual manipulation.

No, not hostile.  We should make QMP commands sufficient to deal with 
it, that's all.


[1] in fact, it does change, due to tidal effects.

-- 
error compiling committee.c: too many arguments to function
