From: Avi Kivity <avi@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Jes.Sorensen@redhat.com, Marcelo Tosatti <mtosatti@redhat.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy
Date: Sun, 27 Feb 2011 17:31:04 +0200
Message-ID: <4D6A6E38.4030700@redhat.com>
In-Reply-To: <4D6A58E0.9020607@codemonkey.ws>

On 02/27/2011 04:00 PM, Anthony Liguori wrote:
> On 02/27/2011 03:10 AM, Avi Kivity wrote:
>> On 02/24/2011 07:58 PM, Anthony Liguori wrote:
>>>> If you move the cdrom to a different IDE channel, you have to 
>>>> update the stateful non-config file.
>>>>
>>>> Whereas if you do
>>>>
>>>>    $ qemu-img create -f cd-tray -b ~/foo.img ~/foo-media-tray.img
>>>>    $ qemu -cdrom ~/foo-media-tray.img
>>>>
>>>> the cd-rom tray state will be tracked in the image file.
>>>
>>>
>>> Yeah, but how do you move it? 
>>
>> There is no need to move the file at all.  Simply point the new drive 
>> at the media tray.
>
> No, I was asking, how do you move the cdrom to a different IDE 
> channel.  Are you using QMP?  Are you changing the command line 
> arguments?

Yes.

If we're doing hot-move (not really relevant to ide-cd) then you'd use 
QMP.  If you're editing a virtual machine that is down, or scheduling a 
change for the next reboot, then you're using command line arguments (or 
cold-plugging into a stopped guest).

Requiring management to remember the old configuration and issue delta 
commands to move the device for the cold-plug case is increased 
complexity IMO.
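To make that bookkeeping concrete, here is a minimal sketch of what a management tool would have to do in the cold-plug case: keep the old configuration around and diff it against the new one to produce delta commands. The config layout (device id mapped to device_add properties) and the qmp_delta helper are hypothetical, and the property set is simplified (a real ide-cd would also need a drive). device_add and device_del are existing QMP commands, but nothing here sends them.

```python
from typing import Dict, List

# Hypothetical layout: device id -> properties that would be passed to
# the QMP device_add command (driver, bus, ...).
Config = Dict[str, Dict[str, str]]


def qmp_delta(old: Config, new: Config) -> List[dict]:
    """Return QMP commands that turn configuration `old` into `new`.

    A device whose properties changed is removed and re-added.  This is
    only a sketch: it ignores ordering constraints between devices and
    whether a given device is hotpluggable at all.
    """
    cmds: List[dict] = []
    # Remove devices that were dropped or whose properties changed.
    for dev_id in sorted(old):
        if new.get(dev_id) != old[dev_id]:
            cmds.append({"execute": "device_del", "arguments": {"id": dev_id}})
    # Add devices that are new or whose properties changed.
    for dev_id in sorted(new):
        if old.get(dev_id) != new[dev_id]:
            cmds.append({"execute": "device_add",
                         "arguments": {"id": dev_id, **new[dev_id]}})
    return cmds


# Moving a cd-rom from the first IDE channel to the second.
old = {"cd0": {"driver": "ide-cd", "bus": "ide.0"}}
new = {"cd0": {"driver": "ide-cd", "bus": "ide.1"}}
for cmd in qmp_delta(old, new):
    print(cmd)
```

The point is the `old` argument: without it there is nothing to diff against, which is exactly the state management would otherwise not need to keep.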

>
>>
>>> If you do a remove/add through QMP, then the config file will 
>>> reflect things just fine.
>>
>> If all access to the state file is through QMP then it becomes more 
>> palatable.  A bit on that later.
>
> As I think I've mentioned before, I hadn't really thought about an 
> opaque state file but I'm not necessarily opposed to it.  I don't see an 
> obvious advantage to making it opaque but I agree it should be 
> accessible via QMP.

The advantage is that we keep the management tool talking to one 
interface (I don't think we should prevent users from interpreting it, 
just make it unnecessary).

>>
>> I thought that's what I'm doing by separating the state out.  It's 
>> easy for management to assemble configuration from their database and 
>> convert it into a centralized representation (like a qemu command 
>> line).  It's a lot harder to disassemble a central state 
>> representation and move it back to the database.
>>
>> Using QMP is better than directly accessing the state file since qemu 
>> does the disassembly for you (provided the command references the 
>> device using its normal path, not some random key).  The file just 
>> becomes a way to survive a crash, and all management needs to know 
>> about is to make it available and back it up.  But it means that 
>> everything must be done via QMP, including assembly of the machine, 
>> otherwise the state file can become stale.
>>
>> Separating the state out to the device is even easier, since 
>> management is already expected to take care of disk images.  All 
>> that's needed is to create the media tray image once, then you can 
>> forget about it completely.
>
> Except that instead of having one state file, we might have a dozen 
> additional "device state" files.

That is fine.  We already have one state file per block device.

>>> QEMU.   No question about it.  At any point in time, we are the 
>>> authoritative source of what the guest's configuration is.  There's 
>>> no doubt about it.  A management tool can try to keep up with us, 
>>> but ultimately we are the only ones that know for sure.
>>>
>>> We have all of this information internally.  Just persisting it is 
>>> not a major architectural change.  It's something we should have 
>>> been doing (arguably) from the very beginning.
>>
>> That's a huge divergence from how management tools are written.
>
> This is one of the reasons why management tooling around QEMU needs 
> quite a bit of improving.
>
> There is simply no way a management tool can do a good job of being an 
> authoritative source of configuration.  The races we're discussing are 
> a good example of why.

What we're discussing is not configuration.  It is non-volatile state.  
Configuration comes from the user; state comes from the guest (the 
management tool may edit state; but the guest cannot edit the 
configuration).

I agree 100% the management tool cannot be the authoritative source of 
state.

My position is:
- the management tool should be 100% in control of configuration (how 
the guest is put together from its components)
- qemu should be 100% in control of state (memory, disk state, NVRAM in 
various components, cd-rom eject state, explosive bolts for payload 
separation, self-destruct mechanism, etc.)
- the management tool should have access to state using the same 
identifiers it used to create the devices that contain the state
- it is preferable to store state "in" the device so that when the 
configuration changes, state is maintained (like hot-unplug of a network 
card with NVRAM followed by hot-plug of the same card).
- the angular momentum of the planet we (presumably) are on won't 
change, whatever we do [1]
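A rough sketch of that split, assuming an in-memory stand-in for state images: configuration (owned by management) only says how the guest is assembled and which image carries each device's state, while state (written by the guest via qemu) lives in that image. The "state" property, StateImages and assemble are all hypothetical names, modelled loosely on the cd-tray image idea from earlier in the thread.

```python
from typing import Dict

# Configuration, owned by the management tool: how the guest is put
# together.  Each device names the image holding its non-volatile state
# (the "state" key is hypothetical, modelled on the cd-tray image idea).
Config = Dict[str, Dict[str, str]]


class StateImages:
    """In-memory stand-in for per-device state images, written by qemu."""

    def __init__(self) -> None:
        self._images: Dict[str, Dict[str, str]] = {}

    def open(self, path: str) -> Dict[str, str]:
        # Created on first use, then reused by whatever device points at it.
        return self._images.setdefault(path, {})


def assemble(config: Config, images: StateImages) -> Dict[str, Dict[str, str]]:
    """Start a guest: bind each device id to the state stored in its image."""
    return {dev_id: images.open(props["state"])
            for dev_id, props in config.items()}


images = StateImages()
cfg = {"cd0": {"driver": "ide-cd", "bus": "ide.0",
               "state": "foo-media-tray.img"}}

guest = assemble(cfg, images)
guest["cd0"]["tray"] = "open"   # state change made by the guest (eject)

cfg["cd0"]["bus"] = "ide.1"     # configuration change made by management
guest = assemble(cfg, images)
print(guest["cd0"]["tray"])     # prints "open": the state followed the device
```

Because the state is looked up through the device's own image rather than its position in the configuration, moving the device (or unplugging and replugging it) leaves the state intact, and management can still query it by the same device id it used to create the device.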

>
> But beyond those races, QEMU is the only entity that knows with 
> certainty what bits of information are important to persist in order 
> to preserve a guest across shutdown/restart.  The fact that we've 
> punted this problem for so long has only ensured that management tools 
> are either intrinsically broken or only support the most minimal 
> subset of functionality we actually support.

I'm not arguing about that.  I just want to stress again the difference 
between state and configuration.  Qemu has no authority, in my mind, as 
to configuration.  Only state.

>>   Currently they contain the required guest configuration, a 
>> representation of what's the current live configuration, and they 
>> issue monitor commands to move the live configuration towards the 
>> required configuration (or just generate a qemu command line).  What 
>> you're describing is completely different, I'm not even sure what it is.
>
> Management tools shouldn't have to think about how the monitor 
> commands they issue impact the invocation options of QEMU.

They have to, when creating a guest from scratch.

But I admit, this throws a new light (for me) on things.  What are the 
implications?
- must have a qemu instance running when editing configuration, even 
when the guest is down
- cannot add additional information to configuration; must store it in 
an external database and cross-reference it with the qemu data using the 
device ID
- when editing non-hotpluggable configuration for the next boot, must 
maintain old config somewhere, so we can issue delta commands later 
(might be needed for current way of doing things)
- no transactions/queries/etc except on non-authoritative source
- issues with shared-nothing design (well, can store the configuration 
file using DRBD).
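The second point above (extra information kept in an external database and cross-referenced by device ID) amounts to a join like the following sketch. The shape of the qemu device list is an assumption for illustration, not the format of any particular QMP reply, and cross_reference is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def cross_reference(
    qemu_devices: List[dict], extra: Dict[str, dict]
) -> Tuple[Dict[str, dict], List[str], List[str]]:
    """Join qemu's device list with management-only metadata by device id.

    Returns the merged records, ids that only the external database knows
    about (stale), and ids that only qemu reports (unknown).
    """
    by_id = {dev["id"]: dev for dev in qemu_devices}
    merged = {dev_id: {**dev, **extra.get(dev_id, {})}
              for dev_id, dev in by_id.items()}
    stale = sorted(set(extra) - set(by_id))
    unknown = sorted(set(by_id) - set(extra))
    return merged, stale, unknown


# Hypothetical data: what qemu reports vs. what the external database adds.
qemu_devices = [{"id": "net0", "driver": "virtio-net-pci"},
                {"id": "cd0", "driver": "ide-cd"}]
extra = {"net0": {"subnet": "production"}, "net1": {"subnet": "backup"}}

merged, stale, unknown = cross_reference(qemu_devices, extra)
print(merged["net0"], stale, unknown)
```

The stale and unknown lists are the cost of keeping two sources of truth: every consumer has to decide what to do when they disagree.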

>>
>> If you look at management tools, they believe they are the 
>> authoritative source of configuration information (not guest state, 
>> which is more or less ignored).
>
> It's because we've given them no other option.

It's the natural way of doing it.  You have a web interface that talks 
to a database.  When you want to list all VMs that have network cards on 
the production subnet, you issue a database query and get a recordset.  
How do you do that when the authoritative source of information is 
spread across a cluster?
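For comparison, the "issue a database query and get a recordset" case is a one-liner when the management database is authoritative. A minimal sketch using an in-memory SQLite table; the schema and the vms_on_subnet helper are hypothetical.

```python
import sqlite3
from typing import List


def vms_on_subnet(conn: sqlite3.Connection, subnet: str) -> List[str]:
    """Names of all VMs with at least one network card on `subnet`."""
    rows = conn.execute(
        "SELECT DISTINCT vm FROM nics WHERE subnet = ? ORDER BY vm", (subnet,))
    return [vm for (vm,) in rows]


# Hypothetical schema: one row per network card in the management database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nics (vm TEXT, device_id TEXT, subnet TEXT)")
conn.executemany("INSERT INTO nics VALUES (?, ?, ?)", [
    ("web1", "net0", "production"),
    ("web1", "net1", "backup"),
    ("build", "net0", "lab"),
    ("db1", "net0", "production"),
])
print(vms_on_subnet(conn, "production"))  # ['db1', 'web1']
```

If the authoritative copy instead lives in each host's qemu instance, the same question turns into a round trip to every instance (including ones for guests that are down) before any filtering can happen.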

>
>>>>
>>>> Right, but we should make it easy, not hard.
>>>
>>> Yeah, I fail to see how this makes it hard.  We conveniently are 
>>> saying, hey, this is all the state that needs to be persisted.  
>>> We'll persist it for you if you want, otherwise, we'll expose it in 
>>> a central location.
>>
>> The state-in-a-file is just a blob.  Don't expect the tool to parse 
>> it and reassociate the various bits to its own representation.  
>> Exposing it via QMP commands is a lot better though.
>
> I don't really see this as being a major issue.  There's no such thing 
> as a "blob".  If someone wants to manipulate the state, they will.   
> We need to keep compatibility to support migrating from 
> version-to-version.
>
> I agree that we want to provide QMP interfaces to work with the state 
> file.  But I don't think we should be hostile to manual manipulation.

No, not hostile.  We should make QMP commands sufficient to deal with 
it, that's all.


[1] in fact, it does change, due to tidal effects.

-- 
error compiling committee.c: too many arguments to function


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D6A6E38.4030700@redhat.com \
    --to=avi@redhat.com \
    --cc=Jes.Sorensen@redhat.com \
    --cc=anthony@codemonkey.ws \
    --cc=mtosatti@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.