From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=37539 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1PtiaV-0002TU-LJ
	for qemu-devel@nongnu.org; Sun, 27 Feb 2011 10:31:12 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <avi@redhat.com>) id 1PtiaU-0005qk-3V
	for qemu-devel@nongnu.org; Sun, 27 Feb 2011 10:31:11 -0500
Received: from mx1.redhat.com ([209.132.183.28]:52411)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <avi@redhat.com>) id 1PtiaT-0005qf-P6
	for qemu-devel@nongnu.org; Sun, 27 Feb 2011 10:31:10 -0500
Message-ID: <4D6A6E38.4030700@redhat.com>
Date: Sun, 27 Feb 2011 17:31:04 +0200
From: Avi Kivity <avi@redhat.com>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy
References: <20110222170004.808373778@redhat.com>	<20110222170115.710717278@redhat.com>	<4D642181.4080509@codemonkey.ws>	<20110222210735.GA9372@amt.cnet>	<4D64266A.3060106@codemonkey.ws>	<20110222230935.GA11082@amt.cnet>	<4D644343.4050800@codemonkey.ws>	<4D65051A.6070707@redhat.com>	<4D651B20.70405@codemonkey.ws>	<4D652852.60505@redhat.com>	<4D652F73.3000305@codemonkey.ws>	<4D65324A.5080408@redhat.com>	<4D65359E.3040008@codemonkey.ws>	<4D65416D.8040803@redhat.com>	<4D656B97.5030301@codemonkey.ws>	<4D661CB8.6010305@redhat.com>	<4D667287.9010005@codemonkey.ws>
	<4D6677BE.2030009@redhat.com>	<4D669C46.40909@codemonkey.ws>
	<4D6A150B.8030205@redhat.com> <4D6A58E0.9020607@codemonkey.ws>
In-Reply-To: <4D6A58E0.9020607@codemonkey.ws>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Jes.Sorensen@redhat.com, Marcelo Tosatti <mtosatti@redhat.com>, qemu-devel@nongnu.org

On 02/27/2011 04:00 PM, Anthony Liguori wrote:
> On 02/27/2011 03:10 AM, Avi Kivity wrote:
>> On 02/24/2011 07:58 PM, Anthony Liguori wrote:
>>>> If you move the cdrom to a different IDE channel, you have to 
>>>> update the stateful non-config file.
>>>>
>>>> Whereas if you do
>>>>
>>>>    $ qemu-img create -f cd-tray -b ~/foo.img ~/foo-media-tray.img
>>>>    $ qemu -cdrom ~/foo-media-tray.img
>>>>
>>>> the cd-rom tray state will be tracked in the image file.
>>>
>>>
>>> Yeah, but how do you move it? 
>>
>> There is no need to move the file at all.  Simply point the new drive 
>> at the media tray.
>
> No, I was asking, how do you move the cdrom to a different IDE 
> channel.  Are you using QMP?  Are you changing the command line 
> arguments?

Yes.

If we're doing hot-move (not really relevant to ide-cd) then you'd use 
QMP.  If you're editing a virtual machine that is down, or scheduling a 
change for the next reboot, then you're using command line arguments (or 
cold-plugging into a stopped guest).

Requiring management to remember the old configuration and issue delta 
commands to move the device in the cold-plug case adds complexity, IMO.

>
>>
>>> If you do a remove/add through QMP, then the config file will 
>>> reflect things just fine.
>>
>> If all access to the state file is through QMP then it becomes more 
>> palatable.  A bit on that later.
>
> As I think I've mentioned before, I hadn't really thought about an 
> opaque state file but I'm not necessarily opposed to it.  I don't see an 
> obvious advantage to making it opaque but I agree it should be 
> accessible via QMP.

The advantage is that we keep the management tool talking to one 
interface (I don't think we should prevent users from interpreting it, 
just make it unnecessary).

>>
>> I thought that's what I'm doing by separating the state out.  It's 
>> easy for management to assemble configuration from their database and 
>> convert it into a centralized representation (like a qemu command 
>> line).  It's a lot harder to disassemble a central state 
>> representation and move it back to the database.
>>
>> Using QMP is better than directly accessing the state file since qemu 
>> does the disassembly for you (provided the command references the 
>> device using its normal path, not some random key).  The file just 
>> becomes a way to survive a crash, and all management needs to know 
>> about is to make it available and back it up.  But it means that 
>> everything must be done via QMP, including assembly of the machine, 
>> otherwise the state file can become stale.
>>
>> Separating the state out to the device is even easier, since 
>> management is already expected to take care of disk images.  All 
>> that's needed is to create the media tray image once, then you can 
>> forget about it completely.
>
> Except that instead of having one state file, we might have a dozen 
> additional "device state" files.

That is fine.  We already have one state file per block device.

>>> QEMU.   No question about it.  At any point in time, we are the 
>>> authoritative source of what the guest's configuration is.  There's 
>>> no doubt about it.  A management tool can try to keep up with us, 
>>> but ultimately we are the only ones that know for sure.
>>>
>>> We have all of this information internally.  Just persisting it is 
>>> not a major architectural change.  It's something we should have 
>>> been doing (arguably) from the very beginning.
>>
>> That's a huge divergence from how management tools are written.
>
> This is one of the reasons why management tooling around QEMU needs 
> quite a bit of improving.
>
> There is simply no way a management tool can do a good job of being an 
> authoritative source of configuration.  The races we're discussing are 
> a good example of why.

What we're discussing is not configuration.  It is non-volatile state.  
Configuration comes from the user; state comes from the guest (the 
management tool may edit state; but the guest cannot edit the 
configuration).

I agree 100% the management tool cannot be the authoritative source of 
state.

My position is:
- the management tool should be 100% in control of configuration (how 
the guest is put together from its components)
- qemu should be 100% in control of state (memory, disk state, NVRAM in 
various components, cd-rom eject state, explosive bolts for payload 
separation, self-destruct mechanism, etc.)
- the management tool should have access to state using the same 
identifiers it used to create the devices that contain the state
- it is preferable to store state "in" the device so that when the 
configuration changes, state is maintained (like hot-unplug of a network 
card with NVRAM followed by hot-plug of the same card).
- the angular momentum of the planet we (presumably) are on won't 
change, whatever we do [1]
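
To make the split concrete, here is a toy model in Python.  It is only a 
sketch of the idea, not anything qemu implements: the Machine and 
DeviceConfig classes, their methods, and the "tray_open" key are all made 
up for illustration.  Configuration is written only by the management 
tool, state only by the emulator on behalf of the guest, and state is 
looked up with the same device ID that management chose.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceConfig:
    """Configuration: chosen by the management tool, never by the guest."""
    dev_id: str   # identifier picked by management (hypothetical naming)
    driver: str
    bus: str


@dataclass
class Machine:
    # Owned by the management tool: how the guest is put together.
    config: dict = field(default_factory=dict)
    # Owned by the emulator: non-volatile state, keyed by device ID.
    state: dict = field(default_factory=dict)

    def plug(self, cfg):
        """Management adds a device; existing state for that ID is reused."""
        self.config[cfg.dev_id] = cfg
        self.state.setdefault(cfg.dev_id, {})

    def unplug(self, dev_id):
        """Management removes a device; its state is deliberately kept."""
        del self.config[dev_id]

    def guest_write(self, dev_id, key, value):
        """The guest changes state, but only on a device that is plugged in."""
        if dev_id not in self.config:
            raise KeyError(f"device {dev_id!r} is not plugged in")
        self.state[dev_id][key] = value

    def query_state(self, dev_id):
        """Management reads state using the ID it created the device with."""
        return dict(self.state[dev_id])


# The cd-rom case from this thread: the guest opens the tray, then
# management moves the drive to a different IDE channel.
m = Machine()
m.plug(DeviceConfig("cd0", "ide-cd", "ide.0"))
m.guest_write("cd0", "tray_open", True)
m.unplug("cd0")
m.plug(DeviceConfig("cd0", "ide-cd", "ide.1"))
print(m.config["cd0"].bus, m.query_state("cd0"))
```

The point of keying state by device ID is that the move is a pure 
configuration change: management does not need to know what state the 
device carries in order to preserve it.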

>
> But beyond those races, QEMU is the only entity that knows with 
> certainty what bits of information are important to persist in order 
> to preserve a guest across shutdown/restart.  The fact that we've 
> punted this problem for so long has only ensured that management tools 
> are either intrinsically broken or only support the most minimal 
> subset of functionality we actually support.

I'm not arguing about that.  I just want to stress again the difference 
between state and configuration.  Qemu has no authority, in my mind, as 
to configuration.  Only state.

>>   Currently they contain the required guest configuration, a 
>> representation of what's the current live configuration, and they 
>> issue monitor commands to move the live configuration towards the 
>> required configuration (or just generate a qemu command line).  What 
>> you're describing is completely different, I'm not even sure what it is.
>
> Management tools shouldn't have to think about how the monitor 
> commands they issue impact the invocation options of QEMU.

They have to, when creating a guest from scratch.

But I admit, this throws a new light (for me) on things.  What are the 
implications?
- must have a qemu instance running when editing configuration, even 
when the guest is down
- cannot add additional information to configuration; must store it in 
an external database and cross-reference it with the qemu data using the 
device ID
- when editing non-hotpluggable configuration for the next boot, must 
maintain old config somewhere, so we can issue delta commands later 
(might be needed for current way of doing things)
- no transactions/queries/etc except on non-authoritative source
- issues with shared-nothing design (well, can store the configuration 
file using DRBD).

>>
>> If you look at management tools, they believe they are the 
>> authoritative source of configuration information (not guest state, 
>> which is more or less ignored).
>
> It's because we've given them no other option.

It's the natural way of doing it.  You have a web interface that talks 
to a database.  When you want to list all VMs that have network cards on 
the production subnet, you issue a database query and get a recordset.  
How do you do that when the authoritative source of information is 
spread across a cluster?
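
For illustration, this is roughly what that query looks like from the 
management side.  It is a minimal sketch using Python's sqlite3 module; 
the schema (tables "vms" and "nics", the "subnet" column) and the sample 
rows are assumptions made up for the example, not any real tool's 
database layout.

```python
import sqlite3


def vms_on_subnet(conn, subnet):
    """Return the names of all VMs with at least one NIC on the given subnet."""
    rows = conn.execute(
        "SELECT DISTINCT vms.name FROM vms "
        "JOIN nics ON nics.vm_id = vms.id "
        "WHERE nics.subnet = ? ORDER BY vms.name",
        (subnet,),
    )
    return [row[0] for row in rows]


# Hypothetical management database, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vms (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE nics (vm_id INTEGER REFERENCES vms(id), mac TEXT, subnet TEXT);
INSERT INTO vms VALUES (1, 'web1');
INSERT INTO vms VALUES (2, 'db1');
INSERT INTO vms VALUES (3, 'test1');
INSERT INTO nics VALUES (1, '52:54:00:00:00:01', 'production');
INSERT INTO nics VALUES (2, '52:54:00:00:00:02', 'production');
INSERT INTO nics VALUES (3, '52:54:00:00:00:03', 'lab');
""")

print(vms_on_subnet(conn, "production"))
```

One query against one database answers the question.  If each qemu 
instance were the authoritative source instead, the tool would have to 
ask every host in the cluster and merge the answers.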

>
>>>>
>>>> Right, but we should make it easy, not hard.
>>>
>>> Yeah, I fail to see how this makes it hard.  We conveniently are 
>>> saying, hey, this is all the state that needs to be persisted.  
>>> We'll persist it for you if you want, otherwise, we'll expose it in 
>>> a central location.
>>
>> The state-in-a-file is just a blob.  Don't expect the tool to parse 
>> it and reassociate the various bits to its own representation.  
>> Exposing it via QMP commands is a lot better though.
>
> I don't really see this as being a major issue.  There's no such thing 
> as a "blob".  If someone wants to manipulate the state, they will.   
> We need to keep compatibility to support migrating from 
> version-to-version.
>
> I agree that we want to provide QMP interfaces to work with the state 
> file.  But I don't think we should be hostile to manual manipulation.

No, not hostile.  We should make QMP commands sufficient to deal with 
it, that's all.


[1] in fact, it does change, due to tidal effects.

-- 
error compiling committee.c: too many arguments to function