From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=39964 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1PscgC-0005un-Uc
	for qemu-devel@nongnu.org; Thu, 24 Feb 2011 10:00:34 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <anthony@codemonkey.ws>) id 1PscgB-0008GL-Eg
	for qemu-devel@nongnu.org; Thu, 24 Feb 2011 10:00:32 -0500
Received: from mail-vx0-f173.google.com ([209.85.220.173]:54955)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <anthony@codemonkey.ws>) id 1PscgB-0008GB-AG
	for qemu-devel@nongnu.org; Thu, 24 Feb 2011 10:00:31 -0500
Received: by vxb41 with SMTP id 41so535428vxb.4
	for <qemu-devel@nongnu.org>; Thu, 24 Feb 2011 07:00:30 -0800 (PST)
Message-ID: <4D667287.9010005@codemonkey.ws>
Date: Thu, 24 Feb 2011 09:00:23 -0600
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy
References: <20110222170004.808373778@redhat.com>	<20110222170115.710717278@redhat.com>	<4D642181.4080509@codemonkey.ws>	<20110222210735.GA9372@amt.cnet>	<4D64266A.3060106@codemonkey.ws>	<20110222230935.GA11082@amt.cnet>	<4D644343.4050800@codemonkey.ws>	<4D65051A.6070707@redhat.com>	<4D651B20.70405@codemonkey.ws>	<4D652852.60505@redhat.com>	<4D652F73.3000305@codemonkey.ws>	<4D65324A.5080408@redhat.com>	<4D65359E.3040008@codemonkey.ws>
	<4D65416D.8040803@redhat.com>	<4D656B97.5030301@codemonkey.ws>
	<4D661CB8.6010305@redhat.com>
In-Reply-To: <4D661CB8.6010305@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Avi Kivity <avi@redhat.com>
Cc: Jes.Sorensen@redhat.com, Marcelo Tosatti <mtosatti@redhat.com>, qemu-devel@nongnu.org

On 02/24/2011 02:54 AM, Avi Kivity wrote:
> On 02/23/2011 10:18 PM, Anthony Liguori wrote:
>>> Then the management stack has to worry about yet another way of 
>>> interacting via qemu.
>>
>>
>> { 'StateItem': { 'key': 'str', 'value': 'str' } }
>> { 'StateSection': { 'kind': 'str', 'name': 'str', 'items': [ 
>> 'StateItem' ] } }
>> { 'StateInfo': { 'sections': [ 'StateSection' ] } }
>>
>> { 'query-state', {}, {}, 'StateInfo' }
>>
>> A management tool never needs to worry about anything other than this 
>> command if it so chooses.  If we have the pre-machine init mode for 
>> 0.16, then this can even be used to inspect state without running a 
>> guest.
>
> So we have yet another information tree.  If we store the cd-rom eject 
> state here, then we need to make an association between the device 
> path of the cd-rom, and the StateItem key.

And this linkage is key.

Let's say I launch QEMU with:

qemu -cdrom ~/foo.img

And then in the monitor, I do:

(qemu) eject ide1-cd0

The question is, what command can I now use to launch the same qemu 
instance?

When I think of stateful config, what I really think of is a way to spit 
out a command line that essentially becomes, "this is how you now launch 
QEMU".

In this case, it would be:

qemu -cdrom ~/foo.img -device ide-disk,id=ide1-cd0,drive=

Or, we could think of this in terms of:

qemu -cdrom ~/foo.img -readconfig foo.cfg

Where foo.cfg contained:

[device "ide1-cd0"]
driver="ide-disk"
drive=""

So what I'm really suggesting is that we generate foo.cfg whenever 
monitor commands do things that change the command line and introduce a 
new option to reflect this, IOW:

qemu -cdrom ~/foo.img -config foo.cfg
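
To make the "generate foo.cfg" idea a bit more concrete, here is a rough Python sketch (not existing QEMU code) that renders a section in the same layout as the foo.cfg fragment above. The function name render_section and the shape of its arguments are made up for illustration.

```python
def render_section(kind, name, props):
    """Render one section in the style of the foo.cfg fragment above.

    kind, name and props are hypothetical inputs for this sketch; every
    value is written as a double-quoted string, as in the example.
    """
    lines = ['[%s "%s"]' % (kind, name)]
    for key, value in props.items():
        lines.append('%s="%s"' % (key, value))
    return "\n".join(lines) + "\n"


# State after "eject ide1-cd0": the device is still there, with no media.
text = render_section("device", "ide1-cd0",
                      {"driver": "ide-disk", "drive": ""})
print(text, end="")
```

The point is only that the monitor command and the config fragment are two views of the same change, so one can be derived from the other mechanically.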


> Far better to store it in the device itself.  For example, we could 
> make a layered block format driver that stores the eject state and a 
> "backing file" containing the actual media.  Eject and media change 
> would be recorded in the block format driver's state.  You could then 
> hot-unplug a USB cd-writer and hot-plug it back into a different 
> guest, implementing a virtual sneakernet.

I think you're far too hung up on "store it in the device itself".  The 
recipe to create the device model is not intrinsic to the device model.  
It's an independent thing that's a combination of the command line 
arguments and any executed monitor commands.

Maybe a better way to think about the stateful config file is a 
mechanism to replay the monitor history.
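
A toy Python sketch of what "replaying the monitor history" could look like. It assumes the history is a plain list of command strings and understands only two commands; the state layout (device id mapped to a media path or None) is an assumption for the example, not how QEMU tracks this.

```python
import shlex


def replay(history):
    """Replay monitor-style command strings into a small state dict.

    Only "change <device> <file>" and "eject <device>" are handled; the
    returned dict maps a device id to its media path, or None if empty.
    """
    media = {}
    for line in history:
        argv = shlex.split(line)
        if argv[0] == "change" and len(argv) == 3:
            media[argv[1]] = argv[2]
        elif argv[0] == "eject" and len(argv) == 2:
            media[argv[1]] = None
        else:
            raise ValueError("cannot replay: %r" % line)
    return media


history = ["change ide1-cd0 foo.img", "eject ide1-cd0"]
state = replay(history)
print(state)
```

Replaying the same history always yields the same state, which is all the stateful config file needs to guarantee.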

>>
>> The fact that the state is visible in the filesystem is an 
>> implementation detail.
>
> A detail that has to be catered for by the management stack - it has 
> to provide a safe place for it, back it up, etc.

Only if it cares about QEMU preserving state.  Today, this all gets thrown away.

>> It doesn't work for eject unless you interpose an acknowledged 
>> event.  Ultimately, this is a simple problem.  If you want 
>> reliability, we either need symmetric RPCs so that the device model 
>> can call (and wait) to the management layer to acknowledge a change 
>> or QEMU can post an event to the management layer, and maintain the 
>> state in a reliable fashion.
>
> I don't see why it doesn't work.  Please explain.

1) guest eject
2) qemu posts eject event
3) qemu acknowledges eject to the guest
4) management tool sees eject event and updates guest config

There's a race between 3 & 4.  It can only be addressed by interposing 4 
between 2 and 3 OR making qemu persist this state between 2 and 3 such 
that the management tool can reliably query it.
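
The second option (persist between 2 and 3) can be sketched in Python as below. This is an illustration of the ordering only: the JSON state file, the event dict, and the post_event/ack_guest callbacks are all hypothetical and do not correspond to real QMP or QEMU interfaces.

```python
import json
import os
import tempfile


def persist_state(path, state):
    """Write state to path atomically: temp file, fsync, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # A more careful version would also fsync the containing directory.
    os.replace(tmp, path)


def handle_guest_eject(path, state, device, post_event, ack_guest):
    """Persist the new state before the guest sees the eject complete."""
    state[device] = None
    post_event({"event": "eject", "device": device})  # step 2
    persist_state(path, state)                        # between 2 and 3
    ack_guest(device)                                 # step 3


log = []
state_path = os.path.join(tempfile.mkdtemp(), "state.json")
state = {"ide1-cd0": "foo.img"}
handle_guest_eject(
    state_path, state, "ide1-cd0",
    post_event=lambda ev: log.append("event"),
    # Record whether the state file already exists when the ack happens.
    ack_guest=lambda dev: log.append(("ack", os.path.exists(state_path))),
)
with open(state_path) as f:
    saved = json.load(f)
```

With this ordering, a management tool that missed the event can still query the persisted state and get the right answer, regardless of when step 4 runs.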

>>>> You still have the race condition around guest initiated events 
>>>> like eject.  Unless you have an acknowledged event from a 
>>>> management tool (which we can't do in QMP today) whereby you don't 
>>>> complete the guest initiated eject operation until management acks 
>>>> it, we need to store that state ourselves.
>>>
>>> I don't see why.
>>>
>>> If management crashes, it queries the eject state when it reconnects 
>>> to qemu.
>>> If qemu crashes, the eject state is lost, but that is fine.  My 
>>> CD-ROM drive tray pulls itself in when the machine is started.
>>
>> Pick any of a number of possible events that change the machine's 
>> state.  We can wave our hands at some things saying they don't matter 
>> and do one off solutions for others, or we can just have a robust way 
>> of handling this consistently.
>
> Both block live copy and cd-rom eject state can be solved with layered 
> block format drivers.  I don't think a central place for random data 
> makes sense.  State belongs near the device that maintains it, esp. if 
> the device is hot-pluggable, so it's easy to associate the state with 
> the device.
>
>>>
>>> You're introducing the need for additional code in the management 
>>> layer, the care and feeding for the stateful non-config file.
>>
>> If a management layer ignores the stateful non-config file, as you 
>> like to call it, it'll get the same semantics it has today.  I think 
>> managing a single thing is a whole lot easier than managing an NVRAM 
>> file, a block migration layering file, and all of the future things 
>> we're going to add once we decide they are important too.
>
> I disagree.  Storing NVRAM as a disk image is a simple extension of 
> existing management tools.  Block live-copy and cd-rom eject state 
> also make sense as per-image state if you take hotunplug and hotplug 
> into account.

Everything can be stored in a block driver, but when the data is highly 
structured, isn't it nice to expose it in a structured, human-readable 
way?  I know I'd personally prefer a text representation of CMOS to a 
binary blob.

>>
>>>>> If qemu crashes, these events are meaningless.  If management 
>>>>> crashes, it has to query qemu for all state that it wants to keep 
>>>>> track of via events.
>>>>
>>>> Think power failure, not qemu crash.  In the event of a power 
>>>> failure, any hardware change initiated by the guest ought to be 
>>>> consistent with when the guest has restarted.  If you eject the 
>>>> CDROM tray and then lose power, it's still ejected after the power 
>>>> comes back on.
>>>
>>> Not on all machines.
>>>
>>> Let's list guest state which is independent of power.  That would be 
>>> either NVRAM of various types, or physical alterations.  CD-ROM 
>>> eject is one.  Are there others?
>>
>> Any indirect qemu state.  Block migration is an example, but other 
>> examples would be VNC server information (like current password), WCE 
>> setting (depending on whether we modelled eeprom for the drivers), 
>> and persisted device settings (lots of devices have eeprom these days).
>
> Device settings should be stored with the devices, not with qemu.
>
> Suppose we take the cold-plug on startup via the monitor approach.  So 
> we start with a bare machine, cold plug stuff into it.  Now qemu has 
> to reconcile the stateful non-config file with the hardware.  What if 
> something has changed?  A device moved into a different slot?

Sorry, I'm confused.  Is there anything in the stateful config file when 
we start up?  If so, the act of starting up will add a bunch of hardware.

> If a network card has eeprom, we can specify it with -device 
> rtl8139,eeprom=id, where id specifies a disk image for the eeprom.

We could, but then we'll end up with a bunch of little block devices.  
That seems less than ideal to me.

Technically, the MAC address is stored in EEPROM, and we store that as a 
device property today.  We can't persist device properties, even though 
you can change the MAC address of a network card and it does persist 
across reboots.  Are you advocating that we introduce an EEPROM for 
every network card (all in a slightly different format) and have special 
tools to manipulate the EEPROM to store and view the MAC address?
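
For comparison, a small Python sketch of the text form of a MAC address that a device property can carry. It assumes the six address bytes are already in hand; where they sit inside any particular card's EEPROM is device specific and not modelled here, and the helper names are made up.

```python
def mac_to_text(raw):
    """Format six raw bytes as a colon-separated MAC address string."""
    if len(raw) != 6:
        raise ValueError("a MAC address is exactly 6 bytes")
    return ":".join("%02x" % b for b in raw)


def text_to_mac(text):
    """Parse a colon-separated MAC address string back into six bytes."""
    parts = text.split(":")
    if len(parts) != 6:
        raise ValueError("expected 6 colon-separated fields")
    return bytes(int(p, 16) for p in parts)


raw = bytes([0x52, 0x54, 0x00, 0x12, 0x34, 0x56])
text = mac_to_text(raw)
print(text)
```

The text form round-trips to the same six bytes and is the same for every card, which is not true of per-device EEPROM layouts.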

Regards,

Anthony Liguori

>>> I think my solution (multiplexing block format driver) fits the 
>>> requirements for live-copy perfectly.  In fact it has a name - it's 
>>> a RAID-1 driver started in degraded mode.  It could be useful in other 
>>> use cases.
>>
>> It feels a bit awkward to me to be honest.
>>
>
> Not to me.
>