qemu-devel.nongnu.org archive mirror
From: Anthony Liguori <anthony@codemonkey.ws>
To: Avi Kivity <avi@redhat.com>
Cc: Jes.Sorensen@redhat.com, Marcelo Tosatti <mtosatti@redhat.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy
Date: Thu, 24 Feb 2011 11:58:30 -0600
Message-ID: <4D669C46.40909@codemonkey.ws>
In-Reply-To: <4D6677BE.2030009@redhat.com>

On 02/24/2011 09:22 AM, Avi Kivity wrote:
> On 02/24/2011 05:00 PM, Anthony Liguori wrote:
>> On 02/24/2011 02:54 AM, Avi Kivity wrote:
>>> On 02/23/2011 10:18 PM, Anthony Liguori wrote:
>>>>> Then the management stack has to worry about yet another way of 
>>>>> interacting via qemu.
>>>>
>>>>
>>>> { 'StateItem': { 'key': 'str', 'value': 'str' } }
>>>> { 'StateSection': { 'kind': 'str', 'name': 'str', 'items': [ 
>>>> 'StateItem' ] } }
>>>> { 'StateInfo': { 'sections': [ 'StateSection' ] } }
>>>>
>>>> { 'query-state', {}, {}, 'StateInfo' }
>>>>
>>>> A management tool never needs to worry about anything other than 
>>>> this command if it so chooses.  If we have the pre-machine init 
>>>> mode for 0.16, then this can even be used to inspect state without 
>>>> running a guest.
>>>
>>> So we have yet another information tree.  If we store the cd-rom 
>>> eject state here, then we need to make an association between the 
>>> device path of the cd-rom, and the StateItem key.
>>
>> And this linkage is key.
>>
>> Let's say I launch QEMU with:
>>
>> qemu -cdrom ~/foo.img
>>
>> And then in the monitor, I do:
>>
>> (qemu) eject ide1-cd0
>>
>> The question is, what command can I now use to launch the same qemu 
>> instance?
>>
>> When I think of stateful config, what I really think of is a way to 
>> spit out a command line that essentially becomes, "this is how you 
>> now launch QEMU".
>>
>> In this case, it would be:
>>
>> qemu -cdrom ~/foo.img -device ide-disk,id=ide1-cd0,drive=
>>
>> Or, we could think of this in terms of:
>>
>> qemu -cdrom ~/foo.img -readconfig foo.cfg
>>
>> Where foo.cfg contained:
>>
>> [device "ide1-cd0"]
>> driver="ide-disk"
>> drive=""
>>
>> So what I'm really suggesting is that we generate foo.cfg whenever 
>> monitor commands do things that change the command line and introduce 
>> a new option to reflect this, IOW:
>>
>> qemu -cdrom ~/foo.img -config foo.cfg
>
> If you move the cdrom to a different IDE channel, you have to update 
> the stateful non-config file.
>
> Whereas if you do
>
>    $ qemu-img create -f cd-tray -b ~/foo.img ~/foo-media-tray.img
>    $ qemu -cdrom ~/foo-media-tray.img
>
> the cd-rom tray state will be tracked in the image file.

Yeah, but how do you move it?  If you do a remove/add through QMP, then 
the config file will reflect things just fine.

If you want to do it outside of QEMU, then you can just ignore the 
config file and manage all of the state yourself.  But it's never going 
to work as well (it will be racy) and you're pushing a tremendous amount 
of knowledge that ultimately belongs in QEMU (what state needs to 
persist) to something that isn't QEMU which means it's probably not 
going to be done correctly.
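
A minimal sketch of the idea that monitor-driven changes are reflected in a
generated config file, written in Python rather than QEMU's C. The class
StatefulConfig and its method names are hypothetical; only the
[device "..."] layout is taken from the foo.cfg example quoted above.

```python
def render_config(devices):
    """Render {device_id: {prop: value}} as [device "id"] sections."""
    lines = []
    for dev_id in sorted(devices):
        lines.append('[device "%s"]' % dev_id)
        for key, value in devices[dev_id].items():
            lines.append('%s="%s"' % (key, value))
        lines.append("")  # blank line terminates the section
    return "\n".join(lines)


class StatefulConfig:
    """Hypothetical record of device changes made through the monitor."""

    def __init__(self):
        self.devices = {}

    def device_add(self, dev_id, **props):
        self.devices[dev_id] = dict(props)

    def device_del(self, dev_id):
        self.devices.pop(dev_id, None)

    def eject(self, dev_id):
        # An eject keeps the device but leaves it without a drive.
        self.devices[dev_id]["drive"] = ""

    def render(self):
        return render_config(self.devices)


if __name__ == "__main__":
    cfg = StatefulConfig()
    cfg.device_add("ide1-cd0", driver="ide-disk", drive="cd0")
    cfg.eject("ide1-cd0")
    print(cfg.render())
```

Moving the device would be a device_del followed by a device_add under the
new id, so the rendered file follows the move without outside bookkeeping.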

I know you're a big fan of the omnipotent management tool but my 
experience has been that we need to help the management tooling folks 
more by expecting less from them.

>>> Far better to store it in the device itself.  For example, we could 
>>> make a layered block format driver that stores the eject state and a 
>>> "backing file" containing the actual media.  Eject and media change 
>>> would be recorded in the block format driver's state.  You could 
>>> then hot-unplug a USB cd-writer and hot-plug it back into a 
>>> different guest, implementing a virtual sneakernet.
>>
>> I think you're far too hung up on "store it in the device itself".  
>> The recipe to create the device model is not intrinsic to the device 
>> model.  It's an independent thing that's a combination of the command 
>> line arguments and any executed monitor commands.
>>
>> Maybe a better way to think about the stateful config file is a 
>> mechanism to replay the monitor history.
>
> Again the question is who is the authoritative source of the 
> configuration.  Is it the management tool or is it qemu?

QEMU.   No question about it.  At any point in time, we are the 
authoritative source of what the guest's configuration is.  There's no 
doubt about it.  A management tool can try to keep up with us, but 
ultimately we are the only ones that know for sure.

We have all of this information internally.  Just persisting it is not a 
major architectural change.  It's something we should have been doing 
(arguably) from the very beginning.

> The management tool already has to keep track of (the optional parts 
> of) the guest device tree.  It cannot start reading the stateful 
> non-config file at random points in time.  So all that is left is the 
> guest controlled portions of the device tree, which are pretty rare, 
> and random events like live-copy migration.  I think that introducing 
> a new authoritative source of information will create a lot of problems.

QEMU has always been the authoritative source.  Nothing new has been 
introduced.  We never persisted the machine's configuration, which meant 
management tools had to aggressively try to keep up with us, which is 
intrinsically error-prone.  Fixing this will only improve existing 
management tools.

>
>>>>
>>>> The fact that the state is visible in the filesystem is an 
>>>> implementation detail.
>>>
>>> A detail that has to be catered for by the management stack - it has 
>>> to provide a safe place for it, back it up, etc.
>>
>> If it cares for QEMU to preserve state.  Today, this all gets thrown 
>> away.
>
> Right, but we should make it easy, not hard.

Yeah, I fail to see how this makes it hard.  We conveniently are saying, 
hey, this is all the state that needs to be persisted.  We'll persist it 
for you if you want, otherwise, we'll expose it in a central location.

If the tool wants to ignore it and guess based on various combinations 
of other commands, more power to it.
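
As an illustration of "expose it in a central location", here is a sketch of
the StateItem / StateSection / StateInfo shapes from the query-state schema
quoted near the top of this message, modelled as Python dataclasses. The
query_state helper, its input dictionary, and the "device" kind value are
assumptions for the example, not an existing QEMU interface.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class StateItem:
    key: str
    value: str


@dataclass
class StateSection:
    kind: str
    name: str
    items: List[StateItem] = field(default_factory=list)


@dataclass
class StateInfo:
    sections: List[StateSection] = field(default_factory=list)


def query_state(devices):
    """Build a StateInfo from {device_id: {prop: value}} (hypothetical input)."""
    sections = [
        StateSection(
            kind="device",
            name=name,
            items=[StateItem(key, str(value)) for key, value in props.items()],
        )
        for name, props in sorted(devices.items())
    ]
    return StateInfo(sections=sections)


if __name__ == "__main__":
    info = query_state({"ide1-cd0": {"driver": "ide-disk", "drive": ""}})
    print(asdict(info))  # plain dicts/lists, ready for JSON encoding
```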

>>
>>>> It doesn't work for eject unless you interpose an acknowledged 
>>>> event.  Ultimately, this is a simple problem.  If you want 
>>>> reliability, we either need symmetric RPCs so that the device model 
>>>> can call (and wait) to the management layer to acknowledge a change 
>>>> or QEMU can post an event to the management layer, and maintain the 
>>>> state in a reliable fashion.
>>>
>>> I don't see why it doesn't work.  Please explain.
>>
>> 1) guest eject
>> 2) qemu posts eject event
>> 3) qemu acknowledges eject to the guest
>> 4) management tool sees eject event and updates guest config
>>
>> There's a race between 3 & 4.  It can only be addressed by 
>> interposing 4 between 2 and 3 OR making qemu persist this state 
>> between 2 and 3 such that the management tool can reliably query it.
>
> If "it" is my cd-rom tray block format driver, it works.  It's really 
> the same in action as the stateful non-config, except it's part of the 
> device/image, not a central location.

Because you've introduced a one-off.  Having a bunch of one-offs 
(especially being a bunch of new block formats!) is not going to make 
things simpler for management tools.
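
For reference, the ordering argued for in the eject sequence quoted above
(persist the state between steps 2 and 3) can be sketched as follows. The
function and callback names, the "EJECT" event name, and the state
dictionary are hypothetical; this only illustrates the ordering, not QEMU's
actual event code.

```python
def handle_guest_eject(dev_id, post_event, persist, ack_guest):
    """Step 1 has already happened: the guest asked to eject dev_id."""
    # Step 2: notify the management layer (it may see this late, or never).
    post_event({"event": "EJECT", "device": dev_id})
    # Between 2 and 3: make the new state durable so it can be queried later.
    persist(dev_id, {"tray": "open"})
    # Step 3: only now tell the guest that the eject completed.
    ack_guest(dev_id)


if __name__ == "__main__":
    log = []
    handle_guest_eject(
        "ide1-cd0",
        post_event=lambda event: log.append(("event", event["device"])),
        persist=lambda dev, state: log.append(("persist", dev)),
        ack_guest=lambda dev: log.append(("ack", dev)),
    )
    print(log)
```

If persist raises, the guest is never acknowledged, so the guest-visible
state cannot get ahead of what has been recorded.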

>>> I disagree.  Storing NVRAM as a disk image is a simple extension of 
>>> existing management tools.  Block live-copy and cd-rom eject state 
>>> also make sense as per-image state if you take hotunplug and hotplug 
>>> into account.
>>
>> Everything can be stored in a block driver but when the data is 
>> highly structured, isn't it nice to expose it in a structured, human 
>> readable way?  I know I'd personally prefer a text representation of 
>> CMOS than a binary blob.
>
> Have a tool expose it.  Part of the range is unspecified anyway.

I guess we need to agree to disagree then.

> Using a block format driver means that we don't have to care about a 
> crash during a write, that we can snapshot it, etc.

Why?  We always need to care about a crash during write.  What I've been 
thinking for a config file is the classic approach of using a ~ and .# 
file to make sure that we write out the new file and then atomically 
rename it to get the new contents.  Yeah, it's a bit heavyweight but 
this shouldn't be a very common thing to update.
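
A minimal sketch of the write-then-rename pattern, in Python rather than
QEMU's C. The function name atomic_write and the exact temporary-file
naming are assumptions for illustration; the point is only that readers
see either the old file or the complete new one.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace path with data so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem),
    # otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".#", suffix="~")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new contents to disk first
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    target = os.path.join(workdir, "foo.cfg")
    atomic_write(target, '[device "ide1-cd0"]\ndrive=""\n')
    with open(target) as f:
        print(f.read())
```

A stricter version would also fsync the containing directory after the
rename; that is left out here to keep the sketch short.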

>>> Device settings should be stored with the devices, not with qemu.
>>>
>>> Suppose we take the cold-plug on startup via the monitor approach.  
>>> So we start with a bare machine, cold plug stuff into it.  Now qemu 
>>> has to reconcile the stateful non-config file with the hardware.  
>>> What if something has changed?  A device moved into a different slot?
>>
>> Sorry, I'm confused.  Is there anything in the stateful config file 
>> when we start up?  If so, the act of starting up will add a bunch of 
>> hardware.
>
> Suppose it has information about ide1-cd0's media tray.  Now we 
> restart qemu and cold-plug the cdrom into ide0-cd0.  What happens to 
> the information?

Whether media is present is not a property of a blockdev, it's a 
property of a device.  What does it even mean to use your media-tray 
format with something like a CMOS device?

>> Technically, mac address is stored on eeprom and we store that as a 
>> device property today.  We can't persist device properties even 
>> though you can change the mac address of a network card and it does 
>> persist across reboots.  Are you advocating that we introduce an 
>> eeprom for every network card (all in a slightly different format) 
>> and have special tools to manipulate the eeprom to store and view the 
>> mac address?
>
>
> Yes -- if we really want to support it.  Obviously we have to store 
> the entire eeprom, not just the portion containing the MAC address, so 
> it's not just a key/value store.  A card may even have its firmware in 
> flash.

I think that's overengineering.  I think we can go very far by just 
persisting small amounts of information in a central location.  We're 
not building a cycle-accurate simulator here after all.

Regards,

Anthony Liguori
