From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from [140.186.70.92] (port=43605 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1PsWxu-0007Ij-3Y
	for qemu-devel@nongnu.org; Thu, 24 Feb 2011 03:54:27 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1PsWxs-0000Os-Jw
	for qemu-devel@nongnu.org; Thu, 24 Feb 2011 03:54:25 -0500
Received: from mx1.redhat.com ([209.132.183.28]:29679)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from ) id 1PsWxs-0000Of-8a
	for qemu-devel@nongnu.org; Thu, 24 Feb 2011 03:54:24 -0500
Message-ID: <4D661CB8.6010305@redhat.com>
Date: Thu, 24 Feb 2011 10:54:16 +0200
From: Avi Kivity
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy
References: <20110222170004.808373778@redhat.com>
	<20110222170115.710717278@redhat.com>
	<4D642181.4080509@codemonkey.ws>
	<20110222210735.GA9372@amt.cnet>
	<4D64266A.3060106@codemonkey.ws>
	<20110222230935.GA11082@amt.cnet>
	<4D644343.4050800@codemonkey.ws>
	<4D65051A.6070707@redhat.com>
	<4D651B20.70405@codemonkey.ws>
	<4D652852.60505@redhat.com>
	<4D652F73.3000305@codemonkey.ws>
	<4D65324A.5080408@redhat.com>
	<4D65359E.3040008@codemonkey.ws>
	<4D65416D.8040803@redhat.com>
	<4D656B97.5030301@codemonkey.ws>
In-Reply-To: <4D656B97.5030301@codemonkey.ws>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
To: Anthony Liguori
Cc: Jes.Sorensen@redhat.com, Marcelo Tosatti, qemu-devel@nongnu.org

On 02/23/2011 10:18 PM, Anthony Liguori wrote:
>> Then the management stack has to worry about yet another way of
>> interacting via qemu.
>
>
> { 'StateItem': { 'key': 'str', 'value': 'str' } }
> { 'StateSection': { 'kind': 'str', 'name': 'str', 'items': [
> 'StateItem' ] } }
> { 'StateInfo': { 'sections': [ 'StateSection' ] } }
>
> { 'query-state', {}, {}, 'StateInfo' }
>
> A management tool never needs to worry about anything other than this
> command if it so chooses. If we have the pre-machine init mode for
> 0.16, then this can even be used to inspect state without running a
> guest.

So we have yet another information tree. If we store the cd-rom eject
state here, then we need to make an association between the device path
of the cd-rom, and the StateItem key.

Far better to store it in the device itself. For example, we could make
a layered block format driver that stores the eject state and a "backing
file" containing the actual media. Eject and media change would be
recorded in the block format driver's state. You could then hot-unplug
a USB cd-writer and hot-plug it back into a different guest,
implementing a virtual sneakernet.

>
> The fact that the state is visible in the filesystem is an
> implementation detail.

A detail that has to be catered for by the management stack - it has to
provide a safe place for it, back it up, etc.

>
>> I'd like to limit it to the monitor.
>>
>>>>
>>>> Doesn't the stateful non-config file become a failure point? It
>>>> has to be on shared and redundant storage?
>>>
>>> It depends on what your availability model is and how frequently
>>> your management tool backs up the config. As of right now, we have
>>> a pretty glaring reliability hole here so adding a stateful
>>> "non-config" can only improve things.
>>
>> I think the solutions I pointed out close the hole with the existing
>> interfaces.
>
> It doesn't work for eject unless you interpose an acknowledged event.
> Ultimately, this is a simple problem.
> If you want reliability, we either need symmetric RPCs so that the
> device model can call (and wait) to the management layer to
> acknowledge a change or QEMU can post an event to the management
> layer, and maintain the state in a reliable fashion.

I don't see why it doesn't work. Please explain.

>>> You still have the race condition around guest initiated events like
>>> eject. Unless you have an acknowledged event from a management tool
>>> (which we can't do in QMP today) whereby you don't complete the
>>> guest initiated eject operation until management acks it, we need
>>> to store that state ourselves.
>>
>> I don't see why.
>>
>> If management crashes, it queries the eject state when it reconnects
>> to qemu.
>> If qemu crashes, the eject state is lost, but that is fine. My
>> CD-ROM drive tray pulls itself in when the machine is started.
>
> Pick any of a number of possible events that change the machine's
> state. We can wave our hands at some things saying they don't matter
> and do one-off solutions for others, or we can just have a robust way
> of handling this consistently.

Both block live copy and cd-rom eject state can be solved with layered
block format drivers. I don't think a central place for random data
makes sense. State belongs near the device that maintains it, esp. if
the device is hot-pluggable, so it's easy to associate the state with
the device.

>>
>> You're introducing the need for additional code in the management
>> layer, the care and feeding for the stateful non-config file.
>
> If a management layer ignores the stateful non-config file, as you
> like to call it, it'll get the same semantics it has today. I think
> managing a single thing is a whole lot easier than managing an NVRAM
> file, a block migration layering file, and all of the future things
> we're going to add once we decide they are important too.

I disagree. Storing NVRAM as a disk image is a simple extension of
existing management tools.
Block live-copy and cd-rom eject state also make sense as per-image
state if you take hotunplug and hotplug into account.

>
>>>> If qemu crashes, these events are meaningless. If management
>>>> crashes, it has to query qemu for all state that it wants to keep
>>>> track of via events.
>>>
>>> Think power failure, not qemu crash. In the event of a power
>>> failure, any hardware change initiated by the guest ought to be
>>> consistent with when the guest has restarted. If you eject the
>>> CDROM tray and then lose power, it's still ejected after the power
>>> comes back on.
>>
>> Not on all machines.
>>
>> Let's list guest state which is independent of power. That would be
>> either NVRAM of various types, or physical alterations. CD-ROM eject
>> is one. Are there others?
>
> Any indirect qemu state. Block migration is an example, but other
> examples would be VNC server information (like current password), WCE
> setting (depending on whether we modelled eeprom for the drivers), and
> persisted device settings (lots of devices have eeprom these days).

Device settings should be stored with the devices, not with qemu.

Suppose we take the cold-plug on startup via the monitor approach. So
we start with a bare machine, cold plug stuff into it. Now qemu has to
reconcile the stateful non-config file with the hardware. What if
something has changed? A device moved into a different slot?

If a network card has eeprom, we can specify it with
-device rtl8139,eeprom=id, where id specifies a disk image for the
eeprom.

>> I think my solution (multiplexing block format driver) fits the
>> requirements for live-copy perfectly. In fact it has a name - it's a
>> RAID-1 driver started in degraded mode. It could be useful for other
>> use cases.
>
> It feels a bit awkward to me to be honest.
>

Not to me.

-- 
error compiling committee.c: too many arguments to function
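
[Illustration, not part of the original message.] The "RAID-1 driver
started in degraded mode" idea for live copy can be sketched roughly as
below. This is a toy model in Python over in-memory byte arrays, not
qemu code: the class name DegradedMirror, its methods, and the per-block
in_sync list are all hypothetical names chosen for the example. A real
driver would presumably keep the sync map as persistent per-image state,
which is the point argued above about state living near the device.

```python
class DegradedMirror:
    """Toy RAID-1 style mirror over two equal-sized in-memory images.

    Hypothetical illustration only: the mirror starts "degraded" (no
    block of the target is in sync) and converges while writes continue.
    """

    def __init__(self, source, target, block_size=4):
        if len(source) != len(target) or len(source) % block_size:
            raise ValueError("images must be equal-sized multiples of block_size")
        self.source = source
        self.target = target
        self.block_size = block_size
        # One flag per block; all False means the mirror starts degraded.
        self.in_sync = [False] * (len(source) // block_size)

    def _copy_block(self, block):
        start = block * self.block_size
        end = start + self.block_size
        self.target[start:end] = self.source[start:end]

    def read(self, offset, length):
        # Reads are always served from the source, which is always valid.
        return bytes(self.source[offset:offset + length])

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > len(self.source):
            raise ValueError("write outside image")
        if not data:
            return
        self.source[offset:offset + len(data)] = data
        first = offset // self.block_size
        last = (offset + len(data) - 1) // self.block_size
        for block in range(first, last + 1):
            # Blocks already in sync are mirrored immediately; blocks not
            # yet in sync are left for the background resync to pick up.
            if self.in_sync[block]:
                self._copy_block(block)

    def resync_step(self):
        """Copy one out-of-sync block; return False when nothing is left."""
        for block, synced in enumerate(self.in_sync):
            if not synced:
                self._copy_block(block)
                self.in_sync[block] = True
                return True
        return False

    def is_synced(self):
        return all(self.in_sync)


source = bytearray(b"abcdefgh")
target = bytearray(len(source))
mirror = DegradedMirror(source, target)
mirror.write(0, b"XY")          # guest write while the copy is in progress
while mirror.resync_step():     # background copy, one block per step
    pass
assert mirror.is_synced() and target == source
```

Once every block is in sync the two images are identical, and the
management layer could switch the guest over to the target. Where the
sync map is stored and how the switch is triggered are left open here.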