Message-ID: <4D6A6E38.4030700@redhat.com>
Date: Sun, 27 Feb 2011 17:31:04 +0200
From: Avi Kivity
To: Anthony Liguori
Cc: Jes.Sorensen@redhat.com, Marcelo Tosatti, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy
In-Reply-To: <4D6A58E0.9020607@codemonkey.ws>

On 02/27/2011 04:00 PM, Anthony Liguori wrote:
> On 02/27/2011 03:10 AM, Avi Kivity wrote:
>> On 02/24/2011 07:58 PM, Anthony Liguori wrote:
>>>> If you move the cdrom to a different IDE channel, you have to
>>>> update the stateful non-config file.
>>>>
>>>> Whereas if you do
>>>>
>>>> $ qemu-img create -f cd-tray -b ~/foo.img ~/foo-media-tray.img
>>>> $ qemu -cdrom ~/foo-media-tray.img
>>>>
>>>> the cd-rom tray state will be tracked in the image file.
>>>
>>>
>>> Yeah, but how do you move it?
>>
>> There is no need to move the file at all. Simply point the new drive
>> at the media tray.
>
> No, I was asking, how do you move the cdrom to a different IDE
> channel. Are you using QMP? Are you changing the command line
> arguments?

Yes. If we're doing a hot-move (not really relevant to ide-cd), then you'd use QMP. If you're editing a virtual machine that is down, or scheduling a change for the next reboot, then you're using command line arguments (or cold-plugging into a stopped guest). Requiring management to remember the old configuration and issue delta commands to move the device in the cold-plug case adds complexity, IMO.

>
>>
>>> If you do a remove/add through QMP, then the config file will
>>> reflect things just fine.
>>
>> If all access to the state file is through QMP then it becomes more
>> palatable. A bit on that later.
>
> As I think I've mentioned before, I hadn't really thought about an
> opaque state file but I'm not necessarily opposed to it. I don't see an
> obvious advantage to making it opaque but I agree it should be
> accessible via QMP.

The advantage is that we keep the management tool talking to one interface (I don't think we should prevent users from interpreting it, just make it unnecessary).

>>
>> I thought that's what I'm doing by separating the state out. It's
>> easy for management to assemble configuration from their database and
>> convert it into a centralized representation (like a qemu command
>> line). It's a lot harder to disassemble a central state
>> representation and move it back to the database.
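To make the cold-plug point concrete, here is a rough Python sketch. It assumes a hypothetical management-side record (`DriveConfig`) and two hypothetical helpers, `command_line()` and `delta_commands()`; only the `-drive`/`-device` options and the QMP command names `device_del`/`device_add` are real, and the `ide-cd` property spellings are shown for illustration and may differ between qemu versions. Regenerating the command line needs only the desired configuration, while the delta approach cannot work without remembering the old one.

```python
from dataclasses import dataclass


# Hypothetical management-side record for one cd-rom drive. The field
# names are assumptions for illustration, not a libvirt or QEMU schema.
@dataclass(frozen=True)
class DriveConfig:
    dev_id: str  # ID management assigns to the device
    image: str   # media tray image that holds the tray state
    bus: str     # IDE channel, e.g. "ide.0"
    unit: int    # unit number on that channel


def command_line(drives):
    """Build qemu arguments from the desired configuration alone."""
    args = ["qemu"]
    for d in drives:
        args += [
            "-drive", f"if=none,id={d.dev_id}-drv,file={d.image}",
            "-device",
            f"ide-cd,drive={d.dev_id}-drv,id={d.dev_id},bus={d.bus},unit={d.unit}",
        ]
    return args


def delta_commands(old, new):
    """Cold-plug style: QMP commands that turn `old` into `new`.

    This only works if the *old* configuration was kept somewhere.
    Drive backends are assumed to exist already.
    """
    old_by_id = {d.dev_id: d for d in old}
    new_by_id = {d.dev_id: d for d in new}
    cmds = []
    # Remove devices that disappeared or changed.
    for dev_id, d in old_by_id.items():
        if new_by_id.get(dev_id) != d:
            cmds.append({"execute": "device_del", "arguments": {"id": dev_id}})
    # Add devices that are new or changed.
    for dev_id, d in new_by_id.items():
        if old_by_id.get(dev_id) != d:
            cmds.append({"execute": "device_add",
                         "arguments": {"driver": "ide-cd", "id": dev_id,
                                       "drive": f"{dev_id}-drv",
                                       "bus": d.bus, "unit": d.unit}})
    return cmds


if __name__ == "__main__":
    old = [DriveConfig("cd0", "/vm/foo-media-tray.img", "ide.0", 1)]
    new = [DriveConfig("cd0", "/vm/foo-media-tray.img", "ide.1", 0)]
    print(" ".join(command_line(new)))  # same tray image, new channel
    for cmd in delta_commands(old, new):
        print(cmd)
```

Moving `cd0` from `ide.0` to `ide.1` leaves the `file=` path untouched, so in this model the tray state simply follows the image; the delta path, by contrast, has to emit a `device_del` followed by a `device_add` computed against the remembered old configuration.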
>>
>> Using QMP is better than directly accessing the state file since qemu
>> does the disassembly for you (provided the command references the
>> device using its normal path, not some random key). The file just
>> becomes a way to survive a crash, and all management needs to know
>> about is to make it available and back it up. But it means that
>> everything must be done via QMP, including assembly of the machine,
>> otherwise the state file can become stale.
>>
>> Separating the state out to the device is even easier, since
>> management is already expected to take care of disk images. All
>> that's needed is to create the media tray image once, then you can
>> forget about it completely.
>
> Except that instead of having one state file, we might have a dozen
> additional "device state" files.

That is fine. We already have one state file per block device.

>>> QEMU. No question about it. At any point in time, we are the
>>> authoritative source of what the guest's configuration is. There's
>>> no doubt about it. A management tool can try to keep up with us,
>>> but ultimately we are the only ones that know for sure.
>>>
>>> We have all of this information internally. Just persisting it is
>>> not a major architectural change. It's something we should have
>>> been doing (arguably) from the very beginning.
>>
>> That's a huge divergence from how management tools are written.
>
> This is one of the reasons why management tooling around QEMU needs
> quite a bit of improving.
>
> There is simply no way a management tool can do a good job of being an
> authoritative source of configuration. The races we're discussing are
> a good example of why.

What we're discussing is not configuration. It is non-volatile state. Configuration comes from the user; state comes from the guest (the management tool may edit state, but the guest cannot edit the configuration).

I agree 100% that the management tool cannot be the authoritative source of state.
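A minimal sketch of that split, assuming a hypothetical `StateStore` class standing in for whatever non-volatile store qemu would keep (nothing here is an existing QEMU interface): the guest writes state, management owns a plain configuration dictionary, and state is looked up by the same device ID management used to create the device, so editing the configuration does not disturb it.

```python
import json
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Hypothetical qemu-owned non-volatile state, keyed by device ID."""
    states: dict = field(default_factory=dict)

    def guest_write(self, dev_id, key, value):
        # State changes originate from the guest (e.g. an eject request).
        self.states.setdefault(dev_id, {})[key] = value

    def query(self, dev_id):
        # Management reads state using the ID it passed as -device id=...;
        # a copy is returned so callers cannot modify the store directly.
        return dict(self.states.get(dev_id, {}))

    def snapshot(self):
        # Serialized form that lets the state survive a crash or restart.
        return json.dumps(self.states, sort_keys=True)

    @classmethod
    def restore(cls, blob):
        return cls(json.loads(blob))


if __name__ == "__main__":
    # Configuration: owned by the management tool, never by the guest.
    config = {"cd0": {"driver": "ide-cd", "bus": "ide.0", "unit": 1}}
    state = StateStore()

    state.guest_write("cd0", "tray_open", True)  # guest ejects the disc
    config["cd0"]["bus"] = "ide.1"               # management moves the drive

    restored = StateStore.restore(state.snapshot())
    print(restored.query("cd0"))  # {'tray_open': True}
```

Because the state is keyed by `cd0` rather than by its `ide.0`/`ide.1` location, moving the drive leaves the tray state intact; in this model, hot-unplugging and re-plugging a NIC under the same ID would keep its NVRAM contents the same way.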
My position is:

- the management tool should be 100% in control of configuration (how the guest is put together from its components)
- qemu should be 100% in control of state (memory, disk state, NVRAM in various components, cd-rom eject state, explosive bolts for payload separation, self-destruct mechanism, etc.)
- the management tool should have access to state using the same identifiers it used to create the devices that contain the state
- it is preferable to store state "in" the device so that when the configuration changes, state is maintained (like hot-unplug of a network card with NVRAM followed by hot-plug of the same card)
- the angular momentum of the planet we (presumably) are on won't change, whatever we do [1]

>
> But beyond those races, QEMU is the only entity that knows with
> certainty what bits of information are important to persist in order
> to preserve a guest across shutdown/restart. The fact that we've
> punted this problem for so long has only ensured that management tools
> are either intrinsically broken or only support the most minimal
> subset of functionality we actually support.

I'm not arguing about that. I just want to stress again the difference between state and configuration. Qemu has no authority, in my mind, over configuration; only over state.

>> Currently they contain the required guest configuration, a
>> representation of what's the current live configuration, and they
>> issue monitor commands to move the live configuration towards the
>> required configuration (or just generate a qemu command line). What
>> you're describing is completely different, I'm not even sure what it is.
>
> Management tools shouldn't have to think about how the monitor
> commands they issue impact the invocation options of QEMU.

They have to, when creating a guest from scratch.

But I admit, this throws a new light (for me) on things. What are the implications?
- must have a qemu instance running when editing configuration, even when the guest is down
- cannot add additional information to configuration; must store it in an external database and cross-reference it with the qemu data using the device ID
- when editing non-hotpluggable configuration for the next boot, must maintain the old config somewhere, so we can issue delta commands later (might be needed for the current way of doing things)
- no transactions/queries/etc. except on the non-authoritative source
- issues with shared-nothing design (well, can store the configuration file using DRBD)

>>
>> If you look at management tools, they believe they are the
>> authoritative source of configuration information (not guest state,
>> which is more or less ignored).
>
> It's because we've given them no other option.

It's the natural way of doing it. You have a web interface that talks to a database. When you want to list all VMs that have network cards on the production subnet, you issue a database query and get a recordset. How do you do that when the authoritative source of information is spread across a cluster?

>
>>>>
>>>> Right, but we should make it easy, not hard.
>>>
>>> Yeah, I fail to see how this makes it hard. We conveniently are
>>> saying, hey, this is all the state that needs to be persisted.
>>> We'll persist it for you if you want, otherwise, we'll expose it in
>>> a central location.
>>
>> The state-in-a-file is just a blob. Don't expect the tool to parse
>> it and reassociate the various bits to its own representation.
>> Exposing it via QMP commands is a lot better though.
>
> I don't really see this as being a major issue. There's no such thing
> as a "blob". If someone wants to manipulate the state, they will.
> We need to keep compatibility to support migrating from
> version-to-version.
>
> I agree that we want to provide QMP interfaces to work with the state
> file. But I don't think we should be hostile to manual manipulation.

No, not hostile.
We should make QMP commands sufficient to deal with it, that's all.

[1] In fact, it does change, due to tidal effects.

-- 
error compiling committee.c: too many arguments to function