From: Anthony Liguori
Date: Fri, 20 Feb 2009 12:27:30 -0600
To: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC] More robust migration
Message-ID: <499EF612.8060905@codemonkey.ws>
In-Reply-To: <20090220163705.GB9726@shareable.org>

Jamie Lokier wrote:
> Anthony Liguori wrote:
>
>>> 2. Introduce a length field to the header of each device.
>>>
>> IMHO, this would reduce robustness. It's also difficult because of the
>> way savevm registration works. You don't know how large a section is
>> until it's written, and migration streams are not seekable.
>>
>
> The way HTTP deals with not knowing the size in advance is to split
> data into chunks, each chunk the size of a small write buffer, and a
> chunk size is written in front of each one.
> This allows storing sections of binary data whose size isn't known in
> advance while still being able to skip them safely.
>
>
>>> This would allow skipping unknown (or unwanted) devices.
>>>
>> No good can come from this. If you have an unknown section, you must
>> throw an error and stop the migration. What if this is for a device
>> that the guest is interacting with? The device just disappears after
>> migration? All savevm state is state that affects the functionality of
>> a guest. Throwing away this state will change the functionality of the
>> VM, and migration should not affect guest functionality.
>>
>
> What if you're migrating from a snapshot made on a host with some
> pass-through USB device to another host which cannot provide the same
> device? In that case I'd like the option for the guest to see that the
> device has disappeared. Maybe it's stopped working (HPET), or maybe
> it's unplugged (anything hot unpluggable).
>
A device that just stops working is IMHO unacceptable. For devices that
support hot plugging, you can hot unplug them and *then* perform the
migration. FWIW, hot unplugging generally requires guest cooperation.
Bad things will often happen if you just yank a USB cable out of your
computer.

> That's preferable to not being able to use the snapshot at all,
> effectively having to trash it.
>
I disagree. Something that is broken in an unknown way is not better
than something that fails gracefully. If you use hardware passthrough,
forget about snapshotting, migration, etc.

>> What are the use cases where you think this would be beneficial? I
>> really see the change in semantics from the old way (throwing away
>> unknown sections) to the new way (requiring strict versioning and
>> validating all sections) as being a huge step toward robustness.
>>
>
> I've been upset by a "savevm" which I wrote with some past version of
> QEMU that I couldn't load in a later version. It wasn't obvious why,
> just that it refused.
> And I didn't have the old version, or even know which the old version
> was. And even if I could have reconstructed the old QEMU, I wanted to
> migrate to a newer version. It's no fun having to reconstruct a
> carefully primed guest snapshot test state from a fresh boot, if that
> can be avoided.
>
Device configuration files will go a long way toward making upgrades
work. Sometimes you have to blacklist older versions of devices because
there were bugs in their save/restore functions. In that case, there's
really nothing we can do: your snapshot was invalid.

>> My primary goal for migration is robustness. I do not think it's a good
>> idea to support any circumstances that could introduce changes in
>> guest-visible state during a live migration.
>>
>
> What about safe hotpluggable devices?
>
Make your changes in the guest to allow safe unplug, then unplug, then
migrate.

Regards,

Anthony Liguori