From: Anthony Liguori
Date: Fri, 20 Feb 2009 09:15:58 -0600
To: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC] More robust migration
Message-ID: <499EC92E.9000401@codemonkey.ws>
In-Reply-To: <499EBFD8.50307@amd.com>

Hi Andre,

Andre Przywara wrote:
> Hi,
>
> after fiddling around with migration (and the data dumped into the
> stream) I found the current concept possesses some shortcomings.

Yikes :-)

FWIW, I focused a lot on robustness in the implementation, so hopefully much of what you mention below reflects conscious decisions made for very specific reasons.

> I am interested in your opinions whether it is worth to implement a
> new improved format.

FWIW, the format is sufficiently versioned that it isn't necessary to completely change it (not that I think it needs changing).

> Issues I would like to address:
> 1. Transfer configuration data.
> Currently there is no VM configuration
> data transferred with the stream.

Yes, the difficulty here is that we need to transfer the machine configuration but not the host configuration. Management tools should decide how to configure the host on the target side, but we should be passing the machine configuration.

If you've been following the config file threads, I've mentioned this as a use case for the current design a number of times. We would pass a flattened device tree as another savevm section with a well-known name (like "machine"). Given the semantics of the current migration protocol, this would ensure that the machine generated on the remote node was exactly the same as the source node.

> One has to start QEMU/KVM with the _exact_ same parameters on the
> other side to allow migration. If there would be a pseudo-device
> (transferred first) holding these parameters (and other runtime
> dependent stuff like kvm_enabled()) this would ease migration a lot.

FWIW, there's nothing preventing migrating from TCG -> KVM.

I think one can debate whether host config should be migrated too. I'd argue that in the core migration protocol, host config should not be present. I think you can have an easier-to-use migration protocol (like the old ssh protocol) that also transfers host config. But in the general case, you want management tools to be able to manipulate host config upon migration.

> 2. Introduce a length field to the header of each device.

IMHO, this would reduce robustness. It's also difficult because of the way savevm registration works. You don't know how large a section is until it's written, and migration streams are not seekable.

> This would allow to skip unknown (or unwanted) devices.

No good can come from this. If you have an unknown section, you must throw an error and stop the migration. What if this is for a device that the guest is interacting with? The device just disappears after migration?
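To make the "unknown section means failure" rule concrete, here is a minimal C sketch of a destination-side check. `SectionHandler`, `handlers`, and `check_section` are hypothetical names for illustration only, not QEMU's actual savevm interface; the point is simply that an unrecognized section name, or a version newer than this build understands, aborts the load instead of being skipped.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical handler the destination registers for each device.
 * Not QEMU's real savevm API; names are illustrative only. */
typedef struct {
    const char *idstr;    /* section name, e.g. "i8254" */
    int max_version;      /* newest state version this build can load */
} SectionHandler;

/* Hypothetical registry of devices present in this (destination) build. */
static const SectionHandler handlers[] = {
    { "i8254", 1 },
    { "cpu",   8 },
};

enum {
    LOAD_OK = 0,
    LOAD_UNKNOWN_SECTION = -1,
    LOAD_VERSION_TOO_NEW = -2,
};

/* Linear lookup of a section name in the registry; NULL if absent. */
static const SectionHandler *find_handler(const char *idstr)
{
    for (size_t i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++) {
        if (strcmp(handlers[i].idstr, idstr) == 0)
            return &handlers[i];
    }
    return NULL;
}

/* Validate one incoming section header. Any non-zero result must abort
 * the whole migration; there is deliberately no "skip" path. */
static int check_section(const char *idstr, int version_id)
{
    const SectionHandler *h = find_handler(idstr);
    if (h == NULL) {
        fprintf(stderr, "unknown section '%s': aborting migration\n", idstr);
        return LOAD_UNKNOWN_SECTION;
    }
    if (version_id > h->max_version) {
        fprintf(stderr, "section '%s' v%d newer than supported v%d: aborting\n",
                idstr, version_id, h->max_version);
        return LOAD_VERSION_TOO_NEW;
    }
    return LOAD_OK;
}
```

With this registry, `check_section("hpet", 1)` returns `LOAD_UNKNOWN_SECTION`: a build without that device refuses the stream rather than silently dropping guest-visible state, and `check_section("cpu", 9)` is refused for the same reason.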
All savevm state is state that affects the functionality of a guest. Throwing away this state will change the functionality of the VM, and migration should not affect guest functionality.

> I know this imposes a bit of a challenge, because the length is not
> always known in advance, but one could overcome this (by using the
> buffer to patch in the length later for instance).

What are the use cases where you think this would be beneficial? I really see the change in semantics from the old way (throwing away unknown sections) to the new way (requiring strict versioning and validating all sections) as a huge step toward robustness.

>
> 3. Make the device versioning really bulletproof. Currently some
> devices dump different data depending on runtime (or better
> time-of-creation) state (for instance hw/i8254.c: if (s->irq_timer)...).

If you look carefully, s->irq_timer will always be set. The checks are unnecessary.

> Another example is the (x86?) CPU state, which differs with KVM
> en/disabled.

Not in upstream QEMU...

> Some devices even dump host system dependent structures (like struct
> iovec in virtio-blk.c).

That is awful and needs to be fixed. It should never have been committed like that.

>
> Also one could create some kind of (limited) upward compatibility, so
> older QEMU versions ignore additional, but optional fields in a device
> state (similar to the ext2 compatibility scheme). Maybe this could be
> done by an external converter program.

To me, ignoring is always a bad thing. It's almost always going to be unsafe. Doesn't this decrease robustness by being less conservative?

> 4. Allow optional devices. Some devices are always started (like
> HPET), although they don't need to be used by the OS. If one migrates
> such a guest from say KVM-83 to KVM-81, it will fail, because KVM-81
> does not support HPET. One could migrate the device only if it has
> been used.

There's no way you can migrate from KVM-83 to KVM-81 if you've enabled the HPET.
It cannot be made to work. There is a -no-hpet option, though. If you are a management tool that needs to support migration from multiple versions, you should use -no-hpet. Also, if you need to migrate from KVM-81 to KVM-83, you should use -no-hpet with KVM-83 to avoid changing the guest-visible state. In the long run, the machine configuration file will address this in a more thorough manner.

FWIW, -no-hpet was added specifically to deal with migration.

> In general I would like to know whether QEMU migration is intended to
> be used in such a flexible manner or whether the requirement of the
> exact same software version on both side is not a limitation in
> everyday use.

My primary goal for migration is robustness. I do not think it's a good idea to support any circumstances that could introduce changes in guest-visible state during a live migration. Live migration is a critical feature for many production environments. To be useful, IMHO, it has to be bulletproof.

Regards,

Anthony Liguori

> Awaiting your comments!
>
> Regards,
> Andre.
>
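On the host-dependent structure dumped by virtio-blk (the `struct iovec` case raised in point 3): one conventional fix is to write each field explicitly with a fixed width and byte order, so the stream does not depend on the host's pointer size, struct padding, or endianness. The C sketch below is illustrative only; `put_be64`, `get_be64`, `GuestSegment`, `save_segment`, and `load_segment` are hypothetical helpers, not QEMU functions, and sending a guest-physical address instead of a host pointer is an assumption about what such a fix would transmit.

```c
#include <stddef.h>
#include <stdint.h>

/* Write v as 8 big-endian bytes: the same layout on every host. */
static void put_be64(unsigned char *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* Read 8 big-endian bytes back into a host-order integer. */
static uint64_t get_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical guest-visible description of one request segment.
 * Unlike struct iovec, it holds a guest-physical address rather than a
 * host pointer, so it means the same thing on the destination host. */
typedef struct {
    uint64_t guest_addr;
    uint64_t len;
} GuestSegment;

#define SEGMENT_WIRE_SIZE 16  /* two be64 fields, no padding */

/* Serialize field by field; never memcpy the struct itself. */
static size_t save_segment(unsigned char *out, const GuestSegment *s)
{
    put_be64(out, s->guest_addr);
    put_be64(out + 8, s->len);
    return SEGMENT_WIRE_SIZE;
}

static size_t load_segment(const unsigned char *in, GuestSegment *s)
{
    s->guest_addr = get_be64(in);
    s->len = get_be64(in + 8);
    return SEGMENT_WIRE_SIZE;
}
```

Because the wire format is fixed at 16 bytes in a defined byte order, a 32-bit source and a 64-bit destination (or hosts of different endianness) agree on the layout, which a raw dump of a host struct cannot guarantee.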