From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1LaX7C-0000pA-PO
	for qemu-devel@nongnu.org; Fri, 20 Feb 2009 10:16:34 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1LaX7C-0000oq-3Y
	for qemu-devel@nongnu.org; Fri, 20 Feb 2009 10:16:34 -0500
Received: from [199.232.76.173] (port=32924 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1LaX7B-0000of-VM
	for qemu-devel@nongnu.org; Fri, 20 Feb 2009 10:16:33 -0500
Received: from mail-qy0-f20.google.com ([209.85.221.20]:53702)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <anthony@codemonkey.ws>) id 1LaX7B-0003CP-AD
	for qemu-devel@nongnu.org; Fri, 20 Feb 2009 10:16:33 -0500
Received: by qyk13 with SMTP id 13so1779695qyk.10
	for <qemu-devel@nongnu.org>; Fri, 20 Feb 2009 07:16:30 -0800 (PST)
Message-ID: <499EC92E.9000401@codemonkey.ws>
Date: Fri, 20 Feb 2009 09:15:58 -0600
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [RFC] More robust migration
References: <499EBFD8.50307@amd.com>
In-Reply-To: <499EBFD8.50307@amd.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org

Hi Andre,

Andre Przywara wrote:
> Hi,
>
> after fiddling around with migration (and the data dumped into the 
> stream) I found the current concept possesses some shortcomings.

Yikes :-)  FWIW, I focused a lot on robustness in the implementation, so 
hopefully much of what you mention below turns out to be conscious 
decisions with very specific reasoning.

> I am interested in your opinions whether it is worth to implement a 
> new improved format.

FWIW, the format is sufficiently versioned that it isn't necessary to 
completely change it (not that I think it needs changing).

> Issues I would like to address:
> 1. Transfer configuration data. Currently there is no VM configuration 
> data transferred with the stream.

Yes, the difficulty here is that we need to transfer the machine 
configuration but not the host configuration.  Management tools should 
decide how to configure the host on the target side but we should be 
passing the machine configuration.

If you've been following the config file threads, I've mentioned this as 
a use case for the current design a number of times.  We would pass a 
flattened device tree as another savevm section with a well known name 
(like "machine").  Given the semantics of the current migration 
protocol, this would ensure that the machine generated on the remote 
node was exactly the same as the source node.
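
To make the idea concrete, here is a minimal sketch of a "machine" 
section.  It is an illustration only: JSON stands in for a flattened 
device tree, and the function names and the example tree are 
hypothetical, not anything that exists in QEMU.

```python
import json

# Well-known section name, as suggested above.
MACHINE_SECTION = "machine"


def encode_machine_section(device_tree: dict) -> bytes:
    """Flatten a machine description into a canonical byte string.

    JSON is only a stand-in for a flattened device tree (assumption).
    sort_keys gives a stable encoding, so equal configs yield equal bytes.
    """
    return json.dumps(device_tree, sort_keys=True).encode("utf-8")


def decode_machine_section(payload: bytes) -> dict:
    """Rebuild the machine description on the target from the section."""
    return json.loads(payload.decode("utf-8"))


# Hypothetical machine description; only guest-visible configuration,
# no host configuration (paths, tap devices, and so on).
source_tree = {"cpu": {"count": 2}, "devices": ["i8254", "virtio-blk"]}
payload = encode_machine_section(source_tree)
target_tree = decode_machine_section(payload)
```

Because the target builds its machine from the section rather than from 
its own command line, the two sides cannot disagree about the guest 
visible hardware.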

> One has to start QEMU/KVM with the _exact_ same parameters on the 
> other side to allow migration. If there would be a pseudo-device 
> (transferred first) holding these parameters (and other runtime 
> dependent stuff like kvm_enabled()) this would ease migration a lot.

FWIW, there's nothing preventing migrating from TCG -> KVM.

I think one can debate whether host config should be migrated too.  
I'd argue that in the core migration protocol, host config should not 
be present.  I think you could have an easier-to-use migration protocol 
(like the old ssh protocol) that also transferred host config.  
But in the general case, you want management tools to be able to 
manipulate host config upon migration.

> 2. Introduce a length field to the header of each device.

IMHO, this would reduce robustness.  It's also difficult because of the 
way savevm registration works.  You don't know how large a section is 
until it's written and migration streams are not seekable.
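
A rough sketch of the difficulty, under assumptions: the Pipe class 
and write_section_with_length() below are hypothetical and the 
length-prefixed layout is invented for illustration.  Since the sink 
cannot seek back to patch a length in, the whole section has to be 
staged in memory before the first byte goes out.

```python
import io
import struct


class Pipe(io.RawIOBase):
    """Write-only, non-seekable sink standing in for a migration socket."""

    def __init__(self):
        super().__init__()
        self.data = bytearray()

    def writable(self):
        return True

    def seekable(self):
        return False

    def write(self, b):
        self.data += bytes(b)
        return len(b)


def write_section_with_length(out, save_fn):
    """Emit <u32 big-endian length><payload> (hypothetical layout).

    The device's save function does not say up front how much it will
    write, so the payload is staged in a buffer to learn its size.
    """
    staging = io.BytesIO()
    save_fn(staging)
    payload = staging.getvalue()
    out.write(struct.pack(">I", len(payload)))
    out.write(payload)


pipe = Pipe()
write_section_with_length(pipe, lambda f: f.write(b"abc"))
```

The staging buffer is the cost: every section, including large ones, 
would have to be held in full before it could be sent.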

> This would allow to skip unknown (or unwanted) devices.

No good can come from this.  If you have an unknown section, you must 
throw an error and stop the migration.  What if this is for a device 
that the guest is interacting with?  The device just disappears after 
migration?  All savevm state is state that affects the functionality of 
a guest.  Throwing away this state will change the functionality of the 
VM and migration should not affect guest functionality.

> I know this imposes a bit of a challenge, because the length is not 
> always known in advance, but one could overcome this (by using the 
> buffer to patch in the length later for instance).

What are the use cases where you think this would be beneficial?  I 
really see the change in semantics from the old way (throwing away 
unknown sections) to the new way (requiring strict versioning and 
validating all sections) as being a huge step toward robustness.
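
As an illustration of the strict semantics, here is a small sketch of 
a loader that refuses unknown sections and unsupported versions.  The 
wire layout (marker byte, name, version), the handler table and every 
name in it are assumptions made for the example; this is not QEMU's 
actual stream format.

```python
import io
import struct

SECTION, END = 0x01, 0x00  # hypothetical marker bytes


class MigrationError(Exception):
    """Raised to abort the migration instead of guessing."""


def read_exact(stream, n):
    data = stream.read(n)
    if len(data) != n:
        raise MigrationError("truncated stream")
    return data


def load_stream(stream, handlers):
    """Load sections until the END marker; fail on anything unexpected.

    handlers maps section name -> (highest supported version, loader).
    Each loader consumes exactly its own state from the stream, so no
    length field is needed, and nothing can be silently skipped.
    """
    loaded = []
    while True:
        marker = read_exact(stream, 1)[0]
        if marker == END:
            return loaded
        if marker != SECTION:
            raise MigrationError(f"bad marker {marker:#x}")
        name_len = read_exact(stream, 1)[0]
        name = read_exact(stream, name_len).decode("ascii")
        (version,) = struct.unpack(">I", read_exact(stream, 4))
        if name not in handlers:
            raise MigrationError(f"unknown section {name!r}")
        max_version, loader = handlers[name]
        if version > max_version:
            raise MigrationError(f"{name!r}: version {version} too new")
        loader(stream, version)
        loaded.append(name)
```

With this shape, a stream carrying a section the target does not know 
about stops the migration rather than dropping device state.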

>
> 3. Make the device versioning really bulletproof. Currently some 
> devices dump different data depending on runtime (or better 
> time-of-creation) state (for instance hw/i8254.c: if (s->irq_timer)...).

If you look carefully, s->irq_timer will always be set.  The checks are 
unnecessary.

> Another example is the (x86?) CPU state, which differs with KVM 
> en/disabled.

Not in upstream QEMU...

> Some devices even dump host system dependent structures (like struct 
> vecio in virtio-blk.c).

That is awful and needs to be fixed.  It should never have been 
committed like that.

>
> Also one could create some kind of (limited) upward compatibility, so 
> older QEMU versions ignore additional, but optional fields in a device 
> state (similar to the ext2 compatibility scheme). Maybe this could be 
> done by an external converter program.

To me, ignoring is always a bad thing.  It's almost always going to be 
unsafe.  Doesn't this decrease robustness by being less conservative?

> 4. Allow optional devices. Some devices are always started (like 
> HPET), although they don't need to be used by the OS. If one migrates 
> such a guest from say KVM-83 to KVM-81, it will fail, because KVM-81 
> does not support HPET. One could migrate the device only if it has 
> been used.

There's no way you can migrate from KVM-83 to KVM-81 if you've enabled 
the HPET.  It cannot be made to work.

There is a -no-hpet option though.  If you are a management tool that 
needs to support migration from multiple versions, you should use 
-no-hpet.  Also, if you need to migrate from KVM-81 to KVM-83, you 
should use -no-hpet with KVM-83 to avoid changing the guest visible state.

In the long run, the machine configuration file will address this in a 
more thorough manner.  FWIW, -no-hpet was added specifically to deal 
with migration.
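
For a management tool, this could look something like the sketch 
below.  build_qemu_args() and the base argument list are hypothetical; 
the only thing taken from the discussion above is the -no-hpet option 
itself.

```python
def build_qemu_args(base_args, needs_cross_version_migration):
    """Return the argument list, adding -no-hpet when the guest may be
    migrated between versions that differ in HPET support.

    Hypothetical helper: the caller decides when cross-version
    migration is a requirement.
    """
    args = list(base_args)
    if needs_cross_version_migration and "-no-hpet" not in args:
        args.append("-no-hpet")
    return args


# "qemu" is a placeholder for whatever binary the tool launches.
args = build_qemu_args(["qemu", "-m", "512"], True)
```

The same argument list would be used on both the source and the target 
so that the guest visible state does not change across the migration.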

> In general I would like to know whether QEMU migration is intended to 
> be used in such a flexible manner or whether the requirement of the 
> exact same software version on both side is not a limitation in 
> everyday use.

My primary goal for migration is robustness.  I do not think it's a good 
idea to support any circumstances that could introduce changes in guest 
visible state during a live migration.

Live migration is a critical feature for many production environments.  
To be useful IMHO, it has to be bullet-proof.

Regards,

Anthony Liguori

> Awaiting your comments!
>
> Regards,
> Andre.
>