From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paolo Bonzini
Subject: Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
Date: Wed, 04 Jan 2012 12:28:26 +0100
Message-ID: <4F0437DA.8080600@redhat.com>
References: <032f49425e7284e9f050064cd30855bb@mail.dlh.net> <4F03AD98.7020700@linux.vnet.ibm.com> <4F042FA1.5090909@dlh.net> <4F04326F.8080808@redhat.com> <4F043689.2000604@dlh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Shu Ming , qemu-devel@nongnu.org, kvm@vger.kernel.org
To: Peter Lieven
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:38585 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752785Ab2ADL2e (ORCPT ); Wed, 4 Jan 2012 06:28:34 -0500
In-Reply-To: <4F043689.2000604@dlh.net>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 01/04/2012 12:22 PM, Peter Lieven wrote:
>> There were patches to move RAM migration to a separate thread. The
>> problem is that they broke block migration.
>>
>> However, asynchronous NBD is in and streaming will follow suit soon.
>> As soon as we have those two features, we might as well remove the
>> block migration code.
>
> ok, so its a matter of time, right?

Well, there are other solutions of varying complexity in the works that
might remove the need for the migration thread or at least reduce the
problem (post-copy migration, XBZRLE, vectorized hot loops). But yes, we
are aware of the problem and we should solve it one way or another.

> would it make sense to patch ram_save_block to always process a full ram
> block?

If I understand the proposal, then migration would hardly be live
anymore. The biggest RAM block in a 32G machine is, well, 32G big.
Other RAM blocks are for the VRAM and for some BIOS data, but they are
very small in proportion.

> - in stage 3 the vm is stopped, right? so there can't be any more dirty
> blocks after scanning the whole memory once?

No, stage 3 is entered when there are very few dirty memory pages
remaining. This may happen after scanning the whole memory many times.
It may never happen at all if migration does not converge because of low
bandwidth or overly strict downtime requirements.

Paolo
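
To illustrate the convergence condition for stage 3 described above, here
is a rough toy model in Python. It is not QEMU code: the function name, its
parameters, and the assumption of a constant dirtying rate are all
hypothetical simplifications chosen for illustration. Each pass sends the
currently dirty pages while the guest keeps dirtying new ones; the final
stop-and-copy stage is entered only once the remaining dirty pages could be
sent within the allowed downtime.

```python
def passes_until_final_stage(total_pages, dirty_rate, bandwidth,
                             max_downtime, max_passes=30):
    """Toy model of iterative pre-copy migration (hypothetical, not QEMU).

    total_pages  -- guest RAM size, in pages
    dirty_rate   -- pages the guest dirties per second (assumed constant)
    bandwidth    -- pages the migration link can send per second
    max_downtime -- seconds the guest may be paused for the final stage

    Returns the number of passes made before the final stage can start,
    or None if that point is not reached within max_passes.
    """
    # Pages that can still be sent while the guest is paused.
    threshold = bandwidth * max_downtime
    remaining = total_pages  # the first pass has to send everything

    for passes in range(max_passes + 1):
        if remaining <= threshold:
            return passes
        # Sending the dirty set takes time, during which the guest
        # dirties more pages (capped at the size of guest RAM).
        send_time = remaining / bandwidth
        remaining = min(total_pages, dirty_rate * send_time)

    return None  # did not converge
```

In this model the dirty set shrinks by the factor dirty_rate / bandwidth on
every pass, so a guest that dirties memory at least as fast as the link can
send it never reaches the final stage, and neither does one with an allowed
downtime of zero.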