From: Peter Lieven
Date: Wed, 04 Jan 2012 12:22:49 +0100
Subject: Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
To: Paolo Bonzini
Cc: Shu Ming, qemu-devel@nongnu.org, kvm@vger.kernel.org
Message-ID: <4F043689.2000604@dlh.net>
In-Reply-To: <4F04326F.8080808@redhat.com>

On 04.01.2012 12:05, Paolo Bonzini wrote:
> On 01/04/2012 11:53 AM, Peter Lieven wrote:
>> On 04.01.2012 02:38, Shu Ming wrote:
>>> On 2012-1-4 2:04, Peter Lieven wrote:
>>>> Hi all,
>>>>
>>>> Is there any known issue when migrating VMs with a lot of memory
>>>> (e.g. 32GB)? It seems that there is some portion of the migration
>>>> code which takes too much time when the number of memory pages is
>>>> large.
>>>>
>>>> Symptoms are: unresponsive VNC connection, VM stalls and also an
>>>> unresponsive QEMU monitor (via TCP).
>>>>
>>>> The problem seems to be worse on 10G connections between two nodes
>>>> (I already tried limiting the bandwidth with the migrate_set_speed
>>>> command) than on 1G connections.
>>> Does the migration complete in the end? How long does it take? I did
>>> a test on a VM with 4G and it took me about two seconds.
>> It seems that the majority of the time (90%) is lost in:
>>
>>     cpu_physical_memory_reset_dirty(current_addr,
>>                                     current_addr + TARGET_PAGE_SIZE,
>>                                     MIGRATION_DIRTY_FLAG);
>>
>> Anyone any idea how to improve this?
>
> There were patches to move RAM migration to a separate thread. The
> problem is that they broke block migration.
>
> However, asynchronous NBD is in and streaming will follow suit soon.
> As soon as we have those two features, we might as well remove the
> block migration code.

OK, so it's a matter of time, right?

Would it make sense to patch ram_save_block to always process a full RAM
block? I am thinking of copying the dirty information for the whole block,
then resetting the dirty information for the complete block, and then
processing the pages that were dirty before the reset (a rough, untested
sketch is appended below).

Questions:
- How big can RAM blocks be?
- Is it possible that RAM blocks differ in size?
- In stage 3 the VM is stopped, right? So there can't be any more dirty
  pages after scanning the whole memory once?

peter

>
> Paolo
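
To make the proposal above concrete, here is a minimal, untested sketch of
the snapshot-and-reset pattern: copy the dirty bits for a whole block, clear
them in one go, and then transfer only the pages that were marked in the
copy. The types and helpers (toy_ram_block, save_whole_block, send_page,
PAGE_SIZE) are placeholders for illustration, not QEMU API; in QEMU the
bitmap would come from the dirty memory tracking, and step 2 would be a
single cpu_physical_memory_reset_dirty() call spanning the whole block
instead of one call per TARGET_PAGE_SIZE page.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096              /* stands in for TARGET_PAGE_SIZE */
#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* placeholder for a RAM block: host memory plus a one-bit-per-page map */
typedef struct {
    uint8_t       *host;            /* start of the block in host memory */
    uint64_t       length;          /* block length in bytes             */
    unsigned long *dirty;           /* live dirty bitmap, one bit/page   */
} toy_ram_block;

static int test_bit(const unsigned long *map, uint64_t nr)
{
    return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1;
}

/* placeholder for "put this page on the migration stream" */
static void send_page(toy_ram_block *b, uint64_t page)
{
    (void)b;
    printf("sending page %llu\n", (unsigned long long)page);
}

static void save_whole_block(toy_ram_block *b)
{
    uint64_t pages = b->length / PAGE_SIZE;
    size_t map_bytes = ((pages + BITS_PER_LONG - 1) / BITS_PER_LONG)
                       * sizeof(unsigned long);

    /* 1. snapshot the dirty information for the whole block */
    unsigned long *snapshot = malloc(map_bytes);
    memcpy(snapshot, b->dirty, map_bytes);

    /* 2. reset the live dirty bitmap once for the complete block */
    memset(b->dirty, 0, map_bytes);

    /* 3. send every page that was dirty before the reset */
    for (uint64_t page = 0; page < pages; page++) {
        if (test_bit(snapshot, page)) {
            send_page(b, page);
        }
    }
    free(snapshot);
}

int main(void)
{
    toy_ram_block b;
    b.length = 16 * PAGE_SIZE;
    b.host   = calloc(1, b.length);
    b.dirty  = calloc(1, sizeof(unsigned long));
    b.dirty[0] = 0xA;               /* pretend pages 1 and 3 are dirty */
    save_whole_block(&b);
    free(b.host);
    free(b.dirty);
    return 0;
}

Whether this helps in practice depends on how expensive the per-page reset
is compared to one sweep over the block's bitmap, and on how large RAM
blocks can get, which is exactly the first question above.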