From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Lieven
Subject: Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
Date: Wed, 04 Jan 2012 15:21:56 +0100
Message-ID: <4F046084.3080104@dlh.net>
References: <032f49425e7284e9f050064cd30855bb@mail.dlh.net>
 <4F03AD98.7020700@linux.vnet.ibm.com> <4F042FA1.5090909@dlh.net>
 <4F04326F.8080808@redhat.com> <4F043689.2000604@dlh.net>
 <4F0437DA.8080600@redhat.com> <4F043B12.60501@dlh.net>
 <4F0445EE.9010905@redhat.com> <4F044F42.2050508@dlh.net>
 <4F045ED0.1030309@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Shu Ming, qemu-devel@nongnu.org, kvm@vger.kernel.org
To: Paolo Bonzini
Return-path:
Received: from ssl.dlh.net ([91.198.192.8]:33208 "EHLO ssl.dlh.net"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1754388Ab2ADOV5 (ORCPT ); Wed, 4 Jan 2012 09:21:57 -0500
In-Reply-To: <4F045ED0.1030309@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 04.01.2012 15:14, Paolo Bonzini wrote:
> On 01/04/2012 02:08 PM, Peter Lieven wrote:
>>
>> thus my only option at the moment is to limit the runtime of the while
>> loop in stage 2 or
>> are there any post 1.0 patches in git that might already help?
>
> No; even though (as I said) people are aware of the problems and do
> plan to fix them, don't hold your breath. :(

OK, just for the record: if anyone wants the patch that puts a time
limit on the while loop in stage 2, I have attached it below. It solves
the problem for me and, after some tweaking, gives a throughput of
approx. 450 MB/s in my case.

It also covers the case where, because of a lot of dup pages, the
rate limit never kicks in to end the while loop.

--- qemu-kvm-1.0/arch_init.c.orig	2012-01-04 14:21:02.000000000 +0100
+++ qemu-kvm-1.0/arch_init.c	2012-01-04 14:27:34.000000000 +0100
@@ -301,6 +301,8 @@
     bytes_transferred_last = bytes_transferred;
     bwidth = qemu_get_clock_ns(rt_clock);
 
+    int pages_read = 0;
+
     while ((ret = qemu_file_rate_limit(f)) == 0) {
         int bytes_sent;
 
@@ -309,6 +311,11 @@
         if (bytes_sent == 0) { /* no more blocks */
             break;
         }
+        if (!(++pages_read & 0xff)) {
+            if ((qemu_get_clock_ns(rt_clock) - bwidth) > migrate_max_downtime())
+                break; /* we have spent more than allowed downtime in this iteration */
+        }
     }
 
     if (ret < 0) {
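
For anyone who wants to try the idea outside of QEMU, here is a minimal
standalone sketch of the same pattern: keep sending pages, but look at
the clock only once every 256 pages and leave the loop when a time
budget is used up. It is not QEMU code. The names send_page_fn,
now_ns(), send_with_time_limit() and send_fake_page() are made up for
this illustration, and clock_gettime(CLOCK_MONOTONIC) stands in for
qemu_get_clock_ns(rt_clock).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback: sends one page, returns the number of bytes
 * sent, or 0 when there is nothing left to send. */
typedef int (*send_page_fn)(void *opaque);

/* Monotonic time in nanoseconds (assumed stand-in for
 * qemu_get_clock_ns(rt_clock)). */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Send pages until the callback reports none left or budget_ns has
 * elapsed. The clock is read only every 256 pages to keep the check
 * cheap. Returns the total number of bytes sent. */
static uint64_t send_with_time_limit(send_page_fn send_page, void *opaque,
                                     int64_t budget_ns)
{
    assert(send_page != NULL);

    int64_t start = now_ns();
    uint64_t total = 0;
    unsigned pages = 0;

    for (;;) {
        int bytes_sent = send_page(opaque);
        if (bytes_sent == 0) { /* no more pages */
            break;
        }
        total += (uint64_t)bytes_sent;

        /* Low 8 bits are zero on every 256th page. */
        if (!(++pages & 0xff)) {
            if (now_ns() - start > budget_ns) {
                break; /* time budget for this iteration is used up */
            }
        }
    }
    return total;
}

/* Example callback for testing: opaque points to an int holding the
 * number of pages left; each page counts as 4096 bytes. */
static int send_fake_page(void *opaque)
{
    int *remaining = opaque;
    if (*remaining == 0) {
        return 0;
    }
    (*remaining)--;
    return 4096;
}
```

Because the clock is only checked on every 256th page, the loop can
overrun the budget by up to 255 pages. That is the trade-off for not
reading the clock on every page.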