From: Peter Lieven
Date: Wed, 04 Jan 2012 12:22:49 +0100
Subject: Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
To: Paolo Bonzini
Cc: Shu Ming, qemu-devel@nongnu.org, kvm@vger.kernel.org
Message-ID: <4F043689.2000604@dlh.net>
In-Reply-To: <4F04326F.8080808@redhat.com>

On 04.01.2012 12:05, Paolo Bonzini wrote:
> On 01/04/2012 11:53 AM, Peter Lieven wrote:
>> On 04.01.2012 02:38, Shu Ming wrote:
>>> On 2012-1-4 2:04, Peter Lieven wrote:
>>>> Hi all,
>>>>
>>>> Is there any known issue when migrating VMs with a lot of memory
>>>> (e.g. 32GB)? It seems that there is some portion of the migration
>>>> code which takes too much time when the number of memory pages is
>>>> large.
>>>>
>>>> Symptoms are: unresponsive VNC connection, VM stalls and also an
>>>> unresponsive QEMU monitor (via TCP).
>>>>
>>>> The problem seems to be worse on 10G connections between two nodes
>>>> (I already tried limiting the bandwidth with the migrate_set_speed
>>>> command) than on 1G connections.
>>> Does the migration complete in the end? How long does it take? I did
>>> a test on a VM with 4G and it took me about two seconds.
>> It seems that the majority of the time (90%) is lost in:
>>
>>     cpu_physical_memory_reset_dirty(current_addr,
>>                                     current_addr + TARGET_PAGE_SIZE,
>>                                     MIGRATION_DIRTY_FLAG);
>>
>> Anyone any idea how to improve this?
>
> There were patches to move RAM migration to a separate thread. The
> problem is that they broke block migration.
>
> However, asynchronous NBD is in and streaming will follow suit soon.
> As soon as we have those two features, we might as well remove the
> block migration code.

OK, so it's a matter of time, right?

Would it make sense to patch ram_save_block to always process a full RAM
block? I am thinking of copying the dirty information for the whole block,
then resetting the dirty information for the complete block, and then
processing the pages that were dirty before the reset (a rough, untested
sketch is appended below).

Questions:
- How big can RAM blocks be?
- Is it possible that RAM blocks differ in size?
- In stage 3 the VM is stopped, right? So there can't be any more dirty
  pages after scanning the whole memory once?

peter

>
> Paolo
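
To make the proposal above concrete, here is a minimal, untested sketch of
the snapshot-and-reset pattern: copy the dirty bits for a whole block, clear
them in one go, and then transfer only the pages that were marked in the
copy. The types and helpers (toy_ram_block, save_whole_block, send_page,
PAGE_SIZE) are placeholders for illustration, not QEMU API; in QEMU the
bitmap would come from the dirty memory tracking, and step 2 would be a
single cpu_physical_memory_reset_dirty() call spanning the whole block
instead of one call per TARGET_PAGE_SIZE page.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096              /* stands in for TARGET_PAGE_SIZE */
#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* placeholder for a RAM block: host memory plus a one-bit-per-page map */
typedef struct {
    uint8_t       *host;            /* start of the block in host memory */
    uint64_t       length;          /* block length in bytes             */
    unsigned long *dirty;           /* live dirty bitmap, one bit/page   */
} toy_ram_block;

static int test_bit(const unsigned long *map, uint64_t nr)
{
    return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1;
}

/* placeholder for "put this page on the migration stream" */
static void send_page(toy_ram_block *b, uint64_t page)
{
    (void)b;
    printf("sending page %llu\n", (unsigned long long)page);
}

static void save_whole_block(toy_ram_block *b)
{
    uint64_t pages = b->length / PAGE_SIZE;
    size_t map_bytes = ((pages + BITS_PER_LONG - 1) / BITS_PER_LONG)
                       * sizeof(unsigned long);

    /* 1. snapshot the dirty information for the whole block */
    unsigned long *snapshot = malloc(map_bytes);
    memcpy(snapshot, b->dirty, map_bytes);

    /* 2. reset the live dirty bitmap once for the complete block */
    memset(b->dirty, 0, map_bytes);

    /* 3. send every page that was dirty before the reset */
    for (uint64_t page = 0; page < pages; page++) {
        if (test_bit(snapshot, page)) {
            send_page(b, page);
        }
    }
    free(snapshot);
}

int main(void)
{
    toy_ram_block b;
    b.length = 16 * PAGE_SIZE;
    b.host   = calloc(1, b.length);
    b.dirty  = calloc(1, sizeof(unsigned long));
    b.dirty[0] = 0xA;               /* pretend pages 1 and 3 are dirty */
    save_whole_block(&b);
    free(b.host);
    free(b.dirty);
    return 0;
}

Whether this helps in practice depends on how expensive the per-page reset
is compared to one sweep over the block's bitmap, and on how large RAM
blocks can get, which is exactly the first question above.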