Message-ID: <4F0445EE.9010905@redhat.com>
Date: Wed, 04 Jan 2012 13:28:30 +0100
From: Paolo Bonzini
References: <032f49425e7284e9f050064cd30855bb@mail.dlh.net> <4F03AD98.7020700@linux.vnet.ibm.com> <4F042FA1.5090909@dlh.net> <4F04326F.8080808@redhat.com> <4F043689.2000604@dlh.net> <4F0437DA.8080600@redhat.com> <4F043B12.60501@dlh.net>
In-Reply-To: <4F043B12.60501@dlh.net>
Subject: Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
To: Peter Lieven
Cc: Shu Ming, qemu-devel@nongnu.org, kvm@vger.kernel.org

On 01/04/2012 12:42 PM, Peter Lieven wrote:
> OK, then I misunderstood the RAM blocks thing. I thought the guest RAM
> would consist of a collection of RAM blocks.
> Then let me describe it differently: would it make sense to process
> bigger portions of memory (e.g. 1M) in stage 2, running
> cpu_physical_memory_reset_dirty on larger ranges instead of calling it
> once per page? We might lose a few dirty pages, but they would be
> caught in the next stage 2 iteration, or in stage 3 at the latest.
> What would be necessary is that nobody marks a page dirty while I copy
> the dirty information for the portion of memory I want to process.

Dirty memory tracking is done by the hypervisor, and it must be done at
page granularity.

>>> - In stage 3 the VM is stopped, right? So there can't be any more
>>>   dirty blocks after scanning the whole memory once?
>>
>> No, stage 3 is entered when there are very few dirty memory pages
>> remaining. This may take many scans over the whole memory, and it may
>> never happen at all if migration does not converge because of low
>> bandwidth or too strict downtime requirements.
>
> OK, is there a chance that I lose one final page if it is modified
> just after I walked over it and I found no other page dirty (so
> bytes_sent = 0)?

No, of course not. Stage 3 sends all remaining dirty pages while the VM
is stopped.

There is a chance that the guest will go crazy and start touching lots
of pages at exactly the wrong time, so that the downtime ends up longer
than expected. That is a necessary evil, however; if you cannot accept
it, post-copy migration would provide a completely different set of
tradeoffs. (BTW, bytes_sent = 0 is very rare.)

Paolo
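
P.S. Here is a minimal sketch of the race you are describing -- a toy
model only, written as self-contained C11 with made-up names, not the
actual QEMU/KVM code:

/*
 * Toy dirty-page tracking: one bit per 4K page.  The hypervisor side
 * sets bits on guest writes; the migration loop consumes them.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_PAGES       (1u << 18)      /* 1G of guest RAM at 4K pages */
#define PAGES_PER_WORD 64u

static _Atomic uint64_t dirty_bitmap[NR_PAGES / PAGES_PER_WORD];

/* Hypervisor side: called on every guest write to the page. */
static void mark_page_dirty(uint32_t page)
{
    atomic_fetch_or(&dirty_bitmap[page / PAGES_PER_WORD],
                    UINT64_C(1) << (page % PAGES_PER_WORD));
}

/*
 * Page granularity: test-and-clear is a single atomic operation, so a
 * concurrent guest write can never be lost -- it lands either before
 * the fetch_and (and we send the fresh data) or after it (and the bit
 * is set again for the next pass).
 */
static bool test_and_clear_dirty(uint32_t page)
{
    uint64_t mask = UINT64_C(1) << (page % PAGES_PER_WORD);
    uint64_t old  = atomic_fetch_and(&dirty_bitmap[page / PAGES_PER_WORD],
                                     ~mask);
    return old & mask;
}

/*
 * Chunk granularity, as proposed: copy the dirty state for a big range,
 * then reset it in a second step.  A guest write that lands between the
 * load and the store sets a bit that the store then wipes out; that
 * page is silently skipped until the guest dirties it again.
 */
static uint64_t snapshot_and_reset(uint32_t word)
{
    uint64_t snap = atomic_load(&dirty_bitmap[word]);
    /* window: a bit set right here is destroyed by the next line */
    atomic_store(&dirty_bitmap[word], 0);
    return snap;
}

int main(void)
{
    mark_page_dirty(42);
    mark_page_dirty(4711);

    unsigned sent = 0;
    for (uint32_t page = 0; page < NR_PAGES; page++) {
        if (test_and_clear_dirty(page)) {
            sent++;                /* a real migration sends the page */
        }
    }
    printf("pages sent this pass: %u\n", sent);   /* prints 2 */

    (void)snapshot_and_reset;      /* shown only for the race comment */
    return 0;
}

In the toy you could make the word-wide snapshot atomic with an
exchange, but the point stands: as soon as "copy the dirty info" and
"reset it" are two separate steps over a large range, any page dirtied
in between is dropped until the guest writes to it again -- exactly the
"lose a few dirty pages" case you mention, and one that per-page
test-and-clear never hits.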