From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4CF50783.90402@codemonkey.ws>
Date: Tue, 30 Nov 2010 08:17:39 -0600
From: Anthony Liguori
References: <9b23b9b4cee242591bdb356c838a9cfb9af033c1.1290552026.git.quintela@redhat.com> <4CF45D67.5010906@codemonkey.ws> <4CF4A478.8080209@redhat.com> <4CF5008F.2090306@codemonkey.ws> <4CF5030B.40703@redhat.com>
In-Reply-To: <4CF5030B.40703@redhat.com>
Subject: [Qemu-devel] Re: [PATCH 09/10] Exit loop if we have been there too long
List-Id: qemu-devel.nongnu.org
To: Avi Kivity
Cc: Paolo Bonzini, Juan Quintela, qemu-devel@nongnu.org, kvm-devel

On 11/30/2010 07:58 AM, Avi Kivity wrote:
> On 11/30/2010 03:47 PM, Anthony Liguori wrote:
>> On 11/30/2010 01:15 AM, Paolo Bonzini wrote:
>>> On 11/30/2010 03:11 AM, Anthony Liguori wrote:
>>>>
>>>> BufferedFile should hit the qemu_file_rate_limit check when the socket
>>>> buffer gets filled up.
>>>
>>> The problem is that the file rate limit is not hit because work is
>>> done elsewhere. The rate limit can cap the bandwidth used and make
>>> QEMU aware that socket operations may block (because that's what the
>>> buffered file freeze/unfreeze logic does), but it cannot be used to
>>> limit the _time_ spent in the migration code.
>>
>> Yes, it can, if you set the rate limit sufficiently low.
>>
>> The caveats are 1) the kvm.ko interface for dirty bits doesn't scale
>> for large memory guests, so we spend a lot more CPU time walking it
>> than we should, and 2) zero pages cause us to burn a lot more CPU time
>> than we otherwise would because compressing them is so effective.
>
> What's the problem with burning that cpu? Per guest page, compressing
> takes less time than sending. Is it just an issue of qemu mutex hold time?

If you have a 512GB guest, then you have a 16MB dirty bitmap (512GB /
4KB pages = 128M pages, at one bit per page), which ends up being a
128MB dirty bitmap in QEMU because we represent each dirty bit with 8
bits (a full byte per page).

Walking 16MB (or 128MB) of memory just to find a few pages to send over
the wire is a big waste of CPU time. If kvm.ko used a multi-level table
to represent dirty info, we could walk the memory map in 2MB chunks,
allowing us to skip a large number of the comparisons.
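To make the multi-level idea concrete, here is a minimal sketch of a
two-level dirty log (illustrative C only, not kvm.ko's interface; the
2MB granularity and every name below are assumptions): a coarse bitmap
with one bit per 2MB chunk lets the scanner skip clean chunks without
testing any of their 512 per-page bits.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE        4096ULL
#define CHUNK_SIZE       (2ULL << 20)               /* 2MB */
#define PAGES_PER_CHUNK  (CHUNK_SIZE / PAGE_SIZE)   /* 512 */

static int test_bit(const uint64_t *map, uint64_t nr)
{
    return (map[nr / 64] >> (nr % 64)) & 1;
}

static void set_bit(uint64_t *map, uint64_t nr)
{
    map[nr / 64] |= 1ULL << (nr % 64);
}

/* Walk only the 2MB chunks whose coarse bit is set; a clean chunk is
 * skipped without looking at any of its per-page bits. */
static void scan_dirty(const uint64_t *chunk_dirty,
                       const uint64_t *page_dirty,
                       uint64_t nr_pages,
                       void (*cb)(uint64_t pfn))
{
    uint64_t nr_chunks = (nr_pages + PAGES_PER_CHUNK - 1) / PAGES_PER_CHUNK;

    for (uint64_t c = 0; c < nr_chunks; c++) {
        if (!test_bit(chunk_dirty, c)) {
            continue;                       /* whole 2MB chunk is clean */
        }
        uint64_t first = c * PAGES_PER_CHUNK;
        uint64_t last = first + PAGES_PER_CHUNK;
        if (last > nr_pages) {
            last = nr_pages;
        }
        for (uint64_t pfn = first; pfn < last; pfn++) {
            if (test_bit(page_dirty, pfn)) {
                cb(pfn);
            }
        }
    }
}

static void send_page(uint64_t pfn)
{
    printf("would send pfn %llu\n", (unsigned long long)pfn);
}

int main(void)
{
    /* Toy guest: 1GB of RAM, two dirty pages. */
    uint64_t nr_pages = (1ULL << 30) / PAGE_SIZE;
    uint64_t nr_chunks = nr_pages / PAGES_PER_CHUNK;
    uint64_t *page_dirty = calloc((nr_pages + 63) / 64, sizeof(uint64_t));
    uint64_t *chunk_dirty = calloc((nr_chunks + 63) / 64, sizeof(uint64_t));
    uint64_t dirty_pfns[] = { 42, 200000 };

    for (int i = 0; i < 2; i++) {
        set_bit(page_dirty, dirty_pfns[i]);
        set_bit(chunk_dirty, dirty_pfns[i] / PAGES_PER_CHUNK);
    }
    scan_dirty(chunk_dirty, page_dirty, nr_pages, send_page);

    free(page_dirty);
    free(chunk_dirty);
    return 0;
}

With mostly-clean memory, the walk cost drops from one test per 4KB
page to roughly one test per 2MB chunk, plus per-page tests only inside
chunks that actually contain dirty pages.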
>> In the short term, fixing (2) by accounting zero pages as full sized
>> pages should "fix" the problem.
>>
>> In the long term, we need a new dirty bit interface from kvm.ko that
>> uses a multi-level table. That should dramatically improve scan
>> performance.
>
> Why would a multi-level table help? (or rather, please explain what
> you mean by a multi-level table).
>
> Something we could do is divide memory into more slots, and poll each
> slot when we start to scan its page range. That reduces the time
> between sampling a page's dirtiness and sending it off, and reduces
> the latency incurred by the sampling. There are also
> non-interface-changing ways to reduce this latency, like O(1) write
> protection, or using dirty bits instead of write protection when
> available.

BTW, we should also refactor qemu to use the kvm dirty bitmap directly
instead of mapping it to the main dirty bitmap.

>> We also need to implement live migration in a separate thread that
>> doesn't carry qemu_mutex while it runs.
>
> IMO that's the biggest hit currently.

Yup. That's the Correct solution to the problem.

Regards,

Anthony Liguori
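On the separate-thread point above: a minimal sketch, using plain POSIX
threads, of the shape such a loop could take (the lock and both helpers
below are hypothetical stand-ins, not QEMU's actual migration code).
The idea is that guest RAM and device state are touched only while the
global mutex is held, and the potentially blocking socket work happens
outside it, so vcpus and the iothread are not stalled behind migration.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the global qemu_mutex. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in: copy the next batch of dirty pages into a local buffer
 * while the lock is held; returns false once nothing is left to send. */
static bool save_some_ram(void)
{
    static int rounds = 3;
    return --rounds > 0;
}

/* Stand-in: push the buffered data to the socket; this may block, so
 * it is deliberately called without the lock. */
static void flush_to_socket(void)
{
    printf("flushing buffered pages to the socket (lock not held)\n");
}

static void *migration_thread(void *opaque)
{
    bool more = true;

    (void)opaque;
    while (more) {
        /* Touch guest state only under the global lock... */
        pthread_mutex_lock(&big_lock);
        more = save_some_ram();
        pthread_mutex_unlock(&big_lock);

        /* ...and do the blocking I/O without it. */
        flush_to_socket();
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;

    pthread_create(&tid, NULL, migration_thread, NULL);
    pthread_join(tid, NULL);
    return 0;
}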