From mboxrd@z Thu Jan  1 00:00:00 1970
From: Anthony Liguori <anthony@codemonkey.ws>
Subject: Re: [PATCH 09/10] Exit loop if we have been there too long
Date: Tue, 30 Nov 2010 08:50:20 -0600
Message-ID: <4CF50F2C.7090503@codemonkey.ws>
References: <cover.1290552026.git.quintela@redhat.com>	<9b23b9b4cee242591bdb356c838a9cfb9af033c1.1290552026.git.quintela@redhat.com> <4CF45D67.5010906@codemonkey.ws> <4CF4A478.8080209@redhat.com> <4CF5008F.2090306@codemonkey.ws> <4CF5030B.40703@redhat.com> <4CF50783.90402@codemonkey.ws> <4CF509C1.9@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Juan Quintela <quintela@redhat.com>, qemu-devel@nongnu.org,
	Juan Quintela <quintela@trasno.org>,
	kvm-devel <kvm@vger.kernel.org>
To: Avi Kivity <avi@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mail-qw0-f66.google.com ([209.85.216.66]:56481 "EHLO
	mail-qw0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751761Ab0K3Oub (ORCPT <rfc822;kvm@vger.kernel.org>);
	Tue, 30 Nov 2010 09:50:31 -0500
Received: by qwk3 with SMTP id 3so88066qwk.1
        for <kvm@vger.kernel.org>; Tue, 30 Nov 2010 06:50:30 -0800 (PST)
In-Reply-To: <4CF509C1.9@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 11/30/2010 08:27 AM, Avi Kivity wrote:
> On 11/30/2010 04:17 PM, Anthony Liguori wrote:
>>> What's the problem with burning that cpu?  per guest page, 
>>> compressing takes less than sending.  Is it just an issue of qemu 
>>> mutex hold time?
>>
>>
>> If you have a 512GB guest, then you have a 16MB dirty bitmap which 
>> ends up being a 128MB dirty bitmap in QEMU because we represent 
>> dirty bits with 8 bits.
>
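
For reference, a quick sketch of the arithmetic behind those two numbers. 
The 4KB page size is an assumption (it is not stated in the thread), and 
dirty_bitmap_bytes() is just an illustrative helper, not a QEMU function.

```python
# Back-of-the-envelope dirty bitmap sizes for a 512GB guest.
# Assumption: 4KB guest pages (not stated in the thread).
GIB = 1 << 30
MIB = 1 << 20


def dirty_bitmap_bytes(guest_bytes, page_size=4096, bits_per_page=1):
    """Bytes needed to track every guest page with bits_per_page bits."""
    pages = guest_bytes // page_size
    return pages * bits_per_page // 8


guest = 512 * GIB
kvm_bytes = dirty_bitmap_bytes(guest, bits_per_page=1)   # kvm.ko: 1 bit/page
qemu_bytes = dirty_bitmap_bytes(guest, bits_per_page=8)  # QEMU: 8 bits/page

print(kvm_bytes // MIB, "MB in the KVM bitmap")
print(qemu_bytes // MIB, "MB in the QEMU bitmap")
```
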
> Was there not a patchset to split each bit into its own bitmap?  And 
> then copy the kvm or qemu master bitmap into each client bitmap as it 
> became needed?
>
>> Walking 16MB (or 128MB) of memory just to find a few pages to send 
>> over the wire is a big waste of CPU time.  If kvm.ko used a 
>> multi-level table to represent dirty info, we could walk the memory 
>> mapping at 2MB chunks allowing us to skip a large amount of the 
>> comparisons.
>
> There's no reason to assume dirty pages would be clustered.  If 0.2% 
> of memory were dirty, but scattered uniformly, there would be no win 
> from the two-level bitmap.  A loss, in fact: 2MB can be represented as 
> 512 bits or 64 bytes, just one cache line.  Any two-level thing will 
> need more.
>
> We might have a more compact encoding for sparse bitmaps, like 
> run-length encoding.
>
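
To make the clustered versus scattered point concrete, here is a rough 
sketch that counts how many 2MB leaf chunks a two-level scan could not 
skip.  The 4GB guest size, the 4KB page size, and the two synthetic dirty 
layouts are assumptions chosen for illustration; chunks_to_visit() is a 
hypothetical helper, not anything in kvm.ko or QEMU.

```python
# How many 2MB leaf chunks must a two-level dirty scan still visit?
# Assumptions: 4KB pages, a 4GB guest, and two synthetic dirty layouts.
PAGES_PER_CHUNK = 512  # 2MB / 4KB, i.e. 64 bytes of leaf bitmap


def chunks_to_visit(dirty_pages):
    """Number of leaf chunks containing at least one dirty page."""
    return len({page // PAGES_PER_CHUNK for page in dirty_pages})


total_pages = 1 << 20                          # 4GB of 4KB pages
total_chunks = total_pages // PAGES_PER_CHUNK  # leaf chunks in the guest
n_dirty = total_pages // 500                   # roughly 0.2% of pages

scattered = range(0, n_dirty * 500, 500)  # one dirty page every 500 pages
clustered = range(n_dirty)                # same count, packed together

print("chunks total:    ", total_chunks)
print("scattered visits:", chunks_to_visit(scattered))
print("clustered visits:", chunks_to_visit(clustered))
```

With the scattered layout nearly every chunk holds a dirty page, so the 
upper level skips almost nothing; only the clustered layout lets the 
scan skip most of the leaves.
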
>>
>>>> In the short term, fixing (2) by accounting zero pages as full 
>>>> sized pages should "fix" the problem.
>>>>
>>>> In the long term, we need a new dirty bit interface from kvm.ko 
>>>> that uses a multi-level table.  That should dramatically improve 
>>>> scan performance. 
>>>
>>> Why would a multi-level table help?  (or rather, please explain what 
>>> you mean by a multi-level table).
>>>
>>> Something we could do is divide memory into more slots, and poll 
>>> each slot when we start to scan its page range.  That reduces the 
>>> time between sampling a page's dirtiness and sending it off, and 
>>> reduces the latency incurred by the sampling.  There are also 
>>> non-interface-changing ways to reduce this latency, like O(1) write 
>>> protection, or using dirty bits instead of write protection when 
>>> available.
>>
>> BTW, we should also refactor qemu to use the kvm dirty bitmap 
>> directly instead of mapping it to the main dirty bitmap.
>
> That's what the patch set I was alluding to did.  Or maybe I imagined 
> the whole thing.

No, it just split the main bitmap into three bitmaps.  I'm suggesting 
that the dirty interface have two implementations: one that refers to 
the 8-bit bitmap when TCG is in use, and another that uses the KVM 
representation.

TCG really needs multiple dirty bits but KVM doesn't.  A shared 
implementation really can't be optimal.
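
Roughly what I have in mind, as a toy model rather than actual QEMU 
code.  The class names, the flag values, and dirty_log_for() are all made 
up for illustration; the only point is one interface with a 
byte-per-page backend for TCG and a bit-per-page backend for KVM.

```python
# Toy model of one dirty-log interface with two backends.
# All names and flag values here are hypothetical, not QEMU's API.
VGA, CODE, MIGRATION = 0x01, 0x02, 0x08  # one bit per tracking client


class ByteDirtyLog:
    """TCG-style backend: one byte per page, one bit per client."""

    def __init__(self, n_pages):
        self.flags = bytearray(n_pages)

    def mark(self, page, client):
        self.flags[page] |= client

    def test_and_clear(self, page, client):
        was_dirty = bool(self.flags[page] & client)
        self.flags[page] &= ~client & 0xFF
        return was_dirty

    def size_bytes(self):
        return len(self.flags)


class BitDirtyLog:
    """KVM-style backend: one bit per page, migration is the only client."""

    def __init__(self, n_pages):
        self.bits = bytearray((n_pages + 7) // 8)

    def mark(self, page, client=MIGRATION):
        self.bits[page >> 3] |= 1 << (page & 7)

    def test_and_clear(self, page, client=MIGRATION):
        mask = 1 << (page & 7)
        was_dirty = bool(self.bits[page >> 3] & mask)
        self.bits[page >> 3] &= ~mask & 0xFF
        return was_dirty

    def size_bytes(self):
        return len(self.bits)


def dirty_log_for(accel, n_pages):
    """Pick the backend by accelerator instead of sharing one layout."""
    return BitDirtyLog(n_pages) if accel == "kvm" else ByteDirtyLog(n_pages)


kvm_log = dirty_log_for("kvm", 1024)
tcg_log = dirty_log_for("tcg", 1024)
print(kvm_log.size_bytes(), tcg_log.size_bytes())
```

The callers only see mark() and test_and_clear(), so the migration code 
would not need to care which representation sits underneath.
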

>
>>>> We also need to implement live migration in a separate thread that 
>>>> doesn't carry qemu_mutex while it runs.
>>>
>>> IMO that's the biggest hit currently.
>>
>> Yup.  That's the Correct solution to the problem.
>
> Then let's just Do it.
>

Yup.

Regards,

Anthony Liguori