From: Anthony Liguori
Date: Tue, 30 Nov 2010 08:50:20 -0600
Message-ID: <4CF50F2C.7090503@codemonkey.ws>
In-Reply-To: <4CF509C1.9@redhat.com>
Subject: [Qemu-devel] Re: [PATCH 09/10] Exit loop if we have been there too long
To: Avi Kivity
Cc: Paolo Bonzini, Juan Quintela, qemu-devel@nongnu.org, kvm-devel, Juan Quintela

On 11/30/2010 08:27 AM, Avi Kivity wrote:
> On 11/30/2010 04:17 PM, Anthony Liguori wrote:
>>> What's the problem with burning that cpu? per guest page,
>>> compressing takes less than sending. Is it just an issue of qemu
>>> mutex hold time?
>>
>> If you have a 512GB guest, then you have a 16MB dirty bitmap which
>> ends up being a 128MB dirty bitmap in QEMU because we represent
>> dirty bits with 8 bits.
>
> Was there not a patchset to split each bit into its own bitmap? And
> then copy the kvm or qemu master bitmap into each client bitmap as it
> became needed?
>
>> Walking 16MB (or 128MB) of memory just to find a few pages to send
>> over the wire is a big waste of CPU time. If kvm.ko used a
>> multi-level table to represent dirty info, we could walk the memory
>> mapping in 2MB chunks, allowing us to skip a large number of the
>> comparisons.
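
Concretely, the walk I have in mind is roughly the sketch below. This
is illustration-only pseudocode, not the kvm.ko interface: the struct
and names are made up, and it assumes 4KB pages with one summary bit
per 2MB chunk, so a clean chunk costs a single bit test instead of 512.

    /* Sketch only: hypothetical layout, not the real kvm.ko dirty log. */
    #include <stdint.h>

    #define PAGE_SHIFT      12                  /* assumes 4KB pages */
    #define CHUNK_SHIFT     21                  /* 2MB chunks        */
    #define PAGES_PER_CHUNK (1u << (CHUNK_SHIFT - PAGE_SHIFT))  /* 512 */
    #define WORDS_PER_CHUNK (PAGES_PER_CHUNK / 64)              /* 8   */

    struct dirty_log {
        uint64_t *chunk_bits;     /* one summary bit per 2MB chunk */
        uint64_t *page_bits;      /* one bit per 4KB page          */
        unsigned long nr_chunks;
    };

    /* Visit dirty pages, skipping whole 2MB chunks whose summary bit
     * is clear. */
    static void walk_dirty(struct dirty_log *log,
                           void (*send_page)(unsigned long pfn))
    {
        for (unsigned long c = 0; c < log->nr_chunks; c++) {
            if (!(log->chunk_bits[c / 64] & (1ull << (c % 64)))) {
                continue;                 /* whole chunk is clean */
            }
            for (unsigned i = 0; i < WORDS_PER_CHUNK; i++) {
                uint64_t w = log->page_bits[c * WORDS_PER_CHUNK + i];
                while (w) {
                    int bit = __builtin_ctzll(w);  /* lowest set bit */
                    send_page(c * PAGES_PER_CHUNK + i * 64 + bit);
                    w &= w - 1;                    /* clear that bit */
                }
            }
        }
    }

The win obviously depends on dirty pages being clustered enough that
many chunk bits stay clear.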
>
> There's no reason to assume dirty pages would be clustered. If 0.2%
> of memory were dirty, but scattered uniformly, there would be no win
> from the two-level bitmap. A loss, in fact: 2MB can be represented as
> 512 bits or 64 bytes, just one cache line. Any two-level thing will
> need more.
>
> We might have a more compact encoding for sparse bitmaps, like
> run-length encoding.
>
>>>> In the short term, fixing (2) by accounting zero pages as
>>>> full-sized pages should "fix" the problem.
>>>>
>>>> In the long term, we need a new dirty bit interface from kvm.ko
>>>> that uses a multi-level table. That should dramatically improve
>>>> scan performance.
>>>
>>> Why would a multi-level table help? (or rather, please explain what
>>> you mean by a multi-level table).
>>>
>>> Something we could do is divide memory into more slots, and poll
>>> each slot when we start to scan its page range. That reduces the
>>> time between sampling a page's dirtiness and sending it off, and
>>> reduces the latency incurred by the sampling. There are also
>>> non-interface-changing ways to reduce this latency, like O(1) write
>>> protection, or using dirty bits instead of write protection when
>>> available.
>>
>> BTW, we should also refactor qemu to use the kvm dirty bitmap
>> directly instead of mapping it to the main dirty bitmap.
>
> That's what the patch set I was alluding to did. Or maybe I imagined
> the whole thing.

No, it just split the main bitmap into three bitmaps. I'm suggesting
that the dirty interface have two implementations: one that refers to
the 8-bit bitmap when TCG is in use, and another that uses the KVM
representation. TCG really needs multiple dirty bits, but KVM doesn't;
a shared implementation really can't be optimal.

>>>> We also need to implement live migration in a separate thread that
>>>> doesn't carry qemu_mutex while it runs.
>>>
>>> IMO that's the biggest hit currently.
>>
>> Yup. That's the Correct solution to the problem.
>
> Then let's just Do it.

Yup.

Regards,

Anthony Liguori
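
P.S. For the "two implementations" point above, I'm picturing roughly
the shape below. Sketch only: the ops table and function names are
invented for illustration and are not the actual QEMU code, and the
0x08 flag value is just an example.

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct DirtyMemoryOps {
        /* Report whether the page is dirty and clear its flag. */
        bool (*test_and_clear)(void *state, unsigned long pfn);
    } DirtyMemoryOps;

    /* TCG-style backend: one byte per page, so several dirty flags
     * (VGA, CODE, MIGRATION) can coexist. */
    static bool tcg_test_and_clear(void *state, unsigned long pfn)
    {
        uint8_t *bytemap = state;
        bool dirty = bytemap[pfn] & 0x08;     /* e.g. a migration flag */
        bytemap[pfn] &= (uint8_t)~0x08;
        return dirty;
    }

    /* KVM-style backend: consume the kernel's bit-per-page log
     * directly, with no conversion into the byte map. */
    static bool kvm_test_and_clear(void *state, unsigned long pfn)
    {
        uint64_t *bitmap = state;
        uint64_t mask = 1ull << (pfn % 64);
        bool dirty = bitmap[pfn / 64] & mask;
        bitmap[pfn / 64] &= ~mask;
        return dirty;
    }

    static const DirtyMemoryOps tcg_dirty_ops = { tcg_test_and_clear };
    static const DirtyMemoryOps kvm_dirty_ops = { kvm_test_and_clear };

Migration would then iterate through whichever ops table matches the
accelerator, instead of forcing KVM's log through the TCG byte map.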