From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4CF50783.90402@codemonkey.ws>
Date: Tue, 30 Nov 2010 08:17:39 -0600
From: Anthony Liguori
References: <9b23b9b4cee242591bdb356c838a9cfb9af033c1.1290552026.git.quintela@redhat.com> <4CF45D67.5010906@codemonkey.ws> <4CF4A478.8080209@redhat.com> <4CF5008F.2090306@codemonkey.ws> <4CF5030B.40703@redhat.com>
In-Reply-To: <4CF5030B.40703@redhat.com>
Subject: [Qemu-devel] Re: [PATCH 09/10] Exit loop if we have been there too long
List-Id: qemu-devel.nongnu.org
To: Avi Kivity
Cc: Paolo Bonzini, Juan Quintela, qemu-devel@nongnu.org, kvm-devel

On 11/30/2010 07:58 AM, Avi Kivity wrote:
> On 11/30/2010 03:47 PM, Anthony Liguori wrote:
>> On 11/30/2010 01:15 AM, Paolo Bonzini wrote:
>>> On 11/30/2010 03:11 AM, Anthony Liguori wrote:
>>>>
>>>> BufferedFile should hit the qemu_file_rate_limit check when the socket
>>>> buffer gets filled up.
>>>
>>> The problem is that the file rate limit is not hit because work is
>>> done elsewhere. The rate limit can cap the bandwidth used and make
>>> QEMU aware that socket operations may block (because that's what the
>>> buffered file freeze/unfreeze logic does), but it cannot be used to
>>> limit the _time_ spent in the migration code.
>>
>> Yes, it can, if you set the rate limit sufficiently low.
>>
>> The caveats are 1) the kvm.ko interface for dirty bits doesn't scale
>> for large memory guests, so we spend a lot more CPU time walking it
>> than we should, and 2) zero pages cause us to burn a lot more CPU time
>> than we otherwise would because compressing them is so effective.
>
> What's the problem with burning that cpu? Per guest page, compressing
> takes less time than sending. Is it just an issue of qemu mutex hold time?

If you have a 512GB guest, then you have a 16MB dirty bitmap (512GB /
4KB pages = 128M pages, at one bit per page), which ends up being a
128MB dirty bitmap in QEMU because we represent each dirty bit with 8
bits (a full byte per page).

Walking 16MB (or 128MB) of memory just to find a few pages to send over
the wire is a big waste of CPU time. If kvm.ko used a multi-level table
to represent dirty info, we could walk the memory map in 2MB chunks,
allowing us to skip a large number of the comparisons.
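To make the multi-level idea concrete, here is a minimal sketch of a
two-level dirty log (illustrative C only, not kvm.ko's interface; the
2MB granularity and every name below are assumptions): a coarse bitmap
with one bit per 2MB chunk lets the scanner skip clean chunks without
testing any of their 512 per-page bits.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE        4096ULL
#define CHUNK_SIZE       (2ULL << 20)               /* 2MB */
#define PAGES_PER_CHUNK  (CHUNK_SIZE / PAGE_SIZE)   /* 512 */

static int test_bit(const uint64_t *map, uint64_t nr)
{
    return (map[nr / 64] >> (nr % 64)) & 1;
}

static void set_bit(uint64_t *map, uint64_t nr)
{
    map[nr / 64] |= 1ULL << (nr % 64);
}

/* Walk only the 2MB chunks whose coarse bit is set; a clean chunk is
 * skipped without looking at any of its per-page bits. */
static void scan_dirty(const uint64_t *chunk_dirty,
                       const uint64_t *page_dirty,
                       uint64_t nr_pages,
                       void (*cb)(uint64_t pfn))
{
    uint64_t nr_chunks = (nr_pages + PAGES_PER_CHUNK - 1) / PAGES_PER_CHUNK;

    for (uint64_t c = 0; c < nr_chunks; c++) {
        if (!test_bit(chunk_dirty, c)) {
            continue;                       /* whole 2MB chunk is clean */
        }
        uint64_t first = c * PAGES_PER_CHUNK;
        uint64_t last = first + PAGES_PER_CHUNK;
        if (last > nr_pages) {
            last = nr_pages;
        }
        for (uint64_t pfn = first; pfn < last; pfn++) {
            if (test_bit(page_dirty, pfn)) {
                cb(pfn);
            }
        }
    }
}

static void send_page(uint64_t pfn)
{
    printf("would send pfn %llu\n", (unsigned long long)pfn);
}

int main(void)
{
    /* Toy guest: 1GB of RAM, two dirty pages. */
    uint64_t nr_pages = (1ULL << 30) / PAGE_SIZE;
    uint64_t nr_chunks = nr_pages / PAGES_PER_CHUNK;
    uint64_t *page_dirty = calloc((nr_pages + 63) / 64, sizeof(uint64_t));
    uint64_t *chunk_dirty = calloc((nr_chunks + 63) / 64, sizeof(uint64_t));
    uint64_t dirty_pfns[] = { 42, 200000 };

    for (int i = 0; i < 2; i++) {
        set_bit(page_dirty, dirty_pfns[i]);
        set_bit(chunk_dirty, dirty_pfns[i] / PAGES_PER_CHUNK);
    }
    scan_dirty(chunk_dirty, page_dirty, nr_pages, send_page);

    free(page_dirty);
    free(chunk_dirty);
    return 0;
}

With mostly-clean memory, the walk cost drops from one test per 4KB
page to roughly one test per 2MB chunk, plus per-page tests only inside
chunks that actually contain dirty pages.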
>> In the short term, fixing (2) by accounting zero pages as full sized
>> pages should "fix" the problem.
>>
>> In the long term, we need a new dirty bit interface from kvm.ko that
>> uses a multi-level table. That should dramatically improve scan
>> performance.
>
> Why would a multi-level table help? (or rather, please explain what
> you mean by a multi-level table).
>
> Something we could do is divide memory into more slots, and poll each
> slot when we start to scan its page range. That reduces the time
> between sampling a page's dirtiness and sending it off, and reduces
> the latency incurred by the sampling. There are also
> non-interface-changing ways to reduce this latency, like O(1) write
> protection, or using dirty bits instead of write protection when
> available.

BTW, we should also refactor qemu to use the kvm dirty bitmap directly
instead of mapping it to the main dirty bitmap.

>> We also need to implement live migration in a separate thread that
>> doesn't carry qemu_mutex while it runs.
>
> IMO that's the biggest hit currently.

Yup. That's the Correct solution to the problem.

Regards,

Anthony Liguori
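On the separate-thread point above: a minimal sketch, using plain POSIX
threads, of the shape such a loop could take (the lock and both helpers
below are hypothetical stand-ins, not QEMU's actual migration code).
The idea is that guest RAM and device state are touched only while the
global mutex is held, and the potentially blocking socket work happens
outside it, so vcpus and the iothread are not stalled behind migration.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the global qemu_mutex. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in: copy the next batch of dirty pages into a local buffer
 * while the lock is held; returns false once nothing is left to send. */
static bool save_some_ram(void)
{
    static int rounds = 3;
    return --rounds > 0;
}

/* Stand-in: push the buffered data to the socket; this may block, so
 * it is deliberately called without the lock. */
static void flush_to_socket(void)
{
    printf("flushing buffered pages to the socket (lock not held)\n");
}

static void *migration_thread(void *opaque)
{
    bool more = true;

    (void)opaque;
    while (more) {
        /* Touch guest state only under the global lock... */
        pthread_mutex_lock(&big_lock);
        more = save_some_ram();
        pthread_mutex_unlock(&big_lock);

        /* ...and do the blocking I/O without it. */
        flush_to_socket();
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;

    pthread_create(&tid, NULL, migration_thread, NULL);
    pthread_join(tid, NULL);
    return 0;
}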