From: Anthony Liguori <anthony@codemonkey.ws>
To: Avi Kivity <avi@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	qemu-devel@nongnu.org, Juan Quintela <quintela@trasno.org>,
	kvm-devel <kvm@vger.kernel.org>
Subject: Re: [PATCH 09/10] Exit loop if we have been there too long
Date: Tue, 30 Nov 2010 08:17:39 -0600
Message-ID: <4CF50783.90402@codemonkey.ws>
In-Reply-To: <4CF5030B.40703@redhat.com>

On 11/30/2010 07:58 AM, Avi Kivity wrote:
> On 11/30/2010 03:47 PM, Anthony Liguori wrote:
>> On 11/30/2010 01:15 AM, Paolo Bonzini wrote:
>>> On 11/30/2010 03:11 AM, Anthony Liguori wrote:
>>>>
>>>> BufferedFile should hit the qemu_file_rate_limit check when the socket
>>>> buffer gets filled up.
>>>
>>> The problem is that the file rate limit is not hit because work is 
>>> done elsewhere.  The rate can limit the bandwidth used and makes 
>>> QEMU aware that socket operations may block (because that's what the 
>>> buffered file freeze/unfreeze logic does); but it cannot be used to 
>>> limit the _time_ spent in the migration code.
>>
>> Yes, it can, if you set the rate limit sufficiently low.
>>
>> The caveats are 1) the kvm.ko interface for dirty bits doesn't scale 
>> for large memory guests so we spend a lot more CPU time walking it 
>> than we should 2) zero pages cause us to burn a lot more CPU time 
>> than we otherwise would because compressing them is so effective.
>
> What's the problem with burning that cpu?  per guest page, compressing 
> takes less than sending.  Is it just an issue of qemu mutex hold time?

If you have a 512GB guest, then you have a 16MB dirty bitmap, which ends 
up being a 128MB dirty bitmap in QEMU because we represent each dirty 
bit with 8 bits.
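
The sizes above follow from keeping one dirty bit per guest page. A quick check of the arithmetic, assuming 4 KiB guest pages (the page size is not stated in this message, and the function name is made up for illustration):

```python
GIB = 1 << 30
MIB = 1 << 20

def dirty_bitmap_bytes(guest_bytes, page_bytes=4096, bits_per_page=1):
    """Size in bytes of a dirty map that spends bits_per_page bits on each guest page.

    page_bytes=4096 is an assumption; the thread does not name the page size.
    """
    pages = guest_bytes // page_bytes
    return pages * bits_per_page // 8

guest = 512 * GIB
kvm_map = dirty_bitmap_bytes(guest)                    # 1 bit per page
qemu_map = dirty_bitmap_bytes(guest, bits_per_page=8)  # 8 bits (1 byte) per page
print(kvm_map // MIB, qemu_map // MIB)                 # prints "16 128"
```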

Walking 16MB (or 128MB) of memory just to find a few pages to send 
over the wire is a big waste of CPU time.  If kvm.ko used a multi-level 
table to represent dirty info, we could walk the memory mapping in 2MB 
chunks, allowing us to skip a large number of the comparisons.
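
To make the multi-level idea concrete, here is a minimal sketch of a two-level dirty map. It is not an existing kvm.ko or QEMU interface: the class, its method names, and the 4 KiB page size (so 512 pages per 2MB chunk) are all assumptions for illustration. The upper level keeps one flag per 2MB chunk, so the scan only descends into chunks that contain at least one dirty page.

```python
PAGES_PER_CHUNK = 512  # 2 MiB chunk / 4 KiB page (page size is an assumption)

class TwoLevelDirtyMap:
    """Hypothetical two-level dirty tracker: per-chunk flags over per-page flags."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        num_chunks = (num_pages + PAGES_PER_CHUNK - 1) // PAGES_PER_CHUNK
        self.chunk_dirty = bytearray(num_chunks)  # upper level: one flag per chunk
        self.page_dirty = bytearray(num_pages)    # lower level: one flag per page

    def mark_dirty(self, page):
        self.page_dirty[page] = 1
        self.chunk_dirty[page // PAGES_PER_CHUNK] = 1

    def collect_dirty(self):
        """Return (dirty page numbers, page entries examined) and clear the map."""
        dirty = []
        examined = 0
        for chunk, flag in enumerate(self.chunk_dirty):
            if not flag:
                continue  # whole 2 MiB chunk is clean: skip its 512 entries
            start = chunk * PAGES_PER_CHUNK
            end = min(start + PAGES_PER_CHUNK, self.num_pages)
            for page in range(start, end):
                examined += 1
                if self.page_dirty[page]:
                    dirty.append(page)
                    self.page_dirty[page] = 0
            self.chunk_dirty[chunk] = 0
        return dirty, examined

m = TwoLevelDirtyMap(1 << 20)      # 4 GiB guest at 4 KiB per page
for p in (3, 700, 1_000_000):      # three dirty pages in three different chunks
    m.mark_dirty(p)
dirty, examined = m.collect_dirty()
print(dirty, examined)
```

With three dirty pages in three different chunks, the scan examines 3 * 512 = 1536 page entries plus 2048 chunk flags, rather than all 1048576 page entries.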

>> In the short term, fixing (2) by accounting zero pages as full sized 
>> pages should "fix" the problem.
>>
>> In the long term, we need a new dirty bit interface from kvm.ko that 
>> uses a multi-level table.  That should dramatically improve scan 
>> performance. 
>
> Why would a multi-level table help?  (or rather, please explain what 
> you mean by a multi-level table).
>
> Something we could do is divide memory into more slots, and poll 
> each slot when we start to scan its page range.  That reduces the time 
> between sampling a page's dirtiness and sending it off, and reduces 
> the latency incurred by the sampling.  There are also 
> non-interface-changing ways to reduce this latency, like O(1) write 
> protection, or using dirty bits instead of write protection when 
> available.

BTW, we should also refactor qemu to use the kvm dirty bitmap directly 
instead of mapping it to the main dirty bitmap.
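
As a rough illustration of scanning a packed bitmap directly, the sketch below walks a one-bit-per-page bitmap word by word and never expands it to one byte per page. It is not QEMU code: the function name, the 64-bit word size, and the least-significant-bit-first page ordering are assumptions.

```python
def dirty_pages_from_words(words, bits_per_word=64):
    """Yield page numbers whose bit is set in a packed bitmap.

    `words` is a list of ints, one per bitmap word; bit 0 of word i is
    assumed to describe page i * bits_per_word.
    """
    for i, word in enumerate(words):
        base = i * bits_per_word
        while word:                    # an all-zero word costs a single test
            low = word & -word         # isolate the lowest set bit
            yield base + low.bit_length() - 1
            word ^= low                # clear that bit and continue

bitmap = [0, 0b1001, 0, 1 << 63]
pages = list(dirty_pages_from_words(bitmap))
print(pages)  # prints "[64, 67, 255]"
```

A clean word is rejected with one comparison, so 64 clean pages cost one test instead of 64 byte loads.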

>> We also need to implement live migration in a separate thread that 
>> doesn't carry qemu_mutex while it runs.
>
> IMO that's the biggest hit currently.

Yup.  That's the Correct solution to the problem.

Regards,

Anthony Liguori

