From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
To: Avi Kivity <avi@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>,
	Marcelo Tosatti <mtosatti@redhat.com>, KVM <kvm@vger.kernel.org>,
	quintela@redhat.com, qemu-devel@nongnu.org,
	Takuya Yoshikawa <takuya.yoshikawa@gmail.com>
Subject: Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
Date: Wed, 30 Nov 2011 14:02:42 +0900
Message-ID: <4ED5B8F2.2090804@oss.ntt.co.jp>
In-Reply-To: <4ED4E626.5010507@redhat.com>

CCing qemu-devel and Juan.

(2011/11/29 23:03), Avi Kivity wrote:
> On 11/29/2011 02:01 PM, Avi Kivity wrote:
>> On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
>>> On 11/29/2011 07:20 PM, Avi Kivity wrote:
>>>
>>>
>>>> We used to have a bitmap in a shadow page with a bit set for every slot
>>>> pointed to by the page.  If we extend this to non-leaf pages (so, when
>>>> we set a bit, we propagate it through its parent_ptes list), then we do
>>>> the following on write fault:
>>>>
>>>
>>>
>>> Thanks for the detail.
>>>
>>> Um, propagating slot bit to parent ptes is little slow, especially, it
>>> is the overload for no Xwindow guests which is dirty logged only in the
>>> migration(i guess most linux guests are running on this mode and migration
>>> is not frequent). No?
>>
>> You need to propagate very infrequently.  The first pte added to a page
>> will need to propagate, but the second (if from the same slot, which is
>> likely) will already have the bit set in the page, so we're assured it's
>> set in all its parents.
>
> btw, if you plan to work on this, let's agree on pseudocode/data
> structures first to minimize churn.  I'll also want this documented in
> mmu.txt.  Of course we can still end up with something different than
> planned, but let's at least try to think of the issues in advance.
>

I want to hear the overall view as well.

We are now trying to improve the cases where there are too many dirty pages
during live migration.

Some months ago I measured live migration over a dedicated 10Gbps line between
two directly connected servers, and confirmed that transferring only a few MBs of
memory took latency on the order of milliseconds, even when I excluded other
QEMU-side overheads: this matches a simple calculation.
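
A rough sketch of that simple calculation, assuming the 10Gbps line is used at
full rate with no protocol overhead (an idealization, not a measured value);
the helper name transfer_time_ms is only for illustration:

```python
def transfer_time_ms(num_bytes: int, link_bits_per_sec: float) -> float:
    """Ideal wire time in milliseconds, ignoring protocol and QEMU overheads."""
    return num_bytes * 8 / link_bits_per_sec * 1000.0


LINK_10GBPS = 10e9  # assumption: the full 10 Gbit/s is available to migration

for mib in (1, 4, 16):
    size = mib * 1024 * 1024
    print(f"{mib} MiB -> {transfer_time_ms(size, LINK_10GBPS):.2f} ms")
```

Under these assumptions a few MBs already cost a few ms on the wire, before
any other overhead is counted.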

In another test, I found that even under a relatively normal workload, migration
needed a pause of a few seconds at the final stage.

	Juan, do you have more data?

So, the current scheme does not scale with the number of dirty pages, and
administrators should avoid migrating during such workloads if possible.

	Server consolidation at night will be OK, but dynamic load balancing
	may not work well under such restrictions: I am now more interested in
	the former.

With that in mind, I set the goal at 1K dirty pages, 4MB of memory, when
I did the rmap optimization.  Now it takes a few ms or so to write protect
that number of pages, IIRC: that is not so bad compared to the overall latency?


So, though I like the O(1) method, I would like to hear the expected improvements
in a bit more detail, if possible.

IIUC, even though the O(1) method is O(1) at the time of GET DIRTY LOG, it still
needs O(N) write protections with respect to the total number of dirty pages: the
work is distributed, but each page fault that should be logged still does some
write protection?
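
To make the question concrete, here is a toy cost model, not the actual KVM
code. It assumes that the rmap approach write protects every dirty page inside
GET DIRTY LOG, and that the O(1) approach does one constant unit of work there
and one unit per logged write fault afterwards. The names Cost, upfront_cost
and deferred_cost are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Cost:
    at_get_dirty_log: int  # write protections done inside GET DIRTY LOG
    at_write_faults: int   # write protections spread over later write faults

    @property
    def total(self) -> int:
        return self.at_get_dirty_log + self.at_write_faults


def upfront_cost(n_dirty: int) -> Cost:
    """Assumed rmap model: write protect each dirty page when the log is fetched."""
    return Cost(at_get_dirty_log=n_dirty, at_write_faults=0)


def deferred_cost(n_dirty: int) -> Cost:
    """Assumed O(1) model: constant work at fetch, one unit per logged fault."""
    return Cost(at_get_dirty_log=1, at_write_faults=n_dirty)


for n in (1024, 65536):
    print(n, upfront_cost(n).total, deferred_cost(n).total)
```

In this model the totals differ only by a constant; what changes is where the
work happens, which is why I am asking about the expected improvement.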

In general, what kind of improvements are actually needed for live migration?

Thanks,
	Takuya
