From mboxrd@z Thu Jan  1 00:00:00 1970
From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Subject: Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
Date: Wed, 30 Nov 2011 14:02:42 +0900
Message-ID: <4ED5B8F2.2090804@oss.ntt.co.jp>
References: <20111114182041.43570cdf.yoshikawa.takuya@oss.ntt.co.jp> <4EC0EC90.1090202@redhat.com> <4EC0F3D3.9090907@oss.ntt.co.jp> <4EC10BFE.7050704@redhat.com> <4EC33C0B.1060807@oss.ntt.co.jp> <4EC37D18.4010609@redhat.com> <loom.20111129T104524-678@post.gmane.org> <4ED4AF43.2040003@linux.vnet.ibm.com> <4ED4B574.8090907@oss.ntt.co.jp> <4ED4BFEB.5010600@redhat.com> <4ED4C85A.5020509@linux.vnet.ibm.com> <4ED4C9A3.50504@redhat.com> <4ED4E626.5010507@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	KVM <kvm@vger.kernel.org>, quintela@redhat.com,
	qemu-devel@nongnu.org,
	Takuya Yoshikawa <takuya.yoshikawa@gmail.com>
To: Avi Kivity <avi@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from serv2.oss.ntt.co.jp ([222.151.198.100]:40496 "EHLO
	serv2.oss.ntt.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750926Ab1K3FCG (ORCPT <rfc822;kvm@vger.kernel.org>);
	Wed, 30 Nov 2011 00:02:06 -0500
In-Reply-To: <4ED4E626.5010507@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

CCing qemu devel, Juan,

(2011/11/29 23:03), Avi Kivity wrote:
> On 11/29/2011 02:01 PM, Avi Kivity wrote:
>> On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
>>> On 11/29/2011 07:20 PM, Avi Kivity wrote:
>>>
>>>
>>>> We used to have a bitmap in a shadow page with a bit set for every slot
>>>> pointed to by the page.  If we extend this to non-leaf pages (so, when
>>>> we set a bit, we propagate it through its parent_ptes list), then we do
>>>> the following on write fault:
>>>>
>>>
>>>
>>> Thanks for the detail.
>>>
>>> Um, propagating the slot bit to parent ptes is a little slow; in particular,
>>> it is overhead for non-X Window guests, which are dirty logged only during
>>> migration (I guess most Linux guests run in this mode, and migration is not
>>> frequent). No?
>>
>> You need to propagate very infrequently.  The first pte added to a page
>> will need to propagate, but the second (if from the same slot, which is
>> likely) will already have the bit set in the page, so we're assured it's
>> set in all its parents.
>
> btw, if you plan to work on this, let's agree on pseudocode/data
> structures first to minimize churn.  I'll also want this documented in
> mmu.txt.  Of course we can still end up with something different than
> planned, but let's at least try to think of the issues in advance.
>

I want to hear the overall view as well.

We are now trying to improve the cases where there are too many dirty pages during
live migration.

Some months ago I measured live migration over a dedicated 10Gbps line, with the two
servers directly connected, and confirmed that transferring even a few MB of memory
took latency on the order of milliseconds, even after excluding other QEMU-side
overheads: this matches a simple back-of-the-envelope calculation.

In another test, I found that even under a relatively normal workload, migration
needed a pause of a few seconds at the final stage.

	Does Juan have more data?

So, the current scheme does not scale with the number of dirty pages, and
administrators should avoid migrating during such workloads if possible.

	Server consolidation at night will be OK, but dynamic load balancing
	may not work well under such restrictions: I am now more interested in the
	former.

With that in mind, I set the goal at 1K dirty pages (4MB of memory) when I did
the rmap optimization.  Write protecting that many pages now takes a few ms or so,
IIRC, which is not so bad compared to the overall latency, is it?


So, though I like the O(1) method, I would like to hear the expected improvements
in a bit more detail, if possible.

IIUC, even though the O(1) method is O(1) at KVM_GET_DIRTY_LOG time, it still needs
O(N) write protections with respect to the total number of dirty pages: they are
distributed, but doesn't each page fault that must be logged actually do some
write protection?

In general, what kind of improvements are actually needed for live migration?

Thanks,
	Takuya
