From mboxrd@z Thu Jan 1 00:00:00 1970
From: Takuya Yoshikawa
Subject: Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
Date: Wed, 30 Nov 2011 14:02:42 +0900
Message-ID: <4ED5B8F2.2090804@oss.ntt.co.jp>
References: <20111114182041.43570cdf.yoshikawa.takuya@oss.ntt.co.jp>
 <4EC0EC90.1090202@redhat.com> <4EC0F3D3.9090907@oss.ntt.co.jp>
 <4EC10BFE.7050704@redhat.com> <4EC33C0B.1060807@oss.ntt.co.jp>
 <4EC37D18.4010609@redhat.com> <4ED4AF43.2040003@linux.vnet.ibm.com>
 <4ED4B574.8090907@oss.ntt.co.jp> <4ED4BFEB.5010600@redhat.com>
 <4ED4C85A.5020509@linux.vnet.ibm.com> <4ED4C9A3.50504@redhat.com>
 <4ED4E626.5010507@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Xiao Guangrong, Marcelo Tosatti, KVM, quintela@redhat.com,
 qemu-devel@nongnu.org, Takuya Yoshikawa
To: Avi Kivity
Received: from serv2.oss.ntt.co.jp ([222.151.198.100]:40496 "EHLO
 serv2.oss.ntt.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1750926Ab1K3FCG (ORCPT );
 Wed, 30 Nov 2011 00:02:06 -0500
In-Reply-To: <4ED4E626.5010507@redhat.com>
Sender: kvm-owner@vger.kernel.org

CCing qemu devel, Juan,

(2011/11/29 23:03), Avi Kivity wrote:
> On 11/29/2011 02:01 PM, Avi Kivity wrote:
>> On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
>>> On 11/29/2011 07:20 PM, Avi Kivity wrote:
>>>
>>>
>>>> We used to have a bitmap in a shadow page with a bit set for every slot
>>>> pointed to by the page. If we extend this to non-leaf pages (so, when
>>>> we set a bit, we propagate it through its parent_ptes list), then we do
>>>> the following on write fault:
>>>>
>>>
>>>
>>> Thanks for the detail.
>>>
>>> Um, propagating slot bit to parent ptes is little slow, especially, it
>>> is the overload for no Xwindow guests which is dirty logged only in the
>>> migration(i guess most linux guests are running on this mode and migration
>>> is not frequent). No?
>>
>> You need to propagate very infrequently. The first pte added to a page
>> will need to propagate, but the second (if from the same slot, which is
>> likely) will already have the bit set in the page, so we're assured it's
>> set in all its parents.
>
> btw, if you plan to work on this, let's agree on pseudocode/data
> structures first to minimize churn. I'll also want this documented in
> mmu.txt. Of course we can still end up with something different than
> planned, but let's at least try to think of the issues in advance.
>

I want to hear the overall view as well.

Now we are trying to improve the cases where there are too many dirty pages
during live migration.

I did some measurements of live migration some months ago on a dedicated
10Gbps line, with two servers directly connected, and confirmed that
transferring only a few MBs of memory took latency on the order of
milliseconds, even when I excluded other QEMU-side overheads: this matches
a simple calculation.

In another test, I found that even under a relatively normal workload, a
pause of a few seconds was needed at the final stage.

Juan, do you have more data?

So, the current scheme does not scale with the number of dirty pages, and
administrators should avoid migrating during such workloads if possible.

Server consolidation at night will be OK, but dynamic load balancing may
not work well under such restrictions: I am now more interested in the
former.

With that in mind, I set the goal at 1K dirty pages, 4MB of memory, when I
did the rmap optimization. Now it takes a few ms or so to write protect
that number of pages, IIRC: is that not so bad compared to the overall
latency?

So, though I like the O(1) method, I want to hear the expected improvements
in a bit more detail, if possible.

IIUC, even though the O(1) method is O(1) at GET DIRTY LOG time, it still
needs O(N) write protections with respect to the total number of dirty
pages: they are distributed, but each page fault that should be logged
actually does some write protection?

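As a rough check of the "simple math calculation" mentioned above, here is a minimal sketch of the wire time for the 1K-dirty-page goal. It assumes 4 KiB pages and the raw 10Gbps link rate with no protocol or QEMU overhead; the names transfer_ms, PAGE_SIZE and LINK_BPS are made up for this illustration and are not taken from KVM or QEMU.

```python
PAGE_SIZE = 4096        # bytes per page; assumption: 4 KiB pages
LINK_BPS = 10e9         # raw line rate of a 10Gbps link; overhead ignored


def transfer_ms(dirty_pages, page_size=PAGE_SIZE, link_bps=LINK_BPS):
    """Return the idealized wire time, in milliseconds, for dirty_pages pages."""
    bits = dirty_pages * page_size * 8
    return bits / link_bps * 1000.0


# 1K dirty pages = 4 MiB of guest memory
goal_ms = transfer_ms(1024)
print(f"{goal_ms:.2f} ms")   # about 3.36 ms
```

Under these assumptions the transfer alone is already on the order of milliseconds, which is the same order as the "few ms or so" quoted for write protecting that many pages.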
In general, what kind of improvements are actually needed for live migration?

Thanks,
	Takuya
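
For reference, the slot-bit propagation discussed in the quoted text can be modeled with a toy sketch like the one below. It is only an illustration of why propagation is infrequent, not the kernel's data structures: ShadowPage, slot_bitmap, parents and mark_slot are hypothetical names, with parents standing in for the parent_ptes list.

```python
class ShadowPage:
    """Toy stand-in for a shadow page; not the kernel structure."""

    def __init__(self, name):
        self.name = name
        self.slot_bitmap = 0   # bit N set => page (or a descendant) maps slot N
        self.parents = []      # hypothetical stand-in for the parent_ptes list


def mark_slot(page, slot):
    """Set the slot bit on page and propagate upward.

    Returns the number of pages whose bitmap changed. If the bit is already
    set, the parents are assumed to have it too, so nothing is walked.
    (A real design would also have to propagate when a new parent is linked.)
    """
    bit = 1 << slot
    if page.slot_bitmap & bit:
        return 0
    page.slot_bitmap |= bit
    updated = 1
    for parent in page.parents:
        updated += mark_slot(parent, slot)
    return updated


root, mid, leaf = ShadowPage("root"), ShadowPage("mid"), ShadowPage("leaf")
mid.parents.append(root)
leaf.parents.append(mid)

first = mark_slot(leaf, 3)    # first pte from slot 3: walks leaf, mid, root
second = mark_slot(leaf, 3)   # second pte from the same slot: no walk at all
print(first, second)          # 3 0
```

In this model only the first pte from a given slot pays for the walk up the parents; later ptes from the same slot return immediately, which is the point made in the quoted reply.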