Date: Thu, 24 Mar 2016 08:52:56 +0800
From: Wei Yang
Subject: Re: [Qemu-devel] [RFC Design Doc] Speed up live migration by skipping free pages
Message-ID: <20160324005256.GA14956@linux-gk3p>
References: <1458632629-4649-1-git-send-email-liang.z.li@intel.com>
 <20160323013715.GB13750@linux-gk3p>
 <20160323094643.GA18660@linux-gk3p>
To: "Li, Liang Z"
Cc: "rkagan@virtuozzo.com", "linux-kernel@vger.kernel.org", "ehabkost@redhat.com",
 "kvm@vger.kernel.org", "mst@redhat.com", "simhan@hpe.com", "quintela@redhat.com",
 "qemu-devel@nongnu.org", "dgilbert@redhat.com", "jitendra.kolhe@hpe.com",
 "mohan_parthasarathy@hpe.com", "amit.shah@redhat.com", "pbonzini@redhat.com",
 Wei Yang, "rth@twiddle.net"

On Wed, Mar 23, 2016 at 02:35:42PM +0000, Li, Liang Z wrote:
>> >No special purpose. Maybe it's caused by the email client. I didn't
>> >find the character in the original doc.
>> >
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg00715.html
>>
>> You could take a look at this link; there is a '>' before From.
>
>Yes, there is.
>
>> >> >
>> >> >6. Handling page cache in the guest
>> >> >The memory used for page cache in the guest will change depending on
>> >> >the workload; if the guest runs some block-I/O-intensive workload, there
>> >> >will
>> >>
>> >> Would this improvement still benefit a lot when the guest has only a
>> >> few free pages?
>> >
>> >Yes, the improvement is very obvious.
>> >
>>
>> Good to know this.
>>
>> >> In your performance data, Case 2, I think, mimics this kind of case,
>> >> though the memory-consuming task is stopped before migration. If it
>> >> continues, would we still perform better than before?
>> >
>> >Actually, my RFC patch didn't consider the page cache; Roman raised this
>> >issue, so I added this part in this doc.
>> >
>> >Case 2 didn't mimic this kind of scenario. The workload is a
>> >memory-consuming workload, not a block-I/O-intensive workload, so there
>> >is not much page cache in this case.
>> >
>> >If the workload in Case 2 continues, as long as it does not write all
>> >the memory it allocates, we can still get benefits.
>> >
>>
>> It sounds like I have little knowledge of the page cache and its
>> relationship with free pages and I/O-intensive work.
>>
>> Here is some personal understanding; I would appreciate it if you could
>> correct me.
>>
>> +---------+
>> |PageCache|
>> +---------+
>> +---------+---------+---------+---------+
>> |Page     |Page     |Free Page|Page     |
>> +---------+---------+---------+---------+
>>
>> A Free Page is a page in the free_list; is PageCache some page cached in
>> the CPU's cache line?
>
>No, the page cache is quite different from a CPU cache line.
>"In computing, a page cache, sometimes also called disk cache, is a
>transparent cache for the pages originating from a secondary storage
>device such as a hard disk drive (HDD).
>The operating system keeps a page cache in otherwise unused portions of the
>main memory (RAM), resulting in quicker access to the contents of cached
>pages and overall performance improvements."
>You can refer to https://en.wikipedia.org/wiki/Page_cache for more details.
>

My poor knowledge~ I should have googled it before imagining the meaning of
the terminology.

If my understanding is correct, the page cache is counted as free pages,
while actually we should migrate those pages instead of filtering them out.

>
>> When a memory-consuming task runs, it leads to few free pages in the
>> whole system. What's the consequence when an I/O-intensive workload runs?
>> I guess it still leads to few free pages. And will there be some problem
>> in syncing the page cache?
>>
>> >>
>> >> I am thinking, is it possible to have a threshold, or a configurable
>> >> threshold, to decide whether to utilize the free page bitmap
>> >> optimization?
>> >>
>> >
>> >Could you elaborate your idea? How does it work?
>> >
>>
>> Let's go back to Case 2. We run a memory-consuming task which leads to
>> few free pages in the whole system, which means, from QEMU's perspective,
>> little of the dirty_memory is filtered out by the free page list. My
>> original question was whether your solution benefits in this scenario.
>> As you mentioned, it works fine, so maybe this threshold is not necessary.
>>
>I didn't quite understand your question before.
>The benefit we get depends on the number of free pages we can filter out.
>This is always true.
>
>> My original idea is that in QEMU we can calculate the percentage of free
>> pages in the whole system. If it finds there is only a small percentage
>> of free pages, then we don't need to bother to use this method.
>>
>
>I got you. The threshold can be used for optimization, but the effect is
>very limited. If there are only a few free pages, the process of
>constructing the free page bitmap is very quick.
>But we can stop doing the following things, e.g.
>sending the free page bitmap and doing the bitmap operation; theoretically,
>that may help to save some time, maybe several ms.
>

Ha, you got what I mean.

>I think a VM that has no free pages at all is very rare; in the worst case,
>there are still several MB of free pages. The proper threshold should be
>determined by comparing the extra time spent on processing the free page
>bitmap with the time spent on sending the several MB of free pages through
>the network. If the former is longer, we can stop using this method. So we
>should take the network bandwidth into consideration; it's too complicated
>and not worth doing.
>

Yes, after some thinking, it may not be that easy or worthwhile to do this
optimization.

>Thanks
>
>Liang
>
>> Have a nice day~
>>
>> >Liang
>> >
>>
>> >>
>> >> --
>> >> Richard Yang\nHelp you, Help me
>>
>> --
>> Richard Yang\nHelp you, Help me

--
Richard Yang\nHelp you, Help me
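[Editor's note] The trade-off discussed in the thread above (extra time spent
building and sending the free page bitmap vs. transfer time saved by not
sending free pages over the network) can be sketched roughly as below. This is
an illustrative back-of-the-envelope model only; the function name, parameters,
and numbers are assumptions, not anything from QEMU or the RFC patch.

```python
# Sketch of the threshold comparison Liang describes: filtering free pages
# pays off only when the transfer time avoided exceeds the bitmap overhead.
# All names and numbers are illustrative, not QEMU API.

def filtering_pays_off(free_bytes, bandwidth_bytes_per_s, bitmap_overhead_s):
    """Return True if skipping free pages saves net migration time."""
    # Time we would have spent sending the free pages over the wire.
    time_saved_s = free_bytes / bandwidth_bytes_per_s
    return time_saved_s > bitmap_overhead_s

# Example: 4 MB of free pages on a ~1.25 GB/s (10 Gb/s) link takes ~3.1 ms
# to send, so with ~5 ms of bitmap overhead, filtering would not pay off.
print(filtering_pays_off(4 * 1024**2, 1.25 * 1024**3, 0.005))
```

This matches the thread's conclusion: with only a few MB of free pages the
savings are on the order of milliseconds, so the threshold also depends on
network bandwidth and is likely not worth the added complexity.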