From mboxrd@z Thu Jan  1 00:00:00 1970
From: Izik Eidus <ieidus@redhat.com>
Subject: Re: [PATCH 2/4] Add replace_page(), change the mapping of pte from
 one page into another
Date: Thu, 13 Nov 2008 12:38:07 +0200
Message-ID: <491C038F.5020007@redhat.com>
References: <1226409701-14831-1-git-send-email-ieidus@redhat.com>	<1226409701-14831-2-git-send-email-ieidus@redhat.com>	<1226409701-14831-3-git-send-email-ieidus@redhat.com>	<20081111114555.eb808843.akpm@linux-foundation.org>	<4919F1C0.2050009@redhat.com>	<Pine.LNX.4.64.0811111520590.27767@quilx.com>	<4919F7EE.3070501@redhat.com>	<Pine.LNX.4.64.0811111527500.27767@quilx.com>	<20081111222421.GL10818@random.random>	<20081112111931.0e40c27d.kamezawa.hiroyu@jp.fujitsu.com>	<491AAA84.5040801@redhat.com>	<491AB9D0.7060802@qumranet.com> <20081113151129.35c17962.kamezawa.hiroyu@jp.fujitsu.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8;
	format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Izik Eidus <izik@qumranet.com>, Avi Kivity <avi@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kvm@vger.kernel.org, chrisw@redhat.com, izike@qumranet.com
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx2.redhat.com ([66.187.237.31]:55557 "EHLO mx2.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751747AbYKMKif (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 13 Nov 2008 05:38:35 -0500
In-Reply-To: <20081113151129.35c17962.kamezawa.hiroyu@jp.fujitsu.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

=D7=A6=D7=99=D7=98=D7=95=D7=98 KAMEZAWA Hiroyuki:
> Thank you for answers.
>
> On Wed, 12 Nov 2008 13:11:12 +0200
> Izik Eidus <izik@qumranet.com> wrote:
>
>  =20
>> Avi Kivity wrote:
>>    =20
>>> KAMEZAWA Hiroyuki wrote:
>>>      =20
>>>> Can I make a question ? (I'm working for memory cgroup.)
>>>>
>>>> Now, we do charge to anonymous page when
>>>>   - charge(+1) when it's mapped firstly (mapcount 0->1)
>>>>   - uncharge(-1) it's fully unmapped (mapcount 1->0) vir=20
>>>> page_remove_rmap().
>>>>
>>>> My quesion is
>>>>  - PageKSM pages are not necessary to be tracked by memory cgroup =
?
>>>>        =20
>> When we reaplacing page using page_replace() we have:
>> oldpage - > anonymous page that is going to be replaced by newpage
>> newpage -> kernel allocated page (KsmPage)
>> so about oldpage we are calling page_remove_rmap() that will notify =
cgroup
>> and about newpage it wont be count inside cgroup beacuse it is file =
rmap=20
>> page
>> (we are calling to page_add_file_rmap), so right now PageKSM wont ev=
er=20
>> be tracked by cgroup.
>>
>>    =20
> If not in radix-tree, it's not tracked.
> (But we don't want to track non-LRU pages which are not freeable.)
>  =20

Yes.

>
>  =20
>>>>  - Can we know that "the page is just replaced and we don't necess=
ary=20
>>>> to do
>>>>    charge/uncharge".
>>>>        =20
>> The caller of page_replace does know it, the only problem is that=20
>> page_remove_rmap()
>> automaticly change the cgroup for anonymous pages,
>> if we want it not to change the cgroup, we can:
>> increase the cgroup count before page_remove (but in that case what=20
>> happen if we reach to the limit???)
>> give parameter to page_remove_rmap() that we dont want the cgroup to=
 be=20
>> changed.
>>    =20
>
> Hmm, current mem cgroup works via page_cgroup struct to track pages.
>
>    page <-> page_cgroup has one-to-one relation ship.
>
> So, "exchanging page" itself causes trouble. But I may be able to pro=
vide
> necessary hooks to you as I did in page migraiton.
>  =20

But if we dont track KsmPages, and we call to page_remove_rmap() on the=
=20
anonymous page that we
replace, why would it be a problem?
(should everything be correct in that case???)

>  =20
>>>>  - annonymous page from KSM is worth to be tracked by memory cgrou=
p ?
>>>>    (IOW, it's on LRU and can be swapped-out ?)
>>>>        =20
>> KSM have no anonymous pages (it share anonymous pages into KsmPAGE -=
>=20
>> kernel allocated page without mapping)
>> so it isnt in LRU and it cannt be swapped, only when KsmPAGEs will b=
e=20
>> break by do_wp_page() the duplication will be able to swap.
>>
>>    =20
> Ok, thank you for confirmation.
>
>  =20
>>>>  =20
>>>>        =20
>>> My feeling is that shared pages should be accounted as if they were=
=20
>>> not shared; that is, a share page should be accounted for each proc=
ess=20
>>> that shares it.  Perhaps sharing within a cgroup should be counted =
as=20
>>> 1 page for all the ptes pointing to it.
>>>
>>>
>>>      =20
>
> If KSM pages are on radix-tree, it will be accounted automatically.
> Now, we have "Unevictable" LRU and mlocked() pages are smartly isolat=
ed into its
> own LRU. So, just doing
>
>  - inode's radix-tree
>  - make all pages mlocked.
>  - provide special page fault handler for your purpose
>  =20

Well in this version that i am going to merge the pages arent going to=20
be swappable,
Latter after Ksm will get merged we will make the KsmPages swappable...
so i think working with cgroups would be effective / useful only when=20
KsmPages will start be swappable...
Do you agree?
(What i am saying is that right now lets dont count the KsmPages inside=
=20
the cgroup, lets do it when KsmPages
will be swappable)

If you feel this pages should be counted in the cgroup i have no proble=
m=20
to do it via hooks like page migration is doing.

thanks.

> is simple one. But ok, whatever implementation you'll do, I have to c=
heck it
> and consider whether it should be tracked or not. Then, add codes to =
memcg to
> track it or ignore it or comments on your patches ;)
>
> It's helpful to add me to CC: when you post this set again.
>  =20

Sure will.

> Thanks,
> -Kame
>
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kerne=
l" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>  =20