From mboxrd@z Thu Jan 1 00:00:00 1970 From: Izik Eidus Subject: Re: [PATCH 2/4] Add replace_page(), change the mapping of pte from one page into another Date: Thu, 13 Nov 2008 12:38:07 +0200 Message-ID: <491C038F.5020007@redhat.com> References: <1226409701-14831-1-git-send-email-ieidus@redhat.com> <1226409701-14831-2-git-send-email-ieidus@redhat.com> <1226409701-14831-3-git-send-email-ieidus@redhat.com> <20081111114555.eb808843.akpm@linux-foundation.org> <4919F1C0.2050009@redhat.com> <4919F7EE.3070501@redhat.com> <20081111222421.GL10818@random.random> <20081112111931.0e40c27d.kamezawa.hiroyu@jp.fujitsu.com> <491AAA84.5040801@redhat.com> <491AB9D0.7060802@qumranet.com> <20081113151129.35c17962.kamezawa.hiroyu@jp.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Izik Eidus , Avi Kivity , Andrea Arcangeli , Christoph Lameter , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org, chrisw@redhat.com, izike@qumranet.com To: KAMEZAWA Hiroyuki Return-path: Received: from mx2.redhat.com ([66.187.237.31]:55557 "EHLO mx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751747AbYKMKif (ORCPT ); Thu, 13 Nov 2008 05:38:35 -0500 In-Reply-To: <20081113151129.35c17962.kamezawa.hiroyu@jp.fujitsu.com> Sender: kvm-owner@vger.kernel.org List-ID: =D7=A6=D7=99=D7=98=D7=95=D7=98 KAMEZAWA Hiroyuki: > Thank you for answers. > > On Wed, 12 Nov 2008 13:11:12 +0200 > Izik Eidus wrote: > > =20 >> Avi Kivity wrote: >> =20 >>> KAMEZAWA Hiroyuki wrote: >>> =20 >>>> Can I make a question ? (I'm working for memory cgroup.) >>>> >>>> Now, we do charge to anonymous page when >>>> - charge(+1) when it's mapped firstly (mapcount 0->1) >>>> - uncharge(-1) it's fully unmapped (mapcount 1->0) vir=20 >>>> page_remove_rmap(). >>>> >>>> My quesion is >>>> - PageKSM pages are not necessary to be tracked by memory cgroup = ? >>>> =20 >> When we reaplacing page using page_replace() we have: >> oldpage - > anonymous page that is going to be replaced by newpage >> newpage -> kernel allocated page (KsmPage) >> so about oldpage we are calling page_remove_rmap() that will notify = cgroup >> and about newpage it wont be count inside cgroup beacuse it is file = rmap=20 >> page >> (we are calling to page_add_file_rmap), so right now PageKSM wont ev= er=20 >> be tracked by cgroup. >> >> =20 > If not in radix-tree, it's not tracked. > (But we don't want to track non-LRU pages which are not freeable.) > =20 Yes. > > =20 >>>> - Can we know that "the page is just replaced and we don't necess= ary=20 >>>> to do >>>> charge/uncharge". >>>> =20 >> The caller of page_replace does know it, the only problem is that=20 >> page_remove_rmap() >> automaticly change the cgroup for anonymous pages, >> if we want it not to change the cgroup, we can: >> increase the cgroup count before page_remove (but in that case what=20 >> happen if we reach to the limit???) >> give parameter to page_remove_rmap() that we dont want the cgroup to= be=20 >> changed. >> =20 > > Hmm, current mem cgroup works via page_cgroup struct to track pages. > > page <-> page_cgroup has one-to-one relation ship. > > So, "exchanging page" itself causes trouble. But I may be able to pro= vide > necessary hooks to you as I did in page migraiton. > =20 But if we dont track KsmPages, and we call to page_remove_rmap() on the= =20 anonymous page that we replace, why would it be a problem? (should everything be correct in that case???) > =20 >>>> - annonymous page from KSM is worth to be tracked by memory cgrou= p ? >>>> (IOW, it's on LRU and can be swapped-out ?) >>>> =20 >> KSM have no anonymous pages (it share anonymous pages into KsmPAGE -= >=20 >> kernel allocated page without mapping) >> so it isnt in LRU and it cannt be swapped, only when KsmPAGEs will b= e=20 >> break by do_wp_page() the duplication will be able to swap. >> >> =20 > Ok, thank you for confirmation. > > =20 >>>> =20 >>>> =20 >>> My feeling is that shared pages should be accounted as if they were= =20 >>> not shared; that is, a share page should be accounted for each proc= ess=20 >>> that shares it. Perhaps sharing within a cgroup should be counted = as=20 >>> 1 page for all the ptes pointing to it. >>> >>> >>> =20 > > If KSM pages are on radix-tree, it will be accounted automatically. > Now, we have "Unevictable" LRU and mlocked() pages are smartly isolat= ed into its > own LRU. So, just doing > > - inode's radix-tree > - make all pages mlocked. > - provide special page fault handler for your purpose > =20 Well in this version that i am going to merge the pages arent going to=20 be swappable, Latter after Ksm will get merged we will make the KsmPages swappable... so i think working with cgroups would be effective / useful only when=20 KsmPages will start be swappable... Do you agree? (What i am saying is that right now lets dont count the KsmPages inside= =20 the cgroup, lets do it when KsmPages will be swappable) If you feel this pages should be counted in the cgroup i have no proble= m=20 to do it via hooks like page migration is doing. thanks. > is simple one. But ok, whatever implementation you'll do, I have to c= heck it > and consider whether it should be tracked or not. Then, add codes to = memcg to > track it or ignore it or comments on your patches ;) > > It's helpful to add me to CC: when you post this set again. > =20 Sure will. > Thanks, > -Kame > > > > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kerne= l" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > =20