From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from d28relay02.in.ibm.com (d28relay02.in.ibm.com [9.184.220.59])
 by e28smtp05.in.ibm.com (8.13.1/8.13.1) with ESMTP id m3B4xNlZ009960
 for ; Fri, 11 Apr 2008 10:29:23 +0530
Received: from d28av01.in.ibm.com (d28av01.in.ibm.com [9.184.220.63])
 by d28relay02.in.ibm.com (8.13.8/8.13.8/NCO v8.7) with ESMTP id m3B4xLv51310888
 for ; Fri, 11 Apr 2008 10:29:21 +0530
Received: from d28av01.in.ibm.com (loopback [127.0.0.1])
 by d28av01.in.ibm.com (8.13.1/8.13.3) with ESMTP id m3B4xUmh017888
 for ; Fri, 11 Apr 2008 04:59:31 GMT
Message-ID: <47FEEFC1.4080509@linux.vnet.ibm.com>
Date: Fri, 11 Apr 2008 10:27:37 +0530
From: Balbir Singh
Reply-To: balbir@linux.vnet.ibm.com
MIME-Version: 1.0
Subject: Re: [RFC][PATCH 0/3] memcg: remove refcnt
References: <20080408190734.70ab55b0.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20080408190734.70ab55b0.kamezawa.hiroyu@jp.fujitsu.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: KAMEZAWA Hiroyuki
Cc: "linux-mm@kvack.org", "xemul@openvz.org", "yamamoto@valinux.co.jp",
 lizf@cn.fujitsu.com
List-ID:

KAMEZAWA Hiroyuki wrote:
> This patch is based on 2.6.25-rc8-mm1 + mem_cgroup_per_zone() fix.
> (already in -mm)
>
> This patch is a set for removing refcnt from memory resource controller's
> page_cgroup. Instead of ref_cnt, this patch uses page_mapped().
> By this, we can avoid unnecessary locks and calls to some extent.
>
> Brief Patch Desc.
> [1/3] change migration handling .... charge new-page before migration.
> [2/3] remove refcnt .... remove refcnt from page_cgroup.
> [3/3] handle swapcache .... handle swapcache again.
>
> [1/3] works for better page migration handling.
> [2/3] works for better speed. (depends on [1/3])
> [3/3] works for swap-cache. (depends on [2/3])
>
>
>
> Unix bench execl result(ia64):
> No controller    : 43.0 2654.7 617.4
> with controller  : 43.0 2461.3 572.4
> after this patch : 43.0 2553.6 593.9
>
> If page_cgroup->ref_cnt is necessary (for some purpose), please tell me.
>
> Plan:
> I'd like to push this set before complicated radix-tree page_cgroup set.
> But this should be reviewed before going ahead.
>

I think this makes a lot of sense. We can push the optimizations independent of
the radix tree, so that it is easy to debug and develop.

--
  Warm Regards,
  Balbir Singh
  Linux Technology Center
  IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-Id: <47FF57A7.5000704@mxp.nes.nec.co.jp>
Date: Fri, 11 Apr 2008 21:20:55 +0900
From: Daisuke Nishimura
MIME-Version: 1.0
Subject: Re: [RFC][PATCH 3/3] account swapcache
References: <20080408190734.70ab55b0.kamezawa.hiroyu@jp.fujitsu.com>
 <20080408191311.73b167bb.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20080408191311.73b167bb.kamezawa.hiroyu@jp.fujitsu.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: KAMEZAWA Hiroyuki
Cc: "linux-mm@kvack.org", "balbir@linux.vnet.ibm.com", "xemul@openvz.org",
 "yamamoto@valinux.co.jp", lizf@cn.fujitsu.com, Hugh Dickins,
 "IKEDA, Munehiro"
List-ID:

Hi, KAMEZAWA-san.

KAMEZAWA Hiroyuki wrote:
> Now swapcache is not accounted. (because it had some troubles.)
>
> This is retrying account swap cache, based on remove-refcnt patch.
>
> This does.
> * When a page is swap-cache, mem_cgroup_uncharge_page() will *not*
>   uncharge page even if page->mapcount == 0.
> * When a page is removed from swap-cache, mem_cgroup_uncharge_page()
>   is called again.
> * A swapcache page is newly charged only when it's mapped.
>
> Signed-off-by: KAMEZAWA Hiroyuki
>

I agree with the idea that swap caches should be charged
as memory.
(I think they may be charged as swap at the same time.)

IMO, not charging swap caches as memory occasionally causes a problem
that swap caches are not freed even when a process that owns
those pages tries to free them (e.g. task exit).

For example:

Some pages are being reclaimed via memcg memory reclaim.

Assume that shrink_page_list() has already moved those pages
to swap cache, unmapped them from ptes, removed from mz->lru,
and is working on other pages on page_list.
Those swap cache pages are unlocked and
their page_count is 2 (swap cache, isolate_page).

At the same time on another CPU, if the process that owns those
pages is trying to free them, free_swap_and_cache() cannot
free those pages unless vm_swap_full, because find_get_pages()
increases page_count.

I think this rare case itself also exists in global memory reclaim,
but global memory reclaim does not assume that those pages have
been freed, so, if it needs to free more memory, those pages
will be freed later because they remain on the global inactive list.

The problem here is that those swap cache pages are uncharged
from memcg, so memcg can never reclaim those pages that belonged
to the group.


Thanks,
Daisuke Nishimura.

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 14 Apr 2008 09:47:09 +0900
From: KAMEZAWA Hiroyuki
Subject: Re: [RFC][PATCH 3/3] account swapcache
Message-Id: <20080414094709.fb9c3745.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <47FF57A7.5000704@mxp.nes.nec.co.jp>
References: <20080408190734.70ab55b0.kamezawa.hiroyu@jp.fujitsu.com>
 <20080408191311.73b167bb.kamezawa.hiroyu@jp.fujitsu.com>
 <47FF57A7.5000704@mxp.nes.nec.co.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Daisuke Nishimura
Cc: "linux-mm@kvack.org", "balbir@linux.vnet.ibm.com", "xemul@openvz.org",
 "yamamoto@valinux.co.jp", lizf@cn.fujitsu.com, Hugh Dickins,
 "IKEDA, Munehiro"
List-ID:

On Fri, 11 Apr 2008 21:20:55 +0900
Daisuke Nishimura wrote:
> IMO, not charging swap caches as memory occasionally causes a problem
> that swap caches are not freed even when a process that owns
> those pages tries to free them (e.g. task exit).
>
> For example:
>
> Some pages are being reclaimed via memcg memory reclaim.
>
> Assume that shrink_page_list() has already moved those pages
> to swap cache, unmapped them from ptes, removed from mz->lru,
> and is working on other pages on page_list.
> Those swap cache pages are unlocked and
> their page_count is 2 (swap cache, isolate_page).
>
> At the same time on another CPU, if the process that owns those
> pages is trying to free them, free_swap_and_cache() cannot
> free those pages unless vm_swap_full, because find_get_pages()
> increases page_count.
>
> I think this rare case itself also exists in global memory reclaim,
> but global memory reclaim does not assume that those pages have
> been freed, so, if it needs to free more memory, those pages
> will be freed later because they remain on the global inactive list.
>
yes.
> The problem here is that those swap cache pages are uncharged
> from memcg, so memcg can never reclaim those pages that belonged
> to the group.
>
Why "never" uncharged?

Assume "page" is SwapCache and unmapped and clean.
==
shrink_page_list()
  -> PageSwapCache() == true
  -> PageWriteback() == false
  -> PageDirty() == false
  -> PagePrivate() == true or false
  -> remove_mapping()
       -> page_count() == 2
       -> PageDirty() == false
       -> PageSwapCache() == true
            -> __delete_from_swap_cache()
       -> true
  -> page will be freed
==
Page shrinking can free SwapCache regardless of the vm_swap_full() result.
Of course, my patch handles __delete_from_swap_cache().

Thanks,
-Kame

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <48030FE9.1040401@mtf.biglobe.ne.jp>
Date: Mon, 14 Apr 2008 17:03:53 +0900
From: Daisuke Nishimura
Reply-To: Daisuke Nishimura
MIME-Version: 1.0
Subject: Re: [RFC][PATCH 3/3] account swapcache
References: <20080408190734.70ab55b0.kamezawa.hiroyu@jp.fujitsu.com>
 <20080408191311.73b167bb.kamezawa.hiroyu@jp.fujitsu.com>
 <47FF57A7.5000704@mxp.nes.nec.co.jp>
 <20080414094709.fb9c3745.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20080414094709.fb9c3745.kamezawa.hiroyu@jp.fujitsu.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: KAMEZAWA Hiroyuki
Cc: "linux-mm@kvack.org", "balbir@linux.vnet.ibm.com", "xemul@openvz.org",
 "yamamoto@valinux.co.jp", lizf@cn.fujitsu.com, Hugh Dickins,
 "IKEDA, Munehiro"
List-ID:

KAMEZAWA Hiroyuki wrote:
> On Fri, 11 Apr 2008 21:20:55 +0900
> Daisuke Nishimura wrote:
>> IMO, not charging swap caches as memory occasionally causes a problem
>> that swap caches are not freed even when a process that owns
>> those pages tries to free them (e.g. task exit).
>>
>> For example:
>>
>> Some pages are being reclaimed via memcg memory reclaim.
>>
>> Assume that shrink_page_list() has already moved those pages
>> to swap cache, unmapped them from ptes, removed from mz->lru,
>> and is working on other pages on page_list.
>> Those swap cache pages are unlocked and
>> their page_count is 2 (swap cache, isolate_page).
>>
>> At the same time on another CPU, if the process that owns those
>> pages is trying to free them, free_swap_and_cache() cannot
>> free those pages unless vm_swap_full, because find_get_pages()
>> increases page_count.
>>
>> I think this rare case itself also exists in global memory reclaim,
>> but global memory reclaim does not assume that those pages have
>> been freed, so, if it needs to free more memory, those pages
>> will be freed later because they remain on the global inactive list.
>>
> yes.
>
>> The problem here is that those swap cache pages are uncharged
>> from memcg, so memcg can never reclaim those pages that belonged
>> to the group.
>>
> Why "never" uncharged?
>
> Assume "page" is SwapCache and unmapped and clean.
> ==
> shrink_page_list()
>   -> PageSwapCache() == true
>   -> PageWriteback() == false
>   -> PageDirty() == false
>   -> PagePrivate() == true or false
>   -> remove_mapping()
>        -> page_count() == 2
>        -> PageDirty() == false
>        -> PageSwapCache() == true
>             -> __delete_from_swap_cache()
>        -> true
>   -> page will be freed
> ==
>

You are right.

I was thinking of the case below.
Assume some anonymous pages (mapped, referenced, !SwapCache)
are being reclaimed.

shrink_page_list()
  -> add_to_swap() <- makes the page dirty.
  -> try_to_unmap() <- uncharged from memcg and removed from mz->lru.
  -> PageDirty() == true
       sc->order <= PAGE_ALLOC_COSTLY_ORDER && referenced
         goto keep_locked
  -> unlocks the page and will work on other pages on page_list.

And, if on another CPU the process that owns those pages is exiting
at the timing of my example above, those pages remain only on the
global lru, and are never charged (mapped) because the process exits.

I said "never" because once they are removed from mz->lru,
mem_cgroup_isolate_pages() doesn't select those pages
unless they are charged (mapped) again.

> Page shrinking can free SwapCache regardless of the vm_swap_full() result.
> Of course, my patch handles __delete_from_swap_cache().
>

Yes.
I think your patch can handle the case I'm describing.


Thanks,
Daisuke Nishimura.

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 14 Apr 2008 17:23:21 +0900
From: KAMEZAWA Hiroyuki
Subject: Re: [RFC][PATCH 3/3] account swapcache
Message-Id: <20080414172321.b97c4eb9.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <48030FE9.1040401@mtf.biglobe.ne.jp>
References: <20080408190734.70ab55b0.kamezawa.hiroyu@jp.fujitsu.com>
 <20080408191311.73b167bb.kamezawa.hiroyu@jp.fujitsu.com>
 <47FF57A7.5000704@mxp.nes.nec.co.jp>
 <20080414094709.fb9c3745.kamezawa.hiroyu@jp.fujitsu.com>
 <48030FE9.1040401@mtf.biglobe.ne.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Daisuke Nishimura
Cc: Daisuke Nishimura, "linux-mm@kvack.org", "balbir@linux.vnet.ibm.com",
 "xemul@openvz.org", "yamamoto@valinux.co.jp", lizf@cn.fujitsu.com,
 Hugh Dickins, "IKEDA, Munehiro"
List-ID:

On Mon, 14 Apr 2008 17:03:53 +0900
Daisuke Nishimura wrote:
> I was thinking of the case below.
> Assume some anonymous pages (mapped, referenced, !SwapCache)
> are being reclaimed.
>
Numbering for below.

(1) > shrink_page_list()
(2)>   -> add_to_swap() <- makes the page dirty.
(3)>   -> try_to_unmap() <- uncharged from memcg and removed from mz->lru.
(4)>   -> PageDirty() == true
(5)>        sc->order <= PAGE_ALLOC_COSTLY_ORDER && referenced
(6)>          goto keep_locked
(7)>   -> unlocks the page and will work on other pages on page_list.
>
> And, if on another CPU the process that owns those pages is exiting
> at the timing of my example above, those pages remain only on the
> global lru, and are never charged (mapped) because the process exits.
>
> I said "never" because once they are removed from mz->lru,
> mem_cgroup_isolate_pages() doesn't select those pages
> unless they are charged (mapped) again.
>
I'm sorry if I don't catch your points.

Because of (2), it's marked as SwapCache.
At (3), the page is not removed from mz->lru because it's SwapCache. (see my patch)
The page is still on mz->lru after (7).

After a process exits, this page will be reclaimed when page reclaim for
page_cgroup finds it.

Thanks,
-Kame

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <48031775.10008@mtf.biglobe.ne.jp>
Date: Mon, 14 Apr 2008 17:36:05 +0900
From: Daisuke Nishimura
Reply-To: Daisuke Nishimura
MIME-Version: 1.0
Subject: Re: [RFC][PATCH 3/3] account swapcache
References: <20080408190734.70ab55b0.kamezawa.hiroyu@jp.fujitsu.com>
 <20080408191311.73b167bb.kamezawa.hiroyu@jp.fujitsu.com>
 <47FF57A7.5000704@mxp.nes.nec.co.jp>
 <20080414094709.fb9c3745.kamezawa.hiroyu@jp.fujitsu.com>
 <48030FE9.1040401@mtf.biglobe.ne.jp>
 <20080414172321.b97c4eb9.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20080414172321.b97c4eb9.kamezawa.hiroyu@jp.fujitsu.com>
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: KAMEZAWA Hiroyuki
Cc: "linux-mm@kvack.org", "balbir@linux.vnet.ibm.com", "xemul@openvz.org",
 "yamamoto@valinux.co.jp", lizf@cn.fujitsu.com, Hugh Dickins,
 "IKEDA, Munehiro"
List-ID:

KAMEZAWA Hiroyuki wrote:
> On Mon, 14 Apr 2008 17:03:53 +0900
> Daisuke Nishimura wrote:
>
>> I was thinking of the case below.
>> Assume some anonymous pages (mapped, referenced, !SwapCache)
>> are being reclaimed.
>>
>
> Numbering for below.
>
> (1) > shrink_page_list()
> (2)>   -> add_to_swap() <- makes the page dirty.
> (3)>   -> try_to_unmap() <- uncharged from memcg and removed from mz->lru.
> (4)>   -> PageDirty() == true
> (5)>        sc->order <= PAGE_ALLOC_COSTLY_ORDER && referenced
> (6)>          goto keep_locked
> (7)>   -> unlocks the page and will work on other pages on page_list.
>> And, if on another CPU the process that owns those pages is exiting
>> at the timing of my example above, those pages remain only on the
>> global lru, and are never charged (mapped) because the process exits.
>>
>> I said "never" because once they are removed from mz->lru,
>> mem_cgroup_isolate_pages() doesn't select those pages
>> unless they are charged (mapped) again.
>>
> I'm sorry if I don't catch your points.
>
> Because of (2), it's marked as SwapCache.
> At (3), the page is not removed from mz->lru because it's SwapCache. (see my patch)
> The page is still on mz->lru after (7).
>
> After a process exits, this page will be reclaimed when page reclaim for
> page_cgroup finds it.
>
> Thanks,
> -Kame
>

I was describing the case when swap caches are not charged.
I showed one of the problems that occur if they are not charged.

Sorry for confusing you.

I agree that your patch handles this case :-)


Thanks,
Daisuke Nishimura.

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 14 Apr 2008 17:45:15 +0900
From: KAMEZAWA Hiroyuki
Subject: Re: [RFC][PATCH 3/3] account swapcache
Message-Id: <20080414174515.dfcd69a0.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <48031775.10008@mtf.biglobe.ne.jp>
References: <20080408190734.70ab55b0.kamezawa.hiroyu@jp.fujitsu.com>
 <20080408191311.73b167bb.kamezawa.hiroyu@jp.fujitsu.com>
 <47FF57A7.5000704@mxp.nes.nec.co.jp>
 <20080414094709.fb9c3745.kamezawa.hiroyu@jp.fujitsu.com>
 <48030FE9.1040401@mtf.biglobe.ne.jp>
 <20080414172321.b97c4eb9.kamezawa.hiroyu@jp.fujitsu.com>
 <48031775.10008@mtf.biglobe.ne.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Daisuke Nishimura
Cc: Daisuke Nishimura, "linux-mm@kvack.org", "balbir@linux.vnet.ibm.com",
 "xemul@openvz.org", "yamamoto@valinux.co.jp", lizf@cn.fujitsu.com,
 Hugh Dickins, "IKEDA, Munehiro"
List-ID:

On Mon, 14 Apr 2008 17:36:05 +0900
Daisuke Nishimura wrote:
> I was describing the case when swap caches are not charged.
> I showed one of the problems that occur if they are not charged.
>
> Sorry for confusing you.
>
no problem.
> I agree that your patch handles this case :-)
>
Thank you for review :)

Regards,
-Kame
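
The charge/uncharge rule discussed in this thread can be sketched as a small
user-space C model. This is only an illustration under stated assumptions: the
struct and every toy_* name below are hypothetical and are not the kernel's
struct page, page_cgroup, or mem_cgroup_uncharge_page(). It only mirrors the
three bullets of the [3/3] description: a swap-cache page stays charged at
mapcount == 0, uncharge is retried when the page leaves the swap cache, and a
page is newly charged only when it is mapped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; not the kernel's struct page or page_cgroup. */
struct toy_page {
    int  mapcount;      /* number of mappings of this page */
    bool in_swapcache;  /* page currently sits in the swap cache */
    bool charged;       /* page is accounted to a memory cgroup */
};

/* A page is newly charged only when it gets mapped. */
static void toy_map(struct toy_page *p)
{
    p->mapcount++;
    p->charged = true;
}

/*
 * Drop the charge only if the page is unmapped AND not in swap cache.
 * A swap-cache page keeps its charge even when mapcount == 0.
 */
static void toy_uncharge(struct toy_page *p)
{
    if (p->mapcount == 0 && !p->in_swapcache)
        p->charged = false;
}

/* Remove one mapping, then try to uncharge. */
static void toy_unmap(struct toy_page *p)
{
    assert(p->mapcount > 0);
    p->mapcount--;
    toy_uncharge(p);
}

/* Entering swap cache does not charge the page by itself. */
static void toy_add_to_swapcache(struct toy_page *p)
{
    p->in_swapcache = true;
}

/* Leaving swap cache retries the uncharge. */
static void toy_delete_from_swapcache(struct toy_page *p)
{
    p->in_swapcache = false;
    toy_uncharge(p);
}
```

In this model, the sequence from the thread (map, add to swap cache, unmap)
leaves charged == true, and the charge is dropped only when
toy_delete_from_swapcache() runs. That corresponds to the page staying visible
to memcg reclaim until it is removed from the swap cache.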