* [PATCH 3/4] memcg: avoid account not-on-LRU pages
@ 2008-09-29 10:23 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 34+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-09-29 10:23 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, LKML, balbir@linux.vnet.ibm.com,
xemul@openvz.org, nishimura@mxp.nes.nec.co.jp, Andrew Morton
Some not-on-LRU pages can be mapped, and they are not worth accounting
(because we can't shrink them, and handling them needs dirty special-case
code). We'd like to rely on the usual objrmap/radix-tree protocol and
don't want to account pages that are outside the VM's control.
When special_mapping_fault() is called, page->mapping tends to be NULL,
so the page is charged as an anonymous page.
insert_page() also handles some special pages from drivers.
This patch avoids accounting such special pages.
Changelog: v5 -> v6
- modified Documentation.
- fixed to charge only when a page is newly allocated.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Documentation/controllers/memory.txt | 24 ++++++++++++++++--------
mm/memory.c | 29 +++++++++++++----------------
mm/rmap.c | 4 ++--
3 files changed, 31 insertions(+), 26 deletions(-)
Index: mmotm-2.6.27-rc7+/mm/memory.c
===================================================================
--- mmotm-2.6.27-rc7+.orig/mm/memory.c
+++ mmotm-2.6.27-rc7+/mm/memory.c
@@ -1323,18 +1323,14 @@ static int insert_page(struct vm_area_st
pte_t *pte;
spinlock_t *ptl;
- retval = mem_cgroup_charge(page, mm, GFP_KERNEL);
- if (retval)
- goto out;
-
retval = -EINVAL;
if (PageAnon(page))
- goto out_uncharge;
+ goto out;
retval = -ENOMEM;
flush_dcache_page(page);
pte = get_locked_pte(mm, addr, &ptl);
if (!pte)
- goto out_uncharge;
+ goto out;
retval = -EBUSY;
if (!pte_none(*pte))
goto out_unlock;
@@ -1350,8 +1346,6 @@ static int insert_page(struct vm_area_st
return retval;
out_unlock:
pte_unmap_unlock(pte, ptl);
-out_uncharge:
- mem_cgroup_uncharge_page(page);
out:
return retval;
}
@@ -2463,6 +2457,7 @@ static int __do_fault(struct mm_struct *
struct page *page;
pte_t entry;
int anon = 0;
+ int charged = 0;
struct page *dirty_page = NULL;
struct vm_fault vmf;
int ret;
@@ -2503,6 +2498,12 @@ static int __do_fault(struct mm_struct *
ret = VM_FAULT_OOM;
goto out;
}
+ if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
+ ret = VM_FAULT_OOM;
+ page_cache_release(page);
+ goto out;
+ }
+ charged = 1;
/*
* Don't let another task, with possibly unlocked vma,
* keep the mlocked page.
@@ -2543,11 +2544,6 @@ static int __do_fault(struct mm_struct *
}
- if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
- ret = VM_FAULT_OOM;
- goto out;
- }
-
page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
/*
@@ -2585,10 +2581,11 @@ static int __do_fault(struct mm_struct *
/* no need to invalidate: a not-present page won't be cached */
update_mmu_cache(vma, address, entry);
} else {
- mem_cgroup_uncharge_page(page);
- if (anon)
+ if (charged)
+ mem_cgroup_uncharge_page(page);
+ if (anon) {
page_cache_release(page);
- else
+ } else
anon = 1; /* no anon but release faulted_page */
}
Index: mmotm-2.6.27-rc7+/mm/rmap.c
===================================================================
--- mmotm-2.6.27-rc7+.orig/mm/rmap.c
+++ mmotm-2.6.27-rc7+/mm/rmap.c
@@ -725,8 +725,8 @@ void page_remove_rmap(struct page *page,
page_clear_dirty(page);
set_page_dirty(page);
}
-
- mem_cgroup_uncharge_page(page);
+ if (PageAnon(page))
+ mem_cgroup_uncharge_page(page);
__dec_zone_page_state(page,
PageAnon(page) ? NR_ANON_PAGES : NR_FILE_MAPPED);
/*
Index: mmotm-2.6.27-rc7+/Documentation/controllers/memory.txt
===================================================================
--- mmotm-2.6.27-rc7+.orig/Documentation/controllers/memory.txt
+++ mmotm-2.6.27-rc7+/Documentation/controllers/memory.txt
@@ -112,14 +112,22 @@ the per cgroup LRU.
2.2.1 Accounting details
-All mapped pages (RSS) and unmapped user pages (Page Cache) are accounted.
-RSS pages are accounted at the time of page_add_*_rmap() unless they've already
-been accounted for earlier. A file page will be accounted for as Page Cache;
-it's mapped into the page tables of a process, duplicate accounting is carefully
-avoided. Page Cache pages are accounted at the time of add_to_page_cache().
-The corresponding routines that remove a page from the page tables or removes
-a page from Page Cache is used to decrement the accounting counters of the
-cgroup.
+All mapped anon pages (RSS) and cache pages (Page Cache) are accounted.
+(some pages which never be reclaimable and will not be on global LRU
+ are not accounted. we just accounts pages under usual vm management.)
+
+RSS pages are accounted at page_fault unless they've already been accounted
+for earlier. A file page will be accounted for as Page Cache when it's
+inserted into inode (radix-tree). While it's mapped into the page tables of
+processes, duplicate accounting is carefully avoided.
+
+A RSS page is unaccounted when it's fully unmapped. A PageCache page is
+unaccounted when it's removed from radix-tree.
+
+At page migration, accounting information is kept.
+
+Note: we just account pages-on-lru because our purpose is to control amount
+of used pages. not-on-lru pages are tend to be out-of-control from vm view.
2.3 Shared Page Accounting
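
As an illustration only, the accounting rules described in the Documentation hunk above can be modeled in a few lines of user-space C. Everything named toy_* is hypothetical and exists only for this sketch; it is not kernel code, and it ignores locking, reference counts, and the LRU itself.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; none of these exist in the kernel. */
enum toy_kind { TOY_ANON, TOY_FILE, TOY_SPECIAL };

struct toy_page {
	enum toy_kind kind;
	int mapcount;	/* number of page-table mappings */
	bool in_cache;	/* inserted into the (toy) radix-tree */
	bool charged;	/* currently accounted to the cgroup */
};

long toy_usage;	/* pages charged to the single toy cgroup */

void toy_charge(struct toy_page *p)
{
	if (p->charged)	/* duplicate accounting is avoided */
		return;
	p->charged = true;
	toy_usage++;
}

void toy_uncharge(struct toy_page *p)
{
	if (!p->charged)
		return;
	p->charged = false;
	toy_usage--;
}

/* Fault path: only anon (RSS) pages are charged when they get mapped. */
void toy_map(struct toy_page *p)
{
	if (p->kind == TOY_ANON)
		toy_charge(p);
	p->mapcount++;
}

/* Unmap path: an anon page is uncharged once it is fully unmapped. */
void toy_unmap(struct toy_page *p)
{
	if (--p->mapcount == 0 && p->kind == TOY_ANON)
		toy_uncharge(p);
}

/* Page cache pages are charged on insertion into the radix-tree... */
void toy_add_to_cache(struct toy_page *p)
{
	p->in_cache = true;
	toy_charge(p);
}

/* ...and uncharged on removal, regardless of how often they were mapped. */
void toy_remove_from_cache(struct toy_page *p)
{
	p->in_cache = false;
	toy_uncharge(p);
}
```

In this model a TOY_SPECIAL page (for example one handed out by a driver) can be mapped and unmapped any number of times without ever touching toy_usage, which is the behaviour the patch aims for.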
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH 3/4] memcg: avoid account not-on-LRU pages
2008-09-29 10:23 ` KAMEZAWA Hiroyuki
@ 2008-09-29 11:19 ` Daisuke Nishimura
-1 siblings, 0 replies; 34+ messages in thread
From: Daisuke Nishimura @ 2008-09-29 11:19 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: nishimura, linux-mm@kvack.org, LKML, balbir@linux.vnet.ibm.com,
xemul@openvz.org, Andrew Morton
On Mon, 29 Sep 2008 19:23:39 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Some not-on-LRU pages can be mapped, and they are not worth accounting
> (because we can't shrink them, and handling them needs dirty special-case
> code). We'd like to rely on the usual objrmap/radix-tree protocol and
> don't want to account pages that are outside the VM's control.
>
> When special_mapping_fault() is called, page->mapping tends to be NULL,
> so the page is charged as an anonymous page.
> insert_page() also handles some special pages from drivers.
>
> This patch avoids accounting such special pages.
>
> Changelog: v5 -> v6
> - modified Documentation.
> - fixed to charge only when a page is newly allocated.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Documentation/controllers/memory.txt | 24 ++++++++++++++++--------
> mm/memory.c | 29 +++++++++++++----------------
> mm/rmap.c | 4 ++--
> 3 files changed, 31 insertions(+), 26 deletions(-)
>
> Index: mmotm-2.6.27-rc7+/mm/memory.c
> ===================================================================
> --- mmotm-2.6.27-rc7+.orig/mm/memory.c
> +++ mmotm-2.6.27-rc7+/mm/memory.c
> @@ -1323,18 +1323,14 @@ static int insert_page(struct vm_area_st
> pte_t *pte;
> spinlock_t *ptl;
>
> - retval = mem_cgroup_charge(page, mm, GFP_KERNEL);
> - if (retval)
> - goto out;
> -
> retval = -EINVAL;
> if (PageAnon(page))
> - goto out_uncharge;
> + goto out;
> retval = -ENOMEM;
> flush_dcache_page(page);
> pte = get_locked_pte(mm, addr, &ptl);
> if (!pte)
> - goto out_uncharge;
> + goto out;
> retval = -EBUSY;
> if (!pte_none(*pte))
> goto out_unlock;
> @@ -1350,8 +1346,6 @@ static int insert_page(struct vm_area_st
> return retval;
> out_unlock:
> pte_unmap_unlock(pte, ptl);
> -out_uncharge:
> - mem_cgroup_uncharge_page(page);
> out:
> return retval;
> }
> @@ -2463,6 +2457,7 @@ static int __do_fault(struct mm_struct *
> struct page *page;
> pte_t entry;
> int anon = 0;
> + int charged = 0;
> struct page *dirty_page = NULL;
> struct vm_fault vmf;
> int ret;
> @@ -2503,6 +2498,12 @@ static int __do_fault(struct mm_struct *
> ret = VM_FAULT_OOM;
> goto out;
> }
> + if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
> + ret = VM_FAULT_OOM;
> + page_cache_release(page);
> + goto out;
> + }
> + charged = 1;
> /*
> * Don't let another task, with possibly unlocked vma,
> * keep the mlocked page.
> @@ -2543,11 +2544,6 @@ static int __do_fault(struct mm_struct *
>
> }
>
> - if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
> - ret = VM_FAULT_OOM;
> - goto out;
> - }
> -
> page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
>
> /*
> @@ -2585,10 +2581,11 @@ static int __do_fault(struct mm_struct *
> /* no need to invalidate: a not-present page won't be cached */
> update_mmu_cache(vma, address, entry);
> } else {
> - mem_cgroup_uncharge_page(page);
> - if (anon)
> + if (charged)
> + mem_cgroup_uncharge_page(page);
> + if (anon) {
> page_cache_release(page);
> - else
> + } else
> anon = 1; /* no anon but release faulted_page */
> }
>
checkpatch reports a warning here.
I think it should look like this:
@@ -2585,7 +2581,8 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
/* no need to invalidate: a not-present page won't be cached */
update_mmu_cache(vma, address, entry);
} else {
- mem_cgroup_uncharge_page(page);
+ if (charged)
+ mem_cgroup_uncharge_page(page);
if (anon)
page_cache_release(page);
else
Thanks,
Daisuke Nishimura.
> Index: mmotm-2.6.27-rc7+/mm/rmap.c
> ===================================================================
> --- mmotm-2.6.27-rc7+.orig/mm/rmap.c
> +++ mmotm-2.6.27-rc7+/mm/rmap.c
> @@ -725,8 +725,8 @@ void page_remove_rmap(struct page *page,
> page_clear_dirty(page);
> set_page_dirty(page);
> }
> -
> - mem_cgroup_uncharge_page(page);
> + if (PageAnon(page))
> + mem_cgroup_uncharge_page(page);
> __dec_zone_page_state(page,
> PageAnon(page) ? NR_ANON_PAGES : NR_FILE_MAPPED);
> /*
> Index: mmotm-2.6.27-rc7+/Documentation/controllers/memory.txt
> ===================================================================
> --- mmotm-2.6.27-rc7+.orig/Documentation/controllers/memory.txt
> +++ mmotm-2.6.27-rc7+/Documentation/controllers/memory.txt
> @@ -112,14 +112,22 @@ the per cgroup LRU.
>
> 2.2.1 Accounting details
>
> -All mapped pages (RSS) and unmapped user pages (Page Cache) are accounted.
> -RSS pages are accounted at the time of page_add_*_rmap() unless they've already
> -been accounted for earlier. A file page will be accounted for as Page Cache;
> -it's mapped into the page tables of a process, duplicate accounting is carefully
> -avoided. Page Cache pages are accounted at the time of add_to_page_cache().
> -The corresponding routines that remove a page from the page tables or removes
> -a page from Page Cache is used to decrement the accounting counters of the
> -cgroup.
> +All mapped anon pages (RSS) and cache pages (Page Cache) are accounted.
> +(some pages which never be reclaimable and will not be on global LRU
> + are not accounted. we just accounts pages under usual vm management.)
> +
> +RSS pages are accounted at page_fault unless they've already been accounted
> +for earlier. A file page will be accounted for as Page Cache when it's
> +inserted into inode (radix-tree). While it's mapped into the page tables of
> +processes, duplicate accounting is carefully avoided.
> +
> +A RSS page is unaccounted when it's fully unmapped. A PageCache page is
> +unaccounted when it's removed from radix-tree.
> +
> +At page migration, accounting information is kept.
> +
> +Note: we just account pages-on-lru because our purpose is to control amount
> +of used pages. not-on-lru pages are tend to be out-of-control from vm view.
>
> 2.3 Shared Page Accounting
>
>
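
As a rough user-space model (not kernel code), the charge handling that this patch gives __do_fault() can be sketched as follows. The sketch_* names, the limit field, and the boolean parameters are assumptions made for illustration. Only the idea of remembering the charge in a local flag, and uncharging on the failure path only when that flag is set, is taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical toy cgroup state; not a kernel structure. */
struct sketch_cgroup {
	long usage;	/* pages currently charged */
	long limit;	/* charging fails once usage reaches this value */
};

enum sketch_result { SKETCH_OK, SKETCH_OOM, SKETCH_RACED };

/* Returns 0 on success, -1 when the toy cgroup is at its limit. */
int sketch_charge(struct sketch_cgroup *cg)
{
	if (cg->usage >= cg->limit)
		return -1;
	cg->usage++;
	return 0;
}

void sketch_uncharge(struct sketch_cgroup *cg)
{
	cg->usage--;
}

/*
 * new_page: true when the fault allocated a fresh private page,
 *           false when it maps an existing file or special page.
 * pte_same: false models losing the race for the page-table entry.
 */
enum sketch_result sketch_fault(struct sketch_cgroup *cg,
				bool new_page, bool pte_same)
{
	bool charged = false;

	if (new_page) {
		/* Charge only the newly allocated page. */
		if (sketch_charge(cg))
			return SKETCH_OOM;
		charged = true;
	}

	if (!pte_same) {
		/* Undo the charge only if this fault actually made one. */
		if (charged)
			sketch_uncharge(cg);
		return SKETCH_RACED;
	}
	return SKETCH_OK;
}
```

With an unconditional uncharge on the raced path, a fault on an existing page that was never charged here would still decrement the counter; the flag keeps charge and uncharge paired.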
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: Re: [PATCH 3/4] memcg: avoid account not-on-LRU pages
2008-09-29 11:19 ` Daisuke Nishimura
@ 2008-09-29 11:59 ` kamezawa.hiroyu
-1 siblings, 0 replies; 34+ messages in thread
From: kamezawa.hiroyu @ 2008-09-29 11:59 UTC (permalink / raw)
To: Daisuke Nishimura
Cc: KAMEZAWA Hiroyuki, nishimura, linux-mm, LKML, balbir, xemul,
Andrew Morton
----- Original Message -----
>On Mon, 29 Sep 2008 19:23:39 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> Some not-on-LRU pages can be mapped, and they are not worth accounting
>> (because we can't shrink them, and handling them needs dirty special-case
>> code). We'd like to rely on the usual objrmap/radix-tree protocol and
>> don't want to account pages that are outside the VM's control.
>>
>> When special_mapping_fault() is called, page->mapping tends to be NULL,
>> so the page is charged as an anonymous page.
>> insert_page() also handles some special pages from drivers.
>>
>> This patch avoids accounting such special pages.
>>
>> Changelog: v5 -> v6
>> - modified Documentation.
>> - fixed to charge only when a page is newly allocated.
>>
>> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>
>> Documentation/controllers/memory.txt | 24 ++++++++++++++++--------
>> mm/memory.c | 29 +++++++++++++----------------
>> mm/rmap.c | 4 ++--
>> 3 files changed, 31 insertions(+), 26 deletions(-)
>>
>> Index: mmotm-2.6.27-rc7+/mm/memory.c
>> ===================================================================
>> --- mmotm-2.6.27-rc7+.orig/mm/memory.c
>> +++ mmotm-2.6.27-rc7+/mm/memory.c
>> @@ -1323,18 +1323,14 @@ static int insert_page(struct vm_area_st
>> pte_t *pte;
>> spinlock_t *ptl;
>>
>> - retval = mem_cgroup_charge(page, mm, GFP_KERNEL);
>> - if (retval)
>> - goto out;
>> -
>> retval = -EINVAL;
>> if (PageAnon(page))
>> - goto out_uncharge;
>> + goto out;
>> retval = -ENOMEM;
>> flush_dcache_page(page);
>> pte = get_locked_pte(mm, addr, &ptl);
>> if (!pte)
>> - goto out_uncharge;
>> + goto out;
>> retval = -EBUSY;
>> if (!pte_none(*pte))
>> goto out_unlock;
>> @@ -1350,8 +1346,6 @@ static int insert_page(struct vm_area_st
>> return retval;
>> out_unlock:
>> pte_unmap_unlock(pte, ptl);
>> -out_uncharge:
>> - mem_cgroup_uncharge_page(page);
>> out:
>> return retval;
>> }
>> @@ -2463,6 +2457,7 @@ static int __do_fault(struct mm_struct *
>> struct page *page;
>> pte_t entry;
>> int anon = 0;
>> + int charged = 0;
>> struct page *dirty_page = NULL;
>> struct vm_fault vmf;
>> int ret;
>> @@ -2503,6 +2498,12 @@ static int __do_fault(struct mm_struct *
>> ret = VM_FAULT_OOM;
>> goto out;
>> }
>> + if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
>> + ret = VM_FAULT_OOM;
>> + page_cache_release(page);
>> + goto out;
>> + }
>> + charged = 1;
>> /*
>> * Don't let another task, with possibly unlocked vma,
>> * keep the mlocked page.
>> @@ -2543,11 +2544,6 @@ static int __do_fault(struct mm_struct *
>>
>> }
>>
>> - if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
>> - ret = VM_FAULT_OOM;
>> - goto out;
>> - }
>> -
>> page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
>>
>> /*
>> @@ -2585,10 +2581,11 @@ static int __do_fault(struct mm_struct *
>> /* no need to invalidate: a not-present page won't be cached */
>> update_mmu_cache(vma, address, entry);
>> } else {
>> - mem_cgroup_uncharge_page(page);
>> - if (anon)
>> + if (charged)
>> + mem_cgroup_uncharge_page(page);
>> + if (anon) {
>> page_cache_release(page);
>> - else
>> + } else
>> anon = 1; /* no anon but release faulted_page */
>> }
>>
>
>checkpatch reports a warning here.
>
>I think it should look like this:
>
Oh, thanks. I'll post a fixed version tomorrow.
Thanks,
-Kame
>@@ -2585,7 +2581,8 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> /* no need to invalidate: a not-present page won't be cached */
> update_mmu_cache(vma, address, entry);
> } else {
>- mem_cgroup_uncharge_page(page);
>+ if (charged)
>+ mem_cgroup_uncharge_page(page);
> if (anon)
> page_cache_release(page);
> else
>
>
>Thanks,
>Daisuke Nishimura.
>
>> Index: mmotm-2.6.27-rc7+/mm/rmap.c
>> ===================================================================
>> --- mmotm-2.6.27-rc7+.orig/mm/rmap.c
>> +++ mmotm-2.6.27-rc7+/mm/rmap.c
>> @@ -725,8 +725,8 @@ void page_remove_rmap(struct page *page,
>> page_clear_dirty(page);
>> set_page_dirty(page);
>> }
>> -
>> - mem_cgroup_uncharge_page(page);
>> + if (PageAnon(page))
>> + mem_cgroup_uncharge_page(page);
>> __dec_zone_page_state(page,
>> PageAnon(page) ? NR_ANON_PAGES : NR_FILE_MAPPED);
>> /*
>> Index: mmotm-2.6.27-rc7+/Documentation/controllers/memory.txt
>> ===================================================================
>> --- mmotm-2.6.27-rc7+.orig/Documentation/controllers/memory.txt
>> +++ mmotm-2.6.27-rc7+/Documentation/controllers/memory.txt
>> @@ -112,14 +112,22 @@ the per cgroup LRU.
>>
>> 2.2.1 Accounting details
>>
>> -All mapped pages (RSS) and unmapped user pages (Page Cache) are accounted.
>> -RSS pages are accounted at the time of page_add_*_rmap() unless they've already
>> -been accounted for earlier. A file page will be accounted for as Page Cache;
>> -it's mapped into the page tables of a process, duplicate accounting is carefully
>> -avoided. Page Cache pages are accounted at the time of add_to_page_cache().
>> -The corresponding routines that remove a page from the page tables or removes
>> -a page from Page Cache is used to decrement the accounting counters of the
>> -cgroup.
>> +All mapped anon pages (RSS) and cache pages (Page Cache) are accounted.
>> +(some pages which never be reclaimable and will not be on global LRU
>> + are not accounted. we just accounts pages under usual vm management.)
>> +
>> +RSS pages are accounted at page_fault unless they've already been accounted
>> +for earlier. A file page will be accounted for as Page Cache when it's
>> +inserted into inode (radix-tree). While it's mapped into the page tables of
>> +processes, duplicate accounting is carefully avoided.
>> +
>> +A RSS page is unaccounted when it's fully unmapped. A PageCache page is
>> +unaccounted when it's removed from radix-tree.
>> +
>> +At page migration, accounting information is kept.
>> +
>> +Note: we just account pages-on-lru because our purpose is to control amount
>> +of used pages. not-on-lru pages are tend to be out-of-control from vm view.
>>
>> 2.3 Shared Page Accounting
>>
>>
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH/stylefix 3/4] memcg: avoid account not-on-LRU pages
2008-09-29 10:23 ` KAMEZAWA Hiroyuki
@ 2008-09-30 1:17 ` KAMEZAWA Hiroyuki
-1 siblings, 0 replies; 34+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-09-30 1:17 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, LKML, balbir@linux.vnet.ibm.com,
xemul@openvz.org, nishimura@mxp.nes.nec.co.jp, Andrew Morton
This is the coding-style fixed version. Thank you, Nishimura-san.
-Kame
==
Some not-on-LRU pages can be mapped, and they are not worth accounting
(because we can't shrink them, and handling them needs dirty special-case
code). We'd like to rely on the usual objrmap/radix-tree protocol and
don't want to account pages that are outside the VM's control.
When special_mapping_fault() is called, page->mapping tends to be NULL,
so the page is charged as an anonymous page.
insert_page() also handles some special pages from drivers.
This patch avoids accounting such special pages.
Changelog: v5 -> v6
- modified Documentation.
- fixed to charge only when a page is newly allocated.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Documentation/controllers/memory.txt | 24 ++++++++++++++++--------
mm/memory.c | 25 +++++++++++--------------
mm/rmap.c | 4 ++--
3 files changed, 29 insertions(+), 24 deletions(-)
Index: mmotm-2.6.27-rc7+/mm/memory.c
===================================================================
--- mmotm-2.6.27-rc7+.orig/mm/memory.c
+++ mmotm-2.6.27-rc7+/mm/memory.c
@@ -1323,18 +1323,14 @@ static int insert_page(struct vm_area_st
pte_t *pte;
spinlock_t *ptl;
- retval = mem_cgroup_charge(page, mm, GFP_KERNEL);
- if (retval)
- goto out;
-
retval = -EINVAL;
if (PageAnon(page))
- goto out_uncharge;
+ goto out;
retval = -ENOMEM;
flush_dcache_page(page);
pte = get_locked_pte(mm, addr, &ptl);
if (!pte)
- goto out_uncharge;
+ goto out;
retval = -EBUSY;
if (!pte_none(*pte))
goto out_unlock;
@@ -1350,8 +1346,6 @@ static int insert_page(struct vm_area_st
return retval;
out_unlock:
pte_unmap_unlock(pte, ptl);
-out_uncharge:
- mem_cgroup_uncharge_page(page);
out:
return retval;
}
@@ -2463,6 +2457,7 @@ static int __do_fault(struct mm_struct *
struct page *page;
pte_t entry;
int anon = 0;
+ int charged = 0;
struct page *dirty_page = NULL;
struct vm_fault vmf;
int ret;
@@ -2503,6 +2498,12 @@ static int __do_fault(struct mm_struct *
ret = VM_FAULT_OOM;
goto out;
}
+ if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
+ ret = VM_FAULT_OOM;
+ page_cache_release(page);
+ goto out;
+ }
+ charged = 1;
/*
* Don't let another task, with possibly unlocked vma,
* keep the mlocked page.
@@ -2543,11 +2544,6 @@ static int __do_fault(struct mm_struct *
}
- if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
- ret = VM_FAULT_OOM;
- goto out;
- }
-
page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
/*
@@ -2585,7 +2581,8 @@ static int __do_fault(struct mm_struct *
/* no need to invalidate: a not-present page won't be cached */
update_mmu_cache(vma, address, entry);
} else {
- mem_cgroup_uncharge_page(page);
+ if (charged)
+ mem_cgroup_uncharge_page(page);
if (anon)
page_cache_release(page);
else
Index: mmotm-2.6.27-rc7+/mm/rmap.c
===================================================================
--- mmotm-2.6.27-rc7+.orig/mm/rmap.c
+++ mmotm-2.6.27-rc7+/mm/rmap.c
@@ -725,8 +725,8 @@ void page_remove_rmap(struct page *page,
page_clear_dirty(page);
set_page_dirty(page);
}
-
- mem_cgroup_uncharge_page(page);
+ if (PageAnon(page))
+ mem_cgroup_uncharge_page(page);
__dec_zone_page_state(page,
PageAnon(page) ? NR_ANON_PAGES : NR_FILE_MAPPED);
/*
Index: mmotm-2.6.27-rc7+/Documentation/controllers/memory.txt
===================================================================
--- mmotm-2.6.27-rc7+.orig/Documentation/controllers/memory.txt
+++ mmotm-2.6.27-rc7+/Documentation/controllers/memory.txt
@@ -112,14 +112,22 @@ the per cgroup LRU.
2.2.1 Accounting details
-All mapped pages (RSS) and unmapped user pages (Page Cache) are accounted.
-RSS pages are accounted at the time of page_add_*_rmap() unless they've already
-been accounted for earlier. A file page will be accounted for as Page Cache;
-it's mapped into the page tables of a process, duplicate accounting is carefully
-avoided. Page Cache pages are accounted at the time of add_to_page_cache().
-The corresponding routines that remove a page from the page tables or removes
-a page from Page Cache is used to decrement the accounting counters of the
-cgroup.
+All mapped anon pages (RSS) and cache pages (Page Cache) are accounted.
+(Some pages which are never reclaimable and will not be on the global LRU
+ are not accounted. We account only pages under usual VM management.)
+
+RSS pages are accounted at page_fault unless they've already been accounted
+for earlier. A file page will be accounted for as Page Cache when it's
+inserted into inode (radix-tree). While it's mapped into the page tables of
+processes, duplicate accounting is carefully avoided.
+
+An RSS page is unaccounted when it's fully unmapped. A PageCache page is
+unaccounted when it's removed from radix-tree.
+
+At page migration, accounting information is kept.
+
+Note: we account only pages on the LRU because our purpose is to control the
+amount of used pages. Not-on-LRU pages tend to be outside the VM's control.
2.3 Shared Page Accounting
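For readers following the __do_fault() hunks, here is a minimal user-space sketch of the idea "charge only a newly allocated page, and back out only what this path charged". It is not kernel code: toy_cgroup, toy_page, toy_charge(), toy_uncharge() and toy_fault() are hypothetical stand-ins, and the flat usage/limit counter is an assumption made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; every name here is hypothetical. */
struct toy_cgroup {
	long usage;	/* pages currently charged to the group */
	long limit;	/* maximum number of pages the group may hold */
};

struct toy_page {
	struct toy_cgroup *owner;	/* NULL while the page is uncharged */
};

/* Charge a page to a group. Returns 0 on success, -1 at the limit. */
int toy_charge(struct toy_page *page, struct toy_cgroup *cg)
{
	if (page->owner)		/* already accounted: no duplicate */
		return 0;
	if (cg->usage >= cg->limit)
		return -1;
	cg->usage++;
	page->owner = cg;
	return 0;
}

/* Drop the charge held by a page, if any. */
void toy_uncharge(struct toy_page *page)
{
	if (!page->owner)
		return;
	assert(page->owner->usage > 0);
	page->owner->usage--;
	page->owner = NULL;
}

/*
 * Models the shape of the patched fault path.
 * is_new_copy: the fault allocated a fresh page (private write fault).
 * pte_raced:   the pte changed under us, so the fault must back out.
 * Returns 0 on success, 1 if backed out, -1 if the charge failed.
 */
int toy_fault(struct toy_cgroup *cg, struct toy_page *page,
	      bool is_new_copy, bool pte_raced)
{
	bool charged = false;

	if (is_new_copy) {
		if (toy_charge(page, cg))
			return -1;
		charged = true;
	}

	if (pte_raced) {
		/* Undo only a charge taken by this call. */
		if (charged)
			toy_uncharge(page);
		return 1;
	}
	return 0;
}
```

In this model, a page that was charged elsewhere (for example at page-cache insertion) keeps its charge when a fault on it backs out, because the charged flag stays false on that path. Without the flag, the back-out branch would drop a charge this path never took.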
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH/stylefix 3/4] memcg: avoid account not-on-LRU pages
2008-09-30 1:17 ` KAMEZAWA Hiroyuki
@ 2008-10-01 3:49 ` Balbir Singh
-1 siblings, 0 replies; 34+ messages in thread
From: Balbir Singh @ 2008-10-01 3:49 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, LKML, xemul@openvz.org,
nishimura@mxp.nes.nec.co.jp, Andrew Morton
KAMEZAWA Hiroyuki wrote:
> This is the coding-style-fixed version. Thank you, Nishimura-san.
> -Kame
> ==
> There are not-on-LRU pages which can be mapped, and they are not worth
> accounting (because we can't shrink them and would need dirty code to handle
> the special case). We'd like to make use of the usual objrmap/radix-tree protocol
> and don't want to account pages outside the VM's control.
>
> When special_mapping_fault() is called, page->mapping tends to be NULL
> and the page is charged as an anonymous page.
> insert_page() also handles some special pages from drivers.
>
> This patch avoids accounting such special pages.
>
> Changelog: v5 -> v6
> - modified Documentation.
> - fixed to charge only when a page is newly allocated.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
[snip]
> @@ -2463,6 +2457,7 @@ static int __do_fault(struct mm_struct *
> struct page *page;
> pte_t entry;
> int anon = 0;
> + int charged = 0;
> struct page *dirty_page = NULL;
> struct vm_fault vmf;
> int ret;
> @@ -2503,6 +2498,12 @@ static int __do_fault(struct mm_struct *
> ret = VM_FAULT_OOM;
> goto out;
> }
> + if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
> + ret = VM_FAULT_OOM;
> + page_cache_release(page);
> + goto out;
> + }
> + charged = 1;
If I understand this correctly, we now account only when the VMA is not shared?
Seems reasonable, since we don't allocate a page otherwise.
[snip]
> Index: mmotm-2.6.27-rc7+/mm/rmap.c
> ===================================================================
> --- mmotm-2.6.27-rc7+.orig/mm/rmap.c
> +++ mmotm-2.6.27-rc7+/mm/rmap.c
> @@ -725,8 +725,8 @@ void page_remove_rmap(struct page *page,
> page_clear_dirty(page);
> set_page_dirty(page);
> }
> -
> - mem_cgroup_uncharge_page(page);
> + if (PageAnon(page))
> + mem_cgroup_uncharge_page(page);
Is the change because we expect the page to get directly uncharged when it is
removed from cache? i.e., page->mapping is set to NULL before uncharge?
Looks good to me; I have yet to test it, though.
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
--
Balbir
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH/stylefix 3/4] memcg: avoid account not-on-LRU pages
2008-10-01 3:49 ` Balbir Singh
@ 2008-10-01 4:50 ` KAMEZAWA Hiroyuki
-1 siblings, 0 replies; 34+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-10-01 4:50 UTC (permalink / raw)
To: balbir
Cc: linux-mm@kvack.org, LKML, xemul@openvz.org,
nishimura@mxp.nes.nec.co.jp, Andrew Morton
On Wed, 01 Oct 2008 09:19:10 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> KAMEZAWA Hiroyuki wrote:
> > This is the coding-style-fixed version. Thank you, Nishimura-san.
> > -Kame
> > ==
> > There are not-on-LRU pages which can be mapped, and they are not worth
> > accounting (because we can't shrink them and would need dirty code to handle
> > the special case). We'd like to make use of the usual objrmap/radix-tree protocol
> > and don't want to account pages outside the VM's control.
> >
> > When special_mapping_fault() is called, page->mapping tends to be NULL
> > and the page is charged as an anonymous page.
> > insert_page() also handles some special pages from drivers.
> >
> > This patch avoids accounting such special pages.
> >
> > Changelog: v5 -> v6
> > - modified Documentation.
> > - fixed to charge only when a page is newly allocated.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
>
> [snip]
> > @@ -2463,6 +2457,7 @@ static int __do_fault(struct mm_struct *
> > struct page *page;
> > pte_t entry;
> > int anon = 0;
> > + int charged = 0;
> > struct page *dirty_page = NULL;
> > struct vm_fault vmf;
> > int ret;
> > @@ -2503,6 +2498,12 @@ static int __do_fault(struct mm_struct *
> > ret = VM_FAULT_OOM;
> > goto out;
> > }
> > + if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
> > + ret = VM_FAULT_OOM;
> > + page_cache_release(page);
> > + goto out;
> > + }
> > + charged = 1;
>
> If I understand this correctly, we now account only when the VMA is not shared?
> Seems reasonable, since we don't allocate a page otherwise.
>
We account when we allocate a new page. If the page is in the page cache, it is accounted at radix-tree insertion.
>
> [snip]
>
>
> > Index: mmotm-2.6.27-rc7+/mm/rmap.c
> > ===================================================================
> > --- mmotm-2.6.27-rc7+.orig/mm/rmap.c
> > +++ mmotm-2.6.27-rc7+/mm/rmap.c
> > @@ -725,8 +725,8 @@ void page_remove_rmap(struct page *page,
> > page_clear_dirty(page);
> > set_page_dirty(page);
> > }
> > -
> > - mem_cgroup_uncharge_page(page);
> > + if (PageAnon(page))
> > + mem_cgroup_uncharge_page(page);
>
> Is the change because we expect the page to get directly uncharged when it is
> removed from cache? i.e., page->mapping is set to NULL before uncharge?
>
Yes. I expect deletion from the radix-tree to catch page-cache pages.
> Looks good to me; I have yet to test it, though.
>
Thanks,
-Kame
> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
>
>
> --
> Balbir
>
>
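The two answers above say where each kind of page is charged and uncharged after the patch. A toy user-space model may make the split easier to see. All names (toy_cgroup, toy_page, toy_add_to_cache(), toy_remove_from_cache(), toy_map(), toy_unmap()) are hypothetical, and the boolean "charged" field is an assumption standing in for the real per-page accounting state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy types; they only mirror the rules discussed above. */
struct toy_cgroup {
	long usage;		/* pages currently charged */
};

struct toy_page {
	bool anon;		/* anonymous page, as PageAnon() would report */
	bool charged;		/* page currently holds a charge */
	bool in_cache;		/* page sits in the (toy) radix-tree */
	int mapcount;		/* number of page-table mappings */
};

void toy_charge(struct toy_cgroup *cg, struct toy_page *page)
{
	if (page->charged)	/* duplicate accounting is avoided */
		return;
	page->charged = true;
	cg->usage++;
}

void toy_uncharge(struct toy_cgroup *cg, struct toy_page *page)
{
	if (!page->charged)
		return;
	assert(cg->usage > 0);
	page->charged = false;
	cg->usage--;
}

/* Page-cache pages are charged when inserted into the radix-tree... */
void toy_add_to_cache(struct toy_cgroup *cg, struct toy_page *page)
{
	page->in_cache = true;
	toy_charge(cg, page);
}

/* ...and uncharged when removed from it. */
void toy_remove_from_cache(struct toy_cgroup *cg, struct toy_page *page)
{
	page->in_cache = false;
	toy_uncharge(cg, page);
}

/* Mapping charges anonymous pages only; cache pages are charged already. */
void toy_map(struct toy_cgroup *cg, struct toy_page *page)
{
	if (page->anon)
		toy_charge(cg, page);
	page->mapcount++;
}

/* Dropping the last mapping uncharges anonymous pages only. */
void toy_unmap(struct toy_cgroup *cg, struct toy_page *page)
{
	assert(page->mapcount > 0);
	if (--page->mapcount == 0 && page->anon)
		toy_uncharge(cg, page);
}
```

In this model, a file page that is mapped twice and then fully unmapped still counts against the group until it leaves the cache, while an anonymous page stops counting as soon as its last mapping goes away.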
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 3/4] memcg: avoid account not-on-LRU pages
2008-09-29 10:23 ` KAMEZAWA Hiroyuki
@ 2008-09-30 8:14 ` Balbir Singh
-1 siblings, 0 replies; 34+ messages in thread
From: Balbir Singh @ 2008-09-30 8:14 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, LKML, xemul@openvz.org,
nishimura@mxp.nes.nec.co.jp, Andrew Morton
KAMEZAWA Hiroyuki wrote:
> + if (charged)
> + mem_cgroup_uncharge_page(page);
> + if (anon) {
> page_cache_release(page);
> - else
> + } else
> anon = 1; /* no anon but release faulted_page */
> }
Coding style: braces on a single-statement if-else. Please recheck.
--
Balbir
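For reference, the Linux kernel coding style omits braces when every branch of a conditional is a single statement, and puts braces on all branches once any branch needs them. Below is a small sketch of the branch shape under review; toy_state and toy_backout() are hypothetical names standing in for the kernel's page handling, and the counter replaces the real page_cache_release() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state mirroring the two variables in the quoted branch. */
struct toy_state {
	int released;	/* how many times the page reference was dropped */
	bool anon;	/* set when the caller must release the page later */
};

void toy_backout(struct toy_state *s)
{
	assert(s != NULL);

	/* Both branches are single statements, so neither takes braces. */
	if (s->anon)
		s->released++;
	else
		s->anon = true;	/* not anon, but release the faulted page */
}
```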
^ permalink raw reply [flat|nested] 34+ messages in thread