* [PATCH v2] memcg: fix leak on wrong LRU with FUSE
@ 2011-03-08 4:56 KAMEZAWA Hiroyuki
2011-03-08 9:18 ` Daisuke Nishimura
0 siblings, 1 reply; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-03-08 4:56 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, nishimura@mxp.nes.nec.co.jp,
akpm@linux-foundation.org, balbir@linux.vnet.ibm.com
fs/fuse/dev.c::fuse_try_move_page() does
(1) remove a page by ->steal()
(2) re-add the page to page cache
(3) link the page to LRU if it was not on LRU at (1)
This implies the page is _on_ LRU when it's added to radix-tree.
So, the page is added to memory cgroup while it's on LRU.
because LRU is lazy and no one flushs it.
This is the same behavior as SwapCache and needs special care as
- remove page from LRU before overwrite pc->mem_cgroup.
- add page to LRU after overwrite pc->mem_cgroup.
And we need to taking care of pagevec.
If PageLRU(page) is set before we add PCG_USED bit, the page
will not be added to memcg's LRU (in short period).
So, regardlress of PageLRU(page) value before commit_charge(),
we need to check PageLRU(page) after commit_charge().
Changelog:
- clean up.
- cover !PageLRU() by pagevec case.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 53 ++++++++++++++++++++++++++++++++++-------------------
1 file changed, 34 insertions(+), 19 deletions(-)
Index: mmotm-0303/mm/memcontrol.c
===================================================================
--- mmotm-0303.orig/mm/memcontrol.c
+++ mmotm-0303/mm/memcontrol.c
@@ -926,13 +926,12 @@ void mem_cgroup_add_lru_list(struct page
}
/*
- * At handling SwapCache, pc->mem_cgroup may be changed while it's linked to
- * lru because the page may.be reused after it's fully uncharged (because of
- * SwapCache behavior).To handle that, unlink page_cgroup from LRU when charge
- * it again. This function is only used to charge SwapCache. It's done under
- * lock_page and expected that zone->lru_lock is never held.
+ * At handling SwapCache and other FUSE stuff, pc->mem_cgroup may be changed
+ * while it's linked to lru because the page may be reused after it's fully
+ * uncharged. To handle that, unlink page_cgroup from LRU when charge it again.
+ * It's done under lock_page and expected that zone->lru_lock isnever held.
*/
-static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
+static void mem_cgroup_lru_del_before_commit(struct page *page)
{
unsigned long flags;
struct zone *zone = page_zone(page);
@@ -948,7 +947,7 @@ static void mem_cgroup_lru_del_before_co
spin_unlock_irqrestore(&zone->lru_lock, flags);
}
-static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
+static void mem_cgroup_lru_add_after_commit(struct page *page)
{
unsigned long flags;
struct zone *zone = page_zone(page);
@@ -2431,9 +2430,28 @@ static void
__mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
enum charge_type ctype);
+static void
+__mem_cgroup_commit_charge_lrucare(struct page *page, struct mem_cgroup *mem,
+ enum charge_type ctype)
+{
+ struct page_cgroup *pc = lookup_page_cgroup(page);
+ /*
+ * In some case, SwapCache, FUSE(splice_buf->radixtree), the page
+ * is already on LRU. It means the page may on some other page_cgroup's
+ * LRU. Take care of it.
+ */
+ if (unlikely(PageLRU(page)))
+ mem_cgroup_lru_del_before_commit(page);
+ __mem_cgroup_commit_charge(mem, page, 1, pc, ctype);
+ if (unlikely(PageLRU(page)))
+ mem_cgroup_lru_add_after_commit(page);
+ return;
+}
+
int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask)
{
+ struct mem_cgroup *mem = NULL;
int ret;
if (mem_cgroup_disabled())
@@ -2468,14 +2486,15 @@ int mem_cgroup_cache_charge(struct page
if (unlikely(!mm))
mm = &init_mm;
- if (page_is_file_cache(page))
- return mem_cgroup_charge_common(page, mm, gfp_mask,
- MEM_CGROUP_CHARGE_TYPE_CACHE);
-
+ if (page_is_file_cache(page)) {
+ ret = __mem_cgroup_try_charge(mm, gfp_mask, 1, &mem, true);
+ if (ret || !mem)
+ return ret;
+ __mem_cgroup_commit_charge_lrucare(page, mem,
+ MEM_CGROUP_CHARGE_TYPE_CACHE);
+ }
/* shmem */
if (PageSwapCache(page)) {
- struct mem_cgroup *mem;
-
ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &mem);
if (!ret)
__mem_cgroup_commit_charge_swapin(page, mem,
@@ -2532,17 +2551,13 @@ static void
__mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
enum charge_type ctype)
{
- struct page_cgroup *pc;
-
if (mem_cgroup_disabled())
return;
if (!ptr)
return;
cgroup_exclude_rmdir(&ptr->css);
- pc = lookup_page_cgroup(page);
- mem_cgroup_lru_del_before_commit_swapcache(page);
- __mem_cgroup_commit_charge(ptr, page, 1, pc, ctype);
- mem_cgroup_lru_add_after_commit_swapcache(page);
+
+ __mem_cgroup_commit_charge_lrucare(page, ptr, ctype);
/*
* Now swap is on-memory. This means this page may be
* counted both as mem and swap....double count.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] memcg: fix leak on wrong LRU with FUSE
2011-03-08 4:56 [PATCH v2] memcg: fix leak on wrong LRU with FUSE KAMEZAWA Hiroyuki
@ 2011-03-08 9:18 ` Daisuke Nishimura
2011-03-09 6:07 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 11+ messages in thread
From: Daisuke Nishimura @ 2011-03-08 9:18 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
akpm@linux-foundation.org, balbir@linux.vnet.ibm.com,
Daisuke Nishimura
On Tue, 8 Mar 2011 13:56:12 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> fs/fuse/dev.c::fuse_try_move_page() does
>
> (1) remove a page by ->steal()
> (2) re-add the page to page cache
> (3) link the page to LRU if it was not on LRU at (1)
>
> This implies the page is _on_ LRU when it's added to radix-tree.
> So, the page is added to memory cgroup while it's on LRU.
> because LRU is lazy and no one flushs it.
>
> This is the same behavior as SwapCache and needs special care as
> - remove page from LRU before overwrite pc->mem_cgroup.
> - add page to LRU after overwrite pc->mem_cgroup.
>
> And we need to taking care of pagevec.
>
> If PageLRU(page) is set before we add PCG_USED bit, the page
> will not be added to memcg's LRU (in short period).
> So, regardlress of PageLRU(page) value before commit_charge(),
> we need to check PageLRU(page) after commit_charge().
>
> Changelog:
> - clean up.
> - cover !PageLRU() by pagevec case.
>
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> mm/memcontrol.c | 53 ++++++++++++++++++++++++++++++++++-------------------
> 1 file changed, 34 insertions(+), 19 deletions(-)
>
> Index: mmotm-0303/mm/memcontrol.c
> ===================================================================
> --- mmotm-0303.orig/mm/memcontrol.c
> +++ mmotm-0303/mm/memcontrol.c
> @@ -926,13 +926,12 @@ void mem_cgroup_add_lru_list(struct page
> }
>
> /*
> - * At handling SwapCache, pc->mem_cgroup may be changed while it's linked to
> - * lru because the page may.be reused after it's fully uncharged (because of
> - * SwapCache behavior).To handle that, unlink page_cgroup from LRU when charge
> - * it again. This function is only used to charge SwapCache. It's done under
> - * lock_page and expected that zone->lru_lock is never held.
> + * At handling SwapCache and other FUSE stuff, pc->mem_cgroup may be changed
> + * while it's linked to lru because the page may be reused after it's fully
> + * uncharged. To handle that, unlink page_cgroup from LRU when charge it again.
> + * It's done under lock_page and expected that zone->lru_lock isnever held.
> */
> -static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
> +static void mem_cgroup_lru_del_before_commit(struct page *page)
> {
> unsigned long flags;
> struct zone *zone = page_zone(page);
> @@ -948,7 +947,7 @@ static void mem_cgroup_lru_del_before_co
> spin_unlock_irqrestore(&zone->lru_lock, flags);
> }
>
> -static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
> +static void mem_cgroup_lru_add_after_commit(struct page *page)
> {
> unsigned long flags;
> struct zone *zone = page_zone(page);
> @@ -2431,9 +2430,28 @@ static void
> __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
> enum charge_type ctype);
>
> +static void
> +__mem_cgroup_commit_charge_lrucare(struct page *page, struct mem_cgroup *mem,
> + enum charge_type ctype)
> +{
> + struct page_cgroup *pc = lookup_page_cgroup(page);
> + /*
> + * In some case, SwapCache, FUSE(splice_buf->radixtree), the page
> + * is already on LRU. It means the page may on some other page_cgroup's
> + * LRU. Take care of it.
> + */
> + if (unlikely(PageLRU(page)))
> + mem_cgroup_lru_del_before_commit(page);
> + __mem_cgroup_commit_charge(mem, page, 1, pc, ctype);
> + if (unlikely(PageLRU(page)))
> + mem_cgroup_lru_add_after_commit(page);
> + return;
> +}
> +
> int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
> gfp_t gfp_mask)
> {
> + struct mem_cgroup *mem = NULL;
> int ret;
>
> if (mem_cgroup_disabled())
> @@ -2468,14 +2486,15 @@ int mem_cgroup_cache_charge(struct page
> if (unlikely(!mm))
> mm = &init_mm;
>
> - if (page_is_file_cache(page))
> - return mem_cgroup_charge_common(page, mm, gfp_mask,
> - MEM_CGROUP_CHARGE_TYPE_CACHE);
> -
> + if (page_is_file_cache(page)) {
> + ret = __mem_cgroup_try_charge(mm, gfp_mask, 1, &mem, true);
> + if (ret || !mem)
> + return ret;
> + __mem_cgroup_commit_charge_lrucare(page, mem,
> + MEM_CGROUP_CHARGE_TYPE_CACHE);
> + }
We should do "return 0" here, or do:
} else {
/* shmem */
if (PageSwapCache(page)) {
..
} else {
..
}
}
Otherwise, the page cache will be charged twice.
Thanks,
Daisuke Nishimura.
> /* shmem */
> if (PageSwapCache(page)) {
> - struct mem_cgroup *mem;
> -
> ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &mem);
> if (!ret)
> __mem_cgroup_commit_charge_swapin(page, mem,
> @@ -2532,17 +2551,13 @@ static void
> __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
> enum charge_type ctype)
> {
> - struct page_cgroup *pc;
> -
> if (mem_cgroup_disabled())
> return;
> if (!ptr)
> return;
> cgroup_exclude_rmdir(&ptr->css);
> - pc = lookup_page_cgroup(page);
> - mem_cgroup_lru_del_before_commit_swapcache(page);
> - __mem_cgroup_commit_charge(ptr, page, 1, pc, ctype);
> - mem_cgroup_lru_add_after_commit_swapcache(page);
> +
> + __mem_cgroup_commit_charge_lrucare(page, ptr, ctype);
> /*
> * Now swap is on-memory. This means this page may be
> * counted both as mem and swap....double count.
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] memcg: fix leak on wrong LRU with FUSE
2011-03-08 9:18 ` Daisuke Nishimura
@ 2011-03-09 6:07 ` KAMEZAWA Hiroyuki
2011-03-09 7:48 ` [PATCH v3] " KAMEZAWA Hiroyuki
0 siblings, 1 reply; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-03-09 6:07 UTC (permalink / raw)
To: Daisuke Nishimura
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
akpm@linux-foundation.org, balbir@linux.vnet.ibm.com
On Tue, 8 Mar 2011 18:18:32 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> On Tue, 8 Mar 2011 13:56:12 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> > fs/fuse/dev.c::fuse_try_move_page() does
> >
> > (1) remove a page by ->steal()
> > (2) re-add the page to page cache
> > (3) link the page to LRU if it was not on LRU at (1)
> >
> > This implies the page is _on_ LRU when it's added to radix-tree.
> > So, the page is added to memory cgroup while it's on LRU.
> > because LRU is lazy and no one flushs it.
> >
> > This is the same behavior as SwapCache and needs special care as
> > - remove page from LRU before overwrite pc->mem_cgroup.
> > - add page to LRU after overwrite pc->mem_cgroup.
> >
> > And we need to taking care of pagevec.
> >
> > If PageLRU(page) is set before we add PCG_USED bit, the page
> > will not be added to memcg's LRU (in short period).
> > So, regardlress of PageLRU(page) value before commit_charge(),
> > we need to check PageLRU(page) after commit_charge().
> >
> > Changelog:
> > - clean up.
> > - cover !PageLRU() by pagevec case.
> >
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> > mm/memcontrol.c | 53 ++++++++++++++++++++++++++++++++++-------------------
> > 1 file changed, 34 insertions(+), 19 deletions(-)
> >
> > Index: mmotm-0303/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-0303.orig/mm/memcontrol.c
> > +++ mmotm-0303/mm/memcontrol.c
> > @@ -926,13 +926,12 @@ void mem_cgroup_add_lru_list(struct page
> > }
> >
> > /*
> > - * At handling SwapCache, pc->mem_cgroup may be changed while it's linked to
> > - * lru because the page may.be reused after it's fully uncharged (because of
> > - * SwapCache behavior).To handle that, unlink page_cgroup from LRU when charge
> > - * it again. This function is only used to charge SwapCache. It's done under
> > - * lock_page and expected that zone->lru_lock is never held.
> > + * At handling SwapCache and other FUSE stuff, pc->mem_cgroup may be changed
> > + * while it's linked to lru because the page may be reused after it's fully
> > + * uncharged. To handle that, unlink page_cgroup from LRU when charge it again.
> > + * It's done under lock_page and expected that zone->lru_lock isnever held.
> > */
> > -static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
> > +static void mem_cgroup_lru_del_before_commit(struct page *page)
> > {
> > unsigned long flags;
> > struct zone *zone = page_zone(page);
> > @@ -948,7 +947,7 @@ static void mem_cgroup_lru_del_before_co
> > spin_unlock_irqrestore(&zone->lru_lock, flags);
> > }
> >
> > -static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
> > +static void mem_cgroup_lru_add_after_commit(struct page *page)
> > {
> > unsigned long flags;
> > struct zone *zone = page_zone(page);
> > @@ -2431,9 +2430,28 @@ static void
> > __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
> > enum charge_type ctype);
> >
> > +static void
> > +__mem_cgroup_commit_charge_lrucare(struct page *page, struct mem_cgroup *mem,
> > + enum charge_type ctype)
> > +{
> > + struct page_cgroup *pc = lookup_page_cgroup(page);
> > + /*
> > + * In some case, SwapCache, FUSE(splice_buf->radixtree), the page
> > + * is already on LRU. It means the page may on some other page_cgroup's
> > + * LRU. Take care of it.
> > + */
> > + if (unlikely(PageLRU(page)))
> > + mem_cgroup_lru_del_before_commit(page);
> > + __mem_cgroup_commit_charge(mem, page, 1, pc, ctype);
> > + if (unlikely(PageLRU(page)))
> > + mem_cgroup_lru_add_after_commit(page);
> > + return;
> > +}
> > +
> > int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
> > gfp_t gfp_mask)
> > {
> > + struct mem_cgroup *mem = NULL;
> > int ret;
> >
> > if (mem_cgroup_disabled())
> > @@ -2468,14 +2486,15 @@ int mem_cgroup_cache_charge(struct page
> > if (unlikely(!mm))
> > mm = &init_mm;
> >
> > - if (page_is_file_cache(page))
> > - return mem_cgroup_charge_common(page, mm, gfp_mask,
> > - MEM_CGROUP_CHARGE_TYPE_CACHE);
> > -
> > + if (page_is_file_cache(page)) {
> > + ret = __mem_cgroup_try_charge(mm, gfp_mask, 1, &mem, true);
> > + if (ret || !mem)
> > + return ret;
> > + __mem_cgroup_commit_charge_lrucare(page, mem,
> > + MEM_CGROUP_CHARGE_TYPE_CACHE);
> > + }
> We should do "return 0" here, or do:
>
> } else {
> /* shmem */
> if (PageSwapCache(page)) {
> ..
> } else {
> ..
> }
> }
>
> Otherwise, the page cache will be charged twice.
>
Ahh, thanks. I'll send v3.
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3] memcg: fix leak on wrong LRU with FUSE
2011-03-09 6:07 ` KAMEZAWA Hiroyuki
@ 2011-03-09 7:48 ` KAMEZAWA Hiroyuki
2011-03-09 10:00 ` Johannes Weiner
0 siblings, 1 reply; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-03-09 7:48 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, akpm@linux-foundation.org,
balbir@linux.vnet.ibm.com
On Wed, 9 Mar 2011 15:07:50 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > } else {
> > /* shmem */
> > if (PageSwapCache(page)) {
> > ..
> > } else {
> > ..
> > }
> > }
> >
> > Otherwise, the page cache will be charged twice.
> >
>
> Ahh, thanks. I'll send v3.
>
Okay, this is a fixed one.
==
fs/fuse/dev.c::fuse_try_move_page() does
(1) remove a page by ->steal()
(2) re-add the page to page cache
(3) link the page to LRU if it was not on LRU at (1)
This implies the page is _on_ LRU when it's added to radix-tree.
So, the page is added to memory cgroup while it's on LRU.
because LRU is lazy and no one flushs it.
This is the same behavior as SwapCache and needs special care as
- remove page from LRU before overwrite pc->mem_cgroup.
- add page to LRU after overwrite pc->mem_cgroup.
And we need to taking care of pagevec.
If PageLRU(page) is set before we add PCG_USED bit, the page
will not be added to memcg's LRU (in short period).
So, regardlress of PageLRU(page) value before commit_charge(),
we need to check PageLRU(page) after commit_charge().
Changelog v2=>v3:
- fixed double accounting.
Changelog v1=>v2:
- clean up.
- cover !PageLRU() by pagevec case.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 54 +++++++++++++++++++++++++++++++++++-------------------
1 file changed, 35 insertions(+), 19 deletions(-)
Index: mmotm-0303/mm/memcontrol.c
===================================================================
--- mmotm-0303.orig/mm/memcontrol.c
+++ mmotm-0303/mm/memcontrol.c
@@ -926,13 +926,12 @@ void mem_cgroup_add_lru_list(struct page
}
/*
- * At handling SwapCache, pc->mem_cgroup may be changed while it's linked to
- * lru because the page may.be reused after it's fully uncharged (because of
- * SwapCache behavior).To handle that, unlink page_cgroup from LRU when charge
- * it again. This function is only used to charge SwapCache. It's done under
- * lock_page and expected that zone->lru_lock is never held.
+ * At handling SwapCache and other FUSE stuff, pc->mem_cgroup may be changed
+ * while it's linked to lru because the page may be reused after it's fully
+ * uncharged. To handle that, unlink page_cgroup from LRU when charge it again.
+ * It's done under lock_page and expected that zone->lru_lock isnever held.
*/
-static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
+static void mem_cgroup_lru_del_before_commit(struct page *page)
{
unsigned long flags;
struct zone *zone = page_zone(page);
@@ -948,7 +947,7 @@ static void mem_cgroup_lru_del_before_co
spin_unlock_irqrestore(&zone->lru_lock, flags);
}
-static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
+static void mem_cgroup_lru_add_after_commit(struct page *page)
{
unsigned long flags;
struct zone *zone = page_zone(page);
@@ -2431,9 +2430,28 @@ static void
__mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
enum charge_type ctype);
+static void
+__mem_cgroup_commit_charge_lrucare(struct page *page, struct mem_cgroup *mem,
+ enum charge_type ctype)
+{
+ struct page_cgroup *pc = lookup_page_cgroup(page);
+ /*
+ * In some case, SwapCache, FUSE(splice_buf->radixtree), the page
+ * is already on LRU. It means the page may on some other page_cgroup's
+ * LRU. Take care of it.
+ */
+ if (unlikely(PageLRU(page)))
+ mem_cgroup_lru_del_before_commit(page);
+ __mem_cgroup_commit_charge(mem, page, 1, pc, ctype);
+ if (unlikely(PageLRU(page)))
+ mem_cgroup_lru_add_after_commit(page);
+ return;
+}
+
int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask)
{
+ struct mem_cgroup *mem = NULL;
int ret;
if (mem_cgroup_disabled())
@@ -2468,14 +2486,16 @@ int mem_cgroup_cache_charge(struct page
if (unlikely(!mm))
mm = &init_mm;
- if (page_is_file_cache(page))
- return mem_cgroup_charge_common(page, mm, gfp_mask,
- MEM_CGROUP_CHARGE_TYPE_CACHE);
-
+ if (page_is_file_cache(page)) {
+ ret = __mem_cgroup_try_charge(mm, gfp_mask, 1, &mem, true);
+ if (ret || !mem)
+ return ret;
+ __mem_cgroup_commit_charge_lrucare(page, mem,
+ MEM_CGROUP_CHARGE_TYPE_CACHE);
+ return ret;
+ }
/* shmem */
if (PageSwapCache(page)) {
- struct mem_cgroup *mem;
-
ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &mem);
if (!ret)
__mem_cgroup_commit_charge_swapin(page, mem,
@@ -2532,17 +2552,13 @@ static void
__mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
enum charge_type ctype)
{
- struct page_cgroup *pc;
-
if (mem_cgroup_disabled())
return;
if (!ptr)
return;
cgroup_exclude_rmdir(&ptr->css);
- pc = lookup_page_cgroup(page);
- mem_cgroup_lru_del_before_commit_swapcache(page);
- __mem_cgroup_commit_charge(ptr, page, 1, pc, ctype);
- mem_cgroup_lru_add_after_commit_swapcache(page);
+
+ __mem_cgroup_commit_charge_lrucare(page, ptr, ctype);
/*
* Now swap is on-memory. This means this page may be
* counted both as mem and swap....double count.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v3] memcg: fix leak on wrong LRU with FUSE
2011-03-09 7:48 ` [PATCH v3] " KAMEZAWA Hiroyuki
@ 2011-03-09 10:00 ` Johannes Weiner
2011-03-09 23:36 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2011-03-09 10:00 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, akpm@linux-foundation.org,
balbir@linux.vnet.ibm.com
On Wed, Mar 09, 2011 at 04:48:01PM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 9 Mar 2011 15:07:50 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> > > } else {
> > > /* shmem */
> > > if (PageSwapCache(page)) {
> > > ..
> > > } else {
> > > ..
> > > }
> > > }
> > >
> > > Otherwise, the page cache will be charged twice.
> > >
> >
> > Ahh, thanks. I'll send v3.
> >
>
> Okay, this is a fixed one.
> ==
>
> fs/fuse/dev.c::fuse_try_move_page() does
>
> (1) remove a page by ->steal()
> (2) re-add the page to page cache
> (3) link the page to LRU if it was not on LRU at (1)
>
> This implies the page is _on_ LRU when it's added to radix-tree.
> So, the page is added to memory cgroup while it's on LRU.
> because LRU is lazy and no one flushs it.
>
> This is the same behavior as SwapCache and needs special care as
> - remove page from LRU before overwrite pc->mem_cgroup.
> - add page to LRU after overwrite pc->mem_cgroup.
>
> And we need to taking care of pagevec.
>
> If PageLRU(page) is set before we add PCG_USED bit, the page
> will not be added to memcg's LRU (in short period).
> So, regardlress of PageLRU(page) value before commit_charge(),
> we need to check PageLRU(page) after commit_charge().
>
> Changelog v2=>v3:
> - fixed double accounting.
>
> Changelog v1=>v2:
> - clean up.
> - cover !PageLRU() by pagevec case.
>
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Thanks for the fix. I have a few comments below. Only nitpicks
though, the patch looks correct to me.
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
> @@ -2431,9 +2430,28 @@ static void
> __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
> enum charge_type ctype);
>
> +static void
> +__mem_cgroup_commit_charge_lrucare(struct page *page, struct mem_cgroup *mem,
> + enum charge_type ctype)
> +{
> + struct page_cgroup *pc = lookup_page_cgroup(page);
> + /*
> + * In some case, SwapCache, FUSE(splice_buf->radixtree), the page
> + * is already on LRU. It means the page may on some other page_cgroup's
> + * LRU. Take care of it.
> + */
> + if (unlikely(PageLRU(page)))
> + mem_cgroup_lru_del_before_commit(page);
Do we need the extra check? mem_cgroup_lru_del_before_commit() will
do the right thing if the page is not on the list.
> + __mem_cgroup_commit_charge(mem, page, 1, pc, ctype);
> + if (unlikely(PageLRU(page)))
> + mem_cgroup_lru_add_after_commit(page);
Same here, mem_cgroup_lru_add_after_commit() has its own check for
PG_lru.
> @@ -2468,14 +2486,16 @@ int mem_cgroup_cache_charge(struct page
> if (unlikely(!mm))
> mm = &init_mm;
>
> - if (page_is_file_cache(page))
> - return mem_cgroup_charge_common(page, mm, gfp_mask,
> - MEM_CGROUP_CHARGE_TYPE_CACHE);
> -
> + if (page_is_file_cache(page)) {
> + ret = __mem_cgroup_try_charge(mm, gfp_mask, 1, &mem, true);
> + if (ret || !mem)
> + return ret;
> + __mem_cgroup_commit_charge_lrucare(page, mem,
> + MEM_CGROUP_CHARGE_TYPE_CACHE);
I think the comment about why we need to take care of the LRU status
would make more sense here (rather than in the _lrucare function),
because it is here where you make handling the lru a consequence of
the page being a file page.
How about this?
/*
* FUSE reuses pages without going through the final
* put that would remove them from the LRU list, make
* sure that they get relinked properly.
*/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v3] memcg: fix leak on wrong LRU with FUSE
2011-03-09 10:00 ` Johannes Weiner
@ 2011-03-09 23:36 ` KAMEZAWA Hiroyuki
2011-03-10 5:47 ` [PATCH v4] " KAMEZAWA Hiroyuki
0 siblings, 1 reply; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-03-09 23:36 UTC (permalink / raw)
To: Johannes Weiner
Cc: Daisuke Nishimura, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, akpm@linux-foundation.org,
balbir@linux.vnet.ibm.com
On Wed, 9 Mar 2011 11:00:20 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Wed, Mar 09, 2011 at 04:48:01PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Wed, 9 Mar 2011 15:07:50 +0900
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >
> > > > } else {
> > > > /* shmem */
> > > > if (PageSwapCache(page)) {
> > > > ..
> > > > } else {
> > > > ..
> > > > }
> > > > }
> > > >
> > > > Otherwise, the page cache will be charged twice.
> > > >
> > >
> > > Ahh, thanks. I'll send v3.
> > >
> >
> > Okay, this is a fixed one.
> > ==
> >
> > fs/fuse/dev.c::fuse_try_move_page() does
> >
> > (1) remove a page by ->steal()
> > (2) re-add the page to page cache
> > (3) link the page to LRU if it was not on LRU at (1)
> >
> > This implies the page is _on_ LRU when it's added to radix-tree.
> > So, the page is added to memory cgroup while it's on LRU.
> > because LRU is lazy and no one flushs it.
> >
> > This is the same behavior as SwapCache and needs special care as
> > - remove page from LRU before overwrite pc->mem_cgroup.
> > - add page to LRU after overwrite pc->mem_cgroup.
> >
> > And we need to taking care of pagevec.
> >
> > If PageLRU(page) is set before we add PCG_USED bit, the page
> > will not be added to memcg's LRU (in short period).
> > So, regardlress of PageLRU(page) value before commit_charge(),
> > we need to check PageLRU(page) after commit_charge().
> >
> > Changelog v2=>v3:
> > - fixed double accounting.
> >
> > Changelog v1=>v2:
> > - clean up.
> > - cover !PageLRU() by pagevec case.
> >
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Thanks for the fix. I have a few comments below. Only nitpicks
> though, the patch looks correct to me.
>
> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
>
Thank you for review.
> > @@ -2431,9 +2430,28 @@ static void
> > __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
> > enum charge_type ctype);
> >
> > +static void
> > +__mem_cgroup_commit_charge_lrucare(struct page *page, struct mem_cgroup *mem,
> > + enum charge_type ctype)
> > +{
> > + struct page_cgroup *pc = lookup_page_cgroup(page);
> > + /*
> > + * In some case, SwapCache, FUSE(splice_buf->radixtree), the page
> > + * is already on LRU. It means the page may on some other page_cgroup's
> > + * LRU. Take care of it.
> > + */
> > + if (unlikely(PageLRU(page)))
> > + mem_cgroup_lru_del_before_commit(page);
>
> Do we need the extra check? mem_cgroup_lru_del_before_commit() will
> do the right thing if the page is not on the list.
>
lru_del_before_commit does checks under zone->lru_lock. So, it's very very heavy.
Hmm, I'll move the check under mem_cgroup_lru_del_before_commit() before lock.
> > + __mem_cgroup_commit_charge(mem, page, 1, pc, ctype);
> > + if (unlikely(PageLRU(page)))
> > + mem_cgroup_lru_add_after_commit(page);
>
> Same here, mem_cgroup_lru_add_after_commit() has its own check for
> PG_lru.
>
I'll move the check.
> > @@ -2468,14 +2486,16 @@ int mem_cgroup_cache_charge(struct page
> > if (unlikely(!mm))
> > mm = &init_mm;
> >
> > - if (page_is_file_cache(page))
> > - return mem_cgroup_charge_common(page, mm, gfp_mask,
> > - MEM_CGROUP_CHARGE_TYPE_CACHE);
> > -
> > + if (page_is_file_cache(page)) {
> > + ret = __mem_cgroup_try_charge(mm, gfp_mask, 1, &mem, true);
> > + if (ret || !mem)
> > + return ret;
> > + __mem_cgroup_commit_charge_lrucare(page, mem,
> > + MEM_CGROUP_CHARGE_TYPE_CACHE);
>
> I think the comment about why we need to take care of the LRU status
> would make more sense here (rather than in the _lrucare function),
> because it is here where you make handling the lru a consequence of
> the page being a file page.
>
Sure.
> How about this?
>
> /*
> * FUSE reuses pages without going through the final
> * put that would remove them from the LRU list, make
> * sure that they get relinked properly.
> */
will add. Thank you !
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v4] memcg: fix leak on wrong LRU with FUSE
2011-03-09 23:36 ` KAMEZAWA Hiroyuki
@ 2011-03-10 5:47 ` KAMEZAWA Hiroyuki
2011-03-10 6:04 ` Daisuke Nishimura
0 siblings, 1 reply; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-03-10 5:47 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Johannes Weiner, Daisuke Nishimura, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, akpm@linux-foundation.org,
balbir@linux.vnet.ibm.com
On Thu, 10 Mar 2011 08:36:59 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> will add. Thank you !
>
Here is v4 based on feedbacks.
==
fs/fuse/dev.c::fuse_try_move_page() does
(1) remove a page by ->steal()
(2) re-add the page to page cache
(3) link the page to LRU if it was not on LRU at (1)
This implies the page is _on_ LRU when it's added to radix-tree.
So, the page is added to memory cgroup while it's on LRU and
the pave will remain in the old(wrong) memcg.
By this bug, force_empty()'s LRU scan cannot find the page and
rmdir() will never ends.
This is the same behavior as SwapCache and needs special care as
- remove page from LRU before overwrite pc->mem_cgroup.
- add page to LRU after overwrite pc->mem_cgroup.
This will fixes memcg's rmdir() hang issue with FUSE.
Changelog v3=v4:
- moved PageLRU() check into the leaf function.
- added comments
Changelog v2=>v3:
- fixed double accounting.
Changelog v1=>v2:
- clean up.
- cover !PageLRU() by pagevec case.
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 70 +++++++++++++++++++++++++++++++++++++++++---------------
1 file changed, 52 insertions(+), 18 deletions(-)
Index: mmotm-0303/mm/memcontrol.c
===================================================================
--- mmotm-0303.orig/mm/memcontrol.c
+++ mmotm-0303/mm/memcontrol.c
@@ -926,18 +926,28 @@ void mem_cgroup_add_lru_list(struct page
}
/*
- * At handling SwapCache, pc->mem_cgroup may be changed while it's linked to
- * lru because the page may.be reused after it's fully uncharged (because of
- * SwapCache behavior).To handle that, unlink page_cgroup from LRU when charge
- * it again. This function is only used to charge SwapCache. It's done under
- * lock_page and expected that zone->lru_lock is never held.
+ * At handling SwapCache and other FUSE stuff, pc->mem_cgroup may be changed
+ * while it's linked to lru because the page may be reused after it's fully
+ * uncharged. To handle that, unlink page_cgroup from LRU when charge it again.
+ * It's done under lock_page and expected that zone->lru_lock isnever held.
*/
-static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
+static void mem_cgroup_lru_del_before_commit(struct page *page)
{
unsigned long flags;
struct zone *zone = page_zone(page);
struct page_cgroup *pc = lookup_page_cgroup(page);
+ /*
+ * Doing this check without taking ->lru_lock seems wrong but this
+ * is safe. Because if page_cgroup's USED bit is unset, the page
+ * will not be added to any memcg's LRU. If page_cgroup's USED bit is
+ * set, the commit after this will fail, anyway.
+ * This all charge/uncharge is done under some mutual execustion.
+ * So, we don't need to taking care of changes in USED bit.
+ */
+ if (likely(!PageLRU(page)))
+ return;
+
spin_lock_irqsave(&zone->lru_lock, flags);
/*
* Forget old LRU when this page_cgroup is *not* used. This Used bit
@@ -948,12 +958,15 @@ static void mem_cgroup_lru_del_before_co
spin_unlock_irqrestore(&zone->lru_lock, flags);
}
-static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
+static void mem_cgroup_lru_add_after_commit(struct page *page)
{
unsigned long flags;
struct zone *zone = page_zone(page);
struct page_cgroup *pc = lookup_page_cgroup(page);
+ /* taking care of that the page is added to LRU while we commit it */
+ if (likely(!PageLRU(page)))
+ return;
spin_lock_irqsave(&zone->lru_lock, flags);
/* link when the page is linked to LRU but page_cgroup isn't */
if (PageLRU(page) && !PageCgroupAcctLRU(pc))
@@ -2431,9 +2444,26 @@ static void
__mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
enum charge_type ctype);
+static void
+__mem_cgroup_commit_charge_lrucare(struct page *page, struct mem_cgroup *mem,
+ enum charge_type ctype)
+{
+ struct page_cgroup *pc = lookup_page_cgroup(page);
+ /*
+ * In some case, SwapCache, FUSE(splice_buf->radixtree), the page
+ * is already on LRU. It means the page may on some other page_cgroup's
+ * LRU. Take care of it.
+ */
+ mem_cgroup_lru_del_before_commit(page);
+ __mem_cgroup_commit_charge(mem, page, 1, pc, ctype);
+ mem_cgroup_lru_add_after_commit(page);
+ return;
+}
+
int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask)
{
+ struct mem_cgroup *mem = NULL;
int ret;
if (mem_cgroup_disabled())
@@ -2468,14 +2498,22 @@ int mem_cgroup_cache_charge(struct page
if (unlikely(!mm))
mm = &init_mm;
- if (page_is_file_cache(page))
- return mem_cgroup_charge_common(page, mm, gfp_mask,
- MEM_CGROUP_CHARGE_TYPE_CACHE);
+ if (page_is_file_cache(page)) {
+ ret = __mem_cgroup_try_charge(mm, gfp_mask, 1, &mem, true);
+ if (ret || !mem)
+ return ret;
+ /*
+ * FUSE reuses pages without going through the final
+ * put that would remove them from the LRU list, make
+ * sure that they get relinked properly.
+ */
+ __mem_cgroup_commit_charge_lrucare(page, mem,
+ MEM_CGROUP_CHARGE_TYPE_CACHE);
+ return ret;
+ }
/* shmem */
if (PageSwapCache(page)) {
- struct mem_cgroup *mem;
-
ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &mem);
if (!ret)
__mem_cgroup_commit_charge_swapin(page, mem,
@@ -2532,17 +2570,13 @@ static void
__mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
enum charge_type ctype)
{
- struct page_cgroup *pc;
-
if (mem_cgroup_disabled())
return;
if (!ptr)
return;
cgroup_exclude_rmdir(&ptr->css);
- pc = lookup_page_cgroup(page);
- mem_cgroup_lru_del_before_commit_swapcache(page);
- __mem_cgroup_commit_charge(ptr, page, 1, pc, ctype);
- mem_cgroup_lru_add_after_commit_swapcache(page);
+
+ __mem_cgroup_commit_charge_lrucare(page, ptr, ctype);
/*
* Now swap is on-memory. This means this page may be
* counted both as mem and swap....double count.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v4] memcg: fix leak on wrong LRU with FUSE
2011-03-10 5:47 ` [PATCH v4] " KAMEZAWA Hiroyuki
@ 2011-03-10 6:04 ` Daisuke Nishimura
2011-03-10 22:20 ` Andrew Morton
0 siblings, 1 reply; 11+ messages in thread
From: Daisuke Nishimura @ 2011-03-10 6:04 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Johannes Weiner, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
akpm@linux-foundation.org, balbir@linux.vnet.ibm.com,
Daisuke Nishimura
On Thu, 10 Mar 2011 14:47:52 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Thu, 10 Mar 2011 08:36:59 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> > will add. Thank you !
> >
>
> Here is v4 based on feedbacks.
> ==
>
> fs/fuse/dev.c::fuse_try_move_page() does
>
> (1) remove a page by ->steal()
> (2) re-add the page to page cache
> (3) link the page to LRU if it was not on LRU at (1)
>
> This implies the page is _on_ LRU when it's added to radix-tree.
> So, the page is added to memory cgroup while it's on LRU and
> the pave will remain in the old(wrong) memcg.
> By this bug, force_empty()'s LRU scan cannot find the page and
> rmdir() will never ends.
>
> This is the same behavior as SwapCache and needs special care as
> - remove page from LRU before overwrite pc->mem_cgroup.
> - add page to LRU after overwrite pc->mem_cgroup.
>
> This will fixes memcg's rmdir() hang issue with FUSE.
>
> Changelog v3=v4:
> - moved PageLRU() check into the leaf function.
> - added comments
>
> Changelog v2=>v3:
> - fixed double accounting.
>
> Changelog v1=>v2:
> - clean up.
> - cover !PageLRU() by pagevec case.
>
> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
I hope this can fix the original BZ case.
Thanks,
Daisuke Nishimura.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v4] memcg: fix leak on wrong LRU with FUSE
2011-03-10 6:04 ` Daisuke Nishimura
@ 2011-03-10 22:20 ` Andrew Morton
2011-03-10 23:45 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2011-03-10 22:20 UTC (permalink / raw)
To: Daisuke Nishimura
Cc: KAMEZAWA Hiroyuki, Johannes Weiner, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, balbir@linux.vnet.ibm.com
On Thu, 10 Mar 2011 15:04:28 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
>
> I hope this can fix the original BZ case.
Do you recall the buzilla bug number?
Thanks.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v4] memcg: fix leak on wrong LRU with FUSE
2011-03-10 22:20 ` Andrew Morton
@ 2011-03-10 23:45 ` KAMEZAWA Hiroyuki
2011-03-11 0:01 ` Andrew Morton
0 siblings, 1 reply; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-03-10 23:45 UTC (permalink / raw)
To: Andrew Morton
Cc: Daisuke Nishimura, Johannes Weiner, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, balbir@linux.vnet.ibm.com
On Thu, 10 Mar 2011 14:20:29 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 10 Mar 2011 15:04:28 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
>
> > > Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> > Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> >
> > I hope this can fix the original BZ case.
>
> Do you recall the buzilla bug number?
>
Bug 30432
The patch works only when the user use FUSE.
We need to confirm that.
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v4] memcg: fix leak on wrong LRU with FUSE
2011-03-10 23:45 ` KAMEZAWA Hiroyuki
@ 2011-03-11 0:01 ` Andrew Morton
0 siblings, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2011-03-11 0:01 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura, Johannes Weiner, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, balbir@linux.vnet.ibm.com,
Daniel Poelzleithner
On Fri, 11 Mar 2011 08:45:51 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Thu, 10 Mar 2011 14:20:29 -0800
> Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > On Thu, 10 Mar 2011 15:04:28 +0900
> > Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> >
> > > > Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
> > > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > >
> > > Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> > >
> > > I hope this can fix the original BZ case.
> >
> > Do you recall the buzilla bug number?
> >
>
> Bug 30432
OK, well please let's not lose track of these things. It's useful for
maintaining the bug database and it helps with the rather important
step of crediting the bug reporters who help us.
> The patch works only when the user use FUSE.
> We need to confirm that.
Daniel, are you able to test the proposed fix?
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
fs/fuse/dev.c::fuse_try_move_page() does
(1) remove a page by ->steal()
(2) re-add the page to page cache
(3) link the page to LRU if it was not on LRU at (1)
This implies the page is _on_ LRU when it's added to radix-tree. So, the
page is added to memory cgroup while it's on LRU. because LRU is lazy and
no one flushs it.
This is the same behavior as SwapCache and needs special care as
- remove page from LRU before overwrite pc->mem_cgroup.
- add page to LRU after overwrite pc->mem_cgroup.
And we need to taking care of pagevec.
If PageLRU(page) is set before we add PCG_USED bit, the page will not be
added to memcg's LRU (in short period). So, regardlress of PageLRU(page)
value before commit_charge(), we need to check PageLRU(page) after
commit_charge().
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=30432
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Balbir Singh <balbir@in.ibm.com>
Reported-by: Daniel Poelzleithner <poelzi@poelzi.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 70 ++++++++++++++++++++++++++++++++++------------
1 file changed, 52 insertions(+), 18 deletions(-)
diff -puN mm/memcontrol.c~memcg-fix-leak-on-wrong-lru-with-fuse mm/memcontrol.c
--- a/mm/memcontrol.c~memcg-fix-leak-on-wrong-lru-with-fuse
+++ a/mm/memcontrol.c
@@ -926,18 +926,28 @@ void mem_cgroup_add_lru_list(struct page
}
/*
- * At handling SwapCache, pc->mem_cgroup may be changed while it's linked to
- * lru because the page may.be reused after it's fully uncharged (because of
- * SwapCache behavior).To handle that, unlink page_cgroup from LRU when charge
- * it again. This function is only used to charge SwapCache. It's done under
- * lock_page and expected that zone->lru_lock is never held.
+ * At handling SwapCache and other FUSE stuff, pc->mem_cgroup may be changed
+ * while it's linked to lru because the page may be reused after it's fully
+ * uncharged. To handle that, unlink page_cgroup from LRU when charge it again.
+ * It's done under lock_page and expected that zone->lru_lock isnever held.
*/
-static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
+static void mem_cgroup_lru_del_before_commit(struct page *page)
{
unsigned long flags;
struct zone *zone = page_zone(page);
struct page_cgroup *pc = lookup_page_cgroup(page);
+ /*
+ * Doing this check without taking ->lru_lock seems wrong but this
+ * is safe. Because if page_cgroup's USED bit is unset, the page
+ * will not be added to any memcg's LRU. If page_cgroup's USED bit is
+ * set, the commit after this will fail, anyway.
+ * This all charge/uncharge is done under some mutual execustion.
+ * So, we don't need to taking care of changes in USED bit.
+ */
+ if (likely(!PageLRU(page)))
+ return;
+
spin_lock_irqsave(&zone->lru_lock, flags);
/*
* Forget old LRU when this page_cgroup is *not* used. This Used bit
@@ -948,12 +958,15 @@ static void mem_cgroup_lru_del_before_co
spin_unlock_irqrestore(&zone->lru_lock, flags);
}
-static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
+static void mem_cgroup_lru_add_after_commit(struct page *page)
{
unsigned long flags;
struct zone *zone = page_zone(page);
struct page_cgroup *pc = lookup_page_cgroup(page);
+ /* taking care of that the page is added to LRU while we commit it */
+ if (likely(!PageLRU(page)))
+ return;
spin_lock_irqsave(&zone->lru_lock, flags);
/* link when the page is linked to LRU but page_cgroup isn't */
if (PageLRU(page) && !PageCgroupAcctLRU(pc))
@@ -2431,9 +2444,26 @@ static void
__mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
enum charge_type ctype);
+static void
+__mem_cgroup_commit_charge_lrucare(struct page *page, struct mem_cgroup *mem,
+ enum charge_type ctype)
+{
+ struct page_cgroup *pc = lookup_page_cgroup(page);
+ /*
+ * In some case, SwapCache, FUSE(splice_buf->radixtree), the page
+ * is already on LRU. It means the page may on some other page_cgroup's
+ * LRU. Take care of it.
+ */
+ mem_cgroup_lru_del_before_commit(page);
+ __mem_cgroup_commit_charge(mem, page, 1, pc, ctype);
+ mem_cgroup_lru_add_after_commit(page);
+ return;
+}
+
int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask)
{
+ struct mem_cgroup *mem = NULL;
int ret;
if (mem_cgroup_disabled())
@@ -2468,14 +2498,22 @@ int mem_cgroup_cache_charge(struct page
if (unlikely(!mm))
mm = &init_mm;
- if (page_is_file_cache(page))
- return mem_cgroup_charge_common(page, mm, gfp_mask,
- MEM_CGROUP_CHARGE_TYPE_CACHE);
+ if (page_is_file_cache(page)) {
+ ret = __mem_cgroup_try_charge(mm, gfp_mask, 1, &mem, true);
+ if (ret || !mem)
+ return ret;
+ /*
+ * FUSE reuses pages without going through the final
+ * put that would remove them from the LRU list, make
+ * sure that they get relinked properly.
+ */
+ __mem_cgroup_commit_charge_lrucare(page, mem,
+ MEM_CGROUP_CHARGE_TYPE_CACHE);
+ return ret;
+ }
/* shmem */
if (PageSwapCache(page)) {
- struct mem_cgroup *mem;
-
ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &mem);
if (!ret)
__mem_cgroup_commit_charge_swapin(page, mem,
@@ -2532,17 +2570,13 @@ static void
__mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
enum charge_type ctype)
{
- struct page_cgroup *pc;
-
if (mem_cgroup_disabled())
return;
if (!ptr)
return;
cgroup_exclude_rmdir(&ptr->css);
- pc = lookup_page_cgroup(page);
- mem_cgroup_lru_del_before_commit_swapcache(page);
- __mem_cgroup_commit_charge(ptr, page, 1, pc, ctype);
- mem_cgroup_lru_add_after_commit_swapcache(page);
+
+ __mem_cgroup_commit_charge_lrucare(page, ptr, ctype);
/*
* Now swap is on-memory. This means this page may be
* counted both as mem and swap....double count.
_
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2011-03-11 0:01 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-08 4:56 [PATCH v2] memcg: fix leak on wrong LRU with FUSE KAMEZAWA Hiroyuki
2011-03-08 9:18 ` Daisuke Nishimura
2011-03-09 6:07 ` KAMEZAWA Hiroyuki
2011-03-09 7:48 ` [PATCH v3] " KAMEZAWA Hiroyuki
2011-03-09 10:00 ` Johannes Weiner
2011-03-09 23:36 ` KAMEZAWA Hiroyuki
2011-03-10 5:47 ` [PATCH v4] " KAMEZAWA Hiroyuki
2011-03-10 6:04 ` Daisuke Nishimura
2011-03-10 22:20 ` Andrew Morton
2011-03-10 23:45 ` KAMEZAWA Hiroyuki
2011-03-11 0:01 ` Andrew Morton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).