[PATCH] mm: Do not keep page locked during page fault while charging it for memcg

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] mm: Do not keep page locked during page fault while charging it for memcg
@ 2011-06-22 12:06 Michal Hocko
  2011-06-22 12:15 ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2011-06-22 12:06 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Hugh Dickins, Rik van Riel,
	Michel Lespinasse

Currently we are keeping faulted page locked throughout whole __do_fault
call (except for page_mkwrite code path). If we do early COW we allocate a
new page which has to be charged for a memcg (mem_cgroup_newpage_charge).
This function, however, might block for unbounded amount of time if memcg
oom killer is disabled because the only way out of the OOM situation is
either an external event (kill a process from the group or resize the group
hard limit) or internal event (that would get us under the limit). Many
times the external event is the only chance to move forward, though.
In the end we are keeping the faulted page locked and blocking other
processes from faulting it in which is not good at all because we are
basically punishing potentially an unrelated process for OOM condition
in a different group (I have seen stuck system because of ld-2.11.1.so being
locked).

Let's unlock the faulted page while we are charging a new page and then
recheck whether it wasn't truncated in the mean time. We should retry the
fault in that case.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/memory.c |   18 +++++++++++++++++-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 87d9353..12e7ccc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3177,7 +3177,23 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 				ret = VM_FAULT_OOM;
 				goto out;
 			}
-			if (mem_cgroup_newpage_charge(page, mm, GFP_KERNEL)) {
+
+			/* We have to drop the page lock here because memcg
+			 * charging might block for unbound time if memcg oom
+			 * killer is disabled.
+			 */
+			unlock_page(vmf.page);
+			ret = mem_cgroup_newpage_charge(page, mm, GFP_KERNEL);
+			lock_page(vmf.page);
+
+			if (!vmf.page->mapping) {
+				if (!ret)
+					mem_cgroup_uncharge_page(page);
+				page_cache_release(page);
+				ret = 0; /* retry the fault */
+				goto out;
+			}
+			if (ret) {
 				ret = VM_FAULT_OOM;
 				page_cache_release(page);
 				goto out;
-- 
1.7.5.4

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] mm: Do not keep page locked during page fault while charging it for memcg
  2011-06-22 12:06 [PATCH] mm: Do not keep page locked during page fault while charging it for memcg Michal Hocko
@ 2011-06-22 12:15 ` Christoph Hellwig
  2011-06-22 12:32   ` [PATCH V2] " Michal Hocko
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2011-06-22 12:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, Andrew Morton, Hugh Dickins, Rik van Riel,
	Michel Lespinasse

> +
> +			/* We have to drop the page lock here because memcg
> +			 * charging might block for unbound time if memcg oom
> +			 * killer is disabled.
> +			 */
> +			unlock_page(vmf.page);
> +			ret = mem_cgroup_newpage_charge(page, mm, GFP_KERNEL);
> +			lock_page(vmf.page);

This introduces a completely poinless unlock/lock cycle for non-memcg
pagefaults.  Please make sure it only happens when actually needed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH V2] mm: Do not keep page locked during page fault while charging it for memcg
  2011-06-22 12:15 ` Christoph Hellwig
@ 2011-06-22 12:32   ` Michal Hocko
  2011-06-23  6:08     ` [PATCH] memcg: unlock page before charging it. (WasRe: " KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2011-06-22 12:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-mm, linux-kernel, Andrew Morton, Hugh Dickins, Rik van Riel,
	Michel Lespinasse, Mel Gorman

On Wed 22-06-11 08:15:16, Christoph Hellwig wrote:
> > +
> > +			/* We have to drop the page lock here because memcg
> > +			 * charging might block for unbound time if memcg oom
> > +			 * killer is disabled.
> > +			 */
> > +			unlock_page(vmf.page);
> > +			ret = mem_cgroup_newpage_charge(page, mm, GFP_KERNEL);
> > +			lock_page(vmf.page);
> 
> This introduces a completely poinless unlock/lock cycle for non-memcg
> pagefaults.  Please make sure it only happens when actually needed.

Fair point. Thanks!
What about the following?
I realize that pushing more memcg logic into mm/memory.c is not nice but
I found it better than pushing the old page into mem_cgroup_newpage_charge.
We could also check whether the old page is in the root cgroup because
memcg oom killer is not active there but that would add more code into
this hot path so I guess it is not worth it.

Changes since v1
- do not unlock page when memory controller is disabled.

8<------

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] memcg: unlock page before charging it. (WasRe: [PATCH V2] mm: Do not keep page locked during page fault while charging it for memcg
  2011-06-22 12:32   ` [PATCH V2] " Michal Hocko
@ 2011-06-23  6:08     ` KAMEZAWA Hiroyuki
  2011-06-23  7:23       ` Daisuke Nishimura
  2011-06-23  7:41       ` Michal Hocko
  0 siblings, 2 replies; 17+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-06-23  6:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Christoph Hellwig, linux-mm, linux-kernel, Andrew Morton,
	Hugh Dickins, Rik van Riel, Michel Lespinasse, Mel Gorman,
	Lutz Vieweg

On Wed, 22 Jun 2011 14:32:04 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> On Wed 22-06-11 08:15:16, Christoph Hellwig wrote:
> > > +
> > > +			/* We have to drop the page lock here because memcg
> > > +			 * charging might block for unbound time if memcg oom
> > > +			 * killer is disabled.
> > > +			 */
> > > +			unlock_page(vmf.page);
> > > +			ret = mem_cgroup_newpage_charge(page, mm, GFP_KERNEL);
> > > +			lock_page(vmf.page);
> > 
> > This introduces a completely poinless unlock/lock cycle for non-memcg
> > pagefaults.  Please make sure it only happens when actually needed.
> 
> Fair point. Thanks!
> What about the following?
> I realize that pushing more memcg logic into mm/memory.c is not nice but
> I found it better than pushing the old page into mem_cgroup_newpage_charge.
> We could also check whether the old page is in the root cgroup because
> memcg oom killer is not active there but that would add more code into
> this hot path so I guess it is not worth it.
> 
> Changes since v1
> - do not unlock page when memory controller is disabled.
> 

Great work. Then I confirmed Lutz' problem is fixed.

But I like following style rather than additional lock/unlock.
How do you think ? I tested this on the latest git tree and confirmed
the Lutz's livelock problem is fixed. And I think this should go stable tree.


==

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] memcg: unlock page before charging it. (WasRe: [PATCH V2] mm: Do not keep page locked during page fault while charging it for memcg
  2011-06-23  6:08     ` [PATCH] memcg: unlock page before charging it. (WasRe: " KAMEZAWA Hiroyuki
@ 2011-06-23  7:23       ` Daisuke Nishimura
  2011-06-23  7:41       ` Michal Hocko
  1 sibling, 0 replies; 17+ messages in thread
From: Daisuke Nishimura @ 2011-06-23  7:23 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Michal Hocko, Christoph Hellwig, linux-mm, linux-kernel,
	Andrew Morton, Hugh Dickins, Rik van Riel, Michel Lespinasse,
	Mel Gorman, Lutz Vieweg, Daisuke Nishimura

On Thu, 23 Jun 2011 15:08:42 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Wed, 22 Jun 2011 14:32:04 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > On Wed 22-06-11 08:15:16, Christoph Hellwig wrote:
> > > > +
> > > > +			/* We have to drop the page lock here because memcg
> > > > +			 * charging might block for unbound time if memcg oom
> > > > +			 * killer is disabled.
> > > > +			 */
> > > > +			unlock_page(vmf.page);
> > > > +			ret = mem_cgroup_newpage_charge(page, mm, GFP_KERNEL);
> > > > +			lock_page(vmf.page);
> > > 
> > > This introduces a completely poinless unlock/lock cycle for non-memcg
> > > pagefaults.  Please make sure it only happens when actually needed.
> > 
> > Fair point. Thanks!
> > What about the following?
> > I realize that pushing more memcg logic into mm/memory.c is not nice but
> > I found it better than pushing the old page into mem_cgroup_newpage_charge.
> > We could also check whether the old page is in the root cgroup because
> > memcg oom killer is not active there but that would add more code into
> > this hot path so I guess it is not worth it.
> > 
> > Changes since v1
> > - do not unlock page when memory controller is disabled.
> > 
> 
> Great work. Then I confirmed Lutz' problem is fixed.
> 
> But I like following style rather than additional lock/unlock.
> How do you think ? I tested this on the latest git tree and confirmed
> the Lutz's livelock problem is fixed. And I think this should go stable tree.
> 
I vote for this one.

One comments are inlined below.

> 
> ==
> From 7e9250da9ff529958d4c1ff511458dbdac8e4b81 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 23 Jun 2011 15:05:57 +0900
> Subject: [PATCH] memcg: unlock page before charging it.
> 
> Currently we are keeping faulted page locked throughout whole __do_fault
> call (except for page_mkwrite code path). If we do early COW we allocate a
> new page which has to be charged for a memcg (mem_cgroup_newpage_charge).
> 
> This function, however, might block for unbounded amount of time if memcg
> oom killer is disabled or fork-bomb is running because the only way out of
> the OOM situation is either an external event or OOM-situation fix.
> 
> processes from faulting it in which is not good at all because we are
> basically punishing potentially an unrelated process for OOM condition
> in a different group (I have seen stuck system because of ld-2.11.1.so being
> locked).
> 
> We can do test easily.
>  % cgcreate -g memory:A
>  % cgset -r memory.limit_in_bytes=64M A
>  % cgset -r memory.memsw.limit_in_bytes=64M A
>  % cd kernel_dir; cgexec -g memory:A make -j
> 
> Then, the whole system will live-locked until you kill 'make -j'
> by hands (or push reboot...) This is because some important
> page in a shared library are locked and never released bcause of fork-bomb.
> 
> This patch delays "charge" until unlock_page() called. There is
> no problem as far as we keep reference on a page.
> (memcg doesn't require page_lock()).
> 
> Then, above livelock disappears.
> 
> Reported-by: Lutz Vieweg <lvml@5t9.de>
> Original-idea-by: Michal Hocko <mhocko@suse.cz>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/memory.c |   28 +++++++++++++++++++---------
>  1 files changed, 19 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 87d9353..66442da 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3129,7 +3129,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	struct page *page;
>  	pte_t entry;
>  	int anon = 0;
> -	int charged = 0;
> +	struct page *need_charge = NULL;
>  	struct page *dirty_page = NULL;
>  	struct vm_fault vmf;
>  	int ret;
> @@ -3177,12 +3177,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  				ret = VM_FAULT_OOM;
>  				goto out;
>  			}
> -			if (mem_cgroup_newpage_charge(page, mm, GFP_KERNEL)) {
> -				ret = VM_FAULT_OOM;
> -				page_cache_release(page);
> -				goto out;
> -			}
> -			charged = 1;
> +			need_charge = page;
>  			copy_user_highpage(page, vmf.page, address, vma);
>  			__SetPageUptodate(page);
>  		} else {
> @@ -3251,12 +3246,11 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  		/* no need to invalidate: a not-present page won't be cached */
>  		update_mmu_cache(vma, address, page_table);
>  	} else {
> -		if (charged)
> -			mem_cgroup_uncharge_page(page);
>  		if (anon)
>  			page_cache_release(page);
>  		else
>  			anon = 1; /* no anon but release faulted_page */
> +		need_charge = NULL;
>  	}
>  
>  	pte_unmap_unlock(page_table, ptl);
> @@ -3268,6 +3262,17 @@ out:
>  		if (set_page_dirty(dirty_page))
>  			page_mkwrite = 1;
>  		unlock_page(dirty_page);
> +		if (need_charge) {
> +			/*
> +			 * charge this page before we drop refcnt.
> +			 * memory cgroup returns OOM condition when
> +			 * this task is killed. So, it's not necesasry
> +			 * to undo.
> +			 */
> +			if (mem_cgroup_newpage_charge(need_charge,
> +					mm, GFP_KERNEL))
> +				ret = VM_FAULT_OOM;
> +		}
>  		put_page(dirty_page);
>  		if (page_mkwrite && mapping) {
>  			/*
Hmm, if I read the code correctly, we don't come to this path.
Because "dirty_page" is set only in "anon == 0" case and, when we set "need_charge",
we set "anon" too.
So, we can do mem_cgroup_newpage_charge(need_charge) outside of
"if (dirty_page) ... else ..." block ?


Thanks,
Daisuke Nishimura.

> @@ -3282,6 +3287,11 @@ out:
>  			file_update_time(vma->vm_file);
>  	} else {
>  		unlock_page(vmf.page);
> +		if (need_charge) {
> +			if (mem_cgroup_newpage_charge(need_charge,
> +						mm, GFP_KERNEL))
> +				ret = VM_FAULT_OOM;
> +		}
>  		if (anon)
>  			page_cache_release(vmf.page);
>  	}
> -- 
> 1.7.4.1
> 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] memcg: unlock page before charging it. (WasRe: [PATCH V2] mm: Do not keep page locked during page fault while charging it for memcg
  2011-06-23  6:08     ` [PATCH] memcg: unlock page before charging it. (WasRe: " KAMEZAWA Hiroyuki
  2011-06-23  7:23       ` Daisuke Nishimura
@ 2011-06-23  7:41       ` Michal Hocko
  2011-06-23  8:08         ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2011-06-23  7:41 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Christoph Hellwig, linux-mm, linux-kernel, Andrew Morton,
	Hugh Dickins, Rik van Riel, Michel Lespinasse, Mel Gorman,
	Lutz Vieweg

On Thu 23-06-11 15:08:42, KAMEZAWA Hiroyuki wrote:
> On Wed, 22 Jun 2011 14:32:04 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > On Wed 22-06-11 08:15:16, Christoph Hellwig wrote:
> > > > +
> > > > +			/* We have to drop the page lock here because memcg
> > > > +			 * charging might block for unbound time if memcg oom
> > > > +			 * killer is disabled.
> > > > +			 */
> > > > +			unlock_page(vmf.page);
> > > > +			ret = mem_cgroup_newpage_charge(page, mm, GFP_KERNEL);
> > > > +			lock_page(vmf.page);
> > > 
> > > This introduces a completely poinless unlock/lock cycle for non-memcg
> > > pagefaults.  Please make sure it only happens when actually needed.
> > 
> > Fair point. Thanks!
> > What about the following?
> > I realize that pushing more memcg logic into mm/memory.c is not nice but
> > I found it better than pushing the old page into mem_cgroup_newpage_charge.
> > We could also check whether the old page is in the root cgroup because
> > memcg oom killer is not active there but that would add more code into
> > this hot path so I guess it is not worth it.
> > 
> > Changes since v1
> > - do not unlock page when memory controller is disabled.
> > 
> 
> Great work. Then I confirmed Lutz' problem is fixed.
> 
> But I like following style rather than additional lock/unlock.
> How do you think ?

Yes, I like it much more than the hairy way I did it. See comments bellow.

> I tested this on the latest git tree and confirmed
> the Lutz's livelock problem is fixed. And I think this should go stable tree.
> 
> 
> ==
> From 7e9250da9ff529958d4c1ff511458dbdac8e4b81 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 23 Jun 2011 15:05:57 +0900
> Subject: [PATCH] memcg: unlock page before charging it.
> 
> Currently we are keeping faulted page locked throughout whole __do_fault
> call (except for page_mkwrite code path). If we do early COW we allocate a
> new page which has to be charged for a memcg (mem_cgroup_newpage_charge).
> 
> This function, however, might block for unbounded amount of time if memcg
> oom killer is disabled or fork-bomb is running because the only way out of
> the OOM situation is either an external event or OOM-situation fix.
> 
> processes from faulting it in which is not good at all because we are

Missing the beginning of the sentence?

> basically punishing potentially an unrelated process for OOM condition
> in a different group (I have seen stuck system because of ld-2.11.1.so being
> locked).
> 
> We can do test easily.
>  % cgcreate -g memory:A
>  % cgset -r memory.limit_in_bytes=64M A
>  % cgset -r memory.memsw.limit_in_bytes=64M A
>  % cd kernel_dir; cgexec -g memory:A make -j
> 
> Then, the whole system will live-locked until you kill 'make -j'
> by hands (or push reboot...) This is because some important
> page in a shared library are locked and never released bcause of fork-bomb.
> 
> This patch delays "charge" until unlock_page() called. There is
> no problem as far as we keep reference on a page.
> (memcg doesn't require page_lock()).
> 
> Then, above livelock disappears.
> 
> Reported-by: Lutz Vieweg <lvml@5t9.de>
> Original-idea-by: Michal Hocko <mhocko@suse.cz>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/memory.c |   28 +++++++++++++++++++---------
>  1 files changed, 19 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 87d9353..66442da 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3129,7 +3129,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	struct page *page;
>  	pte_t entry;
>  	int anon = 0;
> -	int charged = 0;
> +	struct page *need_charge = NULL;
>  	struct page *dirty_page = NULL;
>  	struct vm_fault vmf;
>  	int ret;
> @@ -3177,12 +3177,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  				ret = VM_FAULT_OOM;
>  				goto out;
>  			}
> -			if (mem_cgroup_newpage_charge(page, mm, GFP_KERNEL)) {
> -				ret = VM_FAULT_OOM;
> -				page_cache_release(page);
> -				goto out;
> -			}
> -			charged = 1;
> +			need_charge = page;
>  			copy_user_highpage(page, vmf.page, address, vma);
>  			__SetPageUptodate(page);
>  		} else {
> @@ -3251,12 +3246,11 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  		/* no need to invalidate: a not-present page won't be cached */
>  		update_mmu_cache(vma, address, page_table);
>  	} else {
> -		if (charged)
> -			mem_cgroup_uncharge_page(page);
>  		if (anon)
>  			page_cache_release(page);
>  		else
>  			anon = 1; /* no anon but release faulted_page */
> +		need_charge = NULL;
>  	}
>  
>  	pte_unmap_unlock(page_table, ptl);
> @@ -3268,6 +3262,17 @@ out:
>  		if (set_page_dirty(dirty_page))
>  			page_mkwrite = 1;
>  		unlock_page(dirty_page);
> +		if (need_charge) {
> +			/*
> +			 * charge this page before we drop refcnt.
> +			 * memory cgroup returns OOM condition when
> +			 * this task is killed. So, it's not necesasry
> +			 * to undo.
> +			 */
> +			if (mem_cgroup_newpage_charge(need_charge,
> +					mm, GFP_KERNEL))
> +				ret = VM_FAULT_OOM;
> +		}

We do not need this hunk, don't we? dirty_page is set only if !anon so
we never get to this path from COW.

Other than that:
Reviewed-by: Michal Hocko <mhocko@suse.cz>

Thanks!
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] memcg: unlock page before charging it. (WasRe: [PATCH V2] mm: Do not keep page locked during page fault while charging it for memcg
  2011-06-23  7:41       ` Michal Hocko
@ 2011-06-23  8:08         ` KAMEZAWA Hiroyuki
  2011-06-23  9:02           ` Michal Hocko
  0 siblings, 1 reply; 17+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-06-23  8:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Christoph Hellwig, linux-mm, linux-kernel, Andrew Morton,
	Hugh Dickins, Rik van Riel, Michel Lespinasse, Mel Gorman,
	Lutz Vieweg

On Thu, 23 Jun 2011 09:41:33 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> On Thu 23-06-11 15:08:42, KAMEZAWA Hiroyuki wrote:
> > On Wed, 22 Jun 2011 14:32:04 +0200
> > Michal Hocko <mhocko@suse.cz> wrote:
> > 
> > > On Wed 22-06-11 08:15:16, Christoph Hellwig wrote:
> > > > > +
> > > > > +			/* We have to drop the page lock here because memcg
> > > > > +			 * charging might block for unbound time if memcg oom
> > > > > +			 * killer is disabled.
> > > > > +			 */
> > > > > +			unlock_page(vmf.page);
> > > > > +			ret = mem_cgroup_newpage_charge(page, mm, GFP_KERNEL);
> > > > > +			lock_page(vmf.page);
> > > > 
> > > > This introduces a completely poinless unlock/lock cycle for non-memcg
> > > > pagefaults.  Please make sure it only happens when actually needed.
> > > 
> > > Fair point. Thanks!
> > > What about the following?
> > > I realize that pushing more memcg logic into mm/memory.c is not nice but
> > > I found it better than pushing the old page into mem_cgroup_newpage_charge.
> > > We could also check whether the old page is in the root cgroup because
> > > memcg oom killer is not active there but that would add more code into
> > > this hot path so I guess it is not worth it.
> > > 
> > > Changes since v1
> > > - do not unlock page when memory controller is disabled.
> > > 
> > 
> > Great work. Then I confirmed Lutz' problem is fixed.
> > 
> > But I like following style rather than additional lock/unlock.
> > How do you think ?
> 
> Yes, I like it much more than the hairy way I did it. See comments bellow.
> 
> > I tested this on the latest git tree and confirmed
> > the Lutz's livelock problem is fixed. And I think this should go stable tree.
> > 
> > 
> > ==
> > From 7e9250da9ff529958d4c1ff511458dbdac8e4b81 Mon Sep 17 00:00:00 2001
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Date: Thu, 23 Jun 2011 15:05:57 +0900
> > Subject: [PATCH] memcg: unlock page before charging it.
> > 
> > Currently we are keeping faulted page locked throughout whole __do_fault
> > call (except for page_mkwrite code path). If we do early COW we allocate a
> > new page which has to be charged for a memcg (mem_cgroup_newpage_charge).
> > 
> > This function, however, might block for unbounded amount of time if memcg
> > oom killer is disabled or fork-bomb is running because the only way out of
> > the OOM situation is either an external event or OOM-situation fix.
> > 
> > processes from faulting it in which is not good at all because we are
> 
> Missing the beginning of the sentence?
> 

Ah, yes...

> > basically punishing potentially an unrelated process for OOM condition
> > in a different group (I have seen stuck system because of ld-2.11.1.so being
> > locked).
> > 
> > We can do test easily.
> >  % cgcreate -g memory:A
> >  % cgset -r memory.limit_in_bytes=64M A
> >  % cgset -r memory.memsw.limit_in_bytes=64M A
> >  % cd kernel_dir; cgexec -g memory:A make -j
> > 
> > Then, the whole system will live-locked until you kill 'make -j'
> > by hands (or push reboot...) This is because some important
> > page in a shared library are locked and never released bcause of fork-bomb.
> > 
> > This patch delays "charge" until unlock_page() called. There is
> > no problem as far as we keep reference on a page.
> > (memcg doesn't require page_lock()).
> > 
> > Then, above livelock disappears.
> > 
> > Reported-by: Lutz Vieweg <lvml@5t9.de>
> > Original-idea-by: Michal Hocko <mhocko@suse.cz>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  mm/memory.c |   28 +++++++++++++++++++---------
> >  1 files changed, 19 insertions(+), 9 deletions(-)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 87d9353..66442da 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3129,7 +3129,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >  	struct page *page;
> >  	pte_t entry;
> >  	int anon = 0;
> > -	int charged = 0;
> > +	struct page *need_charge = NULL;
> >  	struct page *dirty_page = NULL;
> >  	struct vm_fault vmf;
> >  	int ret;
> > @@ -3177,12 +3177,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >  				ret = VM_FAULT_OOM;
> >  				goto out;
> >  			}
> > -			if (mem_cgroup_newpage_charge(page, mm, GFP_KERNEL)) {
> > -				ret = VM_FAULT_OOM;
> > -				page_cache_release(page);
> > -				goto out;
> > -			}
> > -			charged = 1;
> > +			need_charge = page;
> >  			copy_user_highpage(page, vmf.page, address, vma);
> >  			__SetPageUptodate(page);
> >  		} else {
> > @@ -3251,12 +3246,11 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >  		/* no need to invalidate: a not-present page won't be cached */
> >  		update_mmu_cache(vma, address, page_table);
> >  	} else {
> > -		if (charged)
> > -			mem_cgroup_uncharge_page(page);
> >  		if (anon)
> >  			page_cache_release(page);
> >  		else
> >  			anon = 1; /* no anon but release faulted_page */
> > +		need_charge = NULL;
> >  	}
> >  
> >  	pte_unmap_unlock(page_table, ptl);
> > @@ -3268,6 +3262,17 @@ out:
> >  		if (set_page_dirty(dirty_page))
> >  			page_mkwrite = 1;
> >  		unlock_page(dirty_page);
> > +		if (need_charge) {
> > +			/*
> > +			 * charge this page before we drop refcnt.
> > +			 * memory cgroup returns OOM condition when
> > +			 * this task is killed. So, it's not necesasry
> > +			 * to undo.
> > +			 */
> > +			if (mem_cgroup_newpage_charge(need_charge,
> > +					mm, GFP_KERNEL))
> > +				ret = VM_FAULT_OOM;
> > +		}
> 
> We do not need this hunk, don't we? dirty_page is set only if !anon so
> we never get to this path from COW.
> 
You're right. (And Nishimura pointed out this, too)

> Other than that:
> Reviewed-by: Michal Hocko <mhocko@suse.cz>
> 

I found the page is added to LRU before charging. (In this case,
memcg's LRU is ignored.) I'll post a new version with a fix.

Thanks,
-Kame




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] memcg: unlock page before charging it. (WasRe: [PATCH V2] mm: Do not keep page locked during page fault while charging it for memcg
  2011-06-23  8:08         ` KAMEZAWA Hiroyuki
@ 2011-06-23  9:02           ` Michal Hocko
  2011-06-23 10:01             ` [PATCH] mm: preallocate page before lock_page at filemap COW. " KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2011-06-23  9:02 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Christoph Hellwig, linux-mm, linux-kernel, Andrew Morton,
	Hugh Dickins, Rik van Riel, Michel Lespinasse, Mel Gorman,
	Lutz Vieweg

On Thu 23-06-11 17:08:11, KAMEZAWA Hiroyuki wrote:
> On Thu, 23 Jun 2011 09:41:33 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
[...]
> > Other than that:
> > Reviewed-by: Michal Hocko <mhocko@suse.cz>
> > 
> 
> I found the page is added to LRU before charging. (In this case,
> memcg's LRU is ignored.) I'll post a new version with a fix.

Yes, you are right. I have missed that.
This means that we might race with reclaim which could evict the COWed
page wich in turn would uncharge that page even though we haven't
charged it yet.

Can we postpone page_add_new_anon_rmap to the charging path or it would
just race somewhere else?

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] mm: preallocate page before lock_page at filemap COW. (WasRe: [PATCH V2] mm: Do not keep page locked during page fault while charging it for memcg
  2011-06-23  9:02           ` Michal Hocko
@ 2011-06-23 10:01             ` KAMEZAWA Hiroyuki
  2011-06-23 11:58               ` Michal Hocko
  2011-06-24  7:57               ` Michal Hocko
  0 siblings, 2 replies; 17+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-06-23 10:01 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Christoph Hellwig, linux-mm, linux-kernel, Andrew Morton,
	Hugh Dickins, Rik van Riel, Michel Lespinasse, Mel Gorman,
	Lutz Vieweg

On Thu, 23 Jun 2011 11:02:04 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> On Thu 23-06-11 17:08:11, KAMEZAWA Hiroyuki wrote:
> > On Thu, 23 Jun 2011 09:41:33 +0200
> > Michal Hocko <mhocko@suse.cz> wrote:
> [...]
> > > Other than that:
> > > Reviewed-by: Michal Hocko <mhocko@suse.cz>
> > > 
> > 
> > I found the page is added to LRU before charging. (In this case,
> > memcg's LRU is ignored.) I'll post a new version with a fix.
> 
> Yes, you are right. I have missed that.
> This means that we might race with reclaim which could evict the COWed
> page wich in turn would uncharge that page even though we haven't
> charged it yet.
> 
> Can we postpone page_add_new_anon_rmap to the charging path or it would
> just race somewhere else?
> 

I got a different idea. How about this ?
I think this will have benefit for non-memcg users under OOM, too.

A concerns is VM_FAULT_RETRY case but wait-for-lock will be much heavier
than preallocation + free-for-retry cost.

(I'm sorry I'll not be very active until the next week, so feel free to
 post your own version if necessary.)

This is onto -rc4 and worked well on my test.

==

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] mm: preallocate page before lock_page at filemap COW. (WasRe: [PATCH V2] mm: Do not keep page locked during page fault while charging it for memcg
  2011-06-23 10:01             ` [PATCH] mm: preallocate page before lock_page at filemap COW. " KAMEZAWA Hiroyuki
@ 2011-06-23 11:58               ` Michal Hocko
  2011-06-23 13:01                 ` Hiroyuki Kamezawa
  2011-06-24  7:57               ` Michal Hocko
  1 sibling, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2011-06-23 11:58 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Christoph Hellwig, linux-mm, linux-kernel, Andrew Morton,
	Hugh Dickins, Rik van Riel, Michel Lespinasse, Mel Gorman,
	Lutz Vieweg

On Thu 23-06-11 19:01:57, KAMEZAWA Hiroyuki wrote:
> On Thu, 23 Jun 2011 11:02:04 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > On Thu 23-06-11 17:08:11, KAMEZAWA Hiroyuki wrote:
> > > On Thu, 23 Jun 2011 09:41:33 +0200
> > > Michal Hocko <mhocko@suse.cz> wrote:
> > [...]
> > > > Other than that:
> > > > Reviewed-by: Michal Hocko <mhocko@suse.cz>
> > > > 
> > > 
> > > I found the page is added to LRU before charging. (In this case,
> > > memcg's LRU is ignored.) I'll post a new version with a fix.
> > 
> > Yes, you are right. I have missed that.
> > This means that we might race with reclaim which could evict the COWed
> > page wich in turn would uncharge that page even though we haven't
> > charged it yet.
> > 
> > Can we postpone page_add_new_anon_rmap to the charging path or it would
> > just race somewhere else?
> > 
> 
> I got a different idea. How about this ?
> I think this will have benefit for non-memcg users under OOM, too.

Could you be more specific? I do not see how preallocation which might
turn out to be pointless could help under OOM.

> 
> A concerns is VM_FAULT_RETRY case but wait-for-lock will be much heavier
> than preallocation + free-for-retry cost.

Preallocation is rather costly when fault handler fails (e.g. SIGBUS
which is the easiest one to trigger).

I am not saying this approach is bad but I think that preallocation can
be much more costly than unlock, charge and lock&recheck approach.
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] mm: preallocate page before lock_page at filemap COW. (WasRe: [PATCH V2] mm: Do not keep page locked during page fault while charging it for memcg
  2011-06-23 11:58               ` Michal Hocko
@ 2011-06-23 13:01                 ` Hiroyuki Kamezawa
  2011-06-23 13:23                   ` Michal Hocko
  0 siblings, 1 reply; 17+ messages in thread
From: Hiroyuki Kamezawa @ 2011-06-23 13:01 UTC (permalink / raw)
  To: Michal Hocko
  Cc: KAMEZAWA Hiroyuki, Christoph Hellwig, linux-mm, linux-kernel,
	Andrew Morton, Hugh Dickins, Rik van Riel, Michel Lespinasse,
	Mel Gorman, Lutz Vieweg

2011/6/23 Michal Hocko <mhocko@suse.cz>:
> On Thu 23-06-11 19:01:57, KAMEZAWA Hiroyuki wrote:
>> On Thu, 23 Jun 2011 11:02:04 +0200
>> Michal Hocko <mhocko@suse.cz> wrote:
>>
>> > On Thu 23-06-11 17:08:11, KAMEZAWA Hiroyuki wrote:
>> > > On Thu, 23 Jun 2011 09:41:33 +0200
>> > > Michal Hocko <mhocko@suse.cz> wrote:
>> > [...]
>> > > > Other than that:
>> > > > Reviewed-by: Michal Hocko <mhocko@suse.cz>
>> > > >
>> > >
>> > > I found the page is added to LRU before charging. (In this case,
>> > > memcg's LRU is ignored.) I'll post a new version with a fix.
>> >
>> > Yes, you are right. I have missed that.
>> > This means that we might race with reclaim which could evict the COWed
>> > page wich in turn would uncharge that page even though we haven't
>> > charged it yet.
>> >
>> > Can we postpone page_add_new_anon_rmap to the charging path or it would
>> > just race somewhere else?
>> >
>>
>> I got a different idea. How about this ?
>> I think this will have benefit for non-memcg users under OOM, too.
>
> Could you be more specific? I do not see how preallocation which might
> turn out to be pointless could help under OOM.
>

We'll have no page allocation under lock_page() held in this path.
I think it is good.

>>
>> A concerns is VM_FAULT_RETRY case but wait-for-lock will be much heavier
>> than preallocation + free-for-retry cost.
>
> Preallocation is rather costly when fault handler fails (e.g. SIGBUS
> which is the easiest one to trigger).
>
I think pcp cache of free page allocater does enough good job and I guess
we'll see no problem even if there is a storm of SIGBUS.

> I am not saying this approach is bad but I think that preallocation can
> be much more costly than unlock, charge and lock&recheck approach.

memcg_is_disabled() cannot help ROOT cgroup. And additional
lock/unlock method may kill FAULT_RETRY at lock contention optimization
which was added recently.


Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] mm: preallocate page before lock_page at filemap COW. (WasRe: [PATCH V2] mm: Do not keep page locked during page fault while charging it for memcg
  2011-06-23 13:01                 ` Hiroyuki Kamezawa
@ 2011-06-23 13:23                   ` Michal Hocko
  2011-06-23 13:51                     ` Hiroyuki Kamezawa
  0 siblings, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2011-06-23 13:23 UTC (permalink / raw)
  To: Hiroyuki Kamezawa
  Cc: KAMEZAWA Hiroyuki, Christoph Hellwig, linux-mm, linux-kernel,
	Andrew Morton, Hugh Dickins, Rik van Riel, Michel Lespinasse,
	Mel Gorman, Lutz Vieweg

On Thu 23-06-11 22:01:40, Hiroyuki Kamezawa wrote:
> 2011/6/23 Michal Hocko <mhocko@suse.cz>:
> > On Thu 23-06-11 19:01:57, KAMEZAWA Hiroyuki wrote:
> >> On Thu, 23 Jun 2011 11:02:04 +0200
> >> Michal Hocko <mhocko@suse.cz> wrote:
> >>
> >> > On Thu 23-06-11 17:08:11, KAMEZAWA Hiroyuki wrote:
> >> > > On Thu, 23 Jun 2011 09:41:33 +0200
> >> > > Michal Hocko <mhocko@suse.cz> wrote:
> >> > [...]
> >> > > > Other than that:
> >> > > > Reviewed-by: Michal Hocko <mhocko@suse.cz>
> >> > > >
> >> > >
> >> > > I found the page is added to LRU before charging. (In this case,
> >> > > memcg's LRU is ignored.) I'll post a new version with a fix.
> >> >
> >> > Yes, you are right. I have missed that.
> >> > This means that we might race with reclaim which could evict the COWed
> >> > page wich in turn would uncharge that page even though we haven't
> >> > charged it yet.
> >> >
> >> > Can we postpone page_add_new_anon_rmap to the charging path or it would
> >> > just race somewhere else?
> >> >
> >>
> >> I got a different idea. How about this ?
> >> I think this will have benefit for non-memcg users under OOM, too.
> >
> > Could you be more specific? I do not see how preallocation which might
> > turn out to be pointless could help under OOM.
> >
> 
> We'll have no page allocation under lock_page() held in this path.
> I think it is good.

But it can also cause that the page, we are about to fault in, is evicted
due to allocation so we would have to do a major fault... This is
probably not that serious, though.

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] mm: preallocate page before lock_page at filemap COW. (WasRe: [PATCH V2] mm: Do not keep page locked during page fault while charging it for memcg
  2011-06-23 13:23                   ` Michal Hocko
@ 2011-06-23 13:51                     ` Hiroyuki Kamezawa
  0 siblings, 0 replies; 17+ messages in thread
From: Hiroyuki Kamezawa @ 2011-06-23 13:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: KAMEZAWA Hiroyuki, Christoph Hellwig, linux-mm, linux-kernel,
	Andrew Morton, Hugh Dickins, Rik van Riel, Michel Lespinasse,
	Mel Gorman, Lutz Vieweg

2011/6/23 Michal Hocko <mhocko@suse.cz>:
> On Thu 23-06-11 22:01:40, Hiroyuki Kamezawa wrote:
>> 2011/6/23 Michal Hocko <mhocko@suse.cz>:
>> > On Thu 23-06-11 19:01:57, KAMEZAWA Hiroyuki wrote:
>> >> On Thu, 23 Jun 2011 11:02:04 +0200
>> >> Michal Hocko <mhocko@suse.cz> wrote:
>> >>
>> >> > On Thu 23-06-11 17:08:11, KAMEZAWA Hiroyuki wrote:
>> >> > > On Thu, 23 Jun 2011 09:41:33 +0200
>> >> > > Michal Hocko <mhocko@suse.cz> wrote:
>> >> > [...]
>> >> > > > Other than that:
>> >> > > > Reviewed-by: Michal Hocko <mhocko@suse.cz>
>> >> > > >
>> >> > >
>> >> > > I found the page is added to LRU before charging. (In this case,
>> >> > > memcg's LRU is ignored.) I'll post a new version with a fix.
>> >> >
>> >> > Yes, you are right. I have missed that.
>> >> > This means that we might race with reclaim which could evict the COWed
>> >> > page wich in turn would uncharge that page even though we haven't
>> >> > charged it yet.
>> >> >
>> >> > Can we postpone page_add_new_anon_rmap to the charging path or it would
>> >> > just race somewhere else?
>> >> >
>> >>
>> >> I got a different idea. How about this ?
>> >> I think this will have benefit for non-memcg users under OOM, too.
>> >
>> > Could you be more specific? I do not see how preallocation which might
>> > turn out to be pointless could help under OOM.
>> >
>>
>> We'll have no page allocation under lock_page() held in this path.
>> I think it is good.
>
> But it can also cause that the page, we are about to fault in, is evicted
> due to allocation so we would have to do a major fault... This is
> probably not that serious, though.

For other purpose, I have(had) other patch to prevent it (and planned
to post it.)

The basic logic is...

1. add a new member variable to vm_area_struct as

  vma->vm_faulting_to

2. at __do_fault(), set vm_faulting_to as

  vma->vm_faulting_to = pgoff.


3. chec vma->vm_faulting_to at page_referenced_file() as

  if (pgoff (Was page->index) == vma->vm_faulting_to)
        referenced++

Then, the page which someone is waiting for page-fault will be marked as
referenced  and go KEEP_LOCKED.

(vm_faulting_to can be cleared after we got lock_page()).

In corner case, several threads which shares vma may fault into a vma.
But this will help typical case and have no overheads, I think.

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] mm: preallocate page before lock_page at filemap COW. (WasRe: [PATCH V2] mm: Do not keep page locked during page fault while charging it for memcg
  2011-06-23 10:01             ` [PATCH] mm: preallocate page before lock_page at filemap COW. " KAMEZAWA Hiroyuki
  2011-06-23 11:58               ` Michal Hocko
@ 2011-06-24  7:57               ` Michal Hocko
  2011-06-24 11:46                 ` Hiroyuki Kamezawa
  1 sibling, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2011-06-24  7:57 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Christoph Hellwig, linux-mm, linux-kernel, Andrew Morton,
	Hugh Dickins, Rik van Riel, Michel Lespinasse, Mel Gorman,
	Lutz Vieweg

On Thu 23-06-11 19:01:57, KAMEZAWA Hiroyuki wrote:
> From 7b0aa038f3a1c6a479a3ce6acb38c7d2740d7a75 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 23 Jun 2011 18:50:32 +0900
> Subject: [PATCH] mm: preallocate page before lock_page() at filemap COW.
> 
> Currently we are keeping faulted page locked throughout whole __do_fault
> call (except for page_mkwrite code path) after calling file system's
> fault code. If we do early COW, we allocate a new page which has to be
> charged for a memcg (mem_cgroup_newpage_charge).
> 
> This function, however, might block for unbounded amount of time if memcg
> oom killer is disabled or fork-bomb is running because the only way out of
> the OOM situation is either an external event or OOM-situation fix.
> 
> In the end we are keeping the faulted page locked and blocking other
> processes from faulting it in which is not good at all because we are
> basically punishing potentially an unrelated process for OOM condition
> in a different group (I have seen stuck system because of ld-2.11.1.so being
> locked).
> 
> We can do test easily.
> 
>  % cgcreate -g memory:A
>  % cgset -r memory.limit_in_bytes=64M A
>  % cgset -r memory.memsw.limit_in_bytes=64M A
>  % cd kernel_dir; cgexec -g memory:A make -j
> 
> Then, the whole system will live-locked until you kill 'make -j'
> by hands (or push reboot...) This is because some important
> page in a shared library are locked.
> 
> Considering again, the new page is not necessary to be allocated
> with lock_page() held. So....
> 
> This patch moves "charge" and memory allocation for COW page
> before lock_page(). Then, we can avoid scanning LRU with holding
> a lock on a page.
> 
> Then, above livelock disappears.
> 
> Reported-by: Lutz Vieweg <lvml@5t9.de>
> Original-idea-by: Michal Hocko <mhocko@suse.cz>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Sorry, forgot to send my
Reviewed-by: Michal Hocko <mhocko@suse>

I still have concerns about this way to handle the issue. See the follow
up discussion in other thread (https://lkml.org/lkml/2011/6/23/135).

Anyway I think that we do not have many other options to handle this.
Either we unlock, charge, lock&restes or we preallocate, fault in

Or am I missing some other ways how to do it? What do others think about
these approaches?

> 
> Changelog:
>   - change the logic from "do charge after unlock" to
>     "do charge before lock".
> ---
>  mm/memory.c |   56 ++++++++++++++++++++++++++++++++++----------------------
>  1 files changed, 34 insertions(+), 22 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 87d9353..845378c 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3127,14 +3127,34 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	pte_t *page_table;
>  	spinlock_t *ptl;
>  	struct page *page;
> +	struct page *cow_page;
>  	pte_t entry;
>  	int anon = 0;
> -	int charged = 0;
>  	struct page *dirty_page = NULL;
>  	struct vm_fault vmf;
>  	int ret;
>  	int page_mkwrite = 0;
>  
> +	/*
> +	 * If we do COW later, allocate page befor taking lock_page()
> +	 * on the file cache page. This will reduce lock holding time.
> +	 */
> +	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
> +
> +		if (unlikely(anon_vma_prepare(vma)))
> +			return  VM_FAULT_OOM;
> +
> +		cow_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
> +		if (!cow_page)
> +			return VM_FAULT_OOM;
> +
> +		if (mem_cgroup_newpage_charge(cow_page, mm, GFP_KERNEL)) {
> +			page_cache_release(cow_page);
> +			return VM_FAULT_OOM;
> +		}
> +	} else
> +		cow_page = NULL;
> +
>  	vmf.virtual_address = (void __user *)(address & PAGE_MASK);
>  	vmf.pgoff = pgoff;
>  	vmf.flags = flags;
> @@ -3143,12 +3163,13 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	ret = vma->vm_ops->fault(vma, &vmf);
>  	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE |
>  			    VM_FAULT_RETRY)))
> -		return ret;
> +		goto uncharge_out;
>  
>  	if (unlikely(PageHWPoison(vmf.page))) {
>  		if (ret & VM_FAULT_LOCKED)
>  			unlock_page(vmf.page);
> -		return VM_FAULT_HWPOISON;
> +		ret = VM_FAULT_HWPOISON;
> +		goto uncharge_out;
>  	}
>  
>  	/*
> @@ -3166,23 +3187,8 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	page = vmf.page;
>  	if (flags & FAULT_FLAG_WRITE) {
>  		if (!(vma->vm_flags & VM_SHARED)) {
> +			page = cow_page;
>  			anon = 1;
> -			if (unlikely(anon_vma_prepare(vma))) {
> -				ret = VM_FAULT_OOM;
> -				goto out;
> -			}
> -			page = alloc_page_vma(GFP_HIGHUSER_MOVABLE,
> -						vma, address);
> -			if (!page) {
> -				ret = VM_FAULT_OOM;
> -				goto out;
> -			}
> -			if (mem_cgroup_newpage_charge(page, mm, GFP_KERNEL)) {
> -				ret = VM_FAULT_OOM;
> -				page_cache_release(page);
> -				goto out;
> -			}
> -			charged = 1;
>  			copy_user_highpage(page, vmf.page, address, vma);
>  			__SetPageUptodate(page);
>  		} else {
> @@ -3251,8 +3257,8 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  		/* no need to invalidate: a not-present page won't be cached */
>  		update_mmu_cache(vma, address, page_table);
>  	} else {
> -		if (charged)
> -			mem_cgroup_uncharge_page(page);
> +		if (cow_page)
> +			mem_cgroup_uncharge_page(cow_page);
>  		if (anon)
>  			page_cache_release(page);
>  		else
> @@ -3261,7 +3267,6 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  
>  	pte_unmap_unlock(page_table, ptl);
>  
> -out:
>  	if (dirty_page) {
>  		struct address_space *mapping = page->mapping;
>  
> @@ -3291,6 +3296,13 @@ out:
>  unwritable_page:
>  	page_cache_release(page);
>  	return ret;
> +uncharge_out:
> +	/* fs's fault handler get error */
> +	if (cow_page) {
> +		mem_cgroup_uncharge_page(cow_page);
> +		page_cache_release(cow_page);
> +	}
> +	return ret;
>  }
>  
>  static int do_linear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> -- 
> 1.7.4.1
> 
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] mm: preallocate page before lock_page at filemap COW. (WasRe: [PATCH V2] mm: Do not keep page locked during page fault while charging it for memcg
  2011-06-24  7:57               ` Michal Hocko
@ 2011-06-24 11:46                 ` Hiroyuki Kamezawa
  2011-07-12  9:48                   ` Michal Hocko
  0 siblings, 1 reply; 17+ messages in thread
From: Hiroyuki Kamezawa @ 2011-06-24 11:46 UTC (permalink / raw)
  To: Michal Hocko
  Cc: KAMEZAWA Hiroyuki, Christoph Hellwig, linux-mm, linux-kernel,
	Andrew Morton, Hugh Dickins, Rik van Riel, Michel Lespinasse,
	Mel Gorman, Lutz Vieweg

2011/6/24 Michal Hocko <mhocko@suse.cz>:
> Sorry, forgot to send my
> Reviewed-by: Michal Hocko <mhocko@suse>
>

Thanks.

> I still have concerns about this way to handle the issue. See the follow
> up discussion in other thread (https://lkml.org/lkml/2011/6/23/135).
>
> Anyway I think that we do not have many other options to handle this.
> Either we unlock, charge, lock&restes or we preallocate, fault in
>
I agree.

> Or am I missing some other ways how to do it? What do others think about
> these approaches?
>

Yes, I'd like to hear other mm specialists' suggestion. and I'll think
other way, again.
Anyway, memory reclaim with holding a lock_page() can cause big latency
or starvation especially when memcg is used. It's better to avoid it.

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] mm: preallocate page before lock_page at filemap COW. (WasRe: [PATCH V2] mm: Do not keep page locked during page fault while charging it for memcg
  2011-06-24 11:46                 ` Hiroyuki Kamezawa
@ 2011-07-12  9:48                   ` Michal Hocko
  2012-01-26 10:46                     ` Lutz Vieweg
  0 siblings, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2011-07-12  9:48 UTC (permalink / raw)
  To: Hiroyuki Kamezawa
  Cc: KAMEZAWA Hiroyuki, Christoph Hellwig, linux-mm, linux-kernel,
	Andrew Morton, Hugh Dickins, Rik van Riel, Michel Lespinasse,
	Mel Gorman, Lutz Vieweg

On Fri 24-06-11 20:46:29, Hiroyuki Kamezawa wrote:
> 2011/6/24 Michal Hocko <mhocko@suse.cz>:
> > Sorry, forgot to send my
> > Reviewed-by: Michal Hocko <mhocko@suse>
> >
> 
> Thanks.
> 
> > I still have concerns about this way to handle the issue. See the follow
> > up discussion in other thread (https://lkml.org/lkml/2011/6/23/135).
> >
> > Anyway I think that we do not have many other options to handle this.
> > Either we unlock, charge, lock&restes or we preallocate, fault in
> >
> I agree.
> 
> > Or am I missing some other ways how to do it? What do others think about
> > these approaches?
> >
> 
> Yes, I'd like to hear other mm specialists' suggestion. and I'll think
> other way, again.
> Anyway, memory reclaim with holding a lock_page() can cause big latency
> or starvation especially when memcg is used. It's better to avoid it.

Is there any intereset in discussing this or the email just got lost?
Just for reference preallocation patch from Kamezawa is already in the
Andrew's tree.

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] mm: preallocate page before lock_page at filemap COW. (WasRe: [PATCH V2] mm: Do not keep page locked during page fault while charging it for memcg
  2011-07-12  9:48                   ` Michal Hocko
@ 2012-01-26 10:46                     ` Lutz Vieweg
  0 siblings, 0 replies; 17+ messages in thread
From: Lutz Vieweg @ 2012-01-26 10:46 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Hiroyuki Kamezawa, KAMEZAWA Hiroyuki, Christoph Hellwig, linux-mm,
	linux-kernel, Andrew Morton, Hugh Dickins, Rik van Riel,
	Michel Lespinasse, Mel Gorman

On 07/12/2011 11:48 AM, Michal Hocko wrote:
> Is there any intereset in discussing this or the email just got lost?
> Just for reference preallocation patch from Kamezawa is already in the
> Andrew's tree.

It's been a long time since this discussion, I just wanted to add
that I've been recently able to confirm the ability of memcg
to prevent single users from DOSing a system by "make -j" - in
a real-world scenario (using linux-3.2.1).

So thanks to all who contributed to the solution in whatever way :-)

(A minor issue remained: The kernel is very verbose when
killing tasks due to memcg restrictions. In fork-bomb like
scenarios, this can lead to high resource utilization for
syslog et al.)

Regards,

Lutz Vieweg

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2012-01-26 10:46 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-06-22 12:06 [PATCH] mm: Do not keep page locked during page fault while charging it for memcg Michal Hocko
2011-06-22 12:15 ` Christoph Hellwig
2011-06-22 12:32   ` [PATCH V2] " Michal Hocko
2011-06-23  6:08     ` [PATCH] memcg: unlock page before charging it. (WasRe: " KAMEZAWA Hiroyuki
2011-06-23  7:23       ` Daisuke Nishimura
2011-06-23  7:41       ` Michal Hocko
2011-06-23  8:08         ` KAMEZAWA Hiroyuki
2011-06-23  9:02           ` Michal Hocko
2011-06-23 10:01             ` [PATCH] mm: preallocate page before lock_page at filemap COW. " KAMEZAWA Hiroyuki
2011-06-23 11:58               ` Michal Hocko
2011-06-23 13:01                 ` Hiroyuki Kamezawa
2011-06-23 13:23                   ` Michal Hocko
2011-06-23 13:51                     ` Hiroyuki Kamezawa
2011-06-24  7:57               ` Michal Hocko
2011-06-24 11:46                 ` Hiroyuki Kamezawa
2011-07-12  9:48                   ` Michal Hocko
2012-01-26 10:46                     ` Lutz Vieweg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).