linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC] [PATCH] memcg: fix infinite loop
@ 2009-01-15  6:07 Li Zefan
  2009-01-15  6:15 ` Balbir Singh
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Li Zefan @ 2009-01-15  6:07 UTC (permalink / raw)
  To: Balbir Singh, KAMEZAWA Hiroyuki; +Cc: Daisuke Nishimura, linux-mm@kvack.org

1. task p1 is in /memcg/0
2. p1 does mmap(4096*2, MAP_LOCKED)
3. echo 4096 > /memcg/0/memory.limit_in_bytes

The above 'echo' will never return, unless p1 exited or freed the memory.
The cause is we can't reclaim memory from p1, so the while loop in
mem_cgroup_resize_limit() won't break.

This patch fixes it by decrementing retry_count regardless the return value
of mem_cgroup_hierarchical_reclaim().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 mm/memcontrol.c |   15 ++++-----------
 1 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fb62b43..1995098 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1524,11 +1524,10 @@ static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
 {
 
 	int retry_count = MEM_CGROUP_RECLAIM_RETRIES;
-	int progress;
 	u64 memswlimit;
 	int ret = 0;
 
-	while (retry_count) {
+	while (retry_count--) {
 		if (signal_pending(current)) {
 			ret = -EINTR;
 			break;
@@ -1551,9 +1550,7 @@ static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
 		if (!ret)
 			break;
 
-		progress = mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL,
-							   false);
-  		if (!progress)			retry_count--;
+		mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, false);
 	}
 
 	return ret;
@@ -1563,13 +1560,13 @@ int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
 				unsigned long long val)
 {
 	int retry_count = MEM_CGROUP_RECLAIM_RETRIES;
-	u64 memlimit, oldusage, curusage;
+	u64 memlimit;
 	int ret;
 
 	if (!do_swap_account)
 		return -EINVAL;
 
-	while (retry_count) {
+	while (retry_count--) {
 		if (signal_pending(current)) {
 			ret = -EINTR;
 			break;
@@ -1592,11 +1589,7 @@ int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
 		if (!ret)
 			break;
 
-		oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
 		mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true);
-		curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
-		if (curusage >= oldusage)
-			retry_count--;
 	}
 	return ret;
 }
-- 
1.5.4.rc3


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC] [PATCH] memcg: fix infinite loop
  2009-01-15  6:07 [RFC] [PATCH] memcg: fix infinite loop Li Zefan
@ 2009-01-15  6:15 ` Balbir Singh
  2009-01-15  6:31   ` KAMEZAWA Hiroyuki
  2009-01-15  6:16 ` Daisuke Nishimura
  2009-01-19  8:49 ` KAMEZAWA Hiroyuki
  2 siblings, 1 reply; 13+ messages in thread
From: Balbir Singh @ 2009-01-15  6:15 UTC (permalink / raw)
  To: Li Zefan; +Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, linux-mm@kvack.org

* Li Zefan <lizf@cn.fujitsu.com> [2009-01-15 14:07:51]:

> 1. task p1 is in /memcg/0
> 2. p1 does mmap(4096*2, MAP_LOCKED)
> 3. echo 4096 > /memcg/0/memory.limit_in_bytes
> 
> The above 'echo' will never return, unless p1 exited or freed the memory.
> The cause is we can't reclaim memory from p1, so the while loop in
> mem_cgroup_resize_limit() won't break.
> 
> This patch fixes it by decrementing retry_count regardless the return value
> of mem_cgroup_hierarchical_reclaim().
>

The problem definitely seems to exist, shouldn't we fix reclaim to
return 0, so that we know progress is not made and retry count
decrements? 

-- 
	Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] [PATCH] memcg: fix infinite loop
  2009-01-15  6:07 [RFC] [PATCH] memcg: fix infinite loop Li Zefan
  2009-01-15  6:15 ` Balbir Singh
@ 2009-01-15  6:16 ` Daisuke Nishimura
  2009-01-15  6:27   ` Li Zefan
  2009-01-19  8:49 ` KAMEZAWA Hiroyuki
  2 siblings, 1 reply; 13+ messages in thread
From: Daisuke Nishimura @ 2009-01-15  6:16 UTC (permalink / raw)
  To: Li Zefan; +Cc: nishimura, Balbir Singh, KAMEZAWA Hiroyuki, linux-mm@kvack.org

On Thu, 15 Jan 2009 14:07:51 +0800, Li Zefan <lizf@cn.fujitsu.com> wrote:
> 1. task p1 is in /memcg/0
> 2. p1 does mmap(4096*2, MAP_LOCKED)
> 3. echo 4096 > /memcg/0/memory.limit_in_bytes
> 
> The above 'echo' will never return, unless p1 exited or freed the memory.
> The cause is we can't reclaim memory from p1, so the while loop in
> mem_cgroup_resize_limit() won't break.
> 
But it can be interrupted, right ?

I don't think this would be a big problem.


Thanks,
Daisuke Nishimura.

> This patch fixes it by decrementing retry_count regardless the return value
> of mem_cgroup_hierarchical_reclaim().
> 
> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
> ---
>  mm/memcontrol.c |   15 ++++-----------
>  1 files changed, 4 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fb62b43..1995098 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1524,11 +1524,10 @@ static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
>  {
>  
>  	int retry_count = MEM_CGROUP_RECLAIM_RETRIES;
> -	int progress;
>  	u64 memswlimit;
>  	int ret = 0;
>  
> -	while (retry_count) {
> +	while (retry_count--) {
>  		if (signal_pending(current)) {
>  			ret = -EINTR;
>  			break;
> @@ -1551,9 +1550,7 @@ static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
>  		if (!ret)
>  			break;
>  
> -		progress = mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL,
> -							   false);
> -  		if (!progress)			retry_count--;
> +		mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, false);
>  	}
>  
>  	return ret;
> @@ -1563,13 +1560,13 @@ int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
>  				unsigned long long val)
>  {
>  	int retry_count = MEM_CGROUP_RECLAIM_RETRIES;
> -	u64 memlimit, oldusage, curusage;
> +	u64 memlimit;
>  	int ret;
>  
>  	if (!do_swap_account)
>  		return -EINVAL;
>  
> -	while (retry_count) {
> +	while (retry_count--) {
>  		if (signal_pending(current)) {
>  			ret = -EINTR;
>  			break;
> @@ -1592,11 +1589,7 @@ int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
>  		if (!ret)
>  			break;
>  
> -		oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
>  		mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true);
> -		curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
> -		if (curusage >= oldusage)
> -			retry_count--;
>  	}
>  	return ret;
>  }
> -- 
> 1.5.4.rc3
> 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] [PATCH] memcg: fix infinite loop
  2009-01-15  6:16 ` Daisuke Nishimura
@ 2009-01-15  6:27   ` Li Zefan
  0 siblings, 0 replies; 13+ messages in thread
From: Li Zefan @ 2009-01-15  6:27 UTC (permalink / raw)
  To: Daisuke Nishimura; +Cc: Balbir Singh, KAMEZAWA Hiroyuki, linux-mm@kvack.org

Daisuke Nishimura wrote:
> On Thu, 15 Jan 2009 14:07:51 +0800, Li Zefan <lizf@cn.fujitsu.com> wrote:
>> 1. task p1 is in /memcg/0
>> 2. p1 does mmap(4096*2, MAP_LOCKED)
>> 3. echo 4096 > /memcg/0/memory.limit_in_bytes
>>
>> The above 'echo' will never return, unless p1 exited or freed the memory.
>> The cause is we can't reclaim memory from p1, so the while loop in
>> mem_cgroup_resize_limit() won't break.
>>
> But it can be interrupted, right ?
> 

So we expect users track how long does this operation take, and send some
signal to stop it if it takes a long time ? I don't think this is user-friendly..

> I don't think this would be a big problem.
> 

It's a problem, though it may not be a big problem.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] [PATCH] memcg: fix infinite loop
  2009-01-15  6:15 ` Balbir Singh
@ 2009-01-15  6:31   ` KAMEZAWA Hiroyuki
  2009-01-15  7:14     ` Li Zefan
  0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-01-15  6:31 UTC (permalink / raw)
  To: balbir; +Cc: Li Zefan, Daisuke Nishimura, linux-mm@kvack.org

On Thu, 15 Jan 2009 11:45:57 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * Li Zefan <lizf@cn.fujitsu.com> [2009-01-15 14:07:51]:
> 
> > 1. task p1 is in /memcg/0
> > 2. p1 does mmap(4096*2, MAP_LOCKED)
> > 3. echo 4096 > /memcg/0/memory.limit_in_bytes
> > 
> > The above 'echo' will never return, unless p1 exited or freed the memory.
> > The cause is we can't reclaim memory from p1, so the while loop in
> > mem_cgroup_resize_limit() won't break.
> > 
> > This patch fixes it by decrementing retry_count regardless the return value
> > of mem_cgroup_hierarchical_reclaim().
> >
> 
> The problem definitely seems to exist, shouldn't we fix reclaim to
> return 0, so that we know progress is not made and retry count
> decrements? 
> 

The behavior is correct. And we already check signal_pending() in the loop.
Ctrl-C or SIGALARM will works better than checking retry count.
 But adding a new control file, memory.resize_timeout to check timeout is a choice.

Second thought is.
thanks to Kosaki at el, LRU for locked pages is now visible in memory.stat
file. So, we may able to have clever way.

== 
 unevictable = mem_cgroup_get_all_zonestat(mem, LRU_UNEVICLABLE);
 if (newlimit < unevictable)
	break;
==
But considering hierarchy, this can be complex.
please don't modify current behavior for a while, I'll try to write "hierarchical stat"
with CSS_ID patch set's easy hierarchy walk.

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] [PATCH] memcg: fix infinite loop
  2009-01-15  6:31   ` KAMEZAWA Hiroyuki
@ 2009-01-15  7:14     ` Li Zefan
  2009-01-15  7:21       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: Li Zefan @ 2009-01-15  7:14 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: balbir, Daisuke Nishimura, linux-mm@kvack.org

KAMEZAWA Hiroyuki wrote:
> On Thu, 15 Jan 2009 11:45:57 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
>> * Li Zefan <lizf@cn.fujitsu.com> [2009-01-15 14:07:51]:
>>
>>> 1. task p1 is in /memcg/0
>>> 2. p1 does mmap(4096*2, MAP_LOCKED)
>>> 3. echo 4096 > /memcg/0/memory.limit_in_bytes
>>>
>>> The above 'echo' will never return, unless p1 exited or freed the memory.
>>> The cause is we can't reclaim memory from p1, so the while loop in
>>> mem_cgroup_resize_limit() won't break.
>>>
>>> This patch fixes it by decrementing retry_count regardless the return value
>>> of mem_cgroup_hierarchical_reclaim().
>>>
>> The problem definitely seems to exist, shouldn't we fix reclaim to
>> return 0, so that we know progress is not made and retry count
>> decrements? 
>>
> 
> The behavior is correct. And we already check signal_pending() in the loop.
> Ctrl-C or SIGALARM will works better than checking retry count.

But this behavior seems like a regression. Please try it in 2.6.28, you'll see
it returns EBUSY immediately.

Looks like the return value of mem_cgroup_hierarchical_reclaim() is buggy ?

>  But adding a new control file, memory.resize_timeout to check timeout is a choice.
> 
> Second thought is.
> thanks to Kosaki at el, LRU for locked pages is now visible in memory.stat
> file. So, we may able to have clever way.
> 
> == 
>  unevictable = mem_cgroup_get_all_zonestat(mem, LRU_UNEVICLABLE);
>  if (newlimit < unevictable)
> 	break;
> ==
> But considering hierarchy, this can be complex.
> please don't modify current behavior for a while, I'll try to write "hierarchical stat"
> with CSS_ID patch set's easy hierarchy walk.
> 
> Thanks,
> -Kame
> 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] [PATCH] memcg: fix infinite loop
  2009-01-15  7:14     ` Li Zefan
@ 2009-01-15  7:21       ` KAMEZAWA Hiroyuki
  2009-01-15  7:26         ` Balbir Singh
  2009-01-15  7:32         ` Li Zefan
  0 siblings, 2 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-01-15  7:21 UTC (permalink / raw)
  To: Li Zefan; +Cc: balbir, Daisuke Nishimura, linux-mm@kvack.org

On Thu, 15 Jan 2009 15:14:38 +0800
Li Zefan <lizf@cn.fujitsu.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> > On Thu, 15 Jan 2009 11:45:57 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > 
> >> * Li Zefan <lizf@cn.fujitsu.com> [2009-01-15 14:07:51]:
> >>
> >>> 1. task p1 is in /memcg/0
> >>> 2. p1 does mmap(4096*2, MAP_LOCKED)
> >>> 3. echo 4096 > /memcg/0/memory.limit_in_bytes
> >>>
> >>> The above 'echo' will never return, unless p1 exited or freed the memory.
> >>> The cause is we can't reclaim memory from p1, so the while loop in
> >>> mem_cgroup_resize_limit() won't break.
> >>>
> >>> This patch fixes it by decrementing retry_count regardless the return value
> >>> of mem_cgroup_hierarchical_reclaim().
> >>>
> >> The problem definitely seems to exist, shouldn't we fix reclaim to
> >> return 0, so that we know progress is not made and retry count
> >> decrements? 
> >>
> > 
> > The behavior is correct. And we already check signal_pending() in the loop.
> > Ctrl-C or SIGALARM will works better than checking retry count.
> 
> But this behavior seems like a regression. Please try it in 2.6.28, you'll see
> it returns EBUSY immediately.
> 
> Looks like the return value of mem_cgroup_hierarchical_reclaim() is buggy ?
> 

This is intentional behavior change by
==
 memcg-make-oom-less-frequently.patch
==

try_to_free_page() returns positive value if try_to_free_page() reclaims at
least 1 pages. It itself doesn't seem to be buggy.

What buggy is resize_limit's retry-out check code, I think.

How about following ?
==
	while (1) {
		if (signal_pending())
			break;
		try to set limit ....
		...
		ret = mem_cgroup_hierarchical_reclaim(memcg,  GFP_KERNEL, false);
		total_progress += ret;	

		if (total_progress > (memcg->res.usage - val) * 2) {
			/*
			 * It seems we reclaimed twice of necessary
			 * pages...this memcg is busy
			 */
			ret = -EBUSY;
			break;
		}
	}
==

Thanks,
-Kame








--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] [PATCH] memcg: fix infinite loop
  2009-01-15  7:21       ` KAMEZAWA Hiroyuki
@ 2009-01-15  7:26         ` Balbir Singh
  2009-01-15  7:32         ` Li Zefan
  1 sibling, 0 replies; 13+ messages in thread
From: Balbir Singh @ 2009-01-15  7:26 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Li Zefan, Daisuke Nishimura, linux-mm@kvack.org

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-01-15 16:21:26]:

> On Thu, 15 Jan 2009 15:14:38 +0800
> Li Zefan <lizf@cn.fujitsu.com> wrote:
> 
> > KAMEZAWA Hiroyuki wrote:
> > > On Thu, 15 Jan 2009 11:45:57 +0530
> > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > > 
> > >> * Li Zefan <lizf@cn.fujitsu.com> [2009-01-15 14:07:51]:
> > >>
> > >>> 1. task p1 is in /memcg/0
> > >>> 2. p1 does mmap(4096*2, MAP_LOCKED)
> > >>> 3. echo 4096 > /memcg/0/memory.limit_in_bytes
> > >>>
> > >>> The above 'echo' will never return, unless p1 exited or freed the memory.
> > >>> The cause is we can't reclaim memory from p1, so the while loop in
> > >>> mem_cgroup_resize_limit() won't break.
> > >>>
> > >>> This patch fixes it by decrementing retry_count regardless the return value
> > >>> of mem_cgroup_hierarchical_reclaim().
> > >>>
> > >> The problem definitely seems to exist, shouldn't we fix reclaim to
> > >> return 0, so that we know progress is not made and retry count
> > >> decrements? 
> > >>
> > > 
> > > The behavior is correct. And we already check signal_pending() in the loop.
> > > Ctrl-C or SIGALARM will works better than checking retry count.
> > 
> > But this behavior seems like a regression. Please try it in 2.6.28, you'll see
> > it returns EBUSY immediately.
> > 
> > Looks like the return value of mem_cgroup_hierarchical_reclaim() is buggy ?
> > 
> 
> This is intentional behavior change by
> ==
>  memcg-make-oom-less-frequently.patch
> ==
> 
> try_to_free_page() returns positive value if try_to_free_page() reclaims at
> least 1 pages. It itself doesn't seem to be buggy.
> 
> What buggy is resize_limit's retry-out check code, I think.
> 
> How about following ?
> ==
> 	while (1) {
> 		if (signal_pending())
> 			break;
> 		try to set limit ....
> 		...
> 		ret = mem_cgroup_hierarchical_reclaim(memcg,  GFP_KERNEL, false);
> 		total_progress += ret;	
> 
> 		if (total_progress > (memcg->res.usage - val) * 2) {
> 			/*
> 			 * It seems we reclaimed twice of necessary
> 			 * pages...this memcg is busy
> 			 */
> 			ret = -EBUSY;
> 			break;

I think we need the nr_retries here as well, we should refuse to
resize_limit beyond a certain number of retries. In the case that Li
mentioned total_progress will be 0, since we cannot reclaim anything.
I prefer a nr_retries based approach for failure.

> 		}
> 	}
> ==
> 
> Thanks,
> -Kame
> 
> 
> 
> 
> 
> 
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
	Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] [PATCH] memcg: fix infinite loop
  2009-01-15  7:21       ` KAMEZAWA Hiroyuki
  2009-01-15  7:26         ` Balbir Singh
@ 2009-01-15  7:32         ` Li Zefan
  2009-01-15  7:35           ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 13+ messages in thread
From: Li Zefan @ 2009-01-15  7:32 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: balbir, Daisuke Nishimura, linux-mm@kvack.org

KAMEZAWA Hiroyuki wrote:
> On Thu, 15 Jan 2009 15:14:38 +0800
> Li Zefan <lizf@cn.fujitsu.com> wrote:
> 
>> KAMEZAWA Hiroyuki wrote:
>>> On Thu, 15 Jan 2009 11:45:57 +0530
>>> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>>
>>>> * Li Zefan <lizf@cn.fujitsu.com> [2009-01-15 14:07:51]:
>>>>
>>>>> 1. task p1 is in /memcg/0
>>>>> 2. p1 does mmap(4096*2, MAP_LOCKED)
>>>>> 3. echo 4096 > /memcg/0/memory.limit_in_bytes
>>>>>
>>>>> The above 'echo' will never return, unless p1 exited or freed the memory.
>>>>> The cause is we can't reclaim memory from p1, so the while loop in
>>>>> mem_cgroup_resize_limit() won't break.
>>>>>
>>>>> This patch fixes it by decrementing retry_count regardless the return value
>>>>> of mem_cgroup_hierarchical_reclaim().
>>>>>
>>>> The problem definitely seems to exist, shouldn't we fix reclaim to
>>>> return 0, so that we know progress is not made and retry count
>>>> decrements? 
>>>>
>>> The behavior is correct. And we already check signal_pending() in the loop.
>>> Ctrl-C or SIGALARM will works better than checking retry count.
>> But this behavior seems like a regression. Please try it in 2.6.28, you'll see
>> it returns EBUSY immediately.
>>
>> Looks like the return value of mem_cgroup_hierarchical_reclaim() is buggy ?
>>
> 
> This is intentional behavior change by
> ==
>  memcg-make-oom-less-frequently.patch
> ==
> 
> try_to_free_page() returns positive value if try_to_free_page() reclaims at
> least 1 pages. It itself doesn't seem to be buggy.
> 
> What buggy is resize_limit's retry-out check code, I think.
> 
> How about following ?

Not sure.

I didn't look into the reclaim code, so I'd rather let you and Balbir decide if
this is a bug and (if yes) how to fix it.

> ==
> 	while (1) {
> 		if (signal_pending())
> 			break;
> 		try to set limit ....
> 		...
> 		ret = mem_cgroup_hierarchical_reclaim(memcg,  GFP_KERNEL, false);
> 		total_progress += ret;	
> 
> 		if (total_progress > (memcg->res.usage - val) * 2) {
> 			/*
> 			 * It seems we reclaimed twice of necessary
> 			 * pages...this memcg is busy
> 			 */
> 			ret = -EBUSY;
> 			break;
> 		}
> 	}
> ==
> 
> Thanks,
> -Kame
> 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] [PATCH] memcg: fix infinite loop
  2009-01-15  7:32         ` Li Zefan
@ 2009-01-15  7:35           ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-01-15  7:35 UTC (permalink / raw)
  To: Li Zefan; +Cc: balbir, Daisuke Nishimura, linux-mm@kvack.org

On Thu, 15 Jan 2009 15:32:19 +0800
Li Zefan <lizf@cn.fujitsu.com> wrote:

> KAMEZAWA Hiroyuki wrote:

> > try_to_free_page() returns positive value if try_to_free_page() reclaims at
> > least 1 pages. It itself doesn't seem to be buggy.
> > 
> > What buggy is resize_limit's retry-out check code, I think.
> > 
> > How about following ?
> 
> Not sure.
> 
> I didn't look into the reclaim code, so I'd rather let you and Balbir decide if
> this is a bug and (if yes) how to fix it.
> 

Hmm, I personally think this is not a bug. But *UNEXPECTED* behavior change, yes,
it's called regression.

To be honest, I never like retry-by-count because there is no trustable logic.
Thank you for reporting anyway. I'll consider some workaround.

-Kame


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] [PATCH] memcg: fix infinite loop
  2009-01-15  6:07 [RFC] [PATCH] memcg: fix infinite loop Li Zefan
  2009-01-15  6:15 ` Balbir Singh
  2009-01-15  6:16 ` Daisuke Nishimura
@ 2009-01-19  8:49 ` KAMEZAWA Hiroyuki
  2009-01-19  9:57   ` Balbir Singh
  2 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-01-19  8:49 UTC (permalink / raw)
  To: Li Zefan; +Cc: Balbir Singh, Daisuke Nishimura, linux-mm@kvack.org

On Thu, 15 Jan 2009 14:07:51 +0800
Li Zefan <lizf@cn.fujitsu.com> wrote:

> 1. task p1 is in /memcg/0
> 2. p1 does mmap(4096*2, MAP_LOCKED)
> 3. echo 4096 > /memcg/0/memory.limit_in_bytes
> 
> The above 'echo' will never return, unless p1 exited or freed the memory.
> The cause is we can't reclaim memory from p1, so the while loop in
> mem_cgroup_resize_limit() won't break.
> 
> This patch fixes it by decrementing retry_count regardless the return value
> of mem_cgroup_hierarchical_reclaim().
> 

Maybe a patch like this is necessary.  But details are not fixed yet. 
Any comments are welcome.

(This is base on my CSS ID patch set.)

-Kame
==

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

As Li Zefan pointed out, shrinking memcg's limit should return -EBUSY
after reasonable retries. This patch tries to fix the current behavior
of shrink_usage.

Before looking into "shrink should return -EBUSY" problem, we should fix
hierarchical reclaim code. It compares current usage and current limit,
but it only makes sense when the kernel reclaims memory because hit limits.
This is also a problem.

What this patch does are.

  1. add new argument "shrink" to hierarchical reclaim. If "shrink==true",
     hierarchical reclaim returns immediately and the caller checks the kernel
     should shrink more or not.
     (At shrinking memory, usage is always smaller than limit. So check for
      usage < limit is useless.)

  2. For adjusting to above change, 2 changes in "shrink"'s retry path.
     2-a. retry_count depends on # of children because the kernel visits
	  the children under hierarchy one by one.
     2-b. rather than checking return value of hierarchical_reclaim's progress,
	  compares usage-before-shrink and usage-after-shrink.
	  If usage-before-shrink > usage-after-shrink, retry_count is
	  decremented.

Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |   71 +++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 60 insertions(+), 11 deletions(-)

Index: mmotm-2.6.29-Jan16/mm/memcontrol.c
===================================================================
--- mmotm-2.6.29-Jan16.orig/mm/memcontrol.c
+++ mmotm-2.6.29-Jan16/mm/memcontrol.c
@@ -696,6 +696,23 @@ static int mem_cgroup_walk_tree(struct m
 	return ret;
 }
 
+static int mem_cgroup_count_children_cb(struct mem_cgroup *mem, void *data)
+{
+	int *val = data;
+	(*val)++;
+	return 0;
+}
+
+static int mem_cgroup_count_children(struct mem_cgroup *mem)
+{
+	int num = 0;
+	rcu_read_lock();
+ 	mem_cgroup_walk_tree(mem, &num, mem_cgroup_count_children_cb);
+	rcu_read_unlock();
+
+	return num;
+}
+
 /*
  * Visit the first child (need not be the first child as per the ordering
  * of the cgroup list, since we track last_scanned_child) of @mem and use
@@ -752,9 +769,12 @@ mem_cgroup_select_victim(struct mem_cgro
  * We give up and return to the caller when scan_age is increased by 2. This
  * means try_to_free_mem_cgroup_pages() is called against all children cgroup,
  * at least once. The caller itself will do further retry if necessary.
+ *
+ * If shrink==true, this routine doesn't check usage < limit, and just return
+ * quickly. It depends on callers whether shrinking is enough or not.
  */
 static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
-						gfp_t gfp_mask, bool noswap)
+				gfp_t gfp_mask, bool noswap, bool shrink)
 {
 	struct mem_cgroup *victim;
 	unsigned long start_age;
@@ -782,6 +802,13 @@ static int mem_cgroup_hierarchical_recla
 		ret = try_to_free_mem_cgroup_pages(victim, gfp_mask, noswap,
 						   get_swappiness(victim));
 		css_put(&victim->css);
+		/*
+		 * At shrinking usage, we can't check we should stop here or
+		 * reclaim more. It's depends on callers. last_scanned_child
+		 * will work enough for doing round-robin.
+		 */
+		if (shrink)
+			return ret;
 		total += ret;
 		if (mem_cgroup_check_under_limit(root_mem))
 			return 1 + total;
@@ -867,7 +894,7 @@ static int __mem_cgroup_try_charge(struc
 			goto nomem;
 
 		ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, gfp_mask,
-							noswap);
+							noswap, false);
 		if (ret)
 			continue;
 
@@ -1488,6 +1515,7 @@ int mem_cgroup_shrink_usage(struct page 
 	struct mem_cgroup *mem = NULL;
 	int progress = 0;
 	int retry = MEM_CGROUP_RECLAIM_RETRIES;
+	int children;
 
 	if (mem_cgroup_disabled())
 		return 0;
@@ -1499,7 +1527,8 @@ int mem_cgroup_shrink_usage(struct page 
 		return 0;
 
 	do {
-		progress = mem_cgroup_hierarchical_reclaim(mem, gfp_mask, true);
+		progress = mem_cgroup_hierarchical_reclaim(mem,
+					gfp_mask, true, false);
 		progress += mem_cgroup_check_under_limit(mem);
 	} while (!progress && --retry);
 
@@ -1514,11 +1543,21 @@ static DEFINE_MUTEX(set_limit_mutex);
 static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
 				unsigned long long val)
 {
-
-	int retry_count = MEM_CGROUP_RECLAIM_RETRIES;
+	int retry_count;
 	int progress;
 	u64 memswlimit;
 	int ret = 0;
+	int children = mem_cgroup_count_children(memcg);
+	u64 curusage, oldusage, minusage;
+
+	/*
+	 * For keeping hierarchical_reclaim simple, how long we should retry
+	 * is depends on callers. We set our retry-count to be function
+	 * of # of children which we should visit in this loop.
+	 */
+	retry_count = MEM_CGROUP_RECLAIM_RETRIES * children;
+
+	oldusage = res_counter_read_u64(&memcg->res, RES_USAGE);
 
 	while (retry_count) {
 		if (signal_pending(current)) {
@@ -1544,8 +1583,13 @@ static int mem_cgroup_resize_limit(struc
 			break;
 
 		progress = mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL,
-							   false);
-  		if (!progress)			retry_count--;
+						   false, true);
+		curusage = res_counter_read_u64(&memcg->res, RES_USAGE);
+		/* Usage is reduced ? */
+  		if (curusage >= oldusage)
+			retry_count--;
+		else
+			oldusage = curusage;
 	}
 
 	return ret;
@@ -1554,13 +1598,16 @@ static int mem_cgroup_resize_limit(struc
 int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
 				unsigned long long val)
 {
-	int retry_count = MEM_CGROUP_RECLAIM_RETRIES;
+	int retry_count;
 	u64 memlimit, oldusage, curusage;
+	int children = mem_cgroup_count_children(memcg);
 	int ret;
 
 	if (!do_swap_account)
 		return -EINVAL;
-
+	/* see mem_cgroup_resize_res_limit */
+ 	retry_count = children * MEM_CGROUP_RECLAIM_RETRIES;
+	oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
 	while (retry_count) {
 		if (signal_pending(current)) {
 			ret = -EINTR;
@@ -1584,11 +1631,13 @@ int mem_cgroup_resize_memsw_limit(struct
 		if (!ret)
 			break;
 
-		oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
-		mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true);
+		mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true, true);
 		curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
+		/* Usage is reduced ? */
 		if (curusage >= oldusage)
 			retry_count--;
+		else
+			oldusage = curusage;
 	}
 	return ret;
 }


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] [PATCH] memcg: fix infinite loop
  2009-01-19  8:49 ` KAMEZAWA Hiroyuki
@ 2009-01-19  9:57   ` Balbir Singh
  2009-01-19 10:07     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: Balbir Singh @ 2009-01-19  9:57 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Li Zefan, Daisuke Nishimura, linux-mm@kvack.org

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-01-19 17:49:22]:

> On Thu, 15 Jan 2009 14:07:51 +0800
> Li Zefan <lizf@cn.fujitsu.com> wrote:
> 
> > 1. task p1 is in /memcg/0
> > 2. p1 does mmap(4096*2, MAP_LOCKED)
> > 3. echo 4096 > /memcg/0/memory.limit_in_bytes
> > 
> > The above 'echo' will never return, unless p1 exited or freed the memory.
> > The cause is we can't reclaim memory from p1, so the while loop in
> > mem_cgroup_resize_limit() won't break.
> > 
> > This patch fixes it by decrementing retry_count regardless the return value
> > of mem_cgroup_hierarchical_reclaim().
> > 
> 
> Maybe a patch like this is necessary.  But details are not fixed yet. 
> Any comments are welcome.
> 
> (This is base on my CSS ID patch set.)
> 
> -Kame
> ==
> 
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> As Li Zefan pointed out, shrinking memcg's limit should return -EBUSY
> after reasonable retries. This patch tries to fix the current behavior
> of shrink_usage.
> 
> Before looking into "shrink should return -EBUSY" problem, we should fix
> hierarchical reclaim code. It compares current usage and current limit,
> but it only makes sense when the kernel reclaims memory because hit limits.
> This is also a problem.
> 
> What this patch does are.
> 
>   1. add new argument "shrink" to hierarchical reclaim. If "shrink==true",
>      hierarchical reclaim returns immediately and the caller checks the kernel
>      should shrink more or not.
>      (At shrinking memory, usage is always smaller than limit. So check for
>       usage < limit is useless.)
> 
>   2. For adjusting to above change, 2 changes in "shrink"'s retry path.
>      2-a. retry_count depends on # of children because the kernel visits
> 	  the children under hierarchy one by one.
>      2-b. rather than checking return value of hierarchical_reclaim's progress,
> 	  compares usage-before-shrink and usage-after-shrink.
> 	  If usage-before-shrink > usage-after-shrink, retry_count is
> 	  decremented.

The code seems to do the reverse, it checks for
        if (currusage >= oldusage)

> 
> Reported-by: Li Zefan <lizf@cn.fujitsu.com>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/memcontrol.c |   71 +++++++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 60 insertions(+), 11 deletions(-)
> 
> Index: mmotm-2.6.29-Jan16/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.29-Jan16.orig/mm/memcontrol.c
> +++ mmotm-2.6.29-Jan16/mm/memcontrol.c
> @@ -696,6 +696,23 @@ static int mem_cgroup_walk_tree(struct m
>  	return ret;
>  }
> 
> +static int mem_cgroup_count_children_cb(struct mem_cgroup *mem, void *data)
> +{
> +	int *val = data;
> +	(*val)++;
> +	return 0;
> +}
> +
> +static int mem_cgroup_count_children(struct mem_cgroup *mem)
> +{
> +	int num = 0;
> +	rcu_read_lock();
> + 	mem_cgroup_walk_tree(mem, &num, mem_cgroup_count_children_cb);
> +	rcu_read_unlock();
> +
> +	return num;
> +}
> +
>  /*
>   * Visit the first child (need not be the first child as per the ordering
>   * of the cgroup list, since we track last_scanned_child) of @mem and use
> @@ -752,9 +769,12 @@ mem_cgroup_select_victim(struct mem_cgro
>   * We give up and return to the caller when scan_age is increased by 2. This
>   * means try_to_free_mem_cgroup_pages() is called against all children cgroup,
>   * at least once. The caller itself will do further retry if necessary.
> + *
> + * If shrink==true, this routine doesn't check usage < limit, and just return
> + * quickly. It depends on callers whether shrinking is enough or not.
>   */
>  static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> -						gfp_t gfp_mask, bool noswap)
> +				gfp_t gfp_mask, bool noswap, bool shrink)
>  {
>  	struct mem_cgroup *victim;
>  	unsigned long start_age;
> @@ -782,6 +802,13 @@ static int mem_cgroup_hierarchical_recla
>  		ret = try_to_free_mem_cgroup_pages(victim, gfp_mask, noswap,
>  						   get_swappiness(victim));
>  		css_put(&victim->css);
> +		/*
> +		 * At shrinking usage, we can't check we should stop here or
> +		 * reclaim more. It's depends on callers. last_scanned_child
> +		 * will work enough for doing round-robin.
> +		 */
> +		if (shrink)
> +			return ret;
>  		total += ret;
>  		if (mem_cgroup_check_under_limit(root_mem))
>  			return 1 + total;
> @@ -867,7 +894,7 @@ static int __mem_cgroup_try_charge(struc
>  			goto nomem;
> 
>  		ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, gfp_mask,
> -							noswap);
> +							noswap, false);
>  		if (ret)
>  			continue;
> 
> @@ -1488,6 +1515,7 @@ int mem_cgroup_shrink_usage(struct page 
>  	struct mem_cgroup *mem = NULL;
>  	int progress = 0;
>  	int retry = MEM_CGROUP_RECLAIM_RETRIES;
> +	int children;
> 
>  	if (mem_cgroup_disabled())
>  		return 0;
> @@ -1499,7 +1527,8 @@ int mem_cgroup_shrink_usage(struct page 
>  		return 0;
> 
>  	do {
> -		progress = mem_cgroup_hierarchical_reclaim(mem, gfp_mask, true);
> +		progress = mem_cgroup_hierarchical_reclaim(mem,
> +					gfp_mask, true, false);
>  		progress += mem_cgroup_check_under_limit(mem);
>  	} while (!progress && --retry);
> 
> @@ -1514,11 +1543,21 @@ static DEFINE_MUTEX(set_limit_mutex);
>  static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
>  				unsigned long long val)
>  {
> -
> -	int retry_count = MEM_CGROUP_RECLAIM_RETRIES;
> +	int retry_count;
>  	int progress;
>  	u64 memswlimit;
>  	int ret = 0;
> +	int children = mem_cgroup_count_children(memcg);
> +	u64 curusage, oldusage, minusage;
> +
> +	/*
> +	 * For keeping hierarchical_reclaim simple, how long we should retry
> +	 * is depends on callers. We set our retry-count to be function
> +	 * of # of children which we should visit in this loop.
> +	 */
> +	retry_count = MEM_CGROUP_RECLAIM_RETRIES * children;
> +
> +	oldusage = res_counter_read_u64(&memcg->res, RES_USAGE);
> 
>  	while (retry_count) {
>  		if (signal_pending(current)) {
> @@ -1544,8 +1583,13 @@ static int mem_cgroup_resize_limit(struc
>  			break;
> 
>  		progress = mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL,
> -							   false);
> -  		if (!progress)			retry_count--;
> +						   false, true);
> +		curusage = res_counter_read_u64(&memcg->res, RES_USAGE);
> +		/* Usage is reduced ? */
> +  		if (curusage >= oldusage)
> +			retry_count--;
> +		else
> +			oldusage = curusage;
>  	}
> 
>  	return ret;
> @@ -1554,13 +1598,16 @@ static int mem_cgroup_resize_limit(struc
>  int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
>  				unsigned long long val)
>  {
> -	int retry_count = MEM_CGROUP_RECLAIM_RETRIES;
> +	int retry_count;
>  	u64 memlimit, oldusage, curusage;
> +	int children = mem_cgroup_count_children(memcg);
>  	int ret;
> 
>  	if (!do_swap_account)
>  		return -EINVAL;
> -
> +	/* see mem_cgroup_resize_res_limit */
> + 	retry_count = children * MEM_CGROUP_RECLAIM_RETRIES;
> +	oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
>  	while (retry_count) {
>  		if (signal_pending(current)) {
>  			ret = -EINTR;
> @@ -1584,11 +1631,13 @@ int mem_cgroup_resize_memsw_limit(struct
>  		if (!ret)
>  			break;
> 
> -		oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
> -		mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true);
> +		mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true, true);
>  		curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
> +		/* Usage is reduced ? */
>  		if (curusage >= oldusage)
>  			retry_count--;
> +		else
> +			oldusage = curusage;
>  	}
>  	return ret;
>  }

Has this been tested? It seems OK to the naked eye :)

-- 
	Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] [PATCH] memcg: fix infinite loop
  2009-01-19  9:57   ` Balbir Singh
@ 2009-01-19 10:07     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-01-19 10:07 UTC (permalink / raw)
  To: balbir; +Cc: Li Zefan, Daisuke Nishimura, linux-mm@kvack.org

On Mon, 19 Jan 2009 15:27:38 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-01-19 17:49:22]:
> 
> > On Thu, 15 Jan 2009 14:07:51 +0800
> > Li Zefan <lizf@cn.fujitsu.com> wrote:
> > 
> > > 1. task p1 is in /memcg/0
> > > 2. p1 does mmap(4096*2, MAP_LOCKED)
> > > 3. echo 4096 > /memcg/0/memory.limit_in_bytes
> > > 
> > > The above 'echo' will never return, unless p1 exited or freed the memory.
> > > The cause is we can't reclaim memory from p1, so the while loop in
> > > mem_cgroup_resize_limit() won't break.
> > > 
> > > This patch fixes it by decrementing retry_count regardless the return value
> > > of mem_cgroup_hierarchical_reclaim().
> > > 
> > 
> > Maybe a patch like this is necessary.  But details are not fixed yet. 
> > Any comments are welcome.
> > 
> > (This is base on my CSS ID patch set.)
> > 
> > -Kame
> > ==
> > 
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > As Li Zefan pointed out, shrinking memcg's limit should return -EBUSY
> > after reasonable retries. This patch tries to fix the current behavior
> > of shrink_usage.
> > 
> > Before looking into "shrink should return -EBUSY" problem, we should fix
> > hierarchical reclaim code. It compares current usage and current limit,
> > but it only makes sense when the kernel reclaims memory because hit limits.
> > This is also a problem.
> > 
> > What this patch does are.
> > 
> >   1. add new argument "shrink" to hierarchical reclaim. If "shrink==true",
> >      hierarchical reclaim returns immediately and the caller checks the kernel
> >      should shrink more or not.
> >      (At shrinking memory, usage is always smaller than limit. So check for
> >       usage < limit is useless.)
> > 
> >   2. For adjusting to above change, 2 changes in "shrink"'s retry path.
> >      2-a. retry_count depends on # of children because the kernel visits
> > 	  the children under hierarchy one by one.
> >      2-b. rather than checking return value of hierarchical_reclaim's progress,
> > 	  compares usage-before-shrink and usage-after-shrink.
> > 	  If usage-before-shrink > usage-after-shrink, retry_count is
> > 	  decremented.
> 
> The code seems to do the reverse, it checks for
>         if (currusage >= oldusage)
> 
Ah, the text is wrong ;(

> > -		oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
> > -		mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true);
> > +		mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true, true);
> >  		curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
> > +		/* Usage is reduced ? */
> >  		if (curusage >= oldusage)
> >  			retry_count--;
> > +		else
> > +			oldusage = curusage;
> >  	}
> >  	return ret;
> >  }
> 
> Has this been tested? It seems OK to the naked eye :)
> 
Thank you, and yes, tested.
I'll try to make this patch simpler and queue on my stack.

Thanks,
-Kame


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2009-01-19 10:08 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-01-15  6:07 [RFC] [PATCH] memcg: fix infinite loop Li Zefan
2009-01-15  6:15 ` Balbir Singh
2009-01-15  6:31   ` KAMEZAWA Hiroyuki
2009-01-15  7:14     ` Li Zefan
2009-01-15  7:21       ` KAMEZAWA Hiroyuki
2009-01-15  7:26         ` Balbir Singh
2009-01-15  7:32         ` Li Zefan
2009-01-15  7:35           ` KAMEZAWA Hiroyuki
2009-01-15  6:16 ` Daisuke Nishimura
2009-01-15  6:27   ` Li Zefan
2009-01-19  8:49 ` KAMEZAWA Hiroyuki
2009-01-19  9:57   ` Balbir Singh
2009-01-19 10:07     ` KAMEZAWA Hiroyuki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).