* [RFC] [PATCH] memcg: fix infinite loop
@ 2009-01-15 6:07 Li Zefan
2009-01-15 6:15 ` Balbir Singh
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Li Zefan @ 2009-01-15 6:07 UTC (permalink / raw)
To: Balbir Singh, KAMEZAWA Hiroyuki; +Cc: Daisuke Nishimura, linux-mm@kvack.org
1. task p1 is in /memcg/0
2. p1 does mmap(4096*2, MAP_LOCKED)
3. echo 4096 > /memcg/0/memory.limit_in_bytes
The above 'echo' will never return, unless p1 exited or freed the memory.
The cause is we can't reclaim memory from p1, so the while loop in
mem_cgroup_resize_limit() won't break.
This patch fixes it by decrementing retry_count regardless the return value
of mem_cgroup_hierarchical_reclaim().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
mm/memcontrol.c | 15 ++++-----------
1 files changed, 4 insertions(+), 11 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fb62b43..1995098 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1524,11 +1524,10 @@ static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
{
int retry_count = MEM_CGROUP_RECLAIM_RETRIES;
- int progress;
u64 memswlimit;
int ret = 0;
- while (retry_count) {
+ while (retry_count--) {
if (signal_pending(current)) {
ret = -EINTR;
break;
@@ -1551,9 +1550,7 @@ static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
if (!ret)
break;
- progress = mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL,
- false);
- if (!progress) retry_count--;
+ mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, false);
}
return ret;
@@ -1563,13 +1560,13 @@ int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
unsigned long long val)
{
int retry_count = MEM_CGROUP_RECLAIM_RETRIES;
- u64 memlimit, oldusage, curusage;
+ u64 memlimit;
int ret;
if (!do_swap_account)
return -EINVAL;
- while (retry_count) {
+ while (retry_count--) {
if (signal_pending(current)) {
ret = -EINTR;
break;
@@ -1592,11 +1589,7 @@ int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
if (!ret)
break;
- oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true);
- curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
- if (curusage >= oldusage)
- retry_count--;
}
return ret;
}
--
1.5.4.rc3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [RFC] [PATCH] memcg: fix infinite loop 2009-01-15 6:07 [RFC] [PATCH] memcg: fix infinite loop Li Zefan @ 2009-01-15 6:15 ` Balbir Singh 2009-01-15 6:31 ` KAMEZAWA Hiroyuki 2009-01-15 6:16 ` Daisuke Nishimura 2009-01-19 8:49 ` KAMEZAWA Hiroyuki 2 siblings, 1 reply; 13+ messages in thread From: Balbir Singh @ 2009-01-15 6:15 UTC (permalink / raw) To: Li Zefan; +Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, linux-mm@kvack.org * Li Zefan <lizf@cn.fujitsu.com> [2009-01-15 14:07:51]: > 1. task p1 is in /memcg/0 > 2. p1 does mmap(4096*2, MAP_LOCKED) > 3. echo 4096 > /memcg/0/memory.limit_in_bytes > > The above 'echo' will never return, unless p1 exited or freed the memory. > The cause is we can't reclaim memory from p1, so the while loop in > mem_cgroup_resize_limit() won't break. > > This patch fixes it by decrementing retry_count regardless the return value > of mem_cgroup_hierarchical_reclaim(). > The problem definitely seems to exist, shouldn't we fix reclaim to return 0, so that we know progress is not made and retry count decrements? -- Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC] [PATCH] memcg: fix infinite loop 2009-01-15 6:15 ` Balbir Singh @ 2009-01-15 6:31 ` KAMEZAWA Hiroyuki 2009-01-15 7:14 ` Li Zefan 0 siblings, 1 reply; 13+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-01-15 6:31 UTC (permalink / raw) To: balbir; +Cc: Li Zefan, Daisuke Nishimura, linux-mm@kvack.org On Thu, 15 Jan 2009 11:45:57 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > * Li Zefan <lizf@cn.fujitsu.com> [2009-01-15 14:07:51]: > > > 1. task p1 is in /memcg/0 > > 2. p1 does mmap(4096*2, MAP_LOCKED) > > 3. echo 4096 > /memcg/0/memory.limit_in_bytes > > > > The above 'echo' will never return, unless p1 exited or freed the memory. > > The cause is we can't reclaim memory from p1, so the while loop in > > mem_cgroup_resize_limit() won't break. > > > > This patch fixes it by decrementing retry_count regardless the return value > > of mem_cgroup_hierarchical_reclaim(). > > > > The problem definitely seems to exist, shouldn't we fix reclaim to > return 0, so that we know progress is not made and retry count > decrements? > The behavior is correct. And we already check signal_pending() in the loop. Ctrl-C or SIGALARM will works better than checking retry count. But adding a new control file, memory.resize_timeout to check timeout is a choice. Second thought is. thanks to Kosaki at el, LRU for locked pages is now visible in memory.stat file. So, we may able to have clever way. == unevictable = mem_cgroup_get_all_zonestat(mem, LRU_UNEVICLABLE); if (newlimit < unevictable) break; == But considering hierarchy, this can be complex. please don't modify current behavior for a while, I'll try to write "hierarchical stat" with CSS_ID patch set's easy hierarchy walk. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC] [PATCH] memcg: fix infinite loop 2009-01-15 6:31 ` KAMEZAWA Hiroyuki @ 2009-01-15 7:14 ` Li Zefan 2009-01-15 7:21 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 13+ messages in thread From: Li Zefan @ 2009-01-15 7:14 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: balbir, Daisuke Nishimura, linux-mm@kvack.org KAMEZAWA Hiroyuki wrote: > On Thu, 15 Jan 2009 11:45:57 +0530 > Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > >> * Li Zefan <lizf@cn.fujitsu.com> [2009-01-15 14:07:51]: >> >>> 1. task p1 is in /memcg/0 >>> 2. p1 does mmap(4096*2, MAP_LOCKED) >>> 3. echo 4096 > /memcg/0/memory.limit_in_bytes >>> >>> The above 'echo' will never return, unless p1 exited or freed the memory. >>> The cause is we can't reclaim memory from p1, so the while loop in >>> mem_cgroup_resize_limit() won't break. >>> >>> This patch fixes it by decrementing retry_count regardless the return value >>> of mem_cgroup_hierarchical_reclaim(). >>> >> The problem definitely seems to exist, shouldn't we fix reclaim to >> return 0, so that we know progress is not made and retry count >> decrements? >> > > The behavior is correct. And we already check signal_pending() in the loop. > Ctrl-C or SIGALARM will works better than checking retry count. But this behavior seems like a regression. Please try it in 2.6.28, you'll see it returns EBUSY immediately. Looks like the return value of mem_cgroup_hierarchical_reclaim() is buggy ? > But adding a new control file, memory.resize_timeout to check timeout is a choice. > > Second thought is. > thanks to Kosaki at el, LRU for locked pages is now visible in memory.stat > file. So, we may able to have clever way. > > == > unevictable = mem_cgroup_get_all_zonestat(mem, LRU_UNEVICLABLE); > if (newlimit < unevictable) > break; > == > But considering hierarchy, this can be complex. > please don't modify current behavior for a while, I'll try to write "hierarchical stat" > with CSS_ID patch set's easy hierarchy walk. > > Thanks, > -Kame > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC] [PATCH] memcg: fix infinite loop 2009-01-15 7:14 ` Li Zefan @ 2009-01-15 7:21 ` KAMEZAWA Hiroyuki 2009-01-15 7:26 ` Balbir Singh 2009-01-15 7:32 ` Li Zefan 0 siblings, 2 replies; 13+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-01-15 7:21 UTC (permalink / raw) To: Li Zefan; +Cc: balbir, Daisuke Nishimura, linux-mm@kvack.org On Thu, 15 Jan 2009 15:14:38 +0800 Li Zefan <lizf@cn.fujitsu.com> wrote: > KAMEZAWA Hiroyuki wrote: > > On Thu, 15 Jan 2009 11:45:57 +0530 > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > > >> * Li Zefan <lizf@cn.fujitsu.com> [2009-01-15 14:07:51]: > >> > >>> 1. task p1 is in /memcg/0 > >>> 2. p1 does mmap(4096*2, MAP_LOCKED) > >>> 3. echo 4096 > /memcg/0/memory.limit_in_bytes > >>> > >>> The above 'echo' will never return, unless p1 exited or freed the memory. > >>> The cause is we can't reclaim memory from p1, so the while loop in > >>> mem_cgroup_resize_limit() won't break. > >>> > >>> This patch fixes it by decrementing retry_count regardless the return value > >>> of mem_cgroup_hierarchical_reclaim(). > >>> > >> The problem definitely seems to exist, shouldn't we fix reclaim to > >> return 0, so that we know progress is not made and retry count > >> decrements? > >> > > > > The behavior is correct. And we already check signal_pending() in the loop. > > Ctrl-C or SIGALARM will works better than checking retry count. > > But this behavior seems like a regression. Please try it in 2.6.28, you'll see > it returns EBUSY immediately. > > Looks like the return value of mem_cgroup_hierarchical_reclaim() is buggy ? > This is intentional behavior change by == memcg-make-oom-less-frequently.patch == try_to_free_page() returns positive value if try_to_free_page() reclaims at least 1 pages. It itself doesn't seem to be buggy. What buggy is resize_limit's retry-out check code, I think. How about following ? == while (1) { if (signal_pending()) break; try to set limit .... ... ret = mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, false); total_progress += ret; if (total_progress > (memcg->res.usage - val) * 2) { /* * It seems we reclaimed twice of necessary * pages...this memcg is busy */ ret = -EBUSY; break; } } == Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC] [PATCH] memcg: fix infinite loop 2009-01-15 7:21 ` KAMEZAWA Hiroyuki @ 2009-01-15 7:26 ` Balbir Singh 2009-01-15 7:32 ` Li Zefan 1 sibling, 0 replies; 13+ messages in thread From: Balbir Singh @ 2009-01-15 7:26 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: Li Zefan, Daisuke Nishimura, linux-mm@kvack.org * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-01-15 16:21:26]: > On Thu, 15 Jan 2009 15:14:38 +0800 > Li Zefan <lizf@cn.fujitsu.com> wrote: > > > KAMEZAWA Hiroyuki wrote: > > > On Thu, 15 Jan 2009 11:45:57 +0530 > > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > > > > >> * Li Zefan <lizf@cn.fujitsu.com> [2009-01-15 14:07:51]: > > >> > > >>> 1. task p1 is in /memcg/0 > > >>> 2. p1 does mmap(4096*2, MAP_LOCKED) > > >>> 3. echo 4096 > /memcg/0/memory.limit_in_bytes > > >>> > > >>> The above 'echo' will never return, unless p1 exited or freed the memory. > > >>> The cause is we can't reclaim memory from p1, so the while loop in > > >>> mem_cgroup_resize_limit() won't break. > > >>> > > >>> This patch fixes it by decrementing retry_count regardless the return value > > >>> of mem_cgroup_hierarchical_reclaim(). > > >>> > > >> The problem definitely seems to exist, shouldn't we fix reclaim to > > >> return 0, so that we know progress is not made and retry count > > >> decrements? > > >> > > > > > > The behavior is correct. And we already check signal_pending() in the loop. > > > Ctrl-C or SIGALARM will works better than checking retry count. > > > > But this behavior seems like a regression. Please try it in 2.6.28, you'll see > > it returns EBUSY immediately. > > > > Looks like the return value of mem_cgroup_hierarchical_reclaim() is buggy ? > > > > This is intentional behavior change by > == > memcg-make-oom-less-frequently.patch > == > > try_to_free_page() returns positive value if try_to_free_page() reclaims at > least 1 pages. It itself doesn't seem to be buggy. > > What buggy is resize_limit's retry-out check code, I think. > > How about following ? > == > while (1) { > if (signal_pending()) > break; > try to set limit .... > ... > ret = mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, false); > total_progress += ret; > > if (total_progress > (memcg->res.usage - val) * 2) { > /* > * It seems we reclaimed twice of necessary > * pages...this memcg is busy > */ > ret = -EBUSY; > break; I think we need the nr_retries here as well, we should refuse to resize_limit beyond a certain number of retries. In the case that Li mentioned total_progress will be 0, since we cannot reclaim anything. I prefer a nr_retries based approach for failure. > } > } > == > > Thanks, > -Kame > > > > > > > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC] [PATCH] memcg: fix infinite loop 2009-01-15 7:21 ` KAMEZAWA Hiroyuki 2009-01-15 7:26 ` Balbir Singh @ 2009-01-15 7:32 ` Li Zefan 2009-01-15 7:35 ` KAMEZAWA Hiroyuki 1 sibling, 1 reply; 13+ messages in thread From: Li Zefan @ 2009-01-15 7:32 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: balbir, Daisuke Nishimura, linux-mm@kvack.org KAMEZAWA Hiroyuki wrote: > On Thu, 15 Jan 2009 15:14:38 +0800 > Li Zefan <lizf@cn.fujitsu.com> wrote: > >> KAMEZAWA Hiroyuki wrote: >>> On Thu, 15 Jan 2009 11:45:57 +0530 >>> Balbir Singh <balbir@linux.vnet.ibm.com> wrote: >>> >>>> * Li Zefan <lizf@cn.fujitsu.com> [2009-01-15 14:07:51]: >>>> >>>>> 1. task p1 is in /memcg/0 >>>>> 2. p1 does mmap(4096*2, MAP_LOCKED) >>>>> 3. echo 4096 > /memcg/0/memory.limit_in_bytes >>>>> >>>>> The above 'echo' will never return, unless p1 exited or freed the memory. >>>>> The cause is we can't reclaim memory from p1, so the while loop in >>>>> mem_cgroup_resize_limit() won't break. >>>>> >>>>> This patch fixes it by decrementing retry_count regardless the return value >>>>> of mem_cgroup_hierarchical_reclaim(). >>>>> >>>> The problem definitely seems to exist, shouldn't we fix reclaim to >>>> return 0, so that we know progress is not made and retry count >>>> decrements? >>>> >>> The behavior is correct. And we already check signal_pending() in the loop. >>> Ctrl-C or SIGALARM will works better than checking retry count. >> But this behavior seems like a regression. Please try it in 2.6.28, you'll see >> it returns EBUSY immediately. >> >> Looks like the return value of mem_cgroup_hierarchical_reclaim() is buggy ? >> > > This is intentional behavior change by > == > memcg-make-oom-less-frequently.patch > == > > try_to_free_page() returns positive value if try_to_free_page() reclaims at > least 1 pages. It itself doesn't seem to be buggy. > > What buggy is resize_limit's retry-out check code, I think. > > How about following ? Not sure. I didn't look into the reclaim code, so I'd rather let you and Balbir decide if this is a bug and (if yes) how to fix it. > == > while (1) { > if (signal_pending()) > break; > try to set limit .... > ... > ret = mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, false); > total_progress += ret; > > if (total_progress > (memcg->res.usage - val) * 2) { > /* > * It seems we reclaimed twice of necessary > * pages...this memcg is busy > */ > ret = -EBUSY; > break; > } > } > == > > Thanks, > -Kame > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC] [PATCH] memcg: fix infinite loop 2009-01-15 7:32 ` Li Zefan @ 2009-01-15 7:35 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 13+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-01-15 7:35 UTC (permalink / raw) To: Li Zefan; +Cc: balbir, Daisuke Nishimura, linux-mm@kvack.org On Thu, 15 Jan 2009 15:32:19 +0800 Li Zefan <lizf@cn.fujitsu.com> wrote: > KAMEZAWA Hiroyuki wrote: > > try_to_free_page() returns positive value if try_to_free_page() reclaims at > > least 1 pages. It itself doesn't seem to be buggy. > > > > What buggy is resize_limit's retry-out check code, I think. > > > > How about following ? > > Not sure. > > I didn't look into the reclaim code, so I'd rather let you and Balbir decide if > this is a bug and (if yes) how to fix it. > Hmm, I personally think this is not a bug. But *UNEXPECTED* behavior change, yes, it's called regression. To be honest, I never like retry-by-count because there is no trustable logic. Thank you for reporting anyway. I'll consider some workaround. -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC] [PATCH] memcg: fix infinite loop 2009-01-15 6:07 [RFC] [PATCH] memcg: fix infinite loop Li Zefan 2009-01-15 6:15 ` Balbir Singh @ 2009-01-15 6:16 ` Daisuke Nishimura 2009-01-15 6:27 ` Li Zefan 2009-01-19 8:49 ` KAMEZAWA Hiroyuki 2 siblings, 1 reply; 13+ messages in thread From: Daisuke Nishimura @ 2009-01-15 6:16 UTC (permalink / raw) To: Li Zefan; +Cc: nishimura, Balbir Singh, KAMEZAWA Hiroyuki, linux-mm@kvack.org On Thu, 15 Jan 2009 14:07:51 +0800, Li Zefan <lizf@cn.fujitsu.com> wrote: > 1. task p1 is in /memcg/0 > 2. p1 does mmap(4096*2, MAP_LOCKED) > 3. echo 4096 > /memcg/0/memory.limit_in_bytes > > The above 'echo' will never return, unless p1 exited or freed the memory. > The cause is we can't reclaim memory from p1, so the while loop in > mem_cgroup_resize_limit() won't break. > But it can be interrupted, right ? I don't think this would be a big problem. Thanks, Daisuke Nishimura. > This patch fixes it by decrementing retry_count regardless the return value > of mem_cgroup_hierarchical_reclaim(). > > Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> > --- > mm/memcontrol.c | 15 ++++----------- > 1 files changed, 4 insertions(+), 11 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index fb62b43..1995098 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1524,11 +1524,10 @@ static int mem_cgroup_resize_limit(struct mem_cgroup *memcg, > { > > int retry_count = MEM_CGROUP_RECLAIM_RETRIES; > - int progress; > u64 memswlimit; > int ret = 0; > > - while (retry_count) { > + while (retry_count--) { > if (signal_pending(current)) { > ret = -EINTR; > break; > @@ -1551,9 +1550,7 @@ static int mem_cgroup_resize_limit(struct mem_cgroup *memcg, > if (!ret) > break; > > - progress = mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, > - false); > - if (!progress) retry_count--; > + mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, false); > } > > return ret; > @@ -1563,13 +1560,13 @@ int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg, > unsigned long long val) > { > int retry_count = MEM_CGROUP_RECLAIM_RETRIES; > - u64 memlimit, oldusage, curusage; > + u64 memlimit; > int ret; > > if (!do_swap_account) > return -EINVAL; > > - while (retry_count) { > + while (retry_count--) { > if (signal_pending(current)) { > ret = -EINTR; > break; > @@ -1592,11 +1589,7 @@ int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg, > if (!ret) > break; > > - oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE); > mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true); > - curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE); > - if (curusage >= oldusage) > - retry_count--; > } > return ret; > } > -- > 1.5.4.rc3 > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC] [PATCH] memcg: fix infinite loop 2009-01-15 6:16 ` Daisuke Nishimura @ 2009-01-15 6:27 ` Li Zefan 0 siblings, 0 replies; 13+ messages in thread From: Li Zefan @ 2009-01-15 6:27 UTC (permalink / raw) To: Daisuke Nishimura; +Cc: Balbir Singh, KAMEZAWA Hiroyuki, linux-mm@kvack.org Daisuke Nishimura wrote: > On Thu, 15 Jan 2009 14:07:51 +0800, Li Zefan <lizf@cn.fujitsu.com> wrote: >> 1. task p1 is in /memcg/0 >> 2. p1 does mmap(4096*2, MAP_LOCKED) >> 3. echo 4096 > /memcg/0/memory.limit_in_bytes >> >> The above 'echo' will never return, unless p1 exited or freed the memory. >> The cause is we can't reclaim memory from p1, so the while loop in >> mem_cgroup_resize_limit() won't break. >> > But it can be interrupted, right ? > So we expect users track how long does this operation take, and send some signal to stop it if it takes a long time ? I don't think this is user-friendly.. > I don't think this would be a big problem. > It's a problem, though it may not be a big problem. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC] [PATCH] memcg: fix infinite loop 2009-01-15 6:07 [RFC] [PATCH] memcg: fix infinite loop Li Zefan 2009-01-15 6:15 ` Balbir Singh 2009-01-15 6:16 ` Daisuke Nishimura @ 2009-01-19 8:49 ` KAMEZAWA Hiroyuki 2009-01-19 9:57 ` Balbir Singh 2 siblings, 1 reply; 13+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-01-19 8:49 UTC (permalink / raw) To: Li Zefan; +Cc: Balbir Singh, Daisuke Nishimura, linux-mm@kvack.org On Thu, 15 Jan 2009 14:07:51 +0800 Li Zefan <lizf@cn.fujitsu.com> wrote: > 1. task p1 is in /memcg/0 > 2. p1 does mmap(4096*2, MAP_LOCKED) > 3. echo 4096 > /memcg/0/memory.limit_in_bytes > > The above 'echo' will never return, unless p1 exited or freed the memory. > The cause is we can't reclaim memory from p1, so the while loop in > mem_cgroup_resize_limit() won't break. > > This patch fixes it by decrementing retry_count regardless the return value > of mem_cgroup_hierarchical_reclaim(). > Maybe a patch like this is necessary. But details are not fixed yet. Any comments are welcome. (This is base on my CSS ID patch set.) -Kame == From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> As Li Zefan pointed out, shrinking memcg's limit should return -EBUSY after reasonable retries. This patch tries to fix the current behavior of shrink_usage. Before looking into "shrink should return -EBUSY" problem, we should fix hierarchical reclaim code. It compares current usage and current limit, but it only makes sense when the kernel reclaims memory because hit limits. This is also a problem. What this patch does are. 1. add new argument "shrink" to hierarchical reclaim. If "shrink==true", hierarchical reclaim returns immediately and the caller checks the kernel should shrink more or not. (At shrinking memory, usage is always smaller than limit. So check for usage < limit is useless.) 2. For adjusting to above change, 2 changes in "shrink"'s retry path. 2-a. retry_count depends on # of children because the kernel visits the children under hierarchy one by one. 2-b. rather than checking return value of hierarchical_reclaim's progress, compares usage-before-shrink and usage-after-shrink. If usage-before-shrink > usage-after-shrink, retry_count is decremented. Reported-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- mm/memcontrol.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 60 insertions(+), 11 deletions(-) Index: mmotm-2.6.29-Jan16/mm/memcontrol.c =================================================================== --- mmotm-2.6.29-Jan16.orig/mm/memcontrol.c +++ mmotm-2.6.29-Jan16/mm/memcontrol.c @@ -696,6 +696,23 @@ static int mem_cgroup_walk_tree(struct m return ret; } +static int mem_cgroup_count_children_cb(struct mem_cgroup *mem, void *data) +{ + int *val = data; + (*val)++; + return 0; +} + +static int mem_cgroup_count_children(struct mem_cgroup *mem) +{ + int num = 0; + rcu_read_lock(); + mem_cgroup_walk_tree(mem, &num, mem_cgroup_count_children_cb); + rcu_read_unlock(); + + return num; +} + /* * Visit the first child (need not be the first child as per the ordering * of the cgroup list, since we track last_scanned_child) of @mem and use @@ -752,9 +769,12 @@ mem_cgroup_select_victim(struct mem_cgro * We give up and return to the caller when scan_age is increased by 2. This * means try_to_free_mem_cgroup_pages() is called against all children cgroup, * at least once. The caller itself will do further retry if necessary. + * + * If shrink==true, this routine doesn't check usage < limit, and just return + * quickly. It depends on callers whether shrinking is enough or not. */ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem, - gfp_t gfp_mask, bool noswap) + gfp_t gfp_mask, bool noswap, bool shrink) { struct mem_cgroup *victim; unsigned long start_age; @@ -782,6 +802,13 @@ static int mem_cgroup_hierarchical_recla ret = try_to_free_mem_cgroup_pages(victim, gfp_mask, noswap, get_swappiness(victim)); css_put(&victim->css); + /* + * At shrinking usage, we can't check we should stop here or + * reclaim more. It's depends on callers. last_scanned_child + * will work enough for doing round-robin. + */ + if (shrink) + return ret; total += ret; if (mem_cgroup_check_under_limit(root_mem)) return 1 + total; @@ -867,7 +894,7 @@ static int __mem_cgroup_try_charge(struc goto nomem; ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, gfp_mask, - noswap); + noswap, false); if (ret) continue; @@ -1488,6 +1515,7 @@ int mem_cgroup_shrink_usage(struct page struct mem_cgroup *mem = NULL; int progress = 0; int retry = MEM_CGROUP_RECLAIM_RETRIES; + int children; if (mem_cgroup_disabled()) return 0; @@ -1499,7 +1527,8 @@ int mem_cgroup_shrink_usage(struct page return 0; do { - progress = mem_cgroup_hierarchical_reclaim(mem, gfp_mask, true); + progress = mem_cgroup_hierarchical_reclaim(mem, + gfp_mask, true, false); progress += mem_cgroup_check_under_limit(mem); } while (!progress && --retry); @@ -1514,11 +1543,21 @@ static DEFINE_MUTEX(set_limit_mutex); static int mem_cgroup_resize_limit(struct mem_cgroup *memcg, unsigned long long val) { - - int retry_count = MEM_CGROUP_RECLAIM_RETRIES; + int retry_count; int progress; u64 memswlimit; int ret = 0; + int children = mem_cgroup_count_children(memcg); + u64 curusage, oldusage, minusage; + + /* + * For keeping hierarchical_reclaim simple, how long we should retry + * is depends on callers. We set our retry-count to be function + * of # of children which we should visit in this loop. + */ + retry_count = MEM_CGROUP_RECLAIM_RETRIES * children; + + oldusage = res_counter_read_u64(&memcg->res, RES_USAGE); while (retry_count) { if (signal_pending(current)) { @@ -1544,8 +1583,13 @@ static int mem_cgroup_resize_limit(struc break; progress = mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, - false); - if (!progress) retry_count--; + false, true); + curusage = res_counter_read_u64(&memcg->res, RES_USAGE); + /* Usage is reduced ? */ + if (curusage >= oldusage) + retry_count--; + else + oldusage = curusage; } return ret; @@ -1554,13 +1598,16 @@ static int mem_cgroup_resize_limit(struc int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg, unsigned long long val) { - int retry_count = MEM_CGROUP_RECLAIM_RETRIES; + int retry_count; u64 memlimit, oldusage, curusage; + int children = mem_cgroup_count_children(memcg); int ret; if (!do_swap_account) return -EINVAL; - + /* see mem_cgroup_resize_res_limit */ + retry_count = children * MEM_CGROUP_RECLAIM_RETRIES; + oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE); while (retry_count) { if (signal_pending(current)) { ret = -EINTR; @@ -1584,11 +1631,13 @@ int mem_cgroup_resize_memsw_limit(struct if (!ret) break; - oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE); - mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true); + mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true, true); curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE); + /* Usage is reduced ? */ if (curusage >= oldusage) retry_count--; + else + oldusage = curusage; } return ret; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC] [PATCH] memcg: fix infinite loop 2009-01-19 8:49 ` KAMEZAWA Hiroyuki @ 2009-01-19 9:57 ` Balbir Singh 2009-01-19 10:07 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 13+ messages in thread From: Balbir Singh @ 2009-01-19 9:57 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: Li Zefan, Daisuke Nishimura, linux-mm@kvack.org * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-01-19 17:49:22]: > On Thu, 15 Jan 2009 14:07:51 +0800 > Li Zefan <lizf@cn.fujitsu.com> wrote: > > > 1. task p1 is in /memcg/0 > > 2. p1 does mmap(4096*2, MAP_LOCKED) > > 3. echo 4096 > /memcg/0/memory.limit_in_bytes > > > > The above 'echo' will never return, unless p1 exited or freed the memory. > > The cause is we can't reclaim memory from p1, so the while loop in > > mem_cgroup_resize_limit() won't break. > > > > This patch fixes it by decrementing retry_count regardless the return value > > of mem_cgroup_hierarchical_reclaim(). > > > > Maybe a patch like this is necessary. But details are not fixed yet. > Any comments are welcome. > > (This is base on my CSS ID patch set.) > > -Kame > == > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > As Li Zefan pointed out, shrinking memcg's limit should return -EBUSY > after reasonable retries. This patch tries to fix the current behavior > of shrink_usage. > > Before looking into "shrink should return -EBUSY" problem, we should fix > hierarchical reclaim code. It compares current usage and current limit, > but it only makes sense when the kernel reclaims memory because hit limits. > This is also a problem. > > What this patch does are. > > 1. add new argument "shrink" to hierarchical reclaim. If "shrink==true", > hierarchical reclaim returns immediately and the caller checks the kernel > should shrink more or not. > (At shrinking memory, usage is always smaller than limit. So check for > usage < limit is useless.) > > 2. For adjusting to above change, 2 changes in "shrink"'s retry path. > 2-a. retry_count depends on # of children because the kernel visits > the children under hierarchy one by one. > 2-b. rather than checking return value of hierarchical_reclaim's progress, > compares usage-before-shrink and usage-after-shrink. > If usage-before-shrink > usage-after-shrink, retry_count is > decremented. The code seems to do the reverse, it checks for if (currusage >= oldusage) > > Reported-by: Li Zefan <lizf@cn.fujitsu.com> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > --- > mm/memcontrol.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++--------- > 1 file changed, 60 insertions(+), 11 deletions(-) > > Index: mmotm-2.6.29-Jan16/mm/memcontrol.c > =================================================================== > --- mmotm-2.6.29-Jan16.orig/mm/memcontrol.c > +++ mmotm-2.6.29-Jan16/mm/memcontrol.c > @@ -696,6 +696,23 @@ static int mem_cgroup_walk_tree(struct m > return ret; > } > > +static int mem_cgroup_count_children_cb(struct mem_cgroup *mem, void *data) > +{ > + int *val = data; > + (*val)++; > + return 0; > +} > + > +static int mem_cgroup_count_children(struct mem_cgroup *mem) > +{ > + int num = 0; > + rcu_read_lock(); > + mem_cgroup_walk_tree(mem, &num, mem_cgroup_count_children_cb); > + rcu_read_unlock(); > + > + return num; > +} > + > /* > * Visit the first child (need not be the first child as per the ordering > * of the cgroup list, since we track last_scanned_child) of @mem and use > @@ -752,9 +769,12 @@ mem_cgroup_select_victim(struct mem_cgro > * We give up and return to the caller when scan_age is increased by 2. This > * means try_to_free_mem_cgroup_pages() is called against all children cgroup, > * at least once. The caller itself will do further retry if necessary. > + * > + * If shrink==true, this routine doesn't check usage < limit, and just return > + * quickly. It depends on callers whether shrinking is enough or not. > */ > static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem, > - gfp_t gfp_mask, bool noswap) > + gfp_t gfp_mask, bool noswap, bool shrink) > { > struct mem_cgroup *victim; > unsigned long start_age; > @@ -782,6 +802,13 @@ static int mem_cgroup_hierarchical_recla > ret = try_to_free_mem_cgroup_pages(victim, gfp_mask, noswap, > get_swappiness(victim)); > css_put(&victim->css); > + /* > + * At shrinking usage, we can't check we should stop here or > + * reclaim more. It's depends on callers. last_scanned_child > + * will work enough for doing round-robin. > + */ > + if (shrink) > + return ret; > total += ret; > if (mem_cgroup_check_under_limit(root_mem)) > return 1 + total; > @@ -867,7 +894,7 @@ static int __mem_cgroup_try_charge(struc > goto nomem; > > ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, gfp_mask, > - noswap); > + noswap, false); > if (ret) > continue; > > @@ -1488,6 +1515,7 @@ int mem_cgroup_shrink_usage(struct page > struct mem_cgroup *mem = NULL; > int progress = 0; > int retry = MEM_CGROUP_RECLAIM_RETRIES; > + int children; > > if (mem_cgroup_disabled()) > return 0; > @@ -1499,7 +1527,8 @@ int mem_cgroup_shrink_usage(struct page > return 0; > > do { > - progress = mem_cgroup_hierarchical_reclaim(mem, gfp_mask, true); > + progress = mem_cgroup_hierarchical_reclaim(mem, > + gfp_mask, true, false); > progress += mem_cgroup_check_under_limit(mem); > } while (!progress && --retry); > > @@ -1514,11 +1543,21 @@ static DEFINE_MUTEX(set_limit_mutex); > static int mem_cgroup_resize_limit(struct mem_cgroup *memcg, > unsigned long long val) > { > - > - int retry_count = MEM_CGROUP_RECLAIM_RETRIES; > + int retry_count; > int progress; > u64 memswlimit; > int ret = 0; > + int children = mem_cgroup_count_children(memcg); > + u64 curusage, oldusage, minusage; > + > + /* > + * For keeping hierarchical_reclaim simple, how long we should retry > + * is depends on callers. We set our retry-count to be function > + * of # of children which we should visit in this loop. > + */ > + retry_count = MEM_CGROUP_RECLAIM_RETRIES * children; > + > + oldusage = res_counter_read_u64(&memcg->res, RES_USAGE); > > while (retry_count) { > if (signal_pending(current)) { > @@ -1544,8 +1583,13 @@ static int mem_cgroup_resize_limit(struc > break; > > progress = mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, > - false); > - if (!progress) retry_count--; > + false, true); > + curusage = res_counter_read_u64(&memcg->res, RES_USAGE); > + /* Usage is reduced ? */ > + if (curusage >= oldusage) > + retry_count--; > + else > + oldusage = curusage; > } > > return ret; > @@ -1554,13 +1598,16 @@ static int mem_cgroup_resize_limit(struc > int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg, > unsigned long long val) > { > - int retry_count = MEM_CGROUP_RECLAIM_RETRIES; > + int retry_count; > u64 memlimit, oldusage, curusage; > + int children = mem_cgroup_count_children(memcg); > int ret; > > if (!do_swap_account) > return -EINVAL; > - > + /* see mem_cgroup_resize_res_limit */ > + retry_count = children * MEM_CGROUP_RECLAIM_RETRIES; > + oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE); > while (retry_count) { > if (signal_pending(current)) { > ret = -EINTR; > @@ -1584,11 +1631,13 @@ int mem_cgroup_resize_memsw_limit(struct > if (!ret) > break; > > - oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE); > - mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true); > + mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true, true); > curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE); > + /* Usage is reduced ? */ > if (curusage >= oldusage) > retry_count--; > + else > + oldusage = curusage; > } > return ret; > } Has this been tested? It seems OK to the naked eye :) -- Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC] [PATCH] memcg: fix infinite loop 2009-01-19 9:57 ` Balbir Singh @ 2009-01-19 10:07 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 13+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-01-19 10:07 UTC (permalink / raw) To: balbir; +Cc: Li Zefan, Daisuke Nishimura, linux-mm@kvack.org On Mon, 19 Jan 2009 15:27:38 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-01-19 17:49:22]: > > > On Thu, 15 Jan 2009 14:07:51 +0800 > > Li Zefan <lizf@cn.fujitsu.com> wrote: > > > > > 1. task p1 is in /memcg/0 > > > 2. p1 does mmap(4096*2, MAP_LOCKED) > > > 3. echo 4096 > /memcg/0/memory.limit_in_bytes > > > > > > The above 'echo' will never return, unless p1 exited or freed the memory. > > > The cause is we can't reclaim memory from p1, so the while loop in > > > mem_cgroup_resize_limit() won't break. > > > > > > This patch fixes it by decrementing retry_count regardless the return value > > > of mem_cgroup_hierarchical_reclaim(). > > > > > > > Maybe a patch like this is necessary. But details are not fixed yet. > > Any comments are welcome. > > > > (This is base on my CSS ID patch set.) > > > > -Kame > > == > > > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > > > As Li Zefan pointed out, shrinking memcg's limit should return -EBUSY > > after reasonable retries. This patch tries to fix the current behavior > > of shrink_usage. > > > > Before looking into "shrink should return -EBUSY" problem, we should fix > > hierarchical reclaim code. It compares current usage and current limit, > > but it only makes sense when the kernel reclaims memory because hit limits. > > This is also a problem. > > > > What this patch does are. > > > > 1. add new argument "shrink" to hierarchical reclaim. If "shrink==true", > > hierarchical reclaim returns immediately and the caller checks the kernel > > should shrink more or not. > > (At shrinking memory, usage is always smaller than limit. So check for > > usage < limit is useless.) > > > > 2. For adjusting to above change, 2 changes in "shrink"'s retry path. > > 2-a. retry_count depends on # of children because the kernel visits > > the children under hierarchy one by one. > > 2-b. rather than checking return value of hierarchical_reclaim's progress, > > compares usage-before-shrink and usage-after-shrink. > > If usage-before-shrink > usage-after-shrink, retry_count is > > decremented. > > The code seems to do the reverse, it checks for > if (currusage >= oldusage) > Ah, the text is wrong ;( > > - oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE); > > - mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true); > > + mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true, true); > > curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE); > > + /* Usage is reduced ? */ > > if (curusage >= oldusage) > > retry_count--; > > + else > > + oldusage = curusage; > > } > > return ret; > > } > > Has this been tested? It seems OK to the naked eye :) > Thank you, and yes, tested. I'll try to make this patch simpler and queue on my stack. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2009-01-19 10:08 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-01-15 6:07 [RFC] [PATCH] memcg: fix infinite loop Li Zefan 2009-01-15 6:15 ` Balbir Singh 2009-01-15 6:31 ` KAMEZAWA Hiroyuki 2009-01-15 7:14 ` Li Zefan 2009-01-15 7:21 ` KAMEZAWA Hiroyuki 2009-01-15 7:26 ` Balbir Singh 2009-01-15 7:32 ` Li Zefan 2009-01-15 7:35 ` KAMEZAWA Hiroyuki 2009-01-15 6:16 ` Daisuke Nishimura 2009-01-15 6:27 ` Li Zefan 2009-01-19 8:49 ` KAMEZAWA Hiroyuki 2009-01-19 9:57 ` Balbir Singh 2009-01-19 10:07 ` KAMEZAWA Hiroyuki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).