* [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup @ 2010-02-05 0:39 KAMEZAWA Hiroyuki 2010-02-05 0:57 ` David Rientjes 2010-02-05 16:30 ` Minchan Kim 0 siblings, 2 replies; 19+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-02-05 0:39 UTC (permalink / raw) To: linux-mm@kvack.org Cc: balbir@linux.vnet.ibm.com, nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org, rientjes Please take this patch in different context with recent discussion. This is a quick-fix for a terrible bug. This patch itself is against mmotm but can be easily applied to mainline or stable tree, I think. (But I don't CC stable tree until I get ack.) == Now, oom-killer kills process's chidlren at first. But this means a child in other cgroup can be killed. But it's not checked now. This patch fixes that. CC: Balbir Singh <balbir@linux.vnet.ibm.com> CC: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- mm/oom_kill.c | 3 +++ 1 file changed, 3 insertions(+) Index: mmotm-2.6.33-Feb03/mm/oom_kill.c =================================================================== --- mmotm-2.6.33-Feb03.orig/mm/oom_kill.c +++ mmotm-2.6.33-Feb03/mm/oom_kill.c @@ -459,6 +459,9 @@ static int oom_kill_process(struct task_ list_for_each_entry(c, &p->children, sibling) { if (c->mm == p->mm) continue; + /* Children may be in other cgroup */ + if (mem && !task_in_mem_cgroup(c, mem)) + continue; if (!oom_kill_task(c)) return 0; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup 2010-02-05 0:39 [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup KAMEZAWA Hiroyuki @ 2010-02-05 0:57 ` David Rientjes 2010-02-05 16:30 ` Minchan Kim 1 sibling, 0 replies; 19+ messages in thread From: David Rientjes @ 2010-02-05 0:57 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Balbir Singh, nishimura, Andrew Morton On Fri, 5 Feb 2010, KAMEZAWA Hiroyuki wrote: > Now, oom-killer kills process's chidlren at first. But this means > a child in other cgroup can be killed. But it's not checked now. > > This patch fixes that. > > CC: Balbir Singh <balbir@linux.vnet.ibm.com> > CC: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> > --- > mm/oom_kill.c | 3 +++ > 1 file changed, 3 insertions(+) > > Index: mmotm-2.6.33-Feb03/mm/oom_kill.c > =================================================================== > --- mmotm-2.6.33-Feb03.orig/mm/oom_kill.c > +++ mmotm-2.6.33-Feb03/mm/oom_kill.c > @@ -459,6 +459,9 @@ static int oom_kill_process(struct task_ > list_for_each_entry(c, &p->children, sibling) { > if (c->mm == p->mm) > continue; > + /* Children may be in other cgroup */ > + if (mem && !task_in_mem_cgroup(c, mem)) > + continue; > if (!oom_kill_task(c)) > return 0; > } > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup 2010-02-05 0:39 [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup KAMEZAWA Hiroyuki 2010-02-05 0:57 ` David Rientjes @ 2010-02-05 16:30 ` Minchan Kim 2010-02-09 0:32 ` KAMEZAWA Hiroyuki 2010-02-09 3:02 ` [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup v2 KAMEZAWA Hiroyuki 1 sibling, 2 replies; 19+ messages in thread From: Minchan Kim @ 2010-02-05 16:30 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, balbir@linux.vnet.ibm.com, nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org, rientjes Hi, Kame. On Fri, Feb 5, 2010 at 9:39 AM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > Please take this patch in different context with recent discussion. > This is a quick-fix for a terrible bug. > > This patch itself is against mmotm but can be easily applied to mainline or > stable tree, I think. (But I don't CC stable tree until I get ack.) > > == > Now, oom-killer kills process's chidlren at first. But this means > a child in other cgroup can be killed. But it's not checked now. > > This patch fixes that. > > CC: Balbir Singh <balbir@linux.vnet.ibm.com> > CC: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > --- > mm/oom_kill.c | 3 +++ > 1 file changed, 3 insertions(+) > > Index: mmotm-2.6.33-Feb03/mm/oom_kill.c > =================================================================== > --- mmotm-2.6.33-Feb03.orig/mm/oom_kill.c > +++ mmotm-2.6.33-Feb03/mm/oom_kill.c > @@ -459,6 +459,9 @@ static int oom_kill_process(struct task_ > list_for_each_entry(c, &p->children, sibling) { > if (c->mm == p->mm) > continue; > + /* Children may be in other cgroup */ > + if (mem && !task_in_mem_cgroup(c, mem)) > + continue; > if (!oom_kill_task(c)) > return 0; > } > > -- I am worried about latency of OOM at worst case. I mean that task_in_mem_cgroup calls task_lock of child. We have used task_lock in many place. Some place task_lock hold and then other locks. For example, exit_fs held task_lock and try to hold write_lock of fs->lock. If child already hold task_lock and wait to write_lock of fs->lock, OOM latency is dependent of fs->lock. I am not sure how many usecase is also dependent of other locks. If it is not as is, we can't make sure in future. So How about try_task_in_mem_cgroup? If we can't hold task_lock, let's continue next child. -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup 2010-02-05 16:30 ` Minchan Kim @ 2010-02-09 0:32 ` KAMEZAWA Hiroyuki 2010-02-09 0:56 ` KAMEZAWA Hiroyuki 2010-02-09 1:24 ` Minchan Kim 2010-02-09 3:02 ` [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup v2 KAMEZAWA Hiroyuki 1 sibling, 2 replies; 19+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-02-09 0:32 UTC (permalink / raw) To: Minchan Kim Cc: linux-mm@kvack.org, balbir@linux.vnet.ibm.com, nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org, rientjes On Sat, 6 Feb 2010 01:30:49 +0900 Minchan Kim <minchan.kim@gmail.com> wrote: > Hi, Kame. > > On Fri, Feb 5, 2010 at 9:39 AM, KAMEZAWA Hiroyuki > <kamezawa.hiroyu@jp.fujitsu.com> wrote: > > Please take this patch in different context with recent discussion. > > This is a quick-fix for a terrible bug. > > > > This patch itself is against mmotm but can be easily applied to mainline or > > stable tree, I think. (But I don't CC stable tree until I get ack.) > > > > == > > Now, oom-killer kills process's chidlren at first. But this means > > a child in other cgroup can be killed. But it's not checked now. > > > > This patch fixes that. > > > > CC: Balbir Singh <balbir@linux.vnet.ibm.com> > > CC: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > --- > > A mm/oom_kill.c | A A 3 +++ > > A 1 file changed, 3 insertions(+) > > > > Index: mmotm-2.6.33-Feb03/mm/oom_kill.c > > =================================================================== > > --- mmotm-2.6.33-Feb03.orig/mm/oom_kill.c > > +++ mmotm-2.6.33-Feb03/mm/oom_kill.c > > @@ -459,6 +459,9 @@ static int oom_kill_process(struct task_ > > A A A A list_for_each_entry(c, &p->children, sibling) { > > A A A A A A A A if (c->mm == p->mm) > > A A A A A A A A A A A A continue; > > + A A A A A A A /* Children may be in other cgroup */ > > + A A A A A A A if (mem && !task_in_mem_cgroup(c, mem)) > > + A A A A A A A A A A A continue; > > A A A A A A A A if (!oom_kill_task(c)) > > A A A A A A A A A A A A return 0; > > A A A A } > > > > -- > > I am worried about latency of OOM at worst case. > I mean that task_in_mem_cgroup calls task_lock of child. > We have used task_lock in many place. > Some place task_lock hold and then other locks. > For example, exit_fs held task_lock and try to hold write_lock of fs->lock. > If child already hold task_lock and wait to write_lock of fs->lock, OOM latency > is dependent of fs->lock. > > I am not sure how many usecase is also dependent of other locks. > If it is not as is, we can't make sure in future. > > So How about try_task_in_mem_cgroup? > If we can't hold task_lock, let's continue next child. > It's recommended not to use trylock in unclear case. Then, I think possible replacement will be not-to-use any lock in task_in_mem_cgroup. In my short consideration, I don't think task_lock is necessary if we can add some tricks and memory barrier. Please let this patch to go as it is because this is an obvious bug fix and give me time. Now, I think of following. This makes use of the fact mm->owner is changed only at _exit() of the owner. If there is a race with _exit() and mm->owner is racy, the oom selection itself was racy and bad. == int task_in_mem_cgroup_oom(struct task_struct *tsk, struct mem_cgroup *mem) { struct mm_struct *mm; struct task_struct *tsk; int ret = 0; mm = tsk->mm; if (!mm) return ret; /* * we are not interested in tasks other than owner. mm->owner is * updated when the owner task exits. If the owner is exiting now * (and race with us), we may miss. */ if (rcu_dereference(mm->owner) != tsk) return ret; rcu_read_lock(); /* while this task is alive, this task is the owner */ if (mem == mem_cgroup_from_task(tsk)) ret = 1; rcu_read_unlock(); return ret; } == Hmm, it seems no memory barrier is necessary. Does anyone has another idea ? Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup 2010-02-09 0:32 ` KAMEZAWA Hiroyuki @ 2010-02-09 0:56 ` KAMEZAWA Hiroyuki 2010-02-09 1:24 ` Minchan Kim 1 sibling, 0 replies; 19+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-02-09 0:56 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Minchan Kim, linux-mm@kvack.org, balbir@linux.vnet.ibm.com, nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org, rientjes On Tue, 9 Feb 2010 09:32:46 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > On Sat, 6 Feb 2010 01:30:49 +0900 > Minchan Kim <minchan.kim@gmail.com> wrote: > > I am not sure how many usecase is also dependent of other locks. > > If it is not as is, we can't make sure in future. > > > > So How about try_task_in_mem_cgroup? > > If we can't hold task_lock, let's continue next child. > > > It's recommended not to use trylock in unclear case. > > Then, I think possible replacement will be not-to-use any lock in > task_in_mem_cgroup. In my short consideration, I don't think task_lock > is necessary if we can add some tricks and memory barrier. > > Please let this patch to go as it is because this is an obvious bug fix > and give me time. > I'll try some today. please wait. (but I wonder the patch will be not good for stable tree.) Thanks, -Kame > Now, I think of following. > This makes use of the fact mm->owner is changed only at _exit() of the owner. > If there is a race with _exit() and mm->owner is racy, the oom selection > itself was racy and bad. > == > int task_in_mem_cgroup_oom(struct task_struct *tsk, struct mem_cgroup *mem) > { > struct mm_struct *mm; > struct task_struct *tsk; > int ret = 0; > > mm = tsk->mm; > if (!mm) > return ret; > /* > * we are not interested in tasks other than owner. mm->owner is > * updated when the owner task exits. If the owner is exiting now > * (and race with us), we may miss. > */ > if (rcu_dereference(mm->owner) != tsk) > return ret; > rcu_read_lock(); > /* while this task is alive, this task is the owner */ > if (mem == mem_cgroup_from_task(tsk)) > ret = 1; > rcu_read_unlock(); > return ret; > } > == > Hmm, it seems no memory barrier is necessary. > > Does anyone has another idea ? > > Thanks, > -Kame > > > > > > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup 2010-02-09 0:32 ` KAMEZAWA Hiroyuki 2010-02-09 0:56 ` KAMEZAWA Hiroyuki @ 2010-02-09 1:24 ` Minchan Kim 2010-02-09 1:34 ` KAMEZAWA Hiroyuki 2010-02-09 6:49 ` David Rientjes 1 sibling, 2 replies; 19+ messages in thread From: Minchan Kim @ 2010-02-09 1:24 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, balbir@linux.vnet.ibm.com, nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org, rientjes On Tue, Feb 9, 2010 at 9:32 AM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > On Sat, 6 Feb 2010 01:30:49 +0900 > Minchan Kim <minchan.kim@gmail.com> wrote: > >> Hi, Kame. >> >> On Fri, Feb 5, 2010 at 9:39 AM, KAMEZAWA Hiroyuki >> <kamezawa.hiroyu@jp.fujitsu.com> wrote: >> > Please take this patch in different context with recent discussion. >> > This is a quick-fix for a terrible bug. >> > >> > This patch itself is against mmotm but can be easily applied to mainline or >> > stable tree, I think. (But I don't CC stable tree until I get ack.) >> > >> > == >> > Now, oom-killer kills process's chidlren at first. But this means >> > a child in other cgroup can be killed. But it's not checked now. >> > >> > This patch fixes that. >> > >> > CC: Balbir Singh <balbir@linux.vnet.ibm.com> >> > CC: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> >> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> >> > --- >> > mm/oom_kill.c | 3 +++ >> > 1 file changed, 3 insertions(+) >> > >> > Index: mmotm-2.6.33-Feb03/mm/oom_kill.c >> > =================================================================== >> > --- mmotm-2.6.33-Feb03.orig/mm/oom_kill.c >> > +++ mmotm-2.6.33-Feb03/mm/oom_kill.c >> > @@ -459,6 +459,9 @@ static int oom_kill_process(struct task_ >> > list_for_each_entry(c, &p->children, sibling) { >> > if (c->mm == p->mm) >> > continue; >> > + /* Children may be in other cgroup */ >> > + if (mem && !task_in_mem_cgroup(c, mem)) >> > + continue; >> > if (!oom_kill_task(c)) >> > return 0; >> > } >> > >> > -- >> >> I am worried about latency of OOM at worst case. >> I mean that task_in_mem_cgroup calls task_lock of child. >> We have used task_lock in many place. >> Some place task_lock hold and then other locks. >> For example, exit_fs held task_lock and try to hold write_lock of fs->lock. >> If child already hold task_lock and wait to write_lock of fs->lock, OOM latency >> is dependent of fs->lock. >> >> I am not sure how many usecase is also dependent of other locks. >> If it is not as is, we can't make sure in future. >> >> So How about try_task_in_mem_cgroup? >> If we can't hold task_lock, let's continue next child. >> > It's recommended not to use trylock in unclear case. > > Then, I think possible replacement will be not-to-use any lock in > task_in_mem_cgroup. In my short consideration, I don't think task_lock > is necessary if we can add some tricks and memory barrier. > > Please let this patch to go as it is because this is an obvious bug fix > and give me time. I think it's not only a latency problem of OOM but it is also a problem of deadlock. We can't expect child's lock state in oom_kill_process. So if you can remove lock like below your suggestion, I am OKAY. > > Now, I think of following. > This makes use of the fact mm->owner is changed only at _exit() of the owner. > If there is a race with _exit() and mm->owner is racy, the oom selection > itself was racy and bad. It seems to make sense to me. > == > int task_in_mem_cgroup_oom(struct task_struct *tsk, struct mem_cgroup *mem) > { > struct mm_struct *mm; > struct task_struct *tsk; > int ret = 0; > > mm = tsk->mm; > if (!mm) > return ret; > /* > * we are not interested in tasks other than owner. mm->owner is > * updated when the owner task exits. If the owner is exiting now > * (and race with us), we may miss. > */ > if (rcu_dereference(mm->owner) != tsk) > return ret; Yes. In this case, OOM killer can wait a few seconds until this task is exited. If we don't do that, we could kill other innocent task. > rcu_read_lock(); > /* while this task is alive, this task is the owner */ > if (mem == mem_cgroup_from_task(tsk)) > ret = 1; > rcu_read_unlock(); > return ret; > } > == > Hmm, it seems no memory barrier is necessary. > > Does anyone has another idea ? > > Thanks, > -Kame > > > > > > > > -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup 2010-02-09 1:24 ` Minchan Kim @ 2010-02-09 1:34 ` KAMEZAWA Hiroyuki 2010-02-09 6:49 ` David Rientjes 1 sibling, 0 replies; 19+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-02-09 1:34 UTC (permalink / raw) To: Minchan Kim Cc: linux-mm@kvack.org, balbir@linux.vnet.ibm.com, nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org, rientjes On Tue, 9 Feb 2010 10:24:45 +0900 Minchan Kim <minchan.kim@gmail.com> wrote: > On Tue, Feb 9, 2010 at 9:32 AM, KAMEZAWA Hiroyuki > <kamezawa.hiroyu@jp.fujitsu.com> wrote: > > On Sat, 6 Feb 2010 01:30:49 +0900 > > Minchan Kim <minchan.kim@gmail.com> wrote: > > > >> Hi, Kame. > >> > >> On Fri, Feb 5, 2010 at 9:39 AM, KAMEZAWA Hiroyuki > >> <kamezawa.hiroyu@jp.fujitsu.com> wrote: > >> > Please take this patch in different context with recent discussion. > >> > This is a quick-fix for a terrible bug. > >> > > >> > This patch itself is against mmotm but can be easily applied to mainline or > >> > stable tree, I think. (But I don't CC stable tree until I get ack.) > >> > > >> > == > >> > Now, oom-killer kills process's chidlren at first. But this means > >> > a child in other cgroup can be killed. But it's not checked now. > >> > > >> > This patch fixes that. > >> > > >> > CC: Balbir Singh <balbir@linux.vnet.ibm.com> > >> > CC: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> > >> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > >> > --- > >> > A mm/oom_kill.c | A A 3 +++ > >> > A 1 file changed, 3 insertions(+) > >> > > >> > Index: mmotm-2.6.33-Feb03/mm/oom_kill.c > >> > =================================================================== > >> > --- mmotm-2.6.33-Feb03.orig/mm/oom_kill.c > >> > +++ mmotm-2.6.33-Feb03/mm/oom_kill.c > >> > @@ -459,6 +459,9 @@ static int oom_kill_process(struct task_ > >> > A A A A list_for_each_entry(c, &p->children, sibling) { > >> > A A A A A A A A if (c->mm == p->mm) > >> > A A A A A A A A A A A A continue; > >> > + A A A A A A A /* Children may be in other cgroup */ > >> > + A A A A A A A if (mem && !task_in_mem_cgroup(c, mem)) > >> > + A A A A A A A A A A A continue; > >> > A A A A A A A A if (!oom_kill_task(c)) > >> > A A A A A A A A A A A A return 0; > >> > A A A A } > >> > > >> > -- > >> > >> I am worried about latency of OOM at worst case. > >> I mean that task_in_mem_cgroup calls task_lock of child. > >> We have used task_lock in many place. > >> Some place task_lock hold and then other locks. > >> For example, exit_fs held task_lock and try to hold write_lock of fs->lock. > >> If child already hold task_lock and wait to write_lock of fs->lock, OOM latency > >> is dependent of fs->lock. > >> > >> I am not sure how many usecase is also dependent of other locks. > >> If it is not as is, we can't make sure in future. > >> > >> So How about try_task_in_mem_cgroup? > >> If we can't hold task_lock, let's continue next child. > >> > > It's recommended not to use trylock in unclear case. > > > > Then, I think possible replacement will be not-to-use any lock in > > task_in_mem_cgroup. In my short consideration, I don't think task_lock > > is necessary if we can add some tricks and memory barrier. > > > > Please let this patch to go as it is because this is an obvious bug fix > > and give me time. > > I think it's not only a latency problem of OOM but it is also a > problem of deadlock. > We can't expect child's lock state in oom_kill_process. > yes. > So if you can remove lock like below your suggestion, I am OKAY. > I'll try. I don't like both mm->owner and children-kills and now they annoy me ;) For mm, I'll prepare lockless version. For stable tree, I'll prepare trylock version. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup 2010-02-09 1:24 ` Minchan Kim 2010-02-09 1:34 ` KAMEZAWA Hiroyuki @ 2010-02-09 6:49 ` David Rientjes 2010-02-09 7:08 ` KAMEZAWA Hiroyuki 2010-02-09 9:40 ` Minchan Kim 1 sibling, 2 replies; 19+ messages in thread From: David Rientjes @ 2010-02-09 6:49 UTC (permalink / raw) To: Minchan Kim Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org, Balbir Singh, nishimura@mxp.nes.nec.co.jp, Andrew Morton On Tue, 9 Feb 2010, Minchan Kim wrote: > I think it's not only a latency problem of OOM but it is also a > problem of deadlock. > We can't expect child's lock state in oom_kill_process. > task_lock() is a spinlock, it shouldn't be held for any significant length of time and certainly not during a memory allocation which would be the only way we'd block in such a state during the oom killer; if that exists, we'd deadlock when it was chosen for kill in __oom_kill_task() anyway, which negates your point about oom_kill_process() and while scanning for tasks to kill and calling badness(). We don't have any special handling for GFP_ATOMIC allocations in the oom killer for locks being held while allocating anyway, the only thing we need to be concerned about is a writelock on tasklist_lock, but the oom killer only requires a readlock. You'd be correct if we help write_lock_irq(&tasklist_lock). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup 2010-02-09 6:49 ` David Rientjes @ 2010-02-09 7:08 ` KAMEZAWA Hiroyuki 2010-02-09 9:40 ` Minchan Kim 1 sibling, 0 replies; 19+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-02-09 7:08 UTC (permalink / raw) To: David Rientjes Cc: Minchan Kim, linux-mm@kvack.org, Balbir Singh, nishimura@mxp.nes.nec.co.jp, Andrew Morton On Mon, 8 Feb 2010 22:49:09 -0800 (PST) David Rientjes <rientjes@google.com> wrote: > On Tue, 9 Feb 2010, Minchan Kim wrote: > > > I think it's not only a latency problem of OOM but it is also a > > problem of deadlock. > > We can't expect child's lock state in oom_kill_process. > > > > task_lock() is a spinlock, it shouldn't be held for any significant length > of time and certainly not during a memory allocation which would be the > only way we'd block in such a state during the oom killer; if that exists, > we'd deadlock when it was chosen for kill in __oom_kill_task() anyway, > which negates your point about oom_kill_process() and while scanning for > tasks to kill and calling badness(). We don't have any special handling > for GFP_ATOMIC allocations in the oom killer for locks being held while > allocating anyway, the only thing we need to be concerned about is a > writelock on tasklist_lock, but the oom killer only requires a readlock. > You'd be correct if we help write_lock_irq(&tasklist_lock). > Hmm, but it's not necessary to hold task_lock, anyway. Is this patch's logic itself ok if I rewrite the rescription/comments ? Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup 2010-02-09 6:49 ` David Rientjes 2010-02-09 7:08 ` KAMEZAWA Hiroyuki @ 2010-02-09 9:40 ` Minchan Kim 2010-02-09 9:55 ` David Rientjes 1 sibling, 1 reply; 19+ messages in thread From: Minchan Kim @ 2010-02-09 9:40 UTC (permalink / raw) To: David Rientjes Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org, Balbir Singh, nishimura@mxp.nes.nec.co.jp, Andrew Morton Hi, David. On Tue, Feb 9, 2010 at 3:49 PM, David Rientjes <rientjes@google.com> wrote: > On Tue, 9 Feb 2010, Minchan Kim wrote: > >> I think it's not only a latency problem of OOM but it is also a >> problem of deadlock. >> We can't expect child's lock state in oom_kill_process. >> > > task_lock() is a spinlock, it shouldn't be held for any significant length > of time and certainly not during a memory allocation which would be the > only way we'd block in such a state during the oom killer; if that exists, > we'd deadlock when it was chosen for kill in __oom_kill_task() anyway, > which negates your point about oom_kill_process() and while scanning for > tasks to kill and calling badness(). We don't have any special handling > for GFP_ATOMIC allocations in the oom killer for locks being held while > allocating anyway, the only thing we need to be concerned about is a > writelock on tasklist_lock, but the oom killer only requires a readlock. > You'd be correct if we help write_lock_irq(&tasklist_lock). My point was following as. We try to kill child of OOMed task at first. But we can't know any locked state of child when OOM happens. It means at this point child is able to be holding any lock. So if we can try to hold task_lock of child, it could make new lock dependency between task_lock and other locks. Although there isn't such lock dependency now, I though it's not good. Please correct me if I was wrong. -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup 2010-02-09 9:40 ` Minchan Kim @ 2010-02-09 9:55 ` David Rientjes 2010-02-09 10:18 ` Minchan Kim 0 siblings, 1 reply; 19+ messages in thread From: David Rientjes @ 2010-02-09 9:55 UTC (permalink / raw) To: Minchan Kim Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org, Balbir Singh, nishimura@mxp.nes.nec.co.jp, Andrew Morton On Tue, 9 Feb 2010, Minchan Kim wrote: > My point was following as. > We try to kill child of OOMed task at first. > But we can't know any locked state of child when OOM happens. We don't need to, child->alloc_lock can be contended in which case we'll just spin but it won't stay locked because we're out of memory. In other words, nothing takes task_lock(child) and then waits for memory to become available while holding it, that would be fundamentally broken. So there is a dependency here and that is that task_lock(current) can't be taken in the page allocator because we'll deadlock in the oom killer, but that isn't anything new. > It means at this point child is able to be holding any lock. > So if we can try to hold task_lock of child, it could make new lock > dependency between task_lock and other locks. > The children aren't any special class of processes in this case, we always take task_lock() for them during the tasklist scan. In fact, we can take task_lock(p) for the same process p three times during the course of an oom kill: once to dump its statistics when /proc/pid/oom_dump_tasks is enabled, once to calculate its badness() score, and once to kill it. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup 2010-02-09 9:55 ` David Rientjes @ 2010-02-09 10:18 ` Minchan Kim 0 siblings, 0 replies; 19+ messages in thread From: Minchan Kim @ 2010-02-09 10:18 UTC (permalink / raw) To: David Rientjes Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org, Balbir Singh, nishimura@mxp.nes.nec.co.jp, Andrew Morton On Tue, Feb 9, 2010 at 6:55 PM, David Rientjes <rientjes@google.com> wrote: > On Tue, 9 Feb 2010, Minchan Kim wrote: > >> My point was following as. >> We try to kill child of OOMed task at first. >> But we can't know any locked state of child when OOM happens. > > We don't need to, child->alloc_lock can be contended in which case we'll > just spin but it won't stay locked because we're out of memory. In other > words, nothing takes task_lock(child) and then waits for memory to become > available while holding it, that would be fundamentally broken. So there > is a dependency here and that is that task_lock(current) can't be taken in > the page allocator because we'll deadlock in the oom killer, but that > isn't anything new. I understand it so I don't oppose Kame's original patch from now on. :) Thanks for kind explanation. David. -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup v2 2010-02-05 16:30 ` Minchan Kim 2010-02-09 0:32 ` KAMEZAWA Hiroyuki @ 2010-02-09 3:02 ` KAMEZAWA Hiroyuki 2010-02-09 7:50 ` David Rientjes 2010-02-09 9:27 ` Balbir Singh 1 sibling, 2 replies; 19+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-02-09 3:02 UTC (permalink / raw) To: Minchan Kim Cc: linux-mm@kvack.org, balbir@linux.vnet.ibm.com, nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org, rientjes How about this ? Passed simple oom-kill test on mmotom-Feb06 == Now, oom-killer kills process's chidlren at first. But this means a child in other cgroup can be killed. But it's not checked now. This patch fixes that. It's pointed out that task_lock in task_in_mem_cgroup is bad at killing a task in oom-killer. It can cause siginificant delay or deadlock. For removing unnecessary task_lock under oom-killer, we use use some loose way. Considering oom-killer and task-walk in the tasklist, checking "task is in mem_cgroup" itself includes some race and we don't have to do strict check, here. (IOW, we can't do it.) Changelog: 2009/02/09 - modified task_in_mem_cgroup to be lockless. CC: Minchan Kim <minchan.kim@gmail.com> CC: David Rientjes <rientjes@google.com> CC: Balbir Singh <balbir@linux.vnet.ibm.com> CC: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- include/linux/memcontrol.h | 5 +++-- mm/memcontrol.c | 32 ++++++++++++++++++++++++++++---- mm/oom_kill.c | 6 ++++-- 3 files changed, 35 insertions(+), 8 deletions(-) Index: mmotm-2.6.33-Feb06/include/linux/memcontrol.h =================================================================== --- mmotm-2.6.33-Feb06.orig/include/linux/memcontrol.h +++ mmotm-2.6.33-Feb06/include/linux/memcontrol.h @@ -71,7 +71,8 @@ extern unsigned long mem_cgroup_isolate_ struct mem_cgroup *mem_cont, int active, int file); extern void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask); -int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem); +int task_in_oom_mem_cgroup(struct task_struct *task, + const struct mem_cgroup *mem); extern struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page); extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p); @@ -215,7 +216,7 @@ static inline int mm_match_cgroup(struct return 1; } -static inline int task_in_mem_cgroup(struct task_struct *task, +static inline int task_in_oom_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem) { return 1; Index: mmotm-2.6.33-Feb06/mm/memcontrol.c =================================================================== --- mmotm-2.6.33-Feb06.orig/mm/memcontrol.c +++ mmotm-2.6.33-Feb06/mm/memcontrol.c @@ -781,16 +781,40 @@ void mem_cgroup_move_lists(struct page * mem_cgroup_add_lru_list(page, to); } -int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem) +/* + * This function is called from OOM Killer. This checks the task is mm_owner + * and checks it's mem_cgroup is under oom. + */ +int task_in_oom_mem_cgroup(struct task_struct *task, + const struct mem_cgroup *mem) { + struct mm_struct *mm; int ret; struct mem_cgroup *curr = NULL; - task_lock(task); + /* + * The task's task->mm pointer is guarded by task_lock() but it's + * risky to take task_lock in oom kill situaion. Oom-killer may + * kill a task which is in unknown status and cause siginificant delay + * or deadlock. + * So, we use some loose way. Because we're under taslist lock, "task" + * pointer is always safe and we can access it. So, accessing mem_cgroup + * via task struct is safe. To check the task is mm owner, we do loose + * check. And this is enough. + * There is small race at updating mm->onwer but we can ignore it. + * A problematic race here means that oom-selection logic by walking + * task list itself is racy. We can't make any strict guarantee between + * task's cgroup status and oom-killer selection, anyway. And, in real + * world, this will be no problem. + */ + mm = task->mm; + if (!mm || mm->owner != task) + return 0; rcu_read_lock(); - curr = try_get_mem_cgroup_from_mm(task->mm); + curr = mem_cgroup_from_task(task); + if (!css_tryget(&curr->css)); + curr = NULL; rcu_read_unlock(); - task_unlock(task); if (!curr) return 0; /* Index: mmotm-2.6.33-Feb06/mm/oom_kill.c =================================================================== --- mmotm-2.6.33-Feb06.orig/mm/oom_kill.c +++ mmotm-2.6.33-Feb06/mm/oom_kill.c @@ -264,7 +264,7 @@ static struct task_struct *select_bad_pr /* skip the init task */ if (is_global_init(p)) continue; - if (mem && !task_in_mem_cgroup(p, mem)) + if (mem && !task_in_oom_mem_cgroup(p, mem)) continue; /* @@ -332,7 +332,7 @@ static void dump_tasks(const struct mem_ do_each_thread(g, p) { struct mm_struct *mm; - if (mem && !task_in_mem_cgroup(p, mem)) + if (mem && !task_in_oom_mem_cgroup(p, mem)) continue; if (!thread_group_leader(p)) continue; @@ -459,6 +459,8 @@ static int oom_kill_process(struct task_ list_for_each_entry(c, &p->children, sibling) { if (c->mm == p->mm) continue; + if (mem && !task_in_oom_mem_cgroup(c, mem)) + continue; if (!oom_kill_task(c)) return 0; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup v2 2010-02-09 3:02 ` [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup v2 KAMEZAWA Hiroyuki @ 2010-02-09 7:50 ` David Rientjes 2010-02-09 8:02 ` KAMEZAWA Hiroyuki 2010-02-09 9:27 ` Balbir Singh 1 sibling, 1 reply; 19+ messages in thread From: David Rientjes @ 2010-02-09 7:50 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Minchan Kim, linux-mm, Balbir Singh, nishimura, Andrew Morton On Tue, 9 Feb 2010, KAMEZAWA Hiroyuki wrote: > Index: mmotm-2.6.33-Feb06/include/linux/memcontrol.h > =================================================================== > --- mmotm-2.6.33-Feb06.orig/include/linux/memcontrol.h > +++ mmotm-2.6.33-Feb06/include/linux/memcontrol.h > @@ -71,7 +71,8 @@ extern unsigned long mem_cgroup_isolate_ > struct mem_cgroup *mem_cont, > int active, int file); > extern void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask); > -int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem); > +int task_in_oom_mem_cgroup(struct task_struct *task, > + const struct mem_cgroup *mem); This is only called from the oom killer, so I'm not sure this needs to be renamed. It seems like any caller of this function, present or future, would be doing a tasklist iteration while holding a readlock on tasklist_lock, so perhaps just document that task_in_mem_cgroup() requires that? > > extern struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page); > extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p); > @@ -215,7 +216,7 @@ static inline int mm_match_cgroup(struct > return 1; > } > > -static inline int task_in_mem_cgroup(struct task_struct *task, > +static inline int task_in_oom_mem_cgroup(struct task_struct *task, > const struct mem_cgroup *mem) > { > return 1; > Index: mmotm-2.6.33-Feb06/mm/memcontrol.c > =================================================================== > --- mmotm-2.6.33-Feb06.orig/mm/memcontrol.c > +++ mmotm-2.6.33-Feb06/mm/memcontrol.c > @@ -781,16 +781,40 @@ void mem_cgroup_move_lists(struct page * > mem_cgroup_add_lru_list(page, to); > } > > -int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem) > +/* > + * This function is called from OOM Killer. This checks the task is mm_owner > + * and checks it's mem_cgroup is under oom. > + */ > +int task_in_oom_mem_cgroup(struct task_struct *task, > + const struct mem_cgroup *mem) > { > + struct mm_struct *mm; > int ret; > struct mem_cgroup *curr = NULL; > > - task_lock(task); > + /* > + * The task's task->mm pointer is guarded by task_lock() but it's > + * risky to take task_lock in oom kill situaion. Oom-killer may > + * kill a task which is in unknown status and cause siginificant delay > + * or deadlock. > + * So, we use some loose way. Because we're under taslist lock, "task" > + * pointer is always safe and we can access it. So, accessing mem_cgroup > + * via task struct is safe. To check the task is mm owner, we do loose > + * check. And this is enough. > + * There is small race at updating mm->onwer but we can ignore it. > + * A problematic race here means that oom-selection logic by walking > + * task list itself is racy. We can't make any strict guarantee between > + * task's cgroup status and oom-killer selection, anyway. And, in real > + * world, this will be no problem. > + */ > + mm = task->mm; > + if (!mm || mm->owner != task) > + return 0; You can't dereference task->mm->owner without holding task_lock(task), but I don't see why you need to even deal with task->mm. All callers to this function will check for !task->mm either during their iterations or with oom_kill_task() returning 0. > rcu_read_lock(); > - curr = try_get_mem_cgroup_from_mm(task->mm); > + curr = mem_cgroup_from_task(task); > + if (!css_tryget(&curr->css)); > + curr = NULL; We can always dereference p because of tasklist_lock, there should be no need to do rcu_read_lock() or any rcu dereference, so you should be able to just do this: do { curr = mem_cgroup_from_task(task); if (!curr) break; } while (!css_tryget(&curr->css)); If you like that better, I suggest sending your original two-liner fix using task_in_mem_cgroup() while taking task_lock(p) to stable and then improving on it with a follow-up patch for mainline to do this refcount variation. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup v2 2010-02-09 7:50 ` David Rientjes @ 2010-02-09 8:02 ` KAMEZAWA Hiroyuki 2010-02-09 8:21 ` David Rientjes 0 siblings, 1 reply; 19+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-02-09 8:02 UTC (permalink / raw) To: David Rientjes Cc: Minchan Kim, linux-mm, Balbir Singh, nishimura, Andrew Morton On Mon, 8 Feb 2010 23:50:12 -0800 (PST) David Rientjes <rientjes@google.com> wrote: > On Tue, 9 Feb 2010, KAMEZAWA Hiroyuki wrote: > > > Index: mmotm-2.6.33-Feb06/include/linux/memcontrol.h > > =================================================================== > > --- mmotm-2.6.33-Feb06.orig/include/linux/memcontrol.h > > +++ mmotm-2.6.33-Feb06/include/linux/memcontrol.h > > @@ -71,7 +71,8 @@ extern unsigned long mem_cgroup_isolate_ > > struct mem_cgroup *mem_cont, > > int active, int file); > > extern void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask); > > -int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem); > > +int task_in_oom_mem_cgroup(struct task_struct *task, > > + const struct mem_cgroup *mem); > > This is only called from the oom killer, so I'm not sure this needs to > be renamed. Why I renamed this is "be careful when a new user calls this". > It seems like any caller of this function, present or future, > would be doing a tasklist iteration while holding a readlock on > tasklist_lock, so perhaps just document that task_in_mem_cgroup() requires > that? Hmm. ok. I avoid this rename. It will make the patch smaller. > > > > > extern struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page); > > extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p); > > @@ -215,7 +216,7 @@ static inline int mm_match_cgroup(struct > > return 1; > > } > > > > -static inline int task_in_mem_cgroup(struct task_struct *task, > > +static inline int task_in_oom_mem_cgroup(struct task_struct *task, > > const struct mem_cgroup *mem) > > { > > return 1; > > Index: mmotm-2.6.33-Feb06/mm/memcontrol.c > > =================================================================== > > --- mmotm-2.6.33-Feb06.orig/mm/memcontrol.c > > +++ mmotm-2.6.33-Feb06/mm/memcontrol.c > > @@ -781,16 +781,40 @@ void mem_cgroup_move_lists(struct page * > > mem_cgroup_add_lru_list(page, to); > > } > > > > -int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem) > > +/* > > + * This function is called from OOM Killer. This checks the task is mm_owner > > + * and checks it's mem_cgroup is under oom. > > + */ > > +int task_in_oom_mem_cgroup(struct task_struct *task, > > + const struct mem_cgroup *mem) > > { > > + struct mm_struct *mm; > > int ret; > > struct mem_cgroup *curr = NULL; > > > > - task_lock(task); > > + /* > > + * The task's task->mm pointer is guarded by task_lock() but it's > > + * risky to take task_lock in oom kill situaion. Oom-killer may > > + * kill a task which is in unknown status and cause siginificant delay > > + * or deadlock. > > + * So, we use some loose way. Because we're under taslist lock, "task" > > + * pointer is always safe and we can access it. So, accessing mem_cgroup > > + * via task struct is safe. To check the task is mm owner, we do loose > > + * check. And this is enough. > > + * There is small race at updating mm->onwer but we can ignore it. > > + * A problematic race here means that oom-selection logic by walking > > + * task list itself is racy. We can't make any strict guarantee between > > + * task's cgroup status and oom-killer selection, anyway. And, in real > > + * world, this will be no problem. > > + */ > > + mm = task->mm; > > + if (!mm || mm->owner != task) > > + return 0; > > You can't dereference task->mm->owner without holding task_lock(task), but > I don't see why you need to even deal with task->mm. All callers to this > function will check for !task->mm either during their iterations or with > oom_kill_task() returning 0. > Just for being careful. We don't hold task_lock(), which guards task->mm in callers. > > rcu_read_lock(); > > - curr = try_get_mem_cgroup_from_mm(task->mm); > > + curr = mem_cgroup_from_task(task); > > + if (!css_tryget(&curr->css)); > > + curr = NULL; > > We can always dereference p because of tasklist_lock, there should be no > need to do rcu_read_lock() or any rcu dereference, so you should be able > to just do this: > > do { > curr = mem_cgroup_from_task(task); > if (!curr) > break; > } while (!css_tryget(&curr->css)); > Ok, I missed that. thank you. I'll use this code. > If you like that better, I suggest sending your original two-liner fix > using task_in_mem_cgroup() while taking task_lock(p) to stable and then > improving on it with a follow-up patch for mainline to do this refcount > variation. > Hmm. ok. I'll devide the patch into 2 parts. Thank you for review. Regards, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup v2 2010-02-09 8:02 ` KAMEZAWA Hiroyuki @ 2010-02-09 8:21 ` David Rientjes 2010-02-09 9:22 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 19+ messages in thread From: David Rientjes @ 2010-02-09 8:21 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Minchan Kim, linux-mm, Balbir Singh, nishimura, Andrew Morton On Tue, 9 Feb 2010, KAMEZAWA Hiroyuki wrote: > > This is only called from the oom killer, so I'm not sure this needs to > > be renamed. > Why I renamed this is "be careful when a new user calls this". > It would still be good to document the function as requiring a readlock on tasklist_lock. > > > Index: mmotm-2.6.33-Feb06/mm/memcontrol.c > > > =================================================================== > > > --- mmotm-2.6.33-Feb06.orig/mm/memcontrol.c > > > +++ mmotm-2.6.33-Feb06/mm/memcontrol.c > > > @@ -781,16 +781,40 @@ void mem_cgroup_move_lists(struct page * > > > mem_cgroup_add_lru_list(page, to); > > > } > > > > > > -int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem) > > > +/* > > > + * This function is called from OOM Killer. This checks the task is mm_owner > > > + * and checks it's mem_cgroup is under oom. > > > + */ > > > +int task_in_oom_mem_cgroup(struct task_struct *task, > > > + const struct mem_cgroup *mem) > > > { > > > + struct mm_struct *mm; > > > int ret; > > > struct mem_cgroup *curr = NULL; > > > > > > - task_lock(task); > > > + /* > > > + * The task's task->mm pointer is guarded by task_lock() but it's > > > + * risky to take task_lock in oom kill situaion. Oom-killer may > > > + * kill a task which is in unknown status and cause siginificant delay > > > + * or deadlock. > > > + * So, we use some loose way. Because we're under taslist lock, "task" > > > + * pointer is always safe and we can access it. So, accessing mem_cgroup > > > + * via task struct is safe. To check the task is mm owner, we do loose > > > + * check. And this is enough. > > > + * There is small race at updating mm->onwer but we can ignore it. > > > + * A problematic race here means that oom-selection logic by walking > > > + * task list itself is racy. We can't make any strict guarantee between > > > + * task's cgroup status and oom-killer selection, anyway. And, in real > > > + * world, this will be no problem. > > > + */ > > > + mm = task->mm; > > > + if (!mm || mm->owner != task) > > > + return 0; > > > > You can't dereference task->mm->owner without holding task_lock(task), but > > I don't see why you need to even deal with task->mm. All callers to this > > function will check for !task->mm either during their iterations or with > > oom_kill_task() returning 0. > > > Just for being careful. We don't hold task_lock(), which guards task->mm in > callers. > The callers don't care if it disappears out from under us since we never dereference it, it's just a sanity check to ensure we don't pick a kthread or an exiting task that won't free any memory. One of my patches to do the oom killer rewrite that I'll propose tomorrow actually removes a lot of that redundancy. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup v2 2010-02-09 8:21 ` David Rientjes @ 2010-02-09 9:22 ` KAMEZAWA Hiroyuki 2010-02-09 9:35 ` David Rientjes 0 siblings, 1 reply; 19+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-02-09 9:22 UTC (permalink / raw) To: David Rientjes Cc: Minchan Kim, linux-mm, Balbir Singh, nishimura, Andrew Morton On Tue, 9 Feb 2010 00:21:32 -0800 (PST) David Rientjes <rientjes@google.com> wrote: > > > > - task_lock(task); > > > > + /* > > > > + * The task's task->mm pointer is guarded by task_lock() but it's > > > > + * risky to take task_lock in oom kill situaion. Oom-killer may > > > > + * kill a task which is in unknown status and cause siginificant delay > > > > + * or deadlock. > > > > + * So, we use some loose way. Because we're under taslist lock, "task" > > > > + * pointer is always safe and we can access it. So, accessing mem_cgroup > > > > + * via task struct is safe. To check the task is mm owner, we do loose > > > > + * check. And this is enough. > > > > + * There is small race at updating mm->onwer but we can ignore it. > > > > + * A problematic race here means that oom-selection logic by walking > > > > + * task list itself is racy. We can't make any strict guarantee between > > > > + * task's cgroup status and oom-killer selection, anyway. And, in real > > > > + * world, this will be no problem. > > > > + */ > > > > + mm = task->mm; > > > > + if (!mm || mm->owner != task) > > > > + return 0; > > > > > > You can't dereference task->mm->owner without holding task_lock(task), but > > > I don't see why you need to even deal with task->mm. All callers to this > > > function will check for !task->mm either during their iterations or with > > > oom_kill_task() returning 0. > > > > > Just for being careful. We don't hold task_lock(), which guards task->mm in > > callers. > > > > The callers don't care if it disappears out from under us since we never > dereference it, it's just a sanity check to ensure we don't pick a > kthread or an exiting task that won't free any memory. But we need the guarantee that it's safe to access mm->owner in this code. It's possible task->mm is set to be NULL while we come here. Hmm. taking task_lock() is better, finally ? But I don't like taking such a lock here to do easy checks.. *maybe* I'll postpone this updates and just post original fix again. There are task_lock() and task_unlock() but task_trylock() is not implemented. I think I shouldn't add a new trylock. For mm, I'll consinder some better way to ignore mm->owner. Maybe we can set some flag as "the task is mm_owner!" in the task struct.. it will allow us to remove task_lock here. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup v2 2010-02-09 9:22 ` KAMEZAWA Hiroyuki @ 2010-02-09 9:35 ` David Rientjes 0 siblings, 0 replies; 19+ messages in thread From: David Rientjes @ 2010-02-09 9:35 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Minchan Kim, linux-mm, Balbir Singh, nishimura, Andrew Morton On Tue, 9 Feb 2010, KAMEZAWA Hiroyuki wrote: > > > > > - task_lock(task); > > > > > + /* > > > > > + * The task's task->mm pointer is guarded by task_lock() but it's > > > > > + * risky to take task_lock in oom kill situaion. Oom-killer may > > > > > + * kill a task which is in unknown status and cause siginificant delay > > > > > + * or deadlock. > > > > > + * So, we use some loose way. Because we're under taslist lock, "task" > > > > > + * pointer is always safe and we can access it. So, accessing mem_cgroup > > > > > + * via task struct is safe. To check the task is mm owner, we do loose > > > > > + * check. And this is enough. > > > > > + * There is small race at updating mm->onwer but we can ignore it. > > > > > + * A problematic race here means that oom-selection logic by walking > > > > > + * task list itself is racy. We can't make any strict guarantee between > > > > > + * task's cgroup status and oom-killer selection, anyway. And, in real > > > > > + * world, this will be no problem. > > > > > + */ > > > > > + mm = task->mm; > > > > > + if (!mm || mm->owner != task) > > > > > + return 0; > > > > > > > > You can't dereference task->mm->owner without holding task_lock(task), but > > > > I don't see why you need to even deal with task->mm. All callers to this > > > > function will check for !task->mm either during their iterations or with > > > > oom_kill_task() returning 0. > > > > > > > Just for being careful. We don't hold task_lock(), which guards task->mm in > > > callers. > > > > > > > The callers don't care if it disappears out from under us since we never > > dereference it, it's just a sanity check to ensure we don't pick a > > kthread or an exiting task that won't free any memory. > > But we need the guarantee that it's safe to access mm->owner in this code. > It's possible task->mm is set to be NULL while we come here. > Hmm. taking task_lock() is better, finally ? > That was my original point when I said you can't dereference task->mm->owner without task_lock(task), but I don't see why you need that check to begin with. > But I don't like taking such a lock here to do easy checks.. > *maybe* I'll postpone this updates and just post original fix again. > Feel free to add my Acked-by: David Rientjes <rientjes@google.com> since it's a much-needed fix for memcg both in mainline and in -stable. > There are task_lock() and task_unlock() but task_trylock() is not implemented. > I think I shouldn't add a new trylock. task_trylock() isn't appropriate for this usecase because it would exclude tasks from the iteration in select_bad_process() if its contended, i.e. we could panic the machine unnecessary simply because the lock is taken. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup v2 2010-02-09 3:02 ` [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup v2 KAMEZAWA Hiroyuki 2010-02-09 7:50 ` David Rientjes @ 2010-02-09 9:27 ` Balbir Singh 1 sibling, 0 replies; 19+ messages in thread From: Balbir Singh @ 2010-02-09 9:27 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Minchan Kim, linux-mm@kvack.org, nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org, rientjes * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-02-09 12:02:09]: > How about this ? > Passed simple oom-kill test on mmotom-Feb06 > == > Now, oom-killer kills process's chidlren at first. But this means > a child in other cgroup can be killed. But it's not checked now. > > This patch fixes that. > > It's pointed out that task_lock in task_in_mem_cgroup is bad at > killing a task in oom-killer. I'll dig the earlier thread to see what you mean. It can cause siginificant delay or > deadlock. For removing unnecessary task_lock under oom-killer, we use > use some loose way. Considering oom-killer and task-walk in the tasklist, > checking "task is in mem_cgroup" itself includes some race and we don't > have to do strict check, here. > (IOW, we can't do it.) > > Changelog: 2009/02/09 > - modified task_in_mem_cgroup to be lockless. > > CC: Minchan Kim <minchan.kim@gmail.com> > CC: David Rientjes <rientjes@google.com> > CC: Balbir Singh <balbir@linux.vnet.ibm.com> > CC: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > --- > include/linux/memcontrol.h | 5 +++-- > mm/memcontrol.c | 32 ++++++++++++++++++++++++++++---- > mm/oom_kill.c | 6 ++++-- > 3 files changed, 35 insertions(+), 8 deletions(-) > > Index: mmotm-2.6.33-Feb06/include/linux/memcontrol.h > =================================================================== > --- mmotm-2.6.33-Feb06.orig/include/linux/memcontrol.h > +++ mmotm-2.6.33-Feb06/include/linux/memcontrol.h > @@ -71,7 +71,8 @@ extern unsigned long mem_cgroup_isolate_ > struct mem_cgroup *mem_cont, > int active, int file); > extern void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask); > -int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem); > +int task_in_oom_mem_cgroup(struct task_struct *task, > + const struct mem_cgroup *mem); > > extern struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page); > extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p); > @@ -215,7 +216,7 @@ static inline int mm_match_cgroup(struct > return 1; > } > > -static inline int task_in_mem_cgroup(struct task_struct *task, > +static inline int task_in_oom_mem_cgroup(struct task_struct *task, > const struct mem_cgroup *mem) > { > return 1; > Index: mmotm-2.6.33-Feb06/mm/memcontrol.c > =================================================================== > --- mmotm-2.6.33-Feb06.orig/mm/memcontrol.c > +++ mmotm-2.6.33-Feb06/mm/memcontrol.c > @@ -781,16 +781,40 @@ void mem_cgroup_move_lists(struct page * > mem_cgroup_add_lru_list(page, to); > } > > -int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem) > +/* > + * This function is called from OOM Killer. This checks the task is mm_owner > + * and checks it's mem_cgroup is under oom. > + */ > +int task_in_oom_mem_cgroup(struct task_struct *task, > + const struct mem_cgroup *mem) > { > + struct mm_struct *mm; > int ret; > struct mem_cgroup *curr = NULL; > > - task_lock(task); > + /* > + * The task's task->mm pointer is guarded by task_lock() but it's > + * risky to take task_lock in oom kill situaion. Oom-killer may > + * kill a task which is in unknown status and cause siginificant delay > + * or deadlock. task->mm is protected by task_lock() for several reasons including race with exec() and exit(). The task structure itself is protected via RCU, so task->task_lock. The OOM kill process should happen only when the signal is delivered (at context switch back to user space). I don't understand the race during OOM kill. > + * So, we use some loose way. Because we're under taslist lock, "task" > + * pointer is always safe and we can access it. So, accessing mem_cgroup > + * via task struct is safe. To check the task is mm owner, we do loose > + * check. And this is enough. > + * There is small race at updating mm->onwer but we can ignore it. > + * A problematic race here means that oom-selection logic by walking > + * task list itself is racy. We can't make any strict guarantee between > + * task's cgroup status and oom-killer selection, anyway. And, in real > + * world, this will be no problem. > + */ > + mm = task->mm; With the task_lock() gone, I'm afraid we might find the wrong task for OOM killing, specifically if the task is moving. > + if (!mm || mm->owner != task) > + return 0; > rcu_read_lock(); > - curr = try_get_mem_cgroup_from_mm(task->mm); > + curr = mem_cgroup_from_task(task); > + if (!css_tryget(&curr->css)); > + curr = NULL; > rcu_read_unlock(); > - task_unlock(task); > if (!curr) > return 0; > /* > Index: mmotm-2.6.33-Feb06/mm/oom_kill.c > =================================================================== > --- mmotm-2.6.33-Feb06.orig/mm/oom_kill.c > +++ mmotm-2.6.33-Feb06/mm/oom_kill.c > @@ -264,7 +264,7 @@ static struct task_struct *select_bad_pr > /* skip the init task */ > if (is_global_init(p)) > continue; > - if (mem && !task_in_mem_cgroup(p, mem)) > + if (mem && !task_in_oom_mem_cgroup(p, mem)) > continue; > > /* > @@ -332,7 +332,7 @@ static void dump_tasks(const struct mem_ > do_each_thread(g, p) { > struct mm_struct *mm; > > - if (mem && !task_in_mem_cgroup(p, mem)) > + if (mem && !task_in_oom_mem_cgroup(p, mem)) > continue; > if (!thread_group_leader(p)) > continue; > @@ -459,6 +459,8 @@ static int oom_kill_process(struct task_ > list_for_each_entry(c, &p->children, sibling) { > if (c->mm == p->mm) > continue; > + if (mem && !task_in_oom_mem_cgroup(c, mem)) > + continue; > if (!oom_kill_task(c)) > return 0; > } > -- Three Cheers, Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2010-02-09 10:18 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-02-05 0:39 [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup KAMEZAWA Hiroyuki 2010-02-05 0:57 ` David Rientjes 2010-02-05 16:30 ` Minchan Kim 2010-02-09 0:32 ` KAMEZAWA Hiroyuki 2010-02-09 0:56 ` KAMEZAWA Hiroyuki 2010-02-09 1:24 ` Minchan Kim 2010-02-09 1:34 ` KAMEZAWA Hiroyuki 2010-02-09 6:49 ` David Rientjes 2010-02-09 7:08 ` KAMEZAWA Hiroyuki 2010-02-09 9:40 ` Minchan Kim 2010-02-09 9:55 ` David Rientjes 2010-02-09 10:18 ` Minchan Kim 2010-02-09 3:02 ` [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup v2 KAMEZAWA Hiroyuki 2010-02-09 7:50 ` David Rientjes 2010-02-09 8:02 ` KAMEZAWA Hiroyuki 2010-02-09 8:21 ` David Rientjes 2010-02-09 9:22 ` KAMEZAWA Hiroyuki 2010-02-09 9:35 ` David Rientjes 2010-02-09 9:27 ` Balbir Singh
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).