From: Minchan Kim <minchan.kim@gmail.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
rientjes@google.com
Subject: Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup
Date: Tue, 9 Feb 2010 10:24:45 +0900 [thread overview]
Message-ID: <28c262361002081724l1b64e316v3141fb4567dbf905@mail.gmail.com> (raw)
In-Reply-To: <20100209093246.36c50bae.kamezawa.hiroyu@jp.fujitsu.com>
On Tue, Feb 9, 2010 at 9:32 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Sat, 6 Feb 2010 01:30:49 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> Hi, Kame.
>>
>> On Fri, Feb 5, 2010 at 9:39 AM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> > Please take this patch in different context with recent discussion.
>> > This is a quick-fix for a terrible bug.
>> >
>> > This patch itself is against mmotm but can be easily applied to mainline or
>> > stable tree, I think. (But I don't CC stable tree until I get ack.)
>> >
>> > ==
>> > Now, oom-killer kills process's chidlren at first. But this means
>> > a child in other cgroup can be killed. But it's not checked now.
>> >
>> > This patch fixes that.
>> >
>> > CC: Balbir Singh <balbir@linux.vnet.ibm.com>
>> > CC: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
>> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> > ---
>> > mm/oom_kill.c | 3 +++
>> > 1 file changed, 3 insertions(+)
>> >
>> > Index: mmotm-2.6.33-Feb03/mm/oom_kill.c
>> > ===================================================================
>> > --- mmotm-2.6.33-Feb03.orig/mm/oom_kill.c
>> > +++ mmotm-2.6.33-Feb03/mm/oom_kill.c
>> > @@ -459,6 +459,9 @@ static int oom_kill_process(struct task_
>> > list_for_each_entry(c, &p->children, sibling) {
>> > if (c->mm == p->mm)
>> > continue;
>> > + /* Children may be in other cgroup */
>> > + if (mem && !task_in_mem_cgroup(c, mem))
>> > + continue;
>> > if (!oom_kill_task(c))
>> > return 0;
>> > }
>> >
>> > --
>>
>> I am worried about latency of OOM at worst case.
>> I mean that task_in_mem_cgroup calls task_lock of child.
>> We have used task_lock in many place.
>> Some place task_lock hold and then other locks.
>> For example, exit_fs held task_lock and try to hold write_lock of fs->lock.
>> If child already hold task_lock and wait to write_lock of fs->lock, OOM latency
>> is dependent of fs->lock.
>>
>> I am not sure how many usecase is also dependent of other locks.
>> If it is not as is, we can't make sure in future.
>>
>> So How about try_task_in_mem_cgroup?
>> If we can't hold task_lock, let's continue next child.
>>
> It's recommended not to use trylock in unclear case.
>
> Then, I think possible replacement will be not-to-use any lock in
> task_in_mem_cgroup. In my short consideration, I don't think task_lock
> is necessary if we can add some tricks and memory barrier.
>
> Please let this patch to go as it is because this is an obvious bug fix
> and give me time.
I think it's not only a latency problem of OOM but it is also a
problem of deadlock.
We can't expect child's lock state in oom_kill_process.
So if you can remove lock like below your suggestion, I am OKAY.
>
> Now, I think of following.
> This makes use of the fact mm->owner is changed only at _exit() of the owner.
> If there is a race with _exit() and mm->owner is racy, the oom selection
> itself was racy and bad.
It seems to make sense to me.
> ==
> int task_in_mem_cgroup_oom(struct task_struct *tsk, struct mem_cgroup *mem)
> {
> struct mm_struct *mm;
> struct task_struct *tsk;
> int ret = 0;
>
> mm = tsk->mm;
> if (!mm)
> return ret;
> /*
> * we are not interested in tasks other than owner. mm->owner is
> * updated when the owner task exits. If the owner is exiting now
> * (and race with us), we may miss.
> */
> if (rcu_dereference(mm->owner) != tsk)
> return ret;
Yes. In this case, OOM killer can wait a few seconds until this task is exited.
If we don't do that, we could kill other innocent task.
> rcu_read_lock();
> /* while this task is alive, this task is the owner */
> if (mem == mem_cgroup_from_task(tsk))
> ret = 1;
> rcu_read_unlock();
> return ret;
> }
> ==
> Hmm, it seems no memory barrier is necessary.
>
> Does anyone has another idea ?
>
> Thanks,
> -Kame
>
>
>
>
>
>
>
>
--
Kind regards,
Minchan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-02-09 1:24 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-02-05 0:39 [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup KAMEZAWA Hiroyuki
2010-02-05 0:57 ` David Rientjes
2010-02-05 16:30 ` Minchan Kim
2010-02-09 0:32 ` KAMEZAWA Hiroyuki
2010-02-09 0:56 ` KAMEZAWA Hiroyuki
2010-02-09 1:24 ` Minchan Kim [this message]
2010-02-09 1:34 ` KAMEZAWA Hiroyuki
2010-02-09 6:49 ` David Rientjes
2010-02-09 7:08 ` KAMEZAWA Hiroyuki
2010-02-09 9:40 ` Minchan Kim
2010-02-09 9:55 ` David Rientjes
2010-02-09 10:18 ` Minchan Kim
2010-02-09 3:02 ` [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup v2 KAMEZAWA Hiroyuki
2010-02-09 7:50 ` David Rientjes
2010-02-09 8:02 ` KAMEZAWA Hiroyuki
2010-02-09 8:21 ` David Rientjes
2010-02-09 9:22 ` KAMEZAWA Hiroyuki
2010-02-09 9:35 ` David Rientjes
2010-02-09 9:27 ` Balbir Singh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=28c262361002081724l1b64e316v3141fb4567dbf905@mail.gmail.com \
--to=minchan.kim@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=balbir@linux.vnet.ibm.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-mm@kvack.org \
--cc=nishimura@mxp.nes.nec.co.jp \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).