From: Minchan Kim <minchan.kim@gmail.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
	"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	rientjes@google.com
Subject: Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup
Date: Tue, 9 Feb 2010 10:24:45 +0900
Message-ID: <28c262361002081724l1b64e316v3141fb4567dbf905@mail.gmail.com>
In-Reply-To: <20100209093246.36c50bae.kamezawa.hiroyu@jp.fujitsu.com>

On Tue, Feb 9, 2010 at 9:32 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Sat, 6 Feb 2010 01:30:49 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> Hi, Kame.
>>
>> On Fri, Feb 5, 2010 at 9:39 AM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> > Please consider this patch separately from the recent discussion.
>> > This is a quick fix for a terrible bug.
>> >
>> > This patch is against mmotm but can, I think, be easily applied to mainline
>> > or the stable tree. (I won't CC the stable tree until I get an ack.)
>> >
>> > ==
>> > Currently, the oom-killer kills a process's children first. This means
>> > a child in another cgroup can be killed, and that is not checked now.
>> >
>> > This patch fixes that.
>> >
>> > CC: Balbir Singh <balbir@linux.vnet.ibm.com>
>> > CC: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
>> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> > ---
>> >  mm/oom_kill.c |    3 +++
>> >  1 file changed, 3 insertions(+)
>> >
>> > Index: mmotm-2.6.33-Feb03/mm/oom_kill.c
>> > ===================================================================
>> > --- mmotm-2.6.33-Feb03.orig/mm/oom_kill.c
>> > +++ mmotm-2.6.33-Feb03/mm/oom_kill.c
>> > @@ -459,6 +459,9 @@ static int oom_kill_process(struct task_
>> >        list_for_each_entry(c, &p->children, sibling) {
>> >                if (c->mm == p->mm)
>> >                        continue;
>> > +               /* Children may be in other cgroup */
>> > +               if (mem && !task_in_mem_cgroup(c, mem))
>> > +                       continue;
>> >                if (!oom_kill_task(c))
>> >                        return 0;
>> >        }
>> >
>> > --
>>
>> I am worried about OOM latency in the worst case.
>> task_in_mem_cgroup() takes the child's task_lock.
>> We use task_lock in many places.
>> In some places task_lock is held while other locks are taken.
>> For example, exit_fs() holds task_lock and then tries to take the write lock on fs->lock.
>> If the child already holds task_lock and is waiting for the write lock on fs->lock,
>> OOM latency depends on fs->lock.
>>
>> I am not sure how many use cases also depend on other locks.
>> Even if there are none today, we can't be sure about the future.
>>
>> So how about a try_task_in_mem_cgroup?
>> If we can't take task_lock, let's continue to the next child.
>>
> Using trylock in an unclear case is not recommended.
>
> So I think a possible replacement is to use no lock at all in
> task_in_mem_cgroup. After brief consideration, I don't think task_lock
> is necessary if we add some tricks and a memory barrier.
>
> Please let this patch go in as it is, because it is an obvious bug fix,
> and give me time.

I think this is not only an OOM latency problem but also a potential
deadlock.
We can't predict the child's lock state in oom_kill_process().

So if you can remove the lock as in your suggestion below, I am okay with it.
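
To make the trylock idea from my earlier mail concrete, here is a minimal
user-space sketch, not kernel code. It assumes a C11 atomic_flag standing in
for task_lock and a plain integer standing in for the memory cgroup;
model_task, try_task_in_cgroup and pick_child are hypothetical names used only
for illustration. A child whose lock is busy is skipped rather than waited on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a task; not a kernel structure. */
struct model_task {
	atomic_flag lock;	/* stands in for task_lock(); set means held */
	int cgroup_id;		/* stands in for the task's memory cgroup */
};

/*
 * Returns 1 if the task is in the cgroup, 0 if it is not,
 * and -1 if the lock was busy and the check was skipped.
 */
static int try_task_in_cgroup(struct model_task *t, int cgroup_id)
{
	int ret;

	/* test_and_set returns the previous value: true means already held */
	if (atomic_flag_test_and_set(&t->lock))
		return -1;
	ret = (t->cgroup_id == cgroup_id);
	atomic_flag_clear(&t->lock);
	return ret;
}

/* Returns the index of the first eligible child, or -1 if there is none. */
static int pick_child(struct model_task *children, size_t n, int cgroup_id)
{
	assert(n == 0 || children != NULL);
	for (size_t i = 0; i < n; i++) {
		/* Busy (-1) and foreign-cgroup (0) children are both skipped. */
		if (try_task_in_cgroup(&children[i], cgroup_id) == 1)
			return (int)i;
	}
	return -1;
}
```

The cost of this approach is that a busy child is silently passed over, which
is the "unclear case" objection to trylock; a lockless check avoids the
question entirely.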

>
> Now, I am thinking of the following.
> This makes use of the fact that mm->owner is changed only at _exit() of the owner.
> If there is a race with _exit() and mm->owner is racy, the oom selection
> itself was racy and bad.

It seems to make sense to me.

> ==
> int task_in_mem_cgroup_oom(struct task_struct *tsk, struct mem_cgroup *mem)
> {
>        struct mm_struct *mm;
>        int ret = 0;
>
>        mm = tsk->mm;
>        if (!mm)
>                return ret;
>        /*
>         * we are not interested in tasks other than owner. mm->owner is
>         * updated when the owner task exits. If the owner is exiting now
>         * (and race with us), we may miss.
>         */
>        if (rcu_dereference(mm->owner) != tsk)
>                return ret;

Yes. In this case, the OOM killer can wait a few seconds until this task has exited.
If we don't do that, we could kill another innocent task.

>        rcu_read_lock();
>        /* while this task is alive, this task is the owner */
>        if (mem == mem_cgroup_from_task(tsk))
>                ret = 1;
>        rcu_read_unlock();
>        return ret;
> }
> ==
> Hmm, it seems no memory barrier is necessary.
>
> Does anyone have another idea?
>
> Thanks,
> -Kame
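
The owner-only check quoted above can be modeled in user space as follows.
This is a single-threaded sketch under stated assumptions, not the kernel
implementation: model_cgroup, model_mm, model_task and
task_in_cgroup_oom_model are hypothetical names, and the RCU read-side
protection that the kernel version relies on is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
struct model_cgroup { int id; };
struct model_task;
struct model_mm { struct model_task *owner; };	/* models mm->owner */
struct model_task {
	struct model_mm *mm;		/* NULL models a task without an mm */
	struct model_cgroup *cgroup;	/* models mem_cgroup_from_task() */
};

/* Returns 1 only if tsk owns its mm and belongs to mem; takes no lock. */
static int task_in_cgroup_oom_model(const struct model_task *tsk,
				    const struct model_cgroup *mem)
{
	const struct model_mm *mm;

	assert(tsk != NULL);
	mm = tsk->mm;
	if (!mm)
		return 0;	/* no mm, nothing to account */
	if (mm->owner != tsk)
		return 0;	/* only the mm owner is considered */
	return tsk->cgroup == mem;
}
```

A non-owner thread sharing the mm returns 0 here, so only the owner task is
ever selected. This matches the quoted comment that tasks other than the
owner are not of interest.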

-- 
Kind regards,
Minchan Kim
