From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
rientjes@google.com
Subject: Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup v2
Date: Tue, 9 Feb 2010 14:57:22 +0530
Message-ID: <20100209092722.GE3290@balbir.in.ibm.com>
In-Reply-To: <20100209120209.686c348c.kamezawa.hiroyu@jp.fujitsu.com>
* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-02-09 12:02:09]:
> How about this ?
> Passed simple oom-kill test on mmotm-Feb06
> ==
> Now, oom-killer kills a process's children first. But this means
> a child in another cgroup can be killed, and this is not checked now.
>
> This patch fixes that.
>
> It's pointed out that task_lock in task_in_mem_cgroup is bad at
> killing a task in oom-killer.
I'll dig up the earlier thread to see what you mean.
> It can cause significant delay or
> deadlock. To remove the unnecessary task_lock under oom-killer, we
> use a looser check. Considering oom-killer and task-walk in the tasklist,
> checking "task is in mem_cgroup" itself includes some race and we don't
> have to do strict check, here.
> (IOW, we can't do it.)
>
> Changelog: 2010/02/09
> - modified task_in_mem_cgroup to be lockless.
>
> CC: Minchan Kim <minchan.kim@gmail.com>
> CC: David Rientjes <rientjes@google.com>
> CC: Balbir Singh <balbir@linux.vnet.ibm.com>
> CC: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> include/linux/memcontrol.h | 5 +++--
> mm/memcontrol.c | 32 ++++++++++++++++++++++++++++----
> mm/oom_kill.c | 6 ++++--
> 3 files changed, 35 insertions(+), 8 deletions(-)
>
> Index: mmotm-2.6.33-Feb06/include/linux/memcontrol.h
> ===================================================================
> --- mmotm-2.6.33-Feb06.orig/include/linux/memcontrol.h
> +++ mmotm-2.6.33-Feb06/include/linux/memcontrol.h
> @@ -71,7 +71,8 @@ extern unsigned long mem_cgroup_isolate_
> struct mem_cgroup *mem_cont,
> int active, int file);
> extern void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask);
> -int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem);
> +int task_in_oom_mem_cgroup(struct task_struct *task,
> + const struct mem_cgroup *mem);
>
> extern struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page);
> extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
> @@ -215,7 +216,7 @@ static inline int mm_match_cgroup(struct
> return 1;
> }
>
> -static inline int task_in_mem_cgroup(struct task_struct *task,
> +static inline int task_in_oom_mem_cgroup(struct task_struct *task,
> const struct mem_cgroup *mem)
> {
> return 1;
> Index: mmotm-2.6.33-Feb06/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.33-Feb06.orig/mm/memcontrol.c
> +++ mmotm-2.6.33-Feb06/mm/memcontrol.c
> @@ -781,16 +781,40 @@ void mem_cgroup_move_lists(struct page *
> mem_cgroup_add_lru_list(page, to);
> }
>
> -int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
> +/*
> + * This function is called from OOM Killer. This checks the task is mm_owner
> + * and checks its mem_cgroup is under oom.
> + */
> +int task_in_oom_mem_cgroup(struct task_struct *task,
> + const struct mem_cgroup *mem)
> {
> + struct mm_struct *mm;
> int ret;
> struct mem_cgroup *curr = NULL;
>
> - task_lock(task);
> + /*
> + * The task's task->mm pointer is guarded by task_lock() but it's
> + * risky to take task_lock in oom kill situation. Oom-killer may
> + * kill a task which is in unknown status and cause significant delay
> + * or deadlock.
task->mm is protected by task_lock() for several reasons including
race with exec() and exit(). The task structure itself is protected via RCU, so
task->task_lock. The OOM kill process should happen only when the
signal is delivered (at context switch back to user space). I don't
understand the race during OOM kill.
> + * So, we use some loose way. Because we're under tasklist lock, "task"
> + * pointer is always safe and we can access it. So, accessing mem_cgroup
> + * via task struct is safe. To check the task is mm owner, we do loose
> + * check. And this is enough.
> + * There is small race at updating mm->owner but we can ignore it.
> + * A problematic race here means that oom-selection logic by walking
> + * task list itself is racy. We can't make any strict guarantee between
> + * task's cgroup status and oom-killer selection, anyway. And, in real
> + * world, this will be no problem.
> + */
> + mm = task->mm;
With the task_lock() gone, I'm afraid we might find the wrong task for
OOM killing, specifically if the task is moving.
> + if (!mm || mm->owner != task)
> + return 0;
> rcu_read_lock();
> - curr = try_get_mem_cgroup_from_mm(task->mm);
> + curr = mem_cgroup_from_task(task);
> + if (!css_tryget(&curr->css))
> + curr = NULL;
> rcu_read_unlock();
> - task_unlock(task);
> if (!curr)
> return 0;
> /*
> Index: mmotm-2.6.33-Feb06/mm/oom_kill.c
> ===================================================================
> --- mmotm-2.6.33-Feb06.orig/mm/oom_kill.c
> +++ mmotm-2.6.33-Feb06/mm/oom_kill.c
> @@ -264,7 +264,7 @@ static struct task_struct *select_bad_pr
> /* skip the init task */
> if (is_global_init(p))
> continue;
> - if (mem && !task_in_mem_cgroup(p, mem))
> + if (mem && !task_in_oom_mem_cgroup(p, mem))
> continue;
>
> /*
> @@ -332,7 +332,7 @@ static void dump_tasks(const struct mem_
> do_each_thread(g, p) {
> struct mm_struct *mm;
>
> - if (mem && !task_in_mem_cgroup(p, mem))
> + if (mem && !task_in_oom_mem_cgroup(p, mem))
> continue;
> if (!thread_group_leader(p))
> continue;
> @@ -459,6 +459,8 @@ static int oom_kill_process(struct task_
> list_for_each_entry(c, &p->children, sibling) {
> if (c->mm == p->mm)
> continue;
> + if (mem && !task_in_oom_mem_cgroup(c, mem))
> + continue;
> if (!oom_kill_task(c))
> return 0;
> }
>
--
Three Cheers,
Balbir