From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	rientjes@google.com
Subject: Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup v2
Date: Tue, 9 Feb 2010 14:57:22 +0530
Message-ID: <20100209092722.GE3290@balbir.in.ibm.com>
In-Reply-To: <20100209120209.686c348c.kamezawa.hiroyu@jp.fujitsu.com>

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-02-09 12:02:09]:

> How about this?
> Passed a simple oom-kill test on mmotm-Feb06.
> ==
> Now, the oom-killer kills a process's children first. This means
> a child in another cgroup can be killed, and that is not checked now.
> 
> This patch fixes that.
> 
> It has been pointed out that taking task_lock in task_in_mem_cgroup is
> risky when the oom-killer is killing a task.

I'll dig the earlier thread to see what you mean.

> It can cause significant delay or
> deadlock. To remove the unnecessary task_lock under the oom-killer, we
> use a looser check. Given that the oom-killer walks the tasklist,
> checking "task is in mem_cgroup" is itself racy, so we don't
> have to do a strict check here.
> (IOW, we can't do it.)
> 
> Changelog: 2010/02/09
>  - modified task_in_mem_cgroup to be lockless.
> 
> CC: Minchan Kim <minchan.kim@gmail.com>
> CC: David Rientjes <rientjes@google.com>
> CC: Balbir Singh <balbir@linux.vnet.ibm.com>
> CC: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/memcontrol.h |    5 +++--
>  mm/memcontrol.c            |   32 ++++++++++++++++++++++++++++----
>  mm/oom_kill.c              |    6 ++++--
>  3 files changed, 35 insertions(+), 8 deletions(-)
> 
> Index: mmotm-2.6.33-Feb06/include/linux/memcontrol.h
> ===================================================================
> --- mmotm-2.6.33-Feb06.orig/include/linux/memcontrol.h
> +++ mmotm-2.6.33-Feb06/include/linux/memcontrol.h
> @@ -71,7 +71,8 @@ extern unsigned long mem_cgroup_isolate_
>  					struct mem_cgroup *mem_cont,
>  					int active, int file);
>  extern void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask);
> -int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem);
> +int task_in_oom_mem_cgroup(struct task_struct *task,
> +	const struct mem_cgroup *mem);
> 
>  extern struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page);
>  extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
> @@ -215,7 +216,7 @@ static inline int mm_match_cgroup(struct
>  	return 1;
>  }
> 
> -static inline int task_in_mem_cgroup(struct task_struct *task,
> +static inline int task_in_oom_mem_cgroup(struct task_struct *task,
>  				     const struct mem_cgroup *mem)
>  {
>  	return 1;
> Index: mmotm-2.6.33-Feb06/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.33-Feb06.orig/mm/memcontrol.c
> +++ mmotm-2.6.33-Feb06/mm/memcontrol.c
> @@ -781,16 +781,40 @@ void mem_cgroup_move_lists(struct page *
>  	mem_cgroup_add_lru_list(page, to);
>  }
> 
> -int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
> +/*
> + * This function is called from the OOM killer. It checks that the task is
> + * the mm owner and that its mem_cgroup is under OOM.
> + */
> +int task_in_oom_mem_cgroup(struct task_struct *task,
> +		const struct mem_cgroup *mem)
>  {
> +	struct mm_struct *mm;
>  	int ret;
>  	struct mem_cgroup *curr = NULL;
> 
> -	task_lock(task);
> +	/*
> + 	 * The task's task->mm pointer is guarded by task_lock(), but it's
> + 	 * risky to take task_lock in an oom-kill situation. The oom-killer may
> + 	 * kill a task which is in an unknown state, which can cause significant
> + 	 * delay or deadlock.

task->mm is protected by task_lock() for several reasons, including
races with exec() and exit(). The task structure itself is protected via RCU, so
task->task_lock. The OOM kill process should happen only when the
signal is delivered (at context switch back to user space). I don't
understand the race during OOM kill.

> + 	 * So, we use a looser approach. Because we're under the tasklist lock,
> + 	 * the "task" pointer is always safe to access, so accessing the
> + 	 * mem_cgroup via the task struct is safe. To check that the task is the
> + 	 * mm owner, we do a loose check, and this is enough.
> + 	 * There is a small race when mm->owner is updated, but we can ignore it.
> + 	 * A problematic race here would mean that the oom-selection logic of
> + 	 * walking the task list is itself racy. We can't make any strict
> + 	 * guarantee between a task's cgroup status and oom-killer selection
> + 	 * anyway, and in the real world this will be no problem.
> + 	 */
> +	mm = task->mm;

With the task_lock() gone, I'm afraid we might pick the wrong task for
OOM killing, especially if the task is moving between cgroups.
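
To make the concern concrete, here is a minimal userspace sketch of a
lockless membership check of this kind. It is not the kernel code:
struct task, struct mm, struct group and task_in_oom_group() are
hypothetical stand-ins, and the hierarchy handling of the real function
is left out. It only shows where an unlocked snapshot can go stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel structures. */
struct group { int id; };
struct task;
struct mm { struct task *owner; };
struct task {
	struct mm *mm;        /* NULL once the task has dropped its address space */
	struct group *group;  /* group the task currently belongs to */
};

/*
 * Loose membership check: read task->mm once, without any lock, and accept
 * the task only if it still owns that mm and belongs to the group under OOM.
 */
bool task_in_oom_group(const struct task *task, const struct group *oom)
{
	const struct mm *mm;

	assert(task != NULL && oom != NULL);
	mm = task->mm;                  /* single unlocked snapshot */
	if (mm == NULL || mm->owner != task)
		return false;           /* exited, or not the mm owner */
	return task->group == oom;      /* may be stale if the task is moving */
}
```

If another thread changes task->group between the owner check and the
final comparison, the result describes the old group, which is the case
I am worried about.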

> +	if (!mm || mm->owner != task)
> +		return 0;
>  	rcu_read_lock();
> -	curr = try_get_mem_cgroup_from_mm(task->mm);
> +	curr = mem_cgroup_from_task(task);
> +	if (!css_tryget(&curr->css))
> +		curr = NULL;
>  	rcu_read_unlock();
> -	task_unlock(task);
>  	if (!curr)
>  		return 0;
>  	/*
> Index: mmotm-2.6.33-Feb06/mm/oom_kill.c
> ===================================================================
> --- mmotm-2.6.33-Feb06.orig/mm/oom_kill.c
> +++ mmotm-2.6.33-Feb06/mm/oom_kill.c
> @@ -264,7 +264,7 @@ static struct task_struct *select_bad_pr
>  		/* skip the init task */
>  		if (is_global_init(p))
>  			continue;
> -		if (mem && !task_in_mem_cgroup(p, mem))
> +		if (mem && !task_in_oom_mem_cgroup(p, mem))
>  			continue;
> 
>  		/*
> @@ -332,7 +332,7 @@ static void dump_tasks(const struct mem_
>  	do_each_thread(g, p) {
>  		struct mm_struct *mm;
> 
> -		if (mem && !task_in_mem_cgroup(p, mem))
> +		if (mem && !task_in_oom_mem_cgroup(p, mem))
>  			continue;
>  		if (!thread_group_leader(p))
>  			continue;
> @@ -459,6 +459,8 @@ static int oom_kill_process(struct task_
>  	list_for_each_entry(c, &p->children, sibling) {
>  		if (c->mm == p->mm)
>  			continue;
> +		if (mem && !task_in_oom_mem_cgroup(c, mem))
> +			continue;
>  		if (!oom_kill_task(c))
>  			return 0;
>  	}
> 

-- 
	Three Cheers,
	Balbir

