public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/2] initial while_each_thread() fixes
@ 2013-12-02 15:24 Oleg Nesterov
  2013-12-02 15:24 ` [PATCH 1/2] introduce for_each_thread() to replace the buggy while_each_thread() Oleg Nesterov
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Oleg Nesterov @ 2013-12-02 15:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Frederic Weisbecker, Mandeep Singh Baines,
	Ma, Xindong, Michal Hocko, Sameer Nanda, Sergey Dyasly,
	Tu, Xiaobing, linux-kernel

Hello.

This was reported several times; I believe the first report is
http://marc.info/?l=linux-kernel&m=127688978121665. Hmm, 3 years
ago. The lockless while_each_thread() is racy and broken; almost
every user can loop forever.

Recently people started to report that they actually hit this problem
in oom_kill.c. This doesn't really matter and I can be wrong, but in
fact I do not think they really hit this race; it is very unlikely.
Another problem with while_each_thread() is that it is very easy
to use wrongly, and oom_kill.c is a good example.

I came to the conclusion that it is practically impossible to send a
single series which fixes all the problems; there are too many
different users.

So 1/2 adds the new for_each_thread() interface, and 2/2 fixes oom
kill as an example.

We obviously need a lot more changes like 2/2 before we can kill
while_each_thread() and task_struct->thread_group, but I hope they
will be straightforward. And in fact I hope that task->thread_group
can go away before we change all users of while_each_thread().

David, et al, I didn't actually test 2/2; I do not know how. Please
review, although it looks simple.

Oleg.

 include/linux/init_task.h |    2 ++
 include/linux/sched.h     |   12 ++++++++++++
 kernel/exit.c             |    1 +
 kernel/fork.c             |    7 +++++++
 mm/oom_kill.c             |   37 ++++++++++++++++++++-----------------
 5 files changed, 42 insertions(+), 17 deletions(-)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] introduce for_each_thread() to replace the buggy while_each_thread()
  2013-12-02 15:24 [PATCH 0/2] initial while_each_thread() fixes Oleg Nesterov
@ 2013-12-02 15:24 ` Oleg Nesterov
  2013-12-03 19:24   ` Sameer Nanda
  2013-12-02 15:24 ` [PATCH 2/2] oom_kill: change oom_kill.c to use for_each_thread() Oleg Nesterov
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2013-12-02 15:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Frederic Weisbecker, Mandeep Singh Baines,
	Ma, Xindong, Michal Hocko, Sameer Nanda, Sergey Dyasly,
	Tu, Xiaobing, linux-kernel

while_each_thread() and next_thread() should die; almost every
lockless usage is wrong.

1. Unless g == current, the lockless while_each_thread() is not safe.

   while_each_thread(g, t) can loop forever if g exits; next_thread()
   can't reach the unhashed thread in this case. Note that this can
   happen even if g is the group leader: it can exec.

2. Even if while_each_thread() itself was correct, people often use
   it wrongly.

   It was never safe to just take rcu_read_lock() and loop unless
   you verify that pid_alive(g) == T; even the first next_thread()
   can point to already freed/reused memory.

This patch adds signal_struct->thread_head and task->thread_node
to create the normal rcu-safe list with the stable head. The new
for_each_thread(g, t) helper is always safe under rcu_read_lock()
as long as this task_struct can't go away.

Note: of course it is ugly to have both task_struct->thread_node
and the old task_struct->thread_group; we will kill the old one
later, after we change the users of while_each_thread() to use
for_each_thread().

Perhaps we can kill it even before we convert all users, we can
reimplement next_thread(t) using the new thread_head/thread_node.
But we can't do this right now because it would lead to subtle
behavioural changes. For example, do/while_each_thread() always
sees at least one task, while for_each_thread() can do nothing if
the whole thread group has died. Or thread_group_empty(): currently
its semantics are not clear unless thread_group_leader(p), and we
need to audit the callers before we can change it.

So this patch adds the new interface, which has to coexist with the
old one for some time; hopefully the next changes will be more or
less straightforward and the old one will go away soon.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/init_task.h |    2 ++
 include/linux/sched.h     |   12 ++++++++++++
 kernel/exit.c             |    1 +
 kernel/fork.c             |    7 +++++++
 4 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index b0ed422..668aae0 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -40,6 +40,7 @@ extern struct fs_struct init_fs;
 
 #define INIT_SIGNALS(sig) {						\
 	.nr_threads	= 1,						\
+	.thread_head	= LIST_HEAD_INIT(init_task.thread_node),	\
 	.wait_chldexit	= __WAIT_QUEUE_HEAD_INITIALIZER(sig.wait_chldexit),\
 	.shared_pending	= { 						\
 		.list = LIST_HEAD_INIT(sig.shared_pending.list),	\
@@ -213,6 +214,7 @@ extern struct task_group root_task_group;
 		[PIDTYPE_SID]  = INIT_PID_LINK(PIDTYPE_SID),		\
 	},								\
 	.thread_group	= LIST_HEAD_INIT(tsk.thread_group),		\
+	.thread_node	= LIST_HEAD_INIT(init_signals.thread_head),	\
 	INIT_IDS							\
 	INIT_PERF_EVENTS(tsk)						\
 	INIT_TRACE_IRQFLAGS						\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index dc48056..0550083 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -487,6 +487,7 @@ struct signal_struct {
 	atomic_t		sigcnt;
 	atomic_t		live;
 	int			nr_threads;
+	struct list_head	thread_head;
 
 	wait_queue_head_t	wait_chldexit;	/* for wait4() */
 
@@ -1162,6 +1163,7 @@ struct task_struct {
 	/* PID/PID hash table linkage. */
 	struct pid_link pids[PIDTYPE_MAX];
 	struct list_head thread_group;
+	struct list_head thread_node;
 
 	struct completion *vfork_done;		/* for vfork() */
 	int __user *set_child_tid;		/* CLONE_CHILD_SETTID */
@@ -2225,6 +2227,16 @@ extern bool current_is_single_threaded(void);
 #define while_each_thread(g, t) \
 	while ((t = next_thread(t)) != g)
 
+#define __for_each_thread(signal, t)	\
+	list_for_each_entry_rcu(t, &(signal)->thread_head, thread_node)
+
+#define for_each_thread(p, t)		\
+	__for_each_thread((p)->signal, t)
+
+/* Careful: this is a double loop, 'break' won't work as expected. */
+#define for_each_process_thread(p, t)	\
+	for_each_process(p) for_each_thread(p, t)
+
 static inline int get_nr_threads(struct task_struct *tsk)
 {
 	return tsk->signal->nr_threads;
diff --git a/kernel/exit.c b/kernel/exit.c
index a949819..1e77fc6 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -74,6 +74,7 @@ static void __unhash_process(struct task_struct *p, bool group_dead)
 		__this_cpu_dec(process_counts);
 	}
 	list_del_rcu(&p->thread_group);
+	list_del_rcu(&p->thread_node);
 }
 
 /*
diff --git a/kernel/fork.c b/kernel/fork.c
index 9be5154..b84bef7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1034,6 +1034,11 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 	sig->nr_threads = 1;
 	atomic_set(&sig->live, 1);
 	atomic_set(&sig->sigcnt, 1);
+
+	/* list_add(thread_node, thread_head) without INIT_LIST_HEAD() */
+	sig->thread_head = (struct list_head)LIST_HEAD_INIT(tsk->thread_node);
+	tsk->thread_node = (struct list_head)LIST_HEAD_INIT(sig->thread_head);
+
 	init_waitqueue_head(&sig->wait_chldexit);
 	sig->curr_target = tsk;
 	init_sigpending(&sig->shared_pending);
@@ -1470,6 +1475,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 			atomic_inc(&current->signal->sigcnt);
 			list_add_tail_rcu(&p->thread_group,
 					  &p->group_leader->thread_group);
+			list_add_tail_rcu(&p->thread_node,
+					  &p->signal->thread_head);
 		}
 		attach_pid(p, PIDTYPE_PID);
 		nr_threads++;
-- 
1.5.5.1



* [PATCH 2/2] oom_kill: change oom_kill.c to use for_each_thread()
  2013-12-02 15:24 [PATCH 0/2] initial while_each_thread() fixes Oleg Nesterov
  2013-12-02 15:24 ` [PATCH 1/2] introduce for_each_thread() to replace the buggy while_each_thread() Oleg Nesterov
@ 2013-12-02 15:24 ` Oleg Nesterov
  2013-12-03 18:57   ` Sameer Nanda
  2013-12-02 15:34 ` [PATCH 0/2] initial while_each_thread() fixes Oleg Nesterov
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2013-12-02 15:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Frederic Weisbecker, Mandeep Singh Baines,
	Ma, Xindong, Michal Hocko, Sameer Nanda, Sergey Dyasly,
	Tu, Xiaobing, linux-kernel

1. Change oom_kill.c to use for_each_thread() rather than the racy
   while_each_thread(), which can loop forever if we race with exit.

   Note also that most users were buggy even if while_each_thread()
   was fine; the task can exit even _before_ rcu_read_lock().

   Fortunately the new for_each_thread() only requires a stable
   task_struct, so this change fixes both problems.

2. At least out_of_memory() calls has_intersects_mems_allowed()
   without even rcu_read_lock(); this is obviously buggy.

   Add the necessary rcu_read_lock(). This means that we cannot
   simply return from the loop; we need "bool ret" and "break".

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 mm/oom_kill.c |   37 ++++++++++++++++++++-----------------
 1 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1e4a600..47dd4ce 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -54,12 +54,14 @@ static DEFINE_SPINLOCK(zone_scan_lock);
  * shares the same mempolicy nodes as current if it is bound by such a policy
  * and whether or not it has the same set of allowed cpuset nodes.
  */
-static bool has_intersects_mems_allowed(struct task_struct *tsk,
+static bool has_intersects_mems_allowed(struct task_struct *start,
 					const nodemask_t *mask)
 {
-	struct task_struct *start = tsk;
+	struct task_struct *tsk;
+	bool ret = false;
 
-	do {
+	rcu_read_lock();
+	for_each_thread(start, tsk) {
 		if (mask) {
 			/*
 			 * If this is a mempolicy constrained oom, tsk's
@@ -67,19 +69,20 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
 			 * mempolicy intersects current, otherwise it may be
 			 * needlessly killed.
 			 */
-			if (mempolicy_nodemask_intersects(tsk, mask))
-				return true;
+			ret = mempolicy_nodemask_intersects(tsk, mask);
 		} else {
 			/*
 			 * This is not a mempolicy constrained oom, so only
 			 * check the mems of tsk's cpuset.
 			 */
-			if (cpuset_mems_allowed_intersects(current, tsk))
-				return true;
+			ret = cpuset_mems_allowed_intersects(current, tsk);
 		}
-	} while_each_thread(start, tsk);
+		if (ret)
+			break;
+	}
+	rcu_read_unlock();
 
-	return false;
+	return ret;
 }
 #else
 static bool has_intersects_mems_allowed(struct task_struct *tsk,
@@ -97,14 +100,14 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
  */
 struct task_struct *find_lock_task_mm(struct task_struct *p)
 {
-	struct task_struct *t = p;
+	struct task_struct *t;
 
-	do {
+	for_each_thread(p, t) {
 		task_lock(t);
 		if (likely(t->mm))
 			return t;
 		task_unlock(t);
-	} while_each_thread(p, t);
+	}
 
 	return NULL;
 }
@@ -301,7 +304,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
 	unsigned long chosen_points = 0;
 
 	rcu_read_lock();
-	do_each_thread(g, p) {
+	for_each_process_thread(g, p) {
 		unsigned int points;
 
 		switch (oom_scan_process_thread(p, totalpages, nodemask,
@@ -323,7 +326,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
 			chosen = p;
 			chosen_points = points;
 		}
-	} while_each_thread(g, p);
+	}
 	if (chosen)
 		get_task_struct(chosen);
 	rcu_read_unlock();
@@ -406,7 +409,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 {
 	struct task_struct *victim = p;
 	struct task_struct *child;
-	struct task_struct *t = p;
+	struct task_struct *t;
 	struct mm_struct *mm;
 	unsigned int victim_points = 0;
 	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
@@ -437,7 +440,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	 * still freeing memory.
 	 */
 	read_lock(&tasklist_lock);
-	do {
+	for_each_thread(p, t) {
 		list_for_each_entry(child, &t->children, sibling) {
 			unsigned int child_points;
 
@@ -455,7 +458,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 				get_task_struct(victim);
 			}
 		}
-	} while_each_thread(p, t);
+	}
 	read_unlock(&tasklist_lock);
 
 	rcu_read_lock();
-- 
1.5.5.1



* Re: [PATCH 0/2] initial while_each_thread() fixes
  2013-12-02 15:24 [PATCH 0/2] initial while_each_thread() fixes Oleg Nesterov
  2013-12-02 15:24 ` [PATCH 1/2] introduce for_each_thread() to replace the buggy while_each_thread() Oleg Nesterov
  2013-12-02 15:24 ` [PATCH 2/2] oom_kill: change oom_kill.c to use for_each_thread() Oleg Nesterov
@ 2013-12-02 15:34 ` Oleg Nesterov
  2013-12-03 15:46 ` Sergey Dyasly
  2013-12-03 16:53 ` William Dauchy
  4 siblings, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2013-12-02 15:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Frederic Weisbecker, Mandeep Singh Baines,
	Ma, Xindong, Michal Hocko, Sameer Nanda, Sergey Dyasly,
	Tu, Xiaobing, linux-kernel

On 12/02, Oleg Nesterov wrote:
>
> The lockless while_each_thread() is racy and broken, almost
> every user can loop forever.
> ...
> Another problem with while_each_thread() is that it is very easy
> to use it wrongly, and oom_kill.c is the good example.

Forgot to mention: it is not necessarily safe even under
tasklist_lock if it is used wrongly. Again, oom_kill.c is a good
example; oom_kill_process() does read_lock(&tasklist_lock) but
it doesn't verify that p is still alive.

The new for_each_thread() is much simpler in this respect, it
only needs the stable task_struct.

Oleg.



* Re: [PATCH 0/2] initial while_each_thread() fixes
  2013-12-02 15:24 [PATCH 0/2] initial while_each_thread() fixes Oleg Nesterov
                   ` (2 preceding siblings ...)
  2013-12-02 15:34 ` [PATCH 0/2] initial while_each_thread() fixes Oleg Nesterov
@ 2013-12-03 15:46 ` Sergey Dyasly
  2013-12-03 18:52   ` Oleg Nesterov
  2013-12-03 19:01   ` Oleg Nesterov
  2013-12-03 16:53 ` William Dauchy
  4 siblings, 2 replies; 15+ messages in thread
From: Sergey Dyasly @ 2013-12-03 15:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, David Rientjes, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Tu, Xiaobing, linux-kernel

Hi Oleg,

I was waiting for this one!

On Mon, 2 Dec 2013 16:24:23 +0100
Oleg Nesterov <oleg@redhat.com> wrote:

> Hello.
> 
> This was reported several times, I believe the first report is
> http://marc.info/?l=linux-kernel&m=127688978121665. Hmm, 3 years
> ago. The lockless while_each_thread() is racy and broken, almost
> every user can loop forever.
> 
> Recently people started to report they actually hit this problem in
> oom_kill.c. This doesn't really matter and I can be wrong, but in
> fact I do not think they really hit this race, it is very unlikely.

The race is very easy to catch if you have a process with several threads,
all of which allocate memory simultaneously. This leads to:

  1) OOMk selects and sends SIGKILL to one of the threads

  2) another thread invokes OOMk and the first thread gets selected,
     but it gets unhashed before while_each_thread...

> Another problem with while_each_thread() is that it is very easy
> to use it wrongly, and oom_kill.c is the good example.
> 
> I came to conclusion that it is practically impossible to send a
> single series which fixes all problems, too many different users.
> 
> So 1/2 adds the new for_each_thread() interface, and 2/2 fixes oom
> kill as an example.
> 
> We obviously need a lot more changes like 2/2 before we can kill
> while_each_thread() and task_struct->thread_group, but I hope they
> will be straightforward. And in fact I hope that task->thread_group
> can go away before we change all users of while_each_thread().
> 
> David, et al, I didn't actually test 2/2; I do not know how. Please
> review, although it looks simple.

The patches look correct and my test case no longer hangs, so

Reviewed-and-Tested-by: Sergey Dyasly <dserrg@gmail.com>

> 
> Oleg.
> 
>  include/linux/init_task.h |    2 ++
>  include/linux/sched.h     |   12 ++++++++++++
>  kernel/exit.c             |    1 +
>  kernel/fork.c             |    7 +++++++
>  mm/oom_kill.c             |   37 ++++++++++++++++++++-----------------
>  5 files changed, 42 insertions(+), 17 deletions(-)
> 

-- 
Sergey Dyasly <dserrg@gmail.com>


* Re: [PATCH 0/2] initial while_each_thread() fixes
  2013-12-02 15:24 [PATCH 0/2] initial while_each_thread() fixes Oleg Nesterov
                   ` (3 preceding siblings ...)
  2013-12-03 15:46 ` Sergey Dyasly
@ 2013-12-03 16:53 ` William Dauchy
  2013-12-03 20:22   ` Oleg Nesterov
  4 siblings, 1 reply; 15+ messages in thread
From: William Dauchy @ 2013-12-03 16:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, David Rientjes, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel@vger.kernel.org

Hello Oleg,

On Mon, Dec 2, 2013 at 4:24 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> This was reported several times, I believe the first report is
> http://marc.info/?l=linux-kernel&m=127688978121665. Hmm, 3 years
> ago. The lockless while_each_thread() is racy and broken, almost
> every user can loop forever.
>
> Recently people started to report they actually hit this problem in
> oom_kill.c. This doesn't really matter and I can be wrong, but in
> fact I do not think they really hit this race, it is very unlikely.
> Another problem with while_each_thread() is that it is very easy
> to use it wrongly, and oom_kill.c is the good example.
>
> I came to conclusion that it is practically impossible to send a
> single series which fixes all problems, too many different users.
>
> So 1/2 adds the new for_each_thread() interface, and 2/2 fixes oom
> kill as an example.
>
> We obviously need a lot more changes like 2/2 before we can kill
> while_each_thread() and task_struct->thread_group, but I hope they
> will be straightforward. And in fact I hope that task->thread_group
> can go away before we change all users of while_each_thread().
>
> David, et al, I didn't actually test 2/2; I do not know how. Please
> review, although it looks simple.

I was wondering if this patch was also targeted for the stable branch?
Before this patch, I was testing this one,
https://lkml.org/lkml/2013/11/13/336, which fixes my oom issues.

I applied the two patches on top of a 3.10.x kernel and got some tasks
stalled after the first OOM:

php5-fpm invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
php5-fpm cpuset=VM_X mems_allowed=0-1
CPU: 21 PID: 28256 Comm: php5-fpm Not tainted 3.10.22 #1
Hardware name: Dell Inc. PowerEdge C8220/0TDN55, BIOS 1.1.17 01/09/2013
 ffffc9001e3c9000 ffff881fbacbfbe8 ffffffff81569220 ffff881fbacbfc88
 ffffffff815662f1 ffff882011bb4800 0000000000020000 ffff881fbacbfc28
 ffffffff8156d133 0000000000000206 0000000000000206 ffff881f00000000
Call Trace:
 [<ffffffff81569220>] dump_stack+0x19/0x21
 [<ffffffff815662f1>] dump_header+0x7a/0x26c
 [<ffffffff8156d133>] ? preempt_schedule+0x33/0x50
 [<ffffffff8156e597>] ? _raw_spin_unlock_irqrestore+0x67/0x70
 [<ffffffff8124b026>] ? ___ratelimit+0xa6/0x130
 [<ffffffff810cb2b0>] oom_kill_process+0x2a0/0x440
 [<ffffffff8104e0f0>] ? has_capability+0x20/0x20
 [<ffffffff8111c8bd>] mem_cgroup_oom_synchronize+0x59d/0x5c0
 [<ffffffff8111bbc0>] ? mem_cgroup_charge_common+0xa0/0xa0
 [<ffffffff810cbc03>] pagefault_out_of_memory+0x13/0x90
 [<ffffffff81564022>] mm_fault_error+0xb8/0x19b
 [<ffffffff8102b073>] __do_page_fault+0x543/0x5b0
 [<ffffffff8156c6ac>] ? __schedule+0x3dc/0xd90
 [<ffffffff8107c970>] ? hrtick_update+0x70/0x70
 [<ffffffff8109be27>] ? SyS_futex+0x97/0x2f0
 [<ffffffff8156f21a>] ? retint_swapgs_pax+0xd/0x12
 [<ffffffff81252621>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff8102b119>] do_page_fault+0x9/0x20
 [<ffffffff8156f4c8>] page_fault+0x38/0x40
Task in /lxc/VM_X killed as a result of limit of /lxc/VM_X
memory: usage 254644kB, limit 262144kB, failcnt 38594
memory+swap: usage 524288kB, limit 524288kB, failcnt 803
kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
Memory cgroup stats for /lxc/VM_X: cache:196KB rss:252116KB
rss_huge:4096KB mapped_file:164KB swap:274236KB inactive_anon:124788KB
active_anon:124332KB inactive_file:0KB active_file:0KB unevictable:0KB
[ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[16017]     0 16017     4444      336      14       94             0 paas-start
[18020]  5101 18020    64126     2113      57     4645         -1000 mysqld
[19162]  5000 19162    87663      463     122     1396             0 php5-fpm
[19216]  5001 19216    24156      473      51      444             0 apache2
[20257]  5001 20257   176880      656     122     2600             0 apache2
[20353]     0 20353     1023      108       8       21             0 sleep
[27746]  5000 27746    90454     1025     139     4316             0 php5-fpm
[28176]  5000 28176   348410    59568     547    53877             0 php5-fpm
INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 21,
t=15014 jiffies, g=65569, c=65568, q=6537)
INFO: Stall ended before state dump start
INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 21,
t=60019 jiffies, g=65569, c=65568, q=15632)
INFO: Stall ended before state dump start
INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 21,
t=105024 jiffies, g=65569, c=65568, q=32303)
INFO: Stall ended before state dump start
INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 21,
t=150029 jiffies, g=65569, c=65568, q=43552)
INFO: Stall ended before state dump start
INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 5,
t=195034 jiffies, g=65569, c=65568, q=194557)
INFO: Stall ended before state dump start
INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 21,
t=240039 jiffies, g=65569, c=65568, q=205866)
INFO: Stall ended before state dump start
INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 21,
t=285044 jiffies, g=65569, c=65568, q=238233)
INFO: Stall ended before state dump start
INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 21,
t=330049 jiffies, g=65569, c=65568, q=274925)
INFO: Stall ended before state dump start
INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 5,
t=375054 jiffies, g=65569, c=65568, q=297436)
INFO: Stall ended before state dump start

-- 
William


* Re: [PATCH 0/2] initial while_each_thread() fixes
  2013-12-03 15:46 ` Sergey Dyasly
@ 2013-12-03 18:52   ` Oleg Nesterov
  2013-12-03 19:01   ` Oleg Nesterov
  1 sibling, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2013-12-03 18:52 UTC (permalink / raw)
  To: Sergey Dyasly
  Cc: Andrew Morton, David Rientjes, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Tu, Xiaobing, linux-kernel

Hi Sergey,

On 12/03, Sergey Dyasly wrote:
>
> I was waiting for this one!

I know ;) Sorry for the delay, and thanks for the review!

Oleg.



* Re: [PATCH 2/2] oom_kill: change oom_kill.c to use for_each_thread()
  2013-12-02 15:24 ` [PATCH 2/2] oom_kill: change oom_kill.c to use for_each_thread() Oleg Nesterov
@ 2013-12-03 18:57   ` Sameer Nanda
  2013-12-03 20:05     ` Oleg Nesterov
  2013-12-04 12:57     ` Oleg Nesterov
  0 siblings, 2 replies; 15+ messages in thread
From: Sameer Nanda @ 2013-12-03 18:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, David Rientjes, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sergey Dyasly,
	Tu, Xiaobing, linux-kernel

Thanks for helping get this fixed!  A couple of comments below.

On Mon, Dec 2, 2013 at 7:24 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> 1. Change oom_kill.c to use for_each_thread() rather than the racy
>    while_each_thread() which can loop forever if we race with exit.
>
>    Note also that most users were buggy even if while_each_thread()
>    was fine, the task can exit even _before_ rcu_read_lock().
>
>    Fortunately the new for_each_thread() only requires the stable
>    task_struct, so this change fixes both problems.
>
> 2. At least out_of_memory() calls has_intersects_mems_allowed()
>    without even rcu_read_lock(), this is obviously buggy.
>
>    Add the necessary rcu_read_lock(). This means that we can not
>    simply return from the loop, we need "bool ret" and "break".
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Sameer Nanda <snanda@chromium.org>

> ---
>  mm/oom_kill.c |   37 ++++++++++++++++++++-----------------
>  1 files changed, 20 insertions(+), 17 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 1e4a600..47dd4ce 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -54,12 +54,14 @@ static DEFINE_SPINLOCK(zone_scan_lock);
>   * shares the same mempolicy nodes as current if it is bound by such a policy
>   * and whether or not it has the same set of allowed cpuset nodes.
>   */
> -static bool has_intersects_mems_allowed(struct task_struct *tsk,
> +static bool has_intersects_mems_allowed(struct task_struct *start,
Comment block needs to be fixed up due to variable name change from
tsk to start.

>                                         const nodemask_t *mask)
>  {
> -       struct task_struct *start = tsk;
> +       struct task_struct *tsk;
> +       bool ret = false;
>
> -       do {
> +       rcu_read_lock();
> +       for_each_thread(start, tsk) {
>                 if (mask) {
>                         /*
>                          * If this is a mempolicy constrained oom, tsk's
> @@ -67,19 +69,20 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
>                          * mempolicy intersects current, otherwise it may be
>                          * needlessly killed.
>                          */
> -                       if (mempolicy_nodemask_intersects(tsk, mask))
> -                               return true;
> +                       ret = mempolicy_nodemask_intersects(tsk, mask);
>                 } else {
>                         /*
>                          * This is not a mempolicy constrained oom, so only
>                          * check the mems of tsk's cpuset.
>                          */
> -                       if (cpuset_mems_allowed_intersects(current, tsk))
> -                               return true;
> +                       ret = cpuset_mems_allowed_intersects(current, tsk);
>                 }
> -       } while_each_thread(start, tsk);
> +               if (ret)
> +                       break;
> +       }
> +       rcu_read_unlock();
>
> -       return false;
> +       return ret;
>  }
>  #else
>  static bool has_intersects_mems_allowed(struct task_struct *tsk,
> @@ -97,14 +100,14 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
>   */
>  struct task_struct *find_lock_task_mm(struct task_struct *p)
>  {
> -       struct task_struct *t = p;
> +       struct task_struct *t;
>
> -       do {
> +       for_each_thread(p, t) {
>                 task_lock(t);
>                 if (likely(t->mm))
>                         return t;
>                 task_unlock(t);
> -       } while_each_thread(p, t);
> +       }
>
>         return NULL;
>  }
> @@ -301,7 +304,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>         unsigned long chosen_points = 0;
>
>         rcu_read_lock();
> -       do_each_thread(g, p) {
> +       for_each_process_thread(g, p) {
>                 unsigned int points;
>
>                 switch (oom_scan_process_thread(p, totalpages, nodemask,
> @@ -323,7 +326,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>                         chosen = p;
>                         chosen_points = points;
>                 }
> -       } while_each_thread(g, p);
> +       }
>         if (chosen)
>                 get_task_struct(chosen);
>         rcu_read_unlock();
> @@ -406,7 +409,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  {
>         struct task_struct *victim = p;
>         struct task_struct *child;
> -       struct task_struct *t = p;
> +       struct task_struct *t;
>         struct mm_struct *mm;
>         unsigned int victim_points = 0;
>         static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
> @@ -437,7 +440,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>          * still freeing memory.
>          */
>         read_lock(&tasklist_lock);
This can be an rcu_read_lock() now, I think?

> -       do {
> +       for_each_thread(p, t) {
>                 list_for_each_entry(child, &t->children, sibling) {
>                         unsigned int child_points;
>
> @@ -455,7 +458,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>                                 get_task_struct(victim);
>                         }
>                 }
> -       } while_each_thread(p, t);
> +       }
>         read_unlock(&tasklist_lock);
>
>         rcu_read_lock();
> --
> 1.5.5.1
>



-- 
Sameer


* Re: [PATCH 0/2] initial while_each_thread() fixes
  2013-12-03 15:46 ` Sergey Dyasly
  2013-12-03 18:52   ` Oleg Nesterov
@ 2013-12-03 19:01   ` Oleg Nesterov
  1 sibling, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2013-12-03 19:01 UTC (permalink / raw)
  To: Sergey Dyasly
  Cc: Andrew Morton, David Rientjes, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Tu, Xiaobing, linux-kernel

argh, sorry, I didn't finish my email...

On 12/03, Sergey Dyasly wrote:
>
> On Mon, 2 Dec 2013 16:24:23 +0100
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > Recently people started to report they actually hit this problem in
> > oom_kill.c. This doesn't really matter and I can be wrong, but in
> > fact I do not think they really hit this race, it is very unlikely.
>
> The race is very easy to catch if you have a process with several threads,
> all of which allocates memory simultaneously. This leads to:
>
>   1) OOMk selects and sends SIGKILL to one of the threads
>
>   2) another thread invokes OOMk and the first thread gets selected,
>      but it gets unhashed before while_each_thread...

Yes, but this is what I meant.

It was unhashed even before while_each_thread(g), and it should
never be used unless you ensure that g is still alive.

But this doesn't matter; while_each_thread() was buggy anyway. And
(perhaps even more importantly) it was not easy to use correctly,
so I finally decided to add another helper which only needs a
stable task_struct.

> The patches look correct and my test case no longer hangs, so
>
> Reviewed-and-Tested-by: Sergey Dyasly <dserrg@gmail.com>

Thanks!

Oleg.



* Re: [PATCH 1/2] introduce for_each_thread() to replace the buggy while_each_thread()
  2013-12-02 15:24 ` [PATCH 1/2] introduce for_each_thread() to replace the buggy while_each_thread() Oleg Nesterov
@ 2013-12-03 19:24   ` Sameer Nanda
  0 siblings, 0 replies; 15+ messages in thread
From: Sameer Nanda @ 2013-12-03 19:24 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, David Rientjes, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sergey Dyasly,
	Tu, Xiaobing, linux-kernel

On Mon, Dec 2, 2013 at 7:24 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> while_each_thread() and next_thread() should die; almost every
> lockless usage is wrong.
>
> 1. Unless g == current, the lockless while_each_thread() is not safe.
>
>    while_each_thread(g, t) can loop forever if g exits: next_thread()
>    can't reach the unhashed thread in this case. Note that this can
>    happen even if g is the group leader; it can exec.
>
> 2. Even if while_each_thread() itself was correct, people often use
>    it wrongly.
>
>    It was never safe to just take rcu_read_lock() and loop unless
>    you verify that pid_alive(g) == T; even the first next_thread()
>    can point to already freed/reused memory.
>
> This patch adds signal_struct->thread_head and task->thread_node
> to create the normal rcu-safe list with the stable head. The new
> for_each_thread(g, t) helper is always safe under rcu_read_lock()
> as long as this task_struct can't go away.
>
> Note: of course it is ugly to have both task_struct->thread_node
> and the old task_struct->thread_group; the latter will be killed
> later, after we change the users of while_each_thread() to use
> for_each_thread().
>
> Perhaps we can kill it even before we convert all users: we can
> reimplement next_thread(t) using the new thread_head/thread_node.
> But we can't do this right now because this will lead to subtle
> behavioural changes. For example, do/while_each_thread() always
> sees at least one task, while for_each_thread() can do nothing if
> the whole thread group has died. Or thread_group_empty(): currently
> its semantics are not clear unless thread_group_leader(p), and we
> need to audit the callers before we can change it.
>
> So this patch adds the new interface which has to coexist with the
> old one for some time, hopefully the next changes will be more or
> less straightforward and the old one will go away soon.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Sameer Nanda <snanda@chromium.org>

> ---
>  include/linux/init_task.h |    2 ++
>  include/linux/sched.h     |   12 ++++++++++++
>  kernel/exit.c             |    1 +
>  kernel/fork.c             |    7 +++++++
>  4 files changed, 22 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index b0ed422..668aae0 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -40,6 +40,7 @@ extern struct fs_struct init_fs;
>
>  #define INIT_SIGNALS(sig) {                                            \
>         .nr_threads     = 1,                                            \
> +       .thread_head    = LIST_HEAD_INIT(init_task.thread_node),        \
>         .wait_chldexit  = __WAIT_QUEUE_HEAD_INITIALIZER(sig.wait_chldexit),\
>         .shared_pending = {                                             \
>                 .list = LIST_HEAD_INIT(sig.shared_pending.list),        \
> @@ -213,6 +214,7 @@ extern struct task_group root_task_group;
>                 [PIDTYPE_SID]  = INIT_PID_LINK(PIDTYPE_SID),            \
>         },                                                              \
>         .thread_group   = LIST_HEAD_INIT(tsk.thread_group),             \
> +       .thread_node    = LIST_HEAD_INIT(init_signals.thread_head),     \
>         INIT_IDS                                                        \
>         INIT_PERF_EVENTS(tsk)                                           \
>         INIT_TRACE_IRQFLAGS                                             \
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index dc48056..0550083 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -487,6 +487,7 @@ struct signal_struct {
>         atomic_t                sigcnt;
>         atomic_t                live;
>         int                     nr_threads;
> +       struct list_head        thread_head;
>
>         wait_queue_head_t       wait_chldexit;  /* for wait4() */
>
> @@ -1162,6 +1163,7 @@ struct task_struct {
>         /* PID/PID hash table linkage. */
>         struct pid_link pids[PIDTYPE_MAX];
>         struct list_head thread_group;
> +       struct list_head thread_node;
>
>         struct completion *vfork_done;          /* for vfork() */
>         int __user *set_child_tid;              /* CLONE_CHILD_SETTID */
> @@ -2225,6 +2227,16 @@ extern bool current_is_single_threaded(void);
>  #define while_each_thread(g, t) \
>         while ((t = next_thread(t)) != g)
>
> +#define __for_each_thread(signal, t)   \
> +       list_for_each_entry_rcu(t, &(signal)->thread_head, thread_node)
> +
> +#define for_each_thread(p, t)          \
> +       __for_each_thread((p)->signal, t)
> +
> +/* Careful: this is a double loop, 'break' won't work as expected. */
> +#define for_each_process_thread(p, t)  \
> +       for_each_process(p) for_each_thread(p, t)
> +
>  static inline int get_nr_threads(struct task_struct *tsk)
>  {
>         return tsk->signal->nr_threads;
> diff --git a/kernel/exit.c b/kernel/exit.c
> index a949819..1e77fc6 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -74,6 +74,7 @@ static void __unhash_process(struct task_struct *p, bool group_dead)
>                 __this_cpu_dec(process_counts);
>         }
>         list_del_rcu(&p->thread_group);
> +       list_del_rcu(&p->thread_node);
>  }
>
>  /*
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 9be5154..b84bef7 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1034,6 +1034,11 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
>         sig->nr_threads = 1;
>         atomic_set(&sig->live, 1);
>         atomic_set(&sig->sigcnt, 1);
> +
> +       /* list_add(thread_node, thread_head) without INIT_LIST_HEAD() */
> +       sig->thread_head = (struct list_head)LIST_HEAD_INIT(tsk->thread_node);
> +       tsk->thread_node = (struct list_head)LIST_HEAD_INIT(sig->thread_head);
> +
>         init_waitqueue_head(&sig->wait_chldexit);
>         sig->curr_target = tsk;
>         init_sigpending(&sig->shared_pending);
> @@ -1470,6 +1475,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>                         atomic_inc(&current->signal->sigcnt);
>                         list_add_tail_rcu(&p->thread_group,
>                                           &p->group_leader->thread_group);
> +                       list_add_tail_rcu(&p->thread_node,
> +                                         &p->signal->thread_head);
>                 }
>                 attach_pid(p, PIDTYPE_PID);
>                 nr_threads++;
> --
> 1.5.5.1
>



-- 
Sameer

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] oom_kill: change oom_kill.c to use for_each_thread()
  2013-12-03 18:57   ` Sameer Nanda
@ 2013-12-03 20:05     ` Oleg Nesterov
  2013-12-03 20:50       ` Oleg Nesterov
  2013-12-04 12:57     ` Oleg Nesterov
  1 sibling, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2013-12-03 20:05 UTC (permalink / raw)
  To: Sameer Nanda
  Cc: Andrew Morton, David Rientjes, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sergey Dyasly,
	Tu, Xiaobing, linux-kernel

On 12/03, Sameer Nanda wrote:
>
> Reviewed-by: Sameer Nanda <snanda@chromium.org>

Thanks!

> > -static bool has_intersects_mems_allowed(struct task_struct *tsk,
> > +static bool has_intersects_mems_allowed(struct task_struct *start,
> Comment block needs to be fixed up due to variable name change from
> tsk to start.

thanks again, I'll update the patch.

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] initial while_each_thread() fixes
  2013-12-03 16:53 ` William Dauchy
@ 2013-12-03 20:22   ` Oleg Nesterov
  2013-12-03 20:28     ` William Dauchy
  0 siblings, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2013-12-03 20:22 UTC (permalink / raw)
  To: William Dauchy
  Cc: Andrew Morton, David Rientjes, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel@vger.kernel.org

Hi William,

On 12/03, William Dauchy wrote:
>
> I was wondering if this patch was also targeted for stable branch?

Unlikely... but we will see.

> Before this patch, I was testing this one
> https://lkml.org/lkml/2013/11/13/336

perhaps this patch makes more sense for stable.

But, to clarify just in case, it is not needed after this series.

> which is fixing my oom issues.

Yes, but it doesn't fix all problems even in mm/oom_kill.c, and
we need to fix while_each_thread() anyway.

> I applied the two patches on top of a 3.10.x and got some tasks
> stalled after the first OOM:

So you are saying that this was introduced by this series?

Could you retest with the recent kernel?

> INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 21,
> t=15014 jiffies, g=65569, c=65568, q=6537)

This series does not expand the rcu-locked sections, except that it
adds rcu_read_lock() into has_intersects_mems_allowed(), and that is
an obvious bugfix.

So far I _think_ that this series should not be blamed for that, but
I'll try to recheck.

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] initial while_each_thread() fixes
  2013-12-03 20:22   ` Oleg Nesterov
@ 2013-12-03 20:28     ` William Dauchy
  0 siblings, 0 replies; 15+ messages in thread
From: William Dauchy @ 2013-12-03 20:28 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, David Rientjes, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel@vger.kernel.org

On Tue, Dec 3, 2013 at 9:22 PM, Oleg Nesterov <oleg@redhat.com> wrote:
>> Before this patch, I was testing this one
>> https://lkml.org/lkml/2013/11/13/336
>
> perhaps this patch makes more sense for stable.

I guess we should consider it.

> But, to clarify just in case, it is not needed after this series.
>
>> which is fixing my oom issues.
>
> Yes, but it doesn't fix all problems even in mm/oom_kill.c, and
> we need to fix while_each_thread() anyway.
>
>> I applied the two patches on top of a 3.10.x and got some tasks
>> stalled after the first OOM:
>
> So you are saying that this was introduced by this series?

Yes, without these two patches I can't reproduce my issue.

> Could you retest with the recent kernel?

I'll see if I find time for it; will keep you up to date.

Thanks,

-- 
William

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] oom_kill: change oom_kill.c to use for_each_thread()
  2013-12-03 20:05     ` Oleg Nesterov
@ 2013-12-03 20:50       ` Oleg Nesterov
  0 siblings, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2013-12-03 20:50 UTC (permalink / raw)
  To: Sameer Nanda
  Cc: Andrew Morton, David Rientjes, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sergey Dyasly,
	Tu, Xiaobing, linux-kernel

On 12/03, Oleg Nesterov wrote:
>
> > > -static bool has_intersects_mems_allowed(struct task_struct *tsk,
> > > +static bool has_intersects_mems_allowed(struct task_struct *start,
> > Comment block needs to be fixed up due to variable name change from
> > tsk to start.
>
> thanks again, I'll update the patch.

Yes, I'll redo and resend.

I decided to split it into 2 changes, "add rcu_read_lock() into
has_intersects_mems_allowed()" should come as a separate change.

Because it seems that oom_kill.c needs more rcu_read_lock()s; at
least find_lock_task_mm() can be called without the rcu lock held.

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] oom_kill: change oom_kill.c to use for_each_thread()
  2013-12-03 18:57   ` Sameer Nanda
  2013-12-03 20:05     ` Oleg Nesterov
@ 2013-12-04 12:57     ` Oleg Nesterov
  1 sibling, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2013-12-04 12:57 UTC (permalink / raw)
  To: Sameer Nanda
  Cc: Andrew Morton, David Rientjes, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sergey Dyasly,
	Tu, Xiaobing, linux-kernel

Sameer, I didn't notice this part yesterday.

On 12/03, Sameer Nanda wrote:
>
> > @@ -406,7 +409,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> >  {
> >         struct task_struct *victim = p;
> >         struct task_struct *child;
> > -       struct task_struct *t = p;
> > +       struct task_struct *t;
> >         struct mm_struct *mm;
> >         unsigned int victim_points = 0;
> >         static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
> > @@ -437,7 +440,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> >          * still freeing memory.
> >          */
> >         read_lock(&tasklist_lock);
> This can be a rcu_read_lock now, I think?

No, we need tasklist_lock for list_for_each_entry(t->children); it
is not rcu-safe.

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-12-04 12:56 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-02 15:24 [PATCH 0/2] initial while_each_thread() fixes Oleg Nesterov
2013-12-02 15:24 ` [PATCH 1/2] introduce for_each_thread() to replace the buggy while_each_thread() Oleg Nesterov
2013-12-03 19:24   ` Sameer Nanda
2013-12-02 15:24 ` [PATCH 2/2] oom_kill: change oom_kill.c to use for_each_thread() Oleg Nesterov
2013-12-03 18:57   ` Sameer Nanda
2013-12-03 20:05     ` Oleg Nesterov
2013-12-03 20:50       ` Oleg Nesterov
2013-12-04 12:57     ` Oleg Nesterov
2013-12-02 15:34 ` [PATCH 0/2] initial while_each_thread() fixes Oleg Nesterov
2013-12-03 15:46 ` Sergey Dyasly
2013-12-03 18:52   ` Oleg Nesterov
2013-12-03 19:01   ` Oleg Nesterov
2013-12-03 16:53 ` William Dauchy
2013-12-03 20:22   ` Oleg Nesterov
2013-12-03 20:28     ` William Dauchy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox