public inbox for linux-kernel@vger.kernel.org
* [PATCH v2 0/4] initial while_each_thread() fixes
@ 2013-12-04 13:03 Oleg Nesterov
  2013-12-04 13:04 ` [PATCH v2 1/4] introduce for_each_thread() to replace the buggy while_each_thread() Oleg Nesterov
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Oleg Nesterov @ 2013-12-04 13:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Eric W. Biederman, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel

Hello.

Changes: split the change in mm/oom_kill.c + update the comment above
has_intersects_mems_allowed().

Sergey, Sameer, the resulting code is the same after 1-3, I preserved
your acks.

4/4 is new.

Andrew, just in case: 3/4 and 4/4 try to fix unrelated bugs; they have
nothing to do with the while_each_thread/for_each_thread changes.

As a reminder, this is the first series that changes the callers of the
buggy while_each_thread(); I'll send more.

Oleg.

 include/linux/init_task.h |    2 +
 include/linux/sched.h     |   12 ++++++++++
 kernel/exit.c             |    1 +
 kernel/fork.c             |    7 ++++++
 mm/oom_kill.c             |   51 +++++++++++++++++++++++++-------------------
 5 files changed, 51 insertions(+), 22 deletions(-)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/4] introduce for_each_thread() to replace the buggy while_each_thread()
  2013-12-04 13:03 [PATCH v2 0/4] initial while_each_thread() fixes Oleg Nesterov
@ 2013-12-04 13:04 ` Oleg Nesterov
  2013-12-04 13:28   ` Frederic Weisbecker
  2013-12-05  0:58   ` David Rientjes
  2013-12-04 13:04 ` [PATCH v2 2/4] oom_kill: change oom_kill.c to use for_each_thread() Oleg Nesterov
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 18+ messages in thread
From: Oleg Nesterov @ 2013-12-04 13:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Eric W. Biederman, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel

while_each_thread() and next_thread() should die; almost every
lockless usage is wrong.

1. Unless g == current, the lockless while_each_thread() is not safe.

   while_each_thread(g, t) can loop forever if g exits: next_thread()
   can't reach the unhashed thread in this case. Note that this can
   happen even if g is the group leader; it can exec.

2. Even if while_each_thread() itself was correct, people often use
   it wrongly.

   It was never safe to just take rcu_read_lock() and loop unless
   you verify that pid_alive(g) == T; even the first next_thread()
   can point to the already freed/reused memory.

This patch adds signal_struct->thread_head and task->thread_node
to create the normal rcu-safe list with the stable head. The new
for_each_thread(g, t) helper is always safe under rcu_read_lock()
as long as this task_struct can't go away.

Note: of course it is ugly to have both task_struct->thread_node
and the old task_struct->thread_group, we will kill it later, after
we change the users of while_each_thread() to use for_each_thread().

Perhaps we can kill it even before we convert all users: we can
reimplement next_thread(t) using the new thread_head/thread_node.
But we can't do this right now because this will lead to subtle
behavioural changes. For example, do/while_each_thread() always
sees at least one task, while for_each_thread() can do nothing if
the whole thread group has died. Or thread_group_empty(): currently
its semantics are not clear unless thread_group_leader(p), and we
need to audit the callers before we can change it.

So this patch adds the new interface which has to coexist with the
old one for some time; hopefully the next changes will be more or
less straightforward and the old one will go away soon.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-and-Tested-by: Sergey Dyasly <dserrg@gmail.com>
Reviewed-by: Sameer Nanda <snanda@chromium.org>
---
 include/linux/init_task.h |    2 ++
 include/linux/sched.h     |   12 ++++++++++++
 kernel/exit.c             |    1 +
 kernel/fork.c             |    7 +++++++
 4 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index b0ed422..668aae0 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -40,6 +40,7 @@ extern struct fs_struct init_fs;
 
 #define INIT_SIGNALS(sig) {						\
 	.nr_threads	= 1,						\
+	.thread_head	= LIST_HEAD_INIT(init_task.thread_node),	\
 	.wait_chldexit	= __WAIT_QUEUE_HEAD_INITIALIZER(sig.wait_chldexit),\
 	.shared_pending	= { 						\
 		.list = LIST_HEAD_INIT(sig.shared_pending.list),	\
@@ -213,6 +214,7 @@ extern struct task_group root_task_group;
 		[PIDTYPE_SID]  = INIT_PID_LINK(PIDTYPE_SID),		\
 	},								\
 	.thread_group	= LIST_HEAD_INIT(tsk.thread_group),		\
+	.thread_node	= LIST_HEAD_INIT(init_signals.thread_head),	\
 	INIT_IDS							\
 	INIT_PERF_EVENTS(tsk)						\
 	INIT_TRACE_IRQFLAGS						\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index dc48056..0550083 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -487,6 +487,7 @@ struct signal_struct {
 	atomic_t		sigcnt;
 	atomic_t		live;
 	int			nr_threads;
+	struct list_head	thread_head;
 
 	wait_queue_head_t	wait_chldexit;	/* for wait4() */
 
@@ -1162,6 +1163,7 @@ struct task_struct {
 	/* PID/PID hash table linkage. */
 	struct pid_link pids[PIDTYPE_MAX];
 	struct list_head thread_group;
+	struct list_head thread_node;
 
 	struct completion *vfork_done;		/* for vfork() */
 	int __user *set_child_tid;		/* CLONE_CHILD_SETTID */
@@ -2225,6 +2227,16 @@ extern bool current_is_single_threaded(void);
 #define while_each_thread(g, t) \
 	while ((t = next_thread(t)) != g)
 
+#define __for_each_thread(signal, t)	\
+	list_for_each_entry_rcu(t, &(signal)->thread_head, thread_node)
+
+#define for_each_thread(p, t)		\
+	__for_each_thread((p)->signal, t)
+
+/* Careful: this is a double loop, 'break' won't work as expected. */
+#define for_each_process_thread(p, t)	\
+	for_each_process(p) for_each_thread(p, t)
+
 static inline int get_nr_threads(struct task_struct *tsk)
 {
 	return tsk->signal->nr_threads;
diff --git a/kernel/exit.c b/kernel/exit.c
index a949819..1e77fc6 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -74,6 +74,7 @@ static void __unhash_process(struct task_struct *p, bool group_dead)
 		__this_cpu_dec(process_counts);
 	}
 	list_del_rcu(&p->thread_group);
+	list_del_rcu(&p->thread_node);
 }
 
 /*
diff --git a/kernel/fork.c b/kernel/fork.c
index 9be5154..b84bef7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1034,6 +1034,11 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 	sig->nr_threads = 1;
 	atomic_set(&sig->live, 1);
 	atomic_set(&sig->sigcnt, 1);
+
+	/* list_add(thread_node, thread_head) without INIT_LIST_HEAD() */
+	sig->thread_head = (struct list_head)LIST_HEAD_INIT(tsk->thread_node);
+	tsk->thread_node = (struct list_head)LIST_HEAD_INIT(sig->thread_head);
+
 	init_waitqueue_head(&sig->wait_chldexit);
 	sig->curr_target = tsk;
 	init_sigpending(&sig->shared_pending);
@@ -1470,6 +1475,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 			atomic_inc(&current->signal->sigcnt);
 			list_add_tail_rcu(&p->thread_group,
 					  &p->group_leader->thread_group);
+			list_add_tail_rcu(&p->thread_node,
+					  &p->signal->thread_head);
 		}
 		attach_pid(p, PIDTYPE_PID);
 		nr_threads++;
-- 
1.5.5.1



* [PATCH v2 2/4] oom_kill: change oom_kill.c to use for_each_thread()
  2013-12-04 13:03 [PATCH v2 0/4] initial while_each_thread() fixes Oleg Nesterov
  2013-12-04 13:04 ` [PATCH v2 1/4] introduce for_each_thread() to replace the buggy while_each_thread() Oleg Nesterov
@ 2013-12-04 13:04 ` Oleg Nesterov
  2013-12-04 15:37   ` Michal Hocko
  2013-12-05  0:39   ` David Rientjes
  2013-12-04 13:04 ` [PATCH v2 3/4] oom_kill: has_intersects_mems_allowed() needs rcu_read_lock() Oleg Nesterov
  2013-12-04 13:04 ` [PATCH v2 4/4] oom_kill: add rcu_read_lock() into find_lock_task_mm() Oleg Nesterov
  3 siblings, 2 replies; 18+ messages in thread
From: Oleg Nesterov @ 2013-12-04 13:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Eric W. Biederman, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel

Change oom_kill.c to use for_each_thread() rather than the racy
while_each_thread(), which can loop forever if we race with exit.

Note also that most users were buggy even if while_each_thread()
was fine; the task can exit even _before_ rcu_read_lock().

Fortunately the new for_each_thread() only requires the stable
task_struct, so this change fixes both problems.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-and-Tested-by: Sergey Dyasly <dserrg@gmail.com>
Reviewed-by: Sameer Nanda <snanda@chromium.org>
---
 mm/oom_kill.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1e4a600..96d7945 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -59,7 +59,7 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
 {
 	struct task_struct *start = tsk;
 
-	do {
+	for_each_thread(start, tsk) {
 		if (mask) {
 			/*
 			 * If this is a mempolicy constrained oom, tsk's
@@ -77,7 +77,7 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
 			if (cpuset_mems_allowed_intersects(current, tsk))
 				return true;
 		}
-	} while_each_thread(start, tsk);
+	}
 
 	return false;
 }
@@ -97,14 +97,14 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
  */
 struct task_struct *find_lock_task_mm(struct task_struct *p)
 {
-	struct task_struct *t = p;
+	struct task_struct *t;
 
-	do {
+	for_each_thread(p, t) {
 		task_lock(t);
 		if (likely(t->mm))
 			return t;
 		task_unlock(t);
-	} while_each_thread(p, t);
+	}
 
 	return NULL;
 }
@@ -301,7 +301,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
 	unsigned long chosen_points = 0;
 
 	rcu_read_lock();
-	do_each_thread(g, p) {
+	for_each_process_thread(g, p) {
 		unsigned int points;
 
 		switch (oom_scan_process_thread(p, totalpages, nodemask,
@@ -323,7 +323,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
 			chosen = p;
 			chosen_points = points;
 		}
-	} while_each_thread(g, p);
+	}
 	if (chosen)
 		get_task_struct(chosen);
 	rcu_read_unlock();
@@ -406,7 +406,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 {
 	struct task_struct *victim = p;
 	struct task_struct *child;
-	struct task_struct *t = p;
+	struct task_struct *t;
 	struct mm_struct *mm;
 	unsigned int victim_points = 0;
 	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
@@ -437,7 +437,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	 * still freeing memory.
 	 */
 	read_lock(&tasklist_lock);
-	do {
+	for_each_thread(p, t) {
 		list_for_each_entry(child, &t->children, sibling) {
 			unsigned int child_points;
 
@@ -455,7 +455,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 				get_task_struct(victim);
 			}
 		}
-	} while_each_thread(p, t);
+	}
 	read_unlock(&tasklist_lock);
 
 	rcu_read_lock();
-- 
1.5.5.1



* [PATCH v2 3/4] oom_kill: has_intersects_mems_allowed() needs rcu_read_lock()
  2013-12-04 13:03 [PATCH v2 0/4] initial while_each_thread() fixes Oleg Nesterov
  2013-12-04 13:04 ` [PATCH v2 1/4] introduce for_each_thread() to replace the buggy while_each_thread() Oleg Nesterov
  2013-12-04 13:04 ` [PATCH v2 2/4] oom_kill: change oom_kill.c to use for_each_thread() Oleg Nesterov
@ 2013-12-04 13:04 ` Oleg Nesterov
  2013-12-04 15:37   ` Michal Hocko
  2013-12-05  0:41   ` David Rientjes
  2013-12-04 13:04 ` [PATCH v2 4/4] oom_kill: add rcu_read_lock() into find_lock_task_mm() Oleg Nesterov
  3 siblings, 2 replies; 18+ messages in thread
From: Oleg Nesterov @ 2013-12-04 13:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Eric W. Biederman, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel

At least out_of_memory() calls has_intersects_mems_allowed() without
even rcu_read_lock(); this is obviously buggy.

Add the necessary rcu_read_lock(). This means that we cannot simply
return from the loop; we need "bool ret" and "break".

While at it, swap the names of the two task_struct pointers (the
argument and the local). This cleans up the code a little and avoids
an unnecessary initialization.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-and-Tested-by: Sergey Dyasly <dserrg@gmail.com>
Reviewed-by: Sameer Nanda <snanda@chromium.org>
---
 mm/oom_kill.c |   19 +++++++++++--------
 1 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 96d7945..0d8ad1e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -47,18 +47,20 @@ static DEFINE_SPINLOCK(zone_scan_lock);
 #ifdef CONFIG_NUMA
 /**
  * has_intersects_mems_allowed() - check task eligiblity for kill
- * @tsk: task struct of which task to consider
+ * @start: task struct of which task to consider
  * @mask: nodemask passed to page allocator for mempolicy ooms
  *
  * Task eligibility is determined by whether or not a candidate task, @tsk,
  * shares the same mempolicy nodes as current if it is bound by such a policy
  * and whether or not it has the same set of allowed cpuset nodes.
  */
-static bool has_intersects_mems_allowed(struct task_struct *tsk,
+static bool has_intersects_mems_allowed(struct task_struct *start,
 					const nodemask_t *mask)
 {
-	struct task_struct *start = tsk;
+	struct task_struct *tsk;
+	bool ret = false;
 
+	rcu_read_lock();
 	for_each_thread(start, tsk) {
 		if (mask) {
 			/*
@@ -67,19 +69,20 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
 			 * mempolicy intersects current, otherwise it may be
 			 * needlessly killed.
 			 */
-			if (mempolicy_nodemask_intersects(tsk, mask))
-				return true;
+			ret = mempolicy_nodemask_intersects(tsk, mask);
 		} else {
 			/*
 			 * This is not a mempolicy constrained oom, so only
 			 * check the mems of tsk's cpuset.
 			 */
-			if (cpuset_mems_allowed_intersects(current, tsk))
-				return true;
+			ret = cpuset_mems_allowed_intersects(current, tsk);
 		}
+		if (ret)
+			break;
 	}
+	rcu_read_unlock();
 
-	return false;
+	return ret;
 }
 #else
 static bool has_intersects_mems_allowed(struct task_struct *tsk,
-- 
1.5.5.1



* [PATCH v2 4/4] oom_kill: add rcu_read_lock() into find_lock_task_mm()
  2013-12-04 13:03 [PATCH v2 0/4] initial while_each_thread() fixes Oleg Nesterov
                   ` (2 preceding siblings ...)
  2013-12-04 13:04 ` [PATCH v2 3/4] oom_kill: has_intersects_mems_allowed() needs rcu_read_lock() Oleg Nesterov
@ 2013-12-04 13:04 ` Oleg Nesterov
  2013-12-04 15:40   ` Michal Hocko
  2013-12-05  0:42   ` David Rientjes
  3 siblings, 2 replies; 18+ messages in thread
From: Oleg Nesterov @ 2013-12-04 13:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Eric W. Biederman, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel

find_lock_task_mm() expects to be called under rcu or tasklist lock,
but it seems that at least oom_unkillable_task()->task_in_mem_cgroup()
and mem_cgroup_out_of_memory()->oom_badness() can call it lockless.

Perhaps we could fix the callers, but this patch simply adds the rcu
lock to find_lock_task_mm(). This also allows us to simplify one of
its callers, oom_kill_process(), a bit.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 mm/oom_kill.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0d8ad1e..054ff47 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -102,14 +102,19 @@ struct task_struct *find_lock_task_mm(struct task_struct *p)
 {
 	struct task_struct *t;
 
+	rcu_read_lock();
+
 	for_each_thread(p, t) {
 		task_lock(t);
 		if (likely(t->mm))
-			return t;
+			goto found;
 		task_unlock(t);
 	}
+	t = NULL;
+found:
+	rcu_read_unlock();
 
-	return NULL;
+	return t;
 }
 
 /* return true if the task is not adequate as candidate victim task. */
@@ -461,10 +466,8 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	}
 	read_unlock(&tasklist_lock);
 
-	rcu_read_lock();
 	p = find_lock_task_mm(victim);
 	if (!p) {
-		rcu_read_unlock();
 		put_task_struct(victim);
 		return;
 	} else if (victim != p) {
@@ -490,6 +493,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	 * That thread will now get access to memory reserves since it has a
 	 * pending fatal signal.
 	 */
+	rcu_read_lock();
 	for_each_process(p)
 		if (p->mm == mm && !same_thread_group(p, victim) &&
 		    !(p->flags & PF_KTHREAD)) {
-- 
1.5.5.1



* Re: [PATCH v2 1/4] introduce for_each_thread() to replace the buggy while_each_thread()
  2013-12-04 13:04 ` [PATCH v2 1/4] introduce for_each_thread() to replace the buggy while_each_thread() Oleg Nesterov
@ 2013-12-04 13:28   ` Frederic Weisbecker
  2013-12-04 13:49     ` Oleg Nesterov
  2013-12-05  0:58   ` David Rientjes
  1 sibling, 1 reply; 18+ messages in thread
From: Frederic Weisbecker @ 2013-12-04 13:28 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, David Rientjes, Eric W. Biederman,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel

On Wed, Dec 04, 2013 at 02:04:09PM +0100, Oleg Nesterov wrote:
> while_each_thread() and next_thread() should die; almost every
> lockless usage is wrong.
> 
> 1. Unless g == current, the lockless while_each_thread() is not safe.
> 
>    while_each_thread(g, t) can loop forever if g exits: next_thread()
>    can't reach the unhashed thread in this case. Note that this can
>    happen even if g is the group leader; it can exec.
> 
> 2. Even if while_each_thread() itself was correct, people often use
>    it wrongly.
> 
>    It was never safe to just take rcu_read_lock() and loop unless
>    you verify that pid_alive(g) == T; even the first next_thread()
>    can point to the already freed/reused memory.
> 
> This patch adds signal_struct->thread_head and task->thread_node
> to create the normal rcu-safe list with the stable head. The new
> for_each_thread(g, t) helper is always safe under rcu_read_lock()
> as long as this task_struct can't go away.

Thanks, it looks indeed much saner to put the head in the signal struct!

> 
> Note: of course it is ugly to have both task_struct->thread_node
> and the old task_struct->thread_group, we will kill it later, after
> we change the users of while_each_thread() to use for_each_thread().
> 
> Perhaps we can kill it even before we convert all users: we can
> reimplement next_thread(t) using the new thread_head/thread_node.
> But we can't do this right now because this will lead to subtle
> behavioural changes. For example, do/while_each_thread() always
> sees at least one task, while for_each_thread() can do nothing if
> the whole thread group has died.

Would it be safe to have for_each_thread_continue() instead?

> Or thread_group_empty(): currently
> its semantics are not clear unless thread_group_leader(p), and we
> need to audit the callers before we can change it.
> 
> So this patch adds the new interface which has to coexist with the
> old one for some time; hopefully the next changes will be more or
> less straightforward and the old one will go away soon.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Reviewed-and-Tested-by: Sergey Dyasly <dserrg@gmail.com>
> Reviewed-by: Sameer Nanda <snanda@chromium.org>
> ---

Yeah if the conversion needs careful audit, it makes sense to switch incrementally.


* Re: [PATCH v2 1/4] introduce for_each_thread() to replace the buggy while_each_thread()
  2013-12-04 13:28   ` Frederic Weisbecker
@ 2013-12-04 13:49     ` Oleg Nesterov
  2013-12-04 14:17       ` Frederic Weisbecker
  0 siblings, 1 reply; 18+ messages in thread
From: Oleg Nesterov @ 2013-12-04 13:49 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andrew Morton, David Rientjes, Eric W. Biederman,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel

On 12/04, Frederic Weisbecker wrote:
>
> On Wed, Dec 04, 2013 at 02:04:09PM +0100, Oleg Nesterov wrote:
>
> > For example, do/while_each_thread() always
> > sees at least one task, while for_each_thread() can do nothing if
> > the whole thread group has died.
>
> Would it be safe to have for_each_thread_continue() instead?

Yes, and no.

Yes, perhaps we will need for_each_thread_continue(). I am not sure
yet. And note that, say, check_hung_uninterruptible_tasks() already
does _continue if fact, although it is still not clear to me if we
actually need this helper.

But no, _continue() can't help if the whole thread group has died,
we simply can not continue.

Note also that _continue() can't be safely used lockless, unless
you verify pid_alive() or something similar.

And,

> Yeah if the conversion needs careful audit, it makes sense to switch incrementally.

Yes. For example the case above. If someone does

	do
		do_something(t);
	while_each_thread(g, t);

we should check that it can tolerate the case when do_something()
won't be called at all, or ensure that this is not possible.

Oleg.



* Re: [PATCH v2 1/4] introduce for_each_thread() to replace the buggy while_each_thread()
  2013-12-04 13:49     ` Oleg Nesterov
@ 2013-12-04 14:17       ` Frederic Weisbecker
  2013-12-04 15:27         ` Oleg Nesterov
  0 siblings, 1 reply; 18+ messages in thread
From: Frederic Weisbecker @ 2013-12-04 14:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, David Rientjes, Eric W. Biederman,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel

On Wed, Dec 04, 2013 at 02:49:17PM +0100, Oleg Nesterov wrote:
> On 12/04, Frederic Weisbecker wrote:
> >
> > On Wed, Dec 04, 2013 at 02:04:09PM +0100, Oleg Nesterov wrote:
> >
> > > For example, do/while_each_thread() always
> > > sees at least one task, while for_each_thread() can do nothing if
> > > the whole thread group has died.
> >
> > Would it be safe to have for_each_thread_continue() instead?
> 
> Yes, and no.
> 
> Yes, perhaps we will need for_each_thread_continue(). I am not sure
> yet. And note that, say, check_hung_uninterruptible_tasks() already
> does _continue in fact, although it is still not clear to me if we
> actually need this helper.

So that's one of the possible users. _continue() can make sense if the
reader can easily cope with missing a few threads from time to time, which
is the case of the hung task detector.

> 
> But no, _continue() can't help if the whole thread group has died,
> we simply can not continue.

Right, but if the whole group has died, the list is empty anyway. I mean,
pure rcu walking requires the user to tolerate missing some concurrent
updates anyway.

> 
> Note also that _continue() can't be safely used lockless, unless
> you verify pid_alive() or something similar.

Hmm, due to concurrent list_del()?

Right, tsk->thread_node.next could point to junk after a list_del(), say if the next
entry has been freed.

> 
> And,
> 
> > Yeah if the conversion needs careful audit, it makes sense to switch incrementally.
> 
> Yes. For example the case above. If someone does
> 
> 	do
> 		do_something(t);
> 	while_each_thread(g, t);
> 
> we should check that it can tolerate the case when do_something()
> won't be called at all, or ensure that this is not possible.

Right!

Thanks.


* Re: [PATCH v2 1/4] introduce for_each_thread() to replace the buggy while_each_thread()
  2013-12-04 14:17       ` Frederic Weisbecker
@ 2013-12-04 15:27         ` Oleg Nesterov
  0 siblings, 0 replies; 18+ messages in thread
From: Oleg Nesterov @ 2013-12-04 15:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andrew Morton, David Rientjes, Eric W. Biederman,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel

On 12/04, Frederic Weisbecker wrote:
>
> On Wed, Dec 04, 2013 at 02:49:17PM +0100, Oleg Nesterov wrote:
> >
> > Yes, perhaps we will need for_each_thread_continue(). I am not sure
> > yet. And note that, say, check_hung_uninterruptible_tasks() already
> > does _continue if fact, although it is still not clear to me if we
> > actually need this helper.
>
> So that's one of the possible users. _continue() can make sense if the
> reader can easily cope with missing a few threads from time to time, which
> is the case of the hung task detector.

Yes, but again it is not clear if we need the new helper.

For example. Note that you can simply do something like:

	// p can't go away

	rcu_read_lock();
	for_each_thread(p, t) {
		do_something(t);

		if (need_to_sleep()) {
			get_task_struct(t);
			rcu_read_unlock();

			schedule_timeout_interruptible(...);

			rcu_read_lock();
			put_task_struct(t);
			if (!pid_alive(t))
				break;
		}
	}
	rcu_read_unlock();

If you rewrite this code using for_each_thread_continue (which is just
list_for_each_entry_continue_rcu) the code will look more complex.

> > Note also that _continue() can't be safely used lockless, unless
> > you verify pid_alive() or something similar.
>
> Hmm, due to concurrent list_del()?
>
> Right, tsk->thread_node.next could point to junk after a list_del(), say if the next
> entry has been freed.

Yes. The same problem that while_each_thread() currently has (I mean,
even ignoring the fact that it is itself buggy).

Oleg.



* Re: [PATCH v2 2/4] oom_kill: change oom_kill.c to use for_each_thread()
  2013-12-04 13:04 ` [PATCH v2 2/4] oom_kill: change oom_kill.c to use for_each_thread() Oleg Nesterov
@ 2013-12-04 15:37   ` Michal Hocko
  2013-12-05  0:39   ` David Rientjes
  1 sibling, 0 replies; 18+ messages in thread
From: Michal Hocko @ 2013-12-04 15:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, David Rientjes, Eric W. Biederman,
	Frederic Weisbecker, Mandeep Singh Baines, Ma, Xindong,
	Sameer Nanda, Sergey Dyasly, Tu, Xiaobing, linux-kernel

On Wed 04-12-13 14:04:12, Oleg Nesterov wrote:
> Change oom_kill.c to use for_each_thread() rather than the racy
> while_each_thread(), which can loop forever if we race with exit.
> 
> Note also that most users were buggy even if while_each_thread()
> was fine; the task can exit even _before_ rcu_read_lock().
> 
> Fortunately the new for_each_thread() only requires the stable
> task_struct, so this change fixes both problems.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Reviewed-and-Tested-by: Sergey Dyasly <dserrg@gmail.com>
> Reviewed-by: Sameer Nanda <snanda@chromium.org>

Looks good to me
Reviewed-by: Michal Hocko <mhocko@suse.cz>

Thanks!
> ---
>  mm/oom_kill.c |   20 ++++++++++----------
>  1 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 1e4a600..96d7945 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -59,7 +59,7 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
>  {
>  	struct task_struct *start = tsk;
>  
> -	do {
> +	for_each_thread(start, tsk) {
>  		if (mask) {
>  			/*
>  			 * If this is a mempolicy constrained oom, tsk's
> @@ -77,7 +77,7 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
>  			if (cpuset_mems_allowed_intersects(current, tsk))
>  				return true;
>  		}
> -	} while_each_thread(start, tsk);
> +	}
>  
>  	return false;
>  }
> @@ -97,14 +97,14 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
>   */
>  struct task_struct *find_lock_task_mm(struct task_struct *p)
>  {
> -	struct task_struct *t = p;
> +	struct task_struct *t;
>  
> -	do {
> +	for_each_thread(p, t) {
>  		task_lock(t);
>  		if (likely(t->mm))
>  			return t;
>  		task_unlock(t);
> -	} while_each_thread(p, t);
> +	}
>  
>  	return NULL;
>  }
> @@ -301,7 +301,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>  	unsigned long chosen_points = 0;
>  
>  	rcu_read_lock();
> -	do_each_thread(g, p) {
> +	for_each_process_thread(g, p) {
>  		unsigned int points;
>  
>  		switch (oom_scan_process_thread(p, totalpages, nodemask,
> @@ -323,7 +323,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>  			chosen = p;
>  			chosen_points = points;
>  		}
> -	} while_each_thread(g, p);
> +	}
>  	if (chosen)
>  		get_task_struct(chosen);
>  	rcu_read_unlock();
> @@ -406,7 +406,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  {
>  	struct task_struct *victim = p;
>  	struct task_struct *child;
> -	struct task_struct *t = p;
> +	struct task_struct *t;
>  	struct mm_struct *mm;
>  	unsigned int victim_points = 0;
>  	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
> @@ -437,7 +437,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  	 * still freeing memory.
>  	 */
>  	read_lock(&tasklist_lock);
> -	do {
> +	for_each_thread(p, t) {
>  		list_for_each_entry(child, &t->children, sibling) {
>  			unsigned int child_points;
>  
> @@ -455,7 +455,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  				get_task_struct(victim);
>  			}
>  		}
> -	} while_each_thread(p, t);
> +	}
>  	read_unlock(&tasklist_lock);
>  
>  	rcu_read_lock();
> -- 
> 1.5.5.1
> 

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 3/4] oom_kill: has_intersects_mems_allowed() needs rcu_read_lock()
  2013-12-04 13:04 ` [PATCH v2 3/4] oom_kill: has_intersects_mems_allowed() needs rcu_read_lock() Oleg Nesterov
@ 2013-12-04 15:37   ` Michal Hocko
  2013-12-05  0:41   ` David Rientjes
  1 sibling, 0 replies; 18+ messages in thread
From: Michal Hocko @ 2013-12-04 15:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, David Rientjes, Eric W. Biederman,
	Frederic Weisbecker, Mandeep Singh Baines, Ma, Xindong,
	Sameer Nanda, Sergey Dyasly, Tu, Xiaobing, linux-kernel

On Wed 04-12-13 14:04:16, Oleg Nesterov wrote:
> At least out_of_memory() calls has_intersects_mems_allowed()
> without even rcu_read_lock(); this is obviously buggy.
> 
> Add the necessary rcu_read_lock(). This means that we cannot simply
> return from the loop; we need "bool ret" and "break".
> 
> While at it, swap the names of the two task_struct pointers (the
> argument and the local). This cleans up the code a little and avoids
> an unnecessary initialization.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Reviewed-and-Tested-by: Sergey Dyasly <dserrg@gmail.com>
> Reviewed-by: Sameer Nanda <snanda@chromium.org>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

Thanks!
> ---
>  mm/oom_kill.c |   19 +++++++++++--------
>  1 files changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 96d7945..0d8ad1e 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -47,18 +47,20 @@ static DEFINE_SPINLOCK(zone_scan_lock);
>  #ifdef CONFIG_NUMA
>  /**
>   * has_intersects_mems_allowed() - check task eligiblity for kill
> - * @tsk: task struct of which task to consider
> + * @start: task struct of which task to consider
>   * @mask: nodemask passed to page allocator for mempolicy ooms
>   *
>   * Task eligibility is determined by whether or not a candidate task, @tsk,
>   * shares the same mempolicy nodes as current if it is bound by such a policy
>   * and whether or not it has the same set of allowed cpuset nodes.
>   */
> -static bool has_intersects_mems_allowed(struct task_struct *tsk,
> +static bool has_intersects_mems_allowed(struct task_struct *start,
>  					const nodemask_t *mask)
>  {
> -	struct task_struct *start = tsk;
> +	struct task_struct *tsk;
> +	bool ret = false;
>  
> +	rcu_read_lock();
>  	for_each_thread(start, tsk) {
>  		if (mask) {
>  			/*
> @@ -67,19 +69,20 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
>  			 * mempolicy intersects current, otherwise it may be
>  			 * needlessly killed.
>  			 */
> -			if (mempolicy_nodemask_intersects(tsk, mask))
> -				return true;
> +			ret = mempolicy_nodemask_intersects(tsk, mask);
>  		} else {
>  			/*
>  			 * This is not a mempolicy constrained oom, so only
>  			 * check the mems of tsk's cpuset.
>  			 */
> -			if (cpuset_mems_allowed_intersects(current, tsk))
> -				return true;
> +			ret = cpuset_mems_allowed_intersects(current, tsk);
>  		}
> +		if (ret)
> +			break;
>  	}
> +	rcu_read_unlock();
>  
> -	return false;
> +	return ret;
>  }
>  #else
>  static bool has_intersects_mems_allowed(struct task_struct *tsk,
> -- 
> 1.5.5.1
> 

-- 
Michal Hocko
SUSE Labs


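[Editor's sketch] The control-flow change in the patch above — recording the result in `ret` and breaking instead of returning — exists so that rcu_read_unlock() runs on every exit path. A minimal user-space analogue (hypothetical names; comments stand in for the RCU primitives, this is not kernel code):

```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch of the pattern in the patch above: once a lock is taken
 * before the loop, the loop must not return directly; it stores its
 * result in `ret` and breaks, so the unlock runs on every path. */
static bool any_match(const int *vals, size_t n, int needle)
{
	bool ret = false;

	/* rcu_read_lock() would go here */
	for (size_t i = 0; i < n; i++) {
		ret = (vals[i] == needle);
		if (ret)
			break;	/* never leave with the lock held */
	}
	/* rcu_read_unlock() would go here */

	return ret;
}
```

The early-return version would be shorter, but every `return true` would have to be paired with its own unlock; the break/ret form keeps a single unlock site.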
* Re: [PATCH v2 4/4] oom_kill: add rcu_read_lock() into find_lock_task_mm()
  2013-12-04 13:04 ` [PATCH v2 4/4] oom_kill: add rcu_read_lock() into find_lock_task_mm() Oleg Nesterov
@ 2013-12-04 15:40   ` Michal Hocko
  2013-12-05  0:42   ` David Rientjes
  1 sibling, 0 replies; 18+ messages in thread
From: Michal Hocko @ 2013-12-04 15:40 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, David Rientjes, Eric W. Biederman,
	Frederic Weisbecker, Mandeep Singh Baines, Ma, Xindong,
	Sameer Nanda, Sergey Dyasly, Tu, Xiaobing, linux-kernel

On Wed 04-12-13 14:04:20, Oleg Nesterov wrote:
> find_lock_task_mm() expects to be called under rcu or tasklist lock,
> but it seems that at least oom_unkillable_task()->task_in_mem_cgroup()
> and mem_cgroup_out_of_memory()->oom_badness() can call it lockless.
> 
> Perhaps we could fix the callers, but this patch simply adds the rcu
> lock inside find_lock_task_mm(). This also allows us to simplify one
> of its callers, oom_kill_process(), a bit.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

Thanks!
> ---
>  mm/oom_kill.c |   12 ++++++++----
>  1 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 0d8ad1e..054ff47 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -102,14 +102,19 @@ struct task_struct *find_lock_task_mm(struct task_struct *p)
>  {
>  	struct task_struct *t;
>  
> +	rcu_read_lock();
> +
>  	for_each_thread(p, t) {
>  		task_lock(t);
>  		if (likely(t->mm))
> -			return t;
> +			goto found;
>  		task_unlock(t);
>  	}
> +	t = NULL;
> +found:
> +	rcu_read_unlock();
>  
> -	return NULL;
> +	return t;
>  }
>  
>  /* return true if the task is not adequate as candidate victim task. */
> @@ -461,10 +466,8 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  	}
>  	read_unlock(&tasklist_lock);
>  
> -	rcu_read_lock();
>  	p = find_lock_task_mm(victim);
>  	if (!p) {
> -		rcu_read_unlock();
>  		put_task_struct(victim);
>  		return;
>  	} else if (victim != p) {
> @@ -490,6 +493,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  	 * That thread will now get access to memory reserves since it has a
>  	 * pending fatal signal.
>  	 */
> +	rcu_read_lock();
>  	for_each_process(p)
>  		if (p->mm == mm && !same_thread_group(p, victim) &&
>  		    !(p->flags & PF_KTHREAD)) {
> -- 
> 1.5.5.1
> 

-- 
Michal Hocko
SUSE Labs


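[Editor's sketch] The goto-found shape in the patch above has a subtle contract: the outer lock is dropped on every path, while the per-element lock stays held only for the element that is returned. A hedged user-space analogue (all names hypothetical, not kernel code):

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel structures: each node may or
 * may not have an mm, like a task_struct. */
struct node { int mm; };

/* Take an outer lock, scan, and on success jump to a single exit
 * label so the outer unlock runs on every path; on failure fall
 * through to t = NULL, mirroring the patch above. */
static struct node *find_first_with_mm(struct node *nodes, size_t n)
{
	struct node *t;

	/* rcu_read_lock() would go here */
	for (size_t i = 0; i < n; i++) {
		t = &nodes[i];
		/* task_lock(t) would go here */
		if (t->mm)
			goto found;	/* return t with its own lock held */
		/* task_unlock(t) would go here */
	}
	t = NULL;
found:
	/* rcu_read_unlock() would go here */
	return t;
}
```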
* Re: [PATCH v2 2/4] oom_kill: change oom_kill.c to use for_each_thread()
  2013-12-04 13:04 ` [PATCH v2 2/4] oom_kill: change oom_kill.c to use for_each_thread() Oleg Nesterov
  2013-12-04 15:37   ` Michal Hocko
@ 2013-12-05  0:39   ` David Rientjes
  1 sibling, 0 replies; 18+ messages in thread
From: David Rientjes @ 2013-12-05  0:39 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel

On Wed, 4 Dec 2013, Oleg Nesterov wrote:

> Change oom_kill.c to use for_each_thread() rather than the racy
> while_each_thread() which can loop forever if we race with exit.
> 
> Note also that most users were buggy even if while_each_thread()
> was fine: the task can exit even _before_ rcu_read_lock().
> 
> Fortunately the new for_each_thread() only requires the stable
> task_struct, so this change fixes both problems.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Reviewed-and-Tested-by: Sergey Dyasly <dserrg@gmail.com>
> Reviewed-by: Sameer Nanda <snanda@chromium.org>

Acked-by: David Rientjes <rientjes@google.com>


* Re: [PATCH v2 3/4] oom_kill: has_intersects_mems_allowed() needs rcu_read_lock()
  2013-12-04 13:04 ` [PATCH v2 3/4] oom_kill: has_intersects_mems_allowed() needs rcu_read_lock() Oleg Nesterov
  2013-12-04 15:37   ` Michal Hocko
@ 2013-12-05  0:41   ` David Rientjes
  1 sibling, 0 replies; 18+ messages in thread
From: David Rientjes @ 2013-12-05  0:41 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel

On Wed, 4 Dec 2013, Oleg Nesterov wrote:

> At least out_of_memory() calls has_intersects_mems_allowed()
> without even rcu_read_lock(); this is obviously buggy.
> 
> Add the necessary rcu_read_lock(). This means that we can not
> simply return from the loop, we need "bool ret" and "break".
> 
> While at it, swap the names of the task_struct pointers (the
> argument and the local). This cleans up the code a little and
> avoids an unnecessary initialization.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Reviewed-and-Tested-by: Sergey Dyasly <dserrg@gmail.com>
> Reviewed-by: Sameer Nanda <snanda@chromium.org>

Acked-by: David Rientjes <rientjes@google.com>


* Re: [PATCH v2 4/4] oom_kill: add rcu_read_lock() into find_lock_task_mm()
  2013-12-04 13:04 ` [PATCH v2 4/4] oom_kill: add rcu_read_lock() into find_lock_task_mm() Oleg Nesterov
  2013-12-04 15:40   ` Michal Hocko
@ 2013-12-05  0:42   ` David Rientjes
  1 sibling, 0 replies; 18+ messages in thread
From: David Rientjes @ 2013-12-05  0:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel

On Wed, 4 Dec 2013, Oleg Nesterov wrote:

> find_lock_task_mm() expects to be called under rcu or tasklist lock,
> but it seems that at least oom_unkillable_task()->task_in_mem_cgroup()
> and mem_cgroup_out_of_memory()->oom_badness() can call it lockless.
> 
> Perhaps we could fix the callers, but this patch simply adds the rcu
> lock inside find_lock_task_mm(). This also allows us to simplify one
> of its callers, oom_kill_process(), a bit.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: David Rientjes <rientjes@google.com>


* Re: [PATCH v2 1/4] introduce for_each_thread() to replace the buggy while_each_thread()
  2013-12-04 13:04 ` [PATCH v2 1/4] introduce for_each_thread() to replace the buggy while_each_thread() Oleg Nesterov
  2013-12-04 13:28   ` Frederic Weisbecker
@ 2013-12-05  0:58   ` David Rientjes
  2013-12-05 18:16     ` Oleg Nesterov
  1 sibling, 1 reply; 18+ messages in thread
From: David Rientjes @ 2013-12-05  0:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel

On Wed, 4 Dec 2013, Oleg Nesterov wrote:

> while_each_thread() and next_thread() should die, almost every
> lockless usage is wrong.
> 
> 1. Unless g == current, the lockless while_each_thread() is not safe.
> 
>    while_each_thread(g, t) can loop forever if g exits, next_thread()
>    can't reach the unhashed thread in this case. Note that this can
>    happen even if g is the group leader, it can exec.
> 
> 2. Even if while_each_thread() itself was correct, people often use
>    it wrongly.
> 
>    It was never safe to just take rcu_read_lock() and loop unless
>    you verify that pid_alive(g) == T; even the first next_thread()
>    can point to already freed/reused memory.
> 
> This patch adds signal_struct->thread_head and task->thread_node
> to create the normal rcu-safe list with the stable head. The new
> for_each_thread(g, t) helper is always safe under rcu_read_lock()
> as long as this task_struct can't go away.
> 
> Note: of course it is ugly to have both task_struct->thread_node
> and the old task_struct->thread_group, we will kill it later, after
> we change the users of while_each_thread() to use for_each_thread().
> 
> Perhaps we can kill it even before we convert all users: we can
> reimplement next_thread(t) using the new thread_head/thread_node.
> But we can't do this right now because this will lead to subtle
> behavioural changes. For example, do/while_each_thread() always
> sees at least one task, while for_each_thread() can do nothing if
> the whole thread group has died. Or thread_group_empty(): currently
> its semantics are not clear unless thread_group_leader(p), and we
> need to audit the callers before we can change it.
> 
> So this patch adds the new interface which has to coexist with the
> old one for some time, hopefully the next changes will be more or
> less straightforward and the old one will go away soon.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Reviewed-and-Tested-by: Sergey Dyasly <dserrg@gmail.com>
> Reviewed-by: Sameer Nanda <snanda@chromium.org>

Acked-by: David Rientjes <rientjes@google.com>

Minor: should the definitions of thread_{head,node} be annotated with 
__rcu for users of CONFIG_SPARSE_RCU_POINTER?


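[Editor's sketch] The point-1 hazard in the commit message above — while_each_thread(g, t) looping forever once g leaves the thread ring — can be reproduced with a plain user-space circular list (all names hypothetical; a step budget stands in for the would-be infinite loop):

```c
#include <stddef.h>

/* Minimal circular singly-linked ring, standing in for the old
 * thread_group list; not kernel code. */
struct thr { struct thr *next; };

/* Old-style traversal: walk from g until we come back to g, as
 * while_each_thread() does.  If g has been unlinked from the ring,
 * the termination test can never fire; `budget` caps the walk so the
 * sketch terminates and reports how many steps were taken. */
static int ring_walk_len(const struct thr *g, int budget)
{
	const struct thr *t = g;
	int steps = 0;

	do {
		steps++;
		t = t->next;
	} while (t != g && steps < budget);

	return steps;
}
```

for_each_thread() avoids this class of bug by iterating a list whose head lives in signal_struct, so the traversal anchor stays stable as long as the caller holds a reference to any task in the group.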
* Re: [PATCH v2 1/4] introduce for_each_thread() to replace the buggy while_each_thread()
  2013-12-05  0:58   ` David Rientjes
@ 2013-12-05 18:16     ` Oleg Nesterov
  2013-12-05 23:23       ` David Rientjes
  0 siblings, 1 reply; 18+ messages in thread
From: Oleg Nesterov @ 2013-12-05 18:16 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Eric W. Biederman, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel

On 12/04, David Rientjes wrote:
>
> Acked-by: David Rientjes <rientjes@google.com>

Thanks!

> Minor: should the definitions of thread_{head,node} be annotated with
> __rcu for users of CONFIG_SPARSE_RCU_POINTER?

Perhaps. And perhaps task_struct->tasks. And perhaps we should add
rcu_read_lock_held() || lockdep_is_held(tasklist) || lockdep_is_held(siglock)
into for_each_process/thread.

But let's do this later. At least we should avoid the false positives.

Oleg.



* Re: [PATCH v2 1/4] introduce for_each_thread() to replace the buggy while_each_thread()
  2013-12-05 18:16     ` Oleg Nesterov
@ 2013-12-05 23:23       ` David Rientjes
  0 siblings, 0 replies; 18+ messages in thread
From: David Rientjes @ 2013-12-05 23:23 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Frederic Weisbecker,
	Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Sameer Nanda,
	Sergey Dyasly, Tu, Xiaobing, linux-kernel

On Thu, 5 Dec 2013, Oleg Nesterov wrote:

> > Minor: should the definitions of thread_{head,node} be annotated with
> > __rcu for users of CONFIG_SPARSE_RCU_POINTER?
> 
> Perhaps. And perhaps task_struct->tasks. And perhaps we should add
> rcu_read_lock_held() || lockdep_is_held(tasklist) || lockdep_is_held(siglock)
> into for_each_process/thead.
> 
> But let's do this later. At least we should avoid the false positives.
> 

Ok, thanks.  I think anything we can do to catch these cases of
unprotected usage, even with debugging options like
CONFIG_SPARSE_RCU_POINTER and CONFIG_LOCKDEP, will help given the
number of places you found that were already doing it incorrectly.


end of thread, other threads:[~2013-12-05 23:24 UTC | newest]

Thread overview: 18+ messages
2013-12-04 13:03 [PATCH v2 0/4] initial while_each_thread() fixes Oleg Nesterov
2013-12-04 13:04 ` [PATCH v2 1/4] introduce for_each_thread() to replace the buggy while_each_thread() Oleg Nesterov
2013-12-04 13:28   ` Frederic Weisbecker
2013-12-04 13:49     ` Oleg Nesterov
2013-12-04 14:17       ` Frederic Weisbecker
2013-12-04 15:27         ` Oleg Nesterov
2013-12-05  0:58   ` David Rientjes
2013-12-05 18:16     ` Oleg Nesterov
2013-12-05 23:23       ` David Rientjes
2013-12-04 13:04 ` [PATCH v2 2/4] oom_kill: change oom_kill.c to use for_each_thread() Oleg Nesterov
2013-12-04 15:37   ` Michal Hocko
2013-12-05  0:39   ` David Rientjes
2013-12-04 13:04 ` [PATCH v2 3/4] oom_kill: has_intersects_mems_allowed() needs rcu_read_lock() Oleg Nesterov
2013-12-04 15:37   ` Michal Hocko
2013-12-05  0:41   ` David Rientjes
2013-12-04 13:04 ` [PATCH v2 4/4] oom_kill: add rcu_read_lock() into find_lock_task_mm() Oleg Nesterov
2013-12-04 15:40   ` Michal Hocko
2013-12-05  0:42   ` David Rientjes
