* [PATCH 1/3] introduce for_each_process_thread_{break,continue}() helpers
2016-07-25 16:23 [PATCH 0/3] introduce for_each_process_thread_{break,continue}() helpers Oleg Nesterov
@ 2016-07-25 16:23 ` Oleg Nesterov
2016-08-02 11:58 ` Peter Zijlstra
2016-07-25 16:23 ` [PATCH 2/3] hung_task.c: change rcu_lock_break() code to use for_each_process_thread_break/continue Oleg Nesterov
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2016-07-25 16:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Dave Anderson, Ingo Molnar, Peter Zijlstra, Paul E. McKenney,
Wang Shu, linux-kernel
Example:

	rcu_read_lock();
	for_each_process_thread(p, t) {
		do_something_slow(p, t);
		if (SPENT_TOO_MANY_TIME) {
			for_each_process_thread_break(p, t);
			rcu_read_unlock();
			schedule();
			rcu_read_lock();
			for_each_process_thread_continue(&p, &t);
		}
	}
	rcu_read_unlock();
This looks similar to rcu_lock_break(), but works much better, and the next patch
changes check_hung_uninterruptible_tasks() to use these new helpers. My real
target, however, is show_state_filter(), which can trivially lead to a lockup.
Compared to rcu_lock_break(), for_each_process_thread_continue() never gives
up; it relies on the fact that both the process and thread lists are sorted by
the task->start_time key. So, for example, even if both the leader and the
thread are already dead, we can find the next alive process and continue.
TODO: This is another indication that we should rename pid_alive(tsk) (and
probably change it); most users call it to ensure that rcu_read_lock() still
protects this tsk.
NOTE: it seems that, contrary to the comment, task_struct->start_time is not
really monotonic, and this should probably be fixed. Until then, _continue()
might skip more threads with the same ->start_time than necessary.
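To illustrate the idea outside the kernel: the helper relies only on being able to resume a key-sorted list walk from the last element whose key does not exceed a saved one. A minimal userspace sketch, with a hypothetical struct node standing in for the task lists:

```c
#include <stddef.h>

/* Hypothetical stand-in for a task list kept sorted by ->start_time. */
struct node {
	unsigned long long start_time;
	struct node *next;
};

/*
 * Mirror of the "prev" logic in for_each_process_thread_continue():
 * return the last node whose start_time is <= the saved key, so the
 * walk can resume there even if the saved node itself has gone away.
 * Falls back to the list head when every node is newer than the key.
 */
static struct node *resume_after(struct node *head, unsigned long long key)
{
	struct node *prev = head, *n;

	for (n = head; n; n = n->next) {
		if (n->start_time > key)
			break;
		prev = n;
	}
	return prev;
}
```

Note that with duplicate keys resume_after() lands on the last duplicate, which is exactly why a non-monotonic ->start_time can make _continue() skip threads, as the NOTE above says.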
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
include/linux/sched.h | 10 ++++++++++
kernel/exit.c | 42 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 52 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 253538f..004c9d5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2859,6 +2859,16 @@ extern bool current_is_single_threaded(void);
#define for_each_process_thread(p, t) \
for_each_process(p) for_each_thread(p, t)
+static inline void
+for_each_process_thread_break(struct task_struct *p, struct task_struct *t)
+{
+	get_task_struct(p);
+	get_task_struct(t);
+}
+
+extern void
+for_each_process_thread_continue(struct task_struct **, struct task_struct **);
+
static inline int get_nr_threads(struct task_struct *tsk)
{
return tsk->signal->nr_threads;
diff --git a/kernel/exit.c b/kernel/exit.c
index 9e6e135..08f49d4 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -210,6 +210,48 @@ repeat:
 	goto repeat;
 }
 
+void for_each_process_thread_continue(struct task_struct **p_leader,
+				      struct task_struct **p_thread)
+{
+	struct task_struct *leader = *p_leader, *thread = *p_thread;
+	struct task_struct *prev, *next;
+	u64 start_time;
+
+	if (pid_alive(thread)) {
+		/* mt exec could change the leader */
+		*p_leader = thread->group_leader;
+	} else if (pid_alive(leader)) {
+		start_time = thread->start_time;
+		prev = leader;
+
+		for_each_thread(leader, next) {
+			if (next->start_time > start_time)
+				break;
+			prev = next;
+		}
+
+		*p_thread = prev;
+	} else {
+		start_time = leader->start_time;
+		prev = &init_task;
+
+		for_each_process(next) {
+			if (next->start_time > start_time)
+				break;
+			prev = next;
+		}
+
+		*p_leader = prev;
+		/* a new thread can come after that, but this is fine */
+		*p_thread = list_last_entry(&prev->signal->thread_head,
+					    struct task_struct,
+					    thread_node);
+	}
+
+	put_task_struct(leader);
+	put_task_struct(thread);
+}
+
 /*
  * Determine if a process group is "orphaned", according to the POSIX
  * definition in 2.2.2.52. Orphaned process groups are not to be affected
--
2.5.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] introduce for_each_process_thread_{break,continue}() helpers
2016-07-25 16:23 ` [PATCH 1/3] " Oleg Nesterov
@ 2016-08-02 11:58 ` Peter Zijlstra
2016-08-03 20:26 ` Oleg Nesterov
0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2016-08-02 11:58 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Dave Anderson, Ingo Molnar, Paul E. McKenney,
Wang Shu, linux-kernel
On Mon, Jul 25, 2016 at 06:23:48PM +0200, Oleg Nesterov wrote:
> +void for_each_process_thread_continue(struct task_struct **p_leader,
> + struct task_struct **p_thread)
> +{
> + struct task_struct *leader = *p_leader, *thread = *p_thread;
> + struct task_struct *prev, *next;
> + u64 start_time;
> +
> + if (pid_alive(thread)) {
> + /* mt exec could change the leader */
> + *p_leader = thread->group_leader;
> + } else if (pid_alive(leader)) {
> + start_time = thread->start_time;
> + prev = leader;
> +
> + for_each_thread(leader, next) {
> + if (next->start_time > start_time)
> + break;
> + prev = next;
> + }
This,
> + *p_thread = prev;
> + } else {
> + start_time = leader->start_time;
> + prev = &init_task;
> +
> + for_each_process(next) {
> + if (next->start_time > start_time)
> + break;
> + prev = next;
> + }
and this, could be 'SPEND_TOO_MUCH_TIME' all by themselves.
Unlikely though, nor do I really have a better suggestion :/
> +
> + *p_leader = prev;
> + /* a new thread can come after that, but this is fine */
> + *p_thread = list_last_entry(&prev->signal->thread_head,
> + struct task_struct,
> + thread_node);
> + }
> +
> + put_task_struct(leader);
> + put_task_struct(thread);
> +}
* Re: [PATCH 1/3] introduce for_each_process_thread_{break,continue}() helpers
2016-08-02 11:58 ` Peter Zijlstra
@ 2016-08-03 20:26 ` Oleg Nesterov
2016-08-08 12:20 ` Peter Zijlstra
0 siblings, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2016-08-03 20:26 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrew Morton, Dave Anderson, Ingo Molnar, Paul E. McKenney,
Wang Shu, linux-kernel
Thanks for the review, and sorry for the delay; I am travelling till the end of this week.
On 08/02, Peter Zijlstra wrote:
>
> On Mon, Jul 25, 2016 at 06:23:48PM +0200, Oleg Nesterov wrote:
> > +void for_each_process_thread_continue(struct task_struct **p_leader,
> > + struct task_struct **p_thread)
> > +{
> > + struct task_struct *leader = *p_leader, *thread = *p_thread;
> > + struct task_struct *prev, *next;
> > + u64 start_time;
> > +
> > + if (pid_alive(thread)) {
> > + /* mt exec could change the leader */
> > + *p_leader = thread->group_leader;
> > + } else if (pid_alive(leader)) {
> > + start_time = thread->start_time;
> > + prev = leader;
> > +
> > + for_each_thread(leader, next) {
> > + if (next->start_time > start_time)
> > + break;
> > + prev = next;
> > + }
>
> This,
>
> > + *p_thread = prev;
> > + } else {
> > + start_time = leader->start_time;
> > + prev = &init_task;
> > +
> > + for_each_process(next) {
> > + if (next->start_time > start_time)
> > + break;
> > + prev = next;
> > + }
>
> and this, could be 'SPEND_TOO_MUCH_TIME' all by themselves.
>
> Unlikely though, nor do I really have a better suggestion :/
Yeees, I don't think this can actually hurt "in practice", but I agree, compared
to rcu_lock_break() this is only bounded by PID_MAX_DEFAULT in theory.
Will you agree if, for a start, I just add an "int max_scan" argument and make it
return a boolean? The caller will then need to abort the for_each_process_thread()
loop if _continue() fails.
Probably this is not what we actually want for show_state_filter(); we can make it
better later. We can "embed" the rcu_lock_break() logic into _continue(), or change
break/continue to record the state (leader_start_time, thread_start_time) so that
a "false" return from _continue() means that the caller needs another schedule():
	struct state state;

	rcu_read_lock();
	for_each_process_thread(p, t) {
		do_something_slow(p, t);
		if (SPENT_TOO_MANY_TIME) {
			for_each_process_thread_break(p, t, &state);
 another_break:
			rcu_read_unlock();
			schedule();
			rcu_read_lock();
			if (!for_each_process_thread_continue(&p, &t, LIMIT, &state))
				goto another_break;
		}
	}
	rcu_read_unlock();
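The helper side of such a bounded variant might look roughly like this userspace sketch (hypothetical names, a plain sorted list standing in for the task lists; not the actual proposed kernel interface):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task list kept sorted by ->start_time. */
struct node {
	unsigned long long start_time;
	struct node *next;
};

/*
 * Bounded resume: scan at most max_scan nodes looking for the last
 * one whose start_time is <= the saved key. On success store the
 * resume point in *out and return true; return false if the budget
 * runs out, in which case the caller should abort the walk (and,
 * say, schedule() before retrying).
 */
static bool resume_bounded(struct node *head, unsigned long long key,
			   int max_scan, struct node **out)
{
	struct node *prev = head, *n;

	for (n = head; n; n = n->next) {
		if (max_scan-- <= 0)
			return false;
		if (n->start_time > key)
			break;
		prev = n;
	}
	*out = prev;
	return true;
}
```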
Not sure. I'd like to do something simple for a start. We need to make
show_state_filter() "preemptible" in any case. And even killable, I think.
Not only can it trivially trigger soft-lockups (at least), it can simply
never finish.
Oleg.
* Re: [PATCH 1/3] introduce for_each_process_thread_{break,continue}() helpers
2016-08-03 20:26 ` Oleg Nesterov
@ 2016-08-08 12:20 ` Peter Zijlstra
0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2016-08-08 12:20 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Dave Anderson, Ingo Molnar, Paul E. McKenney,
Wang Shu, linux-kernel
On Wed, Aug 03, 2016 at 10:26:09PM +0200, Oleg Nesterov wrote:
> Not sure. I'd like to do something simple for the start. We need to make
> show_state_filter() "preemptible" in any case. And even killable, I think.
> Not only it can trivially trigger the soft-lockups (at least), it can simply
> never finish.
Yeah, no objection really. Just make sure to put a comment explaining
the limitations/issues of whatever you decide to go for.
* [PATCH 2/3] hung_task.c: change rcu_lock_break() code to use for_each_process_thread_break/continue
2016-07-25 16:23 [PATCH 0/3] introduce for_each_process_thread_{break,continue}() helpers Oleg Nesterov
2016-07-25 16:23 ` [PATCH 1/3] " Oleg Nesterov
@ 2016-07-25 16:23 ` Oleg Nesterov
2016-07-25 16:23 ` [PATCH 3/3] hung_task.c: change the "max_count" " Oleg Nesterov
2016-07-26 18:57 ` [PATCH 0/1] (Was: introduce for_each_process_thread_{break,continue}() helpers) Oleg Nesterov
3 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2016-07-25 16:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Dave Anderson, Ingo Molnar, Peter Zijlstra, Paul E. McKenney,
Wang Shu, linux-kernel
Change check_hung_uninterruptible_tasks() to use the new helpers and remove
the "can_cont" logic from rcu_lock_break(), which we will probably export
later for other users (show_state_filter).
We could add for_each_process_thread_break/continue right into rcu_lock_break()
but see the next patch.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/hung_task.c | 25 +++++++++----------------
1 file changed, 9 insertions(+), 16 deletions(-)
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index d234022..517f52e 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -134,20 +134,11 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
* For preemptible RCU it is sufficient to call rcu_read_unlock in order
* to exit the grace period. For classic RCU, a reschedule is required.
*/
-static bool rcu_lock_break(struct task_struct *g, struct task_struct *t)
+static void rcu_lock_break(void)
 {
-	bool can_cont;
-
-	get_task_struct(g);
-	get_task_struct(t);
 	rcu_read_unlock();
 	cond_resched();
 	rcu_read_lock();
-	can_cont = pid_alive(g) && pid_alive(t);
-	put_task_struct(t);
-	put_task_struct(g);
-
-	return can_cont;
 }
/*
@@ -170,16 +161,18 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	rcu_read_lock();
 	for_each_process_thread(g, t) {
-		if (!max_count--)
+		/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
+		if (t->state == TASK_UNINTERRUPTIBLE)
+			check_hung_task(t, timeout);
+
+		if (!--max_count)
 			goto unlock;
 		if (!--batch_count) {
 			batch_count = HUNG_TASK_BATCHING;
-			if (!rcu_lock_break(g, t))
-				goto unlock;
+			for_each_process_thread_break(g, t);
+			rcu_lock_break();
+			for_each_process_thread_continue(&g, &t);
 		}
-		/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
-		if (t->state == TASK_UNINTERRUPTIBLE)
-			check_hung_task(t, timeout);
 	}
 unlock:
 	rcu_read_unlock();
--
2.5.0
* [PATCH 3/3] hung_task.c: change the "max_count" code to use for_each_process_thread_break/continue
2016-07-25 16:23 [PATCH 0/3] introduce for_each_process_thread_{break,continue}() helpers Oleg Nesterov
2016-07-25 16:23 ` [PATCH 1/3] " Oleg Nesterov
2016-07-25 16:23 ` [PATCH 2/3] hung_task.c: change rcu_lock_break() code to use for_each_process_thread_break/continue Oleg Nesterov
@ 2016-07-25 16:23 ` Oleg Nesterov
2016-07-26 18:57 ` [PATCH 0/1] (Was: introduce for_each_process_thread_{break,continue}() helpers) Oleg Nesterov
3 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2016-07-25 16:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Dave Anderson, Ingo Molnar, Peter Zijlstra, Paul E. McKenney,
Wang Shu, linux-kernel
If sysctl_hung_task_check_count is low and the first "max_count" threads do
not exit, check_hung_uninterruptible_tasks() will never check the other
tasks; this is just ugly.
Now that we have for_each_process_thread_continue(), we can save g/t before
returning and restart from there on the next call.
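The "save the cursor, resume on the next invocation" pattern can be sketched in plain C (hypothetical names; a static cursor is fine here because, like the watchdog, there is only one caller — and this sketch deliberately ignores the liveness problem that for_each_process_thread_break/continue solve):

```c
#include <stddef.h>

struct node { int val; struct node *next; };

/*
 * Process at most "budget" nodes per call, remember where we stopped
 * in a static cursor, and resume from there on the next call. The
 * cursor is NULL once the walk completes, so the call after that
 * starts over from the head. Returns the number of nodes processed.
 */
static int scan_some(struct node *head, int budget, int *sum)
{
	static struct node *saved;
	struct node *n = saved ? saved : head;
	int done = 0;

	saved = NULL;
	for (; n && done < budget; n = n->next) {
		*sum += n->val;
		done++;
	}
	saved = n;	/* NULL when the walk finished */
	return done;
}
```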
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/hung_task.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 517f52e..e6e516d 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -148,6 +148,9 @@ static void rcu_lock_break(void)
*/
static void check_hung_uninterruptible_tasks(unsigned long timeout)
{
+	/* we have only one watchdog() thread */
+	static struct task_struct *g_saved, *t_saved;
+
int max_count = sysctl_hung_task_check_count;
int batch_count = HUNG_TASK_BATCHING;
struct task_struct *g, *t;
@@ -160,18 +163,29 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 		return;
 
 	rcu_read_lock();
+	if (g_saved) {
+		g = g_saved;
+		t = t_saved;
+		g_saved = NULL;
+		goto resume;
+	}
+
 	for_each_process_thread(g, t) {
 		/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
 		if (t->state == TASK_UNINTERRUPTIBLE)
 			check_hung_task(t, timeout);
 
-		if (!--max_count)
-			goto unlock;
-		if (!--batch_count) {
-			batch_count = HUNG_TASK_BATCHING;
+		if (!--max_count || !--batch_count) {
 			for_each_process_thread_break(g, t);
+			if (!max_count) {
+				g_saved = g;
+				t_saved = t;
+				goto unlock;
+			}
 			rcu_lock_break();
+ resume:
 			for_each_process_thread_continue(&g, &t);
+			batch_count = HUNG_TASK_BATCHING;
 		}
 	}
 unlock:
--
2.5.0
* [PATCH 0/1] (Was: introduce for_each_process_thread_{break,continue}() helpers)
2016-07-25 16:23 [PATCH 0/3] introduce for_each_process_thread_{break,continue}() helpers Oleg Nesterov
` (2 preceding siblings ...)
2016-07-25 16:23 ` [PATCH 3/3] hung_task.c: change the "max_count" " Oleg Nesterov
@ 2016-07-26 18:57 ` Oleg Nesterov
2016-07-26 18:57 ` [PATCH 1/1] stop_machine: touch_nmi_watchdog() after MULTI_STOP_PREPARE Oleg Nesterov
3 siblings, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2016-07-26 18:57 UTC (permalink / raw)
To: Andrew Morton, Ingo Molnar
Cc: Dave Anderson, Peter Zijlstra, Paul E. McKenney, Wang Shu,
linux-kernel, Tejun Heo, Thomas Gleixner
On 07/25, Oleg Nesterov wrote:
>
> IMHO this makes sense in any case, but mostly this is preparation for
> another change: show_state_filter() should be preemptible. But this needs
> more discussion, I'll write another email/patch when I fully understand
> the hard-lockup caused by sysrq-t.
Yes, we need to do something with show_state_filter() anyway, I think.
OTOH, I believe this simple change in multi_cpu_stop() makes sense regardless.
Oleg.
* [PATCH 1/1] stop_machine: touch_nmi_watchdog() after MULTI_STOP_PREPARE
2016-07-26 18:57 ` [PATCH 0/1] (Was: introduce for_each_process_thread_{break,continue}() helpers) Oleg Nesterov
@ 2016-07-26 18:57 ` Oleg Nesterov
2016-07-27 8:06 ` Thomas Gleixner
2016-07-27 10:42 ` [tip:core/urgent] stop_machine: Touch_nmi_watchdog() " tip-bot for Oleg Nesterov
0 siblings, 2 replies; 11+ messages in thread
From: Oleg Nesterov @ 2016-07-26 18:57 UTC (permalink / raw)
To: Andrew Morton, Ingo Molnar
Cc: Dave Anderson, Peter Zijlstra, Paul E. McKenney, Wang Shu,
linux-kernel, Tejun Heo, Thomas Gleixner
Suppose that stop_machine(fn) hangs because fn() hangs. In this case an NMI
hard-lockup can be triggered on another CPU which does nothing wrong, and
the trace from nmi_panic() won't help to investigate the problem.
And this change "fixes" the problem we (seem to) hit in practice.
- stop_two_cpus(0, 1) races with show_state_filter() running on CPU_0.
- CPU_1 already spins in MULTI_STOP_PREPARE state, it detects the soft
lockup and tries to report the problem.
- show_state_filter() enables preemption, CPU_0 calls multi_cpu_stop()
which goes to MULTI_STOP_DISABLE_IRQ state and disables interrupts.
- CPU_1 spends more than 10 seconds trying to flush the log buffer to
the slow serial console.
- NMI interrupt on CPU_0 (which now waits for CPU_1) calls nmi_panic().
Reported-by: Wang Shu <shuwang@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/stop_machine.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index a467e6c..4a1ca5f 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -21,6 +21,7 @@
 #include <linux/smpboot.h>
 #include <linux/atomic.h>
 #include <linux/lglock.h>
+#include <linux/nmi.h>
/*
* Structure to determine completion condition and record errors. May
@@ -209,6 +210,13 @@ static int multi_cpu_stop(void *data)
 				break;
 			}
 			ack_state(msdata);
+		} else if (curstate > MULTI_STOP_PREPARE) {
+			/*
+			 * At this stage all other CPUs we depend on must spin
+			 * in the same loop. Any reason for hard-lockup should
+			 * be detected and reported on their side.
+			 */
+			touch_nmi_watchdog();
 		}
 	} while (curstate != MULTI_STOP_EXIT);
--
2.5.0
* Re: [PATCH 1/1] stop_machine: touch_nmi_watchdog() after MULTI_STOP_PREPARE
2016-07-26 18:57 ` [PATCH 1/1] stop_machine: touch_nmi_watchdog() after MULTI_STOP_PREPARE Oleg Nesterov
@ 2016-07-27 8:06 ` Thomas Gleixner
2016-07-27 10:42 ` [tip:core/urgent] stop_machine: Touch_nmi_watchdog() " tip-bot for Oleg Nesterov
1 sibling, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2016-07-27 8:06 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Ingo Molnar, Dave Anderson, Peter Zijlstra,
Paul E. McKenney, Wang Shu, linux-kernel, Tejun Heo
On Tue, 26 Jul 2016, Oleg Nesterov wrote:
> Suppose that stop_machine(fn) hangs because fn() hangs. In this case NMI
> hard-lockup can be triggered on another CPU which does nothing wrong and
> the trace from nmi_panic() won't help to investigate the problem.
>
> And this change "fixes" the problem we (seem to) hit in practice.
>
> - stop_two_cpus(0, 1) races with show_state_filter() running on CPU_0.
>
> - CPU_1 already spins in MULTI_STOP_PREPARE state, it detects the soft
> lockup and tries to report the problem.
>
> - show_state_filter() enables preemption, CPU_0 calls multi_cpu_stop()
> which goes to MULTI_STOP_DISABLE_IRQ state and disables interrupts.
>
> - CPU_1 spends more than 10 seconds trying to flush the log buffer to
> the slow serial console.
>
> - NMI interrupt on CPU_0 (which now waits for CPU_1) calls nmi_panic().
>
> Reported-by: Wang Shu <shuwang@redhat.com>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> kernel/stop_machine.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index a467e6c..4a1ca5f 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -21,6 +21,7 @@
> #include <linux/smpboot.h>
> #include <linux/atomic.h>
> #include <linux/lglock.h>
> +#include <linux/nmi.h>
>
> /*
> * Structure to determine completion condition and record errors. May
> @@ -209,6 +210,13 @@ static int multi_cpu_stop(void *data)
> break;
> }
> ack_state(msdata);
> + } else if (curstate > MULTI_STOP_PREPARE) {
> + /*
> + * At this stage all other CPUs we depend on must spin
> + * in the same loop. Any reason for hard-lockup should
> + * be detected and reported on their side.
> + */
> + touch_nmi_watchdog();
> }
> } while (curstate != MULTI_STOP_EXIT);
>
> --
> 2.5.0
>
>
>
* [tip:core/urgent] stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE
2016-07-26 18:57 ` [PATCH 1/1] stop_machine: touch_nmi_watchdog() after MULTI_STOP_PREPARE Oleg Nesterov
2016-07-27 8:06 ` Thomas Gleixner
@ 2016-07-27 10:42 ` tip-bot for Oleg Nesterov
1 sibling, 0 replies; 11+ messages in thread
From: tip-bot for Oleg Nesterov @ 2016-07-27 10:42 UTC (permalink / raw)
To: linux-tip-commits
Cc: paulmck, oleg, tj, linux-kernel, akpm, shuwang, peterz, tglx,
mingo, hpa, anderson, torvalds
Commit-ID: ce4f06dcbb5d6d04d202f1b81ac72d5679dcdfc0
Gitweb: http://git.kernel.org/tip/ce4f06dcbb5d6d04d202f1b81ac72d5679dcdfc0
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Tue, 26 Jul 2016 20:57:36 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 Jul 2016 11:12:11 +0200
stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE
Suppose that stop_machine(fn) hangs because fn() hangs. In this case an NMI
hard-lockup can be triggered on another CPU which does nothing wrong, and
the trace from nmi_panic() won't help to investigate the problem.
And this change "fixes" the problem we (seem to) hit in practice.
- stop_two_cpus(0, 1) races with show_state_filter() running on CPU_0.
- CPU_1 already spins in MULTI_STOP_PREPARE state, it detects the soft
lockup and tries to report the problem.
- show_state_filter() enables preemption, CPU_0 calls multi_cpu_stop()
which goes to MULTI_STOP_DISABLE_IRQ state and disables interrupts.
- CPU_1 spends more than 10 seconds trying to flush the log buffer to
the slow serial console.
- NMI interrupt on CPU_0 (which now waits for CPU_1) calls nmi_panic().
Reported-by: Wang Shu <shuwang@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20160726185736.GB4088@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/stop_machine.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index a467e6c..4a1ca5f 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -21,6 +21,7 @@
 #include <linux/smpboot.h>
 #include <linux/atomic.h>
 #include <linux/lglock.h>
+#include <linux/nmi.h>
/*
* Structure to determine completion condition and record errors. May
@@ -209,6 +210,13 @@ static int multi_cpu_stop(void *data)
 				break;
 			}
 			ack_state(msdata);
+		} else if (curstate > MULTI_STOP_PREPARE) {
+			/*
+			 * At this stage all other CPUs we depend on must spin
+			 * in the same loop. Any reason for hard-lockup should
+			 * be detected and reported on their side.
+			 */
+			touch_nmi_watchdog();
 		}
 	} while (curstate != MULTI_STOP_EXIT);