[PATCH 0/3] introduce for_each_process_thread

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/3] introduce for_each_process_thread_{break,continue}() helpers
@ 2016-07-25 16:23 Oleg Nesterov
  2016-07-25 16:23 ` [PATCH 1/3] " Oleg Nesterov
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Oleg Nesterov @ 2016-07-25 16:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Anderson, Ingo Molnar, Peter Zijlstra, Paul E. McKenney,
	Wang Shu, linux-kernel

Hello,

IMHO this makes sense in any case, but mostly this is preparation for
another change: show_state_filter() should be preemptible. But this needs
more discussion, I'll write another email/patch when I fully understand
the hard-lockup caused by sysrq-t.

Oleg.

 include/linux/sched.h | 10 ++++++++++
 kernel/exit.c         | 42 ++++++++++++++++++++++++++++++++++++++++++
 kernel/hung_task.c    | 41 ++++++++++++++++++++++++-----------------
 3 files changed, 76 insertions(+), 17 deletions(-)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] introduce for_each_process_thread_{break,continue}() helpers
  2016-07-25 16:23 [PATCH 0/3] introduce for_each_process_thread_{break,continue}() helpers Oleg Nesterov
@ 2016-07-25 16:23 ` Oleg Nesterov
  2016-08-02 11:58   ` Peter Zijlstra
  2016-07-25 16:23 ` [PATCH 2/3] hung_task.c: change rcu_lock_break() code to use for_each_process_thread_break/continue Oleg Nesterov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2016-07-25 16:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Anderson, Ingo Molnar, Peter Zijlstra, Paul E. McKenney,
	Wang Shu, linux-kernel

Example:

	rcu_read_lock();
	for_each_process_thread(p, t) {
		do_something_slow(p, t);

		if (SPENT_TOO_MANY_TIME) {
			for_each_process_thread_break(p, t);
			rcu_read_unlock();
			schedule();
			rcu_read_lock();
			for_each_process_thread_continue(&p, &t);
		}
	}
	rcu_read_unlock();

This looks similar to rcu_lock_break(), but much better and the next patch
changes check_hung_uninterruptible_tasks() to use these new helpers. But my
real target is show_state_filter() which can trivially lead to lockup.

Compared to rcu_lock_break(), for_each_process_thread_continue() never gives
up, it relies on fact that both process and thread lists are sorted by the
task->start_time key. So, for example, even if both leader/thread are already
dead we can find the next alive process and continue.

TODO: This is another indication we should rename pid_alive(tsk) (and probably
change it), most users call it to ensure that rcu_read_lock() still protects
this tsk.

NOTE: it seems that, contrary to the comment, task_struct->start_time is not
really monotonic, and this should be probably fixed. Until then  _continue()
might skip more threads with the same ->start_time than necessary.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/sched.h | 10 ++++++++++
 kernel/exit.c         | 42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 253538f..004c9d5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2859,6 +2859,16 @@ extern bool current_is_single_threaded(void);
 #define for_each_process_thread(p, t)	\
 	for_each_process(p) for_each_thread(p, t)
 
+static inline void
+for_each_process_thread_break(struct task_struct *p, struct task_struct *t)
+{
+	get_task_struct(p);
+	get_task_struct(t);
+}
+
+extern void
+for_each_process_thread_continue(struct task_struct **, struct task_struct **);
+
 static inline int get_nr_threads(struct task_struct *tsk)
 {
 	return tsk->signal->nr_threads;
diff --git a/kernel/exit.c b/kernel/exit.c
index 9e6e135..08f49d4 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -210,6 +210,48 @@ repeat:
 		goto repeat;
 }
 
+void for_each_process_thread_continue(struct task_struct **p_leader,
+				      struct task_struct **p_thread)
+{
+	struct task_struct *leader = *p_leader, *thread = *p_thread;
+	struct task_struct *prev, *next;
+	u64 start_time;
+
+	if (pid_alive(thread)) {
+		/* mt exec could change the leader */
+		*p_leader = thread->group_leader;
+	} else if (pid_alive(leader)) {
+		start_time = thread->start_time;
+		prev = leader;
+
+		for_each_thread(leader, next) {
+			if (next->start_time > start_time)
+				break;
+			prev = next;
+		}
+
+		*p_thread = prev;
+	} else {
+		start_time = leader->start_time;
+		prev = &init_task;
+
+		for_each_process(next) {
+			if (next->start_time > start_time)
+				break;
+			prev = next;
+		}
+
+		*p_leader = prev;
+		/* a new thread can come after that, but this is fine */
+		*p_thread = list_last_entry(&prev->signal->thread_head,
+						struct task_struct,
+						thread_node);
+	}
+
+	put_task_struct(leader);
+	put_task_struct(thread);
+}
+
 /*
  * Determine if a process group is "orphaned", according to the POSIX
  * definition in 2.2.2.52.  Orphaned process groups are not to be affected
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] hung_task.c: change rcu_lock_break() code to use for_each_process_thread_break/continue
  2016-07-25 16:23 [PATCH 0/3] introduce for_each_process_thread_{break,continue}() helpers Oleg Nesterov
  2016-07-25 16:23 ` [PATCH 1/3] " Oleg Nesterov
@ 2016-07-25 16:23 ` Oleg Nesterov
  2016-07-25 16:23 ` [PATCH 3/3] hung_task.c: change the "max_count" " Oleg Nesterov
  2016-07-26 18:57 ` [PATCH 0/1] (Was: introduce for_each_process_thread_{break,continue}() helpers) Oleg Nesterov
  3 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2016-07-25 16:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Anderson, Ingo Molnar, Peter Zijlstra, Paul E. McKenney,
	Wang Shu, linux-kernel

Change check_hung_uninterruptible_tasks() to use the new helpers and remove
the "can_cont" logic from rcu_lock_break() which we will probably export
later for other users (show_state_filter).

We could add for_each_process_thread_break/continue right into rcu_lock_break()
but see the next patch.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/hung_task.c | 25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index d234022..517f52e 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -134,20 +134,11 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
  * For preemptible RCU it is sufficient to call rcu_read_unlock in order
  * to exit the grace period. For classic RCU, a reschedule is required.
  */
-static bool rcu_lock_break(struct task_struct *g, struct task_struct *t)
+static void rcu_lock_break(void)
 {
-	bool can_cont;
-
-	get_task_struct(g);
-	get_task_struct(t);
 	rcu_read_unlock();
 	cond_resched();
 	rcu_read_lock();
-	can_cont = pid_alive(g) && pid_alive(t);
-	put_task_struct(t);
-	put_task_struct(g);
-
-	return can_cont;
 }
 
 /*
@@ -170,16 +161,18 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 
 	rcu_read_lock();
 	for_each_process_thread(g, t) {
-		if (!max_count--)
+		/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
+		if (t->state == TASK_UNINTERRUPTIBLE)
+			check_hung_task(t, timeout);
+
+		if (!--max_count)
 			goto unlock;
 		if (!--batch_count) {
 			batch_count = HUNG_TASK_BATCHING;
-			if (!rcu_lock_break(g, t))
-				goto unlock;
+			for_each_process_thread_break(g, t);
+			rcu_lock_break();
+			for_each_process_thread_continue(&g, &t);
 		}
-		/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
-		if (t->state == TASK_UNINTERRUPTIBLE)
-			check_hung_task(t, timeout);
 	}
  unlock:
 	rcu_read_unlock();
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] hung_task.c: change the "max_count" code to use for_each_process_thread_break/continue
  2016-07-25 16:23 [PATCH 0/3] introduce for_each_process_thread_{break,continue}() helpers Oleg Nesterov
  2016-07-25 16:23 ` [PATCH 1/3] " Oleg Nesterov
  2016-07-25 16:23 ` [PATCH 2/3] hung_task.c: change rcu_lock_break() code to use for_each_process_thread_break/continue Oleg Nesterov
@ 2016-07-25 16:23 ` Oleg Nesterov
  2016-07-26 18:57 ` [PATCH 0/1] (Was: introduce for_each_process_thread_{break,continue}() helpers) Oleg Nesterov
  3 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2016-07-25 16:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Anderson, Ingo Molnar, Peter Zijlstra, Paul E. McKenney,
	Wang Shu, linux-kernel

If sysctl_hung_task_check_count is low and the first "max_count" threads do
not exit, check_hung_uninterruptible_tasks() will never check other tasks,
this is just ugly.

Now that we have for_each_process_thread_continue(), we can save g/t before
return and restart on the next call.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/hung_task.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 517f52e..e6e516d 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -148,6 +148,9 @@ static void rcu_lock_break(void)
  */
 static void check_hung_uninterruptible_tasks(unsigned long timeout)
 {
+	/* we have only one watchdog() thread */
+	static struct task_struct *g_saved, *t_saved;
+
 	int max_count = sysctl_hung_task_check_count;
 	int batch_count = HUNG_TASK_BATCHING;
 	struct task_struct *g, *t;
@@ -160,18 +163,29 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 		return;
 
 	rcu_read_lock();
+	if (g_saved) {
+		g = g_saved;
+		t = t_saved;
+		g_saved = NULL;
+		goto resume;
+	}
+
 	for_each_process_thread(g, t) {
 		/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
 		if (t->state == TASK_UNINTERRUPTIBLE)
 			check_hung_task(t, timeout);
 
-		if (!--max_count)
-			goto unlock;
-		if (!--batch_count) {
-			batch_count = HUNG_TASK_BATCHING;
+		if (!--max_count || !--batch_count) {
 			for_each_process_thread_break(g, t);
+			if (!max_count) {
+				g_saved = g;
+				t_saved = t;
+				goto unlock;
+			}
 			rcu_lock_break();
+ resume:
 			for_each_process_thread_continue(&g, &t);
+			batch_count = HUNG_TASK_BATCHING;
 		}
 	}
  unlock:
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 0/1] (Was: introduce for_each_process_thread_{break,continue}() helpers)
  2016-07-25 16:23 [PATCH 0/3] introduce for_each_process_thread_{break,continue}() helpers Oleg Nesterov
                   ` (2 preceding siblings ...)
  2016-07-25 16:23 ` [PATCH 3/3] hung_task.c: change the "max_count" " Oleg Nesterov
@ 2016-07-26 18:57 ` Oleg Nesterov
  2016-07-26 18:57   ` [PATCH 1/1] stop_machine: touch_nmi_watchdog() after MULTI_STOP_PREPARE Oleg Nesterov
  3 siblings, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2016-07-26 18:57 UTC (permalink / raw)
  To: Andrew Morton, Ingo Molnar
  Cc: Dave Anderson, Peter Zijlstra, Paul E. McKenney, Wang Shu,
	linux-kernel, Tejun Heo, Thomas Gleixner

On 07/25, Oleg Nesterov wrote:
>
> IMHO this makes sense in any case, but mostly this is preparation for
> another change: show_state_filter() should be preemptible. But this needs
> more discussion, I'll write another email/patch when I fully understand
> the hard-lockup caused by sysrq-t.

Yes, we need to do something with show_state_filter() anyway, I think.

OTOH, I believe this simple change in multi_cpu_stop() makes sense too
regardless.

Oleg.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/1] stop_machine: touch_nmi_watchdog() after MULTI_STOP_PREPARE
  2016-07-26 18:57 ` [PATCH 0/1] (Was: introduce for_each_process_thread_{break,continue}() helpers) Oleg Nesterov
@ 2016-07-26 18:57   ` Oleg Nesterov
  2016-07-27  8:06     ` Thomas Gleixner
  2016-07-27 10:42     ` [tip:core/urgent] stop_machine: Touch_nmi_watchdog() " tip-bot for Oleg Nesterov
  0 siblings, 2 replies; 11+ messages in thread
From: Oleg Nesterov @ 2016-07-26 18:57 UTC (permalink / raw)
  To: Andrew Morton, Ingo Molnar
  Cc: Dave Anderson, Peter Zijlstra, Paul E. McKenney, Wang Shu,
	linux-kernel, Tejun Heo, Thomas Gleixner

Suppose that stop_machine(fn) hangs because fn() hangs. In this case NMI
hard-lockup can be triggered on another CPU which does nothing wrong and
the trace from nmi_panic() won't help to investigate the problem.

And this change "fixes" the problem we (seem to) hit in practice.

- stop_two_cpus(0, 1) races with show_state_filter() running on CPU_0.

- CPU_1 already spins in MULTI_STOP_PREPARE state, it detects the soft
  lockup and tries to report the problem.

- show_state_filter() enables preemption, CPU_0 calls multi_cpu_stop()
  which goes to MULTI_STOP_DISABLE_IRQ state and disables interrupts.

- CPU_1 spends more than 10 seconds trying to flush the log buffer to
  the slow serial console.

- NMI interrupt on CPU_0 (which now waits for CPU_1) calls nmi_panic().

Reported-by: Wang Shu <shuwang@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/stop_machine.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index a467e6c..4a1ca5f 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -21,6 +21,7 @@
 #include <linux/smpboot.h>
 #include <linux/atomic.h>
 #include <linux/lglock.h>
+#include <linux/nmi.h>
 
 /*
  * Structure to determine completion condition and record errors.  May
@@ -209,6 +210,13 @@ static int multi_cpu_stop(void *data)
 				break;
 			}
 			ack_state(msdata);
+		} else if (curstate > MULTI_STOP_PREPARE) {
+			/*
+			 * At this stage all other CPUs we depend on must spin
+			 * in the same loop. Any reason for hard-lockup should
+			 * be detected and reported on their side.
+			 */
+			touch_nmi_watchdog();
 		}
 	} while (curstate != MULTI_STOP_EXIT);
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] stop_machine: touch_nmi_watchdog() after MULTI_STOP_PREPARE
  2016-07-26 18:57   ` [PATCH 1/1] stop_machine: touch_nmi_watchdog() after MULTI_STOP_PREPARE Oleg Nesterov
@ 2016-07-27  8:06     ` Thomas Gleixner
  2016-07-27 10:42     ` [tip:core/urgent] stop_machine: Touch_nmi_watchdog() " tip-bot for Oleg Nesterov
  1 sibling, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2016-07-27  8:06 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Ingo Molnar, Dave Anderson, Peter Zijlstra,
	Paul E. McKenney, Wang Shu, linux-kernel, Tejun Heo

On Tue, 26 Jul 2016, Oleg Nesterov wrote:
> Suppose that stop_machine(fn) hangs because fn() hangs. In this case NMI
> hard-lockup can be triggered on another CPU which does nothing wrong and
> the trace from nmi_panic() won't help to investigate the problem.
> 
> And this change "fixes" the problem we (seem to) hit in practice.
> 
> - stop_two_cpus(0, 1) races with show_state_filter() running on CPU_0.
> 
> - CPU_1 already spins in MULTI_STOP_PREPARE state, it detects the soft
>   lockup and tries to report the problem.
> 
> - show_state_filter() enables preemption, CPU_0 calls multi_cpu_stop()
>   which goes to MULTI_STOP_DISABLE_IRQ state and disables interrupts.
> 
> - CPU_1 spends more than 10 seconds trying to flush the log buffer to
>   the slow serial console.
> 
> - NMI interrupt on CPU_0 (which now waits for CPU_1) calls nmi_panic().
> 
> Reported-by: Wang Shu <shuwang@redhat.com>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

> ---
>  kernel/stop_machine.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index a467e6c..4a1ca5f 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -21,6 +21,7 @@
>  #include <linux/smpboot.h>
>  #include <linux/atomic.h>
>  #include <linux/lglock.h>
> +#include <linux/nmi.h>
>  
>  /*
>   * Structure to determine completion condition and record errors.  May
> @@ -209,6 +210,13 @@ static int multi_cpu_stop(void *data)
>  				break;
>  			}
>  			ack_state(msdata);
> +		} else if (curstate > MULTI_STOP_PREPARE) {
> +			/*
> +			 * At this stage all other CPUs we depend on must spin
> +			 * in the same loop. Any reason for hard-lockup should
> +			 * be detected and reported on their side.
> +			 */
> +			touch_nmi_watchdog();
>  		}
>  	} while (curstate != MULTI_STOP_EXIT);
>  
> -- 
> 2.5.0
> 
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tip:core/urgent] stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE
  2016-07-26 18:57   ` [PATCH 1/1] stop_machine: touch_nmi_watchdog() after MULTI_STOP_PREPARE Oleg Nesterov
  2016-07-27  8:06     ` Thomas Gleixner
@ 2016-07-27 10:42     ` tip-bot for Oleg Nesterov
  1 sibling, 0 replies; 11+ messages in thread
From: tip-bot for Oleg Nesterov @ 2016-07-27 10:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: paulmck, oleg, tj, linux-kernel, akpm, shuwang, peterz, tglx,
	mingo, hpa, anderson, torvalds

Commit-ID:  ce4f06dcbb5d6d04d202f1b81ac72d5679dcdfc0
Gitweb:     http://git.kernel.org/tip/ce4f06dcbb5d6d04d202f1b81ac72d5679dcdfc0
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Tue, 26 Jul 2016 20:57:36 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 Jul 2016 11:12:11 +0200

stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE

Suppose that stop_machine(fn) hangs because fn() hangs. In this case NMI
hard-lockup can be triggered on another CPU which does nothing wrong and
the trace from nmi_panic() won't help to investigate the problem.

And this change "fixes" the problem we (seem to) hit in practice.

 - stop_two_cpus(0, 1) races with show_state_filter() running on CPU_0.

 - CPU_1 already spins in MULTI_STOP_PREPARE state, it detects the soft
   lockup and tries to report the problem.

 - show_state_filter() enables preemption, CPU_0 calls multi_cpu_stop()
   which goes to MULTI_STOP_DISABLE_IRQ state and disables interrupts.

 - CPU_1 spends more than 10 seconds trying to flush the log buffer to
   the slow serial console.

 - NMI interrupt on CPU_0 (which now waits for CPU_1) calls nmi_panic().

Reported-by: Wang Shu <shuwang@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20160726185736.GB4088@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/stop_machine.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index a467e6c..4a1ca5f 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -21,6 +21,7 @@
 #include <linux/smpboot.h>
 #include <linux/atomic.h>
 #include <linux/lglock.h>
+#include <linux/nmi.h>
 
 /*
  * Structure to determine completion condition and record errors.  May
@@ -209,6 +210,13 @@ static int multi_cpu_stop(void *data)
 				break;
 			}
 			ack_state(msdata);
+		} else if (curstate > MULTI_STOP_PREPARE) {
+			/*
+			 * At this stage all other CPUs we depend on must spin
+			 * in the same loop. Any reason for hard-lockup should
+			 * be detected and reported on their side.
+			 */
+			touch_nmi_watchdog();
 		}
 	} while (curstate != MULTI_STOP_EXIT);
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] introduce for_each_process_thread_{break,continue}() helpers
  2016-07-25 16:23 ` [PATCH 1/3] " Oleg Nesterov
@ 2016-08-02 11:58   ` Peter Zijlstra
  2016-08-03 20:26     ` Oleg Nesterov
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2016-08-02 11:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Dave Anderson, Ingo Molnar, Paul E. McKenney,
	Wang Shu, linux-kernel

On Mon, Jul 25, 2016 at 06:23:48PM +0200, Oleg Nesterov wrote:
> +void for_each_process_thread_continue(struct task_struct **p_leader,
> +				      struct task_struct **p_thread)
> +{
> +	struct task_struct *leader = *p_leader, *thread = *p_thread;
> +	struct task_struct *prev, *next;
> +	u64 start_time;
> +
> +	if (pid_alive(thread)) {
> +		/* mt exec could change the leader */
> +		*p_leader = thread->group_leader;
> +	} else if (pid_alive(leader)) {
> +		start_time = thread->start_time;
> +		prev = leader;
> +
> +		for_each_thread(leader, next) {
> +			if (next->start_time > start_time)
> +				break;
> +			prev = next;
> +		}

This,

> +		*p_thread = prev;
> +	} else {
> +		start_time = leader->start_time;
> +		prev = &init_task;
> +
> +		for_each_process(next) {
> +			if (next->start_time > start_time)
> +				break;
> +			prev = next;
> +		}

and this, could be 'SPEND_TOO_MUCH_TIME' all by themselves.

Unlikely though, nor do I really have a better suggestion :/

> +
> +		*p_leader = prev;
> +		/* a new thread can come after that, but this is fine */
> +		*p_thread = list_last_entry(&prev->signal->thread_head,
> +						struct task_struct,
> +						thread_node);
> +	}
> +
> +	put_task_struct(leader);
> +	put_task_struct(thread);
> +}

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] introduce for_each_process_thread_{break,continue}() helpers
  2016-08-02 11:58   ` Peter Zijlstra
@ 2016-08-03 20:26     ` Oleg Nesterov
  2016-08-08 12:20       ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2016-08-03 20:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Dave Anderson, Ingo Molnar, Paul E. McKenney,
	Wang Shu, linux-kernel

Thanks for review and sorry for delay, I am travelling till the end of this week.

On 08/02, Peter Zijlstra wrote:
>
> On Mon, Jul 25, 2016 at 06:23:48PM +0200, Oleg Nesterov wrote:
> > +void for_each_process_thread_continue(struct task_struct **p_leader,
> > +				      struct task_struct **p_thread)
> > +{
> > +	struct task_struct *leader = *p_leader, *thread = *p_thread;
> > +	struct task_struct *prev, *next;
> > +	u64 start_time;
> > +
> > +	if (pid_alive(thread)) {
> > +		/* mt exec could change the leader */
> > +		*p_leader = thread->group_leader;
> > +	} else if (pid_alive(leader)) {
> > +		start_time = thread->start_time;
> > +		prev = leader;
> > +
> > +		for_each_thread(leader, next) {
> > +			if (next->start_time > start_time)
> > +				break;
> > +			prev = next;
> > +		}
>
> This,
>
> > +		*p_thread = prev;
> > +	} else {
> > +		start_time = leader->start_time;
> > +		prev = &init_task;
> > +
> > +		for_each_process(next) {
> > +			if (next->start_time > start_time)
> > +				break;
> > +			prev = next;
> > +		}
>
> and this, could be 'SPEND_TOO_MUCH_TIME' all by themselves.
>
> Unlikely though, nor do I really have a better suggestion :/

Yeees, I don't think this can actually hurt "in practice", but I agree, compared
to rcu_lock_break() this is only bounded by PID_MAX_DEFAULT in theory.

Will you agree if I just add the "int max_scan" argument and make it return a boolean
for the start? The caller will need to abort the for_each_process_thread() loop if
_continue() fails.

Probably this is not what we actually want for show_filter_state(), we can make it
better later. We can "embed" the rcu_lock_break() logic into _continue(), or change
break/continue to record the state (leader_start_time, thread_start_time) so that
a "false" return from _continue() means that the caller needs another schedule(),

	struct state state;

        rcu_read_lock();
        for_each_process_thread(p, t) {
                do_something_slow(p, t);

                if (SPENT_TOO_MANY_TIME) {
                        for_each_process_thread_break(p, t, &state);
        another_break:
                        rcu_read_unlock();
                        schedule();
                        rcu_read_lock();
                        if (!for_each_process_thread_continue(&p, &t, LIMIT, &state))
                        	goto another_break;
               	}
       	}
	rcu_read_unlock();

Not sure. I'd like to do something simple for the start. We need to make
show_state_filter() "preemptible" in any case. And even killable, I think.
Not only it can trivially trigger the soft-lockups (at least), it can simply
never finish.

Oleg.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] introduce for_each_process_thread_{break,continue}() helpers
  2016-08-03 20:26     ` Oleg Nesterov
@ 2016-08-08 12:20       ` Peter Zijlstra
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2016-08-08 12:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Dave Anderson, Ingo Molnar, Paul E. McKenney,
	Wang Shu, linux-kernel

On Wed, Aug 03, 2016 at 10:26:09PM +0200, Oleg Nesterov wrote:
> Not sure. I'd like to do something simple for the start. We need to make
> show_state_filter() "preemptible" in any case. And even killable, I think.
> Not only it can trivially trigger the soft-lockups (at least), it can simply
> never finish.

Yeah, no objection raelly. Just make sure to put a comment explaining
the limitation/issues with whatever you descide to go for.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-08-08 12:20 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-07-25 16:23 [PATCH 0/3] introduce for_each_process_thread_{break,continue}() helpers Oleg Nesterov
2016-07-25 16:23 ` [PATCH 1/3] " Oleg Nesterov
2016-08-02 11:58   ` Peter Zijlstra
2016-08-03 20:26     ` Oleg Nesterov
2016-08-08 12:20       ` Peter Zijlstra
2016-07-25 16:23 ` [PATCH 2/3] hung_task.c: change rcu_lock_break() code to use for_each_process_thread_break/continue Oleg Nesterov
2016-07-25 16:23 ` [PATCH 3/3] hung_task.c: change the "max_count" " Oleg Nesterov
2016-07-26 18:57 ` [PATCH 0/1] (Was: introduce for_each_process_thread_{break,continue}() helpers) Oleg Nesterov
2016-07-26 18:57   ` [PATCH 1/1] stop_machine: touch_nmi_watchdog() after MULTI_STOP_PREPARE Oleg Nesterov
2016-07-27  8:06     ` Thomas Gleixner
2016-07-27 10:42     ` [tip:core/urgent] stop_machine: Touch_nmi_watchdog() " tip-bot for Oleg Nesterov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.