linux-kernel.vger.kernel.org archive mirror
* [RFC][PATCH] sched: Start stopper early
@ 2015-10-07  8:41 Peter Zijlstra
  2015-10-07 12:30 ` Oleg Nesterov
                   ` (4 more replies)
  0 siblings, 5 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-10-07  8:41 UTC
  To: heiko.carstens
  Cc: linux-kernel, Tejun Heo, Oleg Nesterov, Ingo Molnar, Rik van Riel

Hi,

So Heiko reported an 'interesting' failure where stop_two_cpus() got
stuck in multi_cpu_stop(), with one CPU waiting for another that never
shows up.

It _looks_ like the 'other' cpu isn't running and the current best
theory is that we race on cpu-up and get the stop_two_cpus() call in
before the stopper task is running.

This _is_ possible because we set 'online && active' _before_ we do the
smpboot_unpark thing because of ONLINE notifier order.

The below test patch manually starts the stopper task early.
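
To make the suspected ordering problem concrete, here is a minimal
userspace sketch. It is a toy model under stated assumptions, not kernel
code: every toy_* name is hypothetical, CPU 1 is the one being brought
up, and the model only asks whether both stopper tasks are running at
the moment stop_two_cpus() would queue its work. The check on 'active'
mirrors the idea that the call is only attempted on CPUs that look
active.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 2

/* Possible outcomes of the modeled two-CPU stop request. */
enum toy_result {
	TOY_REFUSED,	/* a target CPU is not active, caller backs off */
	TOY_HANGS,	/* work queued to a parked stopper, nobody runs it */
	TOY_COMPLETES,	/* both stoppers are running and can rendezvous */
};

/* Toy per-CPU state (hypothetical, stands in for the real bookkeeping). */
struct toy_cpu {
	bool active;		/* stands in for 'online && active' */
	bool stopper_parked;	/* stopper task exists but is still parked */
};

static struct toy_cpu toy_cpus[TOY_NR_CPUS];

/* CPU 0 is fully up; CPU 1 is offline with its stopper parked. */
static void toy_reset(void)
{
	toy_cpus[0].active = true;
	toy_cpus[0].stopper_parked = false;
	toy_cpus[1].active = false;
	toy_cpus[1].stopper_parked = true;
}

static void toy_unpark(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	toy_cpus[cpu].stopper_parked = false;
}

static void toy_set_active(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	toy_cpus[cpu].active = true;
}

/* Model of a two-CPU stop request issued at this instant. */
static enum toy_result toy_stop_two_cpus(int a, int b)
{
	if (!toy_cpus[a].active || !toy_cpus[b].active)
		return TOY_REFUSED;
	if (toy_cpus[a].stopper_parked || toy_cpus[b].stopper_parked)
		return TOY_HANGS;
	return TOY_COMPLETES;
}

/*
 * Bring up CPU 1 and let another CPU issue the stop request right after
 * CPU 1 becomes active. unpark_first selects the ordering: false is the
 * suspected racy order, true is the order the test patch aims for.
 */
enum toy_result toy_hotplug_then_stop(bool unpark_first)
{
	toy_reset();
	if (unpark_first)
		toy_unpark(1);
	toy_set_active(1);
	/* Race window: the request arrives here, before any later unpark. */
	return toy_stop_two_cpus(0, 1);
}
```

In this model toy_hotplug_then_stop(false) yields TOY_HANGS and
toy_hotplug_then_stop(true) yields TOY_COMPLETES. The sketch ignores
whatever happens after the window, so it says nothing about why the real
system stays stuck; it only shows why unparking before marking the CPU
active closes the window.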

It boots and hotplugs a CPU on my test box, so it's not instantly broken.

---
 kernel/sched/core.c   |    7 ++++++-
 kernel/stop_machine.c |    5 +++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1764a0f..9a56ef7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5542,14 +5542,19 @@ static void set_cpu_rq_start_time(void)
 	rq->age_stamp = sched_clock_cpu(cpu);
 }
 
+extern void cpu_stopper_unpark(unsigned int cpu);
+
 static int sched_cpu_active(struct notifier_block *nfb,
 				      unsigned long action, void *hcpu)
 {
+	int cpu = (long)hcpu;
+
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_STARTING:
 		set_cpu_rq_start_time();
 		return NOTIFY_OK;
 	case CPU_ONLINE:
+		cpu_stopper_unpark(cpu);
 		/*
 		 * At this point a starting CPU has marked itself as online via
 		 * set_cpu_online(). But it might not yet have marked itself
@@ -5558,7 +5563,7 @@ static int sched_cpu_active(struct notifier_block *nfb,
 		 * Thus, fall-through and help the starting CPU along.
 		 */
 	case CPU_DOWN_FAILED:
-		set_cpu_active((long)hcpu, true);
+		set_cpu_active(cpu, true);
 		return NOTIFY_OK;
 	default:
 		return NOTIFY_DONE;
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 12484e5..c674371 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -496,6 +496,11 @@ static struct smp_hotplug_thread cpu_stop_threads = {
 	.selfparking		= true,
 };
 
+void cpu_stopper_unpark(unsigned int cpu)
+{
+	kthread_unpark(per_cpu(cpu_stopper.thread, cpu));
+}
+
 static int __init cpu_stop_init(void)
 {
 	unsigned int cpu;



Thread overview: 39+ messages
2015-10-07  8:41 [RFC][PATCH] sched: Start stopper early Peter Zijlstra
2015-10-07 12:30 ` Oleg Nesterov
2015-10-07 12:38   ` Peter Zijlstra
2015-10-07 13:20     ` Oleg Nesterov
2015-10-07 13:24       ` Oleg Nesterov
2015-10-07 13:36       ` kbuild test robot
2015-10-08 14:50 ` [PATCH 0/3] (Was: [RFC][PATCH] sched: Start stopper early) Oleg Nesterov
2015-10-08 14:51   ` [PATCH 1/3] stop_machine: ensure that a queued callback will be called before cpu_stop_park() Oleg Nesterov
2015-10-14 15:34     ` Peter Zijlstra
2015-10-14 19:03       ` Oleg Nesterov
2015-10-14 20:32         ` Peter Zijlstra
2015-10-15 17:02           ` Oleg Nesterov
2015-10-16 10:49             ` Peter Zijlstra
2015-10-20  9:32     ` [tip:sched/core] stop_machine: Ensure " tip-bot for Oleg Nesterov
2015-10-08 14:51   ` [PATCH 2/3] stop_machine: introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works() Oleg Nesterov
2015-10-20  9:33     ` [tip:sched/core] stop_machine: Introduce " tip-bot for Oleg Nesterov
2015-10-08 14:51   ` [PATCH 3/3] stop_machine: change cpu_stop_queue_two_works() to rely on stopper->enabled Oleg Nesterov
2015-10-08 15:04     ` Peter Zijlstra
2015-10-08 15:59       ` Oleg Nesterov
2015-10-08 16:08         ` Oleg Nesterov
2015-10-08 17:01     ` [PATCH v2 " Oleg Nesterov
2015-10-09 16:37       ` Peter Zijlstra
2015-10-09 16:40         ` Oleg Nesterov
2015-10-20  9:33       ` [tip:sched/core] stop_machine: Change " tip-bot for Oleg Nesterov
2015-10-08 18:05 ` [RFC][PATCH] sched: Start stopper early Oleg Nesterov
2015-10-08 18:47   ` Oleg Nesterov
2015-10-09 16:00 ` [PATCH 0/3] make stopper threads more "selfparking" Oleg Nesterov
2015-10-09 16:00   ` [PATCH 1/3] stop_machine: kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() Oleg Nesterov
2015-10-20  9:33     ` [tip:sched/core] stop_machine: Kill smp_hotplug_thread-> pre_unpark, " tip-bot for Oleg Nesterov
2015-10-09 16:00   ` [PATCH 2/3] stop_machine: kill cpu_stop_threads->setup() and cpu_stop_unpark() Oleg Nesterov
2015-10-20  9:34     ` [tip:sched/core] stop_machine: Kill " tip-bot for Oleg Nesterov
2015-10-09 16:00   ` [PATCH 3/3] sched: start stopper early Oleg Nesterov
2015-10-09 16:49     ` Oleg Nesterov
2015-10-20  9:34     ` [tip:sched/core] sched: Start " tip-bot for Peter Zijlstra
2015-10-16  8:22 ` [RFC][PATCH] " Heiko Carstens
2015-10-16  9:57   ` Peter Zijlstra
2015-10-16 12:01     ` Heiko Carstens
2015-10-26 14:24       ` Michael Holzheu
2015-10-26 20:20         ` Peter Zijlstra
