From: Peter Zijlstra <peterz@infradead.org>
To: heiko.carstens@de.ibm.com
Cc: linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>,
Oleg Nesterov <oleg@redhat.com>, Ingo Molnar <mingo@kernel.org>,
Rik van Riel <riel@redhat.com>
Subject: [RFC][PATCH] sched: Start stopper early
Date: Wed, 7 Oct 2015 10:41:10 +0200
Message-ID: <20151007084110.GX2881@worktop.programming.kicks-ass.net>
Hi,
So Heiko reported an 'interesting' failure where stop_two_cpus() got
stuck in multi_cpu_stop(), with one CPU waiting for another that never
shows up.
It _looks_ like the 'other' cpu isn't running and the current best
theory is that we race on cpu-up and get the stop_two_cpus() call in
before the stopper task is running.
This _is_ possible: because of the ONLINE notifier order, we set
'online && active' _before_ we do the smpboot_unpark thing.
The below test patch manually starts the stopper task early.
It boots and hotplugs a CPU on my test box, so it's not insta-broken.
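The ordering problem described above can be sketched as a small userspace model. This is not kernel code: `struct model_cpu`, `enum bringup_step` and the helper functions are hypothetical names made up for illustration. The model assumes, as the theory above implies, that a stop request is accepted once the target CPU is 'online && active' and is only serviced once its stopper thread has been unparked.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_cpu {
	bool online;
	bool active;
	bool stopper_running;	/* false while the stopper thread is parked */
};

enum bringup_step {
	STEP_SET_ONLINE,
	STEP_SET_ACTIVE,
	STEP_UNPARK_STOPPER,
};

/* Apply one bring-up step to the modelled CPU state. */
static void apply_step(struct model_cpu *c, enum bringup_step s)
{
	switch (s) {
	case STEP_SET_ONLINE:
		c->online = true;
		break;
	case STEP_SET_ACTIVE:
		c->active = true;
		break;
	case STEP_UNPARK_STOPPER:
		c->stopper_running = true;
		break;
	}
}

/*
 * Assumption: a stop request that arrives in this state is accepted
 * (the CPU looks usable) but nobody is there to run it.
 */
static bool would_hang(const struct model_cpu *c)
{
	return c->online && c->active && !c->stopper_running;
}

/* Return true if any intermediate state of the ordering can hang. */
static bool ordering_has_race_window(const enum bringup_step *steps, size_t n)
{
	struct model_cpu c = { false, false, false };

	for (size_t i = 0; i < n; i++) {
		apply_step(&c, steps[i]);
		if (would_hang(&c))
			return true;
	}
	return false;
}
```

Feeding it the order online, active, unpark reports a window; moving the unpark step ahead of the active step closes it, which is the effect the test patch aims for.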
---
 kernel/sched/core.c   | 7 ++++++-
 kernel/stop_machine.c | 5 +++++
 2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1764a0f..9a56ef7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5542,14 +5542,19 @@ static void set_cpu_rq_start_time(void)
 	rq->age_stamp = sched_clock_cpu(cpu);
 }
 
+extern void cpu_stopper_unpark(unsigned int cpu);
+
 static int sched_cpu_active(struct notifier_block *nfb,
 			    unsigned long action, void *hcpu)
 {
+	int cpu = (long)hcpu;
+
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_STARTING:
 		set_cpu_rq_start_time();
 		return NOTIFY_OK;
 	case CPU_ONLINE:
+		cpu_stopper_unpark(cpu);
 		/*
 		 * At this point a starting CPU has marked itself as online via
 		 * set_cpu_online(). But it might not yet have marked itself
@@ -5558,7 +5563,7 @@ static int sched_cpu_active(struct notifier_block *nfb,
 		 * Thus, fall-through and help the starting CPU along.
 		 */
 	case CPU_DOWN_FAILED:
-		set_cpu_active((long)hcpu, true);
+		set_cpu_active(cpu, true);
 		return NOTIFY_OK;
 	default:
 		return NOTIFY_DONE;
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 12484e5..c674371 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -496,6 +496,11 @@ static struct smp_hotplug_thread cpu_stop_threads = {
 	.selfparking		= true,
 };
 
+void cpu_stopper_unpark(unsigned int cpu)
+{
+	kthread_unpark(per_cpu(cpu_stopper.thread, cpu));
+}
+
 static int __init cpu_stop_init(void)
 {
 	unsigned int cpu;
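The control flow of the patched notifier can also be tried outside the kernel. The following is a rough userspace sketch, assuming stand-in action codes and return values: every `MODEL_*` constant, `struct model_state` and `model_sched_cpu_active()` are hypothetical and do not match the kernel's real definitions. It only shows that the ONLINE case unparks the stopper and then falls through to mark the CPU active, while the DOWN_FAILED case marks it active without touching the stopper.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the notifier action codes; values are made up. */
enum {
	MODEL_CPU_ONLINE	= 1,
	MODEL_CPU_STARTING	= 2,
	MODEL_CPU_DOWN_FAILED	= 3,
	MODEL_TASKS_FROZEN	= 0x10,	/* modifier bit, masked off below */
};

enum { MODEL_NOTIFY_DONE = 0, MODEL_NOTIFY_OK = 1 };

struct model_state {
	bool stopper_unparked;
	bool active;
	bool unparked_before_active;	/* recorded when 'active' is set */
};

/* Mirrors the shape of the patched switch statement, nothing more. */
static int model_sched_cpu_active(struct model_state *s, unsigned long action)
{
	switch (action & ~(unsigned long)MODEL_TASKS_FROZEN) {
	case MODEL_CPU_STARTING:
		return MODEL_NOTIFY_OK;
	case MODEL_CPU_ONLINE:
		/* The patch adds this step ahead of marking the CPU active. */
		s->stopper_unparked = true;
		/* fall through */
	case MODEL_CPU_DOWN_FAILED:
		s->unparked_before_active = s->stopper_unparked;
		s->active = true;
		return MODEL_NOTIFY_OK;
	default:
		return MODEL_NOTIFY_DONE;
	}
}
```

Calling it with `MODEL_CPU_ONLINE | MODEL_TASKS_FROZEN` on a zeroed state leaves `stopper_unparked`, `active` and `unparked_before_active` all set; calling it with `MODEL_CPU_DOWN_FAILED` sets only `active`.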
Thread overview: 39+ messages
2015-10-07 8:41 Peter Zijlstra [this message]
2015-10-07 12:30 ` [RFC][PATCH] sched: Start stopper early Oleg Nesterov
2015-10-07 12:38 ` Peter Zijlstra
2015-10-07 13:20 ` Oleg Nesterov
2015-10-07 13:24 ` Oleg Nesterov
2015-10-07 13:36 ` kbuild test robot
2015-10-08 14:50 ` [PATCH 0/3] (Was: [RFC][PATCH] sched: Start stopper early) Oleg Nesterov
2015-10-08 14:51 ` [PATCH 1/3] stop_machine: ensure that a queued callback will be called before cpu_stop_park() Oleg Nesterov
2015-10-14 15:34 ` Peter Zijlstra
2015-10-14 19:03 ` Oleg Nesterov
2015-10-14 20:32 ` Peter Zijlstra
2015-10-15 17:02 ` Oleg Nesterov
2015-10-16 10:49 ` Peter Zijlstra
2015-10-20 9:32 ` [tip:sched/core] stop_machine: Ensure " tip-bot for Oleg Nesterov
2015-10-08 14:51 ` [PATCH 2/3] stop_machine: introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works() Oleg Nesterov
2015-10-20 9:33 ` [tip:sched/core] stop_machine: Introduce " tip-bot for Oleg Nesterov
2015-10-08 14:51 ` [PATCH 3/3] stop_machine: change cpu_stop_queue_two_works() to rely on stopper->enabled Oleg Nesterov
2015-10-08 15:04 ` Peter Zijlstra
2015-10-08 15:59 ` Oleg Nesterov
2015-10-08 16:08 ` Oleg Nesterov
2015-10-08 17:01 ` [PATCH v2 " Oleg Nesterov
2015-10-09 16:37 ` Peter Zijlstra
2015-10-09 16:40 ` Oleg Nesterov
2015-10-20 9:33 ` [tip:sched/core] stop_machine: Change " tip-bot for Oleg Nesterov
2015-10-08 18:05 ` [RFC][PATCH] sched: Start stopper early Oleg Nesterov
2015-10-08 18:47 ` Oleg Nesterov
2015-10-09 16:00 ` [PATCH 0/3] make stopper threads more "selfparking" Oleg Nesterov
2015-10-09 16:00 ` [PATCH 1/3] stop_machine: kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() Oleg Nesterov
2015-10-20 9:33 ` [tip:sched/core] stop_machine: Kill smp_hotplug_thread-> pre_unpark, " tip-bot for Oleg Nesterov
2015-10-09 16:00 ` [PATCH 2/3] stop_machine: kill cpu_stop_threads->setup() and cpu_stop_unpark() Oleg Nesterov
2015-10-20 9:34 ` [tip:sched/core] stop_machine: Kill " tip-bot for Oleg Nesterov
2015-10-09 16:00 ` [PATCH 3/3] sched: start stopper early Oleg Nesterov
2015-10-09 16:49 ` Oleg Nesterov
2015-10-20 9:34 ` [tip:sched/core] sched: Start " tip-bot for Peter Zijlstra
2015-10-16 8:22 ` [RFC][PATCH] " Heiko Carstens
2015-10-16 9:57 ` Peter Zijlstra
2015-10-16 12:01 ` Heiko Carstens
2015-10-26 14:24 ` Michael Holzheu
2015-10-26 20:20 ` Peter Zijlstra