From: Russ Anderson <rja@sgi.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
Ingo Molnar <mingo@kernel.org>, Robin Holt <holt@sgi.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Shawn Guo <shawn.guo@linaro.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>,
the arch/x86 maintainers <x86@kernel.org>
Subject: Re: [PATCH] Do not force shutdown/reboot to boot cpu.
Date: Wed, 10 Apr 2013 17:29:11 -0500 [thread overview]
Message-ID: <20130410222910.GA8112@sgi.com> (raw)
In-Reply-To: <20130410152911.GA3011@sgi.com>
On Wed, Apr 10, 2013 at 10:29:12AM -0500, Russ Anderson wrote:
> On Wed, Apr 10, 2013 at 08:10:05AM -0700, Linus Torvalds wrote:
> >
> > Yeah, we've had issues with ACPI in the past, so I do think we should
> > always reboot using the BP. Even if it almost certainly works on 99+%
> > of all machines on any random CPU.
> >
> > The optimal solution would be to just speed up the
> > disable_nonboot_cpus() code so much that it isn't an issue. That would
> > be good for suspending too, although I guess suspend isn't a big issue
> > if you have a thousand CPU's.
> >
> > Has anybody checked whether we could do the cpu_down() on non-boot
> > CPU's in parallel? Right now we serialize the thing completely, with
> > one single
> >
> > for_each_online_cpu(cpu) {
> > ...
> >
> > loop that does a synchrinous _cpu_down() for each CPU. No wonder it
> > takes forever. We do __stop_machine() over and over and over again:
> > the whole thing is basically O(n**2) in CPU's.
>
> Yes, I have a test patch that replaces for_each_online_cpu(cpu)
> with a cpu bitmask in disable_nonboot_cpus(). The lower level
> routines already take a bitmask. It allows __stop_machine() to
> be called just once. That change reduces shutdown time on a
> 1024 cpu machine from 16 minutes 4 minutes. Significant improvement,
> but not good enough.
>
> The next significant bottleneck is __cpu_notify(). Tried creating
> worker threads to parallelize the shutdown, but the problem is
> __cpu_notify() is not thread safe. Putting a lock around it
> caused all the worker threads to fight over the lock.
>
> Note that __cpu_notify() has to be called for all cpus being
> shut down because the cpu_chain notifier call chain has cpu as a
> parameter. The delema is that cpu_chain notifiers need to be called on
> all cpus, but cannot be done in parallel due to __cpu_notify() not being
> thread safe. Spinning through the notifier chain sequentially for all
> cpus just takes a long time.
>
> The real fix would be to make the &cpu_chain notifier per cpu, or at
> least thread safe, so that all the cpus being shut down could do so
> in parallel. That is a significant change with ramifications on
> other code.
>
> I will post a patch shortly with the cpu bitmask change. Changing
> __cpu_notify() will take more discussion.
Here is the test patch with the cpu bitmask change. It results
in calling __stop_machine() just once.
After making feedback changes I'll formally submit the patch.
---
kernel/cpu.c | 94 +++++++++++++++++++++++++++++++++++------------------------
1 file changed, 56 insertions(+), 38 deletions(-)
Index: linux/kernel/cpu.c
===================================================================
--- linux.orig/kernel/cpu.c 2013-04-10 17:16:40.084960495 -0500
+++ linux/kernel/cpu.c 2013-04-10 17:17:27.988245160 -0500
@@ -262,10 +262,11 @@ static int __ref take_cpu_down(void *_pa
}
/* Requires cpu_add_remove_lock to be held */
-static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
+static int __ref _cpu_down(const cpumask_t *cpus_to_offline, int tasks_frozen)
{
- int err, nr_calls = 0;
+ int cpu = 0, err = 0, nr_calls = 0;
void *hcpu = (void *)(long)cpu;
+ cpumask_var_t cpus_offlined;
unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0;
struct take_cpu_down_param tcd_param = {
.mod = mod,
@@ -278,46 +279,65 @@ static int __ref _cpu_down(unsigned int
if (!cpu_online(cpu))
return -EINVAL;
+ if (!alloc_cpumask_var(&cpus_offlined, GFP_KERNEL))
+ return -ENOMEM;
+
cpu_hotplug_begin();
+ cpumask_clear(cpus_offlined);
+ cpumask_copy(cpus_offlined, cpus_to_offline);
- err = __cpu_notify(CPU_DOWN_PREPARE | mod, hcpu, -1, &nr_calls);
- if (err) {
- nr_calls--;
- __cpu_notify(CPU_DOWN_FAILED | mod, hcpu, nr_calls, NULL);
- printk("%s: attempt to take down CPU %u failed\n",
+ for_each_cpu_mask(cpu, *cpus_to_offline) {
+ hcpu = (void *)(long)cpu;
+ if (!cpu_online(cpu))
+ continue;
+ tcd_param.hcpu = hcpu;
+ err = __cpu_notify(CPU_DOWN_PREPARE | mod, hcpu, -1, &nr_calls);
+ if (err) {
+ nr_calls--;
+ __cpu_notify(CPU_DOWN_FAILED | mod, hcpu, nr_calls, NULL);
+ pr_err("%s: attempt to take down CPU %u failed\n",
__func__, cpu);
- goto out_release;
+ goto out_release;
+ }
+ smpboot_park_threads(cpu);
}
- smpboot_park_threads(cpu);
- err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
+ err = __stop_machine(take_cpu_down, &tcd_param, cpus_to_offline);
if (err) {
/* CPU didn't die: tell everyone. Can't complain. */
- smpboot_unpark_threads(cpu);
- cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
+ for_each_cpu_mask(cpu, *cpus_to_offline) {
+ hcpu = (void *)(long)cpu;
+ smpboot_unpark_threads(cpu);
+ cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
+ }
goto out_release;
}
- BUG_ON(cpu_online(cpu));
/*
* The migration_call() CPU_DYING callback will have removed all
* runnable tasks from the cpu, there's only the idle task left now
* that the migration thread is done doing the stop_machine thing.
- *
- * Wait for the stop thread to go away.
*/
- while (!idle_cpu(cpu))
- cpu_relax();
+ for_each_cpu_mask(cpu, *cpus_offlined) {
+ BUG_ON(cpu_online(cpu));
- /* This actually kills the CPU. */
- __cpu_die(cpu);
-
- /* CPU is completely dead: tell everyone. Too late to complain. */
- cpu_notify_nofail(CPU_DEAD | mod, hcpu);
-
- check_for_tasks(cpu);
+ /*
+ * Wait for the stop thread to go away.
+ */
+ while (!idle_cpu(cpu))
+ cpu_relax();
+
+ /* This actually kills the CPU. */
+ __cpu_die(cpu);
+
+ /* CPU is completely dead: tell everyone. Too late to complain. */
+ hcpu = (void *)(long)cpu;
+ cpu_notify_nofail(CPU_DEAD | mod, hcpu);
+ check_for_tasks(cpu);
+ }
out_release:
+ free_cpumask_var(cpus_offlined);
cpu_hotplug_done();
if (!err)
cpu_notify_nofail(CPU_POST_DEAD | mod, hcpu);
@@ -327,6 +347,7 @@ out_release:
int __ref cpu_down(unsigned int cpu)
{
int err;
+ cpumask_var_t cpumask;
cpu_maps_update_begin();
@@ -335,7 +356,11 @@ int __ref cpu_down(unsigned int cpu)
goto out;
}
- err = _cpu_down(cpu, 0);
+ if (!alloc_cpumask_var(&cpumask, GFP_KERNEL))
+ return -ENOMEM;
+ cpumask_set_cpu(cpu, cpumask);
+ err = _cpu_down(cpumask, 0);
+ free_cpumask_var(cpumask);
out:
cpu_maps_update_done();
@@ -459,7 +484,7 @@ static cpumask_var_t frozen_cpus;
int disable_nonboot_cpus(void)
{
- int cpu, first_cpu, error = 0;
+ int first_cpu, error = 0;
cpu_maps_update_begin();
first_cpu = cpumask_first(cpu_online_mask);
@@ -470,18 +495,11 @@ int disable_nonboot_cpus(void)
cpumask_clear(frozen_cpus);
printk("Disabling non-boot CPUs ...\n");
- for_each_online_cpu(cpu) {
- if (cpu == first_cpu)
- continue;
- error = _cpu_down(cpu, 1);
- if (!error)
- cpumask_set_cpu(cpu, frozen_cpus);
- else {
- printk(KERN_ERR "Error taking CPU%d down: %d\n",
- cpu, error);
- break;
- }
- }
+ cpumask_copy(frozen_cpus, cpu_online_mask);
+ cpumask_clear_cpu(first_cpu, frozen_cpus); /* all but one cpu*/
+ error = _cpu_down(frozen_cpus, 1);
+ if (error)
+ pr_err("Error %d stopping cpus\n", error);
if (!error) {
BUG_ON(num_online_cpus() > 1);
--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc rja@sgi.com
next prev parent reply other threads:[~2013-04-10 22:29 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-04-03 19:37 [PATCH] Do not force shutdown/reboot to boot cpu Robin Holt
2013-04-08 15:57 ` Ingo Molnar
2013-04-08 16:11 ` H. Peter Anvin
2013-04-08 16:59 ` Robin Holt
2013-04-10 11:16 ` Ingo Molnar
2013-04-10 14:01 ` Robin Holt
2013-04-10 15:10 ` Linus Torvalds
2013-04-10 15:29 ` Russ Anderson
2013-04-10 16:59 ` Ingo Molnar
2013-04-10 17:14 ` Robin Holt
2013-04-10 17:22 ` Ingo Molnar
2013-04-10 17:55 ` Robin Holt
2013-04-10 19:00 ` Robin Holt
2013-04-11 8:57 ` Ingo Molnar
2013-04-11 11:34 ` Robin Holt
2013-04-11 12:00 ` Ingo Molnar
2013-04-11 12:03 ` Robin Holt
2013-04-11 12:08 ` Robin Holt
2013-04-11 12:14 ` Ingo Molnar
2013-04-10 17:58 ` H. Peter Anvin
2013-04-10 23:02 ` Russ Anderson
2013-04-10 22:29 ` Russ Anderson [this message]
2013-04-11 5:31 ` Paul Mackerras
2013-04-11 12:45 ` Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.) Srivatsa S. Bhat
2013-04-11 13:48 ` Robin Holt
2013-04-12 5:37 ` Ingo Molnar
2013-04-12 6:09 ` Srivatsa S. Bhat
2013-04-12 9:31 ` Robin Holt
2013-04-12 10:01 ` Robin Holt
2013-04-13 16:30 ` Oleg Nesterov
2013-04-15 16:04 ` Robin Holt
2013-04-15 16:09 ` Oleg Nesterov
2013-04-15 16:10 ` Robin Holt
2013-04-13 17:01 ` Srivatsa S. Bhat
2013-04-15 10:16 ` Ingo Molnar
2013-04-15 12:02 ` Robin Holt
2013-04-15 15:59 ` Robin Holt
2013-04-16 9:40 ` Ingo Molnar
2013-04-11 14:23 ` Russ Anderson
2013-04-11 14:45 ` Srivatsa S. Bhat
2013-04-11 20:08 ` Russ Anderson
2013-04-11 20:17 ` Srivatsa S. Bhat
2013-04-11 21:08 ` Robin Holt
2013-04-08 16:54 ` [PATCH] Do not force shutdown/reboot to boot cpu Robin Holt
2013-04-08 17:07 ` Russ Anderson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130410222910.GA8112@sgi.com \
--to=rja@sgi.com \
--cc=akpm@linux-foundation.org \
--cc=holt@sgi.com \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=mingo@redhat.com \
--cc=shawn.guo@linaro.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.