From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Arjan van de Ven <arjan@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>,
mikey@neuling.org,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
gregkh@linuxfoundation.org,
ppc-dev <linuxppc-dev@lists.ozlabs.org>,
linux-kernel@vger.kernel.org, Milton Miller <miltonm@bga.com>,
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
arjanvandeven@gmail.com,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Ingo Molnar <mingo@elte.hu>
Subject: Re: smp: Start up non-boot CPUs asynchronously
Date: Tue, 14 Feb 2012 13:47:21 +0530 [thread overview]
Message-ID: <4F3A1891.8060001@linux.vnet.ibm.com> (raw)
In-Reply-To: <20120131082439.575978c0@infradead.org>
On 01/31/2012 09:54 PM, Arjan van de Ven wrote:
> From ee65be59057c920042747d46dc174c5a5a56c744 Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <arjan@linux.intel.com>
> Date: Mon, 30 Jan 2012 20:44:51 -0800
> Subject: [PATCH] smp: Start up non-boot CPUs asynchronously
>
> The starting of the "not first" CPUs actually takes a lot of boot time
> of the kernel... upto "minutes" on some of the bigger SGI boxes.
> Right now, this is a fully sequential operation with the rest of the kernel
> boot.
>
> This patch turns this bringup of the other cpus into an asynchronous operation.
> With some other changes (not in this patch) this can save significant kernel
> boot time (upto 40% on my laptop!!).
> Basically now CPUs could get brought up in parallel to disk enumeration, graphic
> mode bringup etc etc etc.
>
> Note that the implementation in this patch still waits for all CPUs to
> be brought up before starting userspace; I would love to remove that
> restriction over time (technically that is simple), but that becomes
> then a change in behavior... I'd like to see more discussion on that
> being a good idea before I write that patch.
>
> Second note on version 2 of the patch:
> This patch does currently not save any boot time, due to a situation
> where the cpu hotplug lock gets taken for write by the cpu bringup code,
> which starves out readers of this lock throughout the kernel.
> Ingo specifically requested this behavior to expose this lock problem.
>
> CC: Milton Miller <miltonm@bga.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Ingo Molnar <mingo@elte.hu>
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
> kernel/smp.c | 21 ++++++++++++++++++++-
> 1 files changed, 20 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/smp.c b/kernel/smp.c
> index db197d6..ea48418 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -12,6 +12,8 @@
> #include <linux/gfp.h>
> #include <linux/smp.h>
> #include <linux/cpu.h>
> +#include <linux/async.h>
> +#include <linux/delay.h>
>
> #ifdef CONFIG_USE_GENERIC_SMP_HELPERS
> static struct {
> @@ -664,17 +666,34 @@ void __init setup_nr_cpu_ids(void)
> nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1;
> }
>
> +void __init async_cpu_up(void *data, async_cookie_t cookie)
> +{
> + unsigned long nr = (unsigned long) data;
> + /*
> + * we can only up one cpu at a time, as enforced by the hotplug
> + * lock; it's better to wait for all earlier CPUs to be done before
> + * we bring up ours, so that the bring up order is predictable.
> + */
> + async_synchronize_cookie(cookie);
> + cpu_up(nr);
> +}
> +
> /* Called by boot processor to activate the rest. */
> void __init smp_init(void)
> {
> unsigned int cpu;
>
> /* FIXME: This should be done in userspace --RR */
> +
> + /*
> + * But until we do this in userspace, we're going to do this
> + * in parallel to the rest of the kernel boot up.-- Arjan
> + */
> for_each_present_cpu(cpu) {
> if (num_online_cpus() >= setup_max_cpus)
> break;
> if (!cpu_online(cpu))
> - cpu_up(cpu);
> + async_schedule(async_cpu_up, (void *) cpu);
> }
>
> /* Any cleanup work */
If I understand correctly, with this patch, the booting of non-boot CPUs
will happen in parallel with the rest of the kernel boot, but bringing up
of individual CPU is still serialized (due to hotplug lock).
If that is correct, I see several issues with this patch:
1. In smp_init(), after the comment "Any cleanup work" (see above), we print:
printk(KERN_INFO "Brought up %ld CPUs\n", (long)num_online_cpus());
So this can potentially print less than expected number of CPUs and might
surprise users.
2. Just below that we have smp_cpus_done(setup_max_cpus); and this translates
to native_smp_cpus_done() under x86, which calls impress_friends().
And that means, the bogosum calculation and the total activated processor
count which is printed, may get messed up.
3. sched_init_smp() is called immediately after smp_init(). And that calls
init_sched_domains(cpu_active_mask). Of course, it registers a hotplug
notifier callback to handle hot-added cpus.. but with this patch, boot up can
actually become unnecessarily slow at this point - what could have been done
in one go with an appropriately filled up cpu_active_mask, needs to be done
again and again using notifier callbacks. IOW, building sched domains can
potentially become a bottleneck, especially if there are lots and lots of
cpus in the machine.
4. There is an unhandled race condition (tiny window) in sched_init_smp():
get_online_cpus();
...
init_sched_domains(cpu_active_mask);
...
put_online_cpus();
<<<<<<<<<<<<<<<<<<<<<<<< There!
hotcpu_notifier(cpuset_cpu_active, CPU_PRI_CPUSET_ACTIVE);
hotcpu_notifier(cpuset_cpu_inactive, CPU_PRI_CPUSET_INACTIVE);
At the point shown above, some non-boot cpus can get booted up, without
being noticed by the scheduler.
5. And in powerpc, it creates a new race condition, as explained in
https://lkml.org/lkml/2012/2/13/383
(Of course, we can fix it trivially by using get/put_online_cpus().)
There could be many more things that this patch breaks.. I haven't checked
thoroughly.
Regards,
Srivatsa S. Bhat
WARNING: multiple messages have this Message-ID (diff)
From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Arjan van de Ven <arjan@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>,
linux-kernel@vger.kernel.org, Milton Miller <miltonm@bga.com>,
Andrew Morton <akpm@linux-foundation.org>,
arjanvandeven@gmail.com,
Linus Torvalds <torvalds@linux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
benh@kernel.crashing.org,
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
Stephen Rothwell <sfr@canb.auug.org.au>,
mikey@neuling.org,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
gregkh@linuxfoundation.org,
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
ppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: smp: Start up non-boot CPUs asynchronously
Date: Tue, 14 Feb 2012 13:47:21 +0530 [thread overview]
Message-ID: <4F3A1891.8060001@linux.vnet.ibm.com> (raw)
In-Reply-To: <20120131082439.575978c0@infradead.org>
On 01/31/2012 09:54 PM, Arjan van de Ven wrote:
> From ee65be59057c920042747d46dc174c5a5a56c744 Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <arjan@linux.intel.com>
> Date: Mon, 30 Jan 2012 20:44:51 -0800
> Subject: [PATCH] smp: Start up non-boot CPUs asynchronously
>
> The starting of the "not first" CPUs actually takes a lot of boot time
> of the kernel... upto "minutes" on some of the bigger SGI boxes.
> Right now, this is a fully sequential operation with the rest of the kernel
> boot.
>
> This patch turns this bringup of the other cpus into an asynchronous operation.
> With some other changes (not in this patch) this can save significant kernel
> boot time (upto 40% on my laptop!!).
> Basically now CPUs could get brought up in parallel to disk enumeration, graphic
> mode bringup etc etc etc.
>
> Note that the implementation in this patch still waits for all CPUs to
> be brought up before starting userspace; I would love to remove that
> restriction over time (technically that is simple), but that becomes
> then a change in behavior... I'd like to see more discussion on that
> being a good idea before I write that patch.
>
> Second note on version 2 of the patch:
> This patch does currently not save any boot time, due to a situation
> where the cpu hotplug lock gets taken for write by the cpu bringup code,
> which starves out readers of this lock throughout the kernel.
> Ingo specifically requested this behavior to expose this lock problem.
>
> CC: Milton Miller <miltonm@bga.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Ingo Molnar <mingo@elte.hu>
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
> kernel/smp.c | 21 ++++++++++++++++++++-
> 1 files changed, 20 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/smp.c b/kernel/smp.c
> index db197d6..ea48418 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -12,6 +12,8 @@
> #include <linux/gfp.h>
> #include <linux/smp.h>
> #include <linux/cpu.h>
> +#include <linux/async.h>
> +#include <linux/delay.h>
>
> #ifdef CONFIG_USE_GENERIC_SMP_HELPERS
> static struct {
> @@ -664,17 +666,34 @@ void __init setup_nr_cpu_ids(void)
> nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1;
> }
>
> +void __init async_cpu_up(void *data, async_cookie_t cookie)
> +{
> + unsigned long nr = (unsigned long) data;
> + /*
> + * we can only up one cpu at a time, as enforced by the hotplug
> + * lock; it's better to wait for all earlier CPUs to be done before
> + * we bring up ours, so that the bring up order is predictable.
> + */
> + async_synchronize_cookie(cookie);
> + cpu_up(nr);
> +}
> +
> /* Called by boot processor to activate the rest. */
> void __init smp_init(void)
> {
> unsigned int cpu;
>
> /* FIXME: This should be done in userspace --RR */
> +
> + /*
> + * But until we do this in userspace, we're going to do this
> + * in parallel to the rest of the kernel boot up.-- Arjan
> + */
> for_each_present_cpu(cpu) {
> if (num_online_cpus() >= setup_max_cpus)
> break;
> if (!cpu_online(cpu))
> - cpu_up(cpu);
> + async_schedule(async_cpu_up, (void *) cpu);
> }
>
> /* Any cleanup work */
If I understand correctly, with this patch, the booting of non-boot CPUs
will happen in parallel with the rest of the kernel boot, but bringing up
of individual CPU is still serialized (due to hotplug lock).
If that is correct, I see several issues with this patch:
1. In smp_init(), after the comment "Any cleanup work" (see above), we print:
printk(KERN_INFO "Brought up %ld CPUs\n", (long)num_online_cpus());
So this can potentially print less than expected number of CPUs and might
surprise users.
2. Just below that we have smp_cpus_done(setup_max_cpus); and this translates
to native_smp_cpus_done() under x86, which calls impress_friends().
And that means, the bogosum calculation and the total activated processor
count which is printed, may get messed up.
3. sched_init_smp() is called immediately after smp_init(). And that calls
init_sched_domains(cpu_active_mask). Of course, it registers a hotplug
notifier callback to handle hot-added cpus.. but with this patch, boot up can
actually become unnecessarily slow at this point - what could have been done
in one go with an appropriately filled up cpu_active_mask, needs to be done
again and again using notifier callbacks. IOW, building sched domains can
potentially become a bottleneck, especially if there are lots and lots of
cpus in the machine.
4. There is an unhandled race condition (tiny window) in sched_init_smp():
get_online_cpus();
...
init_sched_domains(cpu_active_mask);
...
put_online_cpus();
<<<<<<<<<<<<<<<<<<<<<<<< There!
hotcpu_notifier(cpuset_cpu_active, CPU_PRI_CPUSET_ACTIVE);
hotcpu_notifier(cpuset_cpu_inactive, CPU_PRI_CPUSET_INACTIVE);
At the point shown above, some non-boot cpus can get booted up, without
being noticed by the scheduler.
5. And in powerpc, it creates a new race condition, as explained in
https://lkml.org/lkml/2012/2/13/383
(Of course, we can fix it trivially by using get/put_online_cpus().)
There could be many more things that this patch breaks.. I haven't checked
thoroughly.
Regards,
Srivatsa S. Bhat
next prev parent reply other threads:[~2012-02-14 8:17 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-01-31 4:54 smp: Start up non-boot CPUs asynchronously Arjan van de Ven
2012-01-31 12:52 ` Ingo Molnar
2012-01-31 13:41 ` Arjan van de Ven
2012-01-31 14:31 ` Ingo Molnar
2012-01-31 15:22 ` Arjan van de Ven
2012-01-31 16:12 ` Ingo Molnar
2012-01-31 16:24 ` Arjan van de Ven
2012-02-01 22:55 ` Andrew Morton
[not found] ` <CADyApD0yVOePmaLznks_h6xR_BCUjzEFUB7VtsL9vvsoHwCOVw@mail.gmail.com>
2012-02-01 23:31 ` Linus Torvalds
2012-02-14 8:17 ` Srivatsa S. Bhat [this message]
2012-02-14 8:17 ` Srivatsa S. Bhat
2012-02-14 9:48 ` Srivatsa S. Bhat
2012-02-14 9:48 ` Srivatsa S. Bhat
2012-02-14 14:31 ` Arjan van de Ven
2012-02-14 15:20 ` Peter Zijlstra
2012-02-14 15:20 ` Peter Zijlstra
2012-02-14 16:01 ` Arjan van de Ven
2012-02-14 19:57 ` Srivatsa S. Bhat
2012-02-14 19:57 ` Srivatsa S. Bhat
2012-02-14 20:00 ` Peter Zijlstra
2012-02-14 20:00 ` Peter Zijlstra
2012-02-14 21:02 ` Arjan van de Ven
2012-02-14 21:02 ` Arjan van de Ven
2012-02-14 19:32 ` Srivatsa S. Bhat
2012-02-14 19:32 ` Srivatsa S. Bhat
2012-02-14 21:28 ` Benjamin Herrenschmidt
2012-02-14 21:28 ` Benjamin Herrenschmidt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F3A1891.8060001@linux.vnet.ibm.com \
--to=srivatsa.bhat@linux.vnet.ibm.com \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=arjan@infradead.org \
--cc=arjanvandeven@gmail.com \
--cc=gregkh@linuxfoundation.org \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mikey@neuling.org \
--cc=miltonm@bga.com \
--cc=mingo@elte.hu \
--cc=paulmck@linux.vnet.ibm.com \
--cc=sfr@canb.auug.org.au \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=vatsa@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.