* Re: [PATCH] percpu data: only iterate over possible CPUs
[not found] <200602051959.k15JxoHK001630@hera.kernel.org>
@ 2006-02-07 15:15 ` Heiko Carstens
2006-02-07 15:31 ` Jens Axboe
2006-02-07 16:25 ` Eric Dumazet
2006-02-08 22:31 ` Rik van Riel
1 sibling, 2 replies; 58+ messages in thread
From: Heiko Carstens @ 2006-02-07 15:15 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Eric Dumazet, David S. Miller, James Bottomley, Ingo Molnar,
Jens Axboe, Anton Blanchard, William Irwin, Andi Kleen,
Andrew Morton, Linus Torvalds
> tree 8c30052a0d7fadec37c785a42a71b28d0a9c5fcf
> parent cef5076987dd545ac74f4efcf1c962be8eac34b0
> author Eric Dumazet <dada1@cosmosbay.com> Sun, 05 Feb 2006 15:27:36 -0800
> committer Linus Torvalds <torvalds@g5.osdl.org> Mon, 06 Feb 2006 03:06:51 -0800
>
> [PATCH] percpu data: only iterate over possible CPUs
>
> percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
> cpudata, instead of allocating memory only for possible cpus.
>
> As a preparation for changing that, we need to convert various 0 -> NR_CPUS
> loops to use for_each_cpu().
>
> (The above only applies to users of asm-generic/percpu.h. powerpc has gone it
> alone and is presently only allocating memory for present CPUs, so it's
> currently corrupting memory).
This patch is broken since it replaces several loops that iterate NR_CPUS
times with for_each_cpu before cpu_possible_map is setup:
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -379,7 +379,6 @@ static void __devinit fdtable_defer_list
> void __init files_defer_init(void)
> {
> int i;
> - /* Really early - can't use for_each_cpu */
> - for (i = 0; i < NR_CPUS; i++)
> + for_each_cpu(i)
> fdtable_defer_list_init(i);
> }
The old comment indicates it: called before smp_prepare_cpus gets called
which sets up cpu_possible_map.
> diff --git a/kernel/sched.c b/kernel/sched.c
> index f77f23f..839466f 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -6109,7 +6109,7 @@ void __init sched_init(void)
> runqueue_t *rq;
> int i, j, k;
>
> - for (i = 0; i < NR_CPUS; i++) {
> + for_each_cpu(i) {
> prio_array_t *array;
Same here.
I didn't check the rest, but it looks like we end up with a bit of
uninitialized stuff.
Heiko
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-07 15:15 ` Heiko Carstens
@ 2006-02-07 15:31 ` Jens Axboe
2006-02-07 16:25 ` Eric Dumazet
1 sibling, 0 replies; 58+ messages in thread
From: Jens Axboe @ 2006-02-07 15:31 UTC (permalink / raw)
To: Heiko Carstens
Cc: Linux Kernel Mailing List, Eric Dumazet, David S. Miller,
James Bottomley, Ingo Molnar, Anton Blanchard, William Irwin,
Andi Kleen, Andrew Morton, Linus Torvalds
On Tue, Feb 07 2006, Heiko Carstens wrote:
> > tree 8c30052a0d7fadec37c785a42a71b28d0a9c5fcf
> > parent cef5076987dd545ac74f4efcf1c962be8eac34b0
> > author Eric Dumazet <dada1@cosmosbay.com> Sun, 05 Feb 2006 15:27:36 -0800
> > committer Linus Torvalds <torvalds@g5.osdl.org> Mon, 06 Feb 2006 03:06:51 -0800
> >
> > [PATCH] percpu data: only iterate over possible CPUs
> >
> > percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
> > cpudata, instead of allocating memory only for possible cpus.
> >
> > As a preparation for changing that, we need to convert various 0 -> NR_CPUS
> > loops to use for_each_cpu().
> >
> > (The above only applies to users of asm-generic/percpu.h. powerpc has gone it
> > alone and is presently only allocating memory for present CPUs, so it's
> > currently corrupting memory).
>
> This patch is broken since it replaces several loops that iterate NR_CPUS
> times with for_each_cpu before cpu_possible_map is setup:
Hrmpf, chicking and egg - we must not initialize data for unknown CPUs,
but we can't check since it's not setup.
To me it sounds really broken that core structures like that are not
setup before we init eg fs stuff.
--
Jens Axboe
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-07 15:15 ` Heiko Carstens
2006-02-07 15:31 ` Jens Axboe
@ 2006-02-07 16:25 ` Eric Dumazet
2006-02-07 16:42 ` Linus Torvalds
1 sibling, 1 reply; 58+ messages in thread
From: Eric Dumazet @ 2006-02-07 16:25 UTC (permalink / raw)
To: Heiko Carstens
Cc: Linux Kernel Mailing List, David S. Miller, James Bottomley,
Ingo Molnar, Jens Axboe, Anton Blanchard, William Irwin,
Andi Kleen, Andrew Morton, Linus Torvalds
Heiko Carstens a écrit :
>> tree 8c30052a0d7fadec37c785a42a71b28d0a9c5fcf
>> parent cef5076987dd545ac74f4efcf1c962be8eac34b0
>> author Eric Dumazet <dada1@cosmosbay.com> Sun, 05 Feb 2006 15:27:36 -0800
>> committer Linus Torvalds <torvalds@g5.osdl.org> Mon, 06 Feb 2006 03:06:51 -0800
>>
>> [PATCH] percpu data: only iterate over possible CPUs
>>
>> percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
>> cpudata, instead of allocating memory only for possible cpus.
>>
>> As a preparation for changing that, we need to convert various 0 -> NR_CPUS
>> loops to use for_each_cpu().
>>
>> (The above only applies to users of asm-generic/percpu.h. powerpc has gone it
>> alone and is presently only allocating memory for present CPUs, so it's
>> currently corrupting memory).
>
> This patch is broken since it replaces several loops that iterate NR_CPUS
> times with for_each_cpu before cpu_possible_map is setup:
This patch assumes that cpu_possible_map is setup before setup_per_cpu_areas().
That sounds a reasonable assumption, but maybe not on your architecture ?
I dont think cpu_possible_map has to be filled at smp_prepare_cpus() time, but
long before.
On i386/x86_64/ia64, this is done from setup_arch() called from start_kernel()
just before setup_per_cpu_areas()
On powerpc it's done from setup_system(), called before start_kernel()
Eric
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-07 16:25 ` Eric Dumazet
@ 2006-02-07 16:42 ` Linus Torvalds
2006-02-07 17:34 ` Andrew Morton
0 siblings, 1 reply; 58+ messages in thread
From: Linus Torvalds @ 2006-02-07 16:42 UTC (permalink / raw)
To: Eric Dumazet
Cc: Heiko Carstens, Linux Kernel Mailing List, David S. Miller,
James Bottomley, Ingo Molnar, Jens Axboe, Anton Blanchard,
William Irwin, Andi Kleen, Andrew Morton
On Tue, 7 Feb 2006, Eric Dumazet wrote:
>
> This patch assumes that cpu_possible_map is setup before
> setup_per_cpu_areas().
>
> That sounds a reasonable assumption, but maybe not on your architecture ?
I have to say, it sounds not just like a reasonable assumption, but like
the only sane assumption that there _could_ be.
> I dont think cpu_possible_map has to be filled at smp_prepare_cpus() time, but
> long before.
>
> On i386/x86_64/ia64, this is done from setup_arch() called from start_kernel()
> just before setup_per_cpu_areas()
>
> On powerpc it's done from setup_system(), called before start_kernel()
It absolutely _has_ to be done from setup_arch() or earlier, as shown by
the fact that "setup_per_cpu_areas()" is the very next thing that
init/main.c calls (and clearly, that needs to know what CPU's are
possible).
ppc64 certainly calls it early enough, as does x86/x86-64/ia64. I don't
see anybody else doing it too late either.
Heiko, can you point to the "old comment" you mentioned in the email, or
the architecture that does this wrong?
Linus
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-07 16:42 ` Linus Torvalds
@ 2006-02-07 17:34 ` Andrew Morton
2006-02-07 17:48 ` Linus Torvalds
0 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2006-02-07 17:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: dada1, heiko.carstens, linux-kernel, davem, James.Bottomley,
mingo, axboe, anton, wli, ak
Linus Torvalds <torvalds@osdl.org> wrote:
>
>
>
> On Tue, 7 Feb 2006, Eric Dumazet wrote:
> >
> > This patch assumes that cpu_possible_map is setup before
> > setup_per_cpu_areas().
> >
> > That sounds a reasonable assumption, but maybe not on your architecture ?
>
> I have to say, it sounds not just like a reasonable assumption, but like
> the only sane assumption that there _could_ be.
>
> > I dont think cpu_possible_map has to be filled at smp_prepare_cpus() time, but
> > long before.
> >
> > On i386/x86_64/ia64, this is done from setup_arch() called from start_kernel()
> > just before setup_per_cpu_areas()
> >
> > On powerpc it's done from setup_system(), called before start_kernel()
>
> It absolutely _has_ to be done from setup_arch() or earlier, as shown by
> the fact that "setup_per_cpu_areas()" is the very next thing that
> init/main.c calls (and clearly, that needs to know what CPU's are
> possible).
>
> ppc64 certainly calls it early enough, as does x86/x86-64/ia64. I don't
> see anybody else doing it too late either.
>
> Heiko, can you point to the "old comment" you mentioned in the email, or
> the architecture that does this wrong?
>
This one:
--- devel/fs/file.c~reduce-size-of-percpudata-and-make-sure-per_cpuobject 2006-02-04 23:27:17.000000000 -0800
+++ devel-akpm/fs/file.c 2006-02-04 23:27:17.000000000 -0800
@@ -379,7 +379,6 @@ static void __devinit fdtable_defer_list
void __init files_defer_init(void)
{
int i;
- /* Really early - can't use for_each_cpu */
- for (i = 0; i < NR_CPUS; i++)
+ for_each_cpu(i)
fdtable_defer_list_init(i);
}
And yes, me too - when I saw that comment disappear I checked and decided
that the comment was both wrong and undesirable.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-07 17:34 ` Andrew Morton
@ 2006-02-07 17:48 ` Linus Torvalds
2006-02-07 18:30 ` Dipankar Sarma
0 siblings, 1 reply; 58+ messages in thread
From: Linus Torvalds @ 2006-02-07 17:48 UTC (permalink / raw)
To: Andrew Morton
Cc: dada1, heiko.carstens, linux-kernel, davem, James.Bottomley,
mingo, axboe, anton, wli, ak
On Tue, 7 Feb 2006, Andrew Morton wrote:
>
> This one:
>
> --- devel/fs/file.c~reduce-size-of-percpudata-and-make-sure-per_cpuobject 2006-02-04 23:27:17.000000000 -0800
> +++ devel-akpm/fs/file.c 2006-02-04 23:27:17.000000000 -0800
> @@ -379,7 +379,6 @@ static void __devinit fdtable_defer_list
> void __init files_defer_init(void)
> {
> int i;
> - /* Really early - can't use for_each_cpu */
> - for (i = 0; i < NR_CPUS; i++)
> + for_each_cpu(i)
> fdtable_defer_list_init(i);
> }
>
> And yes, me too - when I saw that comment disappear I checked and decided
> that the comment was both wrong and undesirable.
Ahh, yes. The comment is totally incorrect, we must have done the CPU
setup much too later a long long time ago ;)
Linus
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-07 17:48 ` Linus Torvalds
@ 2006-02-07 18:30 ` Dipankar Sarma
2006-02-07 18:43 ` Linus Torvalds
0 siblings, 1 reply; 58+ messages in thread
From: Dipankar Sarma @ 2006-02-07 18:30 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, dada1, heiko.carstens, linux-kernel, davem,
James.Bottomley, mingo, axboe, anton, wli, ak
On Tue, Feb 07, 2006 at 09:48:41AM -0800, Linus Torvalds wrote:
>
>
> On Tue, 7 Feb 2006, Andrew Morton wrote:
> >
> > This one:
> >
> > --- devel/fs/file.c~reduce-size-of-percpudata-and-make-sure-per_cpuobject 2006-02-04 23:27:17.000000000 -0800
> > +++ devel-akpm/fs/file.c 2006-02-04 23:27:17.000000000 -0800
> > @@ -379,7 +379,6 @@ static void __devinit fdtable_defer_list
> > void __init files_defer_init(void)
> > {
> > int i;
> > - /* Really early - can't use for_each_cpu */
> > - for (i = 0; i < NR_CPUS; i++)
> > + for_each_cpu(i)
> > fdtable_defer_list_init(i);
> > }
> >
> > And yes, me too - when I saw that comment disappear I checked and decided
> > that the comment was both wrong and undesirable.
>
> Ahh, yes. The comment is totally incorrect, we must have done the CPU
> setup much too later a long long time ago ;)
One would think so, but I recall not all archs did that. Alpha for
example sets up cpu_possible_map in smp_prepare_cpus(). It however
makes more sense to fix the arch then use NR_CPUS, IMO.
Thanks
Dipankar
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-07 18:30 ` Dipankar Sarma
@ 2006-02-07 18:43 ` Linus Torvalds
2006-02-07 18:53 ` Dipankar Sarma
0 siblings, 1 reply; 58+ messages in thread
From: Linus Torvalds @ 2006-02-07 18:43 UTC (permalink / raw)
To: Dipankar Sarma
Cc: Andrew Morton, dada1, heiko.carstens, linux-kernel, davem,
James.Bottomley, mingo, axboe, anton, wli, ak
On Wed, 8 Feb 2006, Dipankar Sarma wrote:
>
> One would think so, but I recall not all archs did that. Alpha for
> example sets up cpu_possible_map in smp_prepare_cpus(). It however
> makes more sense to fix the arch then use NR_CPUS, IMO.
Ehh? alpha does it in setup_smp(), which in turn is called very early from
setup_arch().
Were you perhaps thinking of something else? Or am I just going blind and
confused?
Linus
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-07 18:43 ` Linus Torvalds
@ 2006-02-07 18:53 ` Dipankar Sarma
2006-02-07 19:11 ` Linus Torvalds
0 siblings, 1 reply; 58+ messages in thread
From: Dipankar Sarma @ 2006-02-07 18:53 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, dada1, heiko.carstens, linux-kernel, davem,
James.Bottomley, mingo, axboe, anton, wli, ak
On Tue, Feb 07, 2006 at 10:43:55AM -0800, Linus Torvalds wrote:
>
>
> On Wed, 8 Feb 2006, Dipankar Sarma wrote:
> >
> > One would think so, but I recall not all archs did that. Alpha for
> > example sets up cpu_possible_map in smp_prepare_cpus(). It however
> > makes more sense to fix the arch then use NR_CPUS, IMO.
>
> Ehh? alpha does it in setup_smp(), which in turn is called very early from
> setup_arch().
>
> Were you perhaps thinking of something else? Or am I just going blind and
> confused?
I am looking at 2.6.16-rc1 and I don't see cpu_possible_map
being set in setup_smp(). That said, it seems alpha setup_smp()
probes for cpus there, so there is no reason why it cannot
be set there. I think it is wrong not to set cpu_possible_map
very early.
Or perhaps it got fixed later on, in which case, oh well, I need
to download more often. <sigh>.
Thanks
Dipankar
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-07 18:53 ` Dipankar Sarma
@ 2006-02-07 19:11 ` Linus Torvalds
2006-02-08 4:40 ` Heiko Carstens
2006-02-08 8:55 ` Ivan Kokshaysky
0 siblings, 2 replies; 58+ messages in thread
From: Linus Torvalds @ 2006-02-07 19:11 UTC (permalink / raw)
To: Dipankar Sarma, Richard Henderson, Ivan Kokshaysky
Cc: Andrew Morton, dada1, heiko.carstens, Linux Kernel Mailing List,
David S. Miller, James.Bottomley, Ingo Molnar, axboe, anton, wli,
ak
On Wed, 8 Feb 2006, Dipankar Sarma wrote:
>
> I am looking at 2.6.16-rc1 and I don't see cpu_possible_map
> being set in setup_smp()
You're right, my bad. I looked at setup_smp() and how it walked through
every CPU in the firmware, but it doesn't actually ever set the possible
map, it fills in just hwrpb_cpu_present_mask (which is then then only used
_later_ to set cpu_possible_map for some silly reason).
As far as I can tell, "hwrpb_cpu_present_mask" is just wrong, and the code
_should_ be using cpu_possible_map.
rth? Ivan?
Linus
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-07 19:11 ` Linus Torvalds
@ 2006-02-08 4:40 ` Heiko Carstens
2006-02-08 8:55 ` Ivan Kokshaysky
1 sibling, 0 replies; 58+ messages in thread
From: Heiko Carstens @ 2006-02-08 4:40 UTC (permalink / raw)
To: Linus Torvalds
Cc: Dipankar Sarma, Richard Henderson, Ivan Kokshaysky, Andrew Morton,
dada1, Linux Kernel Mailing List, David S. Miller,
James.Bottomley, Ingo Molnar, axboe, anton, wli, ak
> > I am looking at 2.6.16-rc1 and I don't see cpu_possible_map
> > being set in setup_smp()
>
> You're right, my bad. I looked at setup_smp() and how it walked through
> every CPU in the firmware, but it doesn't actually ever set the possible
> map, it fills in just hwrpb_cpu_present_mask (which is then then only used
> _later_ to set cpu_possible_map for some silly reason).
>
> As far as I can tell, "hwrpb_cpu_present_mask" is just wrong, and the code
> _should_ be using cpu_possible_map.
We still have this one in init/main.c:
/* Sets up cpus_possible() */
smp_prepare_cpus(max_cpus);
That is actually why s390 is doing this in smp_prepare_cpus. We also use the
passed value of max_cpus to set the number of bits in cpu_possible_map
accordingly. This isn't possible anymore if this should be done in setup_arch.
So it looks like we have to switch to setup_arch and set NR_CPUS bits in
cpu_possible_map on s390.
Heiko
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-07 19:11 ` Linus Torvalds
2006-02-08 4:40 ` Heiko Carstens
@ 2006-02-08 8:55 ` Ivan Kokshaysky
1 sibling, 0 replies; 58+ messages in thread
From: Ivan Kokshaysky @ 2006-02-08 8:55 UTC (permalink / raw)
To: Linus Torvalds
Cc: Dipankar Sarma, Richard Henderson, Andrew Morton, dada1,
heiko.carstens, Linux Kernel Mailing List, David S. Miller,
James.Bottomley, Ingo Molnar, axboe, anton, wli, ak
On Tue, Feb 07, 2006 at 11:11:56AM -0800, Linus Torvalds wrote:
> You're right, my bad. I looked at setup_smp() and how it walked through
> every CPU in the firmware, but it doesn't actually ever set the possible
> map, it fills in just hwrpb_cpu_present_mask (which is then then only used
> _later_ to set cpu_possible_map for some silly reason).
>
> As far as I can tell, "hwrpb_cpu_present_mask" is just wrong, and the code
> _should_ be using cpu_possible_map.
Yep, it seems that we can get rid of hwrpb_cpu_present_mask.
The appended patch is only minimally tested (works on UP with SMP kernel).
Ivan.
--- linux/arch/alpha/kernel/smp.c.orig Wed Jan 18 02:09:27 2006
+++ linux/arch/alpha/kernel/smp.c Wed Feb 8 02:38:46 2006
@@ -73,9 +73,6 @@ cpumask_t cpu_online_map;
EXPORT_SYMBOL(cpu_online_map);
-/* cpus reported in the hwrpb */
-static unsigned long hwrpb_cpu_present_mask __initdata = 0;
-
int smp_num_probed; /* Internal processor count */
int smp_num_cpus = 1; /* Number that came online. */
@@ -442,7 +439,7 @@ setup_smp(void)
if ((cpu->flags & 0x1cc) == 0x1cc) {
smp_num_probed++;
/* Assume here that "whami" == index */
- hwrpb_cpu_present_mask |= (1UL << i);
+ cpu_set(i, cpu_possible_map);
cpu->pal_revision = boot_cpu_palrev;
}
@@ -453,12 +450,12 @@ setup_smp(void)
}
} else {
smp_num_probed = 1;
- hwrpb_cpu_present_mask = (1UL << boot_cpuid);
+ cpu_set(boot_cpuid, cpu_possible_map);
}
cpu_present_mask = cpumask_of_cpu(boot_cpuid);
printk(KERN_INFO "SMP: %d CPUs probed -- cpu_present_mask = %lx\n",
- smp_num_probed, hwrpb_cpu_present_mask);
+ smp_num_probed, cpu_possible_map.bits[0]);
}
/*
@@ -467,8 +464,6 @@ setup_smp(void)
void __init
smp_prepare_cpus(unsigned int max_cpus)
{
- int cpu_count, i;
-
/* Take care of some initial bookkeeping. */
memset(ipi_data, 0, sizeof(ipi_data));
@@ -486,19 +481,7 @@ smp_prepare_cpus(unsigned int max_cpus)
printk(KERN_INFO "SMP starting up secondaries.\n");
- cpu_count = 1;
- for (i = 0; (i < NR_CPUS) && (cpu_count < max_cpus); i++) {
- if (i == boot_cpuid)
- continue;
-
- if (((hwrpb_cpu_present_mask >> i) & 1) == 0)
- continue;
-
- cpu_set(i, cpu_possible_map);
- cpu_count++;
- }
-
- smp_num_cpus = cpu_count;
+ smp_num_cpus = smp_num_probed;
}
void __devinit
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
[not found] <200602051959.k15JxoHK001630@hera.kernel.org>
2006-02-07 15:15 ` Heiko Carstens
@ 2006-02-08 22:31 ` Rik van Riel
2006-02-09 1:20 ` Rik van Riel
` (2 more replies)
1 sibling, 3 replies; 58+ messages in thread
From: Rik van Riel @ 2006-02-08 22:31 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Eric Dumazet, Linus Torvalds
On Sun, 5 Feb 2006, Linux Kernel Mailing List wrote:
> [PATCH] percpu data: only iterate over possible CPUs
This sched.c bit breaks Xen, and probably also other architectures
that have CPU hotplug. I suspect the reason is that during early
bootup only the boot CPU is online, so nothing initialises the
runqueues for CPUs that are brought up afterwards.
I suspect we can get rid of this problem quite easily by moving
runqueue initialisation to init_idle()...
> diff --git a/kernel/sched.c b/kernel/sched.c
> index f77f23f..839466f 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -6109,7 +6109,7 @@ void __init sched_init(void)
> runqueue_t *rq;
> int i, j, k;
>
> - for (i = 0; i < NR_CPUS; i++) {
> + for_each_cpu(i) {
> prio_array_t *array;
>
> rq = cpu_rq(i);
--
All Rights Reversed
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-08 22:31 ` Rik van Riel
@ 2006-02-09 1:20 ` Rik van Riel
2006-02-09 3:05 ` Andrew Morton
2006-02-09 4:39 ` Eric Dumazet
2 siblings, 0 replies; 58+ messages in thread
From: Rik van Riel @ 2006-02-09 1:20 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Eric Dumazet, Linus Torvalds, xen-devel
On Wed, 8 Feb 2006, Rik van Riel wrote:
> On Sun, 5 Feb 2006, Linux Kernel Mailing List wrote:
>
> > [PATCH] percpu data: only iterate over possible CPUs
>
> The sched.c bit breaks Xen, and probably also other architectures
> that have CPU hotplug. I suspect the reason is that during early
> bootup only the boot CPU is online, so nothing initialises the
> runqueues for CPUs that are brought up afterwards.
>
> I suspect we can get rid of this problem quite easily by moving
> runqueue initialisation to init_idle()...
Well, it works. This (fairly trivial) patch makes hotplug cpu
work again, by ensuring that the runqueues of a newly brought
up CPU are initialized just before they are needed.
Without this patch the "spin_lock_irqsave(&rq->lock, flags);"
in init_idle() would oops if CONFIG_DEBUG_SPINLOCK was set.
With this patch, things just work.
Signed-off-by: Rik van Riel <riel@redhat.com>
--- linux-2.6.15.i686/kernel/sched.c.idle_init 2006-02-08 17:56:50.000000000 -0500
+++ linux-2.6.15.i686/kernel/sched.c 2006-02-08 17:58:57.000000000 -0500
@@ -4437,6 +4437,35 @@ void __devinit init_idle(task_t *idle, i
{
runqueue_t *rq = cpu_rq(cpu);
unsigned long flags;
+ prio_array_t *array;
+ int j, k;
+
+ spin_lock_init(&rq->lock);
+ rq->nr_running = 0;
+ rq->active = rq->arrays;
+ rq->expired = rq->arrays + 1;
+ rq->best_expired_prio = MAX_PRIO;
+
+#ifdef CONFIG_SMP
+ rq->sd = NULL;
+ for (j = 1; j < 3; j++)
+ rq->cpu_load[j] = 0;
+ rq->active_balance = 0;
+ rq->push_cpu = 0;
+ rq->migration_thread = NULL;
+ INIT_LIST_HEAD(&rq->migration_queue);
+#endif
+ atomic_set(&rq->nr_iowait, 0);
+
+ for (j = 0; j < 2; j++) {
+ array = rq->arrays + j;
+ for (k = 0; k < MAX_PRIO; k++) {
+ INIT_LIST_HEAD(array->queue + k);
+ __clear_bit(k, array->bitmap);
+ }
+ // delimiter for bitsearch
+ __set_bit(MAX_PRIO, array->bitmap);
+ }
idle->sleep_avg = 0;
idle->array = NULL;
@@ -6110,41 +6139,6 @@ int in_sched_functions(unsigned long add
void __init sched_init(void)
{
- runqueue_t *rq;
- int i, j, k;
-
- for_each_cpu(i) {
- prio_array_t *array;
-
- rq = cpu_rq(i);
- spin_lock_init(&rq->lock);
- rq->nr_running = 0;
- rq->active = rq->arrays;
- rq->expired = rq->arrays + 1;
- rq->best_expired_prio = MAX_PRIO;
-
-#ifdef CONFIG_SMP
- rq->sd = NULL;
- for (j = 1; j < 3; j++)
- rq->cpu_load[j] = 0;
- rq->active_balance = 0;
- rq->push_cpu = 0;
- rq->migration_thread = NULL;
- INIT_LIST_HEAD(&rq->migration_queue);
-#endif
- atomic_set(&rq->nr_iowait, 0);
-
- for (j = 0; j < 2; j++) {
- array = rq->arrays + j;
- for (k = 0; k < MAX_PRIO; k++) {
- INIT_LIST_HEAD(array->queue + k);
- __clear_bit(k, array->bitmap);
- }
- // delimiter for bitsearch
- __set_bit(MAX_PRIO, array->bitmap);
- }
- }
-
/*
* The boot idle thread does lazy MMU switching as well:
*/
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-08 22:31 ` Rik van Riel
2006-02-09 1:20 ` Rik van Riel
@ 2006-02-09 3:05 ` Andrew Morton
2006-02-09 3:08 ` Andrew Morton
2006-02-09 4:39 ` Eric Dumazet
2 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2006-02-09 3:05 UTC (permalink / raw)
To: Rik van Riel
Cc: linux-kernel, dada1, Linus Torvalds, Ingo Molnar, Andi Kleen,
Chuck Ebbert, Eric Dumazet, William Lee Irwin III, Heiko Carstens
Rik van Riel <riel@redhat.com> wrote:
>
> On Sun, 5 Feb 2006, Linux Kernel Mailing List wrote:
>
> > [PATCH] percpu data: only iterate over possible CPUs
>
> This sched.c bit breaks Xen, and probably also other architectures
> that have CPU hotplug. I suspect the reason is that during early
> bootup only the boot CPU is online, so nothing initialises the
> runqueues for CPUs that are brought up afterwards.
>
> I suspect we can get rid of this problem quite easily by moving
> runqueue initialisation to init_idle()...
We've hit this snag with a few architectures. They're setting up
cpu_possible_map too late. It's never been clearly defined.
sched_init() is called here:
asmlinkage void __init start_kernel(void)
{
...
printk(linux_banner);
setup_arch(&command_line);
setup_per_cpu_areas();
smp_prepare_boot_cpu();
sched_init();
Users of __GENERIC_PER_CPU definitely need cpu_possible_map to be initialised
by the time setup_per_cpu_areas() is called, so I think it makes sense to
say "thou shalt initialise cpu_possible_map in setup_arch()".
I guess Xen isn't doing that. Can it be made to?
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 3:05 ` Andrew Morton
@ 2006-02-09 3:08 ` Andrew Morton
2006-02-09 4:36 ` Eric Dumazet
0 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2006-02-09 3:08 UTC (permalink / raw)
To: Andrew Morton
Cc: riel, linux-kernel, dada1, torvalds, mingo, ak, 76306.1226, wli,
heiko.carstens
Andrew Morton <akpm@osdl.org> wrote:
>
> Users of __GENERIC_PER_CPU definitely need cpu_possible_map to be initialised
> by the time setup_per_cpu_areas() is called,
err, they'll need it once Eric's
dont-waste-percpu-memory-on-not-possible-CPUs patch is merged..
> so I think it makes sense to
> say "thou shalt initialise cpu_possible_map in setup_arch()".
>
> I guess Xen isn't doing that. Can it be made to?
Lame fix: cpu_possible_map = (1<<NR_CPUS)-1 in setup_arch().
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 3:08 ` Andrew Morton
@ 2006-02-09 4:36 ` Eric Dumazet
2006-02-09 4:45 ` Andrew Morton
0 siblings, 1 reply; 58+ messages in thread
From: Eric Dumazet @ 2006-02-09 4:36 UTC (permalink / raw)
To: Andrew Morton
Cc: riel, linux-kernel, torvalds, mingo, ak, 76306.1226, wli,
heiko.carstens
Andrew Morton a écrit :
> Andrew Morton <akpm@osdl.org> wrote:
>> Users of __GENERIC_PER_CPU definitely need cpu_possible_map to be initialised
>> by the time setup_per_cpu_areas() is called,
>
> err, they'll need it once Eric's
> dont-waste-percpu-memory-on-not-possible-CPUs patch is merged..
>
>> so I think it makes sense to
>> say "thou shalt initialise cpu_possible_map in setup_arch()".
>>
>> I guess Xen isn't doing that. Can it be made to?
>
> Lame fix: cpu_possible_map = (1<<NR_CPUS)-1 in setup_arch().
I dont understand why this HOTPLUG stuff is problematic for Xen (or other
arch) : If CONFIG_HOTPLUG_CPU is configured, then the map should be preset to
CPU_MASK_ALL. Its even documented in line 332 of include/linux/cpumask.h
* #ifdef CONFIG_HOTPLUG_CPU
* cpu_possible_map - all NR_CPUS bits set
arch/i386/kernel/smpboot.c is doing the only sane stuff about it :
#ifdef CONFIG_HOTPLUG_CPU
cpumask_t cpu_possible_map = CPU_MASK_ALL;
#else
cpumask_t cpu_possible_map;
#endif
Some remarks :
1) These cpu_possible_map could have __read_mostly attribute.
2) cpu_possible(cpu) macro could be defined to 1 if CONFIG_HOTPLUG_CPU, or a
test against NR_CPUS
#ifdef CONFIG_HOTPLUG_CPU
#define cpu_possible(cpu) cpu_isset((cpu), cpu_possible_map)
#else
#define cpu_possible(cpu) ((unsigned)(cpu) < NR_CPUS)
#endif
Eric
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-08 22:31 ` Rik van Riel
2006-02-09 1:20 ` Rik van Riel
2006-02-09 3:05 ` Andrew Morton
@ 2006-02-09 4:39 ` Eric Dumazet
2006-02-09 4:46 ` Nick Piggin
2 siblings, 1 reply; 58+ messages in thread
From: Eric Dumazet @ 2006-02-09 4:39 UTC (permalink / raw)
To: Rik van Riel; +Cc: Linux Kernel Mailing List, Linus Torvalds
Rik van Riel a écrit :
> On Sun, 5 Feb 2006, Linux Kernel Mailing List wrote:
>
>> [PATCH] percpu data: only iterate over possible CPUs
>
> This sched.c bit breaks Xen, and probably also other architectures
> that have CPU hotplug. I suspect the reason is that during early
> bootup only the boot CPU is online, so nothing initialises the
> runqueues for CPUs that are brought up afterwards.
>
> I suspect we can get rid of this problem quite easily by moving
> runqueue initialisation to init_idle()...
Please fix Xen to match include/linux/cpumask.h documentation that says :
/*
* The following particular system cpumasks and operations manage
* possible, present and online cpus. Each of them is a fixed size
* bitmap of size NR_CPUS.
*
* #ifdef CONFIG_HOTPLUG_CPU
* cpu_possible_map - all NR_CPUS bits set
* cpu_present_map - has bit 'cpu' set iff cpu is populated
* cpu_online_map - has bit 'cpu' set iff cpu available to scheduler
* #else
* cpu_possible_map - has bit 'cpu' set iff cpu is populated
* cpu_present_map - copy of cpu_possible_map
* cpu_online_map - has bit 'cpu' set iff cpu available to scheduler
* #endif
*/
Eric
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 4:36 ` Eric Dumazet
@ 2006-02-09 4:45 ` Andrew Morton
2006-02-09 4:56 ` Paul Jackson
2006-02-09 16:08 ` Nathan Lynch
0 siblings, 2 replies; 58+ messages in thread
From: Andrew Morton @ 2006-02-09 4:45 UTC (permalink / raw)
To: Eric Dumazet
Cc: riel, linux-kernel, torvalds, mingo, ak, 76306.1226, wli,
heiko.carstens, Paul Jackson
Eric Dumazet <dada1@cosmosbay.com> wrote:
>
> Andrew Morton a écrit :
> > Andrew Morton <akpm@osdl.org> wrote:
> >> Users of __GENERIC_PER_CPU definitely need cpu_possible_map to be initialised
> >> by the time setup_per_cpu_areas() is called,
> >
> > err, they'll need it once Eric's
> > dont-waste-percpu-memory-on-not-possible-CPUs patch is merged..
> >
> >> so I think it makes sense to
> >> say "thou shalt initialise cpu_possible_map in setup_arch()".
> >>
> >> I guess Xen isn't doing that. Can it be made to?
> >
> > Lame fix: cpu_possible_map = (1<<NR_CPUS)-1 in setup_arch().
>
> I dont understand why this HOTPLUG stuff is problematic for Xen (or other
> arch) : If CONFIG_HOTPLUG_CPU is configured, then the map should be preset to
> CPU_MASK_ALL.
Presumably not all architectures are doing that. And some of the problems
we've had are with CONFIG_HOTPLUG_CPU=n.
> Its even documented in line 332 of include/linux/cpumask.h
>
> * #ifdef CONFIG_HOTPLUG_CPU
> * cpu_possible_map - all NR_CPUS bits set
That seems a quite bad idea. If we know which CPUs are possible we should
populate cpu_possible_map correctly, whether or not CONFIG_HOTPLUG_CPU is
set. Setting all the bits like that wastes memory and wastes CPU cycles.
<greps>
That comment came from the tender pinkies of pj@sgi.com, although I suspect
it was just a transliteration of then-current practice.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 4:39 ` Eric Dumazet
@ 2006-02-09 4:46 ` Nick Piggin
0 siblings, 0 replies; 58+ messages in thread
From: Nick Piggin @ 2006-02-09 4:46 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Rik van Riel, Linux Kernel Mailing List, Linus Torvalds
Eric Dumazet wrote:
> Rik van Riel a écrit :
>
>> On Sun, 5 Feb 2006, Linux Kernel Mailing List wrote:
>>
>>> [PATCH] percpu data: only iterate over possible CPUs
>>
>>
>> This sched.c bit breaks Xen, and probably also other architectures
>> that have CPU hotplug. I suspect the reason is that during early
>> bootup only the boot CPU is online, so nothing initialises the
>> runqueues for CPUs that are brought up afterwards.
>>
>> I suspect we can get rid of this problem quite easily by moving
>> runqueue initialisation to init_idle()...
>
>
> Please fix Xen to match include/linux/cpumask.h documentation that says :
>
> /*
> * The following particular system cpumasks and operations manage
> * possible, present and online cpus. Each of them is a fixed size
> * bitmap of size NR_CPUS.
> *
> * #ifdef CONFIG_HOTPLUG_CPU
> * cpu_possible_map - all NR_CPUS bits set
Note that this shouldn't have to be all NR_CPUs if the platform
can determine all possible hotpluggable CPUs.
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 4:45 ` Andrew Morton
@ 2006-02-09 4:56 ` Paul Jackson
2006-02-09 16:08 ` Nathan Lynch
1 sibling, 0 replies; 58+ messages in thread
From: Paul Jackson @ 2006-02-09 4:56 UTC (permalink / raw)
To: Andrew Morton
Cc: dada1, riel, linux-kernel, torvalds, mingo, ak, 76306.1226, wli,
heiko.carstens
Andrew wrote:
> That comment came from the tender pinkies of pj@sgi.com, although I suspect
> it was just a transliteration of then-current practice.
You give my pinkie more credit than is its due.
That comment looks bogus.
Hmmm ... what should it be ...
* #ifdef CONFIG_HOTPLUG_CPU
* cpu_possible_map - has bit 'cpu' set iff cpu could be populated
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
@ 2006-02-09 8:32 Chuck Ebbert
2006-02-09 8:55 ` Paul Jackson
2006-02-09 9:06 ` Andrew Morton
0 siblings, 2 replies; 58+ messages in thread
From: Chuck Ebbert @ 2006-02-09 8:32 UTC (permalink / raw)
To: Andrew Morton
Cc: Paul Jackson, heiko.carstens, wli, ak, mingo, torvalds,
linux-kernel, riel, Eric Dumazet
In-Reply-To: <20060208204502.12513ae5.akpm@osdl.org>
On Wed, 8 Feb 2006 at 20:45:02 -0800, Andrew Morton wrote:
> > Its even documented in line 332 of include/linux/cpumask.h
> >
> > * #ifdef CONFIG_HOTPLUG_CPU
> > * cpu_possible_map - all NR_CPUS bits set
>
> That seems a quite bad idea. If we know which CPUs are possible we should
> populate cpu_possible_map correctly, whether or not CONFIG_HOTPLUG_CPU is
> set.
I don't think that's, um, "possible." Even if you could discover how many
empty sockets there were in a system, someone might be able to hotplug
a board with more of them on it. And there's no way to tell how many CPUs
to reserve for each socket anyway, e.g. AMD has already announced quad-core
processors.
But what really surprised me is that for_each_cpu() actually walks
cpu_possible_map and not cpu_present_map as I had assumed. This violates
the Principle Of Least Surprise. Maybe renaming for_each_cpu to
for_each_possible_cpu might be a good idea?
--
Chuck
"Equations are the Devil's sentences." --Stephen Colbert
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 8:32 [PATCH] percpu data: only iterate over possible CPUs Chuck Ebbert
@ 2006-02-09 8:55 ` Paul Jackson
2006-02-09 9:06 ` Andrew Morton
1 sibling, 0 replies; 58+ messages in thread
From: Paul Jackson @ 2006-02-09 8:55 UTC (permalink / raw)
To: Chuck Ebbert
Cc: akpm, heiko.carstens, wli, ak, mingo, torvalds, linux-kernel,
riel, dada1
Chuck wrote:
> I don't think that's, um, "possible." Even if you could discover how many
> empty sockets there were in a system, someone might be able to hotplug
> a board with more of them on it.
That's going to depend on your system hardware configuration.
cpu_possible_map should be whatever is the largest set of
cpus you might possibly want to deal with plugging in.
Some hardware configurations will support more addition
of hardware cpus than others.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 8:32 [PATCH] percpu data: only iterate over possible CPUs Chuck Ebbert
2006-02-09 8:55 ` Paul Jackson
@ 2006-02-09 9:06 ` Andrew Morton
2006-02-09 9:11 ` Andrew Morton
1 sibling, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2006-02-09 9:06 UTC (permalink / raw)
To: Chuck Ebbert
Cc: pj, heiko.carstens, wli, ak, mingo, torvalds, linux-kernel, riel,
dada1
Chuck Ebbert <76306.1226@compuserve.com> wrote:
>
> In-Reply-To: <20060208204502.12513ae5.akpm@osdl.org>
>
> On Wed, 8 Feb 2006 at 20:45:02 -0800, Andrew Morton wrote:
>
> > > Its even documented in line 332 of include/linux/cpumask.h
> > >
> > > * #ifdef CONFIG_HOTPLUG_CPU
> > > * cpu_possible_map - all NR_CPUS bits set
> >
> > That seems a quite bad idea. If we know which CPUs are possible we should
> > populate cpu_possible_map correctly, whether or not CONFIG_HOTPLUG_CPU is
> > set.
>
> I don't think that's, um, "possible." Even if you could discover how many
> empty sockets there were in a system, someone might be able to hotplug
> a board with more of them on it. And there's no way to tell how many CPUs
> to reserve for each socket anyway, e.g. AMD has already announced quad-core
> processors.
Well maybe. But it's awfully presumptuous for us to say that no platform
will be capable of telling us what its maximum number of CPUs is, or even
whether certain CPUs within that range aren't possible.
<checks>
Yup, on my 2-way x86 test box, with NR_CPUS=16 we have
cpu_possible_map=0xffff.
That's just insane - the default setting for a distro kernel should be (or
will become) NR_CPUS=lots, HOTPLUG_CPU=y. All those for_each_cpu() loops
are iterating across 16 CPUs.
aargh.
> But what really surprised me is that for_each_cpu() actually walks
> cpu_possible_map and not cpu_present_map as I had assumed. This violates
> the Principle Of Least Surprise. Maybe renaming for_each_cpu to
> for_each_possible_cpu might be a good idea?
That would make sense.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 9:06 ` Andrew Morton
@ 2006-02-09 9:11 ` Andrew Morton
2006-02-09 10:08 ` Heiko Carstens
0 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2006-02-09 9:11 UTC (permalink / raw)
To: 76306.1226, pj, heiko.carstens, wli, ak, mingo, torvalds,
linux-kernel, riel, dada1
Andrew Morton <akpm@osdl.org> wrote:
>
> aargh.
>
Actually, x86 appears to be the only arch which suffers this braindamage.
The rest use CPU_MASK_NONE (or just forget to initialise it and hope that
CPU_MASK_NONE equals all-zeroes).
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 9:11 ` Andrew Morton
@ 2006-02-09 10:08 ` Heiko Carstens
2006-02-09 10:13 ` Andrew Morton
0 siblings, 1 reply; 58+ messages in thread
From: Heiko Carstens @ 2006-02-09 10:08 UTC (permalink / raw)
To: Andrew Morton
Cc: 76306.1226, pj, wli, ak, mingo, torvalds, linux-kernel, riel,
dada1
> > aargh.
> Actually, x86 appears to be the only arch which suffers this braindamage.
> The rest use CPU_MASK_NONE (or just forget to initialise it and hope that
> CPU_MASK_NONE equals all-zeroes).
s390 will join, as soon as the cpu_possible_map fix is merged...
Heiko
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 10:08 ` Heiko Carstens
@ 2006-02-09 10:13 ` Andrew Morton
2006-02-09 10:23 ` Heiko Carstens
0 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2006-02-09 10:13 UTC (permalink / raw)
To: Heiko Carstens
Cc: 76306.1226, pj, wli, ak, mingo, torvalds, linux-kernel, riel,
dada1
Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
>
> > > aargh.
> > Actually, x86 appears to be the only arch which suffers this braindamage.
> > The rest use CPU_MASK_NONE (or just forget to initialise it and hope that
> > CPU_MASK_NONE equals all-zeroes).
>
> s390 will join, as soon as the cpu_possible_map fix is merged...
What cpu_possible_map fix?
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 10:13 ` Andrew Morton
@ 2006-02-09 10:23 ` Heiko Carstens
2006-02-09 10:31 ` Andrew Morton
0 siblings, 1 reply; 58+ messages in thread
From: Heiko Carstens @ 2006-02-09 10:23 UTC (permalink / raw)
To: Andrew Morton
Cc: 76306.1226, pj, wli, ak, mingo, torvalds, linux-kernel, riel,
dada1
> > > > aargh.
> > > Actually, x86 appears to be the only arch which suffers this braindamage.
> > > The rest use CPU_MASK_NONE (or just forget to initialise it and hope that
> > > CPU_MASK_NONE equals all-zeroes).
> >
> > s390 will join, as soon as the cpu_possible_map fix is merged...
>
> What cpu_possible_map fix?
This one:
http://lkml.org/lkml/2006/2/8/162
Heiko
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 10:23 ` Heiko Carstens
@ 2006-02-09 10:31 ` Andrew Morton
2006-02-09 11:47 ` Heiko Carstens
0 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2006-02-09 10:31 UTC (permalink / raw)
To: Heiko Carstens
Cc: 76306.1226, pj, wli, ak, mingo, torvalds, linux-kernel, riel,
dada1
Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
>
> > > > > aargh.
> > > > Actually, x86 appears to be the only arch which suffers this braindamage.
> > > > The rest use CPU_MASK_NONE (or just forget to initialise it and hope that
> > > > CPU_MASK_NONE equals all-zeroes).
> > >
> > > s390 will join, as soon as the cpu_possible_map fix is merged...
> >
> > What cpu_possible_map fix?
>
> This one:
>
> http://lkml.org/lkml/2006/2/8/162
>
Oh, OK. Ow, I don't think you want to do that. It means that all those
for_each_cpu() loops will now be iterating over all NR_CPUS cpus, whether
or not they're even possible.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 10:31 ` Andrew Morton
@ 2006-02-09 11:47 ` Heiko Carstens
2006-02-09 12:46 ` Eric Dumazet
0 siblings, 1 reply; 58+ messages in thread
From: Heiko Carstens @ 2006-02-09 11:47 UTC (permalink / raw)
To: Andrew Morton
Cc: 76306.1226, pj, wli, ak, mingo, torvalds, linux-kernel, riel,
dada1
> > > > > Actually, x86 appears to be the only arch which suffers this braindamage.
> > > > > The rest use CPU_MASK_NONE (or just forget to initialise it and hope that
> > > > > CPU_MASK_NONE equals all-zeroes).
> > > >
> > > > s390 will join, as soon as the cpu_possible_map fix is merged...
> > >
> > > What cpu_possible_map fix?
> >
> > This one:
> >
> > http://lkml.org/lkml/2006/2/8/162
> >
>
> Oh, OK. Ow, I don't think you want to do that. It means that all those
> for_each_cpu() loops will now be iterating over all NR_CPUS cpus, whether
> or not they're even possible.
That's ok. We're mainly running under z/VM where you can attach new virtual
cpus on the fly to the virtual machine (up to 64 cpus).
The only difference to before is that it was possible to limit the waste of
resources by passing a number with 'maxcpus'. This value was used to generate
the cpu_possible_map.
But since the map needs to be ready when we return from setup_arch, we don't
have access to max_cpus, unless we parse commandline on our own...
Heiko
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 11:47 ` Heiko Carstens
@ 2006-02-09 12:46 ` Eric Dumazet
2006-02-09 13:12 ` Heiko Carstens
0 siblings, 1 reply; 58+ messages in thread
From: Eric Dumazet @ 2006-02-09 12:46 UTC (permalink / raw)
To: Heiko Carstens
Cc: Andrew Morton, 76306.1226, pj, wli, ak, mingo, torvalds,
linux-kernel, riel, dada1
Heiko Carstens a écrit :
>>>>> > Actually, x86 appears to be the only arch which suffers this braindamage.
>>>>> > The rest use CPU_MASK_NONE (or just forget to initialise it and hope that
>>>>> > CPU_MASK_NONE equals all-zeroes).
>>>>>
>>>>> s390 will join, as soon as the cpu_possible_map fix is merged...
>>>> What cpu_possible_map fix?
>>> This one:
>>>
>>> http://lkml.org/lkml/2006/2/8/162
>>>
>> Oh, OK. Ow, I don't think you want to do that. It means that all those
>> for_each_cpu() loops will now be iterating over all NR_CPUS cpus, whether
>> or not they're even possible.
>
> That's ok. We're mainly running under z/VM where you can attach new virtual
> cpus on the fly to the virtual machine (up to 64 cpus).
> The only difference to before is that it was possible to limit the waste of
> resources by passing a number with 'maxcpus'. This value was used to generate
> the cpu_possible_map.
> But since the map needs to be ready when we return from setup_arch, we don't
> have access to max_cpus, unless we parse commandline on our own...
>
Then it's OK to clear bits from cpu_possible_map once you have max_cpus value
for (cpu = max_cpus ; cpu < NR_CPUS ; cpu++)
cpu_clear(cpu, cpu_possible_map);
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 12:46 ` Eric Dumazet
@ 2006-02-09 13:12 ` Heiko Carstens
2006-02-09 13:55 ` Eric Dumazet
0 siblings, 1 reply; 58+ messages in thread
From: Heiko Carstens @ 2006-02-09 13:12 UTC (permalink / raw)
To: Eric Dumazet
Cc: Andrew Morton, 76306.1226, pj, wli, ak, mingo, torvalds,
linux-kernel, riel, dada1
> >>Oh, OK. Ow, I don't think you want to do that. It means that all those
> >>for_each_cpu() loops will now be iterating over all NR_CPUS cpus, whether
> >>or not they're even possible.
> >That's ok. We're mainly running under z/VM where you can attach new virtual
> >cpus on the fly to the virtual machine (up to 64 cpus).
> >The only difference to before is that it was possible to limit the waste of
> >resources by passing a number with 'maxcpus'. This value was used to generate
> >the cpu_possible_map.
> >But since the map needs to be ready when we return from setup_arch, we don't
> >have access to max_cpus, unless we parse commandline on our own...
>
> Then it's OK to clear bits from cpu_possible_map once you have max_cpus value
>
> for (cpu = max_cpus ; cpu < NR_CPUS ; cpu++)
> cpu_clear(cpu, cpu_possible_map);
Hmm... I don't think the semantics of cpu_possible_map allow to change it.
Also any code that uses for_each_cpu() may allocate memory _before_
cpu_possible_map is changed back to reflect a smaller number of cpus.
Doesn't look like the correct way to fix this.
Thinking a bit longer this was probably a reason why initialization of
this map was done in smp_prepare_cpus() before it silently moved to
setup_arch().
Heiko
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 13:12 ` Heiko Carstens
@ 2006-02-09 13:55 ` Eric Dumazet
0 siblings, 0 replies; 58+ messages in thread
From: Eric Dumazet @ 2006-02-09 13:55 UTC (permalink / raw)
To: Heiko Carstens
Cc: Andrew Morton, 76306.1226, pj, wli, ak, mingo, torvalds,
linux-kernel, riel, dada1
Heiko Carstens a écrit :
>> for (cpu = max_cpus ; cpu < NR_CPUS ; cpu++)
>> cpu_clear(cpu, cpu_possible_map);
>
> Hmm... I don't think the semantics of cpu_possible_map allow to change it.
> Also any code that uses for_each_cpu() may allocate memory _before_
> cpu_possible_map is changed back to reflect a smaller number of cpus.
> Doesn't look like the correct way to fix this.
> Thinking a bit longer this was probably a reason why initialization of
> this map was done in smp_prepare_cpus() before it silently moved to
> setup_arch().
Hum... of course you may loose some bits of memory (percpu data for example)
but clearing a cpu in cpu_possible_map is allowed.
See arch/alpha/kernel/process.c and arch/x86_64/kernel/smpboot.c for uses of
cpu_clear()
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 4:45 ` Andrew Morton
2006-02-09 4:56 ` Paul Jackson
@ 2006-02-09 16:08 ` Nathan Lynch
2006-02-09 16:13 ` Heiko Carstens
2006-02-09 17:03 ` Ashok Raj
1 sibling, 2 replies; 58+ messages in thread
From: Nathan Lynch @ 2006-02-09 16:08 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric Dumazet, riel, linux-kernel, torvalds, mingo, ak, 76306.1226,
wli, heiko.carstens, Paul Jackson
Andrew Morton wrote:
> Eric Dumazet <dada1@cosmosbay.com> wrote:
> >
> > Andrew Morton a écrit :
> > > Andrew Morton <akpm@osdl.org> wrote:
> > >> Users of __GENERIC_PER_CPU definitely need cpu_possible_map to be initialised
> > >> by the time setup_per_cpu_areas() is called,
> > >
> > > err, they'll need it once Eric's
> > > dont-waste-percpu-memory-on-not-possible-CPUs patch is merged..
> > >
> > >> so I think it makes sense to
> > >> say "thou shalt initialise cpu_possible_map in setup_arch()".
> > >>
> > >> I guess Xen isn't doing that. Can it be made to?
> > >
> > > Lame fix: cpu_possible_map = (1<<NR_CPUS)-1 in setup_arch().
> >
> > I dont understand why this HOTPLUG stuff is problematic for Xen (or other
> > arch) : If CONFIG_HOTPLUG_CPU is configured, then the map should be preset to
> > CPU_MASK_ALL.
>
> Presumably not all architectures are doing that.
powerpc/ppc64, for instance, determines the number of possible cpus
from information exported by firmware (and I'm mystified as to why
other platforms don't do this). So it's typical to have a kernel an a
pSeries partition with NR_CPUS=128, but cpu_possible_map = 0xff.
> > Its even documented in line 332 of include/linux/cpumask.h
> >
> > * #ifdef CONFIG_HOTPLUG_CPU
> > * cpu_possible_map - all NR_CPUS bits set
>
> That seems a quite bad idea. If we know which CPUs are possible we should
> populate cpu_possible_map correctly, whether or not CONFIG_HOTPLUG_CPU is
> set. Setting all the bits like that wastes memory and wastes CPU cycles.
Yes, that comment is wrong or outdated.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 16:08 ` Nathan Lynch
@ 2006-02-09 16:13 ` Heiko Carstens
2006-02-09 16:38 ` Rik van Riel
` (2 more replies)
2006-02-09 17:03 ` Ashok Raj
1 sibling, 3 replies; 58+ messages in thread
From: Heiko Carstens @ 2006-02-09 16:13 UTC (permalink / raw)
To: Nathan Lynch
Cc: Andrew Morton, Eric Dumazet, riel, linux-kernel, torvalds, mingo,
ak, 76306.1226, wli, Paul Jackson
> > Presumably not all architectures are doing that.
> powerpc/ppc64, for instance, determines the number of possible cpus
> from information exported by firmware (and I'm mystified as to why
> other platforms don't do this). So it's typical to have a kernel an a
> pSeries partition with NR_CPUS=128, but cpu_possible_map = 0xff.
Simply because there is no such interface on s390. The only thing we know
for sure is that if we are running under z/VM the user is free to
configure up to 63 additional virtual cpus on the fly...
Heiko
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 16:13 ` Heiko Carstens
@ 2006-02-09 16:38 ` Rik van Riel
2006-02-09 16:59 ` Nathan Lynch
2006-02-09 17:37 ` Andi Kleen
2 siblings, 0 replies; 58+ messages in thread
From: Rik van Riel @ 2006-02-09 16:38 UTC (permalink / raw)
To: Heiko Carstens
Cc: Nathan Lynch, Andrew Morton, Eric Dumazet, linux-kernel, torvalds,
mingo, ak, 76306.1226, wli, Paul Jackson
On Thu, 9 Feb 2006, Heiko Carstens wrote:
> Simply because there is no such interface on s390. The only thing we know
> for sure is that if we are running under z/VM the user is free to
> configure up to 63 additional virtual cpus on the fly...
Xen is in a similar situation.
--
All Rights Reversed
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 16:13 ` Heiko Carstens
2006-02-09 16:38 ` Rik van Riel
@ 2006-02-09 16:59 ` Nathan Lynch
2006-02-09 17:37 ` Andi Kleen
2 siblings, 0 replies; 58+ messages in thread
From: Nathan Lynch @ 2006-02-09 16:59 UTC (permalink / raw)
To: Heiko Carstens
Cc: Andrew Morton, Eric Dumazet, riel, linux-kernel, torvalds, mingo,
ak, 76306.1226, wli, Paul Jackson
Heiko Carstens wrote:
> > > Presumably not all architectures are doing that.
> > powerpc/ppc64, for instance, determines the number of possible cpus
> > from information exported by firmware (and I'm mystified as to why
> > other platforms don't do this). So it's typical to have a kernel an a
> > pSeries partition with NR_CPUS=128, but cpu_possible_map = 0xff.
>
> Simply because there is no such interface on s390. The only thing we know
> for sure is that if we are running under z/VM the user is free to
> configure up to 63 additional virtual cpus on the fly...
My "mystified" parenthetical was not meant as a criticism of
arch/!powerpc, but of the platform implementations themselves, for not
making such an interface available.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 16:08 ` Nathan Lynch
2006-02-09 16:13 ` Heiko Carstens
@ 2006-02-09 17:03 ` Ashok Raj
2006-02-09 17:23 ` Nathan Lynch
2006-02-09 18:04 ` Andrew Morton
1 sibling, 2 replies; 58+ messages in thread
From: Ashok Raj @ 2006-02-09 17:03 UTC (permalink / raw)
To: Nathan Lynch
Cc: Andrew Morton, Eric Dumazet, riel, linux-kernel, torvalds, mingo,
ak, 76306.1226, wli, heiko.carstens, Paul Jackson
On Thu, Feb 09, 2006 at 08:08:09AM -0800, Nathan Lynch wrote:
>
> powerpc/ppc64, for instance, determines the number of possible cpus
> from information exported by firmware (and I'm mystified as to why
> other platforms don't do this). So it's typical to have a kernel an a
> pSeries partition with NR_CPUS=128, but cpu_possible_map = 0xff.
>
Iam assuming in the above ase, cpu_possible_map == cpu_present_map and both
dont change after the OS is booted. v.s in platforms capable of hot-plug
cpus present could grow up to cpu_possible_map (max) when a new cpu is
newly added, that wasnt even present at boot time.
The problem was with ACPI just simply looking at the namespace doesnt
exactly give us an idea of how many processors are possible in this platform.
The reason is we could get more added via SSDT dynamically.
on x86_64, we used the number of disabled cpus reported in MADT at boot time
as an indicator to set cpu_possible_map. So if you had 4 sockets, but just
2 populated, the BIOS would set the missing CPUS with disabled flag. We
simply count the number of disabled CPUs and add to possible map.
Andi added documentation to cover that as well in
Documentation/x86_64/cpu-hotplug-spec as a guideline for BIOS folks.
--
Cheers,
Ashok Raj
- Open Source Technology Center
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 17:03 ` Ashok Raj
@ 2006-02-09 17:23 ` Nathan Lynch
2006-02-09 18:04 ` Andrew Morton
1 sibling, 0 replies; 58+ messages in thread
From: Nathan Lynch @ 2006-02-09 17:23 UTC (permalink / raw)
To: Ashok Raj
Cc: Andrew Morton, Eric Dumazet, riel, linux-kernel, torvalds, mingo,
ak, 76306.1226, wli, heiko.carstens, Paul Jackson
Ashok Raj wrote:
> On Thu, Feb 09, 2006 at 08:08:09AM -0800, Nathan Lynch wrote:
> >
> > powerpc/ppc64, for instance, determines the number of possible cpus
> > from information exported by firmware (and I'm mystified as to why
> > other platforms don't do this). So it's typical to have a kernel an a
> > pSeries partition with NR_CPUS=128, but cpu_possible_map = 0xff.
> >
>
> Iam assuming in the above ase, cpu_possible_map == cpu_present_map and both
> dont change after the OS is booted. v.s in platforms capable of hot-plug
> cpus present could grow up to cpu_possible_map (max) when a new cpu is
> newly added, that wasnt even present at boot time.
No, cpu_present_map need not equal cpu_possible_map. Of course
cpu_possible_map doesn't change after boot, but cpu_present_map can.
Apart from that, I don't really understand what you're trying to say
here.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 16:13 ` Heiko Carstens
2006-02-09 16:38 ` Rik van Riel
2006-02-09 16:59 ` Nathan Lynch
@ 2006-02-09 17:37 ` Andi Kleen
2006-02-10 10:05 ` Heiko Carstens
2 siblings, 1 reply; 58+ messages in thread
From: Andi Kleen @ 2006-02-09 17:37 UTC (permalink / raw)
To: Heiko Carstens
Cc: Nathan Lynch, Andrew Morton, Eric Dumazet, riel, linux-kernel,
torvalds, mingo, 76306.1226, wli, Paul Jackson
On Thu, Feb 09, 2006 at 05:13:31PM +0100, Heiko Carstens wrote:
> > > Presumably not all architectures are doing that.
> > powerpc/ppc64, for instance, determines the number of possible cpus
> > from information exported by firmware (and I'm mystified as to why
> > other platforms don't do this). So it's typical to have a kernel an a
> > pSeries partition with NR_CPUS=128, but cpu_possible_map = 0xff.
>
> Simply because there is no such interface on s390. The only thing we know
> for sure is that if we are running under z/VM the user is free to
> configure up to 63 additional virtual cpus on the fly...
x86-64 had the same problem, but we now require that you
boot with additional_cpus=... for how many you want. Default is 0
(used to be half available CPUs but that lead to confusion)
Also the firmware has a way now to give this information automatically.
You can of course make it default to 64, but it will cost you
several MB of memory in usually unused copies of per cpu data.
-Andi
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 17:03 ` Ashok Raj
2006-02-09 17:23 ` Nathan Lynch
@ 2006-02-09 18:04 ` Andrew Morton
2006-02-09 18:52 ` Ashok Raj
2006-02-10 10:02 ` Andi Kleen
1 sibling, 2 replies; 58+ messages in thread
From: Andrew Morton @ 2006-02-09 18:04 UTC (permalink / raw)
To: Ashok Raj
Cc: ntl, dada1, riel, linux-kernel, torvalds, mingo, ak, 76306.1226,
wli, heiko.carstens, pj
Ashok Raj <ashok.raj@intel.com> wrote:
>
> The problem was with ACPI just simply looking at the namespace doesnt
> exactly give us an idea of how many processors are possible in this platform.
We need to fix this asap - the performance penalty for HOTPLUG_CPU=y,
NR_CPUS=lots will be appreciable.
Do any x86 platforms actually support CPU hotplug?
Does the ACPI problem which you describe occur with present-CPUs,
or only with possible-but-not-present ones?
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 18:04 ` Andrew Morton
@ 2006-02-09 18:52 ` Ashok Raj
2006-02-09 20:37 ` Andrew Morton
2006-02-10 10:02 ` Andi Kleen
1 sibling, 1 reply; 58+ messages in thread
From: Ashok Raj @ 2006-02-09 18:52 UTC (permalink / raw)
To: Andrew Morton
Cc: Ashok Raj, ntl, dada1, riel, linux-kernel, torvalds, mingo, ak,
76306.1226, wli, heiko.carstens, pj
On Thu, Feb 09, 2006 at 10:04:29AM -0800, Andrew Morton wrote:
>
> We need to fix this asap - the performance penalty for HOTPLUG_CPU=y,
> NR_CPUS=lots will be appreciable.
>
> Do any x86 platforms actually support CPU hotplug?
Hi Andrew,
logical cpu hotplug (i.e onlining and offlining) can be done on any
system. No hw or BIOS support is required. (this is what smp suspend/resume
folks use)
I remember Natalie from Unisys mentioned they have one system which is
ACPI based and supports physical cpu hotplug, but the BIOS is old and
didnt support the hotplug notify via ACPI. They probably have some other
sysmgmt way to interact and initiate the hotplug.
Iam aware of couple more that use ia64 NUMA type hw as well.
(I dont think i can announce for them:-) ).
>
> Does the ACPI problem which you describe occur with present-CPUs,
> or only with possible-but-not-present ones?
Describing present cpus is not problem.
Only knowing possible-but-not-present upfront is an issue.
logical-cpu-hotplug only: cpu_present_map == cpu_possible_map always
physical-cpu-hotplug: At boot, cpu_present_map is a subset of possible_map.
Think its best to NOT set cpu_present_map to MASK_ALL as its being proposed,
but let the arch/platform code figure out early enough to set possible_map
accurately for that platform. If a platform has no way to determine it,
then it could use cmdline like what x86_64 introduced (additional_cpus=)
to overcome that.
--
Cheers,
Ashok Raj
- Open Source Technology Center
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 18:52 ` Ashok Raj
@ 2006-02-09 20:37 ` Andrew Morton
2006-02-09 21:03 ` Ashok Raj
0 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2006-02-09 20:37 UTC (permalink / raw)
To: Ashok Raj
Cc: ashok.raj, ntl, dada1, riel, linux-kernel, torvalds, mingo, ak,
76306.1226, wli, heiko.carstens, pj
Ashok Raj <ashok.raj@intel.com> wrote:
>
> >
> > Does the ACPI problem which you describe occur with present-CPUs,
> > or only with possible-but-not-present ones?
>
> Describing present cpus is not problem.
>
> Only knowing possible-but-not-present upfront is an issue.
>
> logical-cpu-hotplug only: cpu_present_map == cpu_possible_map always
> physical-cpu-hotplug: At boot, cpu_present_map is a subset of possible_map.
>
> Think its best to NOT set cpu_present_map to MASK_ALL as its being proposed,
> but let the arch/platform code figure out early enough to set possible_map
> accurately for that platform. If a platform has no way to determine it,
> then it could use cmdline like what x86_64 introduced (additional_cpus=)
> to overcome that.
There is no proposal to change cpu_present_map.
The problem is cpu_possible_map. That is presently being initialised to
CPU_MASK_ALL, which adversely affects perfoermance. An NR_CPUS=16 kernel
on a 2-way presently has cpu_possible_map=0xffff, which will hurt.
The proposal is this:
From: Andrew Morton <akpm@osdl.org>
Initialising cpu_possible_map to all-ones with CONFIG_HOTPLUG_CPU means that
a) All for_each_cpu() loops will iterate across all NR_CPUS CPUs, rather
than over possible ones. That can be quite expensive.
b) Soon we'll be allocating per-cpu areas only for possible CPUs. So with
CPU_MASK_ALL, we'll be wasting memory.
I also switched voyager over to not use CPU_MASK_ALL in the non-CPU-hotplug
case. Will that break it?
I note that parisc is also using CPU_MASK_ALL. Suggest that it stop doing
that.
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Paul Jackson <pj@sgi.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Zwane Mwaikambo <zwane@linuxpower.ca>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
arch/i386/kernel/smpboot.c | 4 ----
arch/i386/mach-voyager/voyager_smp.c | 2 +-
2 files changed, 1 insertion(+), 5 deletions(-)
diff -puN arch/i386/kernel/smpboot.c~x86-dont-initialise-cpu_possible_map-to-all-ones arch/i386/kernel/smpboot.c
--- devel/arch/i386/kernel/smpboot.c~x86-dont-initialise-cpu_possible_map-to-all-ones 2006-02-09 01:11:55.000000000 -0800
+++ devel-akpm/arch/i386/kernel/smpboot.c 2006-02-09 01:12:24.000000000 -0800
@@ -87,11 +87,7 @@ EXPORT_SYMBOL(cpu_online_map);
cpumask_t cpu_callin_map;
cpumask_t cpu_callout_map;
EXPORT_SYMBOL(cpu_callout_map);
-#ifdef CONFIG_HOTPLUG_CPU
-cpumask_t cpu_possible_map = CPU_MASK_ALL;
-#else
cpumask_t cpu_possible_map;
-#endif
EXPORT_SYMBOL(cpu_possible_map);
static cpumask_t smp_commenced_mask;
diff -puN arch/i386/mach-voyager/voyager_smp.c~x86-dont-initialise-cpu_possible_map-to-all-ones arch/i386/mach-voyager/voyager_smp.c
--- devel/arch/i386/mach-voyager/voyager_smp.c~x86-dont-initialise-cpu_possible_map-to-all-ones 2006-02-09 01:11:55.000000000 -0800
+++ devel-akpm/arch/i386/mach-voyager/voyager_smp.c 2006-02-09 01:12:43.000000000 -0800
@@ -240,7 +240,7 @@ static cpumask_t smp_commenced_mask = CP
cpumask_t cpu_callin_map = CPU_MASK_NONE;
cpumask_t cpu_callout_map = CPU_MASK_NONE;
EXPORT_SYMBOL(cpu_callout_map);
-cpumask_t cpu_possible_map = CPU_MASK_ALL;
+cpumask_t cpu_possible_map = CPU_MASK_NONE;
EXPORT_SYMBOL(cpu_possible_map);
/* The per processor IRQ masks (these are usually kept in sync) */
_
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 20:37 ` Andrew Morton
@ 2006-02-09 21:03 ` Ashok Raj
0 siblings, 0 replies; 58+ messages in thread
From: Ashok Raj @ 2006-02-09 21:03 UTC (permalink / raw)
To: Andrew Morton
Cc: Ashok Raj, ntl, dada1, riel, linux-kernel, torvalds, mingo, ak,
76306.1226, wli, heiko.carstens, pj
On Thu, Feb 09, 2006 at 12:37:29PM -0800, Andrew Morton wrote:
>
> There is no proposal to change cpu_present_map.
>
> The problem is cpu_possible_map. That is presently being initialised to
> CPU_MASK_ALL, which adversely affects perfoermance. An NR_CPUS=16 kernel
> on a 2-way presently has cpu_possible_map=0xffff, which will hurt.
Think i miss typed earlier. What you proposed is the correct solution.
I will followup with similar change on ia64 as well, (currently we do this
in smp_build_cpu_map()). And add something similar to what we did for
x86_64.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 18:04 ` Andrew Morton
2006-02-09 18:52 ` Ashok Raj
@ 2006-02-10 10:02 ` Andi Kleen
2006-02-10 10:42 ` Andrew Morton
1 sibling, 1 reply; 58+ messages in thread
From: Andi Kleen @ 2006-02-10 10:02 UTC (permalink / raw)
To: Andrew Morton
Cc: Ashok Raj, ntl, dada1, riel, linux-kernel, torvalds, mingo,
76306.1226, wli, heiko.carstens, pj
On Thursday 09 February 2006 19:04, Andrew Morton wrote:
> Ashok Raj <ashok.raj@intel.com> wrote:
> >
> > The problem was with ACPI just simply looking at the namespace doesnt
> > exactly give us an idea of how many processors are possible in this platform.
>
> We need to fix this asap - the performance penalty for HOTPLUG_CPU=y,
> NR_CPUS=lots will be appreciable.
What is this performance penalty exactly?
It wastes quite some memory (each possible CPU needs 32K of memory which
adds quickly up), but it shouldn't impact other CPU use.
>
> Do any x86 platforms actually support CPU hotplug?
Xen does. And it's needed for suspend/resume on normal x86 now.
-Andi
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-09 17:37 ` Andi Kleen
@ 2006-02-10 10:05 ` Heiko Carstens
2006-02-10 10:13 ` Andi Kleen
0 siblings, 1 reply; 58+ messages in thread
From: Heiko Carstens @ 2006-02-10 10:05 UTC (permalink / raw)
To: Andi Kleen
Cc: Nathan Lynch, Andrew Morton, Eric Dumazet, riel, linux-kernel,
torvalds, mingo, 76306.1226, wli, Paul Jackson
> > > powerpc/ppc64, for instance, determines the number of possible cpus
> > > from information exported by firmware (and I'm mystified as to why
> > > other platforms don't do this). So it's typical to have a kernel an a
> > > pSeries partition with NR_CPUS=128, but cpu_possible_map = 0xff.
> >
> > Simply because there is no such interface on s390. The only thing we know
> > for sure is that if we are running under z/VM the user is free to
> > configure up to 63 additional virtual cpus on the fly...
>
> x86-64 had the same problem, but we now require that you
> boot with additional_cpus=... for how many you want. Default is 0
> (used to be half available CPUs but that lead to confusion)
So introducing the additional_cpus kernel parameter seems to be the way
to go (for XEN probably too). Even though it seems to be a bit odd if the
user specifies both maxcpus=... and additional_cpus=...
Heiko
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-10 10:05 ` Heiko Carstens
@ 2006-02-10 10:13 ` Andi Kleen
2006-02-11 14:49 ` Heiko Carstens
0 siblings, 1 reply; 58+ messages in thread
From: Andi Kleen @ 2006-02-10 10:13 UTC (permalink / raw)
To: Heiko Carstens
Cc: Nathan Lynch, Andrew Morton, Eric Dumazet, riel, linux-kernel,
torvalds, mingo, 76306.1226, wli, Paul Jackson, jbeulich,
Keir Fraser
[putting some Xen people into cc]
On Friday 10 February 2006 11:05, Heiko Carstens wrote:
> > > > powerpc/ppc64, for instance, determines the number of possible cpus
> > > > from information exported by firmware (and I'm mystified as to why
> > > > other platforms don't do this). So it's typical to have a kernel an a
> > > > pSeries partition with NR_CPUS=128, but cpu_possible_map = 0xff.
> > >
> > > Simply because there is no such interface on s390. The only thing we know
> > > for sure is that if we are running under z/VM the user is free to
> > > configure up to 63 additional virtual cpus on the fly...
> >
> > x86-64 had the same problem, but we now require that you
> > boot with additional_cpus=... for how many you want. Default is 0
> > (used to be half available CPUs but that lead to confusion)
>
> So introducing the additional_cpus kernel parameter seems to be the way
> to go (for XEN probably too). Even though it seems to be a bit odd if the
> user specifies both maxcpus=... and additional_cpus=...
With additional_cpus you don't need maxcpus. They are added together.
Also for Xen it's probably not needed by default, but the hypervisor can somehow
pass it (it doesn't make sense to have more vcpus than real cpus
and that should be relatively small number, so the memory waste
shouldn't be that bad).
On x86-64 there is also a new firmware
way to specify it, but current machines don't support that yet.
-Andi
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-10 10:02 ` Andi Kleen
@ 2006-02-10 10:42 ` Andrew Morton
2006-02-10 11:09 ` Eric Dumazet
2006-02-10 12:10 ` Andi Kleen
0 siblings, 2 replies; 58+ messages in thread
From: Andrew Morton @ 2006-02-10 10:42 UTC (permalink / raw)
To: Andi Kleen
Cc: ashok.raj, ntl, dada1, riel, linux-kernel, torvalds, mingo,
76306.1226, wli, heiko.carstens, pj
Andi Kleen <ak@muc.de> wrote:
>
> On Thursday 09 February 2006 19:04, Andrew Morton wrote:
> > Ashok Raj <ashok.raj@intel.com> wrote:
> > >
> > > The problem was with ACPI just simply looking at the namespace doesnt
> > > exactly give us an idea of how many processors are possible in this platform.
> >
> > We need to fix this asap - the performance penalty for HOTPLUG_CPU=y,
> > NR_CPUS=lots will be appreciable.
>
> What is this performance penalty exactly?
All those for_each_cpu() loops will hit NR_CPUS cachelines instead of
hweight(cpu_possible_map) cachelines.
> It wastes quite some memory (each possible CPU needs 32K of memory which
> adds quickly up), but it shouldn't impact other CPU use.
>
> >
> > Do any x86 platforms actually support CPU hotplug?
>
> Xen does.
yup.
> And it's needed for suspend/resume on normal x86 now.
True.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-10 10:42 ` Andrew Morton
@ 2006-02-10 11:09 ` Eric Dumazet
2006-02-10 11:23 ` Andrew Morton
2006-02-10 12:10 ` Andi Kleen
1 sibling, 1 reply; 58+ messages in thread
From: Eric Dumazet @ 2006-02-10 11:09 UTC (permalink / raw)
To: Andrew Morton
Cc: Andi Kleen, ashok.raj, ntl, riel, linux-kernel, torvalds, mingo,
76306.1226, wli, heiko.carstens, pj
Andrew Morton a écrit :
> Andi Kleen <ak@muc.de> wrote:
>> On Thursday 09 February 2006 19:04, Andrew Morton wrote:
>>> Ashok Raj <ashok.raj@intel.com> wrote:
>>>> The problem was with ACPI just simply looking at the namespace doesnt
>>>> exactly give us an idea of how many processors are possible in this platform.
>>> We need to fix this asap - the performance penalty for HOTPLUG_CPU=y,
>>> NR_CPUS=lots will be appreciable.
>> What is this performance penalty exactly?
>
> All those for_each_cpu() loops will hit NR_CPUS cachelines instead of
> hweight(cpu_possible_map) cachelines.
You mean NR_CPUS bits, mostly all included in a single cacheline, and even in
a single long word :) for most cases (NR_CPUS <= 32 or 64)
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-10 11:09 ` Eric Dumazet
@ 2006-02-10 11:23 ` Andrew Morton
2006-02-10 11:26 ` Andrew Morton
2006-02-10 14:13 ` Eric Dumazet
0 siblings, 2 replies; 58+ messages in thread
From: Andrew Morton @ 2006-02-10 11:23 UTC (permalink / raw)
To: Eric Dumazet
Cc: ak, ashok.raj, ntl, riel, linux-kernel, torvalds, mingo,
76306.1226, wli, heiko.carstens, pj
Eric Dumazet <dada1@cosmosbay.com> wrote:
>
> Andrew Morton a écrit :
> > Andi Kleen <ak@muc.de> wrote:
> >> On Thursday 09 February 2006 19:04, Andrew Morton wrote:
> >>> Ashok Raj <ashok.raj@intel.com> wrote:
> >>>> The problem was with ACPI just simply looking at the namespace doesnt
> >>>> exactly give us an idea of how many processors are possible in this platform.
> >>> We need to fix this asap - the performance penalty for HOTPLUG_CPU=y,
> >>> NR_CPUS=lots will be appreciable.
> >> What is this performance penalty exactly?
> >
> > All those for_each_cpu() loops will hit NR_CPUS cachelines instead of
> > hweight(cpu_possible_map) cachelines.
>
> You mean NR_CPUS bits, mostly all included in a single cacheline, and even in
> a single long word :) for most cases (NR_CPUS <= 32 or 64)
>
No, I mean cachelines:
static void recalc_bh_state(void)
{
int i;
int tot = 0;
if (__get_cpu_var(bh_accounting).ratelimit++ < 4096)
return;
__get_cpu_var(bh_accounting).ratelimit = 0;
for_each_cpu(i)
tot += per_cpu(bh_accounting, i).nr;
That's going to hit NR_CPUS cachelines even on a 2-way.
Or am I missing something really obvious here?
(Probably the most expensive ones will be get_page_state() and friends.
And argh, they're still hardwired to CPU_MASK_ALL).
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-10 11:23 ` Andrew Morton
@ 2006-02-10 11:26 ` Andrew Morton
2006-02-10 14:13 ` Eric Dumazet
1 sibling, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2006-02-10 11:26 UTC (permalink / raw)
To: dada1, ak, ashok.raj, ntl, riel, linux-kernel, torvalds, mingo,
76306.1226, wli, heiko.carstens, pj
Andrew Morton <akpm@osdl.org> wrote:
>
> (Probably the most expensive ones will be get_page_state() and friends.
> And argh, they're still hardwired to CPU_MASK_ALL).
>
No, we mask it with cpu_online_map. Phew.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-10 10:42 ` Andrew Morton
2006-02-10 11:09 ` Eric Dumazet
@ 2006-02-10 12:10 ` Andi Kleen
2006-02-10 19:08 ` Andrew Morton
1 sibling, 1 reply; 58+ messages in thread
From: Andi Kleen @ 2006-02-10 12:10 UTC (permalink / raw)
To: Andrew Morton
Cc: ashok.raj, ntl, dada1, riel, linux-kernel, torvalds, mingo,
76306.1226, wli, heiko.carstens, pj
On Friday 10 February 2006 11:42, Andrew Morton wrote:
> Andi Kleen <ak@muc.de> wrote:
> >
> > On Thursday 09 February 2006 19:04, Andrew Morton wrote:
> > > Ashok Raj <ashok.raj@intel.com> wrote:
> > > >
> > > > The problem was with ACPI just simply looking at the namespace doesnt
> > > > exactly give us an idea of how many processors are possible in this platform.
> > >
> > > We need to fix this asap - the performance penalty for HOTPLUG_CPU=y,
> > > NR_CPUS=lots will be appreciable.
> >
> > What is this performance penalty exactly?
>
> All those for_each_cpu() loops will hit NR_CPUS cachelines instead of
> hweight(cpu_possible_map) cachelines.
But are there any in real fast paths? iirc they are mostly in initialization,
where it doesn't matter too much.
-Andi
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-10 11:23 ` Andrew Morton
2006-02-10 11:26 ` Andrew Morton
@ 2006-02-10 14:13 ` Eric Dumazet
2006-02-11 0:10 ` Nathan Lynch
1 sibling, 1 reply; 58+ messages in thread
From: Eric Dumazet @ 2006-02-10 14:13 UTC (permalink / raw)
To: Andrew Morton
Cc: ak, ashok.raj, ntl, riel, linux-kernel, torvalds, mingo,
76306.1226, wli, heiko.carstens, pj
[-- Attachment #1: Type: text/plain, Size: 1794 bytes --]
Andrew Morton a écrit :
> Eric Dumazet <dada1@cosmosbay.com> wrote:
>> Andrew Morton a écrit :
>>> Andi Kleen <ak@muc.de> wrote:
>>>> On Thursday 09 February 2006 19:04, Andrew Morton wrote:
>>>>> Ashok Raj <ashok.raj@intel.com> wrote:
>>>>>> The problem was with ACPI just simply looking at the namespace doesnt
>>>>>> exactly give us an idea of how many processors are possible in this platform.
>>>>> We need to fix this asap - the performance penalty for HOTPLUG_CPU=y,
>>>>> NR_CPUS=lots will be appreciable.
>>>> What is this performance penalty exactly?
>>> All those for_each_cpu() loops will hit NR_CPUS cachelines instead of
>>> hweight(cpu_possible_map) cachelines.
>> You mean NR_CPUS bits, mostly all included in a single cacheline, and even in
>> a single long word :) for most cases (NR_CPUS <= 32 or 64)
>>
>
> No, I mean cachelines:
>
> static void recalc_bh_state(void)
> {
> int i;
> int tot = 0;
>
> if (__get_cpu_var(bh_accounting).ratelimit++ < 4096)
> return;
> __get_cpu_var(bh_accounting).ratelimit = 0;
> for_each_cpu(i)
> tot += per_cpu(bh_accounting, i).nr;
>
> That's going to hit NR_CPUS cachelines even on a 2-way.
>
> Or am I missing something really obvious here?
OK I see. This can be solved with this patch :
[PATCH] HOTPLUG_CPU : avoid hitting too many cachelines in recalc_bh_state()
Instead of using for_each_cpu(i), we can use for_each_online_cpu(i) : The
difference matters if HOTPUG_CPU=y
When a CPU goes offline (ie removed from online map), it might have a non null
bh_accounting.nr, so this patch adds a transfert of this counter to an online
CPU counter.
We already have a hotcpu_notifier, (function buffer_cpu_notify()), where we
can do this bh_accounting.nr transfert.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
[-- Attachment #2: buffer.patch --]
[-- Type: text/plain, Size: 651 bytes --]
--- a/fs/buffer.c 2006-02-10 15:08:21.000000000 +0100
+++ b/fs/buffer.c 2006-02-10 15:47:55.000000000 +0100
@@ -3138,7 +3138,7 @@
if (__get_cpu_var(bh_accounting).ratelimit++ < 4096)
return;
__get_cpu_var(bh_accounting).ratelimit = 0;
- for_each_cpu(i)
+ for_each_online_cpu(i)
tot += per_cpu(bh_accounting, i).nr;
buffer_heads_over_limit = (tot > max_buffer_heads);
}
@@ -3187,6 +3187,9 @@
brelse(b->bhs[i]);
b->bhs[i] = NULL;
}
+ get_cpu_var(bh_accounting).nr += per_cpu(bh_accounting, cpu).nr ;
+ per_cpu(bh_accounting, cpu).nr = 0;
+ put_cpu_var(bh_accounting);
}
static int buffer_cpu_notify(struct notifier_block *self,
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-10 12:10 ` Andi Kleen
@ 2006-02-10 19:08 ` Andrew Morton
0 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2006-02-10 19:08 UTC (permalink / raw)
To: Andi Kleen
Cc: ashok.raj, ntl, dada1, riel, linux-kernel, torvalds, mingo,
76306.1226, wli, heiko.carstens, pj
Andi Kleen <ak@muc.de> wrote:
>
> On Friday 10 February 2006 11:42, Andrew Morton wrote:
> > Andi Kleen <ak@muc.de> wrote:
> > >
> > > On Thursday 09 February 2006 19:04, Andrew Morton wrote:
> > > > Ashok Raj <ashok.raj@intel.com> wrote:
> > > > >
> > > > > The problem was with ACPI just simply looking at the namespace doesnt
> > > > > exactly give us an idea of how many processors are possible in this platform.
> > > >
> > > > We need to fix this asap - the performance penalty for HOTPLUG_CPU=y,
> > > > NR_CPUS=lots will be appreciable.
> > >
> > > What is this performance penalty exactly?
> >
> > All those for_each_cpu() loops will hit NR_CPUS cachelines instead of
> > hweight(cpu_possible_map) cachelines.
>
> But are there any in real fast paths? iirc they are mostly in initialization,
> where it doesn't matter too much.
>
Could be so.
I just added one to percpu_counters though. And it'd be a pain to
introduce a cpu notifier for each and every percpu_counter.
From: Andrew Morton <akpm@osdl.org>
Implement percpu_counter_sum(). This is a more accurate but slower version of
percpu_counter_read_positive().
We need this for Alex's speedup-ext3_statfs patch. Otherwise it would be too
inaccurate on large CPU counts.
Cc: Ravikiran G Thirumalai <kiran@scalex86.org>
Cc: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
include/linux/percpu_counter.h | 6 ++++++
mm/swap.c | 25 +++++++++++++++++++++++--
2 files changed, 29 insertions(+), 2 deletions(-)
diff -puN include/linux/percpu_counter.h~percpu_counter_sum include/linux/percpu_counter.h
--- devel/include/linux/percpu_counter.h~percpu_counter_sum 2006-02-08 13:34:08.000000000 -0800
+++ devel-akpm/include/linux/percpu_counter.h 2006-02-08 13:34:08.000000000 -0800
@@ -39,6 +39,7 @@ static inline void percpu_counter_destro
}
void percpu_counter_mod(struct percpu_counter *fbc, long amount);
+long percpu_counter_sum(struct percpu_counter *fbc);
static inline long percpu_counter_read(struct percpu_counter *fbc)
{
@@ -92,6 +93,11 @@ static inline long percpu_counter_read_p
return fbc->count;
}
+static inline long percpu_counter_sum(struct percpu_counter *fbc)
+{
+ return percpu_counter_read_positive(fbc);
+}
+
#endif /* CONFIG_SMP */
static inline void percpu_counter_inc(struct percpu_counter *fbc)
diff -puN mm/swap.c~percpu_counter_sum mm/swap.c
--- devel/mm/swap.c~percpu_counter_sum 2006-02-08 13:34:08.000000000 -0800
+++ devel-akpm/mm/swap.c 2006-02-08 13:34:08.000000000 -0800
@@ -491,13 +491,34 @@ void percpu_counter_mod(struct percpu_co
if (count >= FBC_BATCH || count <= -FBC_BATCH) {
spin_lock(&fbc->lock);
fbc->count += count;
+ *pcount = 0;
spin_unlock(&fbc->lock);
- count = 0;
+ } else {
+ *pcount = count;
}
- *pcount = count;
put_cpu();
}
EXPORT_SYMBOL(percpu_counter_mod);
+
+/*
+ * Add up all the per-cpu counts, return the result. This is a more accurate
+ * but much slower version of percpu_counter_read_positive()
+ */
+long percpu_counter_sum(struct percpu_counter *fbc)
+{
+ long ret;
+ int cpu;
+
+ spin_lock(&fbc->lock);
+ ret = fbc->count;
+ for_each_cpu(cpu) {
+ long *pcount = per_cpu_ptr(fbc->counters, cpu);
+ ret += *pcount;
+ }
+ spin_unlock(&fbc->lock);
+ return ret < 0 ? 0 : ret;
+}
+EXPORT_SYMBOL(percpu_counter_sum);
#endif
/*
_
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-10 14:13 ` Eric Dumazet
@ 2006-02-11 0:10 ` Nathan Lynch
2006-02-11 0:25 ` Andrew Morton
0 siblings, 1 reply; 58+ messages in thread
From: Nathan Lynch @ 2006-02-11 0:10 UTC (permalink / raw)
To: Eric Dumazet
Cc: Andrew Morton, ak, ashok.raj, riel, linux-kernel, torvalds, mingo,
76306.1226, wli, heiko.carstens, pj
Eric Dumazet wrote:
>
> [PATCH] HOTPLUG_CPU : avoid hitting too many cachelines in recalc_bh_state()
>
> Instead of using for_each_cpu(i), we can use for_each_online_cpu(i) : The
> difference matters if HOTPUG_CPU=y
>
> When a CPU goes offline (ie removed from online map), it might have a non
> null bh_accounting.nr, so this patch adds a transfert of this counter to an
> online CPU counter.
>
> We already have a hotcpu_notifier, (function buffer_cpu_notify()), where we
> can do this bh_accounting.nr transfert.
>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> --- a/fs/buffer.c 2006-02-10 15:08:21.000000000 +0100
> +++ b/fs/buffer.c 2006-02-10 15:47:55.000000000 +0100
> @@ -3138,7 +3138,7 @@
> if (__get_cpu_var(bh_accounting).ratelimit++ < 4096)
> return;
> __get_cpu_var(bh_accounting).ratelimit = 0;
> - for_each_cpu(i)
> + for_each_online_cpu(i)
> tot += per_cpu(bh_accounting, i).nr;
> buffer_heads_over_limit = (tot > max_buffer_heads);
> }
> @@ -3187,6 +3187,9 @@
> brelse(b->bhs[i]);
> b->bhs[i] = NULL;
> }
> + get_cpu_var(bh_accounting).nr += per_cpu(bh_accounting, cpu).nr ;
> + per_cpu(bh_accounting, cpu).nr = 0;
> + put_cpu_var(bh_accounting);
> }
But now there is a window between the time the cpu is marked offline
and the time its bh_accounting.nr is moved to another cpu. So
recalc_bh_state could fail to set buffer_heads_over_limit when it
should.
Maybe it's not worth worrying about? I don't really know this code,
just thought I would point it out.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-11 0:10 ` Nathan Lynch
@ 2006-02-11 0:25 ` Andrew Morton
0 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2006-02-11 0:25 UTC (permalink / raw)
To: Nathan Lynch
Cc: dada1, ak, ashok.raj, riel, linux-kernel, torvalds, mingo,
76306.1226, wli, heiko.carstens, pj
Nathan Lynch <ntl@pobox.com> wrote:
>
> Eric Dumazet wrote:
> >
> > [PATCH] HOTPLUG_CPU : avoid hitting too many cachelines in recalc_bh_state()
> >
> > Instead of using for_each_cpu(i), we can use for_each_online_cpu(i) : The
> > difference matters if HOTPUG_CPU=y
> >
> > When a CPU goes offline (ie removed from online map), it might have a non
> > null bh_accounting.nr, so this patch adds a transfert of this counter to an
> > online CPU counter.
> >
> > We already have a hotcpu_notifier, (function buffer_cpu_notify()), where we
> > can do this bh_accounting.nr transfert.
> >
> > Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
>
> > --- a/fs/buffer.c 2006-02-10 15:08:21.000000000 +0100
> > +++ b/fs/buffer.c 2006-02-10 15:47:55.000000000 +0100
> > @@ -3138,7 +3138,7 @@
> > if (__get_cpu_var(bh_accounting).ratelimit++ < 4096)
> > return;
> > __get_cpu_var(bh_accounting).ratelimit = 0;
> > - for_each_cpu(i)
> > + for_each_online_cpu(i)
> > tot += per_cpu(bh_accounting, i).nr;
> > buffer_heads_over_limit = (tot > max_buffer_heads);
> > }
> > @@ -3187,6 +3187,9 @@
> > brelse(b->bhs[i]);
> > b->bhs[i] = NULL;
> > }
> > + get_cpu_var(bh_accounting).nr += per_cpu(bh_accounting, cpu).nr ;
> > + per_cpu(bh_accounting, cpu).nr = 0;
> > + put_cpu_var(bh_accounting);
> > }
>
> But now there is a window between the time the cpu is marked offline
> and the time its bh_accounting.nr is moved to another cpu. So
> recalc_bh_state could fail to set buffer_heads_over_limit when it
> should.
Yes, there is a wee race there, but the consequences of a glitch or small
inaccuracy in buffer_heads_over_limit will be transient and negligibe.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-10 10:13 ` Andi Kleen
@ 2006-02-11 14:49 ` Heiko Carstens
2006-02-11 18:04 ` Andi Kleen
0 siblings, 1 reply; 58+ messages in thread
From: Heiko Carstens @ 2006-02-11 14:49 UTC (permalink / raw)
To: Andi Kleen
Cc: Nathan Lynch, Andrew Morton, Eric Dumazet, riel, linux-kernel,
torvalds, mingo, 76306.1226, wli, Paul Jackson, jbeulich,
Keir Fraser
> > > x86-64 had the same problem, but we now require that you
> > > boot with additional_cpus=... for how many you want. Default is 0
> > > (used to be half available CPUs but that lead to confusion)
> >
> > So introducing the additional_cpus kernel parameter seems to be the way
> > to go (for XEN probably too). Even though it seems to be a bit odd if the
> > user specifies both maxcpus=... and additional_cpus=...
>
> With additional_cpus you don't need maxcpus. They are added together.
How does x86_64 manage to get 'additional_cpus' parsed early enough? As far
as I can see this is done when parse_args() in start_kernel() gets called,
but that's after you need the parameter in prefill_possible_map().
IMHO that should be an early_param and you would need to call
parse_early_param() from setup_arch(). But then again, maybe I got it all
wrong.
But the more interesting question is: what do you do if the command line
contains both additional_cpus and maxcpus. I was just trying to make some
sense of this, but the result is questionable.
I ended up with a cpu_possible_map that has 'present cpus' +
'additional_cpus' bits set. And in smp_prepare_cpus I make sure that
cpu_present_map has not more than max_cpus bits set.
At least that doesn't break the current semantics of the maxcpus parameter.
But we're still wasting memory, since it would make sense that the
cpu_possible_map shouldn't have more than max_cpus bits set.
Heiko
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
2006-02-11 14:49 ` Heiko Carstens
@ 2006-02-11 18:04 ` Andi Kleen
0 siblings, 0 replies; 58+ messages in thread
From: Andi Kleen @ 2006-02-11 18:04 UTC (permalink / raw)
To: Heiko Carstens
Cc: Nathan Lynch, Andrew Morton, Eric Dumazet, riel, linux-kernel,
torvalds, mingo, 76306.1226, wli, Paul Jackson, jbeulich,
Keir Fraser
On Sat, Feb 11, 2006 at 03:49:29PM +0100, Heiko Carstens wrote:
> > > > x86-64 had the same problem, but we now require that you
> > > > boot with additional_cpus=... for how many you want. Default is 0
> > > > (used to be half available CPUs but that lead to confusion)
> > >
> > > So introducing the additional_cpus kernel parameter seems to be the way
> > > to go (for XEN probably too). Even though it seems to be a bit odd if the
> > > user specifies both maxcpus=... and additional_cpus=...
> >
> > With additional_cpus you don't need maxcpus. They are added together.
>
> How does x86_64 manage to get 'additional_cpus' parsed early enough? As far
> as I can see this is done when parse_args() in start_kernel() gets called,
> but that's after you need the parameter in prefill_possible_map().
> IMHO that should be an early_param and you would need to call
> parse_early_param() from setup_arch(). But then again, maybe I got it all
> wrong.
Yes, you're right - it's added too late to the map right now.
I will fix that. There are no earlyparams unfortunately, except for a
big hack in setup.c
> But the more interesting question is: what do you do if the command line
> contains both additional_cpus and maxcpus. I was just trying to make some
> sense of this, but the result is questionable.
> I ended up with a cpu_possible_map that has 'present cpus' +
> 'additional_cpus' bits set. And in smp_prepare_cpus I make sure that
> cpu_present_map has not more than max_cpus bits set.
>
> At least that doesn't break the current semantics of the maxcpus parameter.
> But we're still wasting memory, since it would make sense that the
> cpu_possible_map shouldn't have more than max_cpus bits set.
Yes, maybe it should be a early parameter too. But frankly I see
maxcpus more as a debugging hack or workarouno. I don't think it matters
much if it's not as efficient as it could be.
-Andi
^ permalink raw reply [flat|nested] 58+ messages in thread
end of thread, other threads:[~2006-02-11 18:05 UTC | newest]
Thread overview: 58+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-02-09 8:32 [PATCH] percpu data: only iterate over possible CPUs Chuck Ebbert
2006-02-09 8:55 ` Paul Jackson
2006-02-09 9:06 ` Andrew Morton
2006-02-09 9:11 ` Andrew Morton
2006-02-09 10:08 ` Heiko Carstens
2006-02-09 10:13 ` Andrew Morton
2006-02-09 10:23 ` Heiko Carstens
2006-02-09 10:31 ` Andrew Morton
2006-02-09 11:47 ` Heiko Carstens
2006-02-09 12:46 ` Eric Dumazet
2006-02-09 13:12 ` Heiko Carstens
2006-02-09 13:55 ` Eric Dumazet
[not found] <200602051959.k15JxoHK001630@hera.kernel.org>
2006-02-07 15:15 ` Heiko Carstens
2006-02-07 15:31 ` Jens Axboe
2006-02-07 16:25 ` Eric Dumazet
2006-02-07 16:42 ` Linus Torvalds
2006-02-07 17:34 ` Andrew Morton
2006-02-07 17:48 ` Linus Torvalds
2006-02-07 18:30 ` Dipankar Sarma
2006-02-07 18:43 ` Linus Torvalds
2006-02-07 18:53 ` Dipankar Sarma
2006-02-07 19:11 ` Linus Torvalds
2006-02-08 4:40 ` Heiko Carstens
2006-02-08 8:55 ` Ivan Kokshaysky
2006-02-08 22:31 ` Rik van Riel
2006-02-09 1:20 ` Rik van Riel
2006-02-09 3:05 ` Andrew Morton
2006-02-09 3:08 ` Andrew Morton
2006-02-09 4:36 ` Eric Dumazet
2006-02-09 4:45 ` Andrew Morton
2006-02-09 4:56 ` Paul Jackson
2006-02-09 16:08 ` Nathan Lynch
2006-02-09 16:13 ` Heiko Carstens
2006-02-09 16:38 ` Rik van Riel
2006-02-09 16:59 ` Nathan Lynch
2006-02-09 17:37 ` Andi Kleen
2006-02-10 10:05 ` Heiko Carstens
2006-02-10 10:13 ` Andi Kleen
2006-02-11 14:49 ` Heiko Carstens
2006-02-11 18:04 ` Andi Kleen
2006-02-09 17:03 ` Ashok Raj
2006-02-09 17:23 ` Nathan Lynch
2006-02-09 18:04 ` Andrew Morton
2006-02-09 18:52 ` Ashok Raj
2006-02-09 20:37 ` Andrew Morton
2006-02-09 21:03 ` Ashok Raj
2006-02-10 10:02 ` Andi Kleen
2006-02-10 10:42 ` Andrew Morton
2006-02-10 11:09 ` Eric Dumazet
2006-02-10 11:23 ` Andrew Morton
2006-02-10 11:26 ` Andrew Morton
2006-02-10 14:13 ` Eric Dumazet
2006-02-11 0:10 ` Nathan Lynch
2006-02-11 0:25 ` Andrew Morton
2006-02-10 12:10 ` Andi Kleen
2006-02-10 19:08 ` Andrew Morton
2006-02-09 4:39 ` Eric Dumazet
2006-02-09 4:46 ` Nick Piggin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).