* Re: [PATCH] percpu data: only iterate over possible CPUs [not found] <200602051959.k15JxoHK001630@hera.kernel.org> @ 2006-02-07 15:15 ` Heiko Carstens 2006-02-07 15:31 ` Jens Axboe 2006-02-07 16:25 ` Eric Dumazet 2006-02-08 22:31 ` Rik van Riel 1 sibling, 2 replies; 58+ messages in thread From: Heiko Carstens @ 2006-02-07 15:15 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Eric Dumazet, David S. Miller, James Bottomley, Ingo Molnar, Jens Axboe, Anton Blanchard, William Irwin, Andi Kleen, Andrew Morton, Linus Torvalds > tree 8c30052a0d7fadec37c785a42a71b28d0a9c5fcf > parent cef5076987dd545ac74f4efcf1c962be8eac34b0 > author Eric Dumazet <dada1@cosmosbay.com> Sun, 05 Feb 2006 15:27:36 -0800 > committer Linus Torvalds <torvalds@g5.osdl.org> Mon, 06 Feb 2006 03:06:51 -0800 > > [PATCH] percpu data: only iterate over possible CPUs > > percpu_data blindly allocates bootmem memory to store NR_CPUS instances of > cpudata, instead of allocating memory only for possible cpus. > > As a preparation for changing that, we need to convert various 0 -> NR_CPUS > loops to use for_each_cpu(). > > (The above only applies to users of asm-generic/percpu.h. powerpc has gone it > alone and is presently only allocating memory for present CPUs, so it's > currently corrupting memory). This patch is broken since it replaces several loops that iterate NR_CPUS times with for_each_cpu before cpu_possible_map is setup: > --- a/fs/file.c > +++ b/fs/file.c > @@ -379,7 +379,6 @@ static void __devinit fdtable_defer_list > void __init files_defer_init(void) > { > int i; > - /* Really early - can't use for_each_cpu */ > - for (i = 0; i < NR_CPUS; i++) > + for_each_cpu(i) > fdtable_defer_list_init(i); > } The old comment indicates it: called before smp_prepare_cpus gets called which sets up cpu_possible_map. > diff --git a/kernel/sched.c b/kernel/sched.c > index f77f23f..839466f 100644 > --- a/kernel/sched.c > +++ b/kernel/sched.c > @@ -6109,7 +6109,7 @@ void __init sched_init(void) > runqueue_t *rq; > int i, j, k; > > - for (i = 0; i < NR_CPUS; i++) { > + for_each_cpu(i) { > prio_array_t *array; Same here. I didn't check the rest, but it looks like we end up with a bit of uninitialized stuff. Heiko ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-07 15:15 ` [PATCH] percpu data: only iterate over possible CPUs Heiko Carstens @ 2006-02-07 15:31 ` Jens Axboe 2006-02-07 16:25 ` Eric Dumazet 1 sibling, 0 replies; 58+ messages in thread From: Jens Axboe @ 2006-02-07 15:31 UTC (permalink / raw) To: Heiko Carstens Cc: Linux Kernel Mailing List, Eric Dumazet, David S. Miller, James Bottomley, Ingo Molnar, Anton Blanchard, William Irwin, Andi Kleen, Andrew Morton, Linus Torvalds On Tue, Feb 07 2006, Heiko Carstens wrote: > > tree 8c30052a0d7fadec37c785a42a71b28d0a9c5fcf > > parent cef5076987dd545ac74f4efcf1c962be8eac34b0 > > author Eric Dumazet <dada1@cosmosbay.com> Sun, 05 Feb 2006 15:27:36 -0800 > > committer Linus Torvalds <torvalds@g5.osdl.org> Mon, 06 Feb 2006 03:06:51 -0800 > > > > [PATCH] percpu data: only iterate over possible CPUs > > > > percpu_data blindly allocates bootmem memory to store NR_CPUS instances of > > cpudata, instead of allocating memory only for possible cpus. > > > > As a preparation for changing that, we need to convert various 0 -> NR_CPUS > > loops to use for_each_cpu(). > > > > (The above only applies to users of asm-generic/percpu.h. powerpc has gone it > > alone and is presently only allocating memory for present CPUs, so it's > > currently corrupting memory). > > This patch is broken since it replaces several loops that iterate NR_CPUS > times with for_each_cpu before cpu_possible_map is setup: Hrmpf, chicking and egg - we must not initialize data for unknown CPUs, but we can't check since it's not setup. To me it sounds really broken that core structures like that are not setup before we init eg fs stuff. -- Jens Axboe ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-07 15:15 ` [PATCH] percpu data: only iterate over possible CPUs Heiko Carstens 2006-02-07 15:31 ` Jens Axboe @ 2006-02-07 16:25 ` Eric Dumazet 2006-02-07 16:42 ` Linus Torvalds 1 sibling, 1 reply; 58+ messages in thread From: Eric Dumazet @ 2006-02-07 16:25 UTC (permalink / raw) To: Heiko Carstens Cc: Linux Kernel Mailing List, David S. Miller, James Bottomley, Ingo Molnar, Jens Axboe, Anton Blanchard, William Irwin, Andi Kleen, Andrew Morton, Linus Torvalds Heiko Carstens a écrit : >> tree 8c30052a0d7fadec37c785a42a71b28d0a9c5fcf >> parent cef5076987dd545ac74f4efcf1c962be8eac34b0 >> author Eric Dumazet <dada1@cosmosbay.com> Sun, 05 Feb 2006 15:27:36 -0800 >> committer Linus Torvalds <torvalds@g5.osdl.org> Mon, 06 Feb 2006 03:06:51 -0800 >> >> [PATCH] percpu data: only iterate over possible CPUs >> >> percpu_data blindly allocates bootmem memory to store NR_CPUS instances of >> cpudata, instead of allocating memory only for possible cpus. >> >> As a preparation for changing that, we need to convert various 0 -> NR_CPUS >> loops to use for_each_cpu(). >> >> (The above only applies to users of asm-generic/percpu.h. powerpc has gone it >> alone and is presently only allocating memory for present CPUs, so it's >> currently corrupting memory). > > This patch is broken since it replaces several loops that iterate NR_CPUS > times with for_each_cpu before cpu_possible_map is setup: This patch assumes that cpu_possible_map is setup before setup_per_cpu_areas(). That sounds a reasonable assumption, but maybe not on your architecture ? I dont think cpu_possible_map has to be filled at smp_prepare_cpus() time, but long before. On i386/x86_64/ia64, this is done from setup_arch() called from start_kernel() just before setup_per_cpu_areas() On powerpc it's done from setup_system(), called before start_kernel() Eric ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-07 16:25 ` Eric Dumazet @ 2006-02-07 16:42 ` Linus Torvalds 2006-02-07 17:34 ` Andrew Morton 0 siblings, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2006-02-07 16:42 UTC (permalink / raw) To: Eric Dumazet Cc: Heiko Carstens, Linux Kernel Mailing List, David S. Miller, James Bottomley, Ingo Molnar, Jens Axboe, Anton Blanchard, William Irwin, Andi Kleen, Andrew Morton On Tue, 7 Feb 2006, Eric Dumazet wrote: > > This patch assumes that cpu_possible_map is setup before > setup_per_cpu_areas(). > > That sounds a reasonable assumption, but maybe not on your architecture ? I have to say, it sounds not just like a reasonable assumption, but like the only sane assumption that there _could_ be. > I dont think cpu_possible_map has to be filled at smp_prepare_cpus() time, but > long before. > > On i386/x86_64/ia64, this is done from setup_arch() called from start_kernel() > just before setup_per_cpu_areas() > > On powerpc it's done from setup_system(), called before start_kernel() It absolutely _has_ to be done from setup_arch() or earlier, as shown by the fact that "setup_per_cpu_areas()" is the very next thing that init/main.c calls (and clearly, that needs to know what CPU's are possible). ppc64 certainly calls it early enough, as does x86/x86-64/ia64. I don't see anybody else doing it too late either. Heiko, can you point to the "old comment" you mentioned in the email, or the architecture that does this wrong? Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-07 16:42 ` Linus Torvalds @ 2006-02-07 17:34 ` Andrew Morton 2006-02-07 17:48 ` Linus Torvalds 0 siblings, 1 reply; 58+ messages in thread From: Andrew Morton @ 2006-02-07 17:34 UTC (permalink / raw) To: Linus Torvalds Cc: dada1, heiko.carstens, linux-kernel, davem, James.Bottomley, mingo, axboe, anton, wli, ak Linus Torvalds <torvalds@osdl.org> wrote: > > > > On Tue, 7 Feb 2006, Eric Dumazet wrote: > > > > This patch assumes that cpu_possible_map is setup before > > setup_per_cpu_areas(). > > > > That sounds a reasonable assumption, but maybe not on your architecture ? > > I have to say, it sounds not just like a reasonable assumption, but like > the only sane assumption that there _could_ be. > > > I dont think cpu_possible_map has to be filled at smp_prepare_cpus() time, but > > long before. > > > > On i386/x86_64/ia64, this is done from setup_arch() called from start_kernel() > > just before setup_per_cpu_areas() > > > > On powerpc it's done from setup_system(), called before start_kernel() > > It absolutely _has_ to be done from setup_arch() or earlier, as shown by > the fact that "setup_per_cpu_areas()" is the very next thing that > init/main.c calls (and clearly, that needs to know what CPU's are > possible). > > ppc64 certainly calls it early enough, as does x86/x86-64/ia64. I don't > see anybody else doing it too late either. > > Heiko, can you point to the "old comment" you mentioned in the email, or > the architecture that does this wrong? > This one: --- devel/fs/file.c~reduce-size-of-percpudata-and-make-sure-per_cpuobject 2006-02-04 23:27:17.000000000 -0800 +++ devel-akpm/fs/file.c 2006-02-04 23:27:17.000000000 -0800 @@ -379,7 +379,6 @@ static void __devinit fdtable_defer_list void __init files_defer_init(void) { int i; - /* Really early - can't use for_each_cpu */ - for (i = 0; i < NR_CPUS; i++) + for_each_cpu(i) fdtable_defer_list_init(i); } And yes, me too - when I saw that comment disappear I checked and decided that the comment was both wrong and undesirable. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-07 17:34 ` Andrew Morton @ 2006-02-07 17:48 ` Linus Torvalds 2006-02-07 18:30 ` Dipankar Sarma 0 siblings, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2006-02-07 17:48 UTC (permalink / raw) To: Andrew Morton Cc: dada1, heiko.carstens, linux-kernel, davem, James.Bottomley, mingo, axboe, anton, wli, ak On Tue, 7 Feb 2006, Andrew Morton wrote: > > This one: > > --- devel/fs/file.c~reduce-size-of-percpudata-and-make-sure-per_cpuobject 2006-02-04 23:27:17.000000000 -0800 > +++ devel-akpm/fs/file.c 2006-02-04 23:27:17.000000000 -0800 > @@ -379,7 +379,6 @@ static void __devinit fdtable_defer_list > void __init files_defer_init(void) > { > int i; > - /* Really early - can't use for_each_cpu */ > - for (i = 0; i < NR_CPUS; i++) > + for_each_cpu(i) > fdtable_defer_list_init(i); > } > > And yes, me too - when I saw that comment disappear I checked and decided > that the comment was both wrong and undesirable. Ahh, yes. The comment is totally incorrect, we must have done the CPU setup much too later a long long time ago ;) Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-07 17:48 ` Linus Torvalds @ 2006-02-07 18:30 ` Dipankar Sarma 2006-02-07 18:43 ` Linus Torvalds 0 siblings, 1 reply; 58+ messages in thread From: Dipankar Sarma @ 2006-02-07 18:30 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, dada1, heiko.carstens, linux-kernel, davem, James.Bottomley, mingo, axboe, anton, wli, ak On Tue, Feb 07, 2006 at 09:48:41AM -0800, Linus Torvalds wrote: > > > On Tue, 7 Feb 2006, Andrew Morton wrote: > > > > This one: > > > > --- devel/fs/file.c~reduce-size-of-percpudata-and-make-sure-per_cpuobject 2006-02-04 23:27:17.000000000 -0800 > > +++ devel-akpm/fs/file.c 2006-02-04 23:27:17.000000000 -0800 > > @@ -379,7 +379,6 @@ static void __devinit fdtable_defer_list > > void __init files_defer_init(void) > > { > > int i; > > - /* Really early - can't use for_each_cpu */ > > - for (i = 0; i < NR_CPUS; i++) > > + for_each_cpu(i) > > fdtable_defer_list_init(i); > > } > > > > And yes, me too - when I saw that comment disappear I checked and decided > > that the comment was both wrong and undesirable. > > Ahh, yes. The comment is totally incorrect, we must have done the CPU > setup much too later a long long time ago ;) One would think so, but I recall not all archs did that. Alpha for example sets up cpu_possible_map in smp_prepare_cpus(). It however makes more sense to fix the arch then use NR_CPUS, IMO. Thanks Dipankar ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-07 18:30 ` Dipankar Sarma @ 2006-02-07 18:43 ` Linus Torvalds 2006-02-07 18:53 ` Dipankar Sarma 0 siblings, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2006-02-07 18:43 UTC (permalink / raw) To: Dipankar Sarma Cc: Andrew Morton, dada1, heiko.carstens, linux-kernel, davem, James.Bottomley, mingo, axboe, anton, wli, ak On Wed, 8 Feb 2006, Dipankar Sarma wrote: > > One would think so, but I recall not all archs did that. Alpha for > example sets up cpu_possible_map in smp_prepare_cpus(). It however > makes more sense to fix the arch then use NR_CPUS, IMO. Ehh? alpha does it in setup_smp(), which in turn is called very early from setup_arch(). Were you perhaps thinking of something else? Or am I just going blind and confused? Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-07 18:43 ` Linus Torvalds @ 2006-02-07 18:53 ` Dipankar Sarma 2006-02-07 19:11 ` Linus Torvalds 0 siblings, 1 reply; 58+ messages in thread From: Dipankar Sarma @ 2006-02-07 18:53 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, dada1, heiko.carstens, linux-kernel, davem, James.Bottomley, mingo, axboe, anton, wli, ak On Tue, Feb 07, 2006 at 10:43:55AM -0800, Linus Torvalds wrote: > > > On Wed, 8 Feb 2006, Dipankar Sarma wrote: > > > > One would think so, but I recall not all archs did that. Alpha for > > example sets up cpu_possible_map in smp_prepare_cpus(). It however > > makes more sense to fix the arch then use NR_CPUS, IMO. > > Ehh? alpha does it in setup_smp(), which in turn is called very early from > setup_arch(). > > Were you perhaps thinking of something else? Or am I just going blind and > confused? I am looking at 2.6.16-rc1 and I don't see cpu_possible_map being set in setup_smp(). That said, it seems alpha setup_smp() probes for cpus there, so there is no reason why it cannot be set there. I think it is wrong not to set cpu_possible_map very early. Or perhaps it got fixed later on, in which case, oh well, I need to download more often. <sigh>. Thanks Dipankar ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-07 18:53 ` Dipankar Sarma @ 2006-02-07 19:11 ` Linus Torvalds 2006-02-08 4:40 ` Heiko Carstens 2006-02-08 8:55 ` Ivan Kokshaysky 0 siblings, 2 replies; 58+ messages in thread From: Linus Torvalds @ 2006-02-07 19:11 UTC (permalink / raw) To: Dipankar Sarma, Richard Henderson, Ivan Kokshaysky Cc: Andrew Morton, dada1, heiko.carstens, Linux Kernel Mailing List, David S. Miller, James.Bottomley, Ingo Molnar, axboe, anton, wli, ak On Wed, 8 Feb 2006, Dipankar Sarma wrote: > > I am looking at 2.6.16-rc1 and I don't see cpu_possible_map > being set in setup_smp() You're right, my bad. I looked at setup_smp() and how it walked through every CPU in the firmware, but it doesn't actually ever set the possible map, it fills in just hwrpb_cpu_present_mask (which is then then only used _later_ to set cpu_possible_map for some silly reason). As far as I can tell, "hwrpb_cpu_present_mask" is just wrong, and the code _should_ be using cpu_possible_map. rth? Ivan? Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-07 19:11 ` Linus Torvalds @ 2006-02-08 4:40 ` Heiko Carstens 2006-02-08 8:55 ` Ivan Kokshaysky 1 sibling, 0 replies; 58+ messages in thread From: Heiko Carstens @ 2006-02-08 4:40 UTC (permalink / raw) To: Linus Torvalds Cc: Dipankar Sarma, Richard Henderson, Ivan Kokshaysky, Andrew Morton, dada1, Linux Kernel Mailing List, David S. Miller, James.Bottomley, Ingo Molnar, axboe, anton, wli, ak > > I am looking at 2.6.16-rc1 and I don't see cpu_possible_map > > being set in setup_smp() > > You're right, my bad. I looked at setup_smp() and how it walked through > every CPU in the firmware, but it doesn't actually ever set the possible > map, it fills in just hwrpb_cpu_present_mask (which is then then only used > _later_ to set cpu_possible_map for some silly reason). > > As far as I can tell, "hwrpb_cpu_present_mask" is just wrong, and the code > _should_ be using cpu_possible_map. We still have this one in init/main.c: /* Sets up cpus_possible() */ smp_prepare_cpus(max_cpus); That is actually why s390 is doing this in smp_prepare_cpus. We also use the passed value of max_cpus to set the number of bits in cpu_possible_map accordingly. This isn't possible anymore if this should be done in setup_arch. So it looks like we have to switch to setup_arch and set NR_CPUS bits in cpu_possible_map on s390. Heiko ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-07 19:11 ` Linus Torvalds 2006-02-08 4:40 ` Heiko Carstens @ 2006-02-08 8:55 ` Ivan Kokshaysky 1 sibling, 0 replies; 58+ messages in thread From: Ivan Kokshaysky @ 2006-02-08 8:55 UTC (permalink / raw) To: Linus Torvalds Cc: Dipankar Sarma, Richard Henderson, Andrew Morton, dada1, heiko.carstens, Linux Kernel Mailing List, David S. Miller, James.Bottomley, Ingo Molnar, axboe, anton, wli, ak On Tue, Feb 07, 2006 at 11:11:56AM -0800, Linus Torvalds wrote: > You're right, my bad. I looked at setup_smp() and how it walked through > every CPU in the firmware, but it doesn't actually ever set the possible > map, it fills in just hwrpb_cpu_present_mask (which is then then only used > _later_ to set cpu_possible_map for some silly reason). > > As far as I can tell, "hwrpb_cpu_present_mask" is just wrong, and the code > _should_ be using cpu_possible_map. Yep, it seems that we can get rid of hwrpb_cpu_present_mask. The appended patch is only minimally tested (works on UP with SMP kernel). Ivan. --- linux/arch/alpha/kernel/smp.c.orig Wed Jan 18 02:09:27 2006 +++ linux/arch/alpha/kernel/smp.c Wed Feb 8 02:38:46 2006 @@ -73,9 +73,6 @@ cpumask_t cpu_online_map; EXPORT_SYMBOL(cpu_online_map); -/* cpus reported in the hwrpb */ -static unsigned long hwrpb_cpu_present_mask __initdata = 0; - int smp_num_probed; /* Internal processor count */ int smp_num_cpus = 1; /* Number that came online. */ @@ -442,7 +439,7 @@ setup_smp(void) if ((cpu->flags & 0x1cc) == 0x1cc) { smp_num_probed++; /* Assume here that "whami" == index */ - hwrpb_cpu_present_mask |= (1UL << i); + cpu_set(i, cpu_possible_map); cpu->pal_revision = boot_cpu_palrev; } @@ -453,12 +450,12 @@ setup_smp(void) } } else { smp_num_probed = 1; - hwrpb_cpu_present_mask = (1UL << boot_cpuid); + cpu_set(boot_cpuid, cpu_possible_map); } cpu_present_mask = cpumask_of_cpu(boot_cpuid); printk(KERN_INFO "SMP: %d CPUs probed -- cpu_present_mask = %lx\n", - smp_num_probed, hwrpb_cpu_present_mask); + smp_num_probed, cpu_possible_map.bits[0]); } /* @@ -467,8 +464,6 @@ setup_smp(void) void __init smp_prepare_cpus(unsigned int max_cpus) { - int cpu_count, i; - /* Take care of some initial bookkeeping. */ memset(ipi_data, 0, sizeof(ipi_data)); @@ -486,19 +481,7 @@ smp_prepare_cpus(unsigned int max_cpus) printk(KERN_INFO "SMP starting up secondaries.\n"); - cpu_count = 1; - for (i = 0; (i < NR_CPUS) && (cpu_count < max_cpus); i++) { - if (i == boot_cpuid) - continue; - - if (((hwrpb_cpu_present_mask >> i) & 1) == 0) - continue; - - cpu_set(i, cpu_possible_map); - cpu_count++; - } - - smp_num_cpus = cpu_count; + smp_num_cpus = smp_num_probed; } void __devinit ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs [not found] <200602051959.k15JxoHK001630@hera.kernel.org> 2006-02-07 15:15 ` [PATCH] percpu data: only iterate over possible CPUs Heiko Carstens @ 2006-02-08 22:31 ` Rik van Riel 2006-02-09 1:20 ` Rik van Riel ` (2 more replies) 1 sibling, 3 replies; 58+ messages in thread From: Rik van Riel @ 2006-02-08 22:31 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Eric Dumazet, Linus Torvalds On Sun, 5 Feb 2006, Linux Kernel Mailing List wrote: > [PATCH] percpu data: only iterate over possible CPUs This sched.c bit breaks Xen, and probably also other architectures that have CPU hotplug. I suspect the reason is that during early bootup only the boot CPU is online, so nothing initialises the runqueues for CPUs that are brought up afterwards. I suspect we can get rid of this problem quite easily by moving runqueue initialisation to init_idle()... > diff --git a/kernel/sched.c b/kernel/sched.c > index f77f23f..839466f 100644 > --- a/kernel/sched.c > +++ b/kernel/sched.c > @@ -6109,7 +6109,7 @@ void __init sched_init(void) > runqueue_t *rq; > int i, j, k; > > - for (i = 0; i < NR_CPUS; i++) { > + for_each_cpu(i) { > prio_array_t *array; > > rq = cpu_rq(i); -- All Rights Reversed ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-08 22:31 ` Rik van Riel @ 2006-02-09 1:20 ` Rik van Riel 2006-02-09 3:05 ` Andrew Morton 2006-02-09 4:39 ` Eric Dumazet 2 siblings, 0 replies; 58+ messages in thread From: Rik van Riel @ 2006-02-09 1:20 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Eric Dumazet, Linus Torvalds, xen-devel On Wed, 8 Feb 2006, Rik van Riel wrote: > On Sun, 5 Feb 2006, Linux Kernel Mailing List wrote: > > > [PATCH] percpu data: only iterate over possible CPUs > > The sched.c bit breaks Xen, and probably also other architectures > that have CPU hotplug. I suspect the reason is that during early > bootup only the boot CPU is online, so nothing initialises the > runqueues for CPUs that are brought up afterwards. > > I suspect we can get rid of this problem quite easily by moving > runqueue initialisation to init_idle()... Well, it works. This (fairly trivial) patch makes hotplug cpu work again, by ensuring that the runqueues of a newly brought up CPU are initialized just before they are needed. Without this patch the "spin_lock_irqsave(&rq->lock, flags);" in init_idle() would oops if CONFIG_DEBUG_SPINLOCK was set. With this patch, things just work. Signed-off-by: Rik van Riel <riel@redhat.com> --- linux-2.6.15.i686/kernel/sched.c.idle_init 2006-02-08 17:56:50.000000000 -0500 +++ linux-2.6.15.i686/kernel/sched.c 2006-02-08 17:58:57.000000000 -0500 @@ -4437,6 +4437,35 @@ void __devinit init_idle(task_t *idle, i { runqueue_t *rq = cpu_rq(cpu); unsigned long flags; + prio_array_t *array; + int j, k; + + spin_lock_init(&rq->lock); + rq->nr_running = 0; + rq->active = rq->arrays; + rq->expired = rq->arrays + 1; + rq->best_expired_prio = MAX_PRIO; + +#ifdef CONFIG_SMP + rq->sd = NULL; + for (j = 1; j < 3; j++) + rq->cpu_load[j] = 0; + rq->active_balance = 0; + rq->push_cpu = 0; + rq->migration_thread = NULL; + INIT_LIST_HEAD(&rq->migration_queue); +#endif + atomic_set(&rq->nr_iowait, 0); + + for (j = 0; j < 2; j++) { + array = rq->arrays + j; + for (k = 0; k < MAX_PRIO; k++) { + INIT_LIST_HEAD(array->queue + k); + __clear_bit(k, array->bitmap); + } + // delimiter for bitsearch + __set_bit(MAX_PRIO, array->bitmap); + } idle->sleep_avg = 0; idle->array = NULL; @@ -6110,41 +6139,6 @@ int in_sched_functions(unsigned long add void __init sched_init(void) { - runqueue_t *rq; - int i, j, k; - - for_each_cpu(i) { - prio_array_t *array; - - rq = cpu_rq(i); - spin_lock_init(&rq->lock); - rq->nr_running = 0; - rq->active = rq->arrays; - rq->expired = rq->arrays + 1; - rq->best_expired_prio = MAX_PRIO; - -#ifdef CONFIG_SMP - rq->sd = NULL; - for (j = 1; j < 3; j++) - rq->cpu_load[j] = 0; - rq->active_balance = 0; - rq->push_cpu = 0; - rq->migration_thread = NULL; - INIT_LIST_HEAD(&rq->migration_queue); -#endif - atomic_set(&rq->nr_iowait, 0); - - for (j = 0; j < 2; j++) { - array = rq->arrays + j; - for (k = 0; k < MAX_PRIO; k++) { - INIT_LIST_HEAD(array->queue + k); - __clear_bit(k, array->bitmap); - } - // delimiter for bitsearch - __set_bit(MAX_PRIO, array->bitmap); - } - } - /* * The boot idle thread does lazy MMU switching as well: */ ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-08 22:31 ` Rik van Riel 2006-02-09 1:20 ` Rik van Riel @ 2006-02-09 3:05 ` Andrew Morton 2006-02-09 3:08 ` Andrew Morton 2006-02-09 4:39 ` Eric Dumazet 2 siblings, 1 reply; 58+ messages in thread From: Andrew Morton @ 2006-02-09 3:05 UTC (permalink / raw) To: Rik van Riel Cc: linux-kernel, dada1, Linus Torvalds, Ingo Molnar, Andi Kleen, Chuck Ebbert, Eric Dumazet, William Lee Irwin III, Heiko Carstens Rik van Riel <riel@redhat.com> wrote: > > On Sun, 5 Feb 2006, Linux Kernel Mailing List wrote: > > > [PATCH] percpu data: only iterate over possible CPUs > > This sched.c bit breaks Xen, and probably also other architectures > that have CPU hotplug. I suspect the reason is that during early > bootup only the boot CPU is online, so nothing initialises the > runqueues for CPUs that are brought up afterwards. > > I suspect we can get rid of this problem quite easily by moving > runqueue initialisation to init_idle()... We've hit this snag with a few architectures. They're setting up cpu_possible_map too late. It's never been clearly defined. sched_init() is called here: asmlinkage void __init start_kernel(void) { ... printk(linux_banner); setup_arch(&command_line); setup_per_cpu_areas(); smp_prepare_boot_cpu(); sched_init(); Users of __GENERIC_PER_CPU definitely need cpu_possible_map to be initialised by the time setup_per_cpu_areas() is called, so I think it makes sense to say "thou shalt initialise cpu_possible_map in setup_arch()". I guess Xen isn't doing that. Can it be made to? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 3:05 ` Andrew Morton @ 2006-02-09 3:08 ` Andrew Morton 2006-02-09 4:36 ` Eric Dumazet 0 siblings, 1 reply; 58+ messages in thread From: Andrew Morton @ 2006-02-09 3:08 UTC (permalink / raw) To: Andrew Morton Cc: riel, linux-kernel, dada1, torvalds, mingo, ak, 76306.1226, wli, heiko.carstens Andrew Morton <akpm@osdl.org> wrote: > > Users of __GENERIC_PER_CPU definitely need cpu_possible_map to be initialised > by the time setup_per_cpu_areas() is called, err, they'll need it once Eric's dont-waste-percpu-memory-on-not-possible-CPUs patch is merged.. > so I think it makes sense to > say "thou shalt initialise cpu_possible_map in setup_arch()". > > I guess Xen isn't doing that. Can it be made to? Lame fix: cpu_possible_map = (1<<NR_CPUS)-1 in setup_arch(). ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 3:08 ` Andrew Morton @ 2006-02-09 4:36 ` Eric Dumazet 2006-02-09 4:45 ` Andrew Morton 0 siblings, 1 reply; 58+ messages in thread From: Eric Dumazet @ 2006-02-09 4:36 UTC (permalink / raw) To: Andrew Morton Cc: riel, linux-kernel, torvalds, mingo, ak, 76306.1226, wli, heiko.carstens Andrew Morton a écrit : > Andrew Morton <akpm@osdl.org> wrote: >> Users of __GENERIC_PER_CPU definitely need cpu_possible_map to be initialised >> by the time setup_per_cpu_areas() is called, > > err, they'll need it once Eric's > dont-waste-percpu-memory-on-not-possible-CPUs patch is merged.. > >> so I think it makes sense to >> say "thou shalt initialise cpu_possible_map in setup_arch()". >> >> I guess Xen isn't doing that. Can it be made to? > > Lame fix: cpu_possible_map = (1<<NR_CPUS)-1 in setup_arch(). I dont understand why this HOTPLUG stuff is problematic for Xen (or other arch) : If CONFIG_HOTPLUG_CPU is configured, then the map should be preset to CPU_MASK_ALL. Its even documented in line 332 of include/linux/cpumask.h * #ifdef CONFIG_HOTPLUG_CPU * cpu_possible_map - all NR_CPUS bits set arch/i386/kernel/smpboot.c is doing the only sane stuff about it : #ifdef CONFIG_HOTPLUG_CPU cpumask_t cpu_possible_map = CPU_MASK_ALL; #else cpumask_t cpu_possible_map; #endif Some remarks : 1) These cpu_possible_map could have __read_mostly attribute. 2) cpu_possible(cpu) macro could be defined to 1 if CONFIG_HOTPLUG_CPU, or a test against NR_CPUS #ifdef CONFIG_HOTPLUG_CPU #define cpu_possible(cpu) cpu_isset((cpu), cpu_possible_map) #else #define cpu_possible(cpu) ((unsigned)(cpu) < NR_CPUS) #endif Eric ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 4:36 ` Eric Dumazet @ 2006-02-09 4:45 ` Andrew Morton 2006-02-09 4:56 ` Paul Jackson 2006-02-09 16:08 ` Nathan Lynch 0 siblings, 2 replies; 58+ messages in thread From: Andrew Morton @ 2006-02-09 4:45 UTC (permalink / raw) To: Eric Dumazet Cc: riel, linux-kernel, torvalds, mingo, ak, 76306.1226, wli, heiko.carstens, Paul Jackson Eric Dumazet <dada1@cosmosbay.com> wrote: > > Andrew Morton a écrit : > > Andrew Morton <akpm@osdl.org> wrote: > >> Users of __GENERIC_PER_CPU definitely need cpu_possible_map to be initialised > >> by the time setup_per_cpu_areas() is called, > > > > err, they'll need it once Eric's > > dont-waste-percpu-memory-on-not-possible-CPUs patch is merged.. > > > >> so I think it makes sense to > >> say "thou shalt initialise cpu_possible_map in setup_arch()". > >> > >> I guess Xen isn't doing that. Can it be made to? > > > > Lame fix: cpu_possible_map = (1<<NR_CPUS)-1 in setup_arch(). > > I dont understand why this HOTPLUG stuff is problematic for Xen (or other > arch) : If CONFIG_HOTPLUG_CPU is configured, then the map should be preset to > CPU_MASK_ALL. Presumably not all architectures are doing that. And some of the problems we've had are with CONFIG_HOTPLUG_CPU=n. > Its even documented in line 332 of include/linux/cpumask.h > > * #ifdef CONFIG_HOTPLUG_CPU > * cpu_possible_map - all NR_CPUS bits set That seems a quite bad idea. If we know which CPUs are possible we should populate cpu_possible_map correctly, whether or not CONFIG_HOTPLUG_CPU is set. Setting all the bits like that wastes memory and wastes CPU cycles. <greps> That comment came from the tender pinkies of pj@sgi.com, although I suspect it was just a transliteration of then-current practice. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 4:45 ` Andrew Morton @ 2006-02-09 4:56 ` Paul Jackson 2006-02-09 16:08 ` Nathan Lynch 1 sibling, 0 replies; 58+ messages in thread From: Paul Jackson @ 2006-02-09 4:56 UTC (permalink / raw) To: Andrew Morton Cc: dada1, riel, linux-kernel, torvalds, mingo, ak, 76306.1226, wli, heiko.carstens Andrew wrote: > That comment came from the tender pinkies of pj@sgi.com, although I suspect > it was just a transliteration of then-current practice. You give my pinkie more credit than is its due. That comment looks bogus. Hmmm ... what should it be ... * #ifdef CONFIG_HOTPLUG_CPU * cpu_possible_map - has bit 'cpu' set iff cpu could be populated -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 4:45 ` Andrew Morton 2006-02-09 4:56 ` Paul Jackson @ 2006-02-09 16:08 ` Nathan Lynch 2006-02-09 16:13 ` Heiko Carstens 2006-02-09 17:03 ` Ashok Raj 1 sibling, 2 replies; 58+ messages in thread From: Nathan Lynch @ 2006-02-09 16:08 UTC (permalink / raw) To: Andrew Morton Cc: Eric Dumazet, riel, linux-kernel, torvalds, mingo, ak, 76306.1226, wli, heiko.carstens, Paul Jackson Andrew Morton wrote: > Eric Dumazet <dada1@cosmosbay.com> wrote: > > > > Andrew Morton a écrit : > > > Andrew Morton <akpm@osdl.org> wrote: > > >> Users of __GENERIC_PER_CPU definitely need cpu_possible_map to be initialised > > >> by the time setup_per_cpu_areas() is called, > > > > > > err, they'll need it once Eric's > > > dont-waste-percpu-memory-on-not-possible-CPUs patch is merged.. > > > > > >> so I think it makes sense to > > >> say "thou shalt initialise cpu_possible_map in setup_arch()". > > >> > > >> I guess Xen isn't doing that. Can it be made to? > > > > > > Lame fix: cpu_possible_map = (1<<NR_CPUS)-1 in setup_arch(). > > > > I dont understand why this HOTPLUG stuff is problematic for Xen (or other > > arch) : If CONFIG_HOTPLUG_CPU is configured, then the map should be preset to > > CPU_MASK_ALL. > > Presumably not all architectures are doing that. powerpc/ppc64, for instance, determines the number of possible cpus from information exported by firmware (and I'm mystified as to why other platforms don't do this). So it's typical to have a kernel an a pSeries partition with NR_CPUS=128, but cpu_possible_map = 0xff. > > Its even documented in line 332 of include/linux/cpumask.h > > > > * #ifdef CONFIG_HOTPLUG_CPU > > * cpu_possible_map - all NR_CPUS bits set > > That seems a quite bad idea. If we know which CPUs are possible we should > populate cpu_possible_map correctly, whether or not CONFIG_HOTPLUG_CPU is > set. Setting all the bits like that wastes memory and wastes CPU cycles. Yes, that comment is wrong or outdated. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 16:08 ` Nathan Lynch @ 2006-02-09 16:13 ` Heiko Carstens 2006-02-09 16:38 ` Rik van Riel ` (2 more replies) 2006-02-09 17:03 ` Ashok Raj 1 sibling, 3 replies; 58+ messages in thread From: Heiko Carstens @ 2006-02-09 16:13 UTC (permalink / raw) To: Nathan Lynch Cc: Andrew Morton, Eric Dumazet, riel, linux-kernel, torvalds, mingo, ak, 76306.1226, wli, Paul Jackson > > Presumably not all architectures are doing that. > powerpc/ppc64, for instance, determines the number of possible cpus > from information exported by firmware (and I'm mystified as to why > other platforms don't do this). So it's typical to have a kernel an a > pSeries partition with NR_CPUS=128, but cpu_possible_map = 0xff. Simply because there is no such interface on s390. The only thing we know for sure is that if we are running under z/VM the user is free to configure up to 63 additional virtual cpus on the fly... Heiko ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 16:13 ` Heiko Carstens @ 2006-02-09 16:38 ` Rik van Riel 2006-02-09 16:59 ` Nathan Lynch 2006-02-09 17:37 ` Andi Kleen 2 siblings, 0 replies; 58+ messages in thread From: Rik van Riel @ 2006-02-09 16:38 UTC (permalink / raw) To: Heiko Carstens Cc: Nathan Lynch, Andrew Morton, Eric Dumazet, linux-kernel, torvalds, mingo, ak, 76306.1226, wli, Paul Jackson On Thu, 9 Feb 2006, Heiko Carstens wrote: > Simply because there is no such interface on s390. The only thing we know > for sure is that if we are running under z/VM the user is free to > configure up to 63 additional virtual cpus on the fly... Xen is in a similar situation. -- All Rights Reversed ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 16:13 ` Heiko Carstens 2006-02-09 16:38 ` Rik van Riel @ 2006-02-09 16:59 ` Nathan Lynch 2006-02-09 17:37 ` Andi Kleen 2 siblings, 0 replies; 58+ messages in thread From: Nathan Lynch @ 2006-02-09 16:59 UTC (permalink / raw) To: Heiko Carstens Cc: Andrew Morton, Eric Dumazet, riel, linux-kernel, torvalds, mingo, ak, 76306.1226, wli, Paul Jackson Heiko Carstens wrote: > > > Presumably not all architectures are doing that. > > powerpc/ppc64, for instance, determines the number of possible cpus > > from information exported by firmware (and I'm mystified as to why > > other platforms don't do this). So it's typical to have a kernel an a > > pSeries partition with NR_CPUS=128, but cpu_possible_map = 0xff. > > Simply because there is no such interface on s390. The only thing we know > for sure is that if we are running under z/VM the user is free to > configure up to 63 additional virtual cpus on the fly... My "mystified" parenthetical was not meant as a criticism of arch/!powerpc, but of the platform implementations themselves, for not making such an interface available. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 16:13 ` Heiko Carstens 2006-02-09 16:38 ` Rik van Riel 2006-02-09 16:59 ` Nathan Lynch @ 2006-02-09 17:37 ` Andi Kleen 2006-02-10 10:05 ` Heiko Carstens 2 siblings, 1 reply; 58+ messages in thread From: Andi Kleen @ 2006-02-09 17:37 UTC (permalink / raw) To: Heiko Carstens Cc: Nathan Lynch, Andrew Morton, Eric Dumazet, riel, linux-kernel, torvalds, mingo, 76306.1226, wli, Paul Jackson On Thu, Feb 09, 2006 at 05:13:31PM +0100, Heiko Carstens wrote: > > > Presumably not all architectures are doing that. > > powerpc/ppc64, for instance, determines the number of possible cpus > > from information exported by firmware (and I'm mystified as to why > > other platforms don't do this). So it's typical to have a kernel an a > > pSeries partition with NR_CPUS=128, but cpu_possible_map = 0xff. > > Simply because there is no such interface on s390. The only thing we know > for sure is that if we are running under z/VM the user is free to > configure up to 63 additional virtual cpus on the fly... x86-64 had the same problem, but we now require that you boot with additional_cpus=... for how many you want. Default is 0 (used to be half available CPUs but that lead to confusion) Also the firmware has a way now to give this information automatically. You can of course make it default to 64, but it will cost you several MB of memory in usually unused copies of per cpu data. -Andi ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 17:37 ` Andi Kleen @ 2006-02-10 10:05 ` Heiko Carstens 2006-02-10 10:13 ` Andi Kleen 0 siblings, 1 reply; 58+ messages in thread From: Heiko Carstens @ 2006-02-10 10:05 UTC (permalink / raw) To: Andi Kleen Cc: Nathan Lynch, Andrew Morton, Eric Dumazet, riel, linux-kernel, torvalds, mingo, 76306.1226, wli, Paul Jackson > > > powerpc/ppc64, for instance, determines the number of possible cpus > > > from information exported by firmware (and I'm mystified as to why > > > other platforms don't do this). So it's typical to have a kernel an a > > > pSeries partition with NR_CPUS=128, but cpu_possible_map = 0xff. > > > > Simply because there is no such interface on s390. The only thing we know > > for sure is that if we are running under z/VM the user is free to > > configure up to 63 additional virtual cpus on the fly... > > x86-64 had the same problem, but we now require that you > boot with additional_cpus=... for how many you want. Default is 0 > (used to be half available CPUs but that lead to confusion) So introducing the additional_cpus kernel parameter seems to be the way to go (for XEN probably too). Even though it seems to be a bit odd if the user specifies both maxcpus=... and additional_cpus=... Heiko ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-10 10:05 ` Heiko Carstens @ 2006-02-10 10:13 ` Andi Kleen 2006-02-11 14:49 ` Heiko Carstens 0 siblings, 1 reply; 58+ messages in thread From: Andi Kleen @ 2006-02-10 10:13 UTC (permalink / raw) To: Heiko Carstens Cc: Nathan Lynch, Andrew Morton, Eric Dumazet, riel, linux-kernel, torvalds, mingo, 76306.1226, wli, Paul Jackson, jbeulich, Keir Fraser [putting some Xen people into cc] On Friday 10 February 2006 11:05, Heiko Carstens wrote: > > > > powerpc/ppc64, for instance, determines the number of possible cpus > > > > from information exported by firmware (and I'm mystified as to why > > > > other platforms don't do this). So it's typical to have a kernel an a > > > > pSeries partition with NR_CPUS=128, but cpu_possible_map = 0xff. > > > > > > Simply because there is no such interface on s390. The only thing we know > > > for sure is that if we are running under z/VM the user is free to > > > configure up to 63 additional virtual cpus on the fly... > > > > x86-64 had the same problem, but we now require that you > > boot with additional_cpus=... for how many you want. Default is 0 > > (used to be half available CPUs but that lead to confusion) > > So introducing the additional_cpus kernel parameter seems to be the way > to go (for XEN probably too). Even though it seems to be a bit odd if the > user specifies both maxcpus=... and additional_cpus=... With additional_cpus you don't need maxcpus. They are added together. Also for Xen it's probably not needed by default, but the hypervisor can somehow pass it (it doesn't make sense to have more vcpus than real cpus and that should be relatively small number, so the memory waste shouldn't be that bad). On x86-64 there is also a new firmware way to specify it, but current machines don't support that yet. -Andi ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-10 10:13 ` Andi Kleen @ 2006-02-11 14:49 ` Heiko Carstens 2006-02-11 18:04 ` Andi Kleen 0 siblings, 1 reply; 58+ messages in thread From: Heiko Carstens @ 2006-02-11 14:49 UTC (permalink / raw) To: Andi Kleen Cc: Nathan Lynch, Andrew Morton, Eric Dumazet, riel, linux-kernel, torvalds, mingo, 76306.1226, wli, Paul Jackson, jbeulich, Keir Fraser > > > x86-64 had the same problem, but we now require that you > > > boot with additional_cpus=... for how many you want. Default is 0 > > > (used to be half available CPUs but that lead to confusion) > > > > So introducing the additional_cpus kernel parameter seems to be the way > > to go (for XEN probably too). Even though it seems to be a bit odd if the > > user specifies both maxcpus=... and additional_cpus=... > > With additional_cpus you don't need maxcpus. They are added together. How does x86_64 manage to get 'additional_cpus' parsed early enough? As far as I can see this is done when parse_args() in start_kernel() gets called, but that's after you need the parameter in prefill_possible_map(). IMHO that should be an early_param and you would need to call parse_early_param() from setup_arch(). But then again, maybe I got it all wrong. But the more interesting question is: what do you do if the command line contains both additional_cpus and maxcpus. I was just trying to make some sense of this, but the result is questionable. I ended up with a cpu_possible_map that has 'present cpus' + 'additional_cpus' bits set. And in smp_prepare_cpus I make sure that cpu_present_map has not more than max_cpus bits set. At least that doesn't break the current semantics of the maxcpus parameter. But we're still wasting memory, since it would make sense that the cpu_possible_map shouldn't have more than max_cpus bits set. Heiko ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-11 14:49 ` Heiko Carstens @ 2006-02-11 18:04 ` Andi Kleen 0 siblings, 0 replies; 58+ messages in thread From: Andi Kleen @ 2006-02-11 18:04 UTC (permalink / raw) To: Heiko Carstens Cc: Nathan Lynch, Andrew Morton, Eric Dumazet, riel, linux-kernel, torvalds, mingo, 76306.1226, wli, Paul Jackson, jbeulich, Keir Fraser On Sat, Feb 11, 2006 at 03:49:29PM +0100, Heiko Carstens wrote: > > > > x86-64 had the same problem, but we now require that you > > > > boot with additional_cpus=... for how many you want. Default is 0 > > > > (used to be half available CPUs but that lead to confusion) > > > > > > So introducing the additional_cpus kernel parameter seems to be the way > > > to go (for XEN probably too). Even though it seems to be a bit odd if the > > > user specifies both maxcpus=... and additional_cpus=... > > > > With additional_cpus you don't need maxcpus. They are added together. > > How does x86_64 manage to get 'additional_cpus' parsed early enough? As far > as I can see this is done when parse_args() in start_kernel() gets called, > but that's after you need the parameter in prefill_possible_map(). > IMHO that should be an early_param and you would need to call > parse_early_param() from setup_arch(). But then again, maybe I got it all > wrong. Yes, you're right - it's added too late to the map right now. I will fix that. There are no earlyparams unfortunately, except for a big hack in setup.c > But the more interesting question is: what do you do if the command line > contains both additional_cpus and maxcpus. I was just trying to make some > sense of this, but the result is questionable. > I ended up with a cpu_possible_map that has 'present cpus' + > 'additional_cpus' bits set. And in smp_prepare_cpus I make sure that > cpu_present_map has not more than max_cpus bits set. > > At least that doesn't break the current semantics of the maxcpus parameter. > But we're still wasting memory, since it would make sense that the > cpu_possible_map shouldn't have more than max_cpus bits set. Yes, maybe it should be a early parameter too. But frankly I see maxcpus more as a debugging hack or workarouno. I don't think it matters much if it's not as efficient as it could be. -Andi ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 16:08 ` Nathan Lynch 2006-02-09 16:13 ` Heiko Carstens @ 2006-02-09 17:03 ` Ashok Raj 2006-02-09 17:23 ` Nathan Lynch 2006-02-09 18:04 ` Andrew Morton 1 sibling, 2 replies; 58+ messages in thread From: Ashok Raj @ 2006-02-09 17:03 UTC (permalink / raw) To: Nathan Lynch Cc: Andrew Morton, Eric Dumazet, riel, linux-kernel, torvalds, mingo, ak, 76306.1226, wli, heiko.carstens, Paul Jackson On Thu, Feb 09, 2006 at 08:08:09AM -0800, Nathan Lynch wrote: > > powerpc/ppc64, for instance, determines the number of possible cpus > from information exported by firmware (and I'm mystified as to why > other platforms don't do this). So it's typical to have a kernel an a > pSeries partition with NR_CPUS=128, but cpu_possible_map = 0xff. > Iam assuming in the above ase, cpu_possible_map == cpu_present_map and both dont change after the OS is booted. v.s in platforms capable of hot-plug cpus present could grow up to cpu_possible_map (max) when a new cpu is newly added, that wasnt even present at boot time. The problem was with ACPI just simply looking at the namespace doesnt exactly give us an idea of how many processors are possible in this platform. The reason is we could get more added via SSDT dynamically. on x86_64, we used the number of disabled cpus reported in MADT at boot time as an indicator to set cpu_possible_map. So if you had 4 sockets, but just 2 populated, the BIOS would set the missing CPUS with disabled flag. We simply count the number of disabled CPUs and add to possible map. Andi added documentation to cover that as well in Documentation/x86_64/cpu-hotplug-spec as a guideline for BIOS folks. -- Cheers, Ashok Raj - Open Source Technology Center ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 17:03 ` Ashok Raj @ 2006-02-09 17:23 ` Nathan Lynch 2006-02-09 18:04 ` Andrew Morton 1 sibling, 0 replies; 58+ messages in thread From: Nathan Lynch @ 2006-02-09 17:23 UTC (permalink / raw) To: Ashok Raj Cc: Andrew Morton, Eric Dumazet, riel, linux-kernel, torvalds, mingo, ak, 76306.1226, wli, heiko.carstens, Paul Jackson Ashok Raj wrote: > On Thu, Feb 09, 2006 at 08:08:09AM -0800, Nathan Lynch wrote: > > > > powerpc/ppc64, for instance, determines the number of possible cpus > > from information exported by firmware (and I'm mystified as to why > > other platforms don't do this). So it's typical to have a kernel an a > > pSeries partition with NR_CPUS=128, but cpu_possible_map = 0xff. > > > > Iam assuming in the above ase, cpu_possible_map == cpu_present_map and both > dont change after the OS is booted. v.s in platforms capable of hot-plug > cpus present could grow up to cpu_possible_map (max) when a new cpu is > newly added, that wasnt even present at boot time. No, cpu_present_map need not equal cpu_possible_map. Of course cpu_possible_map doesn't change after boot, but cpu_present_map can. Apart from that, I don't really understand what you're trying to say here. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 17:03 ` Ashok Raj 2006-02-09 17:23 ` Nathan Lynch @ 2006-02-09 18:04 ` Andrew Morton 2006-02-09 18:52 ` Ashok Raj 2006-02-10 10:02 ` Andi Kleen 1 sibling, 2 replies; 58+ messages in thread From: Andrew Morton @ 2006-02-09 18:04 UTC (permalink / raw) To: Ashok Raj Cc: ntl, dada1, riel, linux-kernel, torvalds, mingo, ak, 76306.1226, wli, heiko.carstens, pj Ashok Raj <ashok.raj@intel.com> wrote: > > The problem was with ACPI just simply looking at the namespace doesnt > exactly give us an idea of how many processors are possible in this platform. We need to fix this asap - the performance penalty for HOTPLUG_CPU=y, NR_CPUS=lots will be appreciable. Do any x86 platforms actually support CPU hotplug? Does the ACPI problem which you describe occur with present-CPUs, or only with possible-but-not-present ones? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 18:04 ` Andrew Morton @ 2006-02-09 18:52 ` Ashok Raj 2006-02-09 20:37 ` Andrew Morton 2006-02-10 10:02 ` Andi Kleen 1 sibling, 1 reply; 58+ messages in thread From: Ashok Raj @ 2006-02-09 18:52 UTC (permalink / raw) To: Andrew Morton Cc: Ashok Raj, ntl, dada1, riel, linux-kernel, torvalds, mingo, ak, 76306.1226, wli, heiko.carstens, pj On Thu, Feb 09, 2006 at 10:04:29AM -0800, Andrew Morton wrote: > > We need to fix this asap - the performance penalty for HOTPLUG_CPU=y, > NR_CPUS=lots will be appreciable. > > Do any x86 platforms actually support CPU hotplug? Hi Andrew, logical cpu hotplug (i.e onlining and offlining) can be done on any system. No hw or BIOS support is required. (this is what smp suspend/resume folks use) I remember Natalie from Unisys mentioned they have one system which is ACPI based and supports physical cpu hotplug, but the BIOS is old and didnt support the hotplug notify via ACPI. They probably have some other sysmgmt way to interact and initiate the hotplug. Iam aware of couple more that use ia64 NUMA type hw as well. (I dont think i can announce for them:-) ). > > Does the ACPI problem which you describe occur with present-CPUs, > or only with possible-but-not-present ones? Describing present cpus is not problem. Only knowing possible-but-not-present upfront is an issue. logical-cpu-hotplug only: cpu_present_map == cpu_possible_map always physical-cpu-hotplug: At boot, cpu_present_map is a subset of possible_map. Think its best to NOT set cpu_present_map to MASK_ALL as its being proposed, but let the arch/platform code figure out early enough to set possible_map accurately for that platform. If a platform has no way to determine it, then it could use cmdline like what x86_64 introduced (additional_cpus=) to overcome that. -- Cheers, Ashok Raj - Open Source Technology Center ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 18:52 ` Ashok Raj @ 2006-02-09 20:37 ` Andrew Morton 2006-02-09 21:03 ` Ashok Raj 0 siblings, 1 reply; 58+ messages in thread From: Andrew Morton @ 2006-02-09 20:37 UTC (permalink / raw) To: Ashok Raj Cc: ashok.raj, ntl, dada1, riel, linux-kernel, torvalds, mingo, ak, 76306.1226, wli, heiko.carstens, pj Ashok Raj <ashok.raj@intel.com> wrote: > > > > > Does the ACPI problem which you describe occur with present-CPUs, > > or only with possible-but-not-present ones? > > Describing present cpus is not problem. > > Only knowing possible-but-not-present upfront is an issue. > > logical-cpu-hotplug only: cpu_present_map == cpu_possible_map always > physical-cpu-hotplug: At boot, cpu_present_map is a subset of possible_map. > > Think its best to NOT set cpu_present_map to MASK_ALL as its being proposed, > but let the arch/platform code figure out early enough to set possible_map > accurately for that platform. If a platform has no way to determine it, > then it could use cmdline like what x86_64 introduced (additional_cpus=) > to overcome that. There is no proposal to change cpu_present_map. The problem is cpu_possible_map. That is presently being initialised to CPU_MASK_ALL, which adversely affects perfoermance. An NR_CPUS=16 kernel on a 2-way presently has cpu_possible_map=0xffff, which will hurt. The proposal is this: From: Andrew Morton <akpm@osdl.org> Initialising cpu_possible_map to all-ones with CONFIG_HOTPLUG_CPU means that a) All for_each_cpu() loops will iterate across all NR_CPUS CPUs, rather than over possible ones. That can be quite expensive. b) Soon we'll be allocating per-cpu areas only for possible CPUs. So with CPU_MASK_ALL, we'll be wasting memory. I also switched voyager over to not use CPU_MASK_ALL in the non-CPU-hotplug case. Will that break it? I note that parisc is also using CPU_MASK_ALL. Suggest that it stop doing that. Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Paul Jackson <pj@sgi.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Zwane Mwaikambo <zwane@linuxpower.ca> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> --- arch/i386/kernel/smpboot.c | 4 ---- arch/i386/mach-voyager/voyager_smp.c | 2 +- 2 files changed, 1 insertion(+), 5 deletions(-) diff -puN arch/i386/kernel/smpboot.c~x86-dont-initialise-cpu_possible_map-to-all-ones arch/i386/kernel/smpboot.c --- devel/arch/i386/kernel/smpboot.c~x86-dont-initialise-cpu_possible_map-to-all-ones 2006-02-09 01:11:55.000000000 -0800 +++ devel-akpm/arch/i386/kernel/smpboot.c 2006-02-09 01:12:24.000000000 -0800 @@ -87,11 +87,7 @@ EXPORT_SYMBOL(cpu_online_map); cpumask_t cpu_callin_map; cpumask_t cpu_callout_map; EXPORT_SYMBOL(cpu_callout_map); -#ifdef CONFIG_HOTPLUG_CPU -cpumask_t cpu_possible_map = CPU_MASK_ALL; -#else cpumask_t cpu_possible_map; -#endif EXPORT_SYMBOL(cpu_possible_map); static cpumask_t smp_commenced_mask; diff -puN arch/i386/mach-voyager/voyager_smp.c~x86-dont-initialise-cpu_possible_map-to-all-ones arch/i386/mach-voyager/voyager_smp.c --- devel/arch/i386/mach-voyager/voyager_smp.c~x86-dont-initialise-cpu_possible_map-to-all-ones 2006-02-09 01:11:55.000000000 -0800 +++ devel-akpm/arch/i386/mach-voyager/voyager_smp.c 2006-02-09 01:12:43.000000000 -0800 @@ -240,7 +240,7 @@ static cpumask_t smp_commenced_mask = CP cpumask_t cpu_callin_map = CPU_MASK_NONE; cpumask_t cpu_callout_map = CPU_MASK_NONE; EXPORT_SYMBOL(cpu_callout_map); -cpumask_t cpu_possible_map = CPU_MASK_ALL; +cpumask_t cpu_possible_map = CPU_MASK_NONE; EXPORT_SYMBOL(cpu_possible_map); /* The per processor IRQ masks (these are usually kept in sync) */ _ ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 20:37 ` Andrew Morton @ 2006-02-09 21:03 ` Ashok Raj 0 siblings, 0 replies; 58+ messages in thread From: Ashok Raj @ 2006-02-09 21:03 UTC (permalink / raw) To: Andrew Morton Cc: Ashok Raj, ntl, dada1, riel, linux-kernel, torvalds, mingo, ak, 76306.1226, wli, heiko.carstens, pj On Thu, Feb 09, 2006 at 12:37:29PM -0800, Andrew Morton wrote: > > There is no proposal to change cpu_present_map. > > The problem is cpu_possible_map. That is presently being initialised to > CPU_MASK_ALL, which adversely affects perfoermance. An NR_CPUS=16 kernel > on a 2-way presently has cpu_possible_map=0xffff, which will hurt. Think i miss typed earlier. What you proposed is the correct solution. I will followup with similar change on ia64 as well, (currently we do this in smp_build_cpu_map()). And add something similar to what we did for x86_64. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 18:04 ` Andrew Morton 2006-02-09 18:52 ` Ashok Raj @ 2006-02-10 10:02 ` Andi Kleen 2006-02-10 10:42 ` Andrew Morton 1 sibling, 1 reply; 58+ messages in thread From: Andi Kleen @ 2006-02-10 10:02 UTC (permalink / raw) To: Andrew Morton Cc: Ashok Raj, ntl, dada1, riel, linux-kernel, torvalds, mingo, 76306.1226, wli, heiko.carstens, pj On Thursday 09 February 2006 19:04, Andrew Morton wrote: > Ashok Raj <ashok.raj@intel.com> wrote: > > > > The problem was with ACPI just simply looking at the namespace doesnt > > exactly give us an idea of how many processors are possible in this platform. > > We need to fix this asap - the performance penalty for HOTPLUG_CPU=y, > NR_CPUS=lots will be appreciable. What is this performance penalty exactly? It wastes quite some memory (each possible CPU needs 32K of memory which adds quickly up), but it shouldn't impact other CPU use. > > Do any x86 platforms actually support CPU hotplug? Xen does. And it's needed for suspend/resume on normal x86 now. -Andi ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-10 10:02 ` Andi Kleen @ 2006-02-10 10:42 ` Andrew Morton 2006-02-10 11:09 ` Eric Dumazet 2006-02-10 12:10 ` Andi Kleen 0 siblings, 2 replies; 58+ messages in thread From: Andrew Morton @ 2006-02-10 10:42 UTC (permalink / raw) To: Andi Kleen Cc: ashok.raj, ntl, dada1, riel, linux-kernel, torvalds, mingo, 76306.1226, wli, heiko.carstens, pj Andi Kleen <ak@muc.de> wrote: > > On Thursday 09 February 2006 19:04, Andrew Morton wrote: > > Ashok Raj <ashok.raj@intel.com> wrote: > > > > > > The problem was with ACPI just simply looking at the namespace doesnt > > > exactly give us an idea of how many processors are possible in this platform. > > > > We need to fix this asap - the performance penalty for HOTPLUG_CPU=y, > > NR_CPUS=lots will be appreciable. > > What is this performance penalty exactly? All those for_each_cpu() loops will hit NR_CPUS cachelines instead of hweight(cpu_possible_map) cachelines. > It wastes quite some memory (each possible CPU needs 32K of memory which > adds quickly up), but it shouldn't impact other CPU use. > > > > > Do any x86 platforms actually support CPU hotplug? > > Xen does. yup. > And it's needed for suspend/resume on normal x86 now. True. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-10 10:42 ` Andrew Morton @ 2006-02-10 11:09 ` Eric Dumazet 2006-02-10 11:23 ` Andrew Morton 2006-02-10 12:10 ` Andi Kleen 1 sibling, 1 reply; 58+ messages in thread From: Eric Dumazet @ 2006-02-10 11:09 UTC (permalink / raw) To: Andrew Morton Cc: Andi Kleen, ashok.raj, ntl, riel, linux-kernel, torvalds, mingo, 76306.1226, wli, heiko.carstens, pj Andrew Morton a écrit : > Andi Kleen <ak@muc.de> wrote: >> On Thursday 09 February 2006 19:04, Andrew Morton wrote: >>> Ashok Raj <ashok.raj@intel.com> wrote: >>>> The problem was with ACPI just simply looking at the namespace doesnt >>>> exactly give us an idea of how many processors are possible in this platform. >>> We need to fix this asap - the performance penalty for HOTPLUG_CPU=y, >>> NR_CPUS=lots will be appreciable. >> What is this performance penalty exactly? > > All those for_each_cpu() loops will hit NR_CPUS cachelines instead of > hweight(cpu_possible_map) cachelines. You mean NR_CPUS bits, mostly all included in a single cacheline, and even in a single long word :) for most cases (NR_CPUS <= 32 or 64) ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-10 11:09 ` Eric Dumazet @ 2006-02-10 11:23 ` Andrew Morton 2006-02-10 11:26 ` Andrew Morton 2006-02-10 14:13 ` Eric Dumazet 0 siblings, 2 replies; 58+ messages in thread From: Andrew Morton @ 2006-02-10 11:23 UTC (permalink / raw) To: Eric Dumazet Cc: ak, ashok.raj, ntl, riel, linux-kernel, torvalds, mingo, 76306.1226, wli, heiko.carstens, pj Eric Dumazet <dada1@cosmosbay.com> wrote: > > Andrew Morton a écrit : > > Andi Kleen <ak@muc.de> wrote: > >> On Thursday 09 February 2006 19:04, Andrew Morton wrote: > >>> Ashok Raj <ashok.raj@intel.com> wrote: > >>>> The problem was with ACPI just simply looking at the namespace doesnt > >>>> exactly give us an idea of how many processors are possible in this platform. > >>> We need to fix this asap - the performance penalty for HOTPLUG_CPU=y, > >>> NR_CPUS=lots will be appreciable. > >> What is this performance penalty exactly? > > > > All those for_each_cpu() loops will hit NR_CPUS cachelines instead of > > hweight(cpu_possible_map) cachelines. > > You mean NR_CPUS bits, mostly all included in a single cacheline, and even in > a single long word :) for most cases (NR_CPUS <= 32 or 64) > No, I mean cachelines: static void recalc_bh_state(void) { int i; int tot = 0; if (__get_cpu_var(bh_accounting).ratelimit++ < 4096) return; __get_cpu_var(bh_accounting).ratelimit = 0; for_each_cpu(i) tot += per_cpu(bh_accounting, i).nr; That's going to hit NR_CPUS cachelines even on a 2-way. Or am I missing something really obvious here? (Probably the most expensive ones will be get_page_state() and friends. And argh, they're still hardwired to CPU_MASK_ALL). ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-10 11:23 ` Andrew Morton @ 2006-02-10 11:26 ` Andrew Morton 2006-02-10 14:13 ` Eric Dumazet 1 sibling, 0 replies; 58+ messages in thread From: Andrew Morton @ 2006-02-10 11:26 UTC (permalink / raw) To: dada1, ak, ashok.raj, ntl, riel, linux-kernel, torvalds, mingo, 76306.1226, wli, heiko.carstens, pj Andrew Morton <akpm@osdl.org> wrote: > > (Probably the most expensive ones will be get_page_state() and friends. > And argh, they're still hardwired to CPU_MASK_ALL). > No, we mask it with cpu_online_map. Phew. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-10 11:23 ` Andrew Morton 2006-02-10 11:26 ` Andrew Morton @ 2006-02-10 14:13 ` Eric Dumazet 2006-02-11 0:10 ` Nathan Lynch 1 sibling, 1 reply; 58+ messages in thread From: Eric Dumazet @ 2006-02-10 14:13 UTC (permalink / raw) To: Andrew Morton Cc: ak, ashok.raj, ntl, riel, linux-kernel, torvalds, mingo, 76306.1226, wli, heiko.carstens, pj [-- Attachment #1: Type: text/plain, Size: 1794 bytes --] Andrew Morton a écrit : > Eric Dumazet <dada1@cosmosbay.com> wrote: >> Andrew Morton a écrit : >>> Andi Kleen <ak@muc.de> wrote: >>>> On Thursday 09 February 2006 19:04, Andrew Morton wrote: >>>>> Ashok Raj <ashok.raj@intel.com> wrote: >>>>>> The problem was with ACPI just simply looking at the namespace doesnt >>>>>> exactly give us an idea of how many processors are possible in this platform. >>>>> We need to fix this asap - the performance penalty for HOTPLUG_CPU=y, >>>>> NR_CPUS=lots will be appreciable. >>>> What is this performance penalty exactly? >>> All those for_each_cpu() loops will hit NR_CPUS cachelines instead of >>> hweight(cpu_possible_map) cachelines. >> You mean NR_CPUS bits, mostly all included in a single cacheline, and even in >> a single long word :) for most cases (NR_CPUS <= 32 or 64) >> > > No, I mean cachelines: > > static void recalc_bh_state(void) > { > int i; > int tot = 0; > > if (__get_cpu_var(bh_accounting).ratelimit++ < 4096) > return; > __get_cpu_var(bh_accounting).ratelimit = 0; > for_each_cpu(i) > tot += per_cpu(bh_accounting, i).nr; > > That's going to hit NR_CPUS cachelines even on a 2-way. > > Or am I missing something really obvious here? OK I see. This can be solved with this patch : [PATCH] HOTPLUG_CPU : avoid hitting too many cachelines in recalc_bh_state() Instead of using for_each_cpu(i), we can use for_each_online_cpu(i) : The difference matters if HOTPUG_CPU=y When a CPU goes offline (ie removed from online map), it might have a non null bh_accounting.nr, so this patch adds a transfert of this counter to an online CPU counter. We already have a hotcpu_notifier, (function buffer_cpu_notify()), where we can do this bh_accounting.nr transfert. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> [-- Attachment #2: buffer.patch --] [-- Type: text/plain, Size: 651 bytes --] --- a/fs/buffer.c 2006-02-10 15:08:21.000000000 +0100 +++ b/fs/buffer.c 2006-02-10 15:47:55.000000000 +0100 @@ -3138,7 +3138,7 @@ if (__get_cpu_var(bh_accounting).ratelimit++ < 4096) return; __get_cpu_var(bh_accounting).ratelimit = 0; - for_each_cpu(i) + for_each_online_cpu(i) tot += per_cpu(bh_accounting, i).nr; buffer_heads_over_limit = (tot > max_buffer_heads); } @@ -3187,6 +3187,9 @@ brelse(b->bhs[i]); b->bhs[i] = NULL; } + get_cpu_var(bh_accounting).nr += per_cpu(bh_accounting, cpu).nr ; + per_cpu(bh_accounting, cpu).nr = 0; + put_cpu_var(bh_accounting); } static int buffer_cpu_notify(struct notifier_block *self, ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-10 14:13 ` Eric Dumazet @ 2006-02-11 0:10 ` Nathan Lynch 2006-02-11 0:25 ` Andrew Morton 0 siblings, 1 reply; 58+ messages in thread From: Nathan Lynch @ 2006-02-11 0:10 UTC (permalink / raw) To: Eric Dumazet Cc: Andrew Morton, ak, ashok.raj, riel, linux-kernel, torvalds, mingo, 76306.1226, wli, heiko.carstens, pj Eric Dumazet wrote: > > [PATCH] HOTPLUG_CPU : avoid hitting too many cachelines in recalc_bh_state() > > Instead of using for_each_cpu(i), we can use for_each_online_cpu(i) : The > difference matters if HOTPUG_CPU=y > > When a CPU goes offline (ie removed from online map), it might have a non > null bh_accounting.nr, so this patch adds a transfert of this counter to an > online CPU counter. > > We already have a hotcpu_notifier, (function buffer_cpu_notify()), where we > can do this bh_accounting.nr transfert. > > Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> > --- a/fs/buffer.c 2006-02-10 15:08:21.000000000 +0100 > +++ b/fs/buffer.c 2006-02-10 15:47:55.000000000 +0100 > @@ -3138,7 +3138,7 @@ > if (__get_cpu_var(bh_accounting).ratelimit++ < 4096) > return; > __get_cpu_var(bh_accounting).ratelimit = 0; > - for_each_cpu(i) > + for_each_online_cpu(i) > tot += per_cpu(bh_accounting, i).nr; > buffer_heads_over_limit = (tot > max_buffer_heads); > } > @@ -3187,6 +3187,9 @@ > brelse(b->bhs[i]); > b->bhs[i] = NULL; > } > + get_cpu_var(bh_accounting).nr += per_cpu(bh_accounting, cpu).nr ; > + per_cpu(bh_accounting, cpu).nr = 0; > + put_cpu_var(bh_accounting); > } But now there is a window between the time the cpu is marked offline and the time its bh_accounting.nr is moved to another cpu. So recalc_bh_state could fail to set buffer_heads_over_limit when it should. Maybe it's not worth worrying about? I don't really know this code, just thought I would point it out. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-11 0:10 ` Nathan Lynch @ 2006-02-11 0:25 ` Andrew Morton 0 siblings, 0 replies; 58+ messages in thread From: Andrew Morton @ 2006-02-11 0:25 UTC (permalink / raw) To: Nathan Lynch Cc: dada1, ak, ashok.raj, riel, linux-kernel, torvalds, mingo, 76306.1226, wli, heiko.carstens, pj Nathan Lynch <ntl@pobox.com> wrote: > > Eric Dumazet wrote: > > > > [PATCH] HOTPLUG_CPU : avoid hitting too many cachelines in recalc_bh_state() > > > > Instead of using for_each_cpu(i), we can use for_each_online_cpu(i) : The > > difference matters if HOTPUG_CPU=y > > > > When a CPU goes offline (ie removed from online map), it might have a non > > null bh_accounting.nr, so this patch adds a transfert of this counter to an > > online CPU counter. > > > > We already have a hotcpu_notifier, (function buffer_cpu_notify()), where we > > can do this bh_accounting.nr transfert. > > > > Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> > > > --- a/fs/buffer.c 2006-02-10 15:08:21.000000000 +0100 > > +++ b/fs/buffer.c 2006-02-10 15:47:55.000000000 +0100 > > @@ -3138,7 +3138,7 @@ > > if (__get_cpu_var(bh_accounting).ratelimit++ < 4096) > > return; > > __get_cpu_var(bh_accounting).ratelimit = 0; > > - for_each_cpu(i) > > + for_each_online_cpu(i) > > tot += per_cpu(bh_accounting, i).nr; > > buffer_heads_over_limit = (tot > max_buffer_heads); > > } > > @@ -3187,6 +3187,9 @@ > > brelse(b->bhs[i]); > > b->bhs[i] = NULL; > > } > > + get_cpu_var(bh_accounting).nr += per_cpu(bh_accounting, cpu).nr ; > > + per_cpu(bh_accounting, cpu).nr = 0; > > + put_cpu_var(bh_accounting); > > } > > But now there is a window between the time the cpu is marked offline > and the time its bh_accounting.nr is moved to another cpu. So > recalc_bh_state could fail to set buffer_heads_over_limit when it > should. Yes, there is a wee race there, but the consequences of a glitch or small inaccuracy in buffer_heads_over_limit will be transient and negligibe. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-10 10:42 ` Andrew Morton 2006-02-10 11:09 ` Eric Dumazet @ 2006-02-10 12:10 ` Andi Kleen 2006-02-10 19:08 ` Andrew Morton 1 sibling, 1 reply; 58+ messages in thread From: Andi Kleen @ 2006-02-10 12:10 UTC (permalink / raw) To: Andrew Morton Cc: ashok.raj, ntl, dada1, riel, linux-kernel, torvalds, mingo, 76306.1226, wli, heiko.carstens, pj On Friday 10 February 2006 11:42, Andrew Morton wrote: > Andi Kleen <ak@muc.de> wrote: > > > > On Thursday 09 February 2006 19:04, Andrew Morton wrote: > > > Ashok Raj <ashok.raj@intel.com> wrote: > > > > > > > > The problem was with ACPI just simply looking at the namespace doesnt > > > > exactly give us an idea of how many processors are possible in this platform. > > > > > > We need to fix this asap - the performance penalty for HOTPLUG_CPU=y, > > > NR_CPUS=lots will be appreciable. > > > > What is this performance penalty exactly? > > All those for_each_cpu() loops will hit NR_CPUS cachelines instead of > hweight(cpu_possible_map) cachelines. But are there any in real fast paths? iirc they are mostly in initialization, where it doesn't matter too much. -Andi ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-10 12:10 ` Andi Kleen @ 2006-02-10 19:08 ` Andrew Morton 0 siblings, 0 replies; 58+ messages in thread From: Andrew Morton @ 2006-02-10 19:08 UTC (permalink / raw) To: Andi Kleen Cc: ashok.raj, ntl, dada1, riel, linux-kernel, torvalds, mingo, 76306.1226, wli, heiko.carstens, pj Andi Kleen <ak@muc.de> wrote: > > On Friday 10 February 2006 11:42, Andrew Morton wrote: > > Andi Kleen <ak@muc.de> wrote: > > > > > > On Thursday 09 February 2006 19:04, Andrew Morton wrote: > > > > Ashok Raj <ashok.raj@intel.com> wrote: > > > > > > > > > > The problem was with ACPI just simply looking at the namespace doesnt > > > > > exactly give us an idea of how many processors are possible in this platform. > > > > > > > > We need to fix this asap - the performance penalty for HOTPLUG_CPU=y, > > > > NR_CPUS=lots will be appreciable. > > > > > > What is this performance penalty exactly? > > > > All those for_each_cpu() loops will hit NR_CPUS cachelines instead of > > hweight(cpu_possible_map) cachelines. > > But are there any in real fast paths? iirc they are mostly in initialization, > where it doesn't matter too much. > Could be so. I just added one to percpu_counters though. And it'd be a pain to introduce a cpu notifier for each and every percpu_counter. From: Andrew Morton <akpm@osdl.org> Implement percpu_counter_sum(). This is a more accurate but slower version of percpu_counter_read_positive(). We need this for Alex's speedup-ext3_statfs patch. Otherwise it would be too inaccurate on large CPU counts. Cc: Ravikiran G Thirumalai <kiran@scalex86.org> Cc: Alex Tomas <alex@clusterfs.com> Signed-off-by: Andrew Morton <akpm@osdl.org> --- include/linux/percpu_counter.h | 6 ++++++ mm/swap.c | 25 +++++++++++++++++++++++-- 2 files changed, 29 insertions(+), 2 deletions(-) diff -puN include/linux/percpu_counter.h~percpu_counter_sum include/linux/percpu_counter.h --- devel/include/linux/percpu_counter.h~percpu_counter_sum 2006-02-08 13:34:08.000000000 -0800 +++ devel-akpm/include/linux/percpu_counter.h 2006-02-08 13:34:08.000000000 -0800 @@ -39,6 +39,7 @@ static inline void percpu_counter_destro } void percpu_counter_mod(struct percpu_counter *fbc, long amount); +long percpu_counter_sum(struct percpu_counter *fbc); static inline long percpu_counter_read(struct percpu_counter *fbc) { @@ -92,6 +93,11 @@ static inline long percpu_counter_read_p return fbc->count; } +static inline long percpu_counter_sum(struct percpu_counter *fbc) +{ + return percpu_counter_read_positive(fbc); +} + #endif /* CONFIG_SMP */ static inline void percpu_counter_inc(struct percpu_counter *fbc) diff -puN mm/swap.c~percpu_counter_sum mm/swap.c --- devel/mm/swap.c~percpu_counter_sum 2006-02-08 13:34:08.000000000 -0800 +++ devel-akpm/mm/swap.c 2006-02-08 13:34:08.000000000 -0800 @@ -491,13 +491,34 @@ void percpu_counter_mod(struct percpu_co if (count >= FBC_BATCH || count <= -FBC_BATCH) { spin_lock(&fbc->lock); fbc->count += count; + *pcount = 0; spin_unlock(&fbc->lock); - count = 0; + } else { + *pcount = count; } - *pcount = count; put_cpu(); } EXPORT_SYMBOL(percpu_counter_mod); + +/* + * Add up all the per-cpu counts, return the result. This is a more accurate + * but much slower version of percpu_counter_read_positive() + */ +long percpu_counter_sum(struct percpu_counter *fbc) +{ + long ret; + int cpu; + + spin_lock(&fbc->lock); + ret = fbc->count; + for_each_cpu(cpu) { + long *pcount = per_cpu_ptr(fbc->counters, cpu); + ret += *pcount; + } + spin_unlock(&fbc->lock); + return ret < 0 ? 0 : ret; +} +EXPORT_SYMBOL(percpu_counter_sum); #endif /* _ ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-08 22:31 ` Rik van Riel 2006-02-09 1:20 ` Rik van Riel 2006-02-09 3:05 ` Andrew Morton @ 2006-02-09 4:39 ` Eric Dumazet 2006-02-09 4:46 ` Nick Piggin 2 siblings, 1 reply; 58+ messages in thread From: Eric Dumazet @ 2006-02-09 4:39 UTC (permalink / raw) To: Rik van Riel; +Cc: Linux Kernel Mailing List, Linus Torvalds Rik van Riel a écrit : > On Sun, 5 Feb 2006, Linux Kernel Mailing List wrote: > >> [PATCH] percpu data: only iterate over possible CPUs > > This sched.c bit breaks Xen, and probably also other architectures > that have CPU hotplug. I suspect the reason is that during early > bootup only the boot CPU is online, so nothing initialises the > runqueues for CPUs that are brought up afterwards. > > I suspect we can get rid of this problem quite easily by moving > runqueue initialisation to init_idle()... Please fix Xen to match include/linux/cpumask.h documentation that says : /* * The following particular system cpumasks and operations manage * possible, present and online cpus. Each of them is a fixed size * bitmap of size NR_CPUS. * * #ifdef CONFIG_HOTPLUG_CPU * cpu_possible_map - all NR_CPUS bits set * cpu_present_map - has bit 'cpu' set iff cpu is populated * cpu_online_map - has bit 'cpu' set iff cpu available to scheduler * #else * cpu_possible_map - has bit 'cpu' set iff cpu is populated * cpu_present_map - copy of cpu_possible_map * cpu_online_map - has bit 'cpu' set iff cpu available to scheduler * #endif */ Eric ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 4:39 ` Eric Dumazet @ 2006-02-09 4:46 ` Nick Piggin 0 siblings, 0 replies; 58+ messages in thread From: Nick Piggin @ 2006-02-09 4:46 UTC (permalink / raw) To: Eric Dumazet; +Cc: Rik van Riel, Linux Kernel Mailing List, Linus Torvalds Eric Dumazet wrote: > Rik van Riel a écrit : > >> On Sun, 5 Feb 2006, Linux Kernel Mailing List wrote: >> >>> [PATCH] percpu data: only iterate over possible CPUs >> >> >> This sched.c bit breaks Xen, and probably also other architectures >> that have CPU hotplug. I suspect the reason is that during early >> bootup only the boot CPU is online, so nothing initialises the >> runqueues for CPUs that are brought up afterwards. >> >> I suspect we can get rid of this problem quite easily by moving >> runqueue initialisation to init_idle()... > > > Please fix Xen to match include/linux/cpumask.h documentation that says : > > /* > * The following particular system cpumasks and operations manage > * possible, present and online cpus. Each of them is a fixed size > * bitmap of size NR_CPUS. > * > * #ifdef CONFIG_HOTPLUG_CPU > * cpu_possible_map - all NR_CPUS bits set Note that this shouldn't have to be all NR_CPUs if the platform can determine all possible hotpluggable CPUs. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs
@ 2006-02-09 8:32 Chuck Ebbert
2006-02-09 8:55 ` Paul Jackson
2006-02-09 9:06 ` Andrew Morton
0 siblings, 2 replies; 58+ messages in thread
From: Chuck Ebbert @ 2006-02-09 8:32 UTC (permalink / raw)
To: Andrew Morton
Cc: Paul Jackson, heiko.carstens, wli, ak, mingo, torvalds,
linux-kernel, riel, Eric Dumazet
In-Reply-To: <20060208204502.12513ae5.akpm@osdl.org>
On Wed, 8 Feb 2006 at 20:45:02 -0800, Andrew Morton wrote:
> > Its even documented in line 332 of include/linux/cpumask.h
> >
> > * #ifdef CONFIG_HOTPLUG_CPU
> > * cpu_possible_map - all NR_CPUS bits set
>
> That seems a quite bad idea. If we know which CPUs are possible we should
> populate cpu_possible_map correctly, whether or not CONFIG_HOTPLUG_CPU is
> set.
I don't think that's, um, "possible." Even if you could discover how many
empty sockets there were in a system, someone might be able to hotplug
a board with more of them on it. And there's no way to tell how many CPUs
to reserve for each socket anyway, e.g. AMD has already announced quad-core
processors.
But what really surprised me is that for_each_cpu() actually walks
cpu_possible_map and not cpu_present_map as I had assumed. This violates
the Principle Of Least Surprise. Maybe renaming for_each_cpu to
for_each_possible_cpu might be a good idea?
--
Chuck
"Equations are the Devil's sentences." --Stephen Colbert
^ permalink raw reply [flat|nested] 58+ messages in thread* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 8:32 Chuck Ebbert @ 2006-02-09 8:55 ` Paul Jackson 2006-02-09 9:06 ` Andrew Morton 1 sibling, 0 replies; 58+ messages in thread From: Paul Jackson @ 2006-02-09 8:55 UTC (permalink / raw) To: Chuck Ebbert Cc: akpm, heiko.carstens, wli, ak, mingo, torvalds, linux-kernel, riel, dada1 Chuck wrote: > I don't think that's, um, "possible." Even if you could discover how many > empty sockets there were in a system, someone might be able to hotplug > a board with more of them on it. That's going to depend on your system hardware configuration. cpu_possible_map should be whatever is the largest set of cpus you might possibly want to deal with plugging in. Some hardware configurations will support more addition of hardware cpus than others. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 8:32 Chuck Ebbert 2006-02-09 8:55 ` Paul Jackson @ 2006-02-09 9:06 ` Andrew Morton 2006-02-09 9:11 ` Andrew Morton 1 sibling, 1 reply; 58+ messages in thread From: Andrew Morton @ 2006-02-09 9:06 UTC (permalink / raw) To: Chuck Ebbert Cc: pj, heiko.carstens, wli, ak, mingo, torvalds, linux-kernel, riel, dada1 Chuck Ebbert <76306.1226@compuserve.com> wrote: > > In-Reply-To: <20060208204502.12513ae5.akpm@osdl.org> > > On Wed, 8 Feb 2006 at 20:45:02 -0800, Andrew Morton wrote: > > > > Its even documented in line 332 of include/linux/cpumask.h > > > > > > * #ifdef CONFIG_HOTPLUG_CPU > > > * cpu_possible_map - all NR_CPUS bits set > > > > That seems a quite bad idea. If we know which CPUs are possible we should > > populate cpu_possible_map correctly, whether or not CONFIG_HOTPLUG_CPU is > > set. > > I don't think that's, um, "possible." Even if you could discover how many > empty sockets there were in a system, someone might be able to hotplug > a board with more of them on it. And there's no way to tell how many CPUs > to reserve for each socket anyway, e.g. AMD has already announced quad-core > processors. Well maybe. But it's awfully presumptuous for us to say that no platform will be capable of telling us what its maximum number of CPUs is, or even whether certain CPUs within that range aren't possible. <checks> Yup, on my 2-way x86 test box, with NR_CPUS=16 we have cpu_possible_map=0xffff. That's just insane - the default setting for a distro kernel should be (or will become) NR_CPUS=lots, HOTPLUG_CPU=y. All those for_each_cpu() loops are iterating across 16 CPUs. aargh. > But what really surprised me is that for_each_cpu() actually walks > cpu_possible_map and not cpu_present_map as I had assumed. This violates > the Principle Of Least Surprise. Maybe renaming for_each_cpu to > for_each_possible_cpu might be a good idea? That would make sense. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 9:06 ` Andrew Morton @ 2006-02-09 9:11 ` Andrew Morton 2006-02-09 10:08 ` Heiko Carstens 0 siblings, 1 reply; 58+ messages in thread From: Andrew Morton @ 2006-02-09 9:11 UTC (permalink / raw) To: 76306.1226, pj, heiko.carstens, wli, ak, mingo, torvalds, linux-kernel, riel, dada1 Andrew Morton <akpm@osdl.org> wrote: > > aargh. > Actually, x86 appears to be the only arch which suffers this braindamage. The rest use CPU_MASK_NONE (or just forget to initialise it and hope that CPU_MASK_NONE equals all-zeroes). ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 9:11 ` Andrew Morton @ 2006-02-09 10:08 ` Heiko Carstens 2006-02-09 10:13 ` Andrew Morton 0 siblings, 1 reply; 58+ messages in thread From: Heiko Carstens @ 2006-02-09 10:08 UTC (permalink / raw) To: Andrew Morton Cc: 76306.1226, pj, wli, ak, mingo, torvalds, linux-kernel, riel, dada1 > > aargh. > Actually, x86 appears to be the only arch which suffers this braindamage. > The rest use CPU_MASK_NONE (or just forget to initialise it and hope that > CPU_MASK_NONE equals all-zeroes). s390 will join, as soon as the cpu_possible_map fix is merged... Heiko ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 10:08 ` Heiko Carstens @ 2006-02-09 10:13 ` Andrew Morton 2006-02-09 10:23 ` Heiko Carstens 0 siblings, 1 reply; 58+ messages in thread From: Andrew Morton @ 2006-02-09 10:13 UTC (permalink / raw) To: Heiko Carstens Cc: 76306.1226, pj, wli, ak, mingo, torvalds, linux-kernel, riel, dada1 Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > > > > aargh. > > Actually, x86 appears to be the only arch which suffers this braindamage. > > The rest use CPU_MASK_NONE (or just forget to initialise it and hope that > > CPU_MASK_NONE equals all-zeroes). > > s390 will join, as soon as the cpu_possible_map fix is merged... What cpu_possible_map fix? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 10:13 ` Andrew Morton @ 2006-02-09 10:23 ` Heiko Carstens 2006-02-09 10:31 ` Andrew Morton 0 siblings, 1 reply; 58+ messages in thread From: Heiko Carstens @ 2006-02-09 10:23 UTC (permalink / raw) To: Andrew Morton Cc: 76306.1226, pj, wli, ak, mingo, torvalds, linux-kernel, riel, dada1 > > > > aargh. > > > Actually, x86 appears to be the only arch which suffers this braindamage. > > > The rest use CPU_MASK_NONE (or just forget to initialise it and hope that > > > CPU_MASK_NONE equals all-zeroes). > > > > s390 will join, as soon as the cpu_possible_map fix is merged... > > What cpu_possible_map fix? This one: http://lkml.org/lkml/2006/2/8/162 Heiko ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 10:23 ` Heiko Carstens @ 2006-02-09 10:31 ` Andrew Morton 2006-02-09 11:47 ` Heiko Carstens 0 siblings, 1 reply; 58+ messages in thread From: Andrew Morton @ 2006-02-09 10:31 UTC (permalink / raw) To: Heiko Carstens Cc: 76306.1226, pj, wli, ak, mingo, torvalds, linux-kernel, riel, dada1 Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > > > > > > aargh. > > > > Actually, x86 appears to be the only arch which suffers this braindamage. > > > > The rest use CPU_MASK_NONE (or just forget to initialise it and hope that > > > > CPU_MASK_NONE equals all-zeroes). > > > > > > s390 will join, as soon as the cpu_possible_map fix is merged... > > > > What cpu_possible_map fix? > > This one: > > http://lkml.org/lkml/2006/2/8/162 > Oh, OK. Ow, I don't think you want to do that. It means that all those for_each_cpu() loops will now be iterating over all NR_CPUS cpus, whether or not they're even possible. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 10:31 ` Andrew Morton @ 2006-02-09 11:47 ` Heiko Carstens 2006-02-09 12:46 ` Eric Dumazet 0 siblings, 1 reply; 58+ messages in thread From: Heiko Carstens @ 2006-02-09 11:47 UTC (permalink / raw) To: Andrew Morton Cc: 76306.1226, pj, wli, ak, mingo, torvalds, linux-kernel, riel, dada1 > > > > > Actually, x86 appears to be the only arch which suffers this braindamage. > > > > > The rest use CPU_MASK_NONE (or just forget to initialise it and hope that > > > > > CPU_MASK_NONE equals all-zeroes). > > > > > > > > s390 will join, as soon as the cpu_possible_map fix is merged... > > > > > > What cpu_possible_map fix? > > > > This one: > > > > http://lkml.org/lkml/2006/2/8/162 > > > > Oh, OK. Ow, I don't think you want to do that. It means that all those > for_each_cpu() loops will now be iterating over all NR_CPUS cpus, whether > or not they're even possible. That's ok. We're mainly running under z/VM where you can attach new virtual cpus on the fly to the virtual machine (up to 64 cpus). The only difference to before is that it was possible to limit the waste of resources by passing a number with 'maxcpus'. This value was used to generate the cpu_possible_map. But since the map needs to be ready when we return from setup_arch, we don't have access to max_cpus, unless we parse commandline on our own... Heiko ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 11:47 ` Heiko Carstens @ 2006-02-09 12:46 ` Eric Dumazet 2006-02-09 13:12 ` Heiko Carstens 0 siblings, 1 reply; 58+ messages in thread From: Eric Dumazet @ 2006-02-09 12:46 UTC (permalink / raw) To: Heiko Carstens Cc: Andrew Morton, 76306.1226, pj, wli, ak, mingo, torvalds, linux-kernel, riel, dada1 Heiko Carstens a écrit : >>>>> > Actually, x86 appears to be the only arch which suffers this braindamage. >>>>> > The rest use CPU_MASK_NONE (or just forget to initialise it and hope that >>>>> > CPU_MASK_NONE equals all-zeroes). >>>>> >>>>> s390 will join, as soon as the cpu_possible_map fix is merged... >>>> What cpu_possible_map fix? >>> This one: >>> >>> http://lkml.org/lkml/2006/2/8/162 >>> >> Oh, OK. Ow, I don't think you want to do that. It means that all those >> for_each_cpu() loops will now be iterating over all NR_CPUS cpus, whether >> or not they're even possible. > > That's ok. We're mainly running under z/VM where you can attach new virtual > cpus on the fly to the virtual machine (up to 64 cpus). > The only difference to before is that it was possible to limit the waste of > resources by passing a number with 'maxcpus'. This value was used to generate > the cpu_possible_map. > But since the map needs to be ready when we return from setup_arch, we don't > have access to max_cpus, unless we parse commandline on our own... > Then it's OK to clear bits from cpu_possible_map once you have max_cpus value for (cpu = max_cpus ; cpu < NR_CPUS ; cpu++) cpu_clear(cpu, cpu_possible_map); ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 12:46 ` Eric Dumazet @ 2006-02-09 13:12 ` Heiko Carstens 2006-02-09 13:55 ` Eric Dumazet 0 siblings, 1 reply; 58+ messages in thread From: Heiko Carstens @ 2006-02-09 13:12 UTC (permalink / raw) To: Eric Dumazet Cc: Andrew Morton, 76306.1226, pj, wli, ak, mingo, torvalds, linux-kernel, riel, dada1 > >>Oh, OK. Ow, I don't think you want to do that. It means that all those > >>for_each_cpu() loops will now be iterating over all NR_CPUS cpus, whether > >>or not they're even possible. > >That's ok. We're mainly running under z/VM where you can attach new virtual > >cpus on the fly to the virtual machine (up to 64 cpus). > >The only difference to before is that it was possible to limit the waste of > >resources by passing a number with 'maxcpus'. This value was used to generate > >the cpu_possible_map. > >But since the map needs to be ready when we return from setup_arch, we don't > >have access to max_cpus, unless we parse commandline on our own... > > Then it's OK to clear bits from cpu_possible_map once you have max_cpus value > > for (cpu = max_cpus ; cpu < NR_CPUS ; cpu++) > cpu_clear(cpu, cpu_possible_map); Hmm... I don't think the semantics of cpu_possible_map allow to change it. Also any code that uses for_each_cpu() may allocate memory _before_ cpu_possible_map is changed back to reflect a smaller number of cpus. Doesn't look like the correct way to fix this. Thinking a bit longer this was probably a reason why initialization of this map was done in smp_prepare_cpus() before it silently moved to setup_arch(). Heiko ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] percpu data: only iterate over possible CPUs 2006-02-09 13:12 ` Heiko Carstens @ 2006-02-09 13:55 ` Eric Dumazet 0 siblings, 0 replies; 58+ messages in thread From: Eric Dumazet @ 2006-02-09 13:55 UTC (permalink / raw) To: Heiko Carstens Cc: Andrew Morton, 76306.1226, pj, wli, ak, mingo, torvalds, linux-kernel, riel, dada1 Heiko Carstens a écrit : >> for (cpu = max_cpus ; cpu < NR_CPUS ; cpu++) >> cpu_clear(cpu, cpu_possible_map); > > Hmm... I don't think the semantics of cpu_possible_map allow to change it. > Also any code that uses for_each_cpu() may allocate memory _before_ > cpu_possible_map is changed back to reflect a smaller number of cpus. > Doesn't look like the correct way to fix this. > Thinking a bit longer this was probably a reason why initialization of > this map was done in smp_prepare_cpus() before it silently moved to > setup_arch(). Hum... of course you may loose some bits of memory (percpu data for example) but clearing a cpu in cpu_possible_map is allowed. See arch/alpha/kernel/process.c and arch/x86_64/kernel/smpboot.c for uses of cpu_clear() ^ permalink raw reply [flat|nested] 58+ messages in thread
end of thread, other threads:[~2006-02-11 18:05 UTC | newest]
Thread overview: 58+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <200602051959.k15JxoHK001630@hera.kernel.org>
2006-02-07 15:15 ` [PATCH] percpu data: only iterate over possible CPUs Heiko Carstens
2006-02-07 15:31 ` Jens Axboe
2006-02-07 16:25 ` Eric Dumazet
2006-02-07 16:42 ` Linus Torvalds
2006-02-07 17:34 ` Andrew Morton
2006-02-07 17:48 ` Linus Torvalds
2006-02-07 18:30 ` Dipankar Sarma
2006-02-07 18:43 ` Linus Torvalds
2006-02-07 18:53 ` Dipankar Sarma
2006-02-07 19:11 ` Linus Torvalds
2006-02-08 4:40 ` Heiko Carstens
2006-02-08 8:55 ` Ivan Kokshaysky
2006-02-08 22:31 ` Rik van Riel
2006-02-09 1:20 ` Rik van Riel
2006-02-09 3:05 ` Andrew Morton
2006-02-09 3:08 ` Andrew Morton
2006-02-09 4:36 ` Eric Dumazet
2006-02-09 4:45 ` Andrew Morton
2006-02-09 4:56 ` Paul Jackson
2006-02-09 16:08 ` Nathan Lynch
2006-02-09 16:13 ` Heiko Carstens
2006-02-09 16:38 ` Rik van Riel
2006-02-09 16:59 ` Nathan Lynch
2006-02-09 17:37 ` Andi Kleen
2006-02-10 10:05 ` Heiko Carstens
2006-02-10 10:13 ` Andi Kleen
2006-02-11 14:49 ` Heiko Carstens
2006-02-11 18:04 ` Andi Kleen
2006-02-09 17:03 ` Ashok Raj
2006-02-09 17:23 ` Nathan Lynch
2006-02-09 18:04 ` Andrew Morton
2006-02-09 18:52 ` Ashok Raj
2006-02-09 20:37 ` Andrew Morton
2006-02-09 21:03 ` Ashok Raj
2006-02-10 10:02 ` Andi Kleen
2006-02-10 10:42 ` Andrew Morton
2006-02-10 11:09 ` Eric Dumazet
2006-02-10 11:23 ` Andrew Morton
2006-02-10 11:26 ` Andrew Morton
2006-02-10 14:13 ` Eric Dumazet
2006-02-11 0:10 ` Nathan Lynch
2006-02-11 0:25 ` Andrew Morton
2006-02-10 12:10 ` Andi Kleen
2006-02-10 19:08 ` Andrew Morton
2006-02-09 4:39 ` Eric Dumazet
2006-02-09 4:46 ` Nick Piggin
2006-02-09 8:32 Chuck Ebbert
2006-02-09 8:55 ` Paul Jackson
2006-02-09 9:06 ` Andrew Morton
2006-02-09 9:11 ` Andrew Morton
2006-02-09 10:08 ` Heiko Carstens
2006-02-09 10:13 ` Andrew Morton
2006-02-09 10:23 ` Heiko Carstens
2006-02-09 10:31 ` Andrew Morton
2006-02-09 11:47 ` Heiko Carstens
2006-02-09 12:46 ` Eric Dumazet
2006-02-09 13:12 ` Heiko Carstens
2006-02-09 13:55 ` Eric Dumazet
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).