* RE: [Linux-ia64] [patch] logical CPU numbering
2003-03-20 19:03 [Linux-ia64] [patch] logical CPU numbering Martin Hicks
@ 2003-03-20 20:07 ` Wichmann, Mats D
2003-03-25 18:03 ` Martin Hicks
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Wichmann, Mats D @ 2003-03-20 20:07 UTC
To: linux-ia64
> If a CPU fails to start during smp_boot_cpus(), then the logical CPU
> numbering will have a "hole". Using the number of booted CPUs
> instead of the loop index will correct this.
Just curious, does having a hole cause a problem?
It might well; I don't know.
I can imagine a future scenario in which, if CPU X
didn't come up, it could be hot-replaced, at which
point one might actually want to have that "slot"
reserved for it.
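To make the two numbering schemes in this exchange concrete, here is a toy model in plain C. It is only a sketch, not kernel code: the names TOY_NR_CPUS, TOY_NO_CPU, number_by_loop_index() and number_by_boot_count() are hypothetical and do not appear in smpboot.c. The first function keeps a failed CPU's slot empty (the "hole", which would also be the slot a hot-replaced CPU could reclaim); the second hands out logical numbers from the count of CPUs booted so far, so the numbering stays compact.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 8      /* hypothetical table size for this sketch */
#define TOY_NO_CPU  (-1)   /* marks an unused logical slot */

/*
 * Toy model, not kernel code. started[i] is nonzero if the i-th physical
 * CPU came up. Each function fills logical_to_phys[0..n-1] and returns
 * the number of CPUs that booted.
 */

/* Loop-index numbering: the logical ID is the loop index, so a CPU that
 * fails to start leaves a TOY_NO_CPU hole in its own slot. */
static int number_by_loop_index(const int *started, size_t n,
				int *logical_to_phys)
{
	int booted = 0;

	assert(n <= TOY_NR_CPUS);
	for (size_t i = 0; i < n; i++) {
		if (started[i]) {
			logical_to_phys[i] = (int)i;
			booted++;
		} else {
			logical_to_phys[i] = TOY_NO_CPU;
		}
	}
	return booted;
}

/* Boot-count numbering: the next logical ID is the number of CPUs booted
 * so far, so the used slots are contiguous and the unused ones trail. */
static int number_by_boot_count(const int *started, size_t n,
				int *logical_to_phys)
{
	int booted = 0;

	assert(n <= TOY_NR_CPUS);
	for (size_t i = 0; i < n; i++)
		logical_to_phys[i] = TOY_NO_CPU;
	for (size_t i = 0; i < n; i++)
		if (started[i])
			logical_to_phys[booted++] = (int)i;
	return booted;
}
```

With four CPUs of which the second fails to start, the first scheme leaves logical slot 1 empty, while the second assigns logical IDs 0, 1 and 2 to physical CPUs 0, 2 and 3. Which behaviour is preferable depends on whether anything relies on hole-free numbering, which is exactly the open question here.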
* Re: [Linux-ia64] [patch] logical CPU numbering
From: Martin Hicks @ 2003-03-25 18:03 UTC
To: linux-ia64
On Thu, Mar 20, 2003 at 12:07:32PM -0800, Wichmann, Mats D wrote:
>
> > If a CPU fails to start during smp_boot_cpus(), then the logical CPU
> > numbering will have a "hole". Using the number of booted CPUs
> > instead of the loop index will correct this.
>
> Just curious, does having a hole cause a problem?
> It might well; I don't know.
>
> I can imagine a future scenario in which, if CPU X
> didn't come up, it could be hot-replaced, at which
> point one might actually want to have that "slot"
> reserved for it.
I thought there was some code that depended on having logical CPUs
numbered without holes. I can't find any evidence of it anymore, so
perhaps it's not a bug.
mh
--
Wild Open Source Inc. mort@wildopensource.com
* Re: [Linux-ia64] [patch] logical CPU numbering
From: Martin Hicks @ 2003-04-03 18:31 UTC
To: linux-ia64
On Thu, Mar 20, 2003 at 02:03:57PM -0500, Martin Hicks wrote:
>
> Hello,
>
> If a CPU fails to start during smp_boot_cpus(), then the logical CPU
> numbering will have a "hole". Using the number of booted CPUs
> instead of the loop index will correct this.
>
> This patch is against 2.4.21-pre5.
This patch should be applied. I finally got around to doing some more
testing with it. If a CPU fails to start, we currently get messages
like the following for subsequent CPUs:
CPU 17: nasid 18, slice 0, cnode 9
CPU 17: base freq=200.000MHz, ITC ratio=10/2, ITC freq=1000.000MHz
Calibrating delay loop... 1494.72 BogoMIPS
phys CPU#17 (0x12) not responding - cannot use it. <<-BOGUS
The patch below fixes this problem. This is the same patch as before,
reposted just to make things easier. It is against
2.4.21-pre5-ia64-030312.
Thanks,
mh
--
Wild Open Source Inc. mort@wildopensource.com
--- linux-2.4.21-pre5-ia64-030312.pristine/arch/ia64/kernel/smpboot.c Sun Mar 16 10:18:53 2003
+++ linux-2.4.21-pre5-ia64-030312/arch/ia64/kernel/smpboot.c Thu Mar 20 10:47:07 2003
@@ -522,7 +522,7 @@
 		/*
 		 * Make sure we unmap all failed CPUs
 		 */
-		if (ia64_cpu_to_sapicid[cpu] == -1)
+		if (ia64_cpu_to_sapicid[cpucount] == -1)
 			printk("phys CPU#%d not responding - cannot use it.\n", cpu);
 	}
* Re: [Linux-ia64] [patch] logical CPU numbering
From: Bjorn Helgaas @ 2003-04-17 19:13 UTC
To: linux-ia64
On Thursday 03 April 2003 11:31 am, Martin Hicks wrote:
> This patch should be applied. I finally got around to doing some more
> testing with it. If a CPU fails to start, we currently get messages
> like the following for subsequent CPUs:
>
>
> CPU 17: nasid 18, slice 0, cnode 9
> CPU 17: base freq=200.000MHz, ITC ratio=10/2, ITC freq=1000.000MHz
> Calibrating delay loop... 1494.72 BogoMIPS
> phys CPU#17 (0x12) not responding - cannot use it. <<-BOGUS
I don't quite understand how this works. The current code is clearly
wrong, but if the AP in a 2-CPU system fails to start, won't the new
code print "phys CPU#0 not responding"? That doesn't seem accurate.
If we really need this printk, it seems like the logical place to put it
would be in do_boot_cpu(), where we already print the "Processor X/Y
is stuck" message. 2.5 seems to have just removed the "not responding"
printk, though, and I'd be inclined to do the same. Any objections?
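As a rough illustration of reporting the failure where it is detected, rather than inferring it afterwards from a mapping table, here is a minimal sketch. It is a toy model only: toy_boot_fn, toy_boot_ok(), toy_boot_fails() and toy_boot_and_report() are hypothetical names, the message text is made up for the example, and none of this is the real do_boot_cpu().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a boot attempt; returns 0 on success. */
typedef int (*toy_boot_fn)(int phys_id);

static int toy_boot_ok(int phys_id)    { (void)phys_id; return 0; }
static int toy_boot_fails(int phys_id) { (void)phys_id; return -1; }

/*
 * Try to start one CPU and build the failure message right here, where
 * the physical ID of the CPU that failed is known for certain. On
 * success msg is left as an empty string.
 */
static int toy_boot_and_report(int phys_id, toy_boot_fn boot,
			       char *msg, size_t len)
{
	int err;

	assert(boot != NULL && msg != NULL && len > 0);
	msg[0] = '\0';
	err = boot(phys_id);
	if (err)
		snprintf(msg, len, "toy: phys CPU#%d not responding", phys_id);
	return err;
}
```

Because the message is produced by the same code path that observed the failure, it cannot name the wrong CPU, whichever way the caller numbers its logical slots.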
> --- linux-2.4.21-pre5-ia64-030312.pristine/arch/ia64/kernel/smpboot.c Sun Mar 16 10:18:53 2003
> +++ linux-2.4.21-pre5-ia64-030312/arch/ia64/kernel/smpboot.c Thu Mar 20 10:47:07 2003
> @@ -522,7 +522,7 @@
>  		/*
>  		 * Make sure we unmap all failed CPUs
>  		 */
> -		if (ia64_cpu_to_sapicid[cpu] == -1)
> +		if (ia64_cpu_to_sapicid[cpucount] == -1)
>  			printk("phys CPU#%d not responding - cannot use it.\n", cpu);
>  	}
>
>
>
* Re: [Linux-ia64] [patch] logical CPU numbering
From: Martin Hicks @ 2003-04-17 20:25 UTC
To: linux-ia64
On Thu, Apr 17, 2003 at 01:13:44PM -0600, Bjorn Helgaas wrote:
> On Thursday 03 April 2003 11:31 am, Martin Hicks wrote:
> > This patch should be applied. I finally got around to doing some more
> > testing with it. If a CPU fails to start, we currently get messages
> > like the following for subsequent CPUs:
> >
> >
> > CPU 17: nasid 18, slice 0, cnode 9
> > CPU 17: base freq=200.000MHz, ITC ratio=10/2, ITC freq=1000.000MHz
> > Calibrating delay loop... 1494.72 BogoMIPS
> > phys CPU#17 (0x12) not responding - cannot use it. <<-BOGUS
>
> I don't quite understand how this works. The current code is clearly
> wrong, but if the AP in a 2-CPU system fails to start, won't the new
> code print "phys CPU#0 not responding"? That doesn't seem accurate.
>
> If we really need this printk, it seems like the logical place to put it
> would be in do_boot_cpu(), where we already print the "Processor X/Y
> is stuck" message. 2.5 seems to have just removed the "not responding"
> printk, though, and I'd be inclined to do the same. Any objections?
I have no objections.
mh
--
Wild Open Source Inc. mort@wildopensource.com