* [BUG] safe_smp_process_id() uses apicid which exceeds NR_CPUs in array
@ 2006-06-12 22:38 Doug Thompson
2006-06-13 4:03 ` Andi Kleen
0 siblings, 1 reply; 2+ messages in thread
From: Doug Thompson @ 2006-06-12 22:38 UTC
To: Andi Kleen, linux-kernel
With a 2.6.15 kernel running on a Tyan S4881 quad-processor board (with factory BIOS)
using Opteron 254s, I received the following MCEs:
CPU 18: Machine Check Exception: 4 Bank 0: b601a00000000833
TSC 9c3799943459 ADDR 4eee07800
CPU 18: Machine Check Exception: 4 Bank 2: d000400000000863
TSC 9c3799943d01
CPU 18: Machine Check Exception: 4 Bank 4: d42dc00100000813
TSC 9c379994422d ADDR 4eee05708
The cause was later determined to be a bad memory stick, but the odd part was
the 'CPU 18' in the messages. Running the same hardware with 2.6.17-rc6
produced MCEs with 'CPU 2' in the output instead. I thought the problem was
fixed, BUT.....
Looking at safe_smp_processor_id() in arch/x86_64/kernel/smp.c of 2.6.17-rc6
(this function is called by the MCE handler code):
int safe_smp_processor_id(void)
{
	int apicid, i;
	if (disable_apic)
		return 0;
	apicid = hard_smp_processor_id();
----->	if (x86_cpu_to_apicid[apicid] == apicid)
		return apicid;
	for (i = 0; i < NR_CPUS; ++i) {
		if (x86_cpu_to_apicid[i] == apicid)
			return i;
	}
	/* No entries in x86_cpu_to_apicid? Either no MPS|ACPI,
	 * or called too early. Either way, we must be CPU 0. */
	if (x86_cpu_to_apicid[0] == BAD_APICID)
		return 0;
	return 0; /* Should not happen */
}
I noticed the check marked above:
	if (x86_cpu_to_apicid[apicid] == apicid)
NR_CPUS was 4 and apicid could be 16, 17, 18, or 19, so indexing the array
with apicid is definitely an out-of-bounds reference.
doug thompson
portion of boot.mesg follows:
SRAT: PXM 0 -> APIC 16 -> Node 0
SRAT: PXM 1 -> APIC 17 -> Node 1
SRAT: PXM 2 -> APIC 18 -> Node 2
SRAT: PXM 3 -> APIC 19 -> Node 3
SRAT: Node 0 PXM 0 0-a0000
SRAT: Node 0 PXM 0 0-d0000000
SRAT: Node 0 PXM 0 0-230000000
SRAT: Node 1 PXM 1 230000000-430000000
SRAT: Node 2 PXM 2 430000000-630000000
SRAT: Node 3 PXM 3 630000000-830000000
NUMA: Using 28 for the hash shift.
Bootmem setup node 0 0000000000000000-0000000230000000
Bootmem setup node 1 0000000230000000-0000000430000000
Bootmem setup node 2 0000000430000000-0000000630000000
Bootmem setup node 3 0000000630000000-0000000830000000
On node 0 totalpages: 2063996
DMA zone: 2596 pages, LIFO batch:0
DMA32 zone: 833240 pages, LIFO batch:31
Normal zone: 1228160 pages, LIFO batch:31
On node 1 totalpages: 2068480
Normal zone: 2068480 pages, LIFO batch:31
On node 2 totalpages: 2068480
Normal zone: 2068480 pages, LIFO batch:31
On node 3 totalpages: 2068480
Normal zone: 2068480 pages, LIFO batch:31
Nvidia board detected. Ignoring ACPI timer override.
ACPI: PM-Timer IO Port: 0x8008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x10] enabled)
Processor #16 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x11] enabled)
Processor #17 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x12] enabled)
Processor #18 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x13] enabled)
Processor #19 15:5 APIC version 16
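The mismatch between the table size and the APIC IDs in the boot log above can be sketched in user space. This is only an illustration, not kernel code: `cpu_to_apicid_model`, `apicid_index_out_of_bounds()` and `count_unindexable_apicids()` are hypothetical names, and the table contents are taken from the four LAPIC lines above with NR_CPUS assumed to be 4 as reported.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4	/* value reported in the message above */

/* Hypothetical model of the logical CPU -> APIC ID table,
 * filled in from the LAPIC lines of the boot log. */
static const unsigned char cpu_to_apicid_model[NR_CPUS] = { 16, 17, 18, 19 };

/* True if using apicid directly as an index would read past the table. */
static bool apicid_index_out_of_bounds(unsigned apicid)
{
	return apicid >= NR_CPUS;
}

/* Count the APIC IDs in the table that cannot safely be used as an index. */
static size_t count_unindexable_apicids(void)
{
	size_t n = 0;
	size_t i;

	for (i = 0; i < NR_CPUS; ++i)
		if (apicid_index_out_of_bounds(cpu_to_apicid_model[i]))
			++n;
	return n;
}
```

On this board every APIC ID (16 through 19) is at or above NR_CPUS, so under these assumptions the fast check would index past the end of the array on all four CPUs.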
* Re: [BUG] safe_smp_process_id() uses apicid which exceeds NR_CPUs in array
2006-06-12 22:38 [BUG] safe_smp_process_id() uses apicid which exceeds NR_CPUs in array Doug Thompson
@ 2006-06-13 4:03 ` Andi Kleen
0 siblings, 0 replies; 2+ messages in thread
From: Andi Kleen @ 2006-06-13 4:03 UTC
To: Doug Thompson; +Cc: linux-kernel
>
> I noticed the: if (x86_cpu_to_apicid[apicid] == apicid)
> above.
You're right - the fast check should either check for >= NR_CPUS
or just be removed and let it be done by the loop. I came up
with this patch.
Thanks.
-Andi
Fix fast check in safe_smp_processor_id
The APIC ID returned by hard_smp_processor_id can be beyond
NR_CPUS and then overflow the x86_cpu_to_apicid[] array.
Add a check for overflow. If it happens then the slow loop below
will catch it.
Bug pointed out by Doug Thompson
Signed-off-by: Andi Kleen <ak@suse.de>
Index: linux/arch/x86_64/kernel/smp.c
===================================================================
--- linux.orig/arch/x86_64/kernel/smp.c
+++ linux/arch/x86_64/kernel/smp.c
@@ -520,13 +520,13 @@ asmlinkage void smp_call_function_interr
 int safe_smp_processor_id(void)
 {
-	int apicid, i;
+	unsigned apicid, i;
 	if (disable_apic)
 		return 0;
 	apicid = hard_smp_processor_id();
-	if (x86_cpu_to_apicid[apicid] == apicid)
+	if (apicid < NR_CPUS && x86_cpu_to_apicid[apicid] == apicid)
 		return apicid;
 	for (i = 0; i < NR_CPUS; ++i) {
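For anyone who wants to try the bounded lookup outside the kernel, here is a rough user-space sketch of it. It is not the kernel function: `model_safe_smp_processor_id()` and `model_cpu_to_apicid` are hypothetical names, the APIC ID is passed in as a parameter because hard_smp_processor_id() is not available in user space, and the table is assumed to hold the four APIC IDs from Doug's boot log.

```c
#define NR_CPUS 4	/* assumption: matches the report */

/* Hypothetical stand-in for x86_cpu_to_apicid[], logical CPU -> APIC ID. */
static const unsigned char model_cpu_to_apicid[NR_CPUS] = { 16, 17, 18, 19 };

/* Map an APIC ID to a logical CPU number without indexing out of bounds. */
static int model_safe_smp_processor_id(unsigned apicid)
{
	unsigned i;

	/* Fast path: only legal when apicid is itself a valid index. */
	if (apicid < NR_CPUS && model_cpu_to_apicid[apicid] == apicid)
		return (int)apicid;

	/* Slow path: search for the logical CPU that owns this APIC ID. */
	for (i = 0; i < NR_CPUS; ++i)
		if (model_cpu_to_apicid[i] == apicid)
			return (int)i;

	/* No match: fall back to CPU 0, as the original function does. */
	return 0;
}
```

With this table, APIC ID 18 skips the fast path and is resolved by the loop to logical CPU 2, and an unknown ID falls back to CPU 0.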