* [Bug] [2.6.18-rc5-mm1] system no boot early death x86_64
@ 2006-09-07 18:20 keith mannthey
  2006-09-08  0:28 ` keith mannthey
  0 siblings, 1 reply; 4+ messages in thread
From: keith mannthey @ 2006-09-07 18:20 UTC
To: lkml; +Cc: mel, andrew

Hello,
  I was booting rc4-mm3. With rc5-mm1 I am hanging early... Mel I don't
know if this is related to your code but I will soon know. (I don't get
your debug info in early console.)
  I was working on patches for the reserve based memory hot add path in
srat.c (the initial error is fixed by Mel's patches but there is more to
do) and was just moving to rc5-mm1 to sync up and then more trouble.
This is with reserve based hot-add not enabled at the command line.

Linux version 2.6.18-rc5-mm1-smp (root@elm3a153) (gcc version 4.1.0 (SUSE Linux)) #2 SMP Wed Sep 6 21:04:22 EDT 2006
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 0000000000098400 (usable)
 BIOS-e820: 0000000000098400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ff85e00 (usable)
 BIOS-e820: 000000007ff85e00 - 000000007ff98880 (ACPI data)
 BIOS-e820: 000000007ff98880 - 0000000080000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000470000000 (usable)
 BIOS-e820: 0000001070000000 - 0000001160000000 (usable)
end_pfn_map = 18219008
kernel direct mapping tables up to 1160000000 @ 8000-4f000
DMI 2.3 present.
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 0 -> APIC 2 -> Node 0
SRAT: PXM 0 -> APIC 3 -> Node 0
SRAT: PXM 0 -> APIC 38 -> Node 0
SRAT: PXM 0 -> APIC 39 -> Node 0
SRAT: PXM 0 -> APIC 36 -> Node 0
SRAT: PXM 0 -> APIC 37 -> Node 0
SRAT: PXM 1 -> APIC 64 -> Node 1
SRAT: PXM 1 -> APIC 65 -> Node 1
SRAT: PXM 1 -> APIC 66 -> Node 1
SRAT: PXM 1 -> APIC 67 -> Node 1
SRAT: PXM 1 -> APIC 102 -> Node 1
SRAT: PXM 1 -> APIC 103 -> Node 1
SRAT: PXM 1 -> APIC 100 -> Node 1
SRAT: PXM 1 -> APIC 101 -> Node 1
SRAT: Node 0 PXM 0 0-80000000
SRAT: Node 0 PXM 0 0-470000000
SRAT: Node 1 PXM 1 1070000000-1160000000
Bootmem setup node 0 0000000000000000-0000000470000000
* Re: [Bug] [2.6.18-rc5-mm1] system no boot early death x86_64
  2006-09-07 18:20 [Bug] [2.6.18-rc5-mm1] system no boot early death x86_64 keith mannthey
@ 2006-09-08  0:28 ` keith mannthey
  2006-09-08 10:40   ` Mel Gorman
  0 siblings, 1 reply; 4+ messages in thread
From: keith mannthey @ 2006-09-08  0:28 UTC
To: lkml; +Cc: mel gorman, andrew, Andi Kleen

On Thu, 2006-09-07 at 11:20 -0700, keith mannthey wrote:
> Hello,
>   I was booting rc4-mm3. With rc5-mm1 I am hanging early... Mel I don't
> know if this is related to your code but I will soon know. (I don't get
> your debug info in early console.)
>   I was working on patches for the reserve based memory hot add path in
> srat.c (the initial error is fixed by Mel's patches but there is more to
> do) and was just moving to rc5-mm1 to sync up and then more trouble.
> This is with reserve based hot-add not enabled at the command line.

Well this isn't fully adding up but here is what I found.

If I drop
x86_64-mm-drop-640k-reservation.patch
x86_64-mm-remove-e820-fallback.patch
and
x86_64-mm-remove-e820-fallback-fix.patch

I build and boot. All files in the series up to
x86_64-mm-drop-640k-reservation.patch work just fine. Dropping this patch
makes things better. The e820 patches were removed to make the rest of
the series apply.

It is not clear what changes would cause me to die setting up the
bootmem allocator on my first node...

I know x86_64-mm-drop-640k-reservation.patch has been around for a
while.

Any ideas?

Thanks,
  Keith

(from a working boot)

disabling early console
Linux version 2.6.18-rc5-mm1-smp (root@elm3a153) (gcc version 4.1.0 (SUSE Linux)) #13 SMP Thu Sep 7 19:15:00 EDT 2006
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 0000000000098400 (usable)
 BIOS-e820: 0000000000098400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ff85e00 (usable)
 BIOS-e820: 000000007ff85e00 - 000000007ff98880 (ACPI data)
 BIOS-e820: 000000007ff98880 - 0000000080000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000470000000 (usable)
 BIOS-e820: 0000001070000000 - 0000001160000000 (usable)
Entering add_active_range(0, 0, 152) 0 entries of 3200 used
Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
Entering add_active_range(0, 17235968, 18219008) 3 entries of 3200 used
end_pfn_map = 18219008
DMI 2.3 present.
ACPI: RSDP (v000 IBM ) @ 0x00000000000fdcf0
ACPI: RSDT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @ 0x000000007ff98800
ACPI: FADT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @ 0x000000007ff98780
ACPI: MADT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @ 0x000000007ff98600
ACPI: SRAT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @ 0x000000007ff983c0
ACPI: HPET (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @ 0x000000007ff98380
ACPI: SSDT (v001 IBM VIGSSDT0 0x00001000 INTL 0x20030122) @ 0x000000007ff90780
ACPI: SSDT (v001 IBM VIGSSDT1 0x00001000 INTL 0x20030122) @ 0x000000007ff88bc0
ACPI: DSDT (v001 IBM EXA01ZEU 0x00001000 INTL 0x20030122) @ 0x0000000000000000
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 0 -> APIC 2 -> Node 0
SRAT: PXM 0 -> APIC 3 -> Node 0
SRAT: PXM 0 -> APIC 38 -> Node 0
SRAT: PXM 0 -> APIC 39 -> Node 0
SRAT: PXM 0 -> APIC 36 -> Node 0
SRAT: PXM 0 -> APIC 37 -> Node 0
SRAT: PXM 1 -> APIC 64 -> Node 1
SRAT: PXM 1 -> APIC 65 -> Node 1
SRAT: PXM 1 -> APIC 66 -> Node 1
SRAT: PXM 1 -> APIC 67 -> Node 1
SRAT: PXM 1 -> APIC 102 -> Node 1
SRAT: PXM 1 -> APIC 103 -> Node 1
SRAT: PXM 1 -> APIC 100 -> Node 1
SRAT: PXM 1 -> APIC 101 -> Node 1
SRAT: Node 0 PXM 0 0-80000000
Entering add_active_range(0, 0, 152) 0 entries of 3200 used
Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
SRAT: Node 0 PXM 0 0-470000000
Entering add_active_range(0, 0, 152) 2 entries of 3200 used
Entering add_active_range(0, 256, 524165) 2 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
SRAT: Node 1 PXM 1 1070000000-1160000000
Entering add_active_range(1, 17235968, 18219008) 3 entries of 3200 used
NUMA: Using 36 for the hash shift.
Bootmem setup node 0 0000000000000000-0000000470000000
Bootmem setup node 1 0000001070000000-0000001160000000
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 -> 18219008
early_node_map[4] active PFN ranges
    0:        0 ->      152
    0:      256 ->   524165
    0:  1048576 ->  4653056
    1: 17235968 -> 18219008
On node 0 totalpages: 4128541
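[Editorial note] The add_active_range() lines in the working boot log can be cross-checked against the BIOS-e820 map. The following is only a sketch of the arithmetic, not the kernel's code: it assumes 4 KiB pages (PAGE_SHIFT = 12) and that a usable region's start is rounded up and its end rounded down to a page boundary. The names E820_USABLE and to_pfn_range are made up for this illustration.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, as is usual on x86_64

# Usable BIOS-e820 entries copied from the boot log: (start, end) in bytes.
E820_USABLE = [
    (0x0000000000000000, 0x0000000000098400),
    (0x0000000000100000, 0x000000007ff85e00),
    (0x0000000100000000, 0x0000000470000000),
    (0x0000001070000000, 0x0000001160000000),
]


def to_pfn_range(start, end):
    """Hypothetical helper: round start up and end down to whole pages."""
    page_size = 1 << PAGE_SHIFT
    start_pfn = (start + page_size - 1) >> PAGE_SHIFT
    end_pfn = end >> PAGE_SHIFT
    return start_pfn, end_pfn


pfn_ranges = [to_pfn_range(start, end) for start, end in E820_USABLE]
for start_pfn, end_pfn in pfn_ranges:
    print(f"PFN range {start_pfn} -> {end_pfn}")

# The SRAT lines place 0-470000000 on node 0, so the first three
# ranges belong to node 0 and the last one to node 1.
node0_pages = sum(end - start for start, end in pfn_ranges[:3])
print("node 0 pages:", node0_pages)
```

Under those assumptions the four ranges come out as 0 -> 152, 256 -> 524165, 1048576 -> 4653056 and 17235968 -> 18219008, matching the early_node_map entries, and the node 0 sum matches the logged "On node 0 totalpages: 4128541". The last end PFN also equals end_pfn_map.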
* Re: [Bug] [2.6.18-rc5-mm1] system no boot early death x86_64
  2006-09-08  0:28 ` keith mannthey
@ 2006-09-08 10:40   ` Mel Gorman
  2006-09-09 13:41     ` Andi Kleen
  0 siblings, 1 reply; 4+ messages in thread
From: Mel Gorman @ 2006-09-08 10:40 UTC
To: keith mannthey; +Cc: lkml, andrew, Andi Kleen

On Thu, 7 Sep 2006, keith mannthey wrote:

> On Thu, 2006-09-07 at 11:20 -0700, keith mannthey wrote:
>> Hello,
>>   I was booting rc4-mm3. With rc5-mm1 I am hanging early... Mel I don't
>> know if this is related to your code but I will soon know. (I don't get
>> your debug info in early console.)
>>   I was working on patches for the reserve based memory hot add path in
>> srat.c (the initial error is fixed by Mel's patches but there is more to
>> do)

That is some good news at least.

>> and was just moving to rc5-mm1 to sync up and then more trouble.
>> This is with reserve based hot-add not enabled at the command line.
>
>
> Well this isn't fully adding up but here is what I found.
>
> If I drop
> x86_64-mm-drop-640k-reservation.patch
> x86_64-mm-remove-e820-fallback.patch
> and
> x86_64-mm-remove-e820-fallback-fix.patch
>
> I build and boot. All files in the series up to
> x86_64-mm-drop-640k-reservation.patch work just fine. Dropping this patch
> makes things better. The e820 patches were removed to make the rest of
> the series apply.
>

I am having trouble reproducing this. However, I recently got access to a
machine similar to yours. I can say that sometimes the stability of
2.6.18-rc4-mm3 and 2.6.18-rc5-mm1 was totally useless (but the symptoms
different to yours) and the box would easily crash for reasons I could not
pin down. As stability problems had been reported on the machine earlier
by other users, I was inclined to blame the hardware. Now I'm not sure.

> It is not clear what changes would cause me to die setting up the
> bootmem allocator on my first node...
>

Unless your machine really has something special in the low 640K that is
required and bad things happen if it's written to at a bad time.

> I know x86_64-mm-drop-640k-reservation.patch has been around for a
> while.
>
> any ideas?
>

None so far, I'll keep hitting the machine I have to see if I can find
something more useful but I'm not very optimistic I'll pin it down.

> Thanks,
>  Keith
>
> (from a working boot)
>
> disabling early console
> Linux version 2.6.18-rc5-mm1-smp (root@elm3a153) (gcc version 4.1.0
> (SUSE Linux)) #13 SMP Thu Sep 7 19:15:00 EDT 2006
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 0000000000098400 (usable)
> BIOS-e820: 0000000000098400 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 000000007ff85e00 (usable)
> BIOS-e820: 000000007ff85e00 - 000000007ff98880 (ACPI data)
> BIOS-e820: 000000007ff98880 - 0000000080000000 (reserved)
> BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> BIOS-e820: 0000000100000000 - 0000000470000000 (usable)
> BIOS-e820: 0000001070000000 - 0000001160000000 (usable)
> Entering add_active_range(0, 0, 152) 0 entries of 3200 used
> Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
> Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
> Entering add_active_range(0, 17235968, 18219008) 3 entries of 3200 used
> end_pfn_map = 18219008
> DMI 2.3 present.
> ACPI: RSDP (v000 IBM ) @
> 0x00000000000fdcf0
> ACPI: RSDT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
> 0x000000007ff98800
> ACPI: FADT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
> 0x000000007ff98780
> ACPI: MADT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
> 0x000000007ff98600
> ACPI: SRAT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
> 0x000000007ff983c0
> ACPI: HPET (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
> 0x000000007ff98380
> ACPI: SSDT (v001 IBM VIGSSDT0 0x00001000 INTL 0x20030122) @
> 0x000000007ff90780
> ACPI: SSDT (v001 IBM VIGSSDT1 0x00001000 INTL 0x20030122) @
> 0x000000007ff88bc0
> ACPI: DSDT (v001 IBM EXA01ZEU 0x00001000 INTL 0x20030122) @
> 0x0000000000000000
> SRAT: PXM 0 -> APIC 0 -> Node 0
> SRAT: PXM 0 -> APIC 1 -> Node 0
> SRAT: PXM 0 -> APIC 2 -> Node 0
> SRAT: PXM 0 -> APIC 3 -> Node 0
> SRAT: PXM 0 -> APIC 38 -> Node 0
> SRAT: PXM 0 -> APIC 39 -> Node 0
> SRAT: PXM 0 -> APIC 36 -> Node 0
> SRAT: PXM 0 -> APIC 37 -> Node 0
> SRAT: PXM 1 -> APIC 64 -> Node 1
> SRAT: PXM 1 -> APIC 65 -> Node 1
> SRAT: PXM 1 -> APIC 66 -> Node 1
> SRAT: PXM 1 -> APIC 67 -> Node 1
> SRAT: PXM 1 -> APIC 102 -> Node 1
> SRAT: PXM 1 -> APIC 103 -> Node 1
> SRAT: PXM 1 -> APIC 100 -> Node 1
> SRAT: PXM 1 -> APIC 101 -> Node 1
> SRAT: Node 0 PXM 0 0-80000000
> Entering add_active_range(0, 0, 152) 0 entries of 3200 used
> Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
> SRAT: Node 0 PXM 0 0-470000000
> Entering add_active_range(0, 0, 152) 2 entries of 3200 used
> Entering add_active_range(0, 256, 524165) 2 entries of 3200 used
> Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
> SRAT: Node 1 PXM 1 1070000000-1160000000
> Entering add_active_range(1, 17235968, 18219008) 3 entries of 3200 used
> NUMA: Using 36 for the hash shift.
> Bootmem setup node 0 0000000000000000-0000000470000000
> Bootmem setup node 1 0000001070000000-0000001160000000
> Zone PFN ranges:
> DMA 0 -> 4096
> DMA32 4096 -> 1048576
> Normal 1048576 -> 18219008
> early_node_map[4] active PFN ranges
> 0: 0 -> 152
> 0: 256 -> 524165
> 0: 1048576 -> 4653056
> 1: 17235968 -> 18219008
> On node 0 totalpages: 4128541
>
>
>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: [Bug] [2.6.18-rc5-mm1] system no boot early death x86_64
  2006-09-08 10:40   ` Mel Gorman
@ 2006-09-09 13:41     ` Andi Kleen
  0 siblings, 0 replies; 4+ messages in thread
From: Andi Kleen @ 2006-09-09 13:41 UTC
To: Mel Gorman; +Cc: keith mannthey, lkml, andrew

On Friday 08 September 2006 12:40, Mel Gorman wrote:
> On Thu, 7 Sep 2006, keith mannthey wrote:
> > On Thu, 2006-09-07 at 11:20 -0700, keith mannthey wrote:
> >> Hello,
> >>   I was booting rc4-mm3. With rc5-mm1 I am hanging early... Mel I don't
> >> know if this is related to your code but I will soon know. (I don't get
> >> your debug info in early console.)
> >>   I was working on patches for the reserve based memory hot add path in
> >> srat.c (the initial error is fixed by Mel's patches but there is more to
> >> do)
>
> That is some good news at least.
>
> > > and was just moving to rc5-mm1 to sync up and then more trouble.
> >
> >> This is with reserve based hot-add not enabled at the command line.
> >
> > Well this isn't fully adding up but here is what I found.
> >
> > If I drop
> > x86_64-mm-drop-640k-reservation.patch
> > x86_64-mm-remove-e820-fallback.patch
> > and
> > x86_64-mm-remove-e820-fallback-fix.patch
> >
> > I build and boot. All files in the series up to
> > x86_64-mm-drop-640k-reservation.patch work just fine. Dropping this patch
> > makes things better. The e820 patches were removed to make the rest of
> > the series apply.
>
> I am having trouble reproducing this. However, I recently got access to a
> machine similar to yours. I can say that sometimes the stability of
> 2.6.18-rc4-mm3 and 2.6.18-rc5-mm1 was totally useless (but the symptoms
> different to yours) and the box would easily crash for reasons I could not
> pin down. As stability problems had been reported on the machine earlier
> by other users, I was inclined to blame the hardware. Now I'm not sure.
>
> > It is not clear what changes would cause me to die setting up the
> > bootmem allocator on my first node...
>
> Unless your machine really has something special in the low 640K that is
> required and bad things happen if it's written to at a bad time.

That would be a BIOS bug then. If anything is there it has to be reserved.

But maybe it just breaks something that only worked by accident before.

-Andi
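[Editorial note] The point about the low 640K can be checked against the BIOS-e820 map Keith posted. The sketch below only tallies what the firmware itself reports below 0xA0000; it says nothing about what the BIOS actually touches at run time, which is the open question in the thread. The names E820_LOW and classify_low_memory are hypothetical, and the helper assumes the entries do not overlap.

```python
LOW_640K = 0xA0000  # 640 KiB boundary

# BIOS-e820 entries at or below 1 MiB, copied from the boot log.
E820_LOW = [
    (0x00000, 0x98400, "usable"),
    (0x98400, 0xA0000, "reserved"),
    (0xE0000, 0x100000, "reserved"),
]


def classify_low_memory(entries, limit=LOW_640K):
    """Hypothetical helper: return (usable, reserved, unlisted) byte
    counts below `limit`, assuming the entries do not overlap."""
    usable = reserved = 0
    for start, end, kind in entries:
        lo, hi = min(start, limit), min(end, limit)
        if hi <= lo:
            continue  # entry lies entirely at or above the limit
        if kind == "usable":
            usable += hi - lo
        else:
            reserved += hi - lo
    unlisted = limit - usable - reserved
    return usable, reserved, unlisted


usable, reserved, unlisted = classify_low_memory(E820_LOW)
print(f"usable:   {usable // 1024} KiB")
print(f"reserved: {reserved // 1024} KiB")
print(f"unlisted: {unlisted // 1024} KiB")
```

For this map the tally is 609 KiB usable and 31 KiB reserved (0x98400-0xa0000), with nothing unlisted. So the BIOS does reserve the top of the low 640K, and anything it needs in the remaining 609 KiB would, as Andi says, have to be marked reserved as well.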
end of thread, other threads:[~2006-09-09 14:46 UTC | newest]

Thread overview: 4+ messages
2006-09-07 18:20 [Bug] [2.6.18-rc5-mm1] system no boot early death x86_64 keith mannthey
2006-09-08  0:28 ` keith mannthey
2006-09-08 10:40   ` Mel Gorman
2006-09-09 13:41     ` Andi Kleen