public inbox for linux-kernel@vger.kernel.org
* [Bug] [2.6.18-rc5-mm1] system no boot early death  x86_64
@ 2006-09-07 18:20 keith mannthey
  2006-09-08  0:28 ` keith mannthey
  0 siblings, 1 reply; 4+ messages in thread
From: keith mannthey @ 2006-09-07 18:20 UTC
  To: lkml; +Cc: mel, andrew

Hello,
  I was booting rc4-mm3.  With rc5-mm1 I hang early in boot... Mel, I don't
know if this is related to your code, but I will soon know. (I don't get
your debug info on the early console.)
  I was working on patches for the reserve-based memory hot-add path in
srat.c (the initial error is fixed by Mel's patches, but there is more to
do) and was just moving to rc5-mm1 to sync up when I hit more trouble.
This is with reserve-based hot-add not enabled on the command line.


Linux version 2.6.18-rc5-mm1-smp (root@elm3a153) (gcc version 4.1.0
(SUSE Linux)) #2 SMP Wed Sep 6 21:04:22 EDT 2006
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 0000000000098400 (usable)
 BIOS-e820: 0000000000098400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ff85e00 (usable)
 BIOS-e820: 000000007ff85e00 - 000000007ff98880 (ACPI data)
 BIOS-e820: 000000007ff98880 - 0000000080000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000470000000 (usable)
 BIOS-e820: 0000001070000000 - 0000001160000000 (usable)
end_pfn_map = 18219008
kernel direct mapping tables up to 1160000000 @ 8000-4f000
DMI 2.3 present.
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 0 -> APIC 2 -> Node 0
SRAT: PXM 0 -> APIC 3 -> Node 0
SRAT: PXM 0 -> APIC 38 -> Node 0
SRAT: PXM 0 -> APIC 39 -> Node 0
SRAT: PXM 0 -> APIC 36 -> Node 0
SRAT: PXM 0 -> APIC 37 -> Node 0
SRAT: PXM 1 -> APIC 64 -> Node 1
SRAT: PXM 1 -> APIC 65 -> Node 1
SRAT: PXM 1 -> APIC 66 -> Node 1
SRAT: PXM 1 -> APIC 67 -> Node 1
SRAT: PXM 1 -> APIC 102 -> Node 1
SRAT: PXM 1 -> APIC 103 -> Node 1
SRAT: PXM 1 -> APIC 100 -> Node 1
SRAT: PXM 1 -> APIC 101 -> Node 1
SRAT: Node 0 PXM 0 0-80000000
SRAT: Node 0 PXM 0 0-470000000
SRAT: Node 1 PXM 1 1070000000-1160000000
Bootmem setup node 0 0000000000000000-0000000470000000





* Re: [Bug] [2.6.18-rc5-mm1] system no boot early death  x86_64
  2006-09-07 18:20 [Bug] [2.6.18-rc5-mm1] system no boot early death x86_64 keith mannthey
@ 2006-09-08  0:28 ` keith mannthey
  2006-09-08 10:40   ` Mel Gorman
  0 siblings, 1 reply; 4+ messages in thread
From: keith mannthey @ 2006-09-08  0:28 UTC
  To: lkml; +Cc: mel gorman, andrew, Andi Kleen

On Thu, 2006-09-07 at 11:20 -0700, keith mannthey wrote:
> Hello,
>   I was booting rc4-mm3.  With rc5-mm1 I am hanging early... Mel I don't
> know if this is related to your code but I will soon know. (I don't get
> your debug info in early console.)  
>   I was working on patches for the reserve based memory hot add path in
> srat.c (the initial error is fixed by Mels patches but there is more to
> do) and was just moving to rc5-mm1 to sync up and then more trouble.
> This is with reserve based hot-add not enabled at the command line. 


Well this isn't fully adding up but here is what I found. 

If I drop 
x86_64-mm-drop-640k-reservation.patch
x86_64-mm-remove-e820-fallback.patch
and 
x86_64-mm-remove-e820-fallback-fix.patch

I build and boot.  All patches in the series up to
x86_64-mm-drop-640k-reservation.patch work just fine, and dropping that
patch is what makes the boot work.  The two e820 patches were dropped
only so that the rest of the series still applies.

It is not clear what changes would cause me to die setting up the
bootmem allocator on my first node... 

I know x86_64-mm-drop-640k-reservation.patch has been around for a
while.  

Any ideas?

Thanks,
  Keith 

(from a working boot)

disabling early console
Linux version 2.6.18-rc5-mm1-smp (root@elm3a153) (gcc version 4.1.0
(SUSE Linux)) #13 SMP Thu Sep 7 19:15:00 EDT 2006
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 0000000000098400 (usable)
 BIOS-e820: 0000000000098400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ff85e00 (usable)
 BIOS-e820: 000000007ff85e00 - 000000007ff98880 (ACPI data)
 BIOS-e820: 000000007ff98880 - 0000000080000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000470000000 (usable)
 BIOS-e820: 0000001070000000 - 0000001160000000 (usable)
Entering add_active_range(0, 0, 152) 0 entries of 3200 used
Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
Entering add_active_range(0, 17235968, 18219008) 3 entries of 3200 used
end_pfn_map = 18219008
DMI 2.3 present.
ACPI: RSDP (v000 IBM                                   ) @
0x00000000000fdcf0
ACPI: RSDT (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @
0x000000007ff98800
ACPI: FADT (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @
0x000000007ff98780
ACPI: MADT (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @
0x000000007ff98600
ACPI: SRAT (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @
0x000000007ff983c0
ACPI: HPET (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @
0x000000007ff98380
ACPI: SSDT (v001 IBM    VIGSSDT0 0x00001000 INTL 0x20030122) @
0x000000007ff90780
ACPI: SSDT (v001 IBM    VIGSSDT1 0x00001000 INTL 0x20030122) @
0x000000007ff88bc0
ACPI: DSDT (v001 IBM    EXA01ZEU 0x00001000 INTL 0x20030122) @
0x0000000000000000
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 0 -> APIC 2 -> Node 0
SRAT: PXM 0 -> APIC 3 -> Node 0
SRAT: PXM 0 -> APIC 38 -> Node 0
SRAT: PXM 0 -> APIC 39 -> Node 0
SRAT: PXM 0 -> APIC 36 -> Node 0
SRAT: PXM 0 -> APIC 37 -> Node 0
SRAT: PXM 1 -> APIC 64 -> Node 1
SRAT: PXM 1 -> APIC 65 -> Node 1
SRAT: PXM 1 -> APIC 66 -> Node 1
SRAT: PXM 1 -> APIC 67 -> Node 1
SRAT: PXM 1 -> APIC 102 -> Node 1
SRAT: PXM 1 -> APIC 103 -> Node 1
SRAT: PXM 1 -> APIC 100 -> Node 1
SRAT: PXM 1 -> APIC 101 -> Node 1
SRAT: Node 0 PXM 0 0-80000000
Entering add_active_range(0, 0, 152) 0 entries of 3200 used
Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
SRAT: Node 0 PXM 0 0-470000000
Entering add_active_range(0, 0, 152) 2 entries of 3200 used
Entering add_active_range(0, 256, 524165) 2 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
SRAT: Node 1 PXM 1 1070000000-1160000000
Entering add_active_range(1, 17235968, 18219008) 3 entries of 3200 used
NUMA: Using 36 for the hash shift.
Bootmem setup node 0 0000000000000000-0000000470000000
Bootmem setup node 1 0000001070000000-0000001160000000
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 -> 18219008
early_node_map[4] active PFN ranges
    0:        0 ->      152
    0:      256 ->   524165
    0:  1048576 ->  4653056
    1: 17235968 -> 18219008
On node 0 totalpages: 4128541
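
For anyone reading the log above, the add_active_range() arguments can be
recomputed from the BIOS-e820 lines. The sketch below is not kernel code; it
assumes 4 KiB pages (a shift of 12), and the names usable_pfn_ranges and
pages_on_node are made up for illustration. Each usable range has its start
rounded up and its end rounded down to a page frame number, and the pages
that fall inside node 0's SRAT span are then summed.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, as on x86_64

# (start, end, type) copied from the BIOS-e820 lines in the log above
E820 = [
    (0x0000000000000000, 0x0000000000098400, "usable"),
    (0x0000000000098400, 0x00000000000a0000, "reserved"),
    (0x00000000000e0000, 0x0000000000100000, "reserved"),
    (0x0000000000100000, 0x000000007ff85e00, "usable"),
    (0x000000007ff85e00, 0x000000007ff98880, "ACPI data"),
    (0x000000007ff98880, 0x0000000080000000, "reserved"),
    (0x00000000fec00000, 0x0000000100000000, "reserved"),
    (0x0000000100000000, 0x0000000470000000, "usable"),
    (0x0000001070000000, 0x0000001160000000, "usable"),
]

# Node spans taken from the "Bootmem setup node" lines
NODE_SPANS = {
    0: (0x0000000000000000, 0x0000000470000000),
    1: (0x0000001070000000, 0x0000001160000000),
}


def usable_pfn_ranges(e820):
    """Return (start_pfn, end_pfn) for every usable e820 entry."""
    ranges = []
    for start, end, kind in e820:
        if kind != "usable":
            continue
        start_pfn = (start + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT  # round up
        end_pfn = end >> PAGE_SHIFT                                # round down
        if start_pfn < end_pfn:
            ranges.append((start_pfn, end_pfn))
    return ranges


def pages_on_node(node, e820=E820, spans=NODE_SPANS):
    """Count usable pages whose PFNs fall inside the node's span."""
    lo = spans[node][0] >> PAGE_SHIFT
    hi = spans[node][1] >> PAGE_SHIFT
    total = 0
    for start_pfn, end_pfn in usable_pfn_ranges(e820):
        total += max(0, min(end_pfn, hi) - max(start_pfn, lo))
    return total


if __name__ == "__main__":
    for start_pfn, end_pfn in usable_pfn_ranges(E820):
        print(f"{start_pfn:>9} -> {end_pfn:>9}")
    print("node 0 pages:", pages_on_node(0))
```

The four ranges it prints are the same ones listed under early_node_map[4],
and the node 0 sum comes out to 4128541, the value on the "On node 0
totalpages" line.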





* Re: [Bug] [2.6.18-rc5-mm1] system no boot early death  x86_64
  2006-09-08  0:28 ` keith mannthey
@ 2006-09-08 10:40   ` Mel Gorman
  2006-09-09 13:41     ` Andi Kleen
  0 siblings, 1 reply; 4+ messages in thread
From: Mel Gorman @ 2006-09-08 10:40 UTC
  To: keith mannthey; +Cc: lkml, andrew, Andi Kleen

On Thu, 7 Sep 2006, keith mannthey wrote:

> On Thu, 2006-09-07 at 11:20 -0700, keith mannthey wrote:
>> Hello,
>>   I was booting rc4-mm3.  With rc5-mm1 I am hanging early... Mel I don't
>> know if this is related to your code but I will soon know. (I don't get
>> your debug info in early console.)
>>   I was working on patches for the reserve based memory hot add path in
>> srat.c (the initial error is fixed by Mels patches but there is more to
>> do)

That is some good news at least.

> and was just moving to rc5-mm1 to sync up and then more trouble.
>> This is with reserve based hot-add not enabled at the command line.
>
>
> Well this isn't fully adding up but here is what I found.
>
> If I drop
> x86_64-mm-drop-640k-reservation.patch
> x86_64-mm-remove-e820-fallback.patch
> and
> x86_64-mm-remove-e820-fallback-fix.patch
>
> I build and boot.  All files in the series upto x86_64-mm-drop-640k-
> reservation.patch work just fine.  Dropping this patch makes things
> better. The e820 patches were removed to make the rest of the series
> apply.
>

I am having trouble reproducing this. However, I recently got access to a 
machine similar to yours. On it, 2.6.18-rc4-mm3 and 2.6.18-rc5-mm1 were at 
times too unstable to be useful (though the symptoms were different from 
yours), and the box would easily crash for reasons I could not pin down. 
As stability problems had been reported on the machine earlier by other 
users, I was inclined to blame the hardware. Now I'm not sure.

> It is not clear what changes would cause me to die setting up the
> bootmem allocator on my first node...
>

Unless your machine really has something special in the low 640K that is 
required and bad things happen if it's written to at a bad time.

> I know x86_64-mm-drop-640k-reservation.patch has been around for a
> while.
>
> any ideas?
>

None so far. I'll keep hitting the machine I have to see if I can find 
something more useful, but I'm not very optimistic I'll pin it down.

> Thanks,
>  Keith
>
> (from a working boot)
>
> disabling early console
> Linux version 2.6.18-rc5-mm1-smp (root@elm3a153) (gcc version 4.1.0
> (SUSE Linux)) #13 SMP Thu Sep 7 19:15:00 EDT 2006
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 0000000000098400 (usable)
> BIOS-e820: 0000000000098400 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 000000007ff85e00 (usable)
> BIOS-e820: 000000007ff85e00 - 000000007ff98880 (ACPI data)
> BIOS-e820: 000000007ff98880 - 0000000080000000 (reserved)
> BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> BIOS-e820: 0000000100000000 - 0000000470000000 (usable)
> BIOS-e820: 0000001070000000 - 0000001160000000 (usable)
> Entering add_active_range(0, 0, 152) 0 entries of 3200 used
> Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
> Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
> Entering add_active_range(0, 17235968, 18219008) 3 entries of 3200 used
> end_pfn_map = 18219008
> DMI 2.3 present.
> ACPI: RSDP (v000 IBM                                   ) @
> 0x00000000000fdcf0
> ACPI: RSDT (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @
> 0x000000007ff98800
> ACPI: FADT (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @
> 0x000000007ff98780
> ACPI: MADT (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @
> 0x000000007ff98600
> ACPI: SRAT (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @
> 0x000000007ff983c0
> ACPI: HPET (v001 IBM    EXA01ZEU 0x00001000 IBM  0x45444f43) @
> 0x000000007ff98380
> ACPI: SSDT (v001 IBM    VIGSSDT0 0x00001000 INTL 0x20030122) @
> 0x000000007ff90780
> ACPI: SSDT (v001 IBM    VIGSSDT1 0x00001000 INTL 0x20030122) @
> 0x000000007ff88bc0
> ACPI: DSDT (v001 IBM    EXA01ZEU 0x00001000 INTL 0x20030122) @
> 0x0000000000000000
> SRAT: PXM 0 -> APIC 0 -> Node 0
> SRAT: PXM 0 -> APIC 1 -> Node 0
> SRAT: PXM 0 -> APIC 2 -> Node 0
> SRAT: PXM 0 -> APIC 3 -> Node 0
> SRAT: PXM 0 -> APIC 38 -> Node 0
> SRAT: PXM 0 -> APIC 39 -> Node 0
> SRAT: PXM 0 -> APIC 36 -> Node 0
> SRAT: PXM 0 -> APIC 37 -> Node 0
> SRAT: PXM 1 -> APIC 64 -> Node 1
> SRAT: PXM 1 -> APIC 65 -> Node 1
> SRAT: PXM 1 -> APIC 66 -> Node 1
> SRAT: PXM 1 -> APIC 67 -> Node 1
> SRAT: PXM 1 -> APIC 102 -> Node 1
> SRAT: PXM 1 -> APIC 103 -> Node 1
> SRAT: PXM 1 -> APIC 100 -> Node 1
> SRAT: PXM 1 -> APIC 101 -> Node 1
> SRAT: Node 0 PXM 0 0-80000000
> Entering add_active_range(0, 0, 152) 0 entries of 3200 used
> Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
> SRAT: Node 0 PXM 0 0-470000000
> Entering add_active_range(0, 0, 152) 2 entries of 3200 used
> Entering add_active_range(0, 256, 524165) 2 entries of 3200 used
> Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
> SRAT: Node 1 PXM 1 1070000000-1160000000
> Entering add_active_range(1, 17235968, 18219008) 3 entries of 3200 used
> NUMA: Using 36 for the hash shift.
> Bootmem setup node 0 0000000000000000-0000000470000000
> Bootmem setup node 1 0000001070000000-0000001160000000
> Zone PFN ranges:
>  DMA             0 ->     4096
>  DMA32        4096 ->  1048576
>  Normal    1048576 -> 18219008
> early_node_map[4] active PFN ranges
>    0:        0 ->      152
>    0:      256 ->   524165
>    0:  1048576 ->  4653056
>    1: 17235968 -> 18219008
> On node 0 totalpages: 4128541
>
>
>
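
As a side note on the "NUMA: Using 36 for the hash shift" line in the quoted
log: the value is consistent with picking the largest shift at which no
2^shift-sized block of physical address space is shared between two nodes.
The sketch below is a simplified illustration under that assumption, not the
kernel's implementation; the starting shift of 20 and the names
buckets_conflict and largest_hash_shift are assumptions made for the example.

```python
# Node spans taken from the "Bootmem setup node" lines in the quoted log
NODES = {
    0: (0x0000000000000000, 0x0000000470000000),
    1: (0x0000001070000000, 0x0000001160000000),
}


def buckets_conflict(nodes, shift):
    """True if two nodes would land in the same (address >> shift) bucket."""
    spans = sorted(
        (start >> shift, (end - 1) >> shift) for start, end in nodes.values()
    )
    # After sorting, a conflict shows up as a span that begins at or before
    # the bucket where the previous span ended.
    return any(prev[1] >= cur[0] for prev, cur in zip(spans, spans[1:]))


def largest_hash_shift(nodes, min_shift=20, max_shift=63):
    """Grow the shift from min_shift until the next step would mix nodes."""
    if buckets_conflict(nodes, min_shift):
        raise ValueError("nodes already share a bucket at the minimum shift")
    shift = min_shift
    while shift < max_shift and not buckets_conflict(nodes, shift + 1):
        shift += 1
    return shift


if __name__ == "__main__":
    print("hash shift:", largest_hash_shift(NODES))
```

With these two spans the result is 36: node 0 ends below 2^36 and node 1
starts above it, while at shift 37 both nodes would fall into bucket 0.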

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [Bug] [2.6.18-rc5-mm1] system no boot early death  x86_64
  2006-09-08 10:40   ` Mel Gorman
@ 2006-09-09 13:41     ` Andi Kleen
  0 siblings, 0 replies; 4+ messages in thread
From: Andi Kleen @ 2006-09-09 13:41 UTC
  To: Mel Gorman; +Cc: keith mannthey, lkml, andrew

On Friday 08 September 2006 12:40, Mel Gorman wrote:
> On Thu, 7 Sep 2006, keith mannthey wrote:
> > On Thu, 2006-09-07 at 11:20 -0700, keith mannthey wrote:
> >> Hello,
> >>   I was booting rc4-mm3.  With rc5-mm1 I am hanging early... Mel I don't
> >> know if this is related to your code but I will soon know. (I don't get
> >> your debug info in early console.)
> >>   I was working on patches for the reserve based memory hot add path in
> >> srat.c (the initial error is fixed by Mels patches but there is more to
> >> do)
>
> That is some good news at least.
>
> > and was just moving to rc5-mm1 to sync up and then more trouble.
> >
> >> This is with reserve based hot-add not enabled at the command line.
> >
> > Well this isn't fully adding up but here is what I found.
> >
> > If I drop
> > x86_64-mm-drop-640k-reservation.patch
> > x86_64-mm-remove-e820-fallback.patch
> > and
> > x86_64-mm-remove-e820-fallback-fix.patch
> >
> > I build and boot.  All files in the series upto x86_64-mm-drop-640k-
> > reservation.patch work just fine.  Dropping this patch makes things
> > better. The e820 patches were removed to make the rest of the series
> > apply.
>
> I am having trouble reproducing this. However, I recently got access to a
> machine similar to yours. I can say that sometimes the stability of
> 2.6.18-rc4-mm3 and 2.6.18-rc5-mm1 was totally useless (but the symptoms
> different to yours) and the box would easily crash for reasons I could not
> pin down. As stability problems had been reported on the machine earlier
> by other users, I was inclined to blame the hardware. Now I'm not sure.
>
> > It is not clear what changes would cause me to die setting up the
> > bootmem allocator on my first node...
>
> Unless your machine really has something special in the low 640K that is
> required and bad things happen if it's written to at a bad time.

That would be a BIOS bug then.  If anything is there it has to be reserved.

But maybe it just breaks something that only worked by accident before.
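
To make "it has to be reserved" concrete against the map Keith posted, here
is a small sketch that looks up how the BIOS-e820 entries classify addresses
in the first megabyte. It only restates the posted map; the helper names
classify and bytes_below are hypothetical, and "unlisted" is just a label
chosen here for addresses that no e820 entry covers.

```python
# (start, end, type) for the first MiB, copied from the BIOS-e820 lines
# posted earlier in this thread
LOW_E820 = [
    (0x00000, 0x98400, "usable"),
    (0x98400, 0xa0000, "reserved"),
    (0xe0000, 0x100000, "reserved"),
]

LOW_640K = 0xa0000  # 640 KiB


def classify(addr, e820=LOW_E820):
    """Return the e820 type covering addr, or "unlisted" if none does."""
    for start, end, kind in e820:
        if start <= addr < end:
            return kind
    return "unlisted"


def bytes_below(limit, kind, e820=LOW_E820):
    """Sum the bytes of the given type that lie below limit."""
    return sum(
        max(0, min(end, limit) - start)
        for start, end, k in e820
        if k == kind
    )


if __name__ == "__main__":
    for addr in (0x0, 0x98400, 0x9ffff, 0xa0000, 0xe0000):
        print(f"{addr:#07x}: {classify(addr)}")
    print("usable below 640K:  ", bytes_below(LOW_640K, "usable") // 1024, "KiB")
    print("reserved below 640K:", bytes_below(LOW_640K, "reserved") // 1024, "KiB")
```

On this map the firmware already reserves the top 31 KiB below 640K
(0x98400-0xa0000) and reports the 609 KiB beneath it as usable, so anything
special living below 0x98400 would not be protected by the map itself.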

-Andi
