public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Early boot hang on recent 2.6 kernels (> 2.6.3), on x86-64 with 16gb of RAM
@ 2006-09-12 22:32 Robin Lee Powell
  2006-09-14 19:05 ` Same MCE on 4 working machines (was Re: Early boot hang on recent 2.6 kernels (> 2.6.3), on x86-64 with 16gb of RAM) Robin Lee Powell
  2006-09-18  7:50 ` Early boot hang on recent 2.6 kernels (> 2.6.3), on x86-64 with 16gb of RAM Andi Kleen
  0 siblings, 2 replies; 24+ messages in thread
From: Robin Lee Powell @ 2006-09-12 22:32 UTC (permalink / raw)
  To: linux-kernel


Please cc me on replies, as I'm not on the list.

I have some moderately old hosts that hang on boot, very early on,
with any kernel newer than 2.6.3.  Important basic facts about the
box are dual opteron 244s, 16gb of RAM, and it's a 64-bit build of
the kernel.  We've tried both MK8 and CONFIG_MK8 and
CONFIG_GENERIC_CPU to no avail.  Hell, I think I've tried just about
everything at this point.

In most cases the stopping point is:

[snip]
    Initializing CPU#0
[snip]
    CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
    CPU: L2 Cache: 1024K (64 bytes/line)
    CPU 0/0(1) -> Node 0 -> Core 0

(see below for complete version)

Whether that last line shows up or not is based on BIOS configs,
AFAICT.

I've been working on these for several days now, and I'm totally
stumped.  I'll generate any data any of you ask for that might be
useful, and I'll be appending a lot of data here.

The kernel I'm actually trying to get working is 2.6.17.11, but I've
tried all of the following:

linux-2.6.2
linux-2.6.3
linux-2.6.4
linux-2.6.6
linux-2.6.10
linux-2.6.17.11

Only the first two work.

For all of them except 2.6.17.11, the config was just the config
from the known-working 2.6.2, make oldconfig, hold down the return
button.  2.6.17.11 I played with a bit more.

The original 2.6.2 working .config is
http://teddyb.org/~rlpowell/media/regular/lkml/orig.config.txt

The current 2.6.17.11 .config is
http://teddyb.org/~rlpowell/media/regular/lkml/current.config.txt

The full boot with apci on is
http://teddyb.org/~rlpowell/media/regular/lkml/apci_boot.txt

The full boot with apci off is
http://teddyb.org/~rlpowell/media/regular/lkml/noapci_boot.txt

This version is rather different, as it ends in:

    HARDWARE ERROR
    CPU 0: Machine Check Exception:                7 Bank 3: b40000000000083b
    RIP 10:<ffffffff80446e3e> {pci_conf1_read+0xbe/0xf0}
    TSC 2e7932dbf8 ADDR fdfc000cfc
    This is not a software problem!
    Run through mcelog --ascii to decode and contact your hardware vendor
    Kernel panic - not syncing: Uncorrected machine check

We believe this to be spurious, as both machines of this type we've
tested showed the same error, and both of them have been running
with 2.6.2 for years.

The motherboard is, apparently, a RioWorks Rhapsody HDAMA, whatever that is.
We are not on the latest BIOS revision, but then the changes between 1.84
(which we're on) and 1.89 don't look relevant.

BIOS information is
http://teddyb.org/~rlpowell/media/regular/lkml/bios_info.txt

lspci -v (generated while up on the 2.6.2 kernel that's been running
for years) is
http://teddyb.org/~rlpowell/media/regular/lkml/lspci_v.txt

dmidecode is
http://teddyb.org/~rlpowell/media/regular/lkml/dmidecode.txt

-Robin

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2006-09-19 21:07 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-09-12 22:32 Early boot hang on recent 2.6 kernels (> 2.6.3), on x86-64 with 16gb of RAM Robin Lee Powell
2006-09-14 19:05 ` Same MCE on 4 working machines (was Re: Early boot hang on recent 2.6 kernels (> 2.6.3), on x86-64 with 16gb of RAM) Robin Lee Powell
2006-09-14 19:14   ` Lee Revell
2006-09-14 19:15     ` Robin Lee Powell
2006-09-15  5:42       ` Bharath Ramesh
2006-09-15 17:47         ` Robin Lee Powell
2006-09-15 18:07           ` Robin Lee Powell
2006-09-15 18:22           ` Alan Cox
2006-09-15 18:12             ` Robin Lee Powell
2006-09-18  7:52         ` Andi Kleen
2006-09-18 18:59           ` Robin Lee Powell
2006-09-15 11:45   ` Alan Cox
2006-09-15 18:29     ` Robin Lee Powell
2006-09-15 20:50       ` Alan Cox
2006-09-15 20:31         ` Robin Lee Powell
2006-09-15 23:18           ` Robin Lee Powell
2006-09-18  7:50 ` Early boot hang on recent 2.6 kernels (> 2.6.3), on x86-64 with 16gb of RAM Andi Kleen
2006-09-18 19:06   ` Robin Lee Powell
2006-09-18 23:58   ` Robin Lee Powell
2006-09-19  6:04     ` Andi Kleen
2006-09-19  6:28       ` Robin Lee Powell
2006-09-19  6:39         ` Andi Kleen
2006-09-19 17:46           ` Robin Lee Powell
2006-09-19 21:07             ` Robin Lee Powell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox