* Opteron fatal machine check during PCI probe
@ 2004-06-17  0:06 Tim Hockin
  2004-06-17  0:23 ` Andi Kleen
  0 siblings, 1 reply; 2+ messages in thread
From: Tim Hockin @ 2004-06-17  0:06 UTC (permalink / raw)
  To: Linux Kernel mailing list; +Cc: ak

Hey all,

I have a couple of dual Opteron boxen that consistently get an MCE during
PCI probing. This is from linux-2.6.6, but the EXACT same scenario happens
on a 2.4.x kernel.

The MCE shows that the error is an IO read, with the address 0xfdfc000cfe.
The RIP points to pci_conf1_read(), when we try to inw() from the PCI data
register.

This is called during the PCI probing, and stops the kernel dead in its
tracks.  The disassembly of the surrounding code is:

ffffffff802822c5:	89 ca                	mov    %ecx,%edx
ffffffff802822c7:	83 e2 02             	and    $0x2,%edx
ffffffff802822ca:	66 81 c2 fc 0c       	add    $0xcfc,%dx
ffffffff802822cf:	66 ed                	in     (%dx),%ax

This all seems legit to me.

What is interesting is that the address 0xfdfc000cfe is correct in the
low-order 16 bits.  The extra 0xfdfc000000 is what is puzzling to me, or
maybe it's a red herring.

I added a show_registers() to the MCE handler, and %rdx *really* is all
zeros, other than the 0xcfe.

If I disable MCE, then the system boots fine, and runs fine.

Anyone have any ideas?

Tim
