public inbox for linux-kernel@vger.kernel.org
* Opteron fatal machine check during PCI probe
@ 2004-06-17  0:06 Tim Hockin
  2004-06-17  0:23 ` Andi Kleen
  0 siblings, 1 reply; 2+ messages in thread
From: Tim Hockin @ 2004-06-17  0:06 UTC
  To: Linux Kernel mailing list; +Cc: ak

Hey all,

I have a couple of dual-Opteron boxen that consistently get an MCE during
PCI probing. This is from linux-2.6.6, but the EXACT same scenario happens
on a 2.4.x kernel.

The MCE shows that the error is an IO read, with the address 0xfdfc000cfe.
The RIP points to pci_conf1_read(), when we try to inw() from the PCI data
register.

This is called during PCI probing, and stops the kernel dead in its
tracks.  The disassembly of the surrounding code is:

ffffffff802822c5:	89 ca                	mov    %ecx,%edx
ffffffff802822c7:	83 e2 02             	and    $0x2,%edx
ffffffff802822ca:	66 81 c2 fc 0c       	add    $0xcfc,%dx
ffffffff802822cf:	66 ed                	in     (%dx),%ax

This all seems legit to me.
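
For anyone following along, here is a minimal user-space sketch of the
port arithmetic in that disassembly, assuming the conventional PCI
configuration mechanism #1 (address port 0xCF8, data port 0xCFC). The
helper names are made up for illustration and are not the kernel's, and
no port I/O is actually performed:

```c
#include <stdint.h>

/* Conventional PCI configuration mechanism #1 ports. */
#define PCI_CONF1_ADDR_PORT 0xCF8u
#define PCI_CONF1_DATA_PORT 0xCFCu

/*
 * Hypothetical helper: the dword that would be written to 0xCF8 to
 * select a bus, device/function and dword-aligned register.
 */
static uint32_t conf1_address(unsigned bus, unsigned devfn, unsigned reg)
{
    return 0x80000000u              /* enable bit */
         | ((bus   & 0xFFu) << 16)
         | ((devfn & 0xFFu) << 8)
         | (reg & 0xFCu);           /* low two bits are not part of the address */
}

/*
 * Hypothetical helper: the port a 16-bit read would use. This mirrors
 * the "and $0x2,%edx; add $0xcfc,%dx" pair in the listing.
 */
static uint16_t conf1_word_port(unsigned reg)
{
    return (uint16_t)(PCI_CONF1_DATA_PORT + (reg & 2u));
}
```

With a register offset whose bit 1 is set (0x02, 0x06, ...), the word
read lands on port 0xcfe, which matches the low 16 bits of the address
the MCE reports.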

What is interesting is that the address 0xfdfc000cfe is correct in the
low-order 16 bits.  The extra 0xfdfc000000 is what is puzzling to me, or
maybe it's a red herring.

I added a show_registers() to the MCE handler, and %rdx *really* is all
zeros, other than the 0xcfe.

If I disable MCE, then the system boots fine and runs fine.

Anyone have any ideas?

Tim


* Re: Opteron fatal machine check during PCI probe
  2004-06-17  0:06 Opteron fatal machine check during PCI probe Tim Hockin
@ 2004-06-17  0:23 ` Andi Kleen
  0 siblings, 0 replies; 2+ messages in thread
From: Andi Kleen @ 2004-06-17  0:23 UTC
  To: Tim Hockin; +Cc: linux-kernel

On Wed, 16 Jun 2004 17:06:02 -0700
Tim Hockin <thockin@hockin.org> wrote:

> Hey all,
> 
> I have a couple of dual-Opteron boxen that consistently get an MCE during
> PCI probing. This is from linux-2.6.6, but the EXACT same scenario happens
> on a 2.4.x kernel.

> The MCE shows that the error is an IO read, with the address 0xfdfc000cfe.
> The RIP points to pci_conf1_read(), when we try to inw() from the PCI data
> register.

Is it a master abort (0x100 set in MC4_STATUS)?
If yes, it's a BIOS issue; the BIOS is supposed to disable that one.
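
A minimal sketch of that check, taking the 0x100 mask from this reply at
face value (it has not been verified against the processor
documentation). The macro and function names are hypothetical, and the
status value is assumed to have been read from the MCE log already:

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask as quoted in this thread; an assumption, not a documented constant. */
#define MC4_STATUS_MASTER_ABORT_MASK 0x100ULL

/* Hypothetical helper: true if the quoted master-abort bit is set. */
static bool mc4_looks_like_master_abort(uint64_t mc4_status)
{
    return (mc4_status & MC4_STATUS_MASTER_ABORT_MASK) != 0;
}
```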

> This is called during PCI probing, and stops the kernel dead in its
> tracks.  The disassembly of the surrounding code is:
> 
> ffffffff802822c5:	89 ca                	mov    %ecx,%edx
> ffffffff802822c7:	83 e2 02             	and    $0x2,%edx
> ffffffff802822ca:	66 81 c2 fc 0c       	add    $0xcfc,%dx
> ffffffff802822cf:	66 ed                	in     (%dx),%ax
> 
> This all seems legit to me.
> 
> What is interesting is that the address 0xfdfc000cfe is correct in the
> low-order 16 bits.  The extra 0xfdfc000000 is what is puzzling to me, or
> maybe it's a red herring.

It is a red herring. The in instruction only uses the low 16 bits of its
port operand.
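
To make that concrete, a small sketch that splits the reported address
into the 16 bits the port read actually uses and the leftover high bits.
The function names are illustrative only:

```c
#include <stdint.h>

/* The part of the reported address that names the I/O port. */
static uint16_t io_port_bits(uint64_t reported_addr)
{
    return (uint16_t)(reported_addr & 0xFFFFu);
}

/* Everything above bit 15, which the port read does not use. */
static uint64_t extra_high_bits(uint64_t reported_addr)
{
    return reported_addr & ~0xFFFFULL;
}
```

For the address in the report, 0xfdfc000cfe, the port part is 0xcfe and
the leftover is 0xfdfc000000, the same split described in the quoted
text.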


-Andi
