From: Frank van Maarseveen <frankvm@frankvm.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Machine check exception with a kernel dependency
Date: Fri, 15 Feb 2008 15:50:07 +0100
Message-ID: <20080215145007.GA18341@janus>
In-Reply-To: <20080215132241.23823d43@core>

On Fri, Feb 15, 2008 at 01:22:41PM +0000, Alan Cox wrote:
> On Wed, 13 Feb 2008 17:25:28 +0100
> Frank van Maarseveen <frankvm@frankvm.com> wrote:
> 
> > On at least two Dell optiplex 755 systems with a Core 2 Duo I get
> > 
> > Feb 13 15:14:01 inari CPU 1: Machine Check Exception: 0000000000000004 
> > Feb 13 15:14:01 inari CPU 0: Machine Check Exception: 0000000000000005 
> > Feb 13 15:14:01 inari Bank 0: b200004000000800 
> > Feb 13 15:14:01 inari Bank 5: b200221024080400 
> > 
> > 2.6.22.10 shows the problem, 2.6.24.2 ditto but I'm unable to reproduce
> > it with 2.6.24-rc8. BIOS upgrade didn't help. Removing all PCI[e] cards
> > didn't help either.
> 
> If you run the MCE numbers through a decoder what do you get back ?

I'm having some trouble decoding these convincingly. mcelog --core2
--ascii reports "MCG status:RIPV MCIP" for 0000000000000005 and "MCG
status:MCIP" for 0000000000000004.
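For reference, those two MCG status values decode purely by bit position: per the Intel SDM, bit 0 of IA32_MCG_STATUS is RIPV, bit 1 is EIPV and bit 2 is MCIP, so 4 is MCIP alone and 5 is RIPV plus MCIP. A minimal Python sketch of that lookup (decode_mcg_status is a hypothetical helper, not part of mcelog):

```python
# Sketch: decode the IA32_MCG_STATUS value printed after
# "Machine Check Exception:". Bit positions follow the Intel SDM;
# decode_mcg_status is a hypothetical name, not an mcelog function.
MCG_STATUS_BITS = {
    0: "RIPV",  # restart IP valid: execution may resume at the saved IP
    1: "EIPV",  # error IP valid: saved IP is directly tied to the error
    2: "MCIP",  # machine check in progress
}


def decode_mcg_status(value: int) -> list:
    """Return the names of the set MCG_STATUS flags, lowest bit first."""
    return [name for bit, name in sorted(MCG_STATUS_BITS.items())
            if value & (1 << bit)]


print(decode_mcg_status(0x4))
print(decode_mcg_status(0x5))
```

This reproduces the "MCIP" and "RIPV MCIP" readings mcelog gives for 4 and 5.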

I've collected several Bank lines (first column: number of occurrences):

count  line
---------------------------
26 Bank 0: b200004000000800
10 Bank 5: b200121014040400
 8 Bank 5: b200121020080400
 4 Bank 5: b200221010040400
 4 Bank 5: b200221024080400

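but mcelog expects each report on a single line of the format

	CPU %u: Machine Check Exception: %16Lx Bank %d: %016Lx

These lines got split up by netconsole, so I reconstructed them by hand,
pairing the Bank 0 line with the CPU 1 header and the Bank 5 lines with
the CPU 0 header: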

CPU 1: Machine Check Exception: 0000000000000004 Bank 0: b200004000000800
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200121014040400
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200121020080400
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200221010040400
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200221024080400
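The hand reconstruction amounts to gluing each Bank line back onto a CPU header. A hypothetical Python helper that emits the single-line format mcelog expects, assuming the CPU/bank pairing guessed above and zero-padding both values to 16 hex digits as in the original log lines:

```python
# Hypothetical helper: rebuild the single-line form that mcelog parses,
#   CPU %u: Machine Check Exception: %16Lx Bank %d: %016Lx
# from a CPU header and a Bank line that arrived separately.
# The (cpu, bank) pairing below is the same guess made by hand above.
def mcelog_line(cpu: int, mcg_status: int, bank: int, status: int) -> str:
    return (f"CPU {cpu}: Machine Check Exception: {mcg_status:016x} "
            f"Bank {bank}: {status:016x}")


pairs = [
    (1, 0x4, 0, 0xb200004000000800),
    (0, 0x5, 5, 0xb200121014040400),
    (0, 0x5, 5, 0xb200121020080400),
    (0, 0x5, 5, 0xb200221010040400),
    (0, 0x5, 5, 0xb200221024080400),
]

for pair in pairs:
    print(mcelog_line(*pair))
```

Running it prints the five reconstructed lines listed above, which can then be piped into mcelog --core2 --ascii.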

result:

CPU 1: Machine Check Exception: 0000000000000004 Bank 0: b200004000000800
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 BANK 0 MCG status:MCIP 
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Originated-request Generic Memory-access Request-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout)
STATUS b200004000000800 MCGSTATUS 4

CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200121014040400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP 
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200121014040400 MCGSTATUS 5

CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200121020080400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP 
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200121020080400 MCGSTATUS 5

CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200221010040400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP 
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200221010040400 MCGSTATUS 5

CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200221024080400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP 
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200221024080400 MCGSTATUS 5
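The flag lines in that output come straight from the architectural bits of IA32_MCi_STATUS: all five values start with 0xb2, i.e. VAL, UC, EN and PCC set, and the low 16 bits are the MCA error code (0x0800 matches the bus/interconnect pattern 0000 1PPT RRRR IILL, and 0x0400 is the internal timer error code). A rough Python sketch of that decoding, following the Intel SDM bit layout; decode_mci_status is a hypothetical name, and the model-specific bits that mcelog turns into the BQ_* and timeout text are deliberately not decoded:

```python
# Sketch: decode the architectural fields of IA32_MCi_STATUS.
# Bit layout per the Intel SDM machine-check chapter; model-specific
# bits 16-56 are ignored here. decode_mci_status is a hypothetical name.
STATUS_FLAGS = [
    (63, "VAL"),    # register contents valid
    (62, "OVER"),   # error overflow
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting enabled
    (59, "MISCV"),  # MCi_MISC register valid
    (58, "ADDRV"),  # MCi_ADDR register valid
    (57, "PCC"),    # processor context corrupt
]


def decode_mci_status(status: int) -> dict:
    mca_code = status & 0xFFFF  # bits 15:0, architectural MCA error code
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    if mca_code == 0x0400:
        kind = "internal timer error"
    elif mca_code & 0xF800 == 0x0800:  # 0000 1PPT RRRR IILL
        kind = "bus/interconnect error"
    else:
        kind = "other"
    return {"flags": flags, "mca_code": mca_code, "kind": kind}


for value in (0xb200004000000800, 0xb200121014040400):
    print(hex(value), decode_mci_status(value))
```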


The problem also exists on an entirely different Xeon system with 4 cores:

cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           X3210  @ 2.13GHz
stepping        : 11


-- 
Frank


Thread overview: 4+ messages
2008-02-13 16:25 Machine check exception with a kernel dependency Frank van Maarseveen
2008-02-14 14:54 ` 2.6.24 sysprof induced MCE on Core 2 Duo (was: Machine check exception with a kernel dependency) Frank van Maarseveen
2008-02-15 13:22 ` Machine check exception with a kernel dependency Alan Cox
2008-02-15 14:50   ` Frank van Maarseveen [this message]
