public inbox for linux-kernel@vger.kernel.org
From: Matthias Graf <matthias.graf@st.ovgu.de>
To: Borislav Petkov <bp@alien8.de>
Cc: linux-kernel@vger.kernel.org
Subject: Re: PROBLEM: Fatal Machine Check >= 3.13.5-101.fc19.x86_64
Date: Fri, 21 Mar 2014 20:49:51 +0100
Message-ID: <532C97DF.9010201@st.ovgu.de>
In-Reply-To: <20140321172742.GA2846@pd.tnic>


[-- Attachment #1.1: Type: text/plain, Size: 4299 bytes --]

(Please CC me on all replies)

mcelog output for all MCEs:



Hardware event. This is not a software error.
CPU 3 BANK 0
MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access
Request-did-not-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout). No micro-instruction retired for some time
STATUS b200004000000800 MCGSTATUS 5


Hardware event. This is not a software error.
CPU 3 BANK 5
MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200220024080400 MCGSTATUS 5


Hardware event. This is not a software error.
CPU 1 BANK 0
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access
Request-did-not-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout). No micro-instruction retired for some time
STATUS b200004000000800 MCGSTATUS 4


Hardware event. This is not a software error.
CPU 1 BANK 5
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200220010040400 MCGSTATUS 4


Hardware event. This is not a software error.
CPU 2 BANK 0
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access
Request-did-not-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout). No micro-instruction retired for some time
STATUS b200004000000800 MCGSTATUS 4


Hardware event. This is not a software error.
CPU 2 BANK 5
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200221010040400 MCGSTATUS 4

Hardware event. This is not a software error.
CPU 0 BANK 5
MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200221024080400 MCGSTATUS 5


Hardware event. This is not a software error.
CPU 0 BANK 0
MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access
Request-did-not-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout). No micro-instruction retired for some time
STATUS b200004000000800 MCGSTATUS 5
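
For cross-checking, the flags mcelog prints above can be recomputed from the
raw STATUS and MCGSTATUS words. The following is only a minimal sketch,
assuming the architectural bit layout of IA32_MCi_STATUS and IA32_MCG_STATUS
as I understand it from the Intel SDM; the function and table names are made
up for illustration and are not part of mcelog or the kernel.

```python
# Sketch: decode architectural bits of IA32_MCi_STATUS / IA32_MCG_STATUS.
# Bit positions are assumed from the Intel SDM machine-check chapter.

MCI_STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MCi_MISC is valid
    58: "ADDRV",  # MCi_ADDR is valid
    57: "PCC",    # processor context corrupt
}

MCG_STATUS_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}


def flags(value, table):
    """Names of all bits from `table` set in `value`, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if (value >> bit) & 1]


def describe_mca_code(code):
    """Interpret the 16-bit MCA error code for the two patterns seen here."""
    if code == 0x0400:
        return "internal timer error"
    if (code & 0xF800) == 0x0800:
        # Bus/interconnect error, layout 0000 1PPT RRRR IILL.
        return {
            "LL": code & 0x3,           # cache/memory level
            "II": (code >> 2) & 0x3,    # memory vs. I/O access
            "RRRR": (code >> 4) & 0xF,  # request type
            "T": (code >> 8) & 0x1,     # 1 = request timed out
            "PP": (code >> 9) & 0x3,    # who originated the request
        }
    return "not handled by this sketch"


def decode(status_hex, mcgstatus):
    status = int(status_hex, 16)
    return {
        "status_flags": flags(status, MCI_STATUS_FLAGS),
        "mca_code": status & 0xFFFF,                # bits 15:0
        "model_specific": (status >> 16) & 0xFFFF,  # bits 31:16
        "mca_meaning": describe_mca_code(status & 0xFFFF),
        "mcg_flags": flags(mcgstatus, MCG_STATUS_FLAGS),
    }


if __name__ == "__main__":
    for status_hex, mcg in [("b200004000000800", 5),
                            ("b200220024080400", 5),
                            ("b200220010040400", 4)]:
        print(status_hex, mcg, decode(status_hex, mcg))
```

For every STATUS value listed above the set flags are VAL, UC, EN and PCC,
which matches the "Uncorrected error / Error enabled / Processor context
corrupt" lines. MCGSTATUS 5 has RIPV and MCIP set, MCGSTATUS 4 only MCIP.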



On 21.03.2014 18:27, Borislav Petkov wrote:
> On Fri, Mar 21, 2014 at 06:10:23PM +0100, Matthias Graf wrote:
>> Please CC me on replies.
>>
>> [1.] Kernel panic: Fatal Machine Check after booting >=
>> 3.13.5-101.fc19.x86_64; 3.12.11-201.fc19.x86_64 works fine!
>> [2.] The screen freezes a few seconds after GNOME appears. The error message
>> (see attachment) is only rarely still printed to the screen. Booting
>> 3.12.11-201 with otherwise the same setup, I do not see the panic.
>> Booting on different hardware (my laptop) does not produce the panic. I
>> also notice a low frame rate after GNOME has started up, right before
>> the panic occurs. I therefore suppose this is graphics-hardware related.
>> [3.] Fatal Machine Check Exception, RIP Inexact, apic_timer_interrupt,
>> Kernel panic
>> [4.] 3.13.6-100.fc19.x86_64 && 3.13.5-103.fc19.x86 && 3.13.5-101.fc19.x86_64
>> [5.] OCRed: (see attachment for photo)
>>
>> Started Accounts Service.
>> [ 34.348483] mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 8: bZ88884888888888
>> [ 44.468168] mce: [Hardware Error]: HIP ?IHEXfiCT? 18:<ffffffff816881f8> {apicgtimer_interrupt+8x8/8x88}
>> I 44.468168] mce: [Hardware Error]: TSC 36S??8ad8c
>> f 44.468168] mce: [Hardware Error]: PROCESSOR 8:6fb TIM 138471666? SOCKET 8 HPIC 2 microcode ba
>> I 44.468168] mce: [Hardware Error]: Run the above through 'mcelog ~~ascii’
> 
> This looks like you had some text recognition done on the jpeg. :-)
> 
> Please correct the error message to be exactly as in the jpeg and run it
> through mcelog --ascii to see what that bank 8 is trying to tell us.
> 
> Thanks.
> 
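
A quick way to catch this kind of OCR damage before feeding a transcription
to mcelog --ascii is to check each "Machine Check Exception" line against the
shape the kernel prints. This is only a rough sketch: the regular expression
is my own assumption derived from the lines in this thread, and check_line is
a hypothetical helper, not an mcelog feature.

```python
import re

# Expected shape of the exception line, inferred from the log in this thread:
#   CPU <n>: Machine Check Exception: <mcgstatus hex> Bank <n>: <16 hex digits>
MCE_LINE = re.compile(
    r"mce: \[Hardware Error\]: CPU (\d+): Machine Check Exception: "
    r"([0-9a-f]+) Bank (\d+): ([0-9a-f]{16})$"
)


def check_line(line):
    """Return (cpu, mcgstatus, bank, status), or None if the line looks garbled."""
    m = MCE_LINE.search(line.strip())
    if m is None:
        return None
    cpu, mcg, bank, status = m.groups()
    return int(cpu), int(mcg, 16), int(bank), int(status, 16)


if __name__ == "__main__":
    ocr = ("[ 34.348483] mce: [Hardware Error]: CPU 3: "
           "Machine Check Exception: 5 Bank 8: bZ88884888888888")
    fixed = ("[ 34.348483] mce: [Hardware Error]: CPU 3: "
             "Machine Check Exception: 5 Bank 0: b200004000000800")
    for line in (ocr, fixed):
        print("ok     " if check_line(line) else "garbled", "|", line)
```

Note the limit of such a check: it flags the non-hex "Z" in the status word,
but it cannot tell that "Bank 8" should have been "Bank 0", because both are
well-formed. The transcription still has to be compared against the photo.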

[-- Attachment #1.2: mce.txt --]
[-- Type: text/plain, Size: 3215 bytes --]

[ 34.348483] mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 0: b200004000000800
[ 44.468168] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff816901f0> {apic_timer_interrupt+0x0/0x80}
[ 44.468168] mce: [Hardware Error]: TSC 365779ad0c
[ 44.468168] mce: [Hardware Error]: PROCESSOR 0:6fb TIME 1394716667 SOCKET 0 APIC 2 microcode ba
[ 44.468168] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 44.468168] mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 5: b200220024080400
[ 44.468168] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff816901f0> {apic_timer_interrupt+0x0/0x80}
[ 44.468168] mce: [Hardware Error]: TSC 365779ad0c
[ 44.468168] mce: [Hardware Error]: PROCESSOR 0:6fb TIME 1394716667 SOCKET 0 APIC 2 microcode ba
[ 44.468168] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 44.468168] mce: [Hardware Error]: CPU 1: Machine Check Exception: 4 Bank 0: b200004000000800
[ 44.468168] mce: [Hardware Error]: TSC 365779ad42
[ 44.468168] mce: [Hardware Error]: PROCESSOR 0:6fb TIME 1394716667 SOCKET 0 APIC 3 microcode ba
[ 44.468168] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 44.468168] mce: [Hardware Error]: CPU 1: Machine Check Exception: 4 Bank 5: b200220010040400
[ 44.468168] mce: [Hardware Error]: TSC 365779ad42
[ 44.468168] mce: [Hardware Error]: PROCESSOR 0:6fb TIME 1394716667 SOCKET 0 APIC 3 microcode ba
[ 44.468168] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 44.468168] mce: [Hardware Error]: CPU 2: Machine Check Exception: 4 Bank 0: b200004000000800
[ 44.468168] mce: [Hardware Error]: TSC 365779aeaa
[ 44.468168] mce: [Hardware Error]: PROCESSOR 0:6fb TIME 1394716667 SOCKET 0 APIC 1 microcode ba
[ 44.468168] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 44.468168] mce: [Hardware Error]: CPU 2: Machine Check Exception: 4 Bank 5: b200221010040400
[ 44.468168] mce: [Hardware Error]: TSC 365779aeaa
[ 44.468168] mce: [Hardware Error]: PROCESSOR 0:6fb TIME 1394716667 SOCKET 0 APIC 1 microcode ba
[ 44.468168] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 44.468168] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 5: b200221024080400
[ 44.468168] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff816901f0> {apic_timer_interrupt+0x0/0x80}
[ 44.468168] mce: [Hardware Error]: TSC 365779aece
[ 44.468168] mce: [Hardware Error]: PROCESSOR 0:6fb TIME 1394716667 SOCKET 0 APIC 0 microcode ba
[ 44.468168] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 44.468168] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 0: b200004000000800
[ 44.468168] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff816901f0> {apic_timer_interrupt+0x0/0x80}
[ 44.468168] mce: [Hardware Error]: TSC 365779aece
[ 44.468168] mce: [Hardware Error]: PROCESSOR 0:6fb TIME 1394716667 SOCKET 0 APIC 0 microcode ba
[ 44.468168] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 44.468168] mce: [Hardware Error]: Machine check: Processor context corrupt
[ 44.468168] Kernel panic - not syncing: Fatal Machine check
[ 44.468168] drm_kms_helper: panic occurred, switching back to text console
[ 44.468168] Rebooting in 30 seconds..
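
The PROCESSOR lines in this log carry more than is obvious at first glance.
As a hedged sketch, assuming the field is "vendor:cpuid" with the CPUID
signature in hex and TIME in seconds since the Unix epoch (which is how I
read the kernel's mce output), they can be unpacked as below. The name
decode_processor and the regular expression are illustrative only.

```python
import re
from datetime import datetime, timezone

# Assumed layout of the kernel's line:
#   PROCESSOR <vendor dec>:<cpuid hex> TIME <epoch s> SOCKET <dec>
#   APIC <hex> microcode <hex>
PROC_LINE = re.compile(
    r"PROCESSOR (\d+):([0-9a-f]+) TIME (\d+) SOCKET (\d+) "
    r"APIC ([0-9a-f]+) microcode ([0-9a-f]+)"
)


def decode_processor(line):
    """Split a PROCESSOR line into vendor, family/model/stepping, time, etc."""
    m = PROC_LINE.search(line)
    if m is None:
        raise ValueError("not a PROCESSOR line")
    vendor, cpuid, when, socket, apic, ucode = m.groups()

    sig = int(cpuid, 16)               # CPUID leaf 1 EAX signature
    family = (sig >> 8) & 0xF
    model = (sig >> 4) & 0xF
    if family in (6, 15):              # extended model applies
        model |= ((sig >> 16) & 0xF) << 4
    if family == 15:                   # extended family applies
        family += (sig >> 20) & 0xFF

    return {
        "vendor": int(vendor),         # 0 should be Intel in the kernel's enum
        "family": family,
        "model": model,
        "stepping": sig & 0xF,
        "time_utc": datetime.fromtimestamp(int(when), tz=timezone.utc),
        "socket": int(socket),
        "apic": int(apic, 16),
        "microcode": int(ucode, 16),
    }


if __name__ == "__main__":
    line = ("[ 44.468168] mce: [Hardware Error]: PROCESSOR 0:6fb "
            "TIME 1394716667 SOCKET 0 APIC 2 microcode ba")
    for key, value in decode_processor(line).items():
        print(f"{key:10} {value}")
```

For the signature 6fb this yields family 6, model 15, stepping 11.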

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 538 bytes --]
