From: Matthias Graf <matthias.graf@st.ovgu.de>
To: Borislav Petkov <bp@alien8.de>
Cc: linux-kernel@vger.kernel.org
Subject: Re: PROBLEM: Fatal Machine Check >= 3.13.5-101.fc19.x86_64
Date: Fri, 21 Mar 2014 20:49:51 +0100
Message-ID: <532C97DF.9010201@st.ovgu.de>
In-Reply-To: <20140321172742.GA2846@pd.tnic>
[-- Attachment #1.1: Type: text/plain, Size: 4299 bytes --]
(Please CC me on all replies)
mcelog output for all mces:
Hardware event. This is not a software error.
CPU 3 BANK 0
MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access
Request-did-not-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout). No micro-instruction retired for some time
STATUS b200004000000800 MCGSTATUS 5
Hardware event. This is not a software error.
CPU 3 BANK 5
MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200220024080400 MCGSTATUS 5
Hardware event. This is not a software error.
CPU 1 BANK 0
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access
Request-did-not-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout). No micro-instruction retired for some time
STATUS b200004000000800 MCGSTATUS 4
Hardware event. This is not a software error.
CPU 1 BANK 5
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200220010040400 MCGSTATUS 4
Hardware event. This is not a software error.
CPU 2 BANK 0
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access
Request-did-not-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout). No micro-instruction retired for some time
STATUS b200004000000800 MCGSTATUS 4
Hardware event. This is not a software error.
CPU 2 BANK 5
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200221010040400 MCGSTATUS 4
Hardware event. This is not a software error.
CPU 0 BANK 5
MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200221024080400 MCGSTATUS 5
Hardware event. This is not a software error.
CPU 0 BANK 0
MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access
Request-did-not-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout). No micro-instruction retired for some time
STATUS b200004000000800 MCGSTATUS 5
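The STATUS and MCGSTATUS values above can be cross-checked by hand against the decoded text. What follows is a minimal sketch, assuming the architectural bit layout of IA32_MCi_STATUS and IA32_MCG_STATUS as documented in the Intel SDM. The helper names (set_flags, decode_mci_status, decode_bus_code) are hypothetical and are not part of mcelog. Only the flag bits and the two error-code patterns that occur in this report are handled; the other model-specific bits are left undecoded.

```python
# Illustrative decoder for the architectural bits of IA32_MCi_STATUS and
# IA32_MCG_STATUS. Assumption: bit positions follow the Intel SDM layout.
# All helper names are hypothetical, not mcelog or kernel APIs.

MCI_FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
             59: "MISCV", 58: "ADDRV", 57: "PCC"}
MCG_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}


def set_flags(value, table):
    """Return the names of the bits listed in `table` that are set in `value`."""
    return [name for bit, name in table.items() if (value >> bit) & 1]


def decode_mci_status(status):
    """Split a 64-bit bank status into flags and the two 16-bit error codes."""
    return {
        "flags": set_flags(status, MCI_FLAGS),
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


def decode_bus_code(code):
    """Split a bus/interconnect compound code (000F 1PPT RRRR IILL) into raw fields."""
    return {
        "PP": (code >> 9) & 0x3,    # participation
        "T": (code >> 8) & 0x1,     # timeout
        "RRRR": (code >> 4) & 0xF,  # request type
        "II": (code >> 2) & 0x3,    # memory or I/O
        "LL": code & 0x3,           # cache level
    }


def describe_mca_code(code):
    """Classify only the two code patterns that appear in this report."""
    if code == 0x0400:
        return "internal timer error"
    if code & 0xE800 == 0x0800:  # bits 15:13 clear, bit 11 set; bit 12 ignored
        return "bus/interconnect error"
    return "not decoded in this sketch"


if __name__ == "__main__":
    for bank, status, mcg in [(0, 0xB200004000000800, 5),
                              (5, 0xB200220024080400, 5)]:
        decoded = decode_mci_status(status)
        print("bank", bank,
              decoded["flags"],
              describe_mca_code(decoded["mca_error_code"]),
              set_flags(mcg, MCG_FLAGS))
```

For the bank 0 value, the set flags correspond to the "Uncorrected error", "Error enabled" and "Processor context corrupt" lines, and all bus subfields are zero, which lines up with the Level-0, Local-CPU-originated-request, Generic, Memory-access, Request-did-not-timeout wording in the mcelog text.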
Am 21.03.2014 18:27, schrieb Borislav Petkov:
> On Fri, Mar 21, 2014 at 06:10:23PM +0100, Matthias Graf wrote:
>> Please CC me on replies.
>>
>> [1.] Kernel panic: Fatal Machine Check after booting >=
>> 3.13.5-101.fc19.x86_64; 3.12.11-201.fc19.x86_64 works fine!
>> [2.] Screen freezes a few seconds after Gnome appears. The error message
>> (see attachment) is seldom still printed to the screen. Booting
>> 3.12.11-201 with otherwise the same setup, I do not see the panic.
>> Booting on different hardware (my laptop) does not produce the panic. I
>> also notice low frames per second after Gnome started up, right before
>> the panic occurs. I therefore suppose this is graphics hardware related.
>> [3.] Fatal Machine Check Exception, RIP Inexact, apic_timer_interrupt,
>> Kernel panic
>> [4.] 3.13.6-100.fc19.x86_64 && 3.13.5-103.fc19.x86 && 3.13.5-101.fc19.x86_64
>> [5.] OCRed: (see attachment for photo)
>>
>> Started Accounts Service.
>> [ 34.348483] mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 8: bZ88884888888888
>> [ 44.468168] mce: [Hardware Error]: HIP ?IHEXfiCT? 18:<ffffffff816881f8> {apicgtimer_interrupt+8x8/8x88}
>> I 44.468168] mce: [Hardware Error]: TSC 36S??8ad8c
>> f 44.468168] mce: [Hardware Error]: PROCESSOR 8:6fb TIM 138471666? SOCKET 8 HPIC 2 microcode ba
>> I 44.468168] mce: [Hardware Error]: Run the above through 'mcelog ~~ascii’
>
> This looks like you had some text recognition done on the jpeg. :-)
>
> Please correct the error message to be exactly as in the jpeg and run it
> through mcelog --ascii to see what that bank 8 is trying to tell us.
>
> Thanks.
>
[-- Attachment #1.2: mce.txt --]
[-- Type: text/plain, Size: 3215 bytes --]
[ 34.348483] mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 0: b200004000000800
[ 44.468168] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff816901f0> {apic_timer_interrupt+0x0/0x80}
[ 44.468168] mce: [Hardware Error]: TSC 365779ad0c
[ 44.468168] mce: [Hardware Error]: PROCESSOR 0:6fb TIME 1394716667 SOCKET 0 APIC 2 microcode ba
[ 44.468168] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 44.468168] mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 5: b200220024080400
[ 44.468168] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff816901f0> {apic_timer_interrupt+0x0/0x80}
[ 44.468168] mce: [Hardware Error]: TSC 365779ad0c
[ 44.468168] mce: [Hardware Error]: PROCESSOR 0:6fb TIME 1394716667 SOCKET 0 APIC 2 microcode ba
[ 44.468168] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 44.468168] mce: [Hardware Error]: CPU 1: Machine Check Exception: 4 Bank 0: b200004000000800
[ 44.468168] mce: [Hardware Error]: TSC 365779ad42
[ 44.468168] mce: [Hardware Error]: PROCESSOR 0:6fb TIME 1394716667 SOCKET 0 APIC 3 microcode ba
[ 44.468168] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 44.468168] mce: [Hardware Error]: CPU 1: Machine Check Exception: 4 Bank 5: b200220010040400
[ 44.468168] mce: [Hardware Error]: TSC 365779ad42
[ 44.468168] mce: [Hardware Error]: PROCESSOR 0:6fb TIME 1394716667 SOCKET 0 APIC 3 microcode ba
[ 44.468168] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 44.468168] mce: [Hardware Error]: CPU 2: Machine Check Exception: 4 Bank 0: b200004000000800
[ 44.468168] mce: [Hardware Error]: TSC 365779aeaa
[ 44.468168] mce: [Hardware Error]: PROCESSOR 0:6fb TIME 1394716667 SOCKET 0 APIC 1 microcode ba
[ 44.468168] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 44.468168] mce: [Hardware Error]: CPU 2: Machine Check Exception: 4 Bank 5: b200221010040400
[ 44.468168] mce: [Hardware Error]: TSC 365779aeaa
[ 44.468168] mce: [Hardware Error]: PROCESSOR 0:6fb TIME 1394716667 SOCKET 0 APIC 1 microcode ba
[ 44.468168] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 44.468168] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 5: b200221024080400
[ 44.468168] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff816901f0> {apic_timer_interrupt+0x0/0x80}
[ 44.468168] mce: [Hardware Error]: TSC 365779aece
[ 44.468168] mce: [Hardware Error]: PROCESSOR 0:6fb TIME 1394716667 SOCKET 0 APIC 0 microcode ba
[ 44.468168] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 44.468168] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 0: b200004000000800
[ 44.468168] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff816901f0> {apic_timer_interrupt+0x0/0x80}
[ 44.468168] mce: [Hardware Error]: TSC 365779aece
[ 44.468168] mce: [Hardware Error]: PROCESSOR 0:6fb TIME 1394716667 SOCKET 0 APIC 0 microcode ba
[ 44.468168] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 44.468168] mce: [Hardware Error]: Machine check: Processor context corrupt
[ 44.468168] Kernel panic - not syncing: Fatal Machine check
[ 44.468168] drm_kms_helper: panic occurred, switching back to text console
[ 44.468168] Rebooting in 30 seconds..
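To compare the per-CPU records in the log above without retyping them, the "Machine Check Exception" lines can be pulled apart with a regular expression. This is only a sketch: the pattern is inferred from the lines in this attachment, it assumes the MCGSTATUS and bank status fields are printed in hexadecimal, and parse_mce_lines is a hypothetical helper rather than an existing tool.

```python
import re

# Pattern inferred from the log lines in this attachment (an assumption,
# not an official format description).
MCE_LINE = re.compile(
    r"CPU (?P<cpu>\d+): Machine Check Exception: (?P<mcg>[0-9a-f]+) "
    r"Bank (?P<bank>\d+): (?P<status>[0-9a-f]{16})"
)


def parse_mce_lines(text):
    """Return one dict per 'Machine Check Exception' line found in `text`."""
    records = []
    for m in MCE_LINE.finditer(text):
        records.append({
            "cpu": int(m["cpu"]),
            "mcgstatus": int(m["mcg"], 16),
            "bank": int(m["bank"]),
            "status": int(m["status"], 16),
        })
    return records


# Three lines copied from the attachment above.
SAMPLE = """\
[ 34.348483] mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 0: b200004000000800
[ 44.468168] mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 5: b200220024080400
[ 44.468168] mce: [Hardware Error]: CPU 1: Machine Check Exception: 4 Bank 0: b200004000000800
"""

if __name__ == "__main__":
    for rec in parse_mce_lines(SAMPLE):
        print(f"CPU {rec['cpu']} bank {rec['bank']}: "
              f"status {rec['status']:016x} mcgstatus {rec['mcgstatus']:x}")
```

Lines that do not follow this shape, such as the OCR-damaged version quoted earlier, simply produce no record, which makes a mistyped transcription easy to spot.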
[-- Attachment #1.3: mcelog.txt --]
[-- Type: text/plain, Size: 2486 bytes --]
Hardware event. This is not a software error.
CPU 3 BANK 0
MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access Request-did-not-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout). No micro-instruction retired for some time
STATUS b200004000000800 MCGSTATUS 5
Hardware event. This is not a software error.
CPU 3 BANK 5
MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200220024080400 MCGSTATUS 5
Hardware event. This is not a software error.
CPU 1 BANK 0
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access Request-did-not-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout). No micro-instruction retired for some time
STATUS b200004000000800 MCGSTATUS 4
Hardware event. This is not a software error.
CPU 1 BANK 5
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200220010040400 MCGSTATUS 4
Hardware event. This is not a software error.
CPU 2 BANK 0
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access Request-did-not-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout). No micro-instruction retired for some time
STATUS b200004000000800 MCGSTATUS 4
Hardware event. This is not a software error.
CPU 2 BANK 5
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200221010040400 MCGSTATUS 4
Hardware event. This is not a software error.
CPU 0 BANK 5
MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200221024080400 MCGSTATUS 5
Hardware event. This is not a software error.
CPU 0 BANK 0
MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access Request-did-not-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout). No micro-instruction retired for some time
STATUS b200004000000800 MCGSTATUS 5
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 538 bytes --]