* How to interpret MCE messages?
@ 2006-11-08 16:20 martin f krafft
2006-11-08 16:29 ` Alan Cox
2006-11-09 4:11 ` Valdis.Kletnieks
0 siblings, 2 replies; 7+ messages in thread
From: martin f krafft @ 2006-11-08 16:20 UTC (permalink / raw)
To: linux kernel mailing list
Thanks to mcelog, I am now regularly seeing messages like this on an
amd64 machine:
  kernel: Machine check events logged

  bit46 = corrected ecc error
  Data cache ECC error (syndrome 5b)
  memory/cache error 'data read mem transaction, data transaction, level 2'
  ADDR 38ed9200
  CPU 0 0 data cache TSC fe4f9128ade
  MCE 0
  STATUS 942dc00000000136 MCGSTATUS 0
The RAM modules are *not* ECC modules, nor does the Asus K8V Deluxe
motherboard support ECC to my knowledge. I've turned ECC support on
and off in the BIOS without any effect.
I've already run memtest86+ for hours without finding any problems,
and I've removed each of the two memory modules for a while, but
I still saw these errors appearing.
Before I go out and buy a new motherboard (as I assume that it's
a L1/L2 cache problem), I'd like to know how I am to interpret these
MCE dumps and how I could use them to actually pinpoint the source
of the problem.
Cheers,
--
martin; (greetings from the heart of the sun.)
\____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck
spamtraps: madduck.bogus@madduck.net
"america may be unique in being a country which has leapt
from barbarism to decadence without touching civilization."
-- john o'hara
* Re: How to interpret MCE messages?
From: Alan Cox @ 2006-11-08 16:29 UTC (permalink / raw)
To: martin f krafft; +Cc: linux kernel mailing list
On Wed, 2006-11-08 at 17:20 +0100, martin f krafft wrote:
> Thanks to mcelog, I am now regularly seeing messages like this on an
> amd64 machine:
>
> kernel: Machine check events logged
> bit46 = corrected ecc error
> Data cache ECC error (syndrome 5b)
Cache, not memory.
> memory/cache error 'data read mem transaction, data transaction, level 2'
L2 Cache
> Before I go out and buy a new motherboard (as I assume that it's
> a L1/L2 cache problem),
L1/L2 cache are on the CPU these days. Double check with the processor
docs and vendor but I think mcelog is actually trying to tell you that
the CPU wants to be warranty returned. It might also of course be a heat
problem.
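
The decoded text quoted above can be cross-checked against the raw
STATUS word from the original report. The following is a minimal Python
sketch, not part of mcelog: the function name decode_status and the
lookup tables are hypothetical, and the bit positions (corrected-ECC
flag at bit 46, an 8-bit syndrome at bits 54:47, and an error code in
the low 16 bits laid out as 0000 0001 RRRR TTLL) are assumptions based
on the AMD K8 machine-check register layout. Verify them against the
processor documentation before relying on them.

```python
# Assumed field encodings for a "memory hierarchy" machine-check error code.
# These tables are illustrative; check them against the CPU vendor's docs.
RRRR = {0: "generic", 1: "generic read", 2: "generic write", 3: "data read",
        4: "data write", 5: "instruction fetch", 6: "prefetch", 7: "evict",
        8: "snoop"}
TT = {0: "instruction", 1: "data", 2: "generic"}
LL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "generic"}


def bit(value, n):
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into the fields mcelog reports."""
    code = status & 0xFFFF  # low 16 bits: architectural error code
    info = {
        "valid": bit(status, 63),
        "overflow": bit(status, 62),
        "uncorrected": bit(status, 61),
        "addr_valid": bit(status, 58),
        "syndrome": (status >> 47) & 0xFF,  # assumed bits 54:47
        "corrected_ecc": bit(status, 46),
        "uncorrected_ecc": bit(status, 45),
    }
    # Memory hierarchy errors have the upper byte of the code equal to 0x01.
    if code >> 8 == 0x01:
        info["transaction"] = RRRR.get((code >> 4) & 0xF, "reserved")
        info["type"] = TT.get((code >> 2) & 0x3, "reserved")
        info["level"] = LL[code & 0x3]
    return info


if __name__ == "__main__":
    for name, value in decode_status(0x942DC00000000136).items():
        print(name, value)
```

With the STATUS value from the report, the sketch yields syndrome 0x5b
with the corrected-ECC bit set and the uncorrected bit clear, and an
error code of data read, data transaction, level 2, which matches what
mcelog printed.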
* Re: How to interpret MCE messages?
From: martin f krafft @ 2006-11-08 23:12 UTC (permalink / raw)
To: linux kernel mailing list
Thus spoke Alan Cox <alan@lxorguk.ukuu.org.uk> [2006.11.08.1729 +0100]:
> > memory/cache error 'data read mem transaction, data
> > transaction, level 2'
>
> L2 Cache
Gosh, I must be blind. Somehow there was too much information in
that dump. Thanks Alan!
> > Before I go out and buy a new motherboard (as I assume that it's
> > a L1/L2 cache problem),
>
> L1/L2 cache are on the CPU these days. Double check with the processor
> docs and vendor but I think mcelog is actually trying to tell you that
> the CPU wants to be warranty returned. It might also of course be a heat
> problem.
I am afraid the CPU might be out of warranty, but I'll try. I doubt
it's a heat problem, since there are plenty of fans and the machine's
interior stays at quite an agreeable temperature.
I'll check the CPU. Again, thanks.
--
martin; (greetings from the heart of the sun.)
\____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck
spamtraps: madduck.bogus@madduck.net
if you find a spelling mistake in the above, you get to keep it.
* Re: How to interpret MCE messages?
From: Valdis.Kletnieks @ 2006-11-09 4:11 UTC (permalink / raw)
To: martin f krafft; +Cc: linux kernel mailing list
On Wed, 08 Nov 2006 17:20:22 +0100, martin f krafft said:
> The RAM modules are *not* ECC modules, nor does the Asus K8V Deluxe
> motherboard support ECC to my knowledge. I've turned ECC support on
> and off in the BIOS without any effect.
How odd. Is it considered normal to have a BIOS option to turn
ECC support on/off on a motherboard that doesn't support ECC?
* Re: How to interpret MCE messages?
From: Mark Rosenstand @ 2006-11-11 4:31 UTC (permalink / raw)
To: Valdis.Kletnieks, linux-kernel
On Wed, 2006-11-08 at 23:11 -0500, Valdis.Kletnieks@vt.edu wrote:
> On Wed, 08 Nov 2006 17:20:22 +0100, martin f krafft said:
>
> > The RAM modules are *not* ECC modules, nor does the Asus K8V Deluxe
> > motherboard support ECC to my knowledge. I've turned ECC support on
> > and off in the BIOS without any effect.
>
> How odd. Is it considered normal to have a BIOS option to turn
> ECC support on/off on a motherboard that doesn't support ECC?
I think it does support ECC. At least, that was my main argument for
getting a K8V-X (a less feature-bloated version, recommended by djb) two
years ago, as very few Socket 754 boards supported it at that time (which
is extra weird, since it comes pretty much for free with K8 CPUs).
* Re: How to interpret MCE messages?
From: martin f krafft @ 2006-11-15 9:27 UTC (permalink / raw)
To: linux kernel mailing list
Thus spoke Alan Cox <alan@lxorguk.ukuu.org.uk> [2006.11.08.1729 +0100]:
> > Before I go out and buy a new motherboard (as I assume that it's
> > a L1/L2 cache problem),
>
> L1/L2 cache are on the CPU these days. Double check with the
> processor docs and vendor but I think mcelog is actually trying to
> tell you that the CPU wants to be warranty returned. It might also
> of course be a heat problem.
I've cleaned the fan and cooler and put a huge fan next to the open
case, blowing any heat out of it. I saw the errors again, even
without any load.
Thus I guess the CPU is asking for retirement. I am just
double-checking with you guys whether I can be sure that it's only
the CPU, or whether it could also be the fault of the motherboard...
Thanks,
--
martin; (greetings from the heart of the sun.)
\____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck
spamtraps: madduck.bogus@madduck.net
time wounds all heels.
-- groucho marx
* Re: How to interpret MCE messages?
From: dean gaudet @ 2006-11-17 23:27 UTC (permalink / raw)
To: martin f krafft; +Cc: linux kernel mailing list
On Wed, 15 Nov 2006, martin f krafft wrote:
> Thus I guess the CPU is asking for retirement. I am just
> double-checking with you guys whether I can be sure that it's only
> the CPU, or whether it could also be the fault of the motherboard...
could be VRMs and/or PSU delivering unclean power... but you'd probably
see other errors in that case too.
-dean