From: Borislav Petkov <bp@amd64.org>
To: Kevin Bowling <kevin.bowling@kev009.com>
Cc: "bluesmoke-devel@lists.sourceforge.net"
<bluesmoke-devel@lists.sourceforge.net>
Subject: Re: Interpreting EDAC errors
Date: Mon, 20 Jun 2011 09:34:22 +0200
Message-ID: <20110620073422.GB9070@aftab>
In-Reply-To: <BANLkTin-O8co99AnG+KT3+CUGmgaudRvYA@mail.gmail.com>
Hi,
On Mon, Jun 20, 2011 at 12:57:26AM -0400, Kevin Bowling wrote:
> Hello,
>
> I've been seeing the following errors from the EDAC system. I'm not
> quite sure how to associate the output from edac-util to physical
> DIMMs. How do we account for multi-rank DIMMs, interleaving, NUMA,
> etc?
Judging by the mainboard, this is a dual socket Magny-Cours. A couple of
things:
* interpreting DRAM ECC errors is still suboptimal and we're working on
it. I'll try to come up with an interim solution to make the decoded
error info a bit more understandable.
* you have one single-bit error which got corrected by the memory
controller on 4 DIMMs over the current system uptime, so I wouldn't
worry too much. I would monitor the DIMMs though and take action only if
those error rates start to grow over time.
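One way to watch for growth is to keep snapshots of the edac-util output and compare them. The following is only a minimal sketch, assuming the line format edac-util prints (e.g. "mc0: csrow3: ch0: 1 Corrected Errors"); the function names `parse_edac_util` and `grown` are made up for illustration and are not part of edac-utils.

```python
import re

# Matches lines such as "mc0: csrow3: ch0: 1 Corrected Errors".
LINE_RE = re.compile(r"mc(\d+): csrow(\d+): ch(\d+): (\d+) Corrected Errors")


def parse_edac_util(text):
    """Return {(mc, csrow, channel): corrected_error_count} for one snapshot."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            mc, csrow, channel, count = map(int, match.groups())
            counts[(mc, csrow, channel)] = count
    return counts


def grown(before, after):
    """Return locations whose count increased between snapshots, with the delta."""
    return {
        location: count - before.get(location, 0)
        for location, count in after.items()
        if count > before.get(location, 0)
    }
```

Feeding it two snapshots taken some time apart gives the locations worth a closer look. The counts are per uptime, so snapshots taken across a reboot are not comparable.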
You have 4 8G DIMMs per node but I don't know their rank
count so please take the below with a grain of salt. Wait,
http://www.alldatasheet.com/datasheet-pdf/pdf/332888/HYNIX/HMT31GR7BFR4C-H9.html
says that yours are actually dual-ranked.
Btw, kernel dmesg output of EDAC should help to pinpoint them better.
> root@PM-LAS-PROD-0:~# edac-util
> mc0: csrow3: ch0: 1 Corrected Errors
This should be P1_DIMM1A if your DIMMs are quad-ranked, P1_DIMM2A if
dual-ranked.
> mc1: csrow2: ch0: 1 Corrected Errors
P1_DIMM3A or P1_DIMM4A as above. Also, I'm assuming that the increasing
nomenclature in the silkscreen labeling is mapping the memory
controllers in the same way, i.e.:
mc0 -> 1A, 2A
mc1 -> 3A, 4A
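The arithmetic behind the two guesses above can be sketched as follows. This is only an illustration under assumptions that are not confirmed for this board: each rank shows up as one csrow, the ranks of a DIMM sit on consecutive csrows, and each of mc0/mc1 serves two "A" slots on socket P1 as in the mapping above. The function name `guess_p1_dimm` is hypothetical, and this is not a general decoder.

```python
def guess_p1_dimm(mc, csrow, ranks_per_dimm):
    """Guess the silkscreen label for a csrow on mc0 or mc1 (socket P1 only).

    Assumes one csrow per rank and consecutive csrows per DIMM; both are
    assumptions for illustration, not facts about this mainboard.
    """
    if mc not in (0, 1):
        raise ValueError("only the mc0/mc1 -> P1 mapping is covered")
    dimms_per_mc = 2  # assumed mapping: mc0 -> 1A, 2A; mc1 -> 3A, 4A
    slot_in_mc = csrow // ranks_per_dimm
    if slot_in_mc >= dimms_per_mc:
        raise ValueError("csrow is outside the assumed two slots per controller")
    return f"P1_DIMM{mc * dimms_per_mc + slot_in_mc + 1}A"
```

With ranks_per_dimm=2 (dual-ranked) csrow3 on mc0 lands on the second slot, with ranks_per_dimm=4 (quad-ranked) it lands on the first one, which is where the two alternatives come from.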
> mc2: csrow3: ch0: 1 Corrected Errors
> mc2: csrow3: ch1: 1 Corrected Errors
This looks like P2_DIMM3A
So, yeah, it is suboptimal and it needs fixing, I know.
HTH.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551