From: Borislav Petkov <bp@amd64.org>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Borislav Petkov <bp@amd64.org>,
Mauro Carvalho Chehab <mchehab@redhat.com>,
Ingo Molnar <mingo@elte.hu>,
EDAC devel <linux-edac@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/3] mce: Add a msg string to the MCE tracepoint
Date: Wed, 29 Feb 2012 18:16:26 +0100
Message-ID: <20120229171626.GJ21224@aftab>
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F040115@ORSMSX104.amr.corp.intel.com>

On Wed, Feb 29, 2012 at 04:58:09PM +0000, Luck, Tony wrote:
> > - severity: No real need for it. If the error is severe enough, the
> > kernel handles it automatically, i.e. memory poisoning and recovery. In
> > all the other cases it is not severe enough.
>
> We'll never see fatal errors via the perf/tracepoint (no way the RAS daemon
> will run to pull them). But we will see both corrected error chatter and
> recovered uncorrectable errors. I would like to be able to tell these apart.
> Corrected errors in small doses are normal and don't require any
> action beyond logging so you can see whether there are enough to cross
> a threshold and cause alarm. Recovered uncorrectable errors are going
> to be much rarer, and I think deserve closer scrutiny - even when there
> is just one of them.
> If you drop the severity field, is there some other way to make this
> distinction?

Err, MCi_STATUS bits like bit 55 (Action Required) and 56 (Signaled #MC)
in your case...?
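
As a rough sketch of that check, assuming the Intel layout of MCi_STATUS
(VAL = bit 63, UC = bit 61, PCC = bit 57, S = bit 56, AR = bit 55) and
assuming the S and AR bits are actually implemented on the machine. The
enum and the function name are made up for illustration, they are not an
existing kernel interface:

```c
#include <stdint.h>

/* Bit positions as in the Intel definition of IA32_MCi_STATUS. */
#define MCI_STATUS_VAL (1ULL << 63) /* register holds a valid error */
#define MCI_STATUS_UC  (1ULL << 61) /* error was not corrected */
#define MCI_STATUS_PCC (1ULL << 57) /* processor context corrupt */
#define MCI_STATUS_S   (1ULL << 56) /* signaled via #MC */
#define MCI_STATUS_AR  (1ULL << 55) /* action required */

/* Hypothetical classes a userspace consumer might want to tell apart. */
enum mce_class {
	MCE_INVALID,            /* VAL clear, nothing logged */
	MCE_CORRECTED,          /* UC clear */
	MCE_UC_NO_ACTION,       /* UC set, not signaled */
	MCE_UC_ACTION_OPTIONAL, /* UC set, signaled, AR clear */
	MCE_UC_ACTION_REQUIRED, /* UC set, signaled, AR set */
	MCE_FATAL               /* UC and PCC set */
};

/* Hypothetical helper: derive the class from the raw status value alone. */
static enum mce_class classify_mci_status(uint64_t status)
{
	if (!(status & MCI_STATUS_VAL))
		return MCE_INVALID;
	if (!(status & MCI_STATUS_UC))
		return MCE_CORRECTED;
	if (status & MCI_STATUS_PCC)
		return MCE_FATAL;
	/* AR without S is not a defined combination; treat it as not signaled. */
	if (!(status & MCI_STATUS_S))
		return MCE_UC_NO_ACTION;
	return (status & MCI_STATUS_AR) ? MCE_UC_ACTION_REQUIRED
					: MCE_UC_ACTION_OPTIONAL;
}
```

With something like this, a daemon reading the raw status from the
tracepoint could flag every recovered uncorrectable error individually and
only count the corrected ones against a threshold.
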
> > - silkscreen_label: <sarcasm> yeah, I'm getting, say, a Data
> > Cache error during an L1 linefill from L2, what the f*ck does the
> > silkscreen label mean for such an error?! Well, nobody knows wtf it
> > means!</sarcasm>
>
> Cache error should point to a cpu socket - I'd like to have a silk
> screen label for that (are they numbered "0, 1, 2 ..." on the motherboard
> or "1, 2, 3 ..."?) No idea where we'd get that information from. dmidecode
> shows "Socket Designation: CPU 1" (and "2") for my current Sandy Bridge
> system. I'd have to pull the system apart to see if those are helpful
> in identifying which physical cpu is which.

First of all, the silkscreen label denotes DIMM slots in this context,
AFAICT. Concerning CPU sockets, I'm not aware of a method to read out
the silkscreen labels at the CPU sockets, are you? Or am I missing
something?

IOW, we want to assume that cores 0, 1, 2 ... k-1 are on node 0; k, k+1
... 2k-1 belong to node 1, etc., where k is the number of cores on a
socket and thus we have a regular core enumeration on the box.
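
Under that assumption the mapping is a plain integer division. A minimal
sketch; core_to_node() is a hypothetical name, and the regular enumeration
is exactly the assumption stated above, not something the hardware
guarantees:

```c
/*
 * Hypothetical helper: node (socket) a core belongs to, assuming cores
 * 0..k-1 sit on node 0, k..2k-1 on node 1, and so on.
 * Returns -1 for a negative core number or a non-positive k.
 */
static int core_to_node(int core, int cores_per_socket)
{
	if (core < 0 || cores_per_socket <= 0)
		return -1;
	return core / cores_per_socket;
}
```

For k = 8, core 7 lands on node 0 and core 8 on node 1.
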
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551