From: "Luck, Tony" <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>,
Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
Aristeu Rozanski <aris@redhat.com>,
Mauro Carvalho Chehab <mchehab@s-opensource.com>,
linux-edac@vger.kernel.org
Subject: EDAC, i7core/sb_edac/skx_edac: Fix uncorrected error counting
Date: Fri, 28 Sep 2018 14:39:34 -0700
Message-ID: <20180928213934.19890-1-tony.luck@intel.com>
We pick up the count of errors from bits 52:38 of the machine
check bank status register. But this is the count of *corrected*
errors. If we are logging an uncorrected error, the h/w sets this
field to 0, which means that when we call into edac_mc_handle_error()
the EDAC core will carefully add zero to the appropriate uncorrected
error counts.

Set core_err_cnt to 1 for uncorrected errors so that each one is
actually counted.
Cc: stable@vger.kernel.org
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
So I was digging around in /sys/devices/system/edac/mc/mc*/* and noticed
that the "ue_count" files all read "0" even after injecting and recovering
from some 2-bit errors.
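For anyone who wants to see the arithmetic outside the drivers, here is a
minimal userspace sketch of the problem and the fix. It is not the driver
code: get_bitfield() is a stand-in for the drivers' GET_BITFIELD() macro,
errors_to_report() is a hypothetical helper, and the assumption that bit 61
of the status register is the UC (uncorrected) flag comes from the machine
check architecture definition, not from this patch.

```c
#include <stdint.h>

/* Stand-in for the drivers' GET_BITFIELD(v, lo, hi): extract bits lo..hi. */
static inline uint64_t get_bitfield(uint64_t v, unsigned int lo, unsigned int hi)
{
	uint64_t mask = (hi - lo == 63) ? ~0ULL : ((1ULL << (hi - lo + 1)) - 1);

	return (v >> lo) & mask;
}

/*
 * Hypothetical helper: how many errors should be passed to the EDAC core
 * for one machine check bank status value.
 */
static unsigned int errors_to_report(uint64_t status)
{
	/* Assumed layout: bit 61 is UC, bits 52:38 are the corrected count. */
	unsigned int uncorrected = (unsigned int)get_bitfield(status, 61, 61);
	unsigned int cnt = (unsigned int)get_bitfield(status, 38, 52);

	/*
	 * The count field only covers corrected errors and reads 0 when an
	 * uncorrected error is logged, so report exactly one event.
	 */
	if (uncorrected)
		cnt = 1;

	return cnt;
}
```

Without the "cnt = 1" override, a status value with UC set and a zero
count field yields 0, which is what left ue_count stuck at "0".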
drivers/edac/i7core_edac.c | 1 +
drivers/edac/sb_edac.c | 1 +
drivers/edac/skx_edac.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index 8e120bf60624..f1d19504a028 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -1711,6 +1711,7 @@ static void i7core_mce_output_error(struct mem_ctl_info *mci,
u32 errnum = find_first_bit(&error, 32);
if (uncorrected_error) {
+ core_err_cnt = 1;
if (ripv)
tp_event = HW_EVENT_ERR_FATAL;
else
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 07726fb00321..72cea3cb8622 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -2888,6 +2888,7 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
recoverable = GET_BITFIELD(m->status, 56, 56);
if (uncorrected_error) {
+ core_err_cnt = 1;
if (ripv) {
type = "FATAL";
tp_event = HW_EVENT_ERR_FATAL;
diff --git a/drivers/edac/skx_edac.c b/drivers/edac/skx_edac.c
index fae095162c01..3c5c95428f1d 100644
--- a/drivers/edac/skx_edac.c
+++ b/drivers/edac/skx_edac.c
@@ -959,6 +959,7 @@ static void skx_mce_output_error(struct mem_ctl_info *mci,
recoverable = GET_BITFIELD(m->status, 56, 56);
if (uncorrected_error) {
+ core_err_cnt = 1;
if (ripv) {
type = "FATAL";
tp_event = HW_EVENT_ERR_FATAL;