Linux EDAC development
From: "Luck, Tony" <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>,
	Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
	Aristeu Rozanski <aris@redhat.com>,
	Mauro Carvalho Chehab <mchehab@s-opensource.com>,
	linux-edac@vger.kernel.org
Subject: EDAC, i7core/sb_edac/skx_edac: Fix uncorrected error counting
Date: Fri, 28 Sep 2018 14:39:34 -0700
Message-ID: <20180928213934.19890-1-tony.luck@intel.com>

We pick up the count of errors from bits 52:38 of the machine
check bank status register. But this is the count of *corrected*
errors. If we are logging an uncorrected error, the h/w sets this
field to 0, which means that when we call into edac_mc_handle_error()
the EDAC core will carefully add zero to the appropriate uncorrected
error counts.

Set the count to 1 when logging an uncorrected error.

Cc: stable@vger.kernel.org
Signed-off-by: Tony Luck <tony.luck@intel.com>
---

So I was digging around in /sys/devices/system/edac/mc/mc*/* and noticed
that the "ue_count" files all read "0" even after injecting and recovering
from some 2-bit errors.
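
To illustrate the counting problem described in the commit message, here is
a minimal userspace sketch, not the driver code itself. It assumes the
architectural MCi_STATUS layout (UC flag in bit 61, corrected error count in
bits 52:38). get_bitfield() is a stand-in for the drivers' GET_BITFIELD()
macro, and effective_err_count() is a hypothetical helper name used only for
this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Extract bits lo..hi (inclusive) of a 64-bit value. Userspace stand-in
 * for the drivers' GET_BITFIELD(); not the kernel macro itself.
 */
static uint64_t get_bitfield(uint64_t v, unsigned int lo, unsigned int hi)
{
	assert(lo <= hi && hi < 64);

	uint64_t width = hi - lo + 1;
	uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);

	return (v >> lo) & mask;
}

/*
 * Hypothetical helper: the error count to hand to the EDAC core for a
 * given machine check bank status value.
 */
static uint32_t effective_err_count(uint64_t status)
{
	/* Assumption: bit 61 is the UC (uncorrected error) flag. */
	bool uncorrected = get_bitfield(status, 61, 61) != 0;
	/* Bits 52:38 hold the *corrected* error count. */
	uint32_t cnt = (uint32_t)get_bitfield(status, 38, 52);

	/*
	 * The h/w reports 0 in the count field for an uncorrected error,
	 * so count the event itself as one error.
	 */
	if (uncorrected)
		cnt = 1;

	return cnt;
}
```

With a status value that has the UC bit set and a zero count field, the
helper returns 1 rather than 0. For a corrected error it returns the
hardware count unchanged.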

 drivers/edac/i7core_edac.c | 1 +
 drivers/edac/sb_edac.c     | 1 +
 drivers/edac/skx_edac.c    | 1 +
 3 files changed, 3 insertions(+)

diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index 8e120bf60624..f1d19504a028 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -1711,6 +1711,7 @@ static void i7core_mce_output_error(struct mem_ctl_info *mci,
 	u32 errnum = find_first_bit(&error, 32);
 
 	if (uncorrected_error) {
+		core_err_cnt = 1;
 		if (ripv)
 			tp_event = HW_EVENT_ERR_FATAL;
 		else
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 07726fb00321..72cea3cb8622 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -2888,6 +2888,7 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 		recoverable = GET_BITFIELD(m->status, 56, 56);
 
 	if (uncorrected_error) {
+		core_err_cnt = 1;
 		if (ripv) {
 			type = "FATAL";
 			tp_event = HW_EVENT_ERR_FATAL;
diff --git a/drivers/edac/skx_edac.c b/drivers/edac/skx_edac.c
index fae095162c01..3c5c95428f1d 100644
--- a/drivers/edac/skx_edac.c
+++ b/drivers/edac/skx_edac.c
@@ -959,6 +959,7 @@ static void skx_mce_output_error(struct mem_ctl_info *mci,
 	recoverable = GET_BITFIELD(m->status, 56, 56);
 
 	if (uncorrected_error) {
+		core_err_cnt = 1;
 		if (ripv) {
 			type = "FATAL";
 			tp_event = HW_EVENT_ERR_FATAL;
