EDAC, i7core/sb_edac/skx_edac: Fix uncorrected error counting

* EDAC, i7core/sb_edac/skx_edac: Fix uncorrected error counting
@ 2018-09-28 21:39 Luck, Tony
From: Luck, Tony @ 2018-09-28 21:39 UTC
  To: Borislav Petkov
  Cc: Tony Luck, Qiuxu Zhuo, Aristeu Rozanski, Mauro Carvalho Chehab,
	linux-edac

We pick up the count of errors from bits 52:38 of the machine
check bank status register. But this is the count of *corrected*
errors. If we are logging an uncorrected error, the h/w sets this
field to 0. Which means that when we call into edac_mc_handle_error()
the EDAC core will carefully add zero to the appropriate uncorrected
error counts.

Cc: stable@vger.kernel.org
Signed-off-by: Tony Luck <tony.luck@intel.com>
---

So I was digging around in /sys/devices/system/edac/mc/mc*/* and noticed
that the "ue_count" files all read "0" even after injecting and recovering
from a 2-bit error.
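
For illustration only, here is a minimal user-space sketch of the counting
logic described in the commit message. It is not the driver code:
get_bitfield() is a stand-in for the drivers' GET_BITFIELD() macro,
edac_report_count() is a hypothetical helper name, and the UC flag position
(bit 61) is taken from the architectural MCi_STATUS layout rather than from
this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for GET_BITFIELD(v, lo, hi): extract bits hi:lo of v. */
static uint64_t get_bitfield(uint64_t v, unsigned int lo, unsigned int hi)
{
	unsigned int width;
	uint64_t mask;

	assert(lo <= hi && hi < 64);
	width = hi - lo + 1;
	/* Avoid an undefined 64-bit shift when the field spans the whole word. */
	mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
	return (v >> lo) & mask;
}

/*
 * Hypothetical helper: how many errors to report for one logged
 * machine check bank status value.
 */
static unsigned int edac_report_count(uint64_t status)
{
	/* Assumption: UC (uncorrected) flag is bit 61 of MCi_STATUS. */
	int uncorrected = (int)get_bitfield(status, 61, 61);
	/* Bits 52:38 hold the *corrected* error count. */
	unsigned int cnt = (unsigned int)get_bitfield(status, 38, 52);

	/*
	 * The h/w leaves the count field at 0 for an uncorrected error,
	 * so count the event itself as exactly one error.
	 */
	if (uncorrected)
		cnt = 1;

	return cnt;
}
```

Without the "cnt = 1" override, a status value with UC set and a zero count
field would report zero errors, which matches the "ue_count" symptom above.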

 drivers/edac/i7core_edac.c | 1 +
 drivers/edac/sb_edac.c     | 1 +
 drivers/edac/skx_edac.c    | 1 +
 3 files changed, 3 insertions(+)

diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index 8e120bf60624..f1d19504a028 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -1711,6 +1711,7 @@ static void i7core_mce_output_error(struct mem_ctl_info *mci,
 	u32 errnum = find_first_bit(&error, 32);
 
 	if (uncorrected_error) {
+		core_err_cnt = 1;
 		if (ripv)
 			tp_event = HW_EVENT_ERR_FATAL;
 		else
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 07726fb00321..72cea3cb8622 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -2888,6 +2888,7 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 		recoverable = GET_BITFIELD(m->status, 56, 56);
 
 	if (uncorrected_error) {
+		core_err_cnt = 1;
 		if (ripv) {
 			type = "FATAL";
 			tp_event = HW_EVENT_ERR_FATAL;
diff --git a/drivers/edac/skx_edac.c b/drivers/edac/skx_edac.c
index fae095162c01..3c5c95428f1d 100644
--- a/drivers/edac/skx_edac.c
+++ b/drivers/edac/skx_edac.c
@@ -959,6 +959,7 @@ static void skx_mce_output_error(struct mem_ctl_info *mci,
 	recoverable = GET_BITFIELD(m->status, 56, 56);
 
 	if (uncorrected_error) {
+		core_err_cnt = 1;
 		if (ripv) {
 			type = "FATAL";
 			tp_event = HW_EVENT_ERR_FATAL;

* EDAC, i7core/sb_edac/skx_edac: Fix uncorrected error counting
@ 2018-09-29  9:03 Borislav Petkov
From: Borislav Petkov @ 2018-09-29  9:03 UTC
  To: Tony Luck; +Cc: Qiuxu Zhuo, Aristeu Rozanski, Mauro Carvalho Chehab, linux-edac

On Fri, Sep 28, 2018 at 02:39:34PM -0700, Tony Luck wrote:
> We pick up the count of errors from bits 52:38 of the machine
> check bank status register. But this is the count of *corrected*
> errors. If we are logging an uncorrected error, the h/w sets this
> field to 0. Which means that when we call into edac_mc_handle_error()
> the EDAC core will carefully add zero to the appropriate uncorrected
> error counts.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> 
> So I was digging around in /sys/devices/system/edac/mc/mc*/* and noticed
> that the "ue_count" files all read "0" even after injecting and recovering
> from a 2-bit error.
> 
>  drivers/edac/i7core_edac.c | 1 +
>  drivers/edac/sb_edac.c     | 1 +
>  drivers/edac/skx_edac.c    | 1 +
>  3 files changed, 3 insertions(+)

Applied, thanks.
