* EDAC, i7core/sb_edac/skx_edac: Fix uncorrected error counting
@ 2018-09-28 21:39 Luck, Tony
From: Luck, Tony @ 2018-09-28 21:39 UTC (permalink / raw)
To: Borislav Petkov
Cc: Tony Luck, Qiuxu Zhuo, Aristeu Rozanski, Mauro Carvalho Chehab,
linux-edac
We pick up the count of errors from bits 52:38 of the machine
check bank status register. But this is the count of *corrected*
errors. If we are logging an uncorrected error, the h/w sets this
field to 0, which means that when we call into edac_mc_handle_error()
the EDAC core will carefully add zero to the appropriate uncorrected
error counts.
Cc: stable@vger.kernel.org
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
So I was digging around in /sys/devices/system/edac/mc/mc*/* and noticed
that the "ue_count" files all read "0" even after injecting and recovering
from a 2-bit error.
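
To make the counting problem above concrete, here is a minimal standalone
sketch in C. It is not driver code: the helper names get_bitfield() and
edac_error_count() are hypothetical, with the first loosely modeled on how
the drivers use GET_BITFIELD(). It assumes the uncorrected-error flag is
bit 61 (UC) of the bank status register and takes the corrected error
count from bits 52:38, as described in the commit message.

```c
#include <stdint.h>

/*
 * Hypothetical helper: extract bits hi:lo (inclusive) of a 64-bit value.
 * The mask is built by shifting ~0 right so a full 64-bit field is safe.
 */
static uint64_t get_bitfield(uint64_t v, unsigned int lo, unsigned int hi)
{
	return (v >> lo) & (~0ULL >> (63 - (hi - lo)));
}

/*
 * Hypothetical helper: the error count to hand to the EDAC core for one
 * machine check bank status value.
 */
static unsigned int edac_error_count(uint64_t status)
{
	/* Assumption: bit 61 (UC) flags an uncorrected error. */
	int uncorrected_error = (int)get_bitfield(status, 61, 61);
	/* Bits 52:38 count *corrected* errors only. */
	unsigned int core_err_cnt = (unsigned int)get_bitfield(status, 38, 52);

	/*
	 * For an uncorrected error the count field reads 0, so report one
	 * error explicitly instead of adding zero to the UE counters.
	 */
	if (uncorrected_error)
		core_err_cnt = 1;

	return core_err_cnt;
}
```

With this sketch, a status value with only the UC bit set yields a count
of 1, while a corrected-error status still reports whatever bits 52:38
hold.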
drivers/edac/i7core_edac.c | 1 +
drivers/edac/sb_edac.c | 1 +
drivers/edac/skx_edac.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index 8e120bf60624..f1d19504a028 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -1711,6 +1711,7 @@ static void i7core_mce_output_error(struct mem_ctl_info *mci,
 	u32 errnum = find_first_bit(&error, 32);
 
 	if (uncorrected_error) {
+		core_err_cnt = 1;
 		if (ripv)
 			tp_event = HW_EVENT_ERR_FATAL;
 		else
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 07726fb00321..72cea3cb8622 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -2888,6 +2888,7 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	recoverable = GET_BITFIELD(m->status, 56, 56);
 
 	if (uncorrected_error) {
+		core_err_cnt = 1;
 		if (ripv) {
 			type = "FATAL";
 			tp_event = HW_EVENT_ERR_FATAL;
diff --git a/drivers/edac/skx_edac.c b/drivers/edac/skx_edac.c
index fae095162c01..3c5c95428f1d 100644
--- a/drivers/edac/skx_edac.c
+++ b/drivers/edac/skx_edac.c
@@ -959,6 +959,7 @@ static void skx_mce_output_error(struct mem_ctl_info *mci,
 	recoverable = GET_BITFIELD(m->status, 56, 56);
 
 	if (uncorrected_error) {
+		core_err_cnt = 1;
 		if (ripv) {
 			type = "FATAL";
 			tp_event = HW_EVENT_ERR_FATAL;
* EDAC, i7core/sb_edac/skx_edac: Fix uncorrected error counting
@ 2018-09-29 9:03 Borislav Petkov
From: Borislav Petkov @ 2018-09-29 9:03 UTC (permalink / raw)
To: Tony Luck; +Cc: Qiuxu Zhuo, Aristeu Rozanski, Mauro Carvalho Chehab, linux-edac
On Fri, Sep 28, 2018 at 02:39:34PM -0700, Tony Luck wrote:
> We pick up the count of errors from bits 52:38 of the machine
> check bank status register. But this is the count of *corrected*
> errors. If we are logging an uncorrected error, the h/w sets this
> field to 0, which means that when we call into edac_mc_handle_error()
> the EDAC core will carefully add zero to the appropriate uncorrected
> error counts.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>
> So I was digging around in /sys/devices/system/edac/mc/mc*/* and noticed
> that the "ue_count" files all read "0" even after injecting and recovering
> from a 2-bit error.
>
> drivers/edac/i7core_edac.c | 1 +
> drivers/edac/sb_edac.c | 1 +
> drivers/edac/skx_edac.c | 1 +
> 3 files changed, 3 insertions(+)
Applied, thanks.