public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@alien8.de>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "Song, Youquan" <youquan.song@intel.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] x86/mce: Add Skylake quirk for patrol scrub reported errors
Date: Fri, 25 Sep 2020 21:19:12 +0200	[thread overview]
Message-ID: <20200925191912.GO16872@zn.tnic> (raw)
In-Reply-To: <20200828202150.GA11854@agluck-desk2.amr.corp.intel.com>

On Fri, Aug 28, 2020 at 01:21:50PM -0700, Luck, Tony wrote:
> +static void adjust_mce_log(struct mce *m)
> +{
> +	struct cpuinfo_x86 *c = &boot_cpu_data;
> +
> +	if (c->x86_vendor == X86_VENDOR_INTEL && c->x86 == 6 &&
> +	    c->x86_model == INTEL_FAM6_SKYLAKE_X && c->x86_stepping >= 4) {
> +		/*
> +		 * Check the error code to see if this is an uncorrected patrol
> +		 * scrub error from one of the memory controller banks. If so,
> +		 * then adjust the severity level to MCE_AO_SEVERITY
> +		 */
> +		if (((m->status & MCACOD_SCRUBMSK) == MCACOD_SCRUB) &&
> +		    ((m->status & MSCOD_MASK) == MSCOD_UCE_SCRUB) &&
> +		    m->bank >= 13 && m->bank <= 18)
> +			m->severity = MCE_AO_SEVERITY;
> +	}
> +}
> +
>  DEFINE_PER_CPU(unsigned, mce_poll_count);
>  
>  /*
> @@ -772,6 +801,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
>  		if (mca_cfg.dont_log_ce && !mce_usable_address(&m))
>  			goto clear_it;
>  
> +		adjust_mce_log(&m);
>  		mce_log(&m);

Coming back to this and looking at it, I can't say that I like it. We're
sticking hooks to look at and massage the logged data everywhere on the
MCE processing path and it is getting really unwieldy.

And after staring at this a bit, it looks like all it wants to do is to
adjust the severity. And we have a severity grading mechanism. So let's
see how ugly it would become if we extended it to check that too.

So how's that below instead?

It builds here, I haven't even thought about testing it and I might've
missed out on some aspects but tbh this looks much better to me. Because
it is not bolted on the handling path but integral part of it.

Thoughts?

---
diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index e1da619add19..8c1a41aa5e40 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -9,9 +9,11 @@
 #include <linux/seq_file.h>
 #include <linux/init.h>
 #include <linux/debugfs.h>
-#include <asm/mce.h>
 #include <linux/uaccess.h>
 
+#include <asm/mce.h>
+#include <asm/intel-family.h>
+
 #include "internal.h"
 
 /*
@@ -40,9 +42,14 @@ static struct severity {
 	unsigned char context;
 	unsigned char excp;
 	unsigned char covered;
+	unsigned char cpu_model;
+	unsigned char cpu_stepping;
+	unsigned char bank_lo, bank_hi;
 	char *msg;
 } severities[] = {
 #define MCESEV(s, m, c...) { .sev = MCE_ ## s ## _SEVERITY, .msg = m, ## c }
+#define BANK_RANGE(l, h) .bank_lo = l, .bank_hi = h
+#define MODEL_STEPPING(m,s) .cpu_model = m, .cpu_stepping = s
 #define  KERNEL		.context = IN_KERNEL
 #define  USER		.context = IN_USER
 #define  KERNEL_RECOV	.context = IN_KERNEL_RECOV
@@ -97,7 +104,10 @@ static struct severity {
 		KEEP, "Corrected error",
 		NOSER, BITCLR(MCI_STATUS_UC)
 		),
-
+	MCESEV(AO, "UnCorrected Patrol Scrub Error",
+		NOSER, MASK(0xffffeff0, 0x001000c0),
+		MODEL_STEPPING(INTEL_FAM6_SKYLAKE_X, 4),BANK_RANGE(13,18)
+	),
 	/*
 	 * known AO MCACODs reported via MCE or CMC:
 	 *
@@ -324,6 +334,12 @@ static int mce_severity_intel(struct mce *m, int tolerant, char **msg, bool is_e
 			continue;
 		if (s->excp && excp != s->excp)
 			continue;
+		if (s->cpu_model && boot_cpu_data.x86_model != s->cpu_model)
+			continue;
+		if (s->cpu_stepping && boot_cpu_data.x86_stepping <= s->cpu_stepping)
+			continue;
+		if (s->bank_lo && (s->bank_lo <= m->bank && m->bank <= s->bank_hi))
+			continue;
 		if (msg)
 			*msg = s->msg;
 		s->covered = 1;

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

  reply	other threads:[~2020-09-25 20:38 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-15 18:40 [PATCH] x86/mce: Add Skylake quirk for patrol scrub reported errors Tony Luck
2020-06-16 19:29 ` Borislav Petkov
2020-06-16 22:33   ` Luck, Tony
2020-06-17  7:41     ` Borislav Petkov
2020-06-17 18:49       ` Luck, Tony
2020-08-28 20:21         ` [PATCH v2] " Luck, Tony
2020-09-25 19:19           ` Borislav Petkov [this message]
2020-09-25 23:06             ` Luck, Tony
2020-09-27 22:19               ` Borislav Petkov
2020-09-30  2:13                 ` [PATCH 0/2] mce severity quirk & cleanup Tony Luck
2020-09-30  2:13                   ` [PATCH 1/2] x86/mce: Add Skylake quirk for patrol scrub reported errors Tony Luck
2020-09-30  5:53                     ` [tip: ras/core] " tip-bot2 for Borislav Petkov
2020-09-30  2:13                   ` [PATCH 2/2] x86/mce: Drop AMD specific "DEFERRED" case from Intel severity rule list Tony Luck
2020-09-30  5:53                     ` [tip: ras/core] x86/mce: Drop AMD-specific " tip-bot2 for Tony Luck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200925191912.GO16872@zn.tnic \
    --to=bp@alien8.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    --cc=youquan.song@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox