public inbox for linux-kernel@vger.kernel.org
From: Tony Luck <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Youquan Song <youquan.song@intel.com>,
	Tony Luck <tony.luck@intel.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] x86/mce: Add Skylake quirk for patrol scrub reported errors
Date: Mon, 15 Jun 2020 11:40:56 -0700
Message-ID: <20200615184056.26531-1-tony.luck@intel.com>

From: Youquan Song <youquan.song@intel.com>

Skylake has a mode where the system administrator can use a BIOS setup
option to request that the memory controller report uncorrected errors
found by the patrol scrubber as corrected.  This results in them being
signalled using CMCI, which is less disruptive than a machine check.

Add a quirk that detects when a "corrected" error is actually a downgraded
uncorrected error, using model-specific checks for the "MSCOD" signature
in MCi_STATUS and for the error having been reported from a memory
controller bank.

Adjust the severity to MCE_AO_SEVERITY so that Linux will try to take
the affected page offline.

[Tony: Wordsmith commit comment]

Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
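For reviewers who want to try the match condition outside the kernel, here
is a minimal userspace sketch of the check the quirk performs. It is not
part of the patch. The helper name is_downgraded_scrub_uce() is made up for
illustration, the GENMASK_ULL() definition is a local stand-in, and the
MCACOD_SCRUB/MCACOD_SCRUBMSK values are assumed to match
arch/x86/include/asm/mce.h and should be checked against your tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's GENMASK_ULL(): bits h..l set, inclusive. */
#define GENMASK_ULL(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

/* Assumed to match arch/x86/include/asm/mce.h; verify against your tree. */
#define MCACOD_SCRUB    0x00C0ULL  /* memory scrubbing error codes 0xC0-0xCF */
#define MCACOD_SCRUBMSK 0xeff0ULL  /* ignores bit 12 and the low nibble */

/* Taken from the patch. */
#define MSCOD_UCE_SCRUB (0x0010ULL << 16)
#define MSCOD_MASK      GENMASK_ULL(31, 16)

static_assert(MSCOD_MASK == 0xffff0000ULL, "MSCOD occupies status bits 31:16");

/* Hypothetical helper: the same three-part condition as the quirk. */
static bool is_downgraded_scrub_uce(uint64_t status, unsigned int bank)
{
	return (status & MCACOD_SCRUBMSK) == MCACOD_SCRUB &&
	       (status & MSCOD_MASK) == MSCOD_UCE_SCRUB &&
	       bank >= 13 && bank <= 18;
}
```

With these definitions, a status value of 0x001000C0 reported from bank 13
matches, while the same value reported from bank 12 does not.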
 arch/x86/kernel/cpu/mce/core.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index e9265e2f28c9..0dbd0a21a0bf 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -123,6 +123,8 @@ static struct irq_work mce_irq_work;
 
 static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
 
+static void no_adjust_mce_log(struct mce *m) {}
+static void (*adjust_mce_log)(struct mce *m) = no_adjust_mce_log;
 /*
  * CPU/chipset specific EDAC code can register a notifier call here to print
  * MCE errors in a human-readable form.
@@ -772,6 +774,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		if (mca_cfg.dont_log_ce && !mce_usable_address(&m))
 			goto clear_it;
 
+		adjust_mce_log(&m);
 		mce_log(&m);
 
 clear_it:
@@ -1640,6 +1643,30 @@ static void quirk_sandybridge_ifu(int bank, struct mce *m, struct pt_regs *regs)
 	m->cs = regs->cs;
 }
 
+/*
+ * Skylake family CPUs have a mode where the user can request that
+ * the memory controller report uncorrected errors found by the patrol
+ * scrubber as corrected (MCI_STATUS_UC == 0). This results in them being
+ * signalled using CMCI, which is less disruptive than a machine check.
+ * The following quirk detects such errors and adjusts the severity.
+ */
+
+#define MSCOD_UCE_SCRUB	(0x0010 << 16) /* UnCorrected Patrol Scrub Error */
+#define MSCOD_MASK	GENMASK_ULL(31, 16)
+
+/*
+ * Check the error code to see if this is an uncorrected patrol
+ * scrub error from one of the memory controller banks. If so,
+ * then adjust the severity level to MCE_AO_SEVERITY.
+ */
+static void quirk_skx_adjust_mce_log(struct mce *m)
+{
+	if (((m->status & MCACOD_SCRUBMSK) == MCACOD_SCRUB) &&
+	    ((m->status & MSCOD_MASK) == MSCOD_UCE_SCRUB) &&
+	    m->bank >= 13 && m->bank <= 18)
+		m->severity = MCE_AO_SEVERITY;
+}
+
 /* Add per CPU specific workarounds here */
 static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 {
@@ -1714,6 +1741,9 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 
 		if (c->x86 == 6 && c->x86_model == 45)
 			quirk_no_way_out = quirk_sandybridge_ifu;
+
+		if (c->x86 == 6 && c->x86_model == INTEL_FAM6_SKYLAKE_X)
+			adjust_mce_log = quirk_skx_adjust_mce_log;
 	}
 
 	if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
-- 
2.21.1

