From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752639AbcD3Mff (ORCPT ); Sat, 30 Apr 2016 08:35:35 -0400
Received: from mail.skyhub.de ([78.46.96.112]:40520 "EHLO mail.skyhub.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751257AbcD3MeC (ORCPT ); Sat, 30 Apr 2016 08:34:02 -0400
From: Borislav Petkov
To: Ingo Molnar
Cc: Tony Luck, LKML
Subject: [PATCH 2/7] x86/mce: Grade uncorrected errors for SMCA-enabled systems
Date: Sat, 30 Apr 2016 14:33:52 +0200
Message-Id: <1462019637-16474-3-git-send-email-bp@alien8.de>
X-Mailer: git-send-email 2.7.3
In-Reply-To: <1462019637-16474-1-git-send-email-bp@alien8.de>
References: <1462019637-16474-1-git-send-email-bp@alien8.de>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Aravind Gopalakrishnan

For upcoming processors with Scalable MCA feature, we need to check the
"succor" CPUID bit and the TCC bit in the MCx_STATUS register in order
to grade an MCE's severity.

Signed-off-by: Aravind Gopalakrishnan
Cc: Aravind Gopalakrishnan
Cc: Tony Luck
Cc: linux-edac
Cc: x86-ml
Link: http://lkml.kernel.org/r/1459886686-13977-3-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Yazen Ghannam
[ Simplify code flow, shorten comments. ]
Signed-off-by: Borislav Petkov
---
 arch/x86/kernel/cpu/mcheck/mce-severity.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index 5119766d9889..631356c8cca4 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -204,6 +204,33 @@ static int error_context(struct mce *m)
 	return IN_KERNEL;
 }
 
+static int mce_severity_amd_smca(struct mce *m, int err_ctx)
+{
+	u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
+	u32 low, high;
+
+	/*
+	 * We need to look at the following bits:
+	 * - "succor" bit (data poisoning support), and
+	 * - TCC bit (Task Context Corrupt)
+	 * in MCi_STATUS to determine error severity.
+	 */
+	if (!mce_flags.succor)
+		return MCE_PANIC_SEVERITY;
+
+	if (rdmsr_safe(addr, &low, &high))
+		return MCE_PANIC_SEVERITY;
+
+	/* TCC (Task context corrupt). If set and if IN_KERNEL, panic. */
+	if ((low & MCI_CONFIG_MCAX) &&
+	    (m->status & MCI_STATUS_TCC) &&
+	    (err_ctx == IN_KERNEL))
+		return MCE_PANIC_SEVERITY;
+
+	/* ...otherwise invoke hwpoison handler. */
+	return MCE_AR_SEVERITY;
+}
+
 /*
  * See AMD Error Scope Hierarchy table in a newer BKDG. For example
  * 49125_15h_Models_30h-3Fh_BKDG.pdf, section "RAS Features"
@@ -225,6 +252,9 @@ static int mce_severity_amd(struct mce *m, int tolerant, char **msg, bool is_exc
 	 * to at least kill process to prolong system operation.
 	 */
 	if (mce_flags.overflow_recov) {
+		if (mce_flags.smca)
+			return mce_severity_amd_smca(m, ctx);
+
 		/* software can try to contain */
 		if (!(m->mcgstatus & MCG_STATUS_RIPV) && (ctx == IN_KERNEL))
 			return MCE_PANIC_SEVERITY;
-- 
2.7.3
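
For readers who want to check the grading order in mce_severity_amd_smca()
outside the kernel, the decision can be modeled as a small userspace
function. This is only a sketch under stated assumptions: struct
smca_inputs, grade_smca(), the SEV_*/CTX_* enums and the SKETCH_* bit
values are hypothetical stand-ins invented for illustration, and the
config_read_ok flag merely models rdmsr_safe() returning 0. None of these
are kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for MCE_AR_SEVERITY / MCE_PANIC_SEVERITY. */
enum severity { SEV_AR, SEV_PANIC };

/* Hypothetical stand-ins for the result of error_context(). */
enum context { CTX_IN_USER, CTX_IN_KERNEL };

/* Assumed bit positions, for illustration only. */
#define SKETCH_CONFIG_MCAX (1U << 0)    /* models MCI_CONFIG_MCAX */
#define SKETCH_STATUS_TCC  (1ULL << 55) /* models MCI_STATUS_TCC */

struct smca_inputs {
	bool succor;          /* models mce_flags.succor */
	bool config_read_ok;  /* models rdmsr_safe() returning 0 */
	uint32_t config_low;  /* low half of the bank's CONFIG MSR */
	uint64_t status;      /* models m->status */
	enum context ctx;     /* models err_ctx */
};

/* Same order of checks as the patch: each early exit grades as panic. */
static enum severity grade_smca(const struct smca_inputs *in)
{
	/* No data poisoning support: the error cannot be contained. */
	if (!in->succor)
		return SEV_PANIC;

	/* CONFIG MSR unreadable: be conservative. */
	if (!in->config_read_ok)
		return SEV_PANIC;

	/* Task context corrupt while in kernel context: panic. */
	if ((in->config_low & SKETCH_CONFIG_MCAX) &&
	    (in->status & SKETCH_STATUS_TCC) &&
	    in->ctx == CTX_IN_KERNEL)
		return SEV_PANIC;

	/* Everything else is handed to the recovery path. */
	return SEV_AR;
}
```

In this model, a TCC error taken in user context grades as action
required rather than panic, which matches the "...otherwise invoke
hwpoison handler" fallthrough in the patch.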