From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 97779C433F5 for ; Fri, 29 Oct 2021 20:58:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7532360FC1 for ; Fri, 29 Oct 2021 20:58:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231569AbhJ2VAf (ORCPT ); Fri, 29 Oct 2021 17:00:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49494 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231484AbhJ2VAe (ORCPT ); Fri, 29 Oct 2021 17:00:34 -0400 Received: from scorn.kernelslacker.org (scorn.kernelslacker.org [IPv6:2600:3c03:e000:2fb::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 793F2C061570 for ; Fri, 29 Oct 2021 13:58:05 -0700 (PDT) Received: from [2601:196:4600:6634:ae9e:17ff:feb7:72ca] (helo=wopr.kernelslacker.org) by scorn.kernelslacker.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1mgYwq-006YvE-2T; Fri, 29 Oct 2021 16:58:00 -0400 Received: by wopr.kernelslacker.org (Postfix, from userid 1026) id 50F65560090; Fri, 29 Oct 2021 16:57:59 -0400 (EDT) Date: Fri, 29 Oct 2021 16:57:59 -0400 From: Dave Jones To: Tony Luck Cc: Linux Kernel Subject: mce: Add errata workaround for Skylake SKX37 Message-ID: <20211029205759.GA7385@codemonkey.org.uk> Mail-Followup-To: Dave Jones , Tony Luck , Linux Kernel MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.10.1 (2018-07-13) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Errata SKX37 is word-for-word identical to the other errata listed in this workaround. I happened to notice this after investigating a CMCI storm on a Skylake host. While I can't confirm this was the root cause, spurious corrected errors does sound like a likely suspect. Signed-off-by: Dave Jones diff --git arch/x86/kernel/cpu/mce/intel.c arch/x86/kernel/cpu/mce/intel.c index acfd5d9f93c6..bb9a46a804bf 100644 --- arch/x86/kernel/cpu/mce/intel.c +++ arch/x86/kernel/cpu/mce/intel.c @@ -547,12 +547,13 @@ bool intel_filter_mce(struct mce *m) { struct cpuinfo_x86 *c = &boot_cpu_data; - /* MCE errata HSD131, HSM142, HSW131, BDM48, and HSM142 */ + /* MCE errata HSD131, HSM142, HSW131, BDM48, HSM142 and SKX37 */ if ((c->x86 == 6) && ((c->x86_model == INTEL_FAM6_HASWELL) || (c->x86_model == INTEL_FAM6_HASWELL_L) || (c->x86_model == INTEL_FAM6_BROADWELL) || - (c->x86_model == INTEL_FAM6_HASWELL_G)) && + (c->x86_model == INTEL_FAM6_HASWELL_G) || + (c->x86_model == INTEL_FAM6_SKYLAKE_X)) && (m->bank == 0) && ((m->status & 0xa0000000ffffffff) == 0x80000000000f0005)) return true;