From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-2.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,
	MAILING_LIST_MULTI,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 0E670C6778A
	for ; Sun, 1 Jul 2018 17:22:09 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id C518E25650
	for ; Sun, 1 Jul 2018 17:22:08 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C518E25650
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none)
	header.from=linuxfoundation.org
Authentication-Results: mail.kernel.org; spf=none
	smtp.mailfrom=linux-kernel-owner@vger.kernel.org
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S965993AbeGARWG (ORCPT ); Sun, 1 Jul 2018 13:22:06 -0400
Received: from mail.linuxfoundation.org ([140.211.169.12]:36658 "EHLO
	mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1031113AbeGAQiL (ORCPT ); Sun, 1 Jul 2018 12:38:11 -0400
Received: from localhost (LFbn-1-12247-202.w90-92.abo.wanadoo.fr [90.92.61.202])
	by mail.linuxfoundation.org (Postfix) with ESMTPSA id 01EE0AA6;
	Sun, 1 Jul 2018 16:38:10 +0000 (UTC)
From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Tony Luck, Borislav Petkov,
	Thomas Gleixner, Ashok Raj, Dan Williams, Qiuxu Zhuo, linux-edac
Subject: [PATCH 4.17 008/220] x86/mce: Fix incorrect "Machine check from unknown source" message
Date: Sun, 1 Jul 2018 18:20:32 +0200
Message-Id:
 <20180701160908.665706753@linuxfoundation.org>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180701160908.272447118@linuxfoundation.org>
References: <20180701160908.272447118@linuxfoundation.org>
User-Agent: quilt/0.65
X-stable: review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

4.17-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tony Luck <tony.luck@intel.com>

commit 40c36e2741d7fe1e66d6ec55477ba5fd19c9c5d2 upstream.

Some injection testing resulted in the following console log:

  mce: [Hardware Error]: CPU 22: Machine Check Exception: f Bank 1: bd80000000100134
  mce: [Hardware Error]: RIP 10:<ffffffffc05292dd> {pmem_do_bvec+0x11d/0x330 [nd_pmem]}
  mce: [Hardware Error]: TSC c51a63035d52 ADDR 3234bc4000 MISC 88
  mce: [Hardware Error]: PROCESSOR 0:50654 TIME 1526502199 SOCKET 0 APIC 38 microcode 2000043
  mce: [Hardware Error]: Run the above through 'mcelog --ascii'
  Kernel panic - not syncing: Machine check from unknown source

This confused everybody because the first line quite clearly shows
that we found a logged error in "Bank 1", while the last line says
"unknown source".

The problem is that the Linux code doesn't do the right thing
for a local machine check that results in a fatal error.

It turns out that we know very early in the handler whether the
machine check is fatal. The call to mce_no_way_out() has checked
all the banks for the CPU that took the local machine check. If
it says we must crash, we can do so right away with the right
messages.

We do scan all the banks again. This means that we might initially
not see a problem, but during the second scan find something fatal.
If this happens we print a slightly different message (so I can
see if it actually ever happens).

[ bp: Remove unneeded severity assignment. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org # 4.2
Link: http://lkml.kernel.org/r/52e049a497e86fd0b71c529651def8871c804df0.1527283897.git.tony.luck@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/mcheck/mce.c |   26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1205,13 +1205,18 @@ void do_machine_check(struct pt_regs *re
 		lmce = m.mcgstatus & MCG_STATUS_LMCES;
 
 	/*
+	 * Local machine check may already know that we have to panic.
+	 * Broadcast machine check begins rendezvous in mce_start()
 	 * Go through all banks in exclusion of the other CPUs. This way we
 	 * don't report duplicated events on shared banks because the first one
-	 * to see it will clear it. If this is a Local MCE, then no need to
-	 * perform rendezvous.
+	 * to see it will clear it.
 	 */
-	if (!lmce)
+	if (lmce) {
+		if (no_way_out)
+			mce_panic("Fatal local machine check", &m, msg);
+	} else {
 		order = mce_start(&no_way_out);
+	}
 
 	for (i = 0; i < cfg->banks; i++) {
 		__clear_bit(i, toclear);
@@ -1287,12 +1292,17 @@ void do_machine_check(struct pt_regs *re
 			no_way_out = worst >= MCE_PANIC_SEVERITY;
 	} else {
 		/*
-		 * Local MCE skipped calling mce_reign()
-		 * If we found a fatal error, we need to panic here.
+		 * If there was a fatal machine check we should have
+		 * already called mce_panic earlier in this function.
+		 * Since we re-read the banks, we might have found
+		 * something new. Check again to see if we found a
+		 * fatal error. We call "mce_severity()" again to
+		 * make sure we have the right "msg".
 		 */
-		 if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3)
-			mce_panic("Machine check from unknown source",
-				NULL, NULL);
+		if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) {
+			mce_severity(&m, cfg->tolerant, &msg, true);
+			mce_panic("Local fatal machine check!", &m, msg);
+		}
 	}
 
 	/*