From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-2.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82B30C6778C for ; Sun, 1 Jul 2018 16:25:58 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 432A22518E for ; Sun, 1 Jul 2018 16:25:58 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 432A22518E
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linuxfoundation.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933788AbeGAQZ4 (ORCPT ); Sun, 1 Jul 2018 12:25:56 -0400
Received: from mail.linuxfoundation.org ([140.211.169.12]:33738 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965255AbeGAQZv (ORCPT ); Sun, 1 Jul 2018 12:25:51 -0400
Received: from localhost (LFbn-1-12247-202.w90-92.abo.wanadoo.fr [90.92.61.202]) by mail.linuxfoundation.org (Postfix) with ESMTPSA id 443D3AD8; Sun, 1 Jul 2018 16:25:50 +0000 (UTC)
From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Tony Luck , Borislav Petkov , Thomas Gleixner , Ashok Raj , Dan Williams , Qiuxu Zhuo , linux-edac
Subject: [PATCH 4.9 004/101] x86/mce: Fix incorrect "Machine check from unknown source" message
Date: Sun, 1 Jul 2018 18:20:50 +0200
Message-Id:
 <20180701160757.311573596@linuxfoundation.org>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180701160757.138608453@linuxfoundation.org>
References: <20180701160757.138608453@linuxfoundation.org>
User-Agent: quilt/0.65
X-stable: review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tony Luck <tony.luck@intel.com>

commit 40c36e2741d7fe1e66d6ec55477ba5fd19c9c5d2 upstream.

Some injection testing resulted in the following console log:

  mce: [Hardware Error]: CPU 22: Machine Check Exception: f Bank 1: bd80000000100134
  mce: [Hardware Error]: RIP 10:<ffffffffc05292dd> {pmem_do_bvec+0x11d/0x330 [nd_pmem]}
  mce: [Hardware Error]: TSC c51a63035d52 ADDR 3234bc4000 MISC 88
  mce: [Hardware Error]: PROCESSOR 0:50654 TIME 1526502199 SOCKET 0 APIC 38 microcode 2000043
  mce: [Hardware Error]: Run the above through 'mcelog --ascii'
  Kernel panic - not syncing: Machine check from unknown source

This confused everybody because the first line quite clearly shows
that we found a logged error in "Bank 1", while the last line says
"unknown source".

The problem is that the Linux code doesn't do the right thing
for a local machine check that results in a fatal error.

It turns out that we know very early in the handler whether the
machine check is fatal. The call to mce_no_way_out() has checked
all the banks for the CPU that took the local machine check. If
it says we must crash, we can do so right away with the right
messages.

We do scan all the banks again. This means that we might initially
not see a problem, but during the second scan find something fatal.
If this happens we print a slightly different message (so I can
see if it actually ever happens).

[ bp: Remove unneeded severity assignment. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org # 4.2
Link: http://lkml.kernel.org/r/52e049a497e86fd0b71c529651def8871c804df0.1527283897.git.tony.luck@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/mcheck/mce.c |   26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1140,13 +1140,18 @@ void do_machine_check(struct pt_regs *re
 		lmce = m.mcgstatus & MCG_STATUS_LMCES;
 
 	/*
+	 * Local machine check may already know that we have to panic.
+	 * Broadcast machine check begins rendezvous in mce_start()
 	 * Go through all banks in exclusion of the other CPUs. This way we
 	 * don't report duplicated events on shared banks because the first one
-	 * to see it will clear it. If this is a Local MCE, then no need to
-	 * perform rendezvous.
+	 * to see it will clear it.
 	 */
-	if (!lmce)
+	if (lmce) {
+		if (no_way_out)
+			mce_panic("Fatal local machine check", &m, msg);
+	} else {
 		order = mce_start(&no_way_out);
+	}
 
 	for (i = 0; i < cfg->banks; i++) {
 		__clear_bit(i, toclear);
@@ -1222,12 +1227,17 @@ void do_machine_check(struct pt_regs *re
 			no_way_out = worst >= MCE_PANIC_SEVERITY;
 	} else {
 		/*
-		 * Local MCE skipped calling mce_reign()
-		 * If we found a fatal error, we need to panic here.
+		 * If there was a fatal machine check we should have
+		 * already called mce_panic earlier in this function.
+		 * Since we re-read the banks, we might have found
+		 * something new. Check again to see if we found a
+		 * fatal error. We call "mce_severity()" again to
+		 * make sure we have the right "msg".
 		 */
-		 if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3)
-			mce_panic("Machine check from unknown source",
-				NULL, NULL);
+		if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) {
+			mce_severity(&m, cfg->tolerant, &msg, true);
+			mce_panic("Local fatal machine check!", &m, msg);
+		}
 	}
 
 	/*