From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from out1-smtp.messagingengine.com ([66.111.4.25]:49843 "EHLO out1-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932146AbeF1CH0 (ORCPT ); Wed, 27 Jun 2018 22:07:26 -0400
Subject: FAILED: patch "[PATCH] x86/mce: Fix incorrect "Machine check from unknown source"" failed to apply to 4.4-stable tree
To: tony.luck@intel.com, ashok.raj@intel.com, bp@suse.de, dan.j.williams@intel.com, linux-edac@vger.kernel.org, qiuxu.zhuo@intel.com, tglx@linutronix.de
Cc: stable@vger.kernel.org
From: Greg Kroah-Hartman
Date: Thu, 28 Jun 2018 11:07:22 +0900
Message-ID: <1530151642162195@kroah.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ANSI_X3.4-1968
Content-Transfer-Encoding: 8bit
Sender: stable-owner@vger.kernel.org

The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

>From 40c36e2741d7fe1e66d6ec55477ba5fd19c9c5d2 Mon Sep 17 00:00:00 2001
From: Tony Luck <tony.luck@intel.com>
Date: Fri, 22 Jun 2018 11:54:23 +0200
Subject: [PATCH] x86/mce: Fix incorrect "Machine check from unknown source"
 message

Some injection testing resulted in the following console log:

  mce: [Hardware Error]: CPU 22: Machine Check Exception: f Bank 1: bd80000000100134
  mce: [Hardware Error]: RIP 10:<ffffffffc05292dd> {pmem_do_bvec+0x11d/0x330 [nd_pmem]}
  mce: [Hardware Error]: TSC c51a63035d52 ADDR 3234bc4000 MISC 88
  mce: [Hardware Error]: PROCESSOR 0:50654 TIME 1526502199 SOCKET 0 APIC 38 microcode 2000043
  mce: [Hardware Error]: Run the above through 'mcelog --ascii'
  Kernel panic - not syncing: Machine check from unknown source

This confused everybody because the first line quite clearly shows
that we found a logged error in "Bank 1", while the last line says
"unknown source".

The problem is that the Linux code doesn't do the right thing
for a local machine check that results in a fatal error.

It turns out that we know very early in the handler whether the
machine check is fatal. The call to mce_no_way_out() has checked
all the banks for the CPU that took the local machine check. If
it says we must crash, we can do so right away with the right
messages.

We do scan all the banks again. This means that we might initially
not see a problem, but during the second scan find something fatal.
If this happens we print a slightly different message (so I can
see if it actually every happens).

[ bp: Remove unneeded severity assignment. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org # 4.2
Link: http://lkml.kernel.org/r/52e049a497e86fd0b71c529651def8871c804df0.1527283897.git.tony.luck@intel.com

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 7e6f51a9d917..e93670d736a6 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1207,13 +1207,18 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 		lmce = m.mcgstatus & MCG_STATUS_LMCES;
 
 	/*
+	 * Local machine check may already know that we have to panic.
+	 * Broadcast machine check begins rendezvous in mce_start()
 	 * Go through all banks in exclusion of the other CPUs. This way we
 	 * don't report duplicated events on shared banks because the first one
-	 * to see it will clear it. If this is a Local MCE, then no need to
-	 * perform rendezvous.
+	 * to see it will clear it.
 	 */
-	if (!lmce)
+	if (lmce) {
+		if (no_way_out)
+			mce_panic("Fatal local machine check", &m, msg);
+	} else {
 		order = mce_start(&no_way_out);
+	}
 
 	for (i = 0; i < cfg->banks; i++) {
 		__clear_bit(i, toclear);
@@ -1289,12 +1294,17 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 			no_way_out = worst >= MCE_PANIC_SEVERITY;
 	} else {
 		/*
-		 * Local MCE skipped calling mce_reign()
-		 * If we found a fatal error, we need to panic here.
+		 * If there was a fatal machine check we should have
+		 * already called mce_panic earlier in this function.
+		 * Since we re-read the banks, we might have found
+		 * something new. Check again to see if we found a
+		 * fatal error. We call "mce_severity()" again to
+		 * make sure we have the right "msg".
 		 */
-		 if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3)
-			mce_panic("Machine check from unknown source",
-				NULL, NULL);
+		if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) {
+			mce_severity(&m, cfg->tolerant, &msg, true);
+			mce_panic("Local fatal machine check!", &m, msg);
+		}
 	}
 
 	/*