stable.vger.kernel.org archive mirror
From: "Luck, Tony" <tony.luck@intel.com>
To: gregkh@linuxfoundation.org
Cc: ashok.raj@intel.com, bp@suse.de, dan.j.williams@intel.com,
	linux-edac@vger.kernel.org, qiuxu.zhuo@intel.com,
	tglx@linutronix.de, stable@vger.kernel.org
Subject: Re: FAILED: patch "[PATCH] x86/mce: Fix incorrect "Machine check from unknown source"" failed to apply to 4.4-stable tree
Date: Thu, 28 Jun 2018 15:09:31 -0700
Message-ID: <20180628220931.GA569@agluck-desk>
In-Reply-To: <1530151642162195@kroah.com>

On Thu, Jun 28, 2018 at 11:07:22AM +0900, gregkh@linuxfoundation.org wrote:
> 
> The patch below does not apply to the 4.4-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@vger.kernel.org>.
> 
> thanks,

This patch relies on:

	3acb431b84d8 ("x86/mce: Detect local MCEs properly")

Cherry-pick that first, and fix up the trivial merge conflict around the
change that initializes "lmce = 1;" instead of "lmce = 0;".

Then this patch will apply cleanly.

-Tony
> 
> ------------------ original commit in Linus's tree ------------------
> 
> From 40c36e2741d7fe1e66d6ec55477ba5fd19c9c5d2 Mon Sep 17 00:00:00 2001
> From: Tony Luck <tony.luck@intel.com>
> Date: Fri, 22 Jun 2018 11:54:23 +0200
> Subject: [PATCH] x86/mce: Fix incorrect "Machine check from unknown source"
>  message
> 
> Some injection testing resulted in the following console log:
> 
>   mce: [Hardware Error]: CPU 22: Machine Check Exception: f Bank 1: bd80000000100134
>   mce: [Hardware Error]: RIP 10:<ffffffffc05292dd> {pmem_do_bvec+0x11d/0x330 [nd_pmem]}
>   mce: [Hardware Error]: TSC c51a63035d52 ADDR 3234bc4000 MISC 88
>   mce: [Hardware Error]: PROCESSOR 0:50654 TIME 1526502199 SOCKET 0 APIC 38 microcode 2000043
>   mce: [Hardware Error]: Run the above through 'mcelog --ascii'
>   Kernel panic - not syncing: Machine check from unknown source
> 
> This confused everybody because the first line quite clearly shows
> that we found a logged error in "Bank 1", while the last line says
> "unknown source".
> 
> The problem is that the Linux code doesn't do the right thing
> for a local machine check that results in a fatal error.
> 
> It turns out that we know very early in the handler whether the
> machine check is fatal. The call to mce_no_way_out() has checked
> all the banks for the CPU that took the local machine check. If
> it says we must crash, we can do so right away with the right
> messages.
> 
> We do scan all the banks again. This means that we might initially
> not see a problem, but during the second scan find something fatal.
> If this happens we print a slightly different message (so I can
> see if it actually ever happens).
> 
> [ bp: Remove unneeded severity assignment. ]
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Cc: linux-edac <linux-edac@vger.kernel.org>
> Cc: stable@vger.kernel.org # 4.2
> Link: http://lkml.kernel.org/r/52e049a497e86fd0b71c529651def8871c804df0.1527283897.git.tony.luck@intel.com
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 7e6f51a9d917..e93670d736a6 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1207,13 +1207,18 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  		lmce = m.mcgstatus & MCG_STATUS_LMCES;
>  
>  	/*
> +	 * Local machine check may already know that we have to panic.
> +	 * Broadcast machine check begins rendezvous in mce_start()
>  	 * Go through all banks in exclusion of the other CPUs. This way we
>  	 * don't report duplicated events on shared banks because the first one
> -	 * to see it will clear it. If this is a Local MCE, then no need to
> -	 * perform rendezvous.
> +	 * to see it will clear it.
>  	 */
> -	if (!lmce)
> +	if (lmce) {
> +		if (no_way_out)
> +			mce_panic("Fatal local machine check", &m, msg);
> +	} else {
>  		order = mce_start(&no_way_out);
> +	}
>  
>  	for (i = 0; i < cfg->banks; i++) {
>  		__clear_bit(i, toclear);
> @@ -1289,12 +1294,17 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  			no_way_out = worst >= MCE_PANIC_SEVERITY;
>  	} else {
>  		/*
> -		 * Local MCE skipped calling mce_reign()
> -		 * If we found a fatal error, we need to panic here.
> +		 * If there was a fatal machine check we should have
> +		 * already called mce_panic earlier in this function.
> +		 * Since we re-read the banks, we might have found
> +		 * something new. Check again to see if we found a
> +		 * fatal error. We call "mce_severity()" again to
> +		 * make sure we have the right "msg".
>  		 */
> -		 if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3)
> -			mce_panic("Machine check from unknown source",
> -				NULL, NULL);
> +		if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) {
> +			mce_severity(&m, cfg->tolerant, &msg, true);
> +			mce_panic("Local fatal machine check!", &m, msg);
> +		}
>  	}
>  
>  	/*
> 
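
For readers following the control flow of the fix, here is a minimal user-space sketch of the decision the patched do_machine_check() makes on the local path. It is not kernel code: the enum, the function name local_mce_decision(), and the rescan_fatal parameter are hypothetical names made up for illustration. rescan_fatal stands in for "worst >= MCE_PANIC_SEVERITY" after the second bank scan, and tolerant stands in for mca_cfg.tolerant. The broadcast path, which goes through the rendezvous in mce_start(), is deliberately not modeled.

```c
#include <stdbool.h>

/* Hypothetical labels for this sketch; none of these are kernel identifiers. */
enum mce_decision {
	MCE_DECISION_NONE,		/* no panic decided on the local path */
	MCE_DECISION_PANIC_EARLY,	/* "Fatal local machine check" */
	MCE_DECISION_PANIC_RESCAN	/* "Local fatal machine check!" */
};

/*
 * lmce:         this is a local machine check (no rendezvous)
 * no_way_out:   the first bank scan already found a fatal error
 * rescan_fatal: assumption, stands in for worst >= MCE_PANIC_SEVERITY
 *               after the second scan of the banks
 * tolerant:     assumption, stands in for mca_cfg.tolerant
 */
static enum mce_decision local_mce_decision(bool lmce, bool no_way_out,
					     bool rescan_fatal, int tolerant)
{
	/* Broadcast case: handled by the rendezvous, not modeled here. */
	if (!lmce)
		return MCE_DECISION_NONE;

	/* First hunk of the patch: panic right away with the bank data. */
	if (no_way_out)
		return MCE_DECISION_PANIC_EARLY;

	/* Second hunk: the re-read of the banks found something new. */
	if (rescan_fatal && tolerant < 3)
		return MCE_DECISION_PANIC_RESCAN;

	return MCE_DECISION_NONE;
}
```

In this model, a local machine check with no_way_out set returns MCE_DECISION_PANIC_EARLY without looking at the second scan, which mirrors the first hunk of the patch. The second message is reachable only when the first scan saw nothing fatal.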
