[PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

public inbox for linux-kernel@vger.kernel.org

* [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec
@ 2015-02-27  4:58 Naoya Horiguchi
  2015-02-27  4:58 ` [PATCH v2 2/2] x86: mce: comment about MCE synchronization timeout on definition of tolerant Naoya Horiguchi
  2015-02-27 11:09 ` [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec Prarit Bhargava
  0 siblings, 2 replies; 14+ messages in thread
From: Naoya Horiguchi @ 2015-02-27  4:58 UTC (permalink / raw)
  To: Tony Luck, Borislav Petkov
  Cc: Prarit Bhargava, Vivek Goyal, linux-kernel@vger.kernel.org,
	Junichi Nomura, Kiyoshi Ueda

kexec disables (or "shoots down") all CPUs other than the crashing CPU before
entering the 2nd kernel. But the MCE handler is still enabled after that, so
if an MCE happens and is broadcast to all CPUs after the main thread has
started the 2nd kernel (which might not have set up MCE handling yet, or might
decide not to set it up at all), the MCE handler runs only on the other CPUs
(not on the main thread), leading to a kernel panic on the MCE synchronization
timeout. The user-visible effect of this bug is kdump failure.

Note that this problem has existed since the current MCE handler was
implemented in 2.6.32, and recently commit 716079f66eac ("mce: Panic when a
core has reached a timeout") made it more visible by changing the default
behavior of the synchronization timeout from "ignore" to "panic".

This patch clears MSR_IA32_MCG_CTL, MSR_IA32_MCx_CTL and CR4.MCE on each CPU
in the kdump entry code in order to "turn off" MCE handling in kdump context.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>        [2.6.32+]
---
ChangeLog v1 -> v2
- clear MSR_IA32_MCG_CTL, MSR_IA32_MCx_CTL, and CR4.MCE instead of using
  global flag to ignore MCE events.
- fixed the description of the problem
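
For readers who want to follow the order of operations without the kernel
tree at hand, here is a minimal userspace sketch of what the new
cpu_emergency_mce_disable() does. It is only a model: struct cpu_model,
MAX_BANKS and model_emergency_mce_disable() are hypothetical names that do
not exist in the kernel, registers are plain struct fields instead of real
MSR accesses, and every bank is cleared (the kernel helper
mce_disable_error_reporting() only touches banks it initialized). The bit
positions are the architectural ones.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural bit definitions. */
#define MCG_BANKCNT_MASK 0xffULL      /* IA32_MCG_CAP[7:0]: number of banks */
#define MCG_CTL_P        (1ULL << 8)  /* IA32_MCG_CAP[8]: IA32_MCG_CTL present */
#define X86_CR4_MCE      (1ULL << 6)  /* CR4.MCE: machine-check enable */

#define MAX_BANKS 32                  /* hypothetical limit of this model */

/* Hypothetical stand-in for one CPU's MCA-related register state. */
struct cpu_model {
	uint64_t mcg_cap;
	uint64_t mcg_ctl;
	uint64_t mci_ctl[MAX_BANKS];
	uint64_t cr4;
};

/* Mirrors the three steps of the patch, in the same order. */
static void model_emergency_mce_disable(struct cpu_model *cpu)
{
	unsigned int banks = cpu->mcg_cap & MCG_BANKCNT_MASK;
	unsigned int i;

	assert(banks <= MAX_BANKS);

	/* 1. Global switch first, only if IA32_MCG_CTL is implemented. */
	if (cpu->mcg_cap & MCG_CTL_P)
		cpu->mcg_ctl = 0;

	/* 2. Per-bank error reporting (IA32_MCi_CTL). */
	for (i = 0; i < banks; i++)
		cpu->mci_ctl[i] = 0;

	/* 3. Finally clear CR4.MCE, leaving all other CR4 bits alone. */
	cpu->cr4 &= ~X86_CR4_MCE;
}
```

In this model a CPU without MCG_CTL_P keeps its mcg_ctl field untouched,
which matches the capability check in the patch.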
---
 arch/x86/include/asm/mce.h       |  1 +
 arch/x86/kernel/cpu/mcheck/mce.c | 17 +++++++++++++++++
 arch/x86/kernel/crash.c          |  8 ++++++++
 3 files changed, 26 insertions(+)

diff --git v3.19.orig/arch/x86/include/asm/mce.h v3.19/arch/x86/include/asm/mce.h
index 51b26e895933..7ae9927d781a 100644
--- v3.19.orig/arch/x86/include/asm/mce.h
+++ v3.19/arch/x86/include/asm/mce.h
@@ -175,6 +175,7 @@ static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
 #endif
 
 int mce_available(struct cpuinfo_x86 *c);
+void cpu_emergency_mce_disable(void);
 
 DECLARE_PER_CPU(unsigned, mce_exception_count);
 DECLARE_PER_CPU(unsigned, mce_poll_count);
diff --git v3.19.orig/arch/x86/kernel/cpu/mcheck/mce.c v3.19/arch/x86/kernel/cpu/mcheck/mce.c
index 3112b79ace8e..10359ae1f558 100644
--- v3.19.orig/arch/x86/kernel/cpu/mcheck/mce.c
+++ v3.19/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2105,6 +2105,23 @@ static void mce_syscore_shutdown(void)
 }
 
 /*
+ * Called in kdump entering code to turn off MCE handling function. We clear
+ * global switch first to forbid the situation where only portion of CPUs are
+ * responsive to MCE and MCE causes kernel panic with synchronization timeout.
+ */
+void cpu_emergency_mce_disable(void)
+{
+	u64 cap;
+	int i;
+
+	rdmsrl(MSR_IA32_MCG_CAP, cap);
+	if (cap & MCG_CTL_P)
+		wrmsr(MSR_IA32_MCG_CTL, 0, 0);
+	mce_disable_error_reporting();
+	clear_in_cr4(X86_CR4_MCE);
+}
+
+/*
  * On resume clear all MCE state. Don't want to see leftovers from the BIOS.
  * Only one CPU is active at this time, the others get re-added later using
  * CPU hotplug:
diff --git v3.19.orig/arch/x86/kernel/crash.c v3.19/arch/x86/kernel/crash.c
index aceb2f90c716..22451c687fca 100644
--- v3.19.orig/arch/x86/kernel/crash.c
+++ v3.19/arch/x86/kernel/crash.c
@@ -34,6 +34,7 @@
 #include <asm/cpu.h>
 #include <asm/reboot.h>
 #include <asm/virtext.h>
+#include <asm/mce.h>
 
 /* Alignment required for elf header segment */
 #define ELF_CORE_HEADER_ALIGN   4096
@@ -112,6 +113,8 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
 #endif
 	crash_save_cpu(regs, cpu);
 
+	cpu_emergency_mce_disable();
+
 	/*
 	 * VMCLEAR VMCSs loaded on all cpus if needed.
 	 */
@@ -157,6 +160,11 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 	/* The kernel is broken so disable interrupts */
 	local_irq_disable();
 
+	/*
+	 * We can't expect MCE handling to work any more, so turn it off.
+	 */
+	cpu_emergency_mce_disable();
+
 	kdump_nmi_shootdown_cpus();
 
 	/*
-- 
1.9.3


* [PATCH v2 2/2] x86: mce: comment about MCE synchronization timeout on definition of tolerant
  2015-02-27  4:58 [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec Naoya Horiguchi
@ 2015-02-27  4:58 ` Naoya Horiguchi
  2015-02-27 11:09 ` [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec Prarit Bhargava
  1 sibling, 0 replies; 14+ messages in thread
From: Naoya Horiguchi @ 2015-02-27  4:58 UTC (permalink / raw)
  To: Tony Luck, Borislav Petkov
  Cc: Prarit Bhargava, Vivek Goyal, linux-kernel@vger.kernel.org,
	Junichi Nomura, Kiyoshi Ueda

Commit 716079f66eac ("mce: Panic when a core has reached a timeout") changed
the behavior of mca_cfg->tolerant. So let's add a comment about it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git v3.19.orig/arch/x86/kernel/cpu/mcheck/mce.c v3.19/arch/x86/kernel/cpu/mcheck/mce.c
index 10359ae1f558..abdd2631036b 100644
--- v3.19.orig/arch/x86/kernel/cpu/mcheck/mce.c
+++ v3.19/arch/x86/kernel/cpu/mcheck/mce.c
@@ -69,8 +69,10 @@ struct mca_config mca_cfg __read_mostly = {
 	/*
 	 * Tolerant levels:
 	 * 0: always panic on uncorrected errors, log corrected errors
-	 * 1: panic or SIGBUS on uncorrected errors, log corrected errors
-	 * 2: SIGBUS or log uncorrected errors (if possible), log corr. errors
+	 * 1: panic or SIGBUS on uncorrected errors, log corrected errors,
+	 *    panic on MCE synchronization timeout.
+	 * 2: SIGBUS or log uncorrected errors (if possible), log corr. errors,
+	 *    no panic on MCE synchronization timeout.
 	 * 3: never panic or SIGBUS, log all errors (for testing only)
 	 */
 	.tolerant = 1,
-- 
1.9.3


* Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec
  2015-02-27  4:58 [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec Naoya Horiguchi
  2015-02-27  4:58 ` [PATCH v2 2/2] x86: mce: comment about MCE synchronization timeout on definition of tolerant Naoya Horiguchi
@ 2015-02-27 11:09 ` Prarit Bhargava
  2015-02-27 12:06   ` Borislav Petkov
  2015-02-27 12:46   ` Naoya Horiguchi
  1 sibling, 2 replies; 14+ messages in thread
From: Prarit Bhargava @ 2015-02-27 11:09 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Tony Luck, Borislav Petkov, Vivek Goyal,
	linux-kernel@vger.kernel.org, Junichi Nomura, Kiyoshi Ueda



On 02/26/2015 11:58 PM, Naoya Horiguchi wrote:
> kexec disables (or "shoots down") all CPUs other than a crashing CPU before
> entering the 2nd kernel. But the MCE handler is still enabled after that, so
> if MCE happens and broadcasts around CPUs after the main thread starts the
> 2nd kernel (which might not start MCE yet, or might decide not to start MCE,)
> MCE handler runs only on the other CPUs (not on the main thread,) leading to
> kernel panic with MCE synchronization. The user-visible effect of this bug
> is kdump failure.
> 
> Note that this problem exists since current MCE handler was implemented in
> 2.6.32, and recently commit 716079f66eac ("mce: Panic when a core has reached
> a timeout") made it more visible by changing the default behavior of the
> synchronization timeout from "ignore" to "panic".
> 
> This patch adds a global variable representing that the system is running
> kdump code in order to "turn off" the MCE handling code in kdump context.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: <stable@vger.kernel.org>        [2.6.32+]
> ---
> ChangeLog v1 -> v2
> - clear MSR_IA32_MCG_CTL, MSR_IA32_MCx_CTL, and CR4.MCE instead of using
>   global flag to ignore MCE events.
> - fixed the description of the problem
> ---
>  arch/x86/include/asm/mce.h       |  1 +
>  arch/x86/kernel/cpu/mcheck/mce.c | 17 +++++++++++++++++
>  arch/x86/kernel/crash.c          |  8 ++++++++
>  3 files changed, 26 insertions(+)
> 
> diff --git v3.19.orig/arch/x86/include/asm/mce.h v3.19/arch/x86/include/asm/mce.h
> index 51b26e895933..7ae9927d781a 100644
> --- v3.19.orig/arch/x86/include/asm/mce.h
> +++ v3.19/arch/x86/include/asm/mce.h
> @@ -175,6 +175,7 @@ static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
>  #endif
>  
>  int mce_available(struct cpuinfo_x86 *c);
> +void cpu_emergency_mce_disable(void);
>  
>  DECLARE_PER_CPU(unsigned, mce_exception_count);
>  DECLARE_PER_CPU(unsigned, mce_poll_count);
> diff --git v3.19.orig/arch/x86/kernel/cpu/mcheck/mce.c v3.19/arch/x86/kernel/cpu/mcheck/mce.c
> index 3112b79ace8e..10359ae1f558 100644
> --- v3.19.orig/arch/x86/kernel/cpu/mcheck/mce.c
> +++ v3.19/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -2105,6 +2105,23 @@ static void mce_syscore_shutdown(void)
>  }
>  
>  /*
> + * Called in kdump entering code to turn off MCE handling function. We clear
> + * global switch first to forbid the situation where only portion of CPUs are
> + * responsive to MCE and MCE causes kernel panic with synchronization timeout.
> + */
> +void cpu_emergency_mce_disable(void)
> +{
> +	u64 cap;
> +	int i;
> +
> +	rdmsrl(MSR_IA32_MCG_CAP, cap);
> +	if (cap & MCG_CTL_P)
> +		wrmsr(MSR_IA32_MCG_CTL, 0, 0);
> +	mce_disable_error_reporting();
> +	clear_in_cr4(X86_CR4_MCE);
> +}
> +
> +/*
>   * On resume clear all MCE state. Don't want to see leftovers from the BIOS.
>   * Only one CPU is active at this time, the others get re-added later using
>   * CPU hotplug:
> diff --git v3.19.orig/arch/x86/kernel/crash.c v3.19/arch/x86/kernel/crash.c
> index aceb2f90c716..22451c687fca 100644
> --- v3.19.orig/arch/x86/kernel/crash.c
> +++ v3.19/arch/x86/kernel/crash.c
> @@ -34,6 +34,7 @@
>  #include <asm/cpu.h>
>  #include <asm/reboot.h>
>  #include <asm/virtext.h>
> +#include <asm/mce.h>
>  
>  /* Alignment required for elf header segment */
>  #define ELF_CORE_HEADER_ALIGN   4096
> @@ -112,6 +113,8 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
>  #endif
>  	crash_save_cpu(regs, cpu);
>  
> +	cpu_emergency_mce_disable();
> +
>  	/*
>  	 * VMCLEAR VMCSs loaded on all cpus if needed.
>  	 */
> @@ -157,6 +160,11 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>  	/* The kernel is broken so disable interrupts */
>  	local_irq_disable();
>  
> +	/*
> +	 * We can't expect MCE handling to work any more, so turn it off.
> +	 */
> +	cpu_emergency_mce_disable();

What if the system is actually having problems with MCE errors -- which are
leading to system panics of some sort.  Do you *really* want the system to
continue on at that point?

P.

> +
>  	kdump_nmi_shootdown_cpus();
>  
>  	/*
> 


* Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec
  2015-02-27 11:09 ` [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec Prarit Bhargava
@ 2015-02-27 12:06   ` Borislav Petkov
  2015-02-27 18:27     ` Luck, Tony
  2015-02-27 12:46   ` Naoya Horiguchi
  1 sibling, 1 reply; 14+ messages in thread
From: Borislav Petkov @ 2015-02-27 12:06 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: Naoya Horiguchi, Tony Luck, Vivek Goyal,
	linux-kernel@vger.kernel.org, Junichi Nomura, Kiyoshi Ueda

On Fri, Feb 27, 2015 at 06:09:52AM -0500, Prarit Bhargava wrote:
> What if the system is actually having problems with MCE errors --
> which are leading to system panics of some sort. Do you *really* want
> the system to continue on at that point?

No one said that disabling MCA and doing kdump is a 100% reliable thing.

When CR4.MCE=0b and an MCE happens, it will shutdown the system, at
least on Intel, according to Tony. On AMD, disabling error reporting in
addition leads to CR4.MCE being ignored.

In any case, disabling MCA contains a risk kdump should be willing to
take. Let's ask the reverse question: is kdump prepared to handle an MCE
when one happens during dumping?

If we have to be really correct, kdump should actually be prepared
to handle MCEs and in the case where it cannot recover, stop dumping
because the already dumped data might be faulty and corrupted... And
print a nasty message on the screen...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec
  2015-02-27 11:09 ` [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec Prarit Bhargava
  2015-02-27 12:06   ` Borislav Petkov
@ 2015-02-27 12:46   ` Naoya Horiguchi
  2015-02-27 13:14     ` Prarit Bhargava
  1 sibling, 1 reply; 14+ messages in thread
From: Naoya Horiguchi @ 2015-02-27 12:46 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: Naoya Horiguchi, Tony Luck, Borislav Petkov, Vivek Goyal,
	linux-kernel@vger.kernel.org, Junichi Nomura, Kiyoshi Ueda

Hi Prarit,

On Fri, Feb 27, 2015 at 06:09:52AM -0500, Prarit Bhargava wrote:
...
> > @@ -157,6 +160,11 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> >  	/* The kernel is broken so disable interrupts */
> >  	local_irq_disable();
> >
> > +	/*
> > +	 * We can't expect MCE handling to work any more, so turn it off.
> > +	 */
> > +	cpu_emergency_mce_disable();
>
> What if the system is actually having problems with MCE errors -- which are
> leading to system panics of some sort.  Do you *really* want the system to
> continue on at that point?

Yes. When the above code runs, the system is no longer running any business
logic, so there is no worry about consuming broken data caused by HW errors.
What we really want is any kind of information that helps find out what
caused the 1st panic, and that is likely to be contained in the kdump data.
So I think it's justified to improve the success rate of kdump by continuing
the operation here.

Thanks,
Naoya Horiguchi


* Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec
  2015-02-27 12:46   ` Naoya Horiguchi
@ 2015-02-27 13:14     ` Prarit Bhargava
  2015-03-02  2:16       ` Naoya Horiguchi
  0 siblings, 1 reply; 14+ messages in thread
From: Prarit Bhargava @ 2015-02-27 13:14 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Naoya Horiguchi, Tony Luck, Borislav Petkov, Vivek Goyal,
	linux-kernel@vger.kernel.org, Junichi Nomura, Kiyoshi Ueda



On 02/27/2015 07:46 AM, Naoya Horiguchi wrote:
> Hi Prarit,
> 
> On Fri, Feb 27, 2015 at 06:09:52AM -0500, Prarit Bhargava wrote:
> ...
>> > @@ -157,6 +160,11 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>> >      /* The kernel is broken so disable interrupts */
>> >      local_irq_disable();
>> >
>> > +    /*
>> > +     * We can't expect MCE handling to work any more, so turn it off.
>> > +     */
>> > +    cpu_emergency_mce_disable();
>>
>> What if the system is actually having problems with MCE errors -- which are
>> leading to system panics of some sort.  Do you *really* want the system to
>> continue on at that point?
> 
> Yes, when running the above code, the system doesn't run any business logic,
> so no worry about consuming broken data caused by HW errors.
> And what we really want to get is any kind of information to find out what
> caused the 1st panic, which are likely to be contained in kdump data.
> So I think it's justified to improve the success rate of kdump by continuing
> the operation here.

I looked into it a bit further -- IIUC (according to the Intel spec) disabling
MCE this way will result in power cycle of the system if an MCE is detected.  So
I guess it isn't a worry for Intel.  If anyone from AMD can hazard a guess what
happens in their case it would be appreciated.

I still don't like this approach all that much as a corrected non-fatal error is
something I would want to know about as an admin, but that risk is mitigated by
BMC and system monitoring hardware.

>But the MCE handler is still enabled after that, so
>if MCE happens and broadcasts around CPUs after the main thread starts the
>2nd kernel (which might not start MCE yet, or might decide not to start MCE,)
>MCE handler runs only on the other CPUs (not on the main thread,) leading to
>kernel panic with MCE synchronization.

Not having looked at the code (and relying on your description) -- there is no
way to disable the MCE handler?

P.


* RE: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec
  2015-02-27 12:06   ` Borislav Petkov
@ 2015-02-27 18:27     ` Luck, Tony
  2015-03-02  2:31       ` Naoya Horiguchi
  0 siblings, 1 reply; 14+ messages in thread
From: Luck, Tony @ 2015-02-27 18:27 UTC (permalink / raw)
  To: Borislav Petkov, Prarit Bhargava
  Cc: Naoya Horiguchi, Vivek Goyal, linux-kernel@vger.kernel.org,
	Junichi Nomura, Kiyoshi Ueda


> When CR4.MCE=0b and an MCE happens, it will shutdown the system, at
> least on Intel, according to Tony

I checked with the architects ... and I was right. If you clear CR4.MCE you'll still
see the machine check - and you'll pull the big system reset lever.

If you think the other cpus can survive the reset - then the right thing to do is to
have any offline cpus that show up in the machine check handler just clear MCG_STATUS
and return:

do_machine_check()
{
	/* offline cpus may show up for the party - but don't need to do anything here - send them back home */
	if (!cpu_online(smp_processor_id())) {
		mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
		return;
	}

If we are crashing because of a machine check - I wonder how useful it is to
run kdump().  There is a very small set of ways that you can induce a machine
check from program action - normally the problem is that something bad
happened in the h/w ... a kdump will just fill your disk and waste your time
looking at what the s/w was doing when the machine check happened.

-Tony


* Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec
  2015-02-27 13:14     ` Prarit Bhargava
@ 2015-03-02  2:16       ` Naoya Horiguchi
  0 siblings, 0 replies; 14+ messages in thread
From: Naoya Horiguchi @ 2015-03-02  2:16 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: Naoya Horiguchi, Tony Luck, Borislav Petkov, Vivek Goyal,
	linux-kernel@vger.kernel.org, Junichi Nomura, Kiyoshi Ueda

On Fri, Feb 27, 2015 at 08:14:47AM -0500, Prarit Bhargava wrote:
> On 02/27/2015 07:46 AM, Naoya Horiguchi wrote:
> > Hi Prarit,
> > 
> > On Fri, Feb 27, 2015 at 06:09:52AM -0500, Prarit Bhargava wrote:
> > ...
> >> > @@ -157,6 +160,11 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> >> >      /* The kernel is broken so disable interrupts */
> >> >      local_irq_disable();
> >> >
> >> > +    /*
> >> > +     * We can't expect MCE handling to work any more, so turn it off.
> >> > +     */
> >> > +    cpu_emergency_mce_disable();
> >>
> >> What if the system is actually having problems with MCE errors -- which are
> >> leading to system panics of some sort.  Do you *really* want the system to
> >> continue on at that point?
> > 
> > Yes, when running the above code, the system doesn't run any business logic,
> > so no worry about consuming broken data caused by HW errors.
> > And what we really want to get is any kind of information to find out what
> > caused the 1st panic, which are likely to be contained in kdump data.
> > So I think it's justified to improve the success rate of kdump by continuing
> > the operation here.
> 
> I looked into it a bit further -- IIUC (according to the Intel spec) disabling
> MCE this way will result in power cycle of the system if an MCE is detected.  So
> I guess it isn't a worry for Intel.  If anyone from AMD can hazard a guess what
> happens in their case it would be appreciated.
> 
> I still don't like this approach all that much as a corrected non-fatal error is
> something I would want to know about as an admin, but that risk is mitigated by
> BMC and system monitoring hardware.

Generally corrected non-fatal errors are not reported via MCE but via CMCI,
so not affected by sync timeout problem (they should be logged by mcelog after
reboot in a normal manner.)

But as for BMC/FW/HW logging, I'm not 100% sure that such logging mechanisms
still work when CR4.MCE is disabled, so I might need to reconsider this approach.

> >But the MCE handler is still enabled after that, so
> >if MCE happens and broadcasts around CPUs after the main thread starts the
> >2nd kernel (which might not start MCE yet, or might decide not to start MCE,)
> >MCE handler runs only on the other CPUs (not on the main thread,) leading to
> >kernel panic with MCE synchronization.
> 
> Not having looked at the code (and relying on your description) -- there is no
> way to disable the MCE handler?

We do have a way, as Tony suggested in another email. And I feel like moving
back to that approach in the next version.

Thanks,
Naoya Horiguchi


* Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec
  2015-02-27 18:27     ` Luck, Tony
@ 2015-03-02  2:31       ` Naoya Horiguchi
  2015-03-02 12:17         ` Borislav Petkov
  0 siblings, 1 reply; 14+ messages in thread
From: Naoya Horiguchi @ 2015-03-02  2:31 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Prarit Bhargava, Vivek Goyal,
	linux-kernel@vger.kernel.org, Junichi Nomura, Kiyoshi Ueda

On Fri, Feb 27, 2015 at 06:27:16PM +0000, Luck, Tony wrote:
> > When CR4.MCE=0b and an MCE happens, it will shutdown the system, at
> > least on Intel, according to Tony
> 
> I checked with the architects ... and I was right. If you clear CR4.MCE you'll still
> see the machine check - and you'll pull the big system reset lever.

Thank you for confirmation.

> If you think the other cpus can survive the reset - then the right thing to do is to
> have any offline cpus that show up in the machine check handler just clear MCG_STATUS
> and return:
> 
> do_machine_check()
> {
> 	/* offline cpus may show up for the party - but don't need to do anything here - send them back home */
> 	if (!cpu_online(smp_processor_id())) {
> 		mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> 		return;
> 	}

It seems that kdump shootdown doesn't clear the shot-down CPUs from the online
cpumask, so this cpu_online() check doesn't work for this (kdump-specific)
problem. But I think checking the number of online CPUs for MCE synchronization
is generally correct in other contexts (like an MCE on a system with
hot-removed CPUs?), so it is worth doing in another patch.
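
To make the failure mode concrete, here is a small userspace model of the
rendezvous. It is a sketch under stated assumptions, not the kernel code:
model_mce_sync() and model_kdump_mce() are hypothetical names, the spin-wait
is reduced to a simple comparison, and monarch_timeout is ignored. The
tolerant mapping follows the comment added in patch 2/2 (levels 0 and 1
panic on a synchronization timeout, levels 2 and 3 do not).

```c
enum sync_result {
	SYNC_OK,               /* every expected CPU checked in */
	SYNC_TIMEOUT_CONTINUE, /* timed out, handler carries on */
	SYNC_TIMEOUT_PANIC,    /* timed out, kernel panics */
};

/*
 * expected: number of CPUs the handler waits for (the online count)
 * arrived:  number of CPUs that actually entered the handler
 * tolerant: value of mca_cfg.tolerant
 */
static enum sync_result model_mce_sync(int expected, int arrived, int tolerant)
{
	if (arrived >= expected)
		return SYNC_OK;

	return tolerant <= 1 ? SYNC_TIMEOUT_PANIC : SYNC_TIMEOUT_CONTINUE;
}

/*
 * kdump case: the shootdown leaves the online mask untouched, so all CPUs
 * are still expected, but the crashing CPU has already jumped to the 2nd
 * kernel and never enters the 1st kernel's handler.
 */
static enum sync_result model_kdump_mce(int online_cpus, int tolerant)
{
	int expected = online_cpus;
	int arrived = online_cpus - 1;

	return model_mce_sync(expected, arrived, tolerant);
}
```

With the default tolerant level of 1, model_kdump_mce() always ends in
SYNC_TIMEOUT_PANIC, which is the kdump failure described in the patch.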

> If we are crashing because of a machine check - I wonder how useful it is to run kdump().  There are a very
> small set of ways that you can induce a machine check from program action - normally the problem is that
> something bad happened in the h/w ... a kdump will just fill your disk and waste your time looking at what
> the s/w was doing when the machine check happened.

I don't think every MCE always makes the server inoperative. One good example
is uncorrected errors (including SRAO and SRAR).

And please note that the target of this patch is an MCE that happens when the
kernel is already running kdump code (so the crash happened *not* because of
the MCE). In that case, we can expect kdump to work fine if the MCE hits a
"kdump shootdown" CPU, which is just running a cpu_relax() loop, because the
2nd kernel's CPU isn't affected by the MCE (even if the CPU failure is a
fatal one).

If a fatal MCE happens on the CPU running kdump code, there's no reason to
try harder to get kdump, as you pointed out. In such a case, what we can do is
print out a message like "kdump failed due to MCE" and reset the system.

Thanks,
Naoya Horiguchi


* Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec
  2015-03-02  2:31       ` Naoya Horiguchi
@ 2015-03-02 12:17         ` Borislav Petkov
  2015-03-02 14:33           ` Naoya Horiguchi
  0 siblings, 1 reply; 14+ messages in thread
From: Borislav Petkov @ 2015-03-02 12:17 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Luck, Tony, Prarit Bhargava, Vivek Goyal,
	linux-kernel@vger.kernel.org, Junichi Nomura, Kiyoshi Ueda

On Mon, Mar 02, 2015 at 02:31:19AM +0000, Naoya Horiguchi wrote:
> And please note that the target of this patch is an MCE when the kernel is
> already running on kdump code (so crashing happened *not* because of the MCE).
> In that case, we can expect that kdump works fine if the MCE hits the "kdump
> shootdown" CPU which are just running cpu_relax() loop, because a 2nd kernel's
> CPU isn't affected by the MCE (even the CPU failure is fatal one.)

Well, why would you even want to disable MCA then? If all the CPUs are
offlined, it is very very highly unlikely they'd cause an MCE.

> If a fatal MCE happens on the CPU running kdump code, there's no reason to
> try harder to get kdump as you pointed out. In such case, what we can do is
> to print out a message like "kdump failed due to MCE" and reset the system.

Yes, so a primitive kdump-specific MCE handler would be more viable than
disabling MCA.
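
One possible reading of such a primitive handler, written as a userspace
decision function. This is only a sketch of the policy discussed in this
thread, not proposed kernel code: the enum and model_kdump_mce_action() are
hypothetical names, and treating a non-fatal MCE on the dumping CPU as
"clear and continue" is an assumption that nobody in the thread spelled out.

```c
#include <stdbool.h>

enum kdump_mce_action {
	KDUMP_MCE_CLEAR_AND_RETURN, /* clear MCG_STATUS and keep dumping */
	KDUMP_MCE_REPORT_AND_RESET, /* print "kdump failed due to MCE", reset */
};

/*
 * on_dumping_cpu: the MCE hit the CPU that runs the kdump code
 * fatal:          the error cannot be recovered from
 */
static enum kdump_mce_action
model_kdump_mce_action(bool on_dumping_cpu, bool fatal)
{
	/* Only a fatal error on the dumping CPU makes the dump pointless. */
	if (on_dumping_cpu && fatal)
		return KDUMP_MCE_REPORT_AND_RESET;

	/* Shot-down CPUs only spin, so they are sent straight back. */
	return KDUMP_MCE_CLEAR_AND_RETURN;
}
```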

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec
  2015-03-02 12:17         ` Borislav Petkov
@ 2015-03-02 14:33           ` Naoya Horiguchi
  2015-03-02 16:32             ` Borislav Petkov
  0 siblings, 1 reply; 14+ messages in thread
From: Naoya Horiguchi @ 2015-03-02 14:33 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Naoya Horiguchi, Luck, Tony, Prarit Bhargava, Vivek Goyal,
	linux-kernel@vger.kernel.org, Junichi Nomura, Kiyoshi Ueda

On Mon, Mar 02, 2015 at 01:17:01PM +0100, Borislav Petkov wrote:
> On Mon, Mar 02, 2015 at 02:31:19AM +0000, Naoya Horiguchi wrote:
> > And please note that the target of this patch is an MCE when the kernel is
> > already running on kdump code (so crashing happened *not* because of the MCE).
> > In that case, we can expect that kdump works fine if the MCE hits the "kdump
> > shootdown" CPU which are just running cpu_relax() loop, because a 2nd kernel's
> > CPU isn't affected by the MCE (even the CPU failure is fatal one.)
>
> Well, why would you even want to disable MCA then? If all the CPUs are
> offlined, it is very very highly unlikely they'd cause an MCE.

Yes, CPU offlining is one option to keep the other CPUs quiet. I'm not sure why
the current kexec implementation doesn't offline the other CPUs but just leaves
them in a cpu_relax() loop, but my guess is that in some kernel panic situations
(like a soft lockup) we want to keep the CPUs' state undisturbed to make sure
the bug's info is captured in kdump.

> > If a fatal MCE happens on the CPU running kdump code, there's no reason to
> > try harder to get kdump as you pointed out. In such case, what we can do is
> > to print out a message like "kdump failed due to MCE" and reset the system.
>
> Yes, so a primitive kdump-specific MCE handler would be more viable than
> disabling MCA.

OK.

Thanks,
Naoya Horiguchi


* Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec
  2015-03-02 14:33           ` Naoya Horiguchi
@ 2015-03-02 16:32             ` Borislav Petkov
  2015-03-02 16:50               ` Prarit Bhargava
  0 siblings, 1 reply; 14+ messages in thread
From: Borislav Petkov @ 2015-03-02 16:32 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Naoya Horiguchi, Luck, Tony, Prarit Bhargava, Vivek Goyal,
	linux-kernel@vger.kernel.org, Junichi Nomura, Kiyoshi Ueda

On Mon, Mar 02, 2015 at 11:33:33PM +0900, Naoya Horiguchi wrote:
> Yes, CPU offlining is one option to keep other CPUs quiet. I'm not sure why
> current kexec implementation doesn't offline the other CPUs but just doing
> cpu_relax() loop, but my guess is that in some kernel panic situation (like
> soft lockup) we want to keep CPUs' status undisturbed to make sure the bug's
> info is captured in kdump.

Well either offlining or keeping them in the idle loop is fine - they're
not executing anything else and thus the probability of them causing an
MCE becomes disappearingly small.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec
  2015-03-02 16:32             ` Borislav Petkov
@ 2015-03-02 16:50               ` Prarit Bhargava
  2015-03-02 17:25                 ` Borislav Petkov
  0 siblings, 1 reply; 14+ messages in thread
From: Prarit Bhargava @ 2015-03-02 16:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Naoya Horiguchi, Naoya Horiguchi, Luck, Tony, Vivek Goyal,
	linux-kernel@vger.kernel.org, Junichi Nomura, Kiyoshi Ueda



On 03/02/2015 11:32 AM, Borislav Petkov wrote:
> On Mon, Mar 02, 2015 at 11:33:33PM +0900, Naoya Horiguchi wrote:
>> Yes, CPU offlining is one option to keep other CPUs quiet. I'm not sure why
>> current kexec implementation doesn't offline the other CPUs but just doing
>> cpu_relax() loop, but my guess is that in some kernel panic situation (like
>> soft lockup) we want to keep CPUs' status undisturbed to make sure the bug's
>> info is captured in kdump.
> 
> Well either offlining or keeping them in the idle loop is fine - they're
> not executing anything else and thus the probability of them causing an
> MCE becomes disappearingly small.

Unless entering a deep C state kicks an MCE ... which we've seen with flaky
hardware.

P.

> 


* Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec
  2015-03-02 16:50               ` Prarit Bhargava
@ 2015-03-02 17:25                 ` Borislav Petkov
  0 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2015-03-02 17:25 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: Naoya Horiguchi, Naoya Horiguchi, Luck, Tony, Vivek Goyal,
	linux-kernel@vger.kernel.org, Junichi Nomura, Kiyoshi Ueda

On Mon, Mar 02, 2015 at 11:50:49AM -0500, Prarit Bhargava wrote:
> Unless entering a deep C state kicks an MCE ... which we've seen with flaky
> hardware.

If that is the case, you'll see the MCE not only when entering kdump.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
