public inbox for linux-kernel@vger.kernel.org
* [PATCH -tip] x86, mce: CE in last bank prevents panic by unknown MCE
@ 2009-08-26  7:20 Hidetoshi Seto
  2009-08-26  9:14 ` Ingo Molnar
  2009-09-17 21:37 ` [tip:x86/pat] " tip-bot for Hidetoshi Seto
  0 siblings, 2 replies; 3+ messages in thread
From: Hidetoshi Seto @ 2009-08-26  7:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andi Kleen, H. Peter Anvin, Jin Dongming

[based on tip/x86/mce]

If the MCE handler is called but none of the mces_seen entries holds a
machine check event that could have signaled the MCE (i.e. an event
with severity higher than MCE_KEEP_SEVERITY), the handler panics with
"Machine check from unknown source", since the MCE is assumed to have
been signaled by an external agent or the like.

Usually mces_seen never points to an MCE_KEEP_SEVERITY event such as a
corrected error (CE). But it can happen, because the initial value of
mces_seen is accidentally modified by mce_no_way_out(): if
mce_no_way_out() runs through all banks and the last bank holds a CE,
mces_seen points to that CE and the "panic by unknown" is not taken.

This patch fixes this undesired behavior and clarifies the logic.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>

---
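The failure mode described above can be reproduced with a small
user-space model. This is only a sketch under stated assumptions, not
the kernel code: the toy_* names, the three-level enum and the
"patched" flag are hypothetical, the bank scan is reduced to an array of
per-bank severities, and tolerant < 3 is assumed throughout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's severity grades (hypothetical). */
enum severity { NO_SEVERITY = 0, KEEP_SEVERITY = 1, PANIC_SEVERITY = 2 };

/* Toy "struct mce": only the severity of the bank status last read. */
struct toy_mce { enum severity sev; };

/*
 * Models mce_no_way_out(): walks the banks and overwrites *m with each
 * bank it reads, so on a full pass *m ends up describing the last bank.
 */
static bool toy_no_way_out(struct toy_mce *m, const enum severity *banks,
			   size_t nbanks)
{
	assert(m != NULL && banks != NULL);
	for (size_t i = 0; i < nbanks; i++) {
		m->sev = banks[i];
		if (banks[i] >= PANIC_SEVERITY)
			return true;
	}
	return false;
}

/* Models how do_machine_check() fills this CPU's mces_seen entry. */
static struct toy_mce toy_fill_mces_seen(const enum severity *banks,
					 size_t nbanks, bool patched)
{
	struct toy_mce m = { NO_SEVERITY };
	struct toy_mce seen;

	if (patched) {
		seen = m;			/* snapshot the clean value first */
		toy_no_way_out(&m, banks, nbanks);
	} else {
		toy_no_way_out(&m, banks, nbanks);
		seen = m;			/* snapshot carries the last bank */
	}

	/* Assumed simplification of the main scan: keep the worst event above KEEP. */
	for (size_t i = 0; i < nbanks; i++)
		if (banks[i] > KEEP_SEVERITY && banks[i] > seen.sev)
			seen.sev = banks[i];
	return seen;
}

/* Models the "unknown source" test in mce_reign() before and after the patch. */
static bool toy_reign_panics_unknown(const struct toy_mce *seen, size_t ncpus,
				     bool patched)
{
	const struct toy_mce *m = NULL;
	enum severity global_worst = NO_SEVERITY;

	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		if (seen[cpu].sev > global_worst) {
			global_worst = seen[cpu].sev;
			m = &seen[cpu];
		}
	}
	return patched ? (global_worst <= KEEP_SEVERITY) : (m == NULL);
}

/* Single-CPU convenience wrapper: does the "unknown source" panic fire? */
static bool toy_unknown_source_panic(const enum severity *banks, size_t nbanks,
				     bool patched)
{
	struct toy_mce seen = toy_fill_mces_seen(banks, nbanks, patched);

	return toy_reign_panics_unknown(&seen, 1, patched);
}
```

In this model, a CE in the last of three banks makes
toy_unknown_source_panic(banks, 3, false) return false (the panic is
skipped), while the same call with patched = true returns true. A CE in
the first bank does not trigger the problem, because later banks
overwrite it before the snapshot is taken.
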
 arch/x86/kernel/cpu/mcheck/mce.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 54bd1b2..7b485e9 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -665,7 +665,7 @@ static void mce_reign(void)
 	 * No machine check event found. Must be some external
 	 * source or one CPU is hung. Panic.
 	 */
-	if (!m && tolerant < 3)
+	if (global_worst <= MCE_KEEP_SEVERITY && tolerant < 3)
 		mce_panic("Machine check from unknown source", NULL, NULL);
 
 	/*
@@ -889,11 +889,11 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	mce_setup(&m);
 
 	m.mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
-	no_way_out = mce_no_way_out(&m, &msg);
-
 	final = &__get_cpu_var(mces_seen);
 	*final = m;
 
+	no_way_out = mce_no_way_out(&m, &msg);
+
 	barrier();
 
 	/*
-- 
1.6.4.1




* Re: [PATCH -tip] x86, mce: CE in last bank prevents panic by unknown MCE
  2009-08-26  7:20 [PATCH -tip] x86, mce: CE in last bank prevents panic by unknown MCE Hidetoshi Seto
@ 2009-08-26  9:14 ` Ingo Molnar
  2009-09-17 21:37 ` [tip:x86/pat] " tip-bot for Hidetoshi Seto
  1 sibling, 0 replies; 3+ messages in thread
From: Ingo Molnar @ 2009-08-26  9:14 UTC (permalink / raw)
  To: Hidetoshi Seto; +Cc: linux-kernel, Andi Kleen, H. Peter Anvin, Jin Dongming


* Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> wrote:

> [based on tip/x86/mce]
> 
> If the MCE handler is called but none of the mces_seen entries holds
> a machine check event that could have signaled the MCE (i.e. an
> event with severity higher than MCE_KEEP_SEVERITY), the handler
> panics with "Machine check from unknown source", since the MCE is
> assumed to have been signaled by an external agent or the like.
> 
> Usually mces_seen never points to an MCE_KEEP_SEVERITY event such as
> a corrected error (CE). But it can happen, because the initial value
> of mces_seen is accidentally modified by mce_no_way_out(): if
> mce_no_way_out() runs through all banks and the last bank holds a
> CE, mces_seen points to that CE and the "panic by unknown" is not
> taken.
> 
> This patch fixes this undesired behavior and clarifies the logic.
> 
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
> Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
> 
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)

applied, thanks!

Btw., I had a quick look at 
arch/x86/kernel/cpu/mcheck/mce-severity.c, and it is quite a pile of 
unclean, over-engineered crap really.

Would you be interested in sending me a patch that converts that to 
clean, proper C code that just checks the bits in a straightforward, 
readable way? We don't need that silly, unreadable table with the 
macro jungle and we definitely don't want to expose it via debugfs - 
the debugfs bits can be removed altogether.

[ Plus in the mce_severity() implementation please rename 'a' to at
  least 'm' - that name choice for a variable shows zero taste. We
  don't program the kernel in BASIC with 'A', 'B' and 'C' variable
  names anymore. ]

Thanks,

	Ingo


* [tip:x86/pat] x86, mce: CE in last bank prevents panic by unknown MCE
  2009-08-26  7:20 [PATCH -tip] x86, mce: CE in last bank prevents panic by unknown MCE Hidetoshi Seto
  2009-08-26  9:14 ` Ingo Molnar
@ 2009-09-17 21:37 ` tip-bot for Hidetoshi Seto
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot for Hidetoshi Seto @ 2009-09-17 21:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, jin.dongming, seto.hidetoshi, ak, tglx,
	mingo

Commit-ID:  680b6cfd3cee30a7d997d49430fb73af84523853
Gitweb:     http://git.kernel.org/tip/680b6cfd3cee30a7d997d49430fb73af84523853
Author:     Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
AuthorDate: Wed, 26 Aug 2009 16:20:36 +0900
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 26 Aug 2009 20:21:11 +0200

x86, mce: CE in last bank prevents panic by unknown MCE

If the MCE handler is called but none of the mces_seen entries
holds a machine check event that could have signaled the MCE
(i.e. an event with severity higher than MCE_KEEP_SEVERITY), the
handler panics with "Machine check from unknown source", since
the MCE is assumed to have been signaled by an external agent or
the like.

Usually mces_seen never points to an MCE_KEEP_SEVERITY event such
as a corrected error (CE). But it can happen, because the initial
value of mces_seen is accidentally modified by mce_no_way_out():
if mce_no_way_out() runs through all banks and the last bank
holds a CE, mces_seen points to that CE and the "panic by
unknown" is not taken.

This patch fixes this undesired behavior, and clarifies the logic.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Dongming <jin.dongming@np.css.fujitsu.com>
LKML-Reference: <4A94E244.3020301@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>


---
 arch/x86/kernel/cpu/mcheck/mce.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 54bd1b2..325559d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -612,7 +612,7 @@ out:
  * This way we prevent any potential data corruption in a unrecoverable case
  * and also makes sure always all CPU's errors are examined.
  *
- * Also this detects the case of an machine check event coming from outer
+ * Also this detects the case of a machine check event coming from outer
  * space (not detected by any CPUs) In this case some external agent wants
  * us to shut down, so panic too.
  *
@@ -665,7 +665,7 @@ static void mce_reign(void)
 	 * No machine check event found. Must be some external
 	 * source or one CPU is hung. Panic.
 	 */
-	if (!m && tolerant < 3)
+	if (global_worst <= MCE_KEEP_SEVERITY && tolerant < 3)
 		mce_panic("Machine check from unknown source", NULL, NULL);
 
 	/*
@@ -889,11 +889,11 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	mce_setup(&m);
 
 	m.mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
-	no_way_out = mce_no_way_out(&m, &msg);
-
 	final = &__get_cpu_var(mces_seen);
 	*final = m;
 
+	no_way_out = mce_no_way_out(&m, &msg);
+
 	barrier();
 
 	/*
