* [PATCH -tip] x86, mce: CE in last bank prevents panic by unknown MCE
@ 2009-08-26 7:20 Hidetoshi Seto
2009-08-26 9:14 ` Ingo Molnar
2009-09-17 21:37 ` [tip:x86/pat] " tip-bot for Hidetoshi Seto
0 siblings, 2 replies; 3+ messages in thread
From: Hidetoshi Seto @ 2009-08-26 7:20 UTC
To: linux-kernel; +Cc: Andi Kleen, H. Peter Anvin, Jin Dongming
[based on tip/x86/mce]
If the MCE handler is called but none of the mces_seen entries holds a
machine check event that could have signaled the MCE (i.e. an event with
severity higher than MCE_KEEP_SEVERITY), the handler panics with
"Machine check from unknown source", since the MCE is assumed to have
been signaled by an external agent.
Normally mces_seen never points to an MCE_KEEP_SEVERITY event such as a CE.
But it can happen, because the initial value of mces_seen is accidentally
modified by mce_no_way_out(): if mce_no_way_out() runs through all banks
and the last bank holds a CE, mces_seen points to that CE and the
"unknown source" panic is not taken.
This patch fixes this undesired behavior and clarifies the logic.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 54bd1b2..7b485e9 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -665,7 +665,7 @@ static void mce_reign(void)
* No machine check event found. Must be some external
* source or one CPU is hung. Panic.
*/
- if (!m && tolerant < 3)
+ if (global_worst <= MCE_KEEP_SEVERITY && tolerant < 3)
mce_panic("Machine check from unknown source", NULL, NULL);
/*
@@ -889,11 +889,11 @@ void do_machine_check(struct pt_regs *regs, long error_code)
mce_setup(&m);
m.mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
- no_way_out = mce_no_way_out(&m, &msg);
-
final = &__get_cpu_var(mces_seen);
*final = m;
+ no_way_out = mce_no_way_out(&m, &msg);
+
barrier();
/*
--
1.6.4.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH -tip] x86, mce: CE in last bank prevents panic by unknown MCE
2009-08-26 7:20 [PATCH -tip] x86, mce: CE in last bank prevents panic by unknown MCE Hidetoshi Seto
@ 2009-08-26 9:14 ` Ingo Molnar
2009-09-17 21:37 ` [tip:x86/pat] " tip-bot for Hidetoshi Seto
1 sibling, 0 replies; 3+ messages in thread
From: Ingo Molnar @ 2009-08-26 9:14 UTC
To: Hidetoshi Seto; +Cc: linux-kernel, Andi Kleen, H. Peter Anvin, Jin Dongming
* Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> wrote:
> [based on tip/x86/mce]
>
> If the MCE handler is called but none of the mces_seen entries holds
> a machine check event that could have signaled the MCE (i.e. an event
> with severity higher than MCE_KEEP_SEVERITY), the handler panics with
> "Machine check from unknown source", since the MCE is assumed to have
> been signaled by an external agent.
>
> Normally mces_seen never points to an MCE_KEEP_SEVERITY event such as
> a CE. But it can happen, because the initial value of mces_seen is
> accidentally modified by mce_no_way_out(): if mce_no_way_out() runs
> through all banks and the last bank holds a CE, mces_seen points to
> that CE and the "unknown source" panic is not taken.
>
> This patch fixes this undesired behavior and clarifies the logic.
>
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
> Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)
Applied, thanks!
Btw., I had a quick look at
arch/x86/kernel/cpu/mcheck/mce-severity.c, and it is quite a pile of
unclean, over-engineered crap really.
Would you be interested in sending me a patch that converts that to
clean, proper C code that just checks the bits in a straightforward,
readable way? We don't need that silly, unreadable table with the
macro jungle and we definitely don't want to expose it via debugfs -
the debugfs bits can be removed altogether.
[ Plus, in the mce_severity() implementation please rename 'a' to at
least 'm' - that name choice for a variable shows zero taste. We
don't program the kernel in BASIC with 'A', 'B' and 'C' variable
names anymore. ]
Thanks,
Ingo
* [tip:x86/pat] x86, mce: CE in last bank prevents panic by unknown MCE
2009-08-26 7:20 [PATCH -tip] x86, mce: CE in last bank prevents panic by unknown MCE Hidetoshi Seto
2009-08-26 9:14 ` Ingo Molnar
@ 2009-09-17 21:37 ` tip-bot for Hidetoshi Seto
1 sibling, 0 replies; 3+ messages in thread
From: tip-bot for Hidetoshi Seto @ 2009-09-17 21:37 UTC
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, jin.dongming, seto.hidetoshi, ak, tglx,
mingo
Commit-ID: 680b6cfd3cee30a7d997d49430fb73af84523853
Gitweb: http://git.kernel.org/tip/680b6cfd3cee30a7d997d49430fb73af84523853
Author: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
AuthorDate: Wed, 26 Aug 2009 16:20:36 +0900
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 26 Aug 2009 20:21:11 +0200
x86, mce: CE in last bank prevents panic by unknown MCE
If the MCE handler is called but none of the mces_seen entries
holds a machine check event that could have signaled the MCE
(i.e. an event with severity higher than MCE_KEEP_SEVERITY), the
handler panics with "Machine check from unknown source", since
the MCE is assumed to have been signaled by an external agent.
Normally mces_seen never points to an MCE_KEEP_SEVERITY event such
as a CE. But it can happen, because the initial value of mces_seen
is accidentally modified by mce_no_way_out(): if mce_no_way_out()
runs through all banks and the last bank holds a CE, mces_seen
points to that CE and the "unknown source" panic is not taken.
This patch fixes this undesired behavior, and clarifies the logic.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Dongming <jin.dongming@np.css.fujitsu.com>
LKML-Reference: <4A94E244.3020301@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 54bd1b2..325559d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -612,7 +612,7 @@ out:
* This way we prevent any potential data corruption in a unrecoverable case
* and also makes sure always all CPU's errors are examined.
*
- * Also this detects the case of an machine check event coming from outer
+ * Also this detects the case of a machine check event coming from outer
* space (not detected by any CPUs) In this case some external agent wants
* us to shut down, so panic too.
*
@@ -665,7 +665,7 @@ static void mce_reign(void)
* No machine check event found. Must be some external
* source or one CPU is hung. Panic.
*/
- if (!m && tolerant < 3)
+ if (global_worst <= MCE_KEEP_SEVERITY && tolerant < 3)
mce_panic("Machine check from unknown source", NULL, NULL);
/*
@@ -889,11 +889,11 @@ void do_machine_check(struct pt_regs *regs, long error_code)
mce_setup(&m);
m.mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
- no_way_out = mce_no_way_out(&m, &msg);
-
final = &__get_cpu_var(mces_seen);
*final = m;
+ no_way_out = mce_no_way_out(&m, &msg);
+
barrier();
/*