* [PATCH -tip] x86, mce: CE in last bank prevents panic by unknown MCE
@ 2009-08-26 7:20 Hidetoshi Seto
2009-08-26 9:14 ` Ingo Molnar
2009-09-17 21:37 ` [tip:x86/pat] " tip-bot for Hidetoshi Seto
0 siblings, 2 replies; 3+ messages in thread
From: Hidetoshi Seto @ 2009-08-26 7:20 UTC (permalink / raw)
To: linux-kernel; +Cc: Andi Kleen, H. Peter Anvin, Jin Dongming
[based on tip/x86/mce]
If the MCE handler is called but none of the mces_seen entries holds a
machine check event that could have signaled the MCE (i.e. an event with
severity higher than MCE_KEEP_SEVERITY), the kernel panics with "Machine
check from unknown source", since the MCE is assumed to have been signaled
by an external agent.

Normally mces_seen never points to an MCE_KEEP_SEVERITY event such as a
corrected error (CE). But this can happen because the initial value of
mces_seen is accidentally modified by mce_no_way_out(): if mce_no_way_out()
runs through all banks and the last bank holds a CE, mces_seen points to
that CE and the "unknown source" panic is not taken.

This patch fixes this undesired behavior and clarifies the logic.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 54bd1b2..7b485e9 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -665,7 +665,7 @@ static void mce_reign(void)
* No machine check event found. Must be some external
* source or one CPU is hung. Panic.
*/
- if (!m && tolerant < 3)
+ if (global_worst <= MCE_KEEP_SEVERITY && tolerant < 3)
mce_panic("Machine check from unknown source", NULL, NULL);
/*
@@ -889,11 +889,11 @@ void do_machine_check(struct pt_regs *regs, long error_code)
mce_setup(&m);
m.mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
- no_way_out = mce_no_way_out(&m, &msg);
-
final = &__get_cpu_var(mces_seen);
*final = m;
+ no_way_out = mce_no_way_out(&m, &msg);
+
barrier();
/*
--
1.6.4.1
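
The ordering problem described in the commit message can be modeled in a few lines of user-space C. This is only a sketch under stated assumptions: `struct rec`, `enum sev`, `scan_banks()`, `publish_old()`, `publish_new()` and the two `unknown_source_*()` helpers are hypothetical stand-ins, not kernel code. It models a single CPU and leaves out the `tolerant < 3` condition. `scan_banks()` plays the role of mce_no_way_out(), which uses the record it is handed as scratch space.

```c
#include <stdbool.h>

/* Hypothetical, reduced stand-ins for the kernel's severity levels. */
enum sev { SEV_NO = 0, SEV_KEEP = 1, SEV_PANIC = 2 };

#define NBANKS 4

/* Reduced stand-in for struct mce: only what the model needs. */
struct rec {
	int bank;
	enum sev sev;
};

/*
 * Models mce_no_way_out(): walks the banks and uses *m as scratch space,
 * so on return *m describes the last bank that was read.
 */
static bool scan_banks(struct rec *m, const enum sev banks[NBANKS])
{
	for (int i = 0; i < NBANKS; i++) {
		m->bank = i;
		m->sev = banks[i];
		if (m->sev >= SEV_PANIC)
			return true;
	}
	return false;
}

/* Old order: scan first, then publish the (now modified) record. */
static struct rec publish_old(const enum sev banks[NBANKS])
{
	struct rec m = { .bank = -1, .sev = SEV_NO };

	scan_banks(&m, banks);
	return m;
}

/* New order: publish the pristine record, then scan. */
static struct rec publish_new(const enum sev banks[NBANKS])
{
	struct rec m = { .bank = -1, .sev = SEV_NO };
	struct rec seen = m;

	scan_banks(&m, banks);
	return seen;
}

/* Old test (!m): m is set only when some record outranks SEV_NO. */
static bool unknown_source_old(const struct rec *seen)
{
	return !(seen->sev > SEV_NO);
}

/* New test: nothing above SEV_KEEP was seen. */
static bool unknown_source_new(const struct rec *seen)
{
	return seen->sev <= SEV_KEEP;
}
```

With a CE only in the last bank, `publish_old()` returns a record carrying that CE, so `unknown_source_old()` reports false and the panic is skipped. In this model either half of the patch restores the panic: the reordered publish keeps the record clean, and the severity comparison treats a leftover CE as "nothing found".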
* Re: [PATCH -tip] x86, mce: CE in last bank prevents panic by unknown MCE
From: Ingo Molnar @ 2009-08-26 9:14 UTC (permalink / raw)
To: Hidetoshi Seto; +Cc: linux-kernel, Andi Kleen, H. Peter Anvin, Jin Dongming
* Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> wrote:
> [based on tip/x86/mce]
>
> If the MCE handler is called but none of the mces_seen entries holds
> a machine check event that could have signaled the MCE (i.e. an
> event with severity higher than MCE_KEEP_SEVERITY), the kernel
> panics with "Machine check from unknown source", since the MCE is
> assumed to have been signaled by an external agent.
>
> Normally mces_seen never points to an MCE_KEEP_SEVERITY event such
> as a corrected error (CE). But this can happen because the initial
> value of mces_seen is accidentally modified by mce_no_way_out(): if
> mce_no_way_out() runs through all banks and the last bank holds a
> CE, mces_seen points to that CE and the "unknown source" panic is
> not taken.
>
> This patch fixes this undesired behavior and clarifies the logic.
>
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
> Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)
applied, thanks!
Btw., I had a quick look at
arch/x86/kernel/cpu/mcheck/mce-severity.c, and it is quite a pile of
unclean, over-engineered crap really.
Would you be interested in sending me a patch that converts that to
clean, proper C code that just checks the bits in a straightforward,
readable way? We don't need that silly, unreadable table with the
macro jungle and we definitely don't want to expose it via debugfs -
the debugfs bits can be removed altogether.
[ Plus in the mce_severity() implementation please rename 'a' to at
least 'm' - that name choice for a variable shows zero taste. We
don't program the kernel in BASIC with 'A', 'B' and 'C' variable
names anymore. ]
Thanks,
Ingo
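
As a rough illustration of the "just check the bits" style requested above, here is a minimal sketch. It is not the kernel's rule set: the function `sketch_grade()`, the `SKETCH_*` names and the reduction to three levels are assumptions made for this example, and real grading also has to consider MCG_STATUS, the `tolerant` setting and further status bits. Only the bit positions (VAL 63, UC 61, EN 60, PCC 57 in IA32_MCi_STATUS) come from the architecture.

```c
#include <stdint.h>

/* IA32_MCi_STATUS bit positions; macro names are local to this sketch. */
#define SKETCH_STATUS_VAL (1ULL << 63)	/* register holds a valid event */
#define SKETCH_STATUS_UC  (1ULL << 61)	/* error was not corrected */
#define SKETCH_STATUS_EN  (1ULL << 60)	/* error reporting was enabled */
#define SKETCH_STATUS_PCC (1ULL << 57)	/* processor context corrupt */

/* Hypothetical three-level grading, far coarser than the real one. */
enum sketch_sev { SKETCH_NO, SKETCH_KEEP, SKETCH_PANIC };

static enum sketch_sev sketch_grade(uint64_t status)
{
	/* Nothing logged in this bank. */
	if (!(status & SKETCH_STATUS_VAL))
		return SKETCH_NO;

	/* Logged but not enabled: cannot have raised the exception. */
	if (!(status & SKETCH_STATUS_EN))
		return SKETCH_NO;

	/* Corrupted processor context is always fatal. */
	if (status & SKETCH_STATUS_PCC)
		return SKETCH_PANIC;

	/* Corrected error: keep it for logging, nothing more. */
	if (!(status & SKETCH_STATUS_UC))
		return SKETCH_KEEP;

	/* Uncorrected error: this sketch is conservative and gives up. */
	return SKETCH_PANIC;
}
```

Each rule is one `if` on one bit, evaluated top to bottom, so the precedence that a table encodes implicitly by row order is visible directly in the control flow.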
* [tip:x86/pat] x86, mce: CE in last bank prevents panic by unknown MCE
From: tip-bot for Hidetoshi Seto @ 2009-09-17 21:37 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, jin.dongming, seto.hidetoshi, ak, tglx,
mingo
Commit-ID: 680b6cfd3cee30a7d997d49430fb73af84523853
Gitweb: http://git.kernel.org/tip/680b6cfd3cee30a7d997d49430fb73af84523853
Author: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
AuthorDate: Wed, 26 Aug 2009 16:20:36 +0900
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 26 Aug 2009 20:21:11 +0200
x86, mce: CE in last bank prevents panic by unknown MCE
If the MCE handler is called but none of the mces_seen entries
holds a machine check event that could have signaled the MCE
(i.e. an event with severity higher than MCE_KEEP_SEVERITY), the
kernel panics with "Machine check from unknown source", since the
MCE is assumed to have been signaled by an external agent.

Normally mces_seen never points to an MCE_KEEP_SEVERITY event
such as a corrected error (CE). But this can happen because the
initial value of mces_seen is accidentally modified by
mce_no_way_out(): if mce_no_way_out() runs through all banks and
the last bank holds a CE, mces_seen points to that CE and the
"unknown source" panic is not taken.

This patch fixes this undesired behavior and clarifies the logic.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Dongming <jin.dongming@np.css.fujitsu.com>
LKML-Reference: <4A94E244.3020301@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 54bd1b2..325559d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -612,7 +612,7 @@ out:
* This way we prevent any potential data corruption in a unrecoverable case
* and also makes sure always all CPU's errors are examined.
*
- * Also this detects the case of an machine check event coming from outer
+ * Also this detects the case of a machine check event coming from outer
* space (not detected by any CPUs) In this case some external agent wants
* us to shut down, so panic too.
*
@@ -665,7 +665,7 @@ static void mce_reign(void)
* No machine check event found. Must be some external
* source or one CPU is hung. Panic.
*/
- if (!m && tolerant < 3)
+ if (global_worst <= MCE_KEEP_SEVERITY && tolerant < 3)
mce_panic("Machine check from unknown source", NULL, NULL);
/*
@@ -889,11 +889,11 @@ void do_machine_check(struct pt_regs *regs, long error_code)
mce_setup(&m);
m.mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
- no_way_out = mce_no_way_out(&m, &msg);
-
final = &__get_cpu_var(mces_seen);
*final = m;
+ no_way_out = mce_no_way_out(&m, &msg);
+
barrier();
/*