From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>, "hpa@zytor.com" <hpa@zytor.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"mingo@elte.hu" <mingo@elte.hu>,
"tglx@linutronix.de" <tglx@linutronix.de>
Subject: Re: [PATCH] [4/4] x86: MCE: Fix EIPV behaviour with !PCC
Date: Fri, 24 Apr 2009 09:27:38 +0900
Message-ID: <49F1077A.5030801@jp.fujitsu.com>
In-Reply-To: <1240479838.6842.555.camel@yhuang-dev.sh.intel.com>
Huang Ying wrote:
> I have added a description to the patch; I hope it makes things clearer.
>
> Best Regards,
> Huang Ying
> -------------------------------------------------->
> Impact: Spec compliance
>
> Tolerant level 0 means: always panic on uncorrected errors, that is,
> panic even for recoverable uncorrected errors. This is a useful option
> for those who think a panic is a better hardware error containment
> mechanism than trying to recover.
>
> The current implementation does not comply with the tolerant == 0 spec:
> it tries to recover (by killing the related processes) from
> recoverable uncorrected errors (errors triggered in userspace) when
> tolerant == 0. This patch fixes this by panicking in that case.
>
> Signed-off-by: Huang Ying <ying.huang@intel.com>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
>
> ---
> arch/x86/kernel/cpu/mcheck/mce_64.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/arch/x86/kernel/cpu/mcheck/mce_64.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_64.c
> @@ -400,7 +400,7 @@ void do_machine_check(struct pt_regs * r
>                   * force_sig() takes an awful lot of locks and has a slight
>                   * risk of deadlocking.
>                   */
> -                if (user_space) {
> +                if (user_space && tolerant > 0) {
>                          force_sig(SIGBUS, current);
>                  } else if (panic_on_oops || tolerant < 2) {
>                          mce_panic("Uncorrected machine check",
>
Wait, I would like to confirm something first.
Given:
* Tolerant levels:
* 0: always panic on uncorrected errors, log corrected errors
let's walk through do_machine_check():
266 void do_machine_check(struct pt_regs * regs, long error_code)
267 {
:
302         for (i = 0; i < banks; i++) {
:
311                 rdmsrl(MSR_IA32_MC0_STATUS + i*4, m.status);
312                 if ((m.status & MCI_STATUS_VAL) == 0)
313                         continue;
:
319                 if ((m.status & MCI_STATUS_UC) == 0)
320                         continue;
:
# From here on, the status being checked has both VAL and UC set.
:
329                 if (m.status & MCI_STATUS_EN) {
330                         /* if PCC was set, there's no way out */
331                         no_way_out |= !!(m.status & MCI_STATUS_PCC);
332                         /*
333                          * If this error was uncorrectable and there was
334                          * an overflow, we're in trouble. If no overflow,
335                          * we might get away with just killing a task.
336                          */
337                         if (m.status & MCI_STATUS_UC) {
338                                 if (tolerant < 1 || m.status & MCI_STATUS_OVER)
339                                         no_way_out = 1;
340                                 kill_it = 1;
341                         }
342                 } else {
343                         /*
344                          * Machine check event was not enabled. Clear, but
345                          * ignore.
346                          */
347                         continue;
348                 }
:
# Hmm, the second UC check (line 337) is redundant and should be removed...
# Anyway, with tolerant == 0, no_way_out == 1 whenever the event is enabled,
# and kill_it == 1 unless no event is enabled.
# Therefore, with tolerant == 0, no_way_out == kill_it always holds.
:
364                 }
365         }
:
376         if (no_way_out && tolerant < 3)
377                 mce_panic("Machine check", &panicm, mcestart);
:
# With tolerant == 0, we normally panic here.
:
385         if (kill_it && tolerant < 3) {
386                 int user_space = 0;
387
388                 /*
389                  * If the EIPV bit is set, it means the saved IP is the
390                  * instruction which caused the MCE.
391                  */
392                 if (m.mcgstatus & MCG_STATUS_EIPV)
393                         user_space = panicm.ip && (panicm.cs & 3);
394
395                 /*
396                  * If we know that the error was in user space, send a
397                  * SIGBUS. Otherwise, panic if tolerance is low.
398                  *
399                  * force_sig() takes an awful lot of locks and has a slight
400                  * risk of deadlocking.
401                  */
402                 if (user_space) {
403                         force_sig(SIGBUS, current);
404                 } else if (panic_on_oops || tolerant < 2) {
405                         mce_panic("Uncorrected machine check",
406                                   &panicm, mcestart);
407                 }
408         }
:
# So when would we ever get here with tolerant == 0?
:
421 }
Or is this patch meant to be applied after some of Andi's patches have been
committed? That would mean it targets a bug in Andi's patch set, one that is
not in 2.6.30-rc* yet.
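To make the walk above easier to check, here is a rough user-space model of the decision logic for a single bank. It is only a sketch under stated assumptions, not kernel code: the names decide(), patch_changes_outcome() and enum outcome are made up for illustration, user_space is passed in directly instead of being derived from EIPV and the saved CS, and the accumulation of no_way_out across several banks is ignored. The status bit positions are the architectural MCi_STATUS ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural MCi_STATUS bit positions (Intel SDM). */
#define MCI_STATUS_VAL  (1ULL << 63)  /* register contents are valid */
#define MCI_STATUS_OVER (1ULL << 62)  /* an earlier error was overwritten */
#define MCI_STATUS_UC   (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_EN   (1ULL << 60)  /* error reporting enabled */
#define MCI_STATUS_PCC  (1ULL << 57)  /* processor context corrupt */

/* Hypothetical result type, used only by this model. */
enum outcome {
    OUT_NONE,        /* handler returns without panic or signal */
    OUT_PANIC_EARLY, /* mce_panic("Machine check", ...) at line 377 */
    OUT_SIGBUS,      /* force_sig(SIGBUS, current) at line 403 */
    OUT_PANIC_LATE   /* mce_panic("Uncorrected machine check", ...) at line 405 */
};

/*
 * Simplified single-bank model of do_machine_check() as quoted above.
 * 'patched' selects the condition proposed in the patch:
 *     if (user_space && tolerant > 0)
 */
static enum outcome decide(uint64_t status, int tolerant, bool user_space,
                           bool panic_on_oops, bool patched)
{
    bool no_way_out = false;
    bool kill_it = false;

    /* Lines 312-320 and 342-348: skip invalid, corrected or disabled events. */
    if (!(status & MCI_STATUS_VAL) || !(status & MCI_STATUS_UC) ||
        !(status & MCI_STATUS_EN))
        return OUT_NONE;

    /* Lines 331-341: UC is known to be set at this point. */
    no_way_out |= !!(status & MCI_STATUS_PCC);
    if (tolerant < 1 || (status & MCI_STATUS_OVER))
        no_way_out = true;
    kill_it = true;

    /* Lines 376-377. */
    if (no_way_out && tolerant < 3)
        return OUT_PANIC_EARLY;

    /* Lines 385-408. */
    if (kill_it && tolerant < 3) {
        if (user_space && (!patched || tolerant > 0))
            return OUT_SIGBUS;
        else if (panic_on_oops || tolerant < 2)
            return OUT_PANIC_LATE;
    }
    return OUT_NONE;
}

/*
 * Returns true if, for the given tolerant level, the patched condition
 * gives a different outcome for any combination of OVER, PCC, user_space
 * and panic_on_oops on a valid, enabled, uncorrected event.
 */
static bool patch_changes_outcome(int tolerant)
{
    static const uint64_t extra[] = {
        0, MCI_STATUS_OVER, MCI_STATUS_PCC, MCI_STATUS_OVER | MCI_STATUS_PCC
    };
    const uint64_t base = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN;

    for (unsigned int i = 0; i < sizeof(extra) / sizeof(extra[0]); i++)
        for (int us = 0; us <= 1; us++)
            for (int oops = 0; oops <= 1; oops++)
                if (decide(base | extra[i], tolerant, us, oops, false) !=
                    decide(base | extra[i], tolerant, us, oops, true))
                    return true;
    return false;
}
```

In this model patch_changes_outcome(0) returns false: with tolerant == 0 every valid, enabled, uncorrected event already ends in OUT_PANIC_EARLY, so the modified condition at line 402 is never reached. Whether that still holds with the rest of Andi's series applied is exactly the open question.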
Thanks,
H.Seto