From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>, "hpa@zytor.com" <hpa@zytor.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"mingo@elte.hu" <mingo@elte.hu>,
"tglx@linutronix.de" <tglx@linutronix.de>
Subject: Re: [PATCH] [4/4] x86: MCE: Fix EIPV behaviour with !PCC
Date: Fri, 24 Apr 2009 09:27:38 +0900
Message-ID: <49F1077A.5030801@jp.fujitsu.com>
In-Reply-To: <1240479838.6842.555.camel@yhuang-dev.sh.intel.com>
Huang Ying wrote:
> I have added a description to the patch; I hope it makes things clearer.
>
> Best Regards,
> Huang Ying
> -------------------------------------------------->
> Impact: Spec compliance
>
> Tolerant level 0 means: always panic on uncorrected errors, that is,
> panic even for recoverable uncorrected errors. This is a useful option
> for those who consider a panic a better hardware error containment
> mechanism than trying to recover.
>
> The current implementation does not comply with the tolerant == 0 spec:
> it tries to recover (by killing the related processes) from recoverable
> uncorrected errors (errors triggered in userspace) when tolerant == 0.
> This patch fixes that by panicking in that case.
>
> Signed-off-by: Huang Ying <ying.huang@intel.com>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
>
> ---
> arch/x86/kernel/cpu/mcheck/mce_64.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/arch/x86/kernel/cpu/mcheck/mce_64.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_64.c
> @@ -400,7 +400,7 @@ void do_machine_check(struct pt_regs * r
>  		 * force_sig() takes an awful lot of locks and has a slight
>  		 * risk of deadlocking.
>  		 */
> -		if (user_space) {
> +		if (user_space && tolerant > 0) {
>  			force_sig(SIGBUS, current);
>  		} else if (panic_on_oops || tolerant < 2) {
>  			mce_panic("Uncorrected machine check",
>
Wait, I'd like to confirm something.

Given:

 * Tolerant levels:
 *   0: always panic on uncorrected errors, log corrected errors

let's walk through do_machine_check():

void do_machine_check(struct pt_regs * regs, long error_code)
{
  :
	for (i = 0; i < banks; i++) {
  :
		rdmsrl(MSR_IA32_MC0_STATUS + i*4, m.status);
		if ((m.status & MCI_STATUS_VAL) == 0)
			continue;
  :
		if ((m.status & MCI_STATUS_UC) == 0)
			continue;
  :
# From here on, the status has both VAL and UC set.
  :
		if (m.status & MCI_STATUS_EN) {
			/* if PCC was set, there's no way out */
			no_way_out |= !!(m.status & MCI_STATUS_PCC);
			/*
			 * If this error was uncorrectable and there was
			 * an overflow, we're in trouble. If no overflow,
			 * we might get away with just killing a task.
			 */
			if (m.status & MCI_STATUS_UC) {
				if (tolerant < 1 || m.status & MCI_STATUS_OVER)
					no_way_out = 1;
				kill_it = 1;
			}
		} else {
			/*
			 * Machine check event was not enabled. Clear, but
			 * ignore.
			 */
			continue;
		}
  :
# Hmm, the second UC check is redundant and should be removed...
# Anyway, with tolerant == 0, no_way_out == 1 whenever the event is enabled.
# And kill_it == 1 unless no event is enabled.
# Therefore, with tolerant == 0, "no_way_out == kill_it" always holds.
  :
		}
	}
  :
	if (no_way_out && tolerant < 3)
		mce_panic("Machine check", &panicm, mcestart);
  :
# With tolerant == 0, this is where we normally panic.
  :
	if (kill_it && tolerant < 3) {
		int user_space = 0;

		/*
		 * If the EIPV bit is set, it means the saved IP is the
		 * instruction which caused the MCE.
		 */
		if (m.mcgstatus & MCG_STATUS_EIPV)
			user_space = panicm.ip && (panicm.cs & 3);

		/*
		 * If we know that the error was in user space, send a
		 * SIGBUS. Otherwise, panic if tolerance is low.
		 *
		 * force_sig() takes an awful lot of locks and has a slight
		 * risk of deadlocking.
		 */
		if (user_space) {
			force_sig(SIGBUS, current);
		} else if (panic_on_oops || tolerant < 2) {
			mce_panic("Uncorrected machine check",
				  &panicm, mcestart);
		}
	}
  :
# So when do we ever get here with tolerant == 0?
  :
}

Or is this patch meant to be applied on top of some of Andi's patches?
That would mean it targets a bug in Andi's patch set, one that is not in
2.6.30-rc* yet.
Thanks,
H.Seto