From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>, "hpa@zytor.com" <hpa@zytor.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"mingo@elte.hu" <mingo@elte.hu>,
	"tglx@linutronix.de" <tglx@linutronix.de>
Subject: Re: [PATCH] [4/4] x86: MCE: Fix EIPV behaviour with !PCC
Date: Fri, 24 Apr 2009 09:27:38 +0900
Message-ID: <49F1077A.5030801@jp.fujitsu.com>
In-Reply-To: <1240479838.6842.555.camel@yhuang-dev.sh.intel.com>

Huang Ying wrote:
> I added some description to the patch; I hope that makes it clearer.
> 
> Best Regards,
> Huang Ying
> -------------------------------------------------->
> Impact: Spec compliance
> 
> Tolerant level 0 means: always panic on uncorrected errors, that is,
> panic even for recoverable uncorrected errors. This is a useful option
> for those who think a panic is a better hardware error containment
> mechanism than trying to recover.
> 
> The current implementation does not comply with the tolerant == 0 spec:
> it tries to recover (by killing the related processes) from
> recoverable uncorrected errors (errors triggered in userspace) when
> tolerant == 0. This patch fixes that by panicking in that case.
> 
> Signed-off-by: Huang Ying <ying.huang@intel.com>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> 
> ---
>  arch/x86/kernel/cpu/mcheck/mce_64.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/arch/x86/kernel/cpu/mcheck/mce_64.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_64.c
> @@ -400,7 +400,7 @@ void do_machine_check(struct pt_regs * r
>  		 * force_sig() takes an awful lot of locks and has a slight
>  		 * risk of deadlocking.
>  		 */
> -		if (user_space) {
> +		if (user_space && tolerant > 0) {
>  			force_sig(SIGBUS, current);
>  		} else if (panic_on_oops || tolerant < 2) {
>  			mce_panic("Uncorrected machine check",
> 
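
For reference, the effect of the one-line change on this branch alone can
be sketched as a small standalone C function. This is only an illustration
and not kernel code: kill_branch(), enum action and the "patched" flag are
made-up names, and the function assumes we have already entered the
"kill_it && tolerant < 3" block.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for the branch touched by the patch. */
enum action { ACT_NONE, ACT_SIGBUS, ACT_PANIC };

/*
 * Models only the if/else-if pair changed by the patch.
 * patched == false: original condition  "if (user_space)"
 * patched == true:  patched condition   "if (user_space && tolerant > 0)"
 */
static enum action kill_branch(bool user_space, int tolerant,
                               bool panic_on_oops, bool patched)
{
    bool send_sig = patched ? (user_space && tolerant > 0) : user_space;

    if (send_sig)
        return ACT_SIGBUS;      /* force_sig(SIGBUS, current) */
    if (panic_on_oops || tolerant < 2)
        return ACT_PANIC;       /* mce_panic("Uncorrected machine check", ...) */
    return ACT_NONE;
}
```

In this model, kill_branch(true, 0, false, false) yields ACT_SIGBUS while
kill_branch(true, 0, false, true) yields ACT_PANIC; for tolerant >= 1 the
two variants agree.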

Wait, I would like to confirm one thing first.

Given:
 * Tolerant levels:
 *   0: always panic on uncorrected errors, log corrected errors

Let's walk through do_machine_check():

    void do_machine_check(struct pt_regs * regs, long error_code)
    {
	:
            for (i = 0; i < banks; i++) {
	:
                    rdmsrl(MSR_IA32_MC0_STATUS + i*4, m.status);
                    if ((m.status & MCI_STATUS_VAL) == 0)
                            continue;
	:
                    if ((m.status & MCI_STATUS_UC) == 0)
                            continue;
	:
# From here on, we only look at status values with both VAL and UC set.
	:
                    if (m.status & MCI_STATUS_EN) {
                            /* if PCC was set, there's no way out */
                            no_way_out |= !!(m.status & MCI_STATUS_PCC);
                            /*
                             * If this error was uncorrectable and there was
                             * an overflow, we're in trouble.  If no overflow,
                             * we might get away with just killing a task.
                             */
                            if (m.status & MCI_STATUS_UC) {
                                    if (tolerant < 1 || m.status & MCI_STATUS_OVER)
                                            no_way_out = 1;
                                    kill_it = 1;
                            }
                    } else {
                            /*
                             * Machine check event was not enabled. Clear, but
                             * ignore.
                             */
                            continue;
                    }
	:
# Hmm, the second UC check is redundant and should be removed...
# Anyway, with tolerant == 0, no_way_out == 1 whenever the event is enabled,
# and kill_it == 1 unless no event is enabled.
# Therefore, with tolerant == 0, we always have no_way_out == kill_it.
	:
                    }
            }
	:
            if (no_way_out && tolerant < 3)
                    mce_panic("Machine check", &panicm, mcestart);
	:
# With tolerant == 0, we usually panic here.
	:
            if (kill_it && tolerant < 3) {
                    int user_space = 0;

                    /*
                     * If the EIPV bit is set, it means the saved IP is the
                     * instruction which caused the MCE.
                     */
                    if (m.mcgstatus & MCG_STATUS_EIPV)
                            user_space = panicm.ip && (panicm.cs & 3);

                    /*
                     * If we know that the error was in user space, send a
                     * SIGBUS.  Otherwise, panic if tolerance is low.
                     *
                     * force_sig() takes an awful lot of locks and has a slight
                     * risk of deadlocking.
                     */
                    if (user_space) {
                            force_sig(SIGBUS, current);
                    } else if (panic_on_oops || tolerant < 2) {
                            mce_panic("Uncorrected machine check",
                                    &panicm, mcestart);
                    }
            }
	:
# So when would we ever get here with tolerant == 0?
	:
    }
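
The walk above can be checked mechanically with a small userspace model.
The sketch below is not the kernel function: it covers a single bank only,
ignores the elided lines, takes user_space as an input instead of deriving
it from EIPV and the saved CS, and treats mce_panic() as a return. The names
model_mce(), kill_path_reachable() and enum mce_outcome are invented for
illustration; the MCI_STATUS_* bit positions are the architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural MCi_STATUS bits used in the walk above. */
#define MCI_STATUS_VAL  (1ULL << 63)
#define MCI_STATUS_OVER (1ULL << 62)
#define MCI_STATUS_UC   (1ULL << 61)
#define MCI_STATUS_EN   (1ULL << 60)
#define MCI_STATUS_PCC  (1ULL << 57)

/* Hypothetical labels for where the handler ends up. */
enum mce_outcome {
    MCE_NONE,               /* nothing fatal happens               */
    MCE_PANIC_NO_WAY_OUT,   /* first mce_panic("Machine check")    */
    MCE_SIGBUS,             /* force_sig(SIGBUS, current)          */
    MCE_PANIC_UNCORRECTED   /* second mce_panic("Uncorrected ...") */
};

/* Single-bank model of the decision logic quoted above. */
static enum mce_outcome model_mce(uint64_t status, int tolerant,
                                  bool user_space, bool panic_on_oops)
{
    bool no_way_out = false, kill_it = false;

    assert(tolerant >= 0 && tolerant <= 3);

    /* Banks without VAL, UC or EN are skipped by the loop. */
    if ((status & MCI_STATUS_VAL) && (status & MCI_STATUS_UC) &&
        (status & MCI_STATUS_EN)) {
        if (status & MCI_STATUS_PCC)
            no_way_out = true;
        if (tolerant < 1 || (status & MCI_STATUS_OVER))
            no_way_out = true;
        kill_it = true;
    }

    if (no_way_out && tolerant < 3)
        return MCE_PANIC_NO_WAY_OUT;

    if (kill_it && tolerant < 3) {
        if (user_space)
            return MCE_SIGBUS;
        if (panic_on_oops || tolerant < 2)
            return MCE_PANIC_UNCORRECTED;
    }
    return MCE_NONE;
}

/* Try every combination of the five status bits and both flags. */
static bool kill_path_reachable(int tolerant)
{
    static const uint64_t bits[] = {
        MCI_STATUS_VAL, MCI_STATUS_OVER, MCI_STATUS_UC,
        MCI_STATUS_EN, MCI_STATUS_PCC
    };

    for (unsigned mask = 0; mask < 32; mask++) {
        uint64_t status = 0;

        for (unsigned b = 0; b < 5; b++)
            if (mask & (1u << b))
                status |= bits[b];

        for (int user = 0; user <= 1; user++) {
            for (int oops = 0; oops <= 1; oops++) {
                enum mce_outcome o =
                    model_mce(status, tolerant, user, oops);
                if (o == MCE_SIGBUS || o == MCE_PANIC_UNCORRECTED)
                    return true;
            }
        }
    }
    return false;
}
```

Under these assumptions, kill_path_reachable(0) is false: with
tolerant == 0 every input that sets kill_it also sets no_way_out, so the
model panics before reaching the user_space check. For tolerant == 1 or 2
the kill path is reachable.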

Or should this patch be applied only after some of Andi's patches have been
committed? That would mean this patch targets a bug in Andi's patch set, and
that the bug is not in 2.6.30-rc* yet.


Thanks,
H.Seto

