linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chen Gong <gong.chen@linux.intel.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	Borislav Petkov <bp@amd64.org>,
	Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Subject: Re: [PATCH 5/5] mce: recover from "action required" errors reported in data path in usermode
Date: Wed, 07 Sep 2011 14:05:38 +0800	[thread overview]
Message-ID: <4E6709B2.7020401@linux.intel.com> (raw)
In-Reply-To: <4e5eb50721061dbb1b@agluck-desktop.sc.intel.com>

于 2011/9/1 6:26, Luck, Tony 写道:
> From: Tony Luck<tony.luck@intel.com>
>
> Two new entries in the mce severity table - one notes that data errors
> observed by innocent bystanders (who happen to share a machine check
> bank with the cpu experiencing the error) should be left alone by using
> the "KEEP" severity.
>
> Then inline in the do_machine_check() handler we process the user-mode
> data error that was marked at MCE_AR_SEVERITY.  Even though we are in
> "machine check context" it is almost safe to do so. We have already
> released all the other cpus from rendezvous and we know that the cpu
> with the error was executing user code - so it cannot have interrupts
> locked out, or hold any locks. I.e. this is almost equivalent to a
> page fault. Only difference (and risk) is that on x86_64 we are still
> on the machine check stack - so if another machine check arrives, we
> are toast (we didn't clear MCG_STATUS - yet, so cpu will reset rather
> than taking a nested machine check on the same stack).
>
> Signed-off-by: Tony Luck<tony.luck@intel.com>
> ---
>
> Using the "KEEP" state avoids the complexity of my earlier solution
> that sorted the cpus by severity and ran the more serious ones first.
>
>   arch/x86/kernel/cpu/mcheck/mce-severity.c |   14 ++++++++++-
>   arch/x86/kernel/cpu/mcheck/mce.c          |   35 ++++++++++++++++++++--------
>   2 files changed, 38 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
> index 7395d5f..c4d8b24 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
> @@ -54,6 +54,7 @@ static struct severity {
>   #define  MASK(x, y)	.mask = x, .result = y
>   #define MCI_UC_S (MCI_STATUS_UC|MCI_STATUS_S)
>   #define MCI_UC_SAR (MCI_STATUS_UC|MCI_STATUS_S|MCI_STATUS_AR)
> +#define	MCI_ADDR (MCI_STATUS_ADDRV|MCI_STATUS_MISCV)
>   #define MCACOD 0xffff
>
>   	MCESEV(
> @@ -102,11 +103,22 @@ static struct severity {
>   		SER, BITCLR(MCI_STATUS_S)
>   		),
>
> -	/* AR add known MCACODs here */
>   	MCESEV(
>   		PANIC, "Action required with lost events",
>   		SER, BITSET(MCI_STATUS_OVER|MCI_UC_SAR)
>   		),
> +
> +	/* known AR MCACODs: */
> +	MCESEV(
> +		KEEP, "HT thread notices Action required: data load error",
> +		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|0x0134),
> +		MCGMASK(MCG_STATUS_EIPV, 0)
> +		),
> +	MCESEV(
> +		AR, "Action required: data load error",
> +		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|0x0134),
> +		USER
> +		),

I don't think *AR* makes sense here because the following codes have a 
assumption that it means *user space* condition. If so, in the future a 
new *AR* severity for kernel usage is created, we can't distinguish 
which one can call "memory_failure" as below. At least, it should have a 
suffix such as AR_USER/AR_KERN:

enum severity_level {
         MCE_NO_SEVERITY,
         MCE_KEEP_SEVERITY,
         MCE_SOME_SEVERITY,
         MCE_AO_SEVERITY,
         MCE_UC_SEVERITY,
         MCE_AR_USER_SEVERITY,
	MCE_AR_KERN_SEVERITY,
         MCE_PANIC_SEVERITY,
};


>   	MCESEV(
>   		PANIC, "Action required: unknown MCACOD",
>   		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR, MCI_UC_SAR)
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 135e12d..2c59a34 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -996,12 +996,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>   			continue;
>   		}
>
> -		/*
> -		 * Kill on action required.
> -		 */
> -		if (severity == MCE_AR_SEVERITY)
> -			kill_it = 1;
> -
>   		mce_read_aux(&m, i);
>
>   		/*
> @@ -1022,6 +1016,8 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>   		}
>   	}
>
> +	m = *final;
> +
>   	if (!no_way_out)
>   		mce_clear_state(toclear);
>
> @@ -1040,7 +1036,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>   	 * support MCE broadcasting or it has been disabled.
>   	 */
>   	if (no_way_out&&  tolerant<  3)
> -		mce_panic("Fatal machine check on current CPU", final, msg);
> +		mce_panic("Fatal machine check on current CPU",&m, msg);
>
>   	/*
>   	 * If the error seems to be unrecoverable, something should be
> @@ -1049,11 +1045,24 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>   	 * high, don't try to do anything at all.
>   	 */
>
> -	if (kill_it&&  tolerant<  3)
> +	if (worst != MCE_AR_SEVERITY&&  kill_it&&  tolerant<  3)
>   		force_sig(SIGBUS, current);
>
>   	if (worst>  0)
>   		mce_report_event(regs);
> +
> +	if (worst == MCE_AR_SEVERITY) {
> +		unsigned long pfn = m.addr>>  PAGE_SHIFT;
> +
> +		pr_err("Uncorrected hardware memory error in user-access at %llx",
> +			m.addr);

print in the MCE handler maybe makes a deadlock ? say, when other CPUs 
are printing something, suddently they received MCE broadcast from 
Monarch CPU, when Monarch CPU runs above codes, a deadlock happens ?
Please fix me if I miss something :-)

> +		if (__memory_failure(pfn, MCE_VECTOR, 0)<  0) {
> +			pr_err("Memory error not recovered");
> +			force_sig(SIGBUS, current);
> +		} else
> +			pr_err("Memory error recovered");
> +	}

as you mentioned in the comment, the biggest concern is that when 
__memory_failure runs too long, if another MCE happens at the same time, 
(assuming this MCE is happened on its sibling CPU which has the same 
banks), the 2nd MCE will crash the system. Why not delaying the process 
in a safer context, such as using user_return_notifer ?

> +
>   	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
>   out:
>   	atomic_dec(&mce_entry);
> @@ -1061,12 +1070,18 @@ out:
>   }
>   EXPORT_SYMBOL_GPL(do_machine_check);
>
> -/* dummy to break dependency. actual code is in mm/memory-failure.c */
> -void __attribute__((weak)) memory_failure(unsigned long pfn, int vector)
> +#ifndef CONFIG_MEMORY_FAILURE
> +void memory_failure(unsigned long pfn, int vector)
>   {
>   	pr_err("Action optional memory failure at %lx ignored\n", pfn);
>   }
>
> +int __memory_failure(unsigned long pfn, int trapno, int flags)
> +{
> +	return -ENXIO;
> +}
> +#endif
> +
>   static void mce_process_work(struct work_struct *dummy)
>   {
>   	unsigned long pfn;


  reply	other threads:[~2011-09-07  6:05 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-31 22:21 [PATCH 0/5] Yet another pass at machine check recovery Luck, Tony
2011-08-31 22:25 ` [PATCH 1/5] x86, mce: rework use of TIF_MCE_NOTIFY Luck, Tony
2011-09-07  9:11   ` Borislav Petkov
2011-08-31 22:25 ` Luck, Tony
2011-09-09  2:23   ` huang ying
2011-08-31 22:25 ` [PATCH 2/5] mce: mask out undefined bits from MCi_ADDR Luck, Tony
2011-09-05  9:19   ` Chen Gong
2011-09-06 20:15     ` Luck, Tony
2011-08-31 22:25 ` Luck, Tony
2011-08-31 22:25 ` [PATCH 3/5] HWPOISON: Handle hwpoison in current process Luck, Tony
2011-09-07  5:47   ` Chen Gong
2011-08-31 22:26 ` Luck, Tony
2011-08-31 22:26 ` [PATCH 4/5] mce: remove TIF_MCE_NOTIFY Luck, Tony
2011-09-07  9:23   ` Borislav Petkov
2011-08-31 22:26 ` Luck, Tony
2011-08-31 22:26 ` [PATCH 5/5] mce: recover from "action required" errors reported in data path in usermode Luck, Tony
2011-09-07  6:05   ` Chen Gong [this message]
2011-09-07 13:25     ` Borislav Petkov
2011-09-07 13:50       ` Chen Gong
2011-09-08  3:05     ` Minskey Guo
2011-09-08  5:16       ` Luck, Tony
2011-09-08  9:25         ` Minskey Guo
2011-08-31 22:26 ` Luck, Tony
2011-08-31 22:41 ` [PATCH 0/5] Yet another pass at machine check recovery Valdis.Kletnieks
2011-08-31 22:54   ` Luck, Tony

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E6709B2.7020401@linux.intel.com \
    --to=gong.chen@linux.intel.com \
    --cc=bp@amd64.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=seto.hidetoshi@jp.fujitsu.com \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).