From: Borislav Petkov <bp@alien8.de>
To: Chen Yucong <slaoub@gmail.com>
Cc: tony.luck@intel.com, ak@linux.intel.com,
	aravind.gopalakrishnan@amd.com, linux-edac@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2 v2] x86, mce: support memory error recovery for both UCNA and Deferred error in machine_check_poll
Date: Thu, 6 Nov 2014 16:48:54 +0100
Message-ID: <20141106154853.GE4318@pd.tnic>
In-Reply-To: <1415162873-1874-3-git-send-email-slaoub@gmail.com>

On Wed, Nov 05, 2014 at 12:47:53PM +0800, Chen Yucong wrote:
> Uncorrected no action required (UCNA) - is a UCR error that is not

Please explain "UCR" if you're quoting it without the rest of the SDM
text that defines it.

> signaled via a machine check exception and, instead, is reported to
> system software as a corrected machine check error. UCNA errors indicate
> that some data in the system is corrupted, but the data has not been
> consumed and the processor state is valid and you may continue execution
> on this processor. UCNA errors require no action from system software
> to continue execution. Note that UCNA errors are supported by the
> processor only when IA32_MCG_CAP[24] (MCG_SER_P) is set.
>                                                -- Intel SDM Volume 3B
> 
> Deferred errors are errors that cannot be corrected by hardware, but
> do not cause an immediate interruption in program flow, loss of data
> integrity, or corruption of processor state. These errors indicate
> that data has been corrupted but not consumed. Hardware writes information
> to the status and address registers in the corresponding bank that
> identifies the source of the error if deferred errors are enabled for
> logging. Deferred errors are not reported via machine check exceptions;
> they can be seen by polling the MCi_STATUS registers.
>                                                 -- ADM64 APM Volume 2
						     ^^^

Please try to spell "AMD" correctly. Your first patch has "ADM" too.

> 
> Above two items, both UCNA and Deferred errors belong to detected
> errors, but they can't be corrected by hardware, and this is very
> similar to Software Recoverable Action Optional (SRAO) errors.
> Therefore, we can take some actions that have been used for handling
> SRAO errors to handle UCNA and Deferred errors.


> 
> Signed-off-by: Chen Yucong <slaoub@gmail.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |   50 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 453e9bf..37f7649 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -575,6 +575,46 @@ static void mce_read_aux(struct mce *m, int i)
>  	}
>  }
>  
> +static bool mem_deferred_error(struct mce *m)
> +{
> +	int severity;
> +	struct cpuinfo_x86 *c = &boot_cpu_data;
> +
> +	severity = mce_severity(m, mca_cfg.tolerant, NULL, false);
> +
> +	if (c->x86_vendor == X86_VENDOR_AMD) {
> +		/*
> +		 * AMD BKDGs - Machine Check Error Codes
> +		 *
> +		 * Bit 8 of ErrCode[15:0] of MCi_STATUS is used for indicating
> +		 * a memory-specific error. Note that this field encodes info-
> +		 * rmation about memory-hierarchy level involved in the error.
> +		 */
> +		if (severity == MCE_DEFERRED_SEVERITY)
> +			return  (m->status & 0xff00) == BIT(8);
> +	} else if (c->x86_vendor == X86_VENDOR_INTEL) {
> +		/*
> +		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
> +		 *
> +		 * Bit 7 of the MCACOD field of IA32_MCi_STATUS is used for
> +		 * indicating a memory error. Bit 8 is used for indicating a
> +		 * cache hierarchy error. The combination of bit 2 and bit 3
> +		 * is used for indicating a `generic' cache hierarchy error
> +		 * But we can't just blindly check the above bits, because if
> +		 * bit 11 is set, then it is a bus/interconnect error - and
> +		 * either way the above bits just gives more detail on what
> +		 * bus/interconnect error happened. Note that bit 12 can be
> +		 * ignored, as it's the "filter" bit.
> +		 */
> +		if (severity == MCE_UCNA_SEVERITY)
> +			return (m->status & 0xef80) == BIT(7) ||
> +			       (m->status & 0xef00) == BIT(8) ||
> +			       (m->status & 0xeffc) == 0xc;
> +	}
> +
> +	return false;
> +}

This function is combining deferred and memory error checks. Please do
this differently:

	if (memory_error(m))
		if (deferred_error(m))
			/* handle the deferred memory error */

so that the memory_error() function can be used in other code paths
separately.
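
To make the suggested split concrete, here is a rough userspace sketch, not
kernel code. The struct, the enums and their names are hypothetical
stand-ins for struct mce, c->x86_vendor and the severity values; the bit
masks are copied unchanged from the quoted patch, and I have not checked
them against the SDM or the BKDGs here.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1ULL << (n))

/* Hypothetical stand-ins for the kernel's vendor and severity values. */
enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };
enum severity { SEV_NONE, SEV_UCNA, SEV_DEFERRED };

/* Hypothetical, cut-down replacement for struct mce. */
struct mce_sketch {
	uint64_t status;	/* MCi_STATUS value */
	enum vendor vendor;
	enum severity severity;	/* what mce_severity() would have returned */
};

/* Only decides whether the error code describes a memory error. */
static bool memory_error(const struct mce_sketch *m)
{
	if (m->vendor == VENDOR_AMD)
		return (m->status & 0xff00) == BIT(8);

	if (m->vendor == VENDOR_INTEL)
		/* Bit 11 set means bus/interconnect; bit 12 is masked out. */
		return (m->status & 0xef80) == BIT(7) ||
		       (m->status & 0xef00) == BIT(8) ||
		       (m->status & 0xeffc) == 0xc;

	return false;
}

/* Only decides whether the error was deferred (AMD) or UCNA (Intel). */
static bool deferred_error(const struct mce_sketch *m)
{
	if (m->vendor == VENDOR_AMD)
		return m->severity == SEV_DEFERRED;

	if (m->vendor == VENDOR_INTEL)
		return m->severity == SEV_UCNA;

	return false;
}
```

With the two checks separated, the polling path would test
memory_error(m) first and deferred_error(m) second, and other code paths
could call memory_error() alone.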

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.