public inbox for linux-edac@vger.kernel.org
 help / color / mirror / Atom feed
From: Sohil Mehta <sohil.mehta@intel.com>
To: <x86@kernel.org>, Borislav Petkov <bp@alien8.de>,
	Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H . Peter Anvin" <hpa@zytor.com>,
	"Yazen Ghannam" <yazen.ghannam@amd.com>,
	Arnd Bergmann <arnd@arndb.de>, <linux-kernel@vger.kernel.org>,
	<linux-edac@vger.kernel.org>
Subject: Re: x86/mce: Is mce_is_memory_error() incorrect for Intel?
Date: Thu, 14 Dec 2023 00:57:51 +0530	[thread overview]
Message-ID: <4ce89a5e-96f8-4939-b86e-f65c16f4bd4e@intel.com> (raw)
In-Reply-To: <20231206013846.1859347-1-sohil.mehta@intel.com>

On 12/6/2023 7:08 AM, Sohil Mehta wrote:

In an effort to rewrite the below Intel specific comment in
mce_is_memory_error(), I think I may have found an issue. It would be
really helpful if someone can help check the analysis below.

> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 7b397370b4d6..d42122b1afea 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -482,7 +482,8 @@ bool mce_is_memory_error(struct mce *m)
>  	case X86_VENDOR_INTEL:
>  	case X86_VENDOR_ZHAOXIN:
>  		/*
> -		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
> +		 * Intel SDM: Machine-Check Architecture -> "Compound Error
> +		 * Codes"
>  		 *
>  		 * Bit 7 of the MCACOD field of IA32_MCi_STATUS is used for
>  		 * indicating a memory error. Bit 8 is used for indicating a


The full comment reads as follows:

* Bit 7 of the MCACOD field of IA32_MCi_STATUS is used for
* indicating a memory error. Bit 8 is used for indicating a
* cache hierarchy error. The combination of bit 2 and bit 3
* is used for indicating a `generic' cache hierarchy error
* But we can't just blindly check the above bits, because if
* bit 11 is set, then it is a bus/interconnect error - and
* either way the above bits just gives more detail on what
* bus/interconnect error happened. Note that bit 12 can be
* ignored, as it's the "filter" bit.
*/

return (m->status & 0xef80) == BIT(7) ||  ---> Memory Controller Errors
	(m->status & 0xef00) == BIT(8) || ---> Cache Hierarchy Errors
	(m->status & 0xeffc) == 0xc;	  ---> Generic Cache Hierarchy

The code tries to identify the memory and cache errors by masking the
status and then comparing based on the bit encodings below. But it seems
to be missing the "Extended Memory Errors" encoding which may have been
added after the original code was written.

Type 				Form
----				----
Generic Cache Hierarchy 	000F 0000 0000 11LL
TLB Errors			000F 0000 0001 TTLL
Memory Controller Errors	000F 0000 1MMM CCCC
Cache Hierarchy Errors		000F 0001 RRRR TTLL
Extended Memory Errors		000F 0010 1MMM CCCC
Bus and Interconnect Errors	000F 1PPT RRRR IILL

I am not sure what are the practical implications of getting
mce_is_memory_error() wrong. (This issue is completely theoretical right
now.) Any insights?

A couple of other points:

- The code seems ripe for a rewrite to be rid of the magic masks and bit
comparisons. I am thinking of doing that in a separate patch along side
of rewriting the comment. Would that be useful even if no issue exists?

- Relying on these bit encodings seems problematic in the long run with
the possibility of more things that could always be added. Is there a
better way to do it?

Sohil


  parent reply	other threads:[~2023-12-13 19:28 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-06  1:38 [PATCH] x86/mce: Update references to the Intel SDM Sohil Mehta
2023-12-06 10:13 ` Borislav Petkov
2023-12-06 23:03   ` Sohil Mehta
2023-12-13 19:27 ` Sohil Mehta [this message]
2023-12-13 19:54   ` x86/mce: Is mce_is_memory_error() incorrect for Intel? Luck, Tony
2023-12-15 23:11     ` Sohil Mehta

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4ce89a5e-96f8-4939-b86e-f65c16f4bd4e@intel.com \
    --to=sohil.mehta@intel.com \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    --cc=yazen.ghannam@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox