public inbox for linux-edac@vger.kernel.org
From: Yazen Ghannam <yazen.ghannam@amd.com>
To: "Luck, Tony" <tony.luck@intel.com>, Borislav Petkov <bp@alien8.de>
Cc: yazen.ghannam@amd.com, Aristeu Rozanski <aris@ruivo.org>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	Aristeu Rozanski <aris@redhat.com>
Subject: Re: [PATCH] mce: prevent concurrent polling of MCE events
Date: Fri, 9 Jun 2023 12:00:53 -0400
Message-ID: <facb48e2-73a0-e780-4fda-2ecbdfd3b48b@amd.com>
In-Reply-To: <SJ1PR11MB60831A82C6BACEBCCC50E397FC51A@SJ1PR11MB6083.namprd11.prod.outlook.com>

On 6/9/23 11:24 AM, Luck, Tony wrote:
>> So "UCNA" is like the AMD "Deferred" severity it seems. How is this
>> different from "Action Optional"? I've been equating DFR and AO.
>

Thanks Tony.

> Categories of uncorrected memory errors on Intel:
> 
> 1) "UCNA" ... these are logged by memory controllers when ECC says that a memory read cannot
> supply correct data. If CMCI is enabled, they are signaled with CMCI. Note that these will occur on prefetch
> or speculative reads as well as "regular" reads. The data might never be consumed.
>

Yes, this is like AMD.

Key differences:
	* Logged using "Deferred" severity. However, deferred errors
	  aren't always from the memory controller. So there still needs
	  to be an error code check in addition to severity.
	* Signalled with a Deferred error APIC interrupt. This way UC
	  errors can be signalled independently of CEs.
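
To illustrate why severity alone is not enough, here is a minimal C sketch.
It assumes the AMD MCA_STATUS layout used by Linux (Deferred in bit 44,
extended error code in bits 21:16 on SMCA systems). It also assumes the
caller already knows whether the bank is a UMC (memory controller) bank,
and that XEC 0 on a UMC bank means a DRAM ECC error. The function names
are hypothetical, not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* AMD MCA_STATUS Deferred bit (matches Linux's MCI_STATUS_DEFERRED). */
#define STATUS_DEFERRED (1ULL << 44)

/* Extended error code: bits 21:16 of MCA_STATUS on SMCA systems. */
static unsigned int status_xec(uint64_t status)
{
	return (unsigned int)((status >> 16) & 0x3f);
}

/*
 * Hypothetical helper: is this a deferred DRAM ECC error?
 * The Deferred bit only gives the severity; other banks can log
 * deferred errors too, so the bank type and error code are checked
 * as well. bank_is_umc is assumed to be supplied by the caller.
 */
static bool is_deferred_dram_error(uint64_t status, bool bank_is_umc)
{
	if (!(status & STATUS_DEFERRED))
		return false;		/* not a deferred error at all */
	return bank_is_umc && status_xec(status) == 0;
}
```

A deferred error logged by a non-UMC bank, or by a UMC bank with a
non-zero XEC, is rejected even though its severity matches.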

> 2) "SRAO". This is now legacy. Pre-Icelake systems log these for uncorrected errors found by
> the patrol scrubber, and for evictions of poison from L3 cache (if that poison was due to an ECC
> failure in the cache itself, not for poison created elsewhere and currently resident in the cache).
> Signaled with a broadcast machine check. Icelake and newer systems use UCNA for these.
>

Yes, this mostly falls within the Deferred/UCNA case for AMD also.

> 3) "SRAR". Attempt to consume poison (either data read or instruction fetch). Signaled with
> machine check. Pre-Skylake this was broadcast. Skylake and newer have an opt-in mechanism
> to request #MC delivery to just the logical CPU trying to consume (Linux always opts-in).
> 
> 
> UCNA = Uncorrected No Action required. But Linux does take action to try to offline the page.
>

That's right. So we'll use this for the Deferred memory error case. But
there will need to be updates for these to be actionable on AMD systems.
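
A rough sketch of the first step of that action, as standalone C: only
trust the logged address when ADDRV (bit 58 of MCi_STATUS) is set, then
turn it into a page frame number. The 4 KiB page size and the helper
name are assumptions. In Linux, further vendor-specific checks are made
(for example, on Intel, the address mode reported in MCi_MISC) before
the PFN is handed to memory_failure().

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_ADDRV      (1ULL << 58) /* MCi_ADDR holds a valid address */
#define SKETCH_PAGE_SHIFT 12           /* assumption: 4 KiB pages */

/*
 * Hypothetical: pick the page frame to offline for a UCNA (Intel) or
 * Deferred (AMD) memory error. Returns false when the logged address
 * cannot be trusted, in which case nothing should be offlined.
 */
static bool pfn_to_offline(uint64_t status, uint64_t addr, uint64_t *pfn)
{
	if (!(status & STATUS_ADDRV))
		return false;
	*pfn = addr >> SKETCH_PAGE_SHIFT;	/* drop the in-page offset */
	return true;
}
```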

> SRAO = Software Recoverable Action Optional. As with UCNA Linux tries to offline the page.
>

Okay, so ignore these going forward.
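
For reference, a simplified C sketch of how the Intel classes map onto
MCi_STATUS bits (VAL 63, UC 61, PCC 57, S 56, AR 55), assuming
MCG_CAP.SER_P is set so that S and AR are meaningful. It deliberately
ignores the EN bit and the MCA error code, both of which real severity
grading also uses, so treat it as an illustration rather than a stand-in
for the kernel's severity table. The enum and function names are
hypothetical.

```c
#include <stdint.h>

#define STATUS_VAL (1ULL << 63) /* register holds a valid error */
#define STATUS_UC  (1ULL << 61) /* uncorrected */
#define STATUS_PCC (1ULL << 57) /* processor context corrupt */
#define STATUS_S   (1ULL << 56) /* signaled via #MC */
#define STATUS_AR  (1ULL << 55) /* action required */

enum err_class {
	CLASS_NONE, CLASS_CE, CLASS_UCNA, CLASS_SRAO,
	CLASS_SRAR, CLASS_FATAL, CLASS_UNKNOWN
};

static enum err_class classify_status(uint64_t status)
{
	if (!(status & STATUS_VAL))
		return CLASS_NONE;
	if (!(status & STATUS_UC))
		return CLASS_CE;
	if (status & STATUS_PCC)
		return CLASS_FATAL;	/* context corrupt: not recoverable */

	switch (status & (STATUS_S | STATUS_AR)) {
	case 0:
		return CLASS_UCNA;	/* not signaled via #MC, no action */
	case STATUS_S:
		return CLASS_SRAO;	/* signaled, action optional */
	case STATUS_S | STATUS_AR:
		return CLASS_SRAR;	/* signaled, action required */
	default:
		return CLASS_UNKNOWN;	/* AR without S: not handled here */
	}
}
```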

> SRAR = Software Recoverable Action Required. Linux will replace a clean page with a new copy
> if it can (think read-only text pages mapped from ELF executable). If not it sends SIGBUS to the
> application. Some SRAR in the kernel are recoverable ... see the copy_mc*() functions.
>

Yep, this mostly works. There's still an AMD IF Unit quirk that needs to
be handled. And the kernel recovery cases need to be tested.
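
The SRAR outcomes Tony lists can be summarized as a small decision
function. This is a hypothetical C sketch with invented types: it
collapses what is really a lot of state in the memory-failure and fixup
code into two flags, and it assumes that kernel-mode consumption outside
a copy_mc*()-style routine is fatal.

```c
#include <stdbool.h>

enum srar_ctx { CTX_USER, CTX_KERNEL };

enum srar_act {
	ACT_REFETCH_CLEAN_PAGE,	/* replace page with a fresh copy */
	ACT_SIGBUS,		/* kill/notify the consuming task */
	ACT_KERNEL_FIXUP,	/* e.g. copy_mc*() reports a short copy */
	ACT_PANIC		/* no way to recover */
};

struct srar_event {
	enum srar_ctx ctx;
	bool page_clean_with_backing;	/* e.g. read-only ELF text */
	bool kernel_has_fixup;		/* consumed inside copy_mc*()-style code */
};

/* Hypothetical summary of the SRAR handling described above. */
static enum srar_act choose_srar_action(const struct srar_event *ev)
{
	if (ev->ctx == CTX_KERNEL)
		return ev->kernel_has_fixup ? ACT_KERNEL_FIXUP : ACT_PANIC;
	if (ev->page_clean_with_backing)
		return ACT_REFETCH_CLEAN_PAGE;
	return ACT_SIGBUS;
}
```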

Thanks again!

-Yazen
