* [PATCH] x86/mce/amd: Filter bogus L3 deferred errors on CZN A0
@ 2026-02-28 14:08 Yazen Ghannam
2026-03-02 16:16 ` Mario Limonciello
From: Yazen Ghannam @ 2026-02-28 14:08 UTC
To: linux-edac
Cc: linux-kernel, tony.luck, x86, Yazen Ghannam, Bert Karwatzki,
Mario Limonciello
A user has observed multiple L3 cache deferred error logs after the
recent kernel rework of deferred error handling. [1]

Upon inspection, the errors are determined to be bogus due to
inconsistent status values. Also, the user verified that bogus MCA_DESTAT
values are present on the system even with an older kernel. [2] The
errors seem to be garbage values present in the MCA_DESTAT of some L3
cache banks. These were implicitly ignored before the recent kernel
rework because they do not generate a deferred error interrupt.

A later revision of the rework patch was merged for v6.19. This
naturally filtered out most of the bogus error logs. However, a few
signatures still remain. [3]

Add the remaining bogus signatures to the MCE filter function. Minimize
the scope of the filter to the reported CPU family/model/stepping so
that similar issues are not implicitly masked on other systems.

Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
Reported-by: Bert Karwatzki <spasswolf@web.de>
Closes: https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de # [1]
Link: https://lore.kernel.org/6e1eda7dd55f6fa30405edf7b0f75695cf55b237.camel@web.de # [2]
Link: https://lore.kernel.org/21ba47fa8893b33b94370c2a42e5084cf0d2e975.camel@web.de # [3]
---
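Note for readers outside the kernel tree: the check added below can be
approximated in userspace C to try out raw status values from a log. This
is only a sketch. The bit positions and the XEC() macro mirror the
kernel's definitions, but struct cpu_id, the is_l3_bank flag, and the
function name is_bogus_czn_l3_deferred() are hypothetical stand-ins for
struct cpuinfo_x86, the SMCA bank type lookup, and the filter itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status bits, same positions as the kernel's MCI_STATUS_* definitions. */
#define MCI_STATUS_PCC      (1ULL << 57)  /* processor context corrupt */
#define MCI_STATUS_DEFERRED (1ULL << 44)  /* deferred error */

/* Extended error code: status bits starting at bit 16, masked by caller. */
#define XEC(status, mask)   (((status) >> 16) & (mask))

/* Hypothetical stand-in for the fields used from struct cpuinfo_x86. */
struct cpu_id {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/*
 * Return true if the status value matches one of the two bogus
 * signatures. is_l3_bank is an assumption that replaces
 * bank_type == SMCA_L3_CACHE, which needs kernel-side bank enumeration.
 */
static bool is_bogus_czn_l3_deferred(const struct cpu_id *c, bool is_l3_bank,
				     uint64_t status)
{
	/* Limit the filter to Family 19h Model 50h Stepping 0 (Cezanne A0). */
	if (c->family != 0x19 || c->model != 0x50 || c->stepping != 0x0)
		return false;

	/* Only deferred errors from L3 cache banks are candidates. */
	if (!is_l3_bank || !(status & MCI_STATUS_DEFERRED))
		return false;

	/* Case #1: PCC set. Case #2: extended error code 29. */
	return (status & MCI_STATUS_PCC) || XEC(status, 0x3f) == 29;
}
```

Passing a status with the deferred bit plus either PCC or XEC 29 should
return true on a matching CPU; any other stepping, bank type, or error
code falls through to the normal reporting path.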
arch/x86/kernel/cpu/mce/amd.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index da13c1e37f87..7a94492aa50f 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -604,6 +604,18 @@ bool amd_filter_mce(struct mce *m)
 	enum smca_bank_types bank_type = smca_get_bank_type(m->extcpu, m->bank);
 	struct cpuinfo_x86 *c = &boot_cpu_data;
 
+	/*
+	 * Bogus L3 cache deferred errors on Cezanne A0.
+	 *
+	 * Case #1: PCC bit set. This is not valid for deferred errors.
+	 * Case #2: XEC 29. This is not a valid error code.
+	 */
+	if (c->x86 == 0x19 && c->x86_model == 0x50 && c->x86_stepping == 0x0 &&
+	    bank_type == SMCA_L3_CACHE && (m->status & MCI_STATUS_DEFERRED)) {
+		if ((m->status & MCI_STATUS_PCC) || XEC(m->status, 0x3f) == 29)
+			return true;
+	}
+
 	/* See Family 17h Models 10h-2Fh Erratum #1114. */
 	if (c->x86 == 0x17 &&
 	    c->x86_model >= 0x10 && c->x86_model <= 0x2F &&
--
2.53.0
* Re: [PATCH] x86/mce/amd: Filter bogus L3 deferred errors on CZN A0
2026-02-28 14:08 [PATCH] x86/mce/amd: Filter bogus L3 deferred errors on CZN A0 Yazen Ghannam
@ 2026-03-02 16:16 ` Mario Limonciello
2026-03-03 14:00 ` Yazen Ghannam
From: Mario Limonciello @ 2026-03-02 16:16 UTC
To: Yazen Ghannam, linux-edac; +Cc: linux-kernel, tony.luck, x86, Bert Karwatzki
On 2/28/26 8:08 AM, Yazen Ghannam wrote:
> User has observed multiple L3 cache deferred errors logs after recent
> kernel rework of deferred error handling. [1]
>
> Upon inspection, the errors are determined to be bogus due to
> inconsistent status values. Also, user verified that bogus MCA_DESTAT
> values are present on the system even with an older kernel. [2] The
> errors seem to be garbage values present in the MCA_DESTAT of some L3
> cache banks. These were implicitly ignored before the recent kernel
> rework because these do not generate a deferred error interrupt.
>
> A later revision of the rework patch was merged for v6.19. This
> naturally filtered out most of the bogus error logs. However, a few
> signatures still remain. [3]
>
> Add the remaining bogus signatures to the MCE filter function. Minimize
> the scope of the filter to the reported CPU family/model/stepping so
> that similar issues are not implicitly masked on other systems.
>
> Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
> Reported-by: Bert Karwatzki <spasswolf@web.de>
> Closes: https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> Cc: Mario Limonciello <mario.limonciello@amd.com>
> Cc: stable@vger.kernel.org
> Link: https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de # [1]
> Link: https://lore.kernel.org/6e1eda7dd55f6fa30405edf7b0f75695cf55b237.camel@web.de # [2]
> Link: https://lore.kernel.org/21ba47fa8893b33b94370c2a42e5084cf0d2e975.camel@web.de # [3]
> ---
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
> arch/x86/kernel/cpu/mce/amd.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index da13c1e37f87..7a94492aa50f 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -604,6 +604,18 @@ bool amd_filter_mce(struct mce *m)
> enum smca_bank_types bank_type = smca_get_bank_type(m->extcpu, m->bank);
> struct cpuinfo_x86 *c = &boot_cpu_data;
>
> + /*
> + * Bogus L3 cache deferred errors on Cezanne A0.
> + *
> + * Case #1: PCC bit set. This is not valid for deferred errors.
> + * Case #2: XEC 29. This is not a valid error code.
> + */
> + if (c->x86 == 0x19 && c->x86_model == 0x50 && c->x86_stepping == 0x0 &&
> + bank_type == SMCA_L3_CACHE && (m->status & MCI_STATUS_DEFERRED)) {
> + if ((m->status & MCI_STATUS_PCC) || XEC(m->status, 0x3f) == 29)
> + return true;
> + }
> +
> /* See Family 17h Models 10h-2Fh Erratum #1114. */
> if (c->x86 == 0x17 &&
> c->x86_model >= 0x10 && c->x86_model <= 0x2F &&
* Re: [PATCH] x86/mce/amd: Filter bogus L3 deferred errors on CZN A0
2026-03-02 16:16 ` Mario Limonciello
@ 2026-03-03 14:00 ` Yazen Ghannam
From: Yazen Ghannam @ 2026-03-03 14:00 UTC
To: Mario Limonciello
Cc: linux-edac, linux-kernel, tony.luck, x86, Bert Karwatzki
On Mon, Mar 02, 2026 at 10:16:31AM -0600, Mario Limonciello wrote:
[...]
> > ---
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Thanks Mario.

Bert,

There are other options to prevent these errors from being reported, and
they don't require a kernel patch.

"ignore_ce" parameter: Will disable the MCA polling timer.
"dont_log_ce" parameter: Will keep the MCA polling timer but will not
report errors that don't have a usable address.

You can set either of these through sysfs or on the kernel command line.

Thanks,
Yazen
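
For reference, the sysfs route mentioned above can be sketched as a small
C helper that writes "1" to one of the two attributes. This is a minimal
sketch under stated assumptions: the helper name mce_set_flag() is
hypothetical, and the directory is assumed to be
/sys/devices/system/machinecheck/machinecheck0 as described in the
kernel's sysfs-mce ABI document. Writing there normally needs root. The
command-line equivalents are assumed to be "mce=ignore_ce" and
"mce=dont_log_ce".

```c
#include <stdio.h>

/*
 * Write "1" to a machine-check sysfs attribute such as "ignore_ce" or
 * "dont_log_ce". dir is the directory that holds the attribute, assumed
 * to be /sys/devices/system/machinecheck/machinecheck0 on a live system.
 * Returns 0 on success and -1 on any failure.
 */
static int mce_set_flag(const char *dir, const char *attr)
{
	char path[512];
	FILE *f;
	int n;

	/* Build "<dir>/<attr>" and reject paths that would be truncated. */
	n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fputs("1\n", f) == EOF) {
		fclose(f);
		return -1;
	}

	/* fclose() flushes the buffer, so its result reports write errors. */
	return fclose(f) == 0 ? 0 : -1;
}
```

A call such as mce_set_flag("/sys/devices/system/machinecheck/machinecheck0",
"dont_log_ce") would then correspond to the second option, with the
setting lasting until the next reboot.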