From: Bert Karwatzki <spasswolf@web.de>
To: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Bert Karwatzki <spasswolf@web.de>, Borislav Petkov <bp@alien8.de>,
	Tony Luck <tony.luck@intel.com>,
	linux-kernel@vger.kernel.org, linux-next@vger.kernel.org,
	linux-edac@vger.kernel.org, linux-acpi@vger.kernel.org,
	x86@kernel.org, rafael@kernel.org, qiuxu.zhuo@intel.com,
	nik.borisov@suse.com, Smita.KoralahalliChannabasappa@amd.com
Subject: spurious mce Hardware Error messages in next-20250912
Date: Mon, 15 Sep 2025 03:00:09 +0200
Message-ID: <20250915010010.3547-1-spasswolf@web.de>
In-Reply-To: 20250908-wip-mca-updates-v6-0-eef5d6c74b9c@amd.com

On my MSI Alpha 15 (amd64) laptop running Debian stable (trixie) and
kernel next-20250912, I noticed the following MCE error messages in dmesg:

[   T10] mce: [Hardware Error]: Machine check events logged
[   T10] [Hardware Error]: Corrected error, no action required.
[   T10] [Hardware Error]: CPU:0 (19:50:0) MC11_STATUS[-|CE|-|AddrV|-|-|-|UECC|-|Poison|-]: 0x8400aa4800a90139
[   T10] [Hardware Error]: Error Addr: 0x006637a200000020
[   T10] [Hardware Error]: IPID: 0x000700b040000000
[   T10] [Hardware Error]: L3 Cache Ext. Error Code: 41
[   T10] [Hardware Error]: cache level: L1, tx: GEN, mem-tx: DRD
[   T10] mce: [Hardware Error]: Machine check events logged
[   T10] [Hardware Error]: Corrected error, no action required.
[   T10] [Hardware Error]: CPU:0 (19:50:0) MC14_STATUS[-|CE|-|AddrV|PCC|-|SyndV|UECC|-|Poison|-]: 0x8724ac0800000000
[   T10] [Hardware Error]: Error Addr: 0x002bf52e00000020
[   T10] [Hardware Error]: IPID: 0x000700b040000000, Syndrome: 0x0000000000000042
[   T10] 
[   T10] [Hardware Error]: L3 Cache Ext. Error Code: 0
[   T10] [Hardware Error]: cache level: RESV, tx: INSN
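
For readers who want to cross-check the flag lists above against the raw
values, here is a minimal sketch that decodes the two MCx_STATUS words. The
bit positions in STATUS_BITS are an assumption based on the AMD MCA_STATUS
layout as the kernel's decoder appears to use it; they are not part of the
original report, and decode_status() is a hypothetical helper name.

```python
# Assumed MCA_STATUS bit positions (AMD layout); verify against the
# processor documentation before relying on them.
STATUS_BITS = {
    "Val": 63, "Over": 62, "UC": 61, "En": 60, "MiscV": 59,
    "AddrV": 58, "PCC": 57, "TCC": 55, "SyndV": 53,
    "CECC": 46, "UECC": 45, "Deferred": 44, "Poison": 43, "Scrub": 40,
}


def decode_status(status):
    """Return the names of all assumed flag bits that are set in status."""
    return [name for name, bit in STATUS_BITS.items() if (status >> bit) & 1]


if __name__ == "__main__":
    # Raw values copied from the MC11_STATUS and MC14_STATUS lines above.
    for bank, status in (("MC11", 0x8400AA4800A90139),
                         ("MC14", 0x8724AC0800000000)):
        print(bank, hex(status), "|".join(decode_status(status)))
```

Under these assumed positions, UC and Deferred are clear in both values
while UECC and Poison are set, which is consistent with the flag lists
printed in the log.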

The messages start about 333.34s after boot and usually appear 327.68s apart
(yes, these timings are reproducible):
$ dmesg | grep mce
[  333.338334] [     T10] mce: [Hardware Error]: Machine check events logged
[  333.338354] [     T10] mce: [Hardware Error]: Machine check events logged
[  661.018322] [     T10] mce: [Hardware Error]: Machine check events logged
[  661.018347] [     T10] mce: [Hardware Error]: Machine check events logged
[  988.698305] [     T10] mce: [Hardware Error]: Machine check events logged
[  988.698329] [     T10] mce: [Hardware Error]: Machine check events logged
[ 1316.378283] [     T10] mce: [Hardware Error]: Machine check events logged
[ 1316.378311] [     T10] mce: [Hardware Error]: Machine check events logged
[ 1644.058284] [     T10] mce: [Hardware Error]: Machine check events logged
[ 1644.058303] [     T10] mce: [Hardware Error]: Machine check events logged
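
The spacing quoted above can be checked directly from the timestamps. The
following is a small sketch, not part of the original report: it parses the
dmesg lines shown, takes the first timestamp of each pair of messages, and
prints the gaps between successive pairs. The helper names and the
one-second grouping threshold are assumptions chosen for illustration.

```python
import re

# dmesg lines copied from the output above.
DMESG_LINES = """\
[  333.338334] [     T10] mce: [Hardware Error]: Machine check events logged
[  333.338354] [     T10] mce: [Hardware Error]: Machine check events logged
[  661.018322] [     T10] mce: [Hardware Error]: Machine check events logged
[  661.018347] [     T10] mce: [Hardware Error]: Machine check events logged
[  988.698305] [     T10] mce: [Hardware Error]: Machine check events logged
[  988.698329] [     T10] mce: [Hardware Error]: Machine check events logged
[ 1316.378283] [     T10] mce: [Hardware Error]: Machine check events logged
[ 1316.378311] [     T10] mce: [Hardware Error]: Machine check events logged
[ 1644.058284] [     T10] mce: [Hardware Error]: Machine check events logged
[ 1644.058303] [     T10] mce: [Hardware Error]: Machine check events logged
"""

TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def timestamps(text):
    """Extract the leading dmesg timestamp (seconds since boot) per line."""
    found = []
    for line in text.splitlines():
        match = TIMESTAMP.match(line)
        if match:
            found.append(float(match.group(1)))
    return found


def burst_starts(stamps, gap=1.0):
    """Keep the first timestamp of each group of closely spaced messages."""
    starts = []
    prev = None
    for t in stamps:
        if prev is None or t - prev > gap:
            starts.append(t)
        prev = t
    return starts


def intervals(starts):
    """Differences between successive group starts, rounded to 1 ms."""
    return [round(b - a, 3) for a, b in zip(starts, starts[1:])]


if __name__ == "__main__":
    starts = burst_starts(timestamps(DMESG_LINES))
    print("first occurrence:", starts[0])
    print("intervals:", intervals(starts))
```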

As these messages do not appear in v6.17-rc5, I bisected the issue
(from v6.17-rc5 to next-20250912) and found this to be the first bad commit:

cf6f155e848b ("x86/mce: Unify AMD DFR handler with MCA Polling")

Do these messages report a real error that simply was not reported before, or
are they a sign that the new code reports errors erroneously?

Hardware used:
$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 7
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c3)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller
04:00.0 Network controller: MEDIATEK Corp. MT7921K (RZ608) Wi-Fi 6E 80MHz
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller (rev 15)
06:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. KC3000/FURY Renegade NVMe SSD [E18] (rev 01)
07:00.0 Non-Volatile memory controller: Micron/Crucial Technology P1 NVMe PCIe SSD[Frampton] (rev 03)
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)
08:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller
08:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
08:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
08:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
08:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] Audio Coprocessor (rev 01)
08:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h/1ah HD Audio Controller
08:00.7 Signal processing controller: Advanced Micro Devices, Inc. [AMD] Sensor Fusion Hub

$ cat /proc/cpuinfo
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 25
model		: 80
model name	: AMD Ryzen 7 5800H with Radeon Graphics
stepping	: 0
microcode	: 0xa50000c
cpu MHz		: 2145.090
cache size	: 512 KB
physical id	: 0
siblings	: 16
core id		: 0
cpu cores	: 8
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 16
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
bugs		: sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso ibpb_no_ret spectre_v2_user tsa
bogomips	: 6388.44
TLB size	: 2560 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]


Bert Karwatzki

Thread overview: 30+ messages
2025-09-15  1:00 Bert Karwatzki [this message]
2025-09-15 17:55 ` spurious mce Hardware Error messages in next-20250912 Yazen Ghannam
2025-09-15 21:03   ` Bert Karwatzki
2025-09-15 21:43     ` Bert Karwatzki
2025-09-16  9:10       ` Borislav Petkov
2025-09-16 14:07         ` Yazen Ghannam
2025-09-16 20:27           ` Bert Karwatzki
2025-09-17  7:13             ` Bert Karwatzki
2025-09-17 14:41               ` Yazen Ghannam
2025-09-17 15:33                 ` Bert Karwatzki
2025-09-17 19:26                   ` Yazen Ghannam
2025-09-17 21:15                     ` Yazen Ghannam
2025-09-17 22:01                       ` Bert Karwatzki
2025-09-18 10:20                     ` Nikolay Borisov
2025-09-18 21:00                       ` Yazen Ghannam
2025-09-18 21:04                         ` Luck, Tony
2025-09-18 21:14                           ` Yazen Ghannam
2025-09-18 22:07                         ` Bert Karwatzki
2025-10-09 13:20                           ` Yazen Ghannam
2026-02-12 12:50                             ` spurious (?) mce Hardware Error messages in v6.19 Bert Karwatzki
2026-02-13 12:45                               ` Bert Karwatzki
2026-02-16 20:25                               ` Yazen Ghannam
2026-02-19 14:33                                 ` Yazen Ghannam
2026-02-19 15:43                                   ` Bert Karwatzki
2026-02-20 16:49                                     ` Mario Limonciello
2026-02-20 18:24                                       ` Bert Karwatzki
2026-02-23 21:53                                         ` Yazen Ghannam
2026-04-03 14:05                                           ` Borislav Petkov
2026-04-05  8:47                                             ` Bert Karwatzki
2026-04-05 10:46                                               ` Borislav Petkov
