From: Bert Karwatzki <spasswolf@web.de>
To: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Borislav Petkov <bp@alien8.de>, Tony Luck <tony.luck@intel.com>,
linux-kernel@vger.kernel.org, linux-next@vger.kernel.org,
linux-edac@vger.kernel.org, linux-acpi@vger.kernel.org,
x86@kernel.org, rafael@kernel.org, qiuxu.zhuo@intel.com,
nik.borisov@suse.com, Smita.KoralahalliChannabasappa@amd.com,
spasswolf@web.de
Subject: Re: spurious mce Hardware Error messages in next-20250912
Date: Mon, 15 Sep 2025 23:03:45 +0200
Message-ID: <45d4081d93bbd50e1a23a112e3caca86ce979217.camel@web.de>
In-Reply-To: <20250915175531.GB869676@yaz-khff2.amd.com>
On Monday, 15.09.2025 at 13:55 -0400, Yazen Ghannam wrote:
>
>
> You can change this interval by writing to this file:
> /sys/devices/system/machinecheck/machinecheck0/check_interval
>
> Do the messages follow that setting? IOW, if you set it to '10', do you
> see error messages every 10 seconds?
Yes, if I set it to 10, I see these messages every 10 seconds.
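For reference, the interval change can be done and verified in one step. This is a minimal sketch, assuming root privileges and that check_interval takes a value in seconds (as the question above implies); the helper name set_check_interval is hypothetical, and the optional second argument only exists so the function can be pointed at a different file:

```shell
#!/bin/sh
# Hypothetical helper: write a new MCA polling interval (seconds) to the
# machinecheck sysfs file and print the value read back from it.
# $1 = interval in seconds
# $2 = optional path (defaults to the machinecheck0 check_interval file)
set_check_interval() {
    interval_file="${2:-/sys/devices/system/machinecheck/machinecheck0/check_interval}"
    # Writing to sysfs requires root; fail if the write is rejected.
    printf '%s\n' "$1" > "$interval_file" || return 1
    # Read the value back to confirm the kernel accepted it.
    cat "$interval_file"
}
```

Usage (as root): set_check_interval 10, then watch dmesg for the error messages repeating at the new interval.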
> >
> > As these messages do not appear in v6.17-rc5 I bisected the issue
> > (from v6.17-rc5 to next-20250912) and found this as the first bad commit:
> >
> > cf6f155e848b ("x86/mce: Unify AMD DFR handler with MCA Polling")
>
> Could you try another recent linux-next build without the MCA updates?
>
> It looks like 'next-20250911' doesn't include the commit above.
> >
Somehow I cannot find next-20250911 in my linux-next git repository:
$ git checkout next-202509(TAB TAB)
next-20250901 next-20250902 next-20250905 next-20250908 next-20250912
I'm currently re-cloning linux-next.
> > Are these error messages a new error that was not reported previously or
> > are these error messages a sign that the new code erroneously reports errors?
> >
>
> It could be that the recent code updates broke something. Or there may
> be other kernel changes causing new, spurious errors.
>
> We could also be picking up errors from the hardware that were
> previously ignored. I'll ask our hardware folks if this is a case we
> should address.
Perhaps these are errors that were not reported previously. When I check the
L3 cache error counts, I get the following (these error counts seem to persist across
reboots, and they do not increase when I get an MCE error message):
root@lisa:~# cat /sys/devices/system/machinecheck/machinecheck0/l3_cache_0/l3_cache_0/error_count
0
root@lisa:~# cat /sys/devices/system/machinecheck/machinecheck0/l3_cache_1/l3_cache_1/error_count
0
root@lisa:~# cat /sys/devices/system/machinecheck/machinecheck0/l3_cache_2/l3_cache_2/error_count
9
root@lisa:~# cat /sys/devices/system/machinecheck/machinecheck0/l3_cache_3/l3_cache_3/error_count
0
root@lisa:~# cat /sys/devices/system/machinecheck/machinecheck0/l3_cache_4/l3_cache_4/error_count
72
root@lisa:~# cat /sys/devices/system/machinecheck/machinecheck0/l3_cache_5/l3_cache_5/error_count
0
root@lisa:~# cat /sys/devices/system/machinecheck/machinecheck0/l3_cache_6/l3_cache_6/error_count
3165
root@lisa:~# cat /sys/devices/system/machinecheck/machinecheck0/l3_cache_7/l3_cache_7/error_count
72
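The eight cat commands above can also be collapsed into a loop. This is a minimal sketch, assuming the l3_cache_N/l3_cache_N/error_count layout seen on this machine; the function name print_l3_error_counts is hypothetical, and the optional argument only exists so it can be pointed at a different directory:

```shell
#!/bin/sh
# Hypothetical helper: print the error_count of every L3 cache bank
# under a machinecheck directory (default: machinecheck0).
# Assumes the l3_cache_N/l3_cache_N/error_count layout shown above.
print_l3_error_counts() {
    base="${1:-/sys/devices/system/machinecheck/machinecheck0}"
    for f in "$base"/l3_cache_*/l3_cache_*/error_count; do
        # Skip the literal pattern when nothing matches, or unreadable files.
        [ -r "$f" ] || continue
        # The bank name is the directory directly containing error_count.
        bank=$(basename "$(dirname "$f")")
        printf '%s: %s\n' "$bank" "$(cat "$f")"
    done
}
```

Usage (as root): print_l3_error_counts. The glob sorts lexically, so banks appear in numeric order only while there are fewer than ten of them, which holds for the eight banks listed here.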
Bert Karwatzki