linux-acpi.vger.kernel.org archive mirror
From: Bert Karwatzki <spasswolf@web.de>
To: Yazen Ghannam <yazen.ghannam@amd.com>, Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>,
	linux-kernel@vger.kernel.org,  linux-next@vger.kernel.org,
	linux-edac@vger.kernel.org,  linux-acpi@vger.kernel.org,
	x86@kernel.org, rafael@kernel.org,  qiuxu.zhuo@intel.com,
	nik.borisov@suse.com,  Smita.KoralahalliChannabasappa@amd.com,
	spasswolf@web.de
Subject: Re: spurious mce Hardware Error messages in next-20250912
Date: Tue, 16 Sep 2025 22:27:35 +0200
Message-ID: <9488e4bf935aa1e50179019419dfee93d306ded9.camel@web.de>
In-Reply-To: <20250916140744.GA1054485@yaz-khff2.amd.com>

On Tuesday, 16.09.2025 at 10:07 -0400, Yazen Ghannam wrote:
> On Tue, Sep 16, 2025 at 11:10:55AM +0200, Borislav Petkov wrote:
> > On Mon, Sep 15, 2025 at 11:43:26PM +0200, Bert Karwatzki wrote:
> > > After re-cloning linux-next I tested next-20250911 and I get no mce error messages
> > > even if I set the check_interval to 10.
> > 
> > Yazen, I've zapped everything from the handler unification onwards:
> > 
> > 28e82d6f03b0 x86/mce: Save and use APEI corrected threshold limit
> > c8f4cea38959 x86/mce: Handle AMD threshold interrupt storms
> > 5a92e88ffc49 x86/mce/amd: Define threshold restart function for banks
> > 922300abd79d x86/mce/amd: Remove redundant reset_block()
> > 9b92e18973ce x86/mce/amd: Support SMCA corrected error interrupt
> > fe02d3d00b06 x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems
> > cf6f155e848b x86/mce: Unify AMD DFR handler with MCA Polling
> > 53b3be0e79ef x86/mce: Unify AMD THR handler with MCA Polling
> > 
> > until this is properly sorted out, now this close to the merge window.
> > 
> > Thanks, Bert, for reporting!
> > 
> 
> No problem, thanks Boris.
> 
> Bert, can you please try the following patch on next-20250912?
> 
> I expect that you will see the "debug" message, but the regular MCA
> logging should be gone.
> 

I applied your patch on next-20250912; these are now the only messages
I get from mce:

[  333.337544] [      C0] mce: DEBUG: CPU0 Bank:11 Status:0x8700aa0800000000
[  333.337556] [      C0] mce: DEBUG: CPU0 Bank:14 Status:0x8724aa0800000000
[  661.017608] [      C0] mce: DEBUG: CPU0 Bank:11 Status:0x8424aa4800a9413b
[  661.017619] [      C0] mce: DEBUG: CPU0 Bank:14 Status:0x8700aa0800000000
[  988.697243] [      C0] mce: DEBUG: CPU0 Bank:11 Status:0x8700aa0800000000
[  988.697250] [      C0] mce: DEBUG: CPU0 Bank:14 Status:0x8724ab8800000000
[ 1316.377571] [      C0] mce: DEBUG: CPU0 Bank:11 Status:0x8700a28800000000
[ 1316.377582] [      C0] mce: DEBUG: CPU0 Bank:14 Status:0x8400aa4800a7413c


> Also, we haven't been able to reproduce this issue yet. So thank you for
> your help. It's much appreciated.
> 
> Thanks,
> Yazen
> 

It could still be a hardware error, so I'm also going to run memtest86+.

Bert Karwatzki


