linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yazen Ghannam <yazen.ghannam@amd.com>
To: <x86@kernel.org>, Tony Luck <tony.luck@intel.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Len Brown <lenb@kernel.org>
Cc: <linux-kernel@vger.kernel.org>, <linux-edac@vger.kernel.org>,
	<Smita.KoralahalliChannabasappa@amd.com>,
	Qiuxu Zhuo <qiuxu.zhuo@intel.com>, <linux-acpi@vger.kernel.org>,
	Yazen Ghannam <yazen.ghannam@amd.com>
Subject: [PATCH v4 15/22] x86/mce: Move machine_check_poll() status checks to helper functions
Date: Tue, 24 Jun 2025 14:16:10 +0000	[thread overview]
Message-ID: <20250624-wip-mca-updates-v4-15-236dd74f645f@amd.com> (raw)
In-Reply-To: <20250624-wip-mca-updates-v4-0-236dd74f645f@amd.com>

There are a number of generic and vendor-specific status checks in
machine_check_poll(). These are used to determine if an error should be
skipped.

Move these into helper functions. Future vendor-specific checks will be
added to the helpers.

Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20250415-wip-mca-updates-v3-11-8ffd9eb4aa56@amd.com
    
    v3->v4:
    * No change.
    
    v2->v3:
    * Add tags from Qiuxu and Tony.
    
    v1->v2:
    * Change log_poll_error() to should_log_poll_error().
    * Keep code comment.

 arch/x86/kernel/cpu/mce/core.c | 88 +++++++++++++++++++++++-------------------
 1 file changed, 48 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4f6af060a395..bed059c6c808 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -714,6 +714,52 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 
 DEFINE_PER_CPU(unsigned, mce_poll_count);
 
+/*
+ * Newer Intel systems that support software error
+ * recovery need to make additional checks. Other
+ * CPUs should skip over uncorrected errors, but log
+ * everything else.
+ */
+static bool ser_should_log_poll_error(struct mce *m)
+{
+	/* Log "not enabled" (speculative) errors */
+	if (!(m->status & MCI_STATUS_EN))
+		return true;
+
+	/*
+	 * Log UCNA (SDM: 15.6.3 "UCR Error Classification")
+	 * UC == 1 && PCC == 0 && S == 0
+	 */
+	if (!(m->status & MCI_STATUS_PCC) && !(m->status & MCI_STATUS_S))
+		return true;
+
+	return false;
+}
+
+static bool should_log_poll_error(enum mcp_flags flags, struct mce_hw_err *err)
+{
+	struct mce *m = &err->m;
+
+	/* If this entry is not valid, ignore it. */
+	if (!(m->status & MCI_STATUS_VAL))
+		return false;
+
+	/*
+	 * If we are logging everything (at CPU online) or this
+	 * is a corrected error, then we must log it.
+	 */
+	if ((flags & MCP_UC) || !(m->status & MCI_STATUS_UC))
+		return true;
+
+	if (mca_cfg.ser)
+		return ser_should_log_poll_error(m);
+
+	if (m->status & MCI_STATUS_UC)
+		return false;
+
+	return true;
+}
+
 /*
  * Poll for corrected events or events that happened before reset.
  * Those are just logged through /dev/mcelog.
@@ -765,48 +811,10 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		if (!mca_cfg.cmci_disabled)
 			mce_track_storm(m);
 
-		/* If this entry is not valid, ignore it */
-		if (!(m->status & MCI_STATUS_VAL))
+		/* Verify that the error should be logged based on hardware conditions. */
+		if (!should_log_poll_error(flags, &err))
 			continue;
 
-		/*
-		 * If we are logging everything (at CPU online) or this
-		 * is a corrected error, then we must log it.
-		 */
-		if ((flags & MCP_UC) || !(m->status & MCI_STATUS_UC))
-			goto log_it;
-
-		/*
-		 * Newer Intel systems that support software error
-		 * recovery need to make additional checks. Other
-		 * CPUs should skip over uncorrected errors, but log
-		 * everything else.
-		 */
-		if (!mca_cfg.ser) {
-			if (m->status & MCI_STATUS_UC)
-				continue;
-			goto log_it;
-		}
-
-		/* Log "not enabled" (speculative) errors */
-		if (!(m->status & MCI_STATUS_EN))
-			goto log_it;
-
-		/*
-		 * Log UCNA (SDM: 15.6.3 "UCR Error Classification")
-		 * UC == 1 && PCC == 0 && S == 0
-		 */
-		if (!(m->status & MCI_STATUS_PCC) && !(m->status & MCI_STATUS_S))
-			goto log_it;
-
-		/*
-		 * Skip anything else. Presumption is that our read of this
-		 * bank is racing with a machine check. Leave the log alone
-		 * for do_machine_check() to deal with it.
-		 */
-		continue;
-
-log_it:
 		mce_read_aux(&err, i);
 		m->severity = mce_severity(m, NULL, NULL, false);
 		/*

-- 
2.49.0


  parent reply	other threads:[~2025-06-24 14:16 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
2025-06-24 14:15 ` [PATCH v4 01/22] x86/mce: Don't remove sysfs if thresholding sysfs init fails Yazen Ghannam
2025-06-24 14:15 ` [PATCH v4 02/22] x86/mce: Restore poll settings after storm subsides Yazen Ghannam
2025-06-25 13:22   ` Nikolay Borisov
2025-06-24 14:15 ` [PATCH v4 03/22] x86/mce/amd: Add default names for MCA banks and blocks Yazen Ghannam
2025-06-24 14:15 ` [PATCH v4 04/22] x86/mce/amd: Fix threshold limit reset Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 05/22] x86/mce/amd: Rename threshold restart function Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 06/22] x86/mce/amd: Remove return value for mce_threshold_{create,remove}_device() Yazen Ghannam
2025-06-25 14:57   ` Nikolay Borisov
2025-06-24 14:16 ` [PATCH v4 07/22] x86/mce/amd: Remove smca_banks_map Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 08/22] x86/mce/amd: Put list_head in threshold_bank Yazen Ghannam
2025-06-25 16:52   ` Nikolay Borisov
2025-06-27 11:14     ` Nikolay Borisov
2025-06-30 12:57       ` Yazen Ghannam
2025-08-25 13:59     ` Borislav Petkov
2025-06-24 14:16 ` [PATCH v4 09/22] x86/mce: Cleanup bank processing on init Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 10/22] x86/mce: Remove __mcheck_cpu_init_early() Yazen Ghannam
2025-06-26  8:03   ` Nikolay Borisov
2025-06-30 12:58     ` Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 11/22] x86/mce: Define BSP-only init Yazen Ghannam
2025-06-25 11:04   ` Nikolay Borisov
2025-06-25 11:26     ` Borislav Petkov
2025-06-24 14:16 ` [PATCH v4 12/22] x86/mce: Define BSP-only SMCA init Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 13/22] x86/mce: Do 'UNKNOWN' vendor check early Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 14/22] x86/mce: Separate global and per-CPU quirks Yazen Ghannam
2025-06-27 11:02   ` Nikolay Borisov
2025-06-30 13:00     ` Yazen Ghannam
2025-06-24 14:16 ` Yazen Ghannam [this message]
2025-06-24 14:16 ` [PATCH v4 16/22] x86/mce: Unify AMD THR handler with MCA Polling Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 17/22] x86/mce: Unify AMD DFR " Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 18/22] x86/mce/amd: Support SMCA Corrected Error Interrupt Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 19/22] x86/mce/amd: Remove redundant reset_block() Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 20/22] x86/mce/amd: Define threshold restart function for banks Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 21/22] x86/mce: Handle AMD threshold interrupt storms Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 22/22] x86/mce: Save and use APEI corrected threshold limit Yazen Ghannam

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250624-wip-mca-updates-v4-15-236dd74f645f@amd.com \
    --to=yazen.ghannam@amd.com \
    --cc=Smita.KoralahalliChannabasappa@amd.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=qiuxu.zhuo@intel.com \
    --cc=rafael@kernel.org \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).